LeetCode 1242. Web Crawler Multithreaded

Post by ailswan Aug. 23, 2024

1242. Web Crawler Multithreaded

Problem Statement

link: LeetCode.cn | LeetCode

Given a URL startUrl and an interface HtmlParser, implement a multi-threaded web crawler that crawls all links under the same hostname as startUrl.

Return all URLs obtained by your web crawler in any order.

Your crawler should:

- Start from the page startUrl.
- Call HtmlParser.getUrls(url) to get all URLs from the webpage at the given URL. This is a blocking call that performs an HTTP request; for the purposes of the problem, the judge uses the input urls and edges arrays to simulate the web.
- Not crawl the same link twice.
- Explore only links that are under the same hostname as startUrl.

For simplicity, you may assume all URLs use the http protocol without any port specified. For example, http://leetcode.com/problems and http://leetcode.com/contest are under the same hostname, while http://example.org/test and http://example.com/abc are not.
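Because every URL is guaranteed to use http with no port, the hostname is simply the third '/'-separated segment of the URL. A minimal sketch of that comparison (the hostname helper below is my own naming, not part of the problem's interface):

    def hostname(url: str) -> str:
        # "http://news.yahoo.com/news" -> "news.yahoo.com"
        return url.split('/')[2]

    assert hostname("http://news.yahoo.com/news") == hostname("http://news.yahoo.com/us")
    assert hostname("http://news.yahoo.com") != hostname("http://news.google.com")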

Example 1:

Input:
urls = ["http://news.yahoo.com", "http://news.yahoo.com/news", "http://news.yahoo.com/news/topics/", "http://news.google.com", "http://news.yahoo.com/us"]
edges = [[2,0],[2,1],[3,2],[3,1],[0,4]]
startUrl = "http://news.yahoo.com/news/topics/"
Output: ["http://news.yahoo.com", "http://news.yahoo.com/news", "http://news.yahoo.com/news/topics/", "http://news.yahoo.com/us"]
Explanation: edges[i] = [from, to] means page urls[from] links to urls[to]. Starting from urls[2], the crawler reaches urls[0] and urls[1], and from urls[0] it reaches urls[4]. urls[3] is never returned because its hostname is news.google.com.

Example 2:

Input:
urls = ["http://news.yahoo.com", "http://news.yahoo.com/news", "http://news.yahoo.com/news/topics/", "http://news.google.com"]
edges = [[0,2],[2,1],[3,2],[3,1],[3,0]]
startUrl = "http://news.google.com"
Output: ["http://news.google.com"]
Explanation: The startUrl links only to pages under the hostname news.yahoo.com, which must not be crawled.

Constraints:

- 1 <= urls.length <= 1000
- 1 <= urls[i].length <= 300
- startUrl is one of the urls.
- A hostname label must be from 1 to 63 characters long, including the dots, and may contain only the ASCII letters 'a' to 'z', digits '0' to '9', and the hyphen-minus character ('-').
- The hostname may not start or end with the hyphen-minus character ('-'). See: https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_hostnames
- You may assume there are no duplicates in the URL library.

Follow up:

Assume we have 10,000 nodes and 1 billion URLs to crawl. We will deploy the same software onto each node, and the software can know about all the nodes. We have to minimize communication between machines and make sure each node does an equal amount of work. How would your web crawler design change? What if one node fails or does not work? How do you know when the crawler is done?
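A common answer to the first two questions is to partition work by hashing each URL's hostname onto the set of nodes: every page of a given site is then owned by exactly one machine, its visited set stays local, and cross-machine traffic happens only when a crawl discovers a hostname owned by someone else. A minimal sketch of that assignment, assuming a hypothetical list of node IDs (NODES and owner_of are illustrative names, not a real API):

    import hashlib

    NODES = [f"node-{i}" for i in range(10_000)]  # hypothetical node identifiers

    def owner_of(url: str) -> str:
        # Hash the hostname (not the full URL) so all pages of one site
        # map to the same node and its visited set never leaves that machine.
        host = url.split('/')[2]
        digest = hashlib.sha1(host.encode()).hexdigest()
        return NODES[int(digest, 16) % len(NODES)]

Using consistent hashing instead of the plain modulus above means a failed node's hostnames can be redistributed without reshuffling everyone else's work. Completion can be detected by a coordinator that tracks, per node, the count of URLs handed out versus URLs acknowledged as crawled: when every counter drains to zero, the crawl is done.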

Solution Approach

Algorithm
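At heart this is a graph traversal (BFS) from startUrl with two twists: only URLs that share startUrl's hostname may be explored, and HtmlParser.getUrls is a slow blocking call, so the traversal should keep many of those calls in flight at once. The plan: maintain a shared visited set guarded by a lock, hand each discovered URL to a thread pool as a fetch task, filter the returned links by hostname, and keep submitting new tasks until none remain outstanding.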

Implement
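Below is one way to write this in Python with concurrent.futures; it is a sketch, not the only valid design. The HtmlParser type and its getUrls method are supplied by the LeetCode judge, and the pool size of 16 is an arbitrary choice (the judge simulates roughly 15 ms of latency per request, so a modest pool is plenty for at most 1000 URLs).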

    from collections import deque
    from concurrent.futures import ThreadPoolExecutor
    from threading import Lock
    from typing import List

    class Solution:
        def crawl(self, startUrl: str, htmlParser: 'HtmlParser') -> List[str]:
            def hostname(url: str) -> str:
                # "http://news.yahoo.com/news" -> "news.yahoo.com"
                return url.split('/')[2]

            target = hostname(startUrl)
            seen = {startUrl}          # shared visited set
            lock = Lock()              # guards `seen` across worker threads

            def fetch(url: str) -> List[str]:
                # getUrls is the judge-provided blocking call; running many
                # of these in parallel is what the thread pool buys us.
                discovered = []
                for nxt in htmlParser.getUrls(url):
                    if hostname(nxt) != target:
                        continue       # different hostname: do not crawl
                    with lock:
                        if nxt in seen:
                            continue   # already claimed by another worker
                        seen.add(nxt)
                    discovered.append(nxt)
                return discovered

            with ThreadPoolExecutor(max_workers=16) as pool:
                pending = deque([pool.submit(fetch, startUrl)])
                while pending:
                    # As each fetch completes, submit its new URLs as tasks.
                    for url in pending.popleft().result():
                        pending.append(pool.submit(fetch, url))

            return list(seen)
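Driving the BFS with futures keeps the coordination loop single-threaded and makes termination detection trivial: the crawl is finished exactly when the deque of pending futures is empty. The lock is held only for the membership check and insert on seen, so contention stays low. An alternative design uses a fixed set of long-lived worker threads around a Queue with sentinel values for shutdown; the futures version is shorter and easier to reason about.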