Commit

Format crawler.py
silvanocerza committed Aug 29, 2023
1 parent a9b8fd9 commit a613b1b
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions haystack/nodes/connector/crawler.py
@@ -58,10 +58,10 @@ def __init__(
         Init object with basic params for crawling (can be overwritten later).
         :param urls: List of http(s) address(es) (can also be supplied later when calling crawl())
-        :param crawler_depth: How many sublinks to follow from the initial list of URLs. Can be any integer >= 0.
-            For example:
-            0: Only initial list of urls.
-            1: Follow links found on the initial URLs (but no further).
+        :param crawler_depth: How many sublinks to follow from the initial list of URLs. Can be any integer >= 0.
+            For example:
+            0: Only initial list of urls.
+            1: Follow links found on the initial URLs (but no further).
             2: Additionally follow links found on the second-level URLs.
         :param filter_urls: Optional list of regular expressions that the crawled URLs must comply with.
             All URLs not matching at least one of the regular expressions will be dropped.
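For context, a minimal sketch of how these constructor parameters combine. The URLs, filter pattern, and output directory below are illustrative values, not taken from the commit:

    from haystack.nodes import Crawler

    # crawler_depth=1: fetch the initial URLs plus the links found on them, but no further.
    # filter_urls: any crawled URL matching none of these regular expressions is dropped.
    # Note: the v1 Crawler drives a real browser, so Selenium and a Chrome driver
    # are needed at runtime.
    crawler = Crawler(
        urls=["https://haystack.deepset.ai"],     # illustrative URL
        crawler_depth=1,
        filter_urls=[r"haystack\.deepset\.ai"],   # illustrative pattern
        output_dir="crawled_files",               # illustrative path
    )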
@@ -155,10 +155,10 @@ def crawl(
         If no parameters are provided to this method, the instance attributes that were passed during __init__ will be used.
         :param urls: List of http addresses or single http address
-        :param crawler_depth: How many sublinks to follow from the initial list of URLs. Can be any integer >= 0.
-            For example:
-            0: Only initial list of urls.
-            1: Follow links found on the initial URLs (but no further).
+        :param crawler_depth: How many sublinks to follow from the initial list of URLs. Can be any integer >= 0.
+            For example:
+            0: Only initial list of urls.
+            1: Follow links found on the initial URLs (but no further).
             2: Additionally follow links found on the second-level URLs.
         :param filter_urls: Optional list of regular expressions that the crawled URLs must comply with.
             All URLs not matching at least one of the regular expressions will be dropped.
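A hedged sketch of the fallback behavior the docstring describes: any parameter not passed to crawl() falls back to the value given to __init__, and per-call arguments take precedence. The URL is illustrative, and the return value varies across 1.x releases (file paths in earlier versions, Documents in later ones), so it is only printed here:

    # Per-call arguments override the values set in __init__.
    results = crawler.crawl(
        urls=["https://docs.haystack.deepset.ai"],  # illustrative URL
        crawler_depth=0,  # 0: fetch only the listed URLs, follow no sublinks
    )
    print(results)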
@@ -378,10 +378,10 @@ def run(  # type: ignore
         :param output_dir: Path for the directory to store files
         :param urls: List of http addresses or single http address
-        :param crawler_depth: How many sublinks to follow from the initial list of URLs. Can be any integer >= 0.
-            For example:
-            0: Only initial list of urls.
-            1: Follow links found on the initial URLs (but no further).
+        :param crawler_depth: How many sublinks to follow from the initial list of URLs. Can be any integer >= 0.
+            For example:
+            0: Only initial list of urls.
+            1: Follow links found on the initial URLs (but no further).
             2: Additionally follow links found on the second-level URLs.
         :param filter_urls: Optional list of regular expressions that the crawled URLs must comply with.
             All URLs not matching at least one of the regular expressions will be dropped.
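Since run() is the BaseComponent entry point used when the node sits inside a Pipeline, here is a minimal sketch of calling it directly, assuming the standard Haystack v1 node contract of returning an (output dict, outgoing edge name) tuple; the URL and path are again illustrative:

    # Called directly, a v1 node's run() returns (output dict, outgoing edge name).
    output, stream = crawler.run(
        output_dir="crawled_files",                 # illustrative path
        urls=["https://docs.haystack.deepset.ai"],  # illustrative URL
        crawler_depth=0,
    )
    print(stream)  # typically "output_1" for single-output nodes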
