GitHub - pistonsky/yandex-1: A simple Java parser that extracts the URL, title, description, and links from a search engine results page (in this case, yandex.ru) and downloads the first 10 pages into the results directory.


The program uses regular expressions to parse the HTML of the search results page.
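
As an illustration of the regular-expression approach, here is a minimal sketch. It is not the code from Parser.java: the class name SerpRegexSketch, the method extractLinks, and the pattern are assumptions made for this example, and the real yandex.ru markup is different and changes over time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SerpRegexSketch {
    // Hypothetical pattern: captures the href and the inner text of an <a> tag.
    // The markup actually served by yandex.ru needs its own pattern.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+[^>]*href=\"([^\"]+)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one {url, title} pair per anchor found in the HTML.
    static List<String[]> extractLinks(String html) {
        List<String[]> results = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            // Drop nested tags such as <b> that may wrap parts of the title.
            String title = m.group(2).replaceAll("<[^>]+>", "").trim();
            results.add(new String[] {url, title});
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<li><a class=\"r\" href=\"http://abc.com\">"
                + "<b>Welcome</b> to abc.com</a></li>";
        for (String[] r : extractLinks(html)) {
            System.out.println(r[0]);
            System.out.println(r[1]);
        }
    }
}
```

Regular expressions are fragile against markup changes, so a pattern like this usually has to be adjusted whenever the results page layout changes.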

To launch, type: java Parser keywords to search
Replace "keywords to search" with your search string.
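
One plausible way to turn the command-line words into a request URL is sketched below. The class name SearchUrlSketch, the method buildSearchUrl, and the search URL with its text parameter are assumptions for illustration; the README does not say which URL Parser.java requests.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SearchUrlSketch {
    // Assumed endpoint and parameter name; not taken from Parser.java.
    private static final String BASE = "https://yandex.ru/search/?text=";

    // Joins the command-line words with spaces and URL-encodes the result.
    static String buildSearchUrl(String[] keywords) {
        String query = String.join(" ", keywords);
        try {
            return BASE + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not occur.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl(args));
    }
}
```

Encoding the query matters for non-Latin search strings, which would otherwise produce an invalid URL.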

Sample output:

Document 1:
http://abc.com
Welcome to abc.com
The greatest website in the world!
abc.com

Document 2:
...

The program also downloads the first 10 result pages to the results folder. The files are named 1.html, 2.html, and so on.
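
The download step could look roughly like the sketch below. The names ResultSaverSketch, targetFor, download, and saveAll are hypothetical, and the actual Parser.java may fetch and store pages differently; only the results folder and the 1.html, 2.html naming come from the README.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;

class ResultSaverSketch {
    // Maps a 1-based result rank to its file, e.g. rank 2 -> results/2.html.
    static Path targetFor(Path dir, int rank) {
        return dir.resolve(rank + ".html");
    }

    // Streams the body of the page at the given URL into the target file.
    static void download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Saves at most the first 10 URLs as 1.html, 2.html, and so on.
    static void saveAll(List<String> urls, Path dir) throws IOException {
        Files.createDirectories(dir);
        int limit = Math.min(10, urls.size());
        for (int i = 0; i < limit; i++) {
            download(urls.get(i), targetFor(dir, i + 1));
        }
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is treated as a URL to download.
        saveAll(List.of(args), Paths.get("results"));
    }
}
```

This sketch stops at the first failed download; a more tolerant version would catch the IOException per page and continue with the next one.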

If the program throws an error, that means yandex.ru returned a CAPTCHA page instead of the search engine results page (SERP).
