This is a simple Java parser that extracts the URL, title, description, and links from a search engine results page (in this case, yandex.ru) and downloads the first 10 pages into the results directory.

pistonsky/yandex-1

The program uses regular expressions to parse the HTML of the results page.
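
As a rough illustration of the regex-based approach, the sketch below pulls a URL and a title out of an HTML fragment. The markup it matches (an anchor tag with class "result") and the names SerpRegexSketch and extractResults are assumptions made for this example. The real yandex.ru markup and the patterns used by Parser may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SerpRegexSketch {
    // Hypothetical markup: <a class="result" href="URL">TITLE</a>
    // Group 1 captures the href value, group 2 the inner HTML of the link.
    private static final Pattern RESULT_LINK = Pattern.compile(
        "<a\\s+class=\"result\"\\s+href=\"([^\"]+)\"[^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one {url, title} pair per matched result link.
    static List<String[]> extractResults(String html) {
        List<String[]> results = new ArrayList<>();
        Matcher m = RESULT_LINK.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            // Strip nested tags such as <b>...</b> from the title text.
            String title = m.group(2).replaceAll("<[^>]+>", "").trim();
            results.add(new String[] {url, title});
        }
        return results;
    }
}
```

Regular expressions keep the parser free of dependencies, but they are sensitive to markup changes, so patterns like this one need updating whenever the page layout changes.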

To launch, run:

java Parser keywords to search

Replace "keywords to search" with your search string.
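
One way the command-line keywords could be turned into a request URL is sketched below. The endpoint https://yandex.ru/search/?text= and the names SearchUrlSketch and buildSearchUrl are assumptions for illustration. The URL that Parser actually requests is not stated in this README.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SearchUrlSketch {
    // Assumed search endpoint; the real Parser may use a different one.
    private static final String SEARCH_ENDPOINT = "https://yandex.ru/search/?text=";

    // Joins the command-line arguments with spaces and URL-encodes the query.
    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    static String buildSearchUrl(String[] args) {
        String query = String.join(" ", args);
        return SEARCH_ENDPOINT + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }
}
```

Encoding the query as UTF-8 matters here because search strings for yandex.ru will often contain Cyrillic characters.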

Sample output:

Document 1:
http://abc.com
Welcome to abc.com
The greatest website in the world!
abc.com

Document 2:
...

The program also downloads 10 pages to the results folder. The files are named 1.html, 2.html, and so on.
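
The naming scheme can be sketched as follows, assuming the page contents have already been fetched as strings. ResultSaverSketch and savePages are hypothetical names, and the UTF-8 output encoding is an assumption. How Parser fetches and writes the pages is not shown in this README.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class ResultSaverSketch {
    // Writes each page to dir as 1.html, 2.html, ... in list order.
    static void savePages(Path dir, List<String> pages) {
        try {
            // Create the target directory (and any parents) if it is missing.
            Files.createDirectories(dir);
            for (int i = 0; i < pages.size(); i++) {
                Path file = dir.resolve((i + 1) + ".html");
                Files.write(file, pages.get(i).getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to handle IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling savePages(Paths.get("results"), pages) with a list of 10 strings would produce results/1.html through results/10.html.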

If the program throws an error, it means yandex.ru returned a CAPTCHA page instead of the search engine results page (SERP).
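
A check along these lines could turn that failure into a clearer message. This is only a heuristic sketch: it assumes the CAPTCHA page contains the word "captcha" somewhere in its HTML, which this README does not confirm. CaptchaCheckSketch, looksLikeCaptcha, and requireSerp are hypothetical names.

```java
import java.util.Locale;

class CaptchaCheckSketch {
    // Heuristic (assumption): treat any page mentioning "captcha" as a CAPTCHA page.
    static boolean looksLikeCaptcha(String html) {
        return html.toLowerCase(Locale.ROOT).contains("captcha");
    }

    // Returns the HTML unchanged, or fails with a descriptive message.
    static String requireSerp(String html) {
        if (looksLikeCaptcha(html)) {
            throw new IllegalStateException(
                "yandex.ru appears to have returned a CAPTCHA page instead of search results");
        }
        return html;
    }
}
```

A substring check like this can give false positives if a genuine results page happens to mention the word, so it is best treated as a diagnostic hint, not a reliable detector.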
