
This Python program extracts the logo from a website using the Scrapy package.


humayun/LogoExtraction


LogoExtraction

Python Code

This program uses the Scrapy package to parse a website and extract its logo. The spider that performs the logo extraction is named "logo".

Method Detail

This program processes only the

, and tags in order to extract the logo. There are three cases in which a logo is extracted:

Case 1: when contains the substring "logo" in its @src.

Case 2: when
contains the substring "logo" in its @src.

Case 3: when contains an @href pointing to the home page address or an index page, and with a likely image file extension (such as .png, .gif or .jpg) has the substring "logo" in its @class, @title or @alt.
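The three cases can be sketched as small predicate functions. This is a minimal, dependency-free sketch, not the project's spider code: it swaps Scrapy selectors for the standard-library html.parser so it runs on its own. The names LogoFinder, src_has_logo, is_home_link, image_looks_like_logo, SRC_TAGS and IMAGE_EXTENSIONS are hypothetical. Because the tag names for Cases 1 and 2 are not given above, SRC_TAGS = {"img"} is an assumption, and Case 3 is assumed to mean an <a> element wrapping an <img>.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical configuration (assumptions): the tags whose @src Cases 1 and 2
# inspect, and the file extensions treated as image-like in Case 3.
SRC_TAGS = {"img"}
IMAGE_EXTENSIONS = (".png", ".gif", ".jpg", ".jpeg", ".svg", ".ico")


def src_has_logo(attrs):
    """Cases 1 and 2: the element's @src contains the substring "logo"."""
    return "logo" in (attrs.get("src") or "").lower()


def is_home_link(href, page_url):
    """Case 3, first half: @href resolves to the site root or a top-level index page."""
    if not href:
        return False
    target = urlparse(urljoin(page_url, href))
    if target.netloc != urlparse(page_url).netloc:
        return False  # links to other hosts never count as the home page
    path = target.path.rstrip("/")
    return path == "" or (path.count("/") == 1 and path[1:].startswith("index"))


def image_looks_like_logo(attrs):
    """Case 3, second half: image-like @src plus "logo" in @class, @title or @alt."""
    src = (attrs.get("src") or "").lower().split("?", 1)[0]  # ignore query strings
    if not src.endswith(IMAGE_EXTENSIONS):
        return False
    return any("logo" in (attrs.get(name) or "").lower()
               for name in ("class", "title", "alt"))


class LogoFinder(HTMLParser):
    """Collects absolute logo URLs from one page (a stand-in for the Scrapy spider)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.in_home_link = False  # True while inside an <a> that points home
        self.logos = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.in_home_link = is_home_link(attrs.get("href"), self.page_url)
            return
        hit = tag in SRC_TAGS and src_has_logo(attrs)  # Cases 1 and 2
        if tag == "img" and self.in_home_link and image_looks_like_logo(attrs):
            hit = True  # Case 3
        if hit:
            self.logos.append(urljoin(self.page_url, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_home_link = False
```

For example, feeding `<a href="/"><img src="/static/brand.png" alt="Company Logo"></a><img src="/img/site-logo.svg"><img src="/img/photo.jpg" alt="logo">` for the page https://example.com/about collects the brand.png URL (Case 3) and the site-logo.svg URL (Case 1), but not photo.jpg, which sits outside a home link and has no "logo" in its @src. In an actual Scrapy spider the same checks would more likely run over selectors such as response.xpath("//img") than over html.parser callbacks.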

Limitations

1 - This program does not parse CSS (style sheets) for logo extraction.
2 - This program does not process HTML pages that use only

instead of
for logo extraction.

Run

To run this program, use the following command in a terminal from inside the LogoExtraction project directory:

scrapy crawl logo

Output

The program extracts the logo URL and the web page URL and saves them in a CSV file.
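The README does not show how the CSV file is written. One plausible approach, sketched here as an assumption, is a Scrapy item pipeline: a plain class whose open_spider, process_item and close_spider methods Scrapy calls once the class is listed in the ITEM_PIPELINES setting. The class name CsvLogoPipeline, the file name logos.csv and the column names page_url and logo_url are all hypothetical.

```python
import csv


class CsvLogoPipeline:
    """Hypothetical Scrapy item pipeline that writes one CSV row per extracted logo.

    Defining a pipeline needs no Scrapy import; Scrapy only looks for these methods.
    """

    FIELDS = ("page_url", "logo_url")  # assumed column names

    def __init__(self, path="logos.csv"):  # assumed output file name
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Open the file once per crawl and write the header row.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(self.file, fieldnames=self.FIELDS)
        self.writer.writeheader()

    def process_item(self, item, spider):
        # Missing fields are written as empty strings; the item is passed on unchanged.
        self.writer.writerow({name: item.get(name, "") for name in self.FIELDS})
        return item

    def close_spider(self, spider):
        self.file.close()
```

Alternatively, Scrapy's built-in feed exports can write the items a spider yields to CSV without any custom pipeline, for example with scrapy crawl logo -o logos.csv.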
