GitHub - alienpsp/ATBS.Web-Scrap: New repo for self learning and practice on GitHub familiarization and Python web scraping from Automate the Boring Stuff by Al Swegart

#ATBS---Web-Scrap

Automate The Boring Stuff (ATBS) - Web-Scrap is a new repo for me to learn and practice on GitHub familiarization and Python web scraping from ATBS by Al Swegart.

This repo covers 3 python module example from ATBS - Requests & BeautifulSoup4

Requests

A third party module for downloading web pages and files
requests.get() returns a Response object.
raise_for_status() Response method will raise an exception if downloads failed.
you can save a downloaded file to your hard drive with calls to iter_content() method
more info on https://requests.readthedocs.org

Beautiful Soup 4 (bs4)

HTML can be parsed with BeautifulSoup Module
Pass the string with the HTML to the bs4.BeautifulSoup() function to get a Soup object
The Soup object has a select() method that can be passed a string of the CSS selector for HTML tag
Get CSS selector string from browser's developer tools by right clicking the element and copy CSS path
select() method will return a list of matching Element object hint any changes in the web or the css block might render this code useless until you redo the code to scrap from the new css env

Beautiful Soup will be covered over 2 examples,

Example one shows how I take an url and with requests module and selecting the css block using bs4, then print out the price it scrap from the targeted url.

Example two shows the same concept, the only difference is I scrap multipage pages in a go, on the beginning of example2, I make a library to store the details.

From there, I created 3 functions,

Scrap the price from the targeted url
Scrap the title from the targeted url
Retrieve both data from the functions above and print it out in a string

This automation will shorten your time for you to get similar data over multiple site, for example if you are constantly retrieving price list or inventory update from tens or hundreds of pages.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
ATBS Web Scrap - BeautifulSoup.py		ATBS Web Scrap - BeautifulSoup.py
ATBS Web Scrap - Request .py		ATBS Web Scrap - Request .py
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

License

alienpsp/ATBS.Web-Scrap

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages