Scraping data from a website using Python
The objective is to scrape data from https://www.carrefour.fr across all pages and then analyse product categories and promotions.
The project uses the following Python libraries (the corresponding imports are sketched after this list):
- math
- requests
- json
- pandas
- bs4
- os
- time
- matplotlib.pyplot
- seaborn
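
For reference, a minimal import block matching the list above, with `pandas`, `matplotlib.pyplot` and `seaborn` bound to their usual aliases:

```python
# Imports for the libraries listed above
import json
import math
import os
import time

import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from bs4 import BeautifulSoup
```
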
The workflow is the following:
- Deciding which type of page to scrape: products or promotions
- For products, choosing a category to scrape
- Counting the number of pages with BeautifulSoup (first sketch after this list)
- Scraping all the pages through the site's JSON API, with a timer between requests (second sketch after this list)
- Cleaning the scraped data into a single pandas dataframe (third sketch after this list)
- Analysing the data obtained:
- For product categories: which subcategories are the most represented, and what are the top 10 brands selling products on Carrefour.fr?
- For promotions: how many promotions are there, in which categories and subcategories, and what has really been discounted?
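
A minimal sketch of the page-counting step, assuming a hypothetical category URL, a hypothetical selector for the product counter and 60 products per page; the real values have to be read off carrefour.fr in the browser.

```python
import math
import requests
from bs4 import BeautifulSoup

CATEGORY_URL = "https://www.carrefour.fr/r/epicerie"  # hypothetical category URL
PRODUCTS_PER_PAGE = 60                                # assumed page size

def count_pages(category_url: str = CATEGORY_URL) -> int:
    """Estimate how many listing pages a category has."""
    response = requests.get(category_url, headers={"User-Agent": "Mozilla/5.0"})
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Assumed selector for the "N produits" counter shown on the listing page;
    # the actual class name must be checked in the page source.
    counter = soup.select_one(".product-list__counter")
    if counter is None:
        raise ValueError("Product counter not found; selector needs updating")
    total = int("".join(ch for ch in counter.get_text() if ch.isdigit()))
    return math.ceil(total / PRODUCTS_PER_PAGE)
```
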
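The scraping loop then calls the site's JSON endpoint page by page, pausing between requests. The endpoint URL, query parameters and response keys below are placeholders; the real ones have to be taken from the browser's network tab.

```python
import time
import requests

API_URL = "https://www.carrefour.fr/api/products"   # placeholder endpoint
HEADERS = {"User-Agent": "Mozilla/5.0"}

def scrape_all_pages(n_pages: int, category: str, delay: float = 2.0) -> list:
    """Fetch every listing page as JSON, pausing between requests."""
    records = []
    for page in range(1, n_pages + 1):
        response = requests.get(
            API_URL,
            params={"category": category, "page": page},  # assumed parameters
            headers=HEADERS,
        )
        response.raise_for_status()
        payload = response.json()
        records.extend(payload.get("products", []))  # assumed response key
        time.sleep(delay)  # timer: stay polite and avoid being blocked
    return records
```
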
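Finally, the raw JSON records are flattened and cleaned into one dataframe. The column names here are assumptions and must be mapped to the real JSON fields.

```python
import pandas as pd

def build_dataframe(records: list) -> pd.DataFrame:
    """Flatten raw JSON records into a single cleaned dataframe."""
    df = pd.json_normalize(records)
    # Assumed columns of interest; adjust to the real field names.
    keep = ["name", "brand", "category", "subcategory", "price", "promo_price"]
    df = df[[c for c in keep if c in df.columns]]
    df = df.drop_duplicates().reset_index(drop=True)
    if "price" in df.columns:
        df["price"] = pd.to_numeric(df["price"], errors="coerce")
    return df

# df = build_dataframe(records)
# df.to_csv("carrefour_products.csv", index=False)
```
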
The project produces the following outputs:
- A complete dataframe saved as a CSV
- For products:
  - 2 graphs: the number of products per subcategory and the top 10 brands selling products on Carrefour.fr (sketched below)
- For promotions:
  - A dataframe counting the number of promotions per category and subcategory
  - The number of products on the promotion page whose price is the same as before the promotion
  - 2 graphs: the number of promotions per subcategory and the top 10 brands promoting products on Carrefour.fr (sketched below)
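
A sketch of the two product graphs, assuming the cleaned dataframe has `subcategory` and `brand` columns (placeholder names):

```python
import matplotlib.pyplot as plt
import seaborn as sns

def plot_product_analysis(df):
    """Plot products per subcategory and the top 10 brands."""
    # Number of products per subcategory
    plt.figure(figsize=(10, 6))
    sns.countplot(data=df, y="subcategory",
                  order=df["subcategory"].value_counts().index)
    plt.title("Number of products per subcategory")
    plt.tight_layout()
    plt.show()

    # Top 10 brands by number of products listed
    top_brands = df["brand"].value_counts().head(10)
    plt.figure(figsize=(10, 6))
    sns.barplot(x=top_brands.values, y=top_brands.index)
    plt.title("Top 10 brands selling products on Carrefour.fr")
    plt.tight_layout()
    plt.show()
```
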
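For the promotions page, a sketch of the per-category counts and the check for unchanged prices, again assuming placeholder column names (`category`, `subcategory`, `price`, `promo_price`):

```python
import pandas as pd

def analyse_promotions(promo_df: pd.DataFrame):
    """Count promotions per (sub)category and flag unchanged prices."""
    promo_counts = (
        promo_df.groupby(["category", "subcategory"])
        .size()
        .reset_index(name="n_promotions")
        .sort_values("n_promotions", ascending=False)
    )
    # Products shown as promoted whose price did not actually change
    unchanged = int((promo_df["price"] == promo_df["promo_price"]).sum())
    return promo_counts, unchanged
```
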