CRAWL PILOT

Description

Crawl Pilot is a Python-based web scraping and summarization tool that allows users to extract content from web pages and generate concise summaries using OpenAI's language model.

Features

  1. Web Content Fetching: Retrieves content from user-specified URLs or a CSV file containing multiple URLs.
  2. Content Scraping: Extracts text and title from web pages using BeautifulSoup.
  3. Text Summarization: Utilizes OpenAI's GPT model to generate summaries of scraped content.
  4. Summary Storage: Option to save summaries to a CSV file for future reference.
  5. User-friendly Interface: Employs ASCII art for the title and interactive prompts for user inputs.

How It Works

  1. The user chooses between entering a single URL or providing a CSV file with multiple URLs.
  2. The application fetches the content from each provided URL.
  3. It then scrapes the content to extract the main text and title.
  4. The extracted text is summarized using OpenAI's language model.
  5. The summary is displayed to the user.
  6. The user is given an option to save the summary to a CSV file.
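
Put together, the flow might look like the sketch below. The function names come from the Code Structure section that follows, but the wiring shown here is an illustrative assumption, not the repository's actual main():

  # Illustrative pipeline sketch (assumed wiring, not the actual main()):
  def run_pipeline():
      response = fetch_content()                    # prompt for a URL and fetch the page
      text, title = scrape_content(response.text)   # extract paragraph text and the title
      summary = summarize_text(text)                # summarize via OpenAI (returns a dict)
      page = ScrapedPage(title, summary["output_text"])  # "output_text" key is assumed
      print(page)                                   # display the summary
      save_summary(page)                            # optionally append to a CSV file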

Code Structure

The project.py file contains the following functions and classes (illustrative sketches of several of them follow the list):

  1. main():

    • The entry point of the application.
    • Orchestrates the web crawling and summarization process.
    • Calls other functions to fetch content, scrape, summarize, and save results.
  2. read_urls_from_csv():

    • Reads URLs from a CSV file.
    • Returns a list of URLs.
  3. process_single_url():

    • Processes a single URL response.
  4. process_multiple_urls():

    • Processes multiple URLs from a list.
  5. fetch_content():

    • Prompts the user for a URL and retrieves the web page content.
    • Handles URL validation and automatically adds 'https://' if missing.
    • Returns a requests.Response object.
  6. scrape_content(content):

    • Parses HTML content using BeautifulSoup.
    • Extracts all paragraph texts and the page title.
    • Returns a tuple of (page_text, page_title).
  7. summarize_text(text):

    • Uses OpenAI's language model to generate a summary of the input text.
    • Handles authentication errors by reloading the API key if needed.
    • Returns a dictionary with the summarized text.
  8. save_summary(page):

    • Prompts the user to save the summary.
    • If confirmed, saves the title, summary, and date to a CSV file.
  9. load_api_key(delete=False):

    • Manages the OpenAI API key.
    • Can delete existing key, prompt for a new one, and update environment variables.
  10. class ScrapedPage:

    • A simple class to represent a scraped web page.
    • Contains title and summary attributes.
    • Provides a string representation for easy printing.
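
The sketches below illustrate plausible shapes for these functions. They are assumptions based on the descriptions above and the listed dependencies, not the actual contents of project.py. First, reading URLs from a CSV file:

  import csv

  def read_urls_from_csv(path):
      # Assumes one URL per row, in the first column (this layout is an assumption).
      with open(path, newline="", encoding="utf-8") as f:
          return [row[0].strip() for row in csv.reader(f) if row]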
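
Fetching and scraping could follow the usual requests/BeautifulSoup pattern; the retry loop and the 10-second timeout are assumptions:

  import requests
  from bs4 import BeautifulSoup

  def fetch_content():
      # Prompt until a URL can be fetched; prepend 'https://' when the scheme is missing.
      while True:
          url = input("Enter a URL: ").strip()
          if not url.startswith(("http://", "https://")):
              url = "https://" + url
          try:
              response = requests.get(url, timeout=10)
              response.raise_for_status()
              return response
          except requests.RequestException:
              print("Could not fetch that URL. Please try again.")

  def scrape_content(content):
      # Parse the HTML, then collect the page title and all paragraph text.
      soup = BeautifulSoup(content, "html.parser")
      page_title = soup.title.string.strip() if soup.title and soup.title.string else "Untitled"
      page_text = " ".join(p.get_text(strip=True) for p in soup.find_all("p"))
      return page_text, page_title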
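
Summarization via LangChain has several possible shapes; one plausible version using a "stuff" summarize chain is sketched here. The exact imports depend on the installed langchain and openai versions (a pre-1.0 openai package is assumed), and the authentication retry mirrors the description above:

  import openai
  from langchain.chains.summarize import load_summarize_chain
  from langchain.docstore.document import Document
  from langchain.llms import OpenAI

  def summarize_text(text):
      # Wrap the scraped text in a Document and run a summarize chain over it.
      docs = [Document(page_content=text)]
      while True:
          try:
              chain = load_summarize_chain(OpenAI(temperature=0), chain_type="stuff")
              return chain({"input_documents": docs})  # dict holding the summary text
          except openai.error.AuthenticationError:
              # Invalid key: discard it, prompt for a new one, then retry.
              load_api_key(delete=True)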
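
Finally, the ScrapedPage container and CSV saving; the summaries.csv filename is an assumption:

  import csv
  from datetime import date

  class ScrapedPage:
      # Simple container for a scraped page's title and summary.
      def __init__(self, title, summary):
          self.title = title
          self.summary = summary

      def __str__(self):
          return f"{self.title}\n\n{self.summary}"

  def save_summary(page):
      # Append title, summary, and today's date to a CSV file.
      with open("summaries.csv", "a", newline="", encoding="utf-8") as f:
          csv.writer(f).writerow([page.title, page.summary, date.today().isoformat()])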

Error Handling

  • The program includes error handling for invalid URLs and API authentication issues.
  • It will continually prompt for a valid URL or API key until successful.
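
The key-recovery step could be handled by load_api_key() along these lines, assuming python-dotenv's set_key helper and a .env file in the working directory:

  import os
  from dotenv import load_dotenv, set_key

  def load_api_key(delete=False):
      # Load OPENAI_API_KEY from .env; if missing or discarded, prompt for a new key.
      load_dotenv()
      if delete or not os.getenv("OPENAI_API_KEY"):
          key = input("Enter your OpenAI API key: ").strip()
          set_key(".env", "OPENAI_API_KEY", key)   # persist the key for future runs
          os.environ["OPENAI_API_KEY"] = key       # update the current process
      return os.environ["OPENAI_API_KEY"]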

Dependencies

  • requests: For making HTTP requests
  • beautifulsoup4: For parsing HTML content
  • python-dotenv: For managing environment variables
  • art: For ASCII art generation
  • inquirer: For interactive command-line user interfaces
  • langchain: For text summarization using language models
  • openai: For accessing OpenAI's API

Setup

  1. Clone the repository
  2. Install the required dependencies:
 pip install -r requirements.txt
  3. Set up your OpenAI API key in a .env file or when prompted by the application
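
For the .env route, the file would contain a single line; OPENAI_API_KEY is the variable name the openai library reads by default:

 OPENAI_API_KEY=sk-...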

Usage

Run the script using Python:

python project.py

Follow the prompts to enter a URL and interact with the application.

Future Improvements

  • Enhance the summarization algorithm and explore implementing it with a locally installed model.
