🦀 Rust for Extract, Transform, and Load operations

Practice ETL with Rust and Polars

This repository walks you through examples for each step of ETL so that you can apply Rust and Polars to these operations using a sample CSV dataset.

You will be using a sample dataset that contains wines from all over the world. Explore the wine dataset and familiarize yourself with the data before you start the ETL process.

Each example is a separate Cargo project and is meant to be run independently. You can run each example by navigating to the project directory and running the following command:

cargo run ../../top-rated-wines.csv

Lesson 1: Extracting data

In this lesson, you will learn how to read a CSV file and load it into a Polars DataFrame. You will run some basic checks to ensure the data loaded correctly and is in the expected format.
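
A minimal sketch of what the Extract step can look like, assuming the CSV path is passed as the first command-line argument; the exact reader API depends on the Polars version (newer releases use `CsvReadOptions` instead of the builder shown here):

```rust
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // The CSV path is expected as the first argument, e.g. ../../top-rated-wines.csv
    let path = std::env::args()
        .nth(1)
        .expect("usage: cargo run <path-to-csv>");

    // Read the CSV into an eager DataFrame (requires the "csv" feature of polars)
    let df = CsvReader::from_path(&path)?
        .has_header(true)
        .finish()?;

    // Basic sanity checks: row/column counts, inferred schema, and the first few rows
    println!("shape: {:?}", df.shape());
    println!("{:?}", df.schema());
    println!("{}", df.head(Some(5)));
    Ok(())
}
```

Printing the shape, schema, and first few rows is usually enough to confirm the file parsed with the expected columns and types.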

Lesson 2: Transforming data

In this lesson, you will learn how to transform the data by filtering out unnecessary columns and rows. You will use one-hot encoding to convert categorical columns. There are two examples in this lesson: one that applies one-hot encoding to all columns and another that applies it only to selected columns.
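
A rough sketch of the selected-columns variant, assuming the lazy API is enabled; the column names (`name`, `region`, `variety`, `rating`), the rating threshold, and the exact `to_dummies` signature are assumptions that vary with the dataset and Polars version:

```rust
use polars::prelude::*;

fn transform(df: DataFrame) -> PolarsResult<DataFrame> {
    // Keep only the columns of interest and drop rows below a rating threshold.
    // The column names and the 95.0 cutoff are assumptions about the wine dataset.
    let trimmed = df
        .lazy()
        .select([col("name"), col("region"), col("variety"), col("rating")])
        .filter(col("rating").gt_eq(lit(95.0)))
        .collect()?;

    // One-hot encode a single categorical column. `to_dummies` needs the
    // "to_dummies" feature, and its signature differs between Polars versions.
    let encoded = trimmed.select(["variety"])?.to_dummies(None, false)?;

    // Attach the dummy columns next to the original ones.
    trimmed.hstack(encoded.get_columns())
}
```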

Lesson 3: Loading data

Finally, in this lesson you will learn how to save the transformed data to a Parquet file. Parquet is a columnar storage format optimized for efficient reading and writing of data.
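
A minimal sketch of writing a DataFrame out with `ParquetWriter`, assuming the `parquet` feature is enabled; the output filename is only an example:

```rust
use polars::prelude::*;
use std::fs::File;

fn save_parquet(mut df: DataFrame) -> PolarsResult<()> {
    // Write the transformed DataFrame to Parquet (requires the "parquet" feature).
    // The output filename here is an assumption, not what the lesson uses.
    let file = File::create("top-rated-wines.parquet")?;
    ParquetWriter::new(file).finish(&mut df)?;
    Ok(())
}
```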

Extra challenges

  1. Verify the Parquet file: save the transformed data to a Parquet file and then read it back to confirm it was saved correctly, using the Load project as a reference (a starting-point sketch follows this list).
  2. Add options for saving: currently, none of the projects save the CSV back to the file system. Add an option to write the transformed data back out.
  3. Add more transformations: extend the examples with sorting, grouping, and aggregation.
  4. Implement schema validation: use a Polars Schema to ensure the data is in the expected format before transforming it.
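
For the first challenge, a hedged sketch of a round-trip check: `verify_roundtrip` is a hypothetical helper, and the value-comparison method is named `equals` in recent Polars releases (`frame_equal` in older ones):

```rust
use polars::prelude::*;
use std::fs::File;

fn verify_roundtrip(original: &DataFrame, path: &str) -> PolarsResult<()> {
    // Read the Parquet file back in (requires the "parquet" feature).
    let reread = ParquetReader::new(File::open(path)?).finish()?;

    // Cheap structural check first, then a full value comparison.
    assert_eq!(original.shape(), reread.shape());
    assert!(original.equals(&reread));
    Ok(())
}
```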
