💥
["translatio", "imitatio", "aemulatio"]

Stars

etl

Extract-Transform-Load, Data Wrangling, Data Mining, ...
249 repositories

YTsaurus is a scalable and fault-tolerant open-source big data platform.

C++ · 1,880 stars · 134 forks · Updated Nov 12, 2024

Quilt is a data mesh for connecting people with actionable data

TypeScript · 1,328 stars · 90 forks · Updated Nov 12, 2024

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

Java · 2,513 stars · 779 forks · Updated Nov 11, 2024

A Unified Toolkit for Deep Learning Based Document Image Analysis

Python · 4,902 stars · 470 forks · Updated Aug 15, 2024

Source code for my collection of articles on using pandas.

Jupyter Notebook · 1,539 stars · 383 forks · Updated Dec 14, 2022

Distributed task queue with full async support

Python · 852 stars · 52 forks · Updated Nov 6, 2024

A fast and reliable background task processing library for Python 3.

Python · 4,338 stars · 311 forks · Updated Nov 5, 2024

A Pure Python, React-style Framework for Scaling Your Jupyter and Web Apps

Python · 1,902 stars · 140 forks · Updated Nov 12, 2024

Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.

TypeScript · 48,622 stars · 7,649 forks · Updated Nov 12, 2024

Cluster tools for running Dask on Databricks

Python · 13 stars · 5 forks · Updated Jun 3, 2024

Port of Wappalyzer (uncovers technologies used on websites) to automate mass scanning.

Go · 973 stars · 139 forks · Updated Nov 26, 2023

This project aims to maintain Wappalyzer technologies

Python · 237 stars · 52 forks · Updated Nov 11, 2024

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

HTML · 9,077 stars · 749 forks · Updated Nov 11, 2024

Task pipelining for taskiq

Python · 23 stars · 3 forks · Updated Aug 3, 2024

Bokeh Plotting Backend for Pandas and GeoPandas

Python · 879 stars · 112 forks · Updated Apr 10, 2024

Easily create large video dataset from video urls

Python · 545 stars · 65 forks · Updated Jul 30, 2024

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python · 2,030 stars · 144 forks · Updated Oct 31, 2024

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, …

C · 975 stars · 58 forks · Updated Nov 11, 2024

All-in-one infrastructure for search, recommendations, RAG, and analytics offered via API

Rust · 1,560 stars · 133 forks · Updated Nov 12, 2024

Data-Centric Pipelines and Data Versioning

Go · 6,177 stars · 568 forks · Updated Nov 8, 2024

Efficient data transformation and modeling framework that is backwards compatible with dbt.

Python · 1,802 stars · 160 forks · Updated Nov 12, 2024

PISA: Performant Indexes and Search for Academia

C++ · 934 stars · 65 forks · Updated Oct 13, 2024

⬛️ CLI tool for saving complete web pages as a single HTML file

Rust · 11,201 stars · 315 forks · Updated Sep 25, 2024

Capture a URL with Playwright

Python · 30 stars · 3 forks · Updated Nov 9, 2024

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python · 24,458 stars · 3,159 forks · Updated Sep 24, 2024

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

TypeScript · 18,460 stars · 1,398 forks · Updated Nov 12, 2024

the file filesystem: mount semi-structured data (like JSON) as a Unix filesystem

Rust · 463 stars · 14 forks · Updated May 2, 2024

Python scraper based on AI

Python · 15,701 stars · 1,275 forks · Updated Nov 12, 2024

A lightweight message queue. Like AWS SQS and RSMQ but on Postgres.

PLpgSQL · 2,669 stars · 72 forks · Updated Nov 7, 2024

High-performance and seamless sharing and modification of Python objects between processes, without the periodic overhead of serialization and deserialization. Provides fast inter-process communica…

Python · 60 stars · 3 forks · Updated May 14, 2024