Kafka integration #84
base: master
geoffreyaldebert commented on Jun 10, 2022
- Kafka integration (consumer only)
- Read messages from udata-analysis-service
- Parse the file (could be read from minio instead of downloading the resource again)
- Add csv-detective type detection to help agate store the resource in sqlite
- Add pandas profiling analysis (minimal) and generation of a JSON report
- Store the new info in sqlite in new tables:
  - general_infos: basic info on the resource
  - column_infos: basic info on each column of the resource
  - categorical_infos: categorical values for each column (limited to 10)
  - top_infos: top values for each column (limited to 10)
  - numeric_infos: basic info on each numeric column of the resource (mean, std, min, max)
  - numeric_plot_infos: distribution of the values of each numeric column, for plotting
- Update the API to list this new info when it is available
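To make the new tables more concrete, here is a minimal sketch of how per-column statistics could be written to `numeric_infos` and `top_infos`. It uses only the standard library (`sqlite3`, `statistics`, `collections.Counter`) rather than pandas profiling, and the function name `store_column_stats` and the column names inside each table are assumptions for illustration, not the schema used in this PR.

```python
import sqlite3
import statistics
from collections import Counter


def store_column_stats(conn, column, values, top_limit=10):
    """Write summary rows for one column (hypothetical schema, for illustration)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS numeric_infos "
        "(column_name TEXT, mean REAL, std REAL, min REAL, max REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS top_infos "
        "(column_name TEXT, value TEXT, count INTEGER)"
    )

    # Most frequent values, capped at top_limit (10 by default).
    top = Counter(values).most_common(top_limit)
    conn.executemany(
        "INSERT INTO top_infos VALUES (?, ?, ?)",
        [(column, str(value), count) for value, count in top],
    )

    # Only treat the column as numeric if every value is an int or a float.
    numeric = [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if len(numeric) == len(values) and len(numeric) >= 2:
        conn.execute(
            "INSERT INTO numeric_infos VALUES (?, ?, ?, ?, ?)",
            (
                column,
                statistics.mean(numeric),
                statistics.stdev(numeric),  # sample standard deviation
                min(numeric),
                max(numeric),
            ),
        )
    conn.commit()
```

Calling `store_column_stats(sqlite3.connect(":memory:"), "age", [1, 2, 3, 4])` would add one row to `numeric_infos` and four rows to `top_infos`; a text column only gets `top_infos` rows.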
…al, store to sqlite, modify api with new infos
#url = r.json()['url']
if((message is not None) & (message['service'] == 'csvdetective')):
#try:
url = 'https://www.data.gouv.fr/fr/datasets/r/{}'.format(key)
Why do we need to build a URL instead of using the minio location?
This data.gouv.fr URL is environment-dependent (dev / demo / prod).
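One possible shape for this, as a rough sketch only: prefer a location carried by the message, and fall back to a configurable base URL. The `location` message field, the `DATAGOUV_BASE_URL` environment variable and the `resolve_resource_url` helper are all hypothetical names, not part of this PR or of the udata-analysis-service message format. The sketch also checks for `None` before reading the message, since `&` in the snippet under review does not short-circuit and would still evaluate `message['service']` when `message` is `None`.

```python
import os


def resolve_resource_url(key, message, base_url=None):
    """Return where to fetch the resource from, or None if the message is not for us."""
    # Check for None first so that a missing message is never indexed.
    if message is None or message.get("service") != "csvdetective":
        return None

    # Hypothetical field: location of the file already stored in minio.
    location = message.get("location")
    if location:
        return location

    # Fallback: build the URL from a configurable base instead of hard-coding prod.
    base = base_url or os.environ.get("DATAGOUV_BASE_URL", "https://www.data.gouv.fr")
    return "{}/fr/datasets/r/{}".format(base.rstrip("/"), key)
```

With this, dev, demo and prod would differ only by configuration, and the download step could be skipped entirely whenever the message already points at the stored file.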
* Switch to poetry
  - Explicitly upgrade to python >= 3.9
  - Upgrade pandas and pandas-profiling
* Cleanup
  - Overhaul CI file
  - Remove useless files
  - Update License attribution
  - Remove obsolete ansible roles
* trigger CI
* fix local tests on macos
* add linting
* add linting
* upgrade flake8 and pytest
* lint all the thingz
* fix tests with strict asyncio mode
* really really fix the tests
* Update README
* Use CI template, publish kafka-integration, bump 1.3.0
* poetry update
* invalidate cache
* CI: cache-prefix param
This branch is now published on pypi https://app.circleci.com/pipelines/github/etalab/csvapi/91/workflows/09dba6e2-b91f-4cf2-af03-71a9daee9bbb/jobs/605
* Check message structure to prevent errors
* Add pandas profiling analysis (optional) in api
* Update requirements (csv-detective)
* Update message structure format
* Add requirements
* Remove requirements, switch to poetry
* Add poetry lock file
* Lint code
* setuptools
* upgrade and clean deps
* lint test

Co-authored-by: Geoffrey Aldebert <[email protected]>
Co-authored-by: Alexandre Bulté <[email protected]>