Kafka integration #84

geoffreyaldebert · 2022-06-10T19:01:10Z

Kafka integration (consumer only)
Read messages from udata-analysis-service
Parse the file (it could be read from minio instead of downloading the resource again)
Add csv-detective type detection to help agate store the resource in sqlite
Add a (minimal) pandas profiling analysis and generate a JSON report
Store the new information in sqlite, in new tables:
- general_infos: basic info on the resource
- column_infos: basic info on each column of the resource
- categorical_infos: categorical values for each column (limited to 10)
- top_infos: top values for each column (limited to 10)
- numeric_infos: basic info on each numeric column of the resource (mean, std, min, max)
- numeric_plot_infos: distribution of the values of each numeric column, for plotting
Update the API to list this new information when it is available
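
For illustration, the storage step for two of these tables could look roughly like the sketch below. It relies only on the standard library. The table names `top_infos` and `numeric_infos` come from the list above, but the column layout (`resource_id`, `column_name`, ...) and the helper `store_column_stats` are assumptions made for this example, not the schema or code used in this PR.

```python
import sqlite3
import statistics
from collections import Counter

TOP_LIMIT = 10  # the description caps top values at 10 per column


def store_column_stats(conn, resource_id, column_name, values):
    """Store top values and, for numeric columns, basic statistics.

    Hypothetical helper: the schema below is an assumption for illustration.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS top_infos "
        "(resource_id TEXT, column_name TEXT, value TEXT, count INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS numeric_infos "
        "(resource_id TEXT, column_name TEXT, "
        "mean REAL, std REAL, min REAL, max REAL)"
    )

    # Keep only the most frequent values, at most TOP_LIMIT of them.
    top = Counter(values).most_common(TOP_LIMIT)
    conn.executemany(
        "INSERT INTO top_infos VALUES (?, ?, ?, ?)",
        [(resource_id, column_name, str(value), count) for value, count in top],
    )

    # Only columns made entirely of numbers get a numeric_infos row.
    is_numeric = bool(values) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        # Sample standard deviation needs at least two values.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        conn.execute(
            "INSERT INTO numeric_infos VALUES (?, ?, ?, ?, ?, ?)",
            (resource_id, column_name, statistics.fmean(values), std,
             min(values), max(values)),
        )
    conn.commit()
```

Calling `store_column_stats(sqlite3.connect(":memory:"), "some-resource", "age", [1, 2, 2, 3])` would write one `numeric_infos` row and three `top_infos` rows. Creating the tables lazily keeps the sketch short; a real implementation would more likely create them once per resource database.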

maudetes · 2022-07-18T13:40:51Z

csvapi/consumer.py

+    #url = r.json()['url']
+    if((message is not None) & (message['service'] == 'csvdetective')):
+        #try:
+            url = 'https://www.data.gouv.fr/fr/datasets/r/{}'.format(key)


Why do we need to build a URL instead of using the minio location?
This data.gouv.fr URL is environment-dependent (dev / demo / prod).
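
One possible shape for this part of the consumer is sketched below. It is only an illustration: the environment variable name `DATAGOUV_BASE_URL` and both helper names are hypothetical and do not come from this PR. The check uses `and`, which short-circuits, whereas `&` evaluates both operands and would raise a `TypeError` on `message['service']` when `message` is `None`. Reading the file from the minio location would remove the need to build a URL at all.

```python
import os

# Fallback taken from the URL hard-coded in the diff above.
DEFAULT_BASE_URL = "https://www.data.gouv.fr"


def is_csvdetective_message(message):
    """Return True only for a dict message sent by the csvdetective service."""
    # `and` short-circuits, so a None message never reaches the lookup.
    return isinstance(message, dict) and message.get("service") == "csvdetective"


def build_resource_url(key, base_url=None):
    """Build the resource URL for the current environment (dev / demo / prod)."""
    # DATAGOUV_BASE_URL is an assumed variable name, used here for illustration.
    base = base_url or os.environ.get("DATAGOUV_BASE_URL", DEFAULT_BASE_URL)
    return "{}/fr/datasets/r/{}".format(base.rstrip("/"), key)
```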

* Switch to poetry
  - Explicitly upgrade to python >= 3.9
  - Upgrade pandas and pandas-profiling
* Cleanup
  - Overhaul CI file
  - Remove useless files
  - Update License attribution
  - Remove obsolete ansible roles
* trigger CI
* fix local tests on macos
* add linting
* add linting
* upgrade flake8 and pytest
* lint all the thingz
* fix tests with strict asyncio mode
* really really fix the tests
* Update README
* Use CI template, publish kafka-integration, bump 1.3.0
* poetry update
* invalidate cache
* CI: cache-prefix param

abulte · 2022-08-26T09:15:30Z

This branch is now published on pypi https://app.circleci.com/pipelines/github/etalab/csvapi/91/workflows/09dba6e2-b91f-4cf2-af03-71a9daee9bbb/jobs/605

⚠️ remove this publication when merged on master

* Check message structure to prevent errors
* Add pandas profiling analysis (optional) in api
* Update requirements (csv-detective)
* Update message structure format
* Add requirements
* Remove requirements, switch to poetry
* Add poetry lock file
* Lint code
* setuptools
* upgrade and clean deps
* lint test

Co-authored-by: Geoffrey Aldebert <[email protected]>
Co-authored-by: Alexandre Bulté <[email protected]>

geoffreyaldebert added 8 commits June 10, 2022 20:53

Update with new vars

57f74ef

Update requirements

6194ada

Add consume kafka command

d5e7a0b

Add kafka integration, read message, analysis, pandas profiling minimal, store to sqlite, modify api with new infos

ff7bffa

Updates

766246a

Add UDATA_INSTANCE_NAME env

937a020

Remove env variables from config file

4bf8ba9

Fix reading kafka message

dd78bb6

maudetes reviewed Jul 18, 2022

abulte added 5 commits August 12, 2022 10:32

Fix bucket location in message

bdc9a7f

Fix data location in message

e0d2970

Fix report_location

7d0e371

trigger build

a83e645
