Consolidate Power :) #1100

yevgenypats · 2023-01-30T13:54:54Z

Hi folks and @achantavy 👋

I've been following this project for a few years, given that we work in very similar areas at CloudQuery.

We recently expanded the number of our destinations significantly, including Neo4j, BigQuery, Snowflake, Mongo, S3 (and soon Elasticsearch).

I know you are quite invested in this project, but I was wondering if it makes sense to use the underlying CloudQuery ELT engine and plugins while focusing just on the analysis, so this can free up your time significantly?

If it's not a fit, that's totally understandable; I just wanted to share an idea. If it is an interesting idea but there are some blockers or missing resources on our end, we would love to help with that so there can be a smooth migration.

Cheers!

achantavy · 2023-01-30T16:30:31Z

Hey @yevgenypats, sounds neat. We're in the process of rolling out our own light ORM (this, this, and this are part of it) to make writing nodes and relationships more standardized and less ad hoc, so maybe CloudQuery would find these libraries helpful once we get them a bit more solid.

I do think an architecture like this can make a lot of sense:

use CloudQuery for ETL from, say, AWS to S3,
and then have a second job load the data from S3 to Neo4j while making it have the shape of cartography's schema
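As a rough illustration of that second job, here is a minimal Python sketch of the transform step only. Everything in it is an assumption made for illustration: the column names, the `FIELD_MAP` mapping, the `lastupdated` property, and the helper names are hypothetical and not taken from the real CloudQuery or cartography schemas. It reads exported rows as JSON lines, renames fields, and builds a parameterized Cypher `MERGE` statement that a Neo4j client could run.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical mapping from exported column names to graph node properties.
# Neither side is taken from the real CloudQuery or cartography schemas.
FIELD_MAP = {
    "instance_id": "id",
    "region": "region",
    "instance_type": "instancetype",
}


def read_json_lines(text: str) -> List[Dict[str, Any]]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_node_props(row: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the mapped columns and rename them to node property names."""
    return {dst: row[src] for src, dst in FIELD_MAP.items() if src in row}


def build_merge(
    label: str, rows: Iterable[Dict[str, Any]], update_tag: int
) -> Tuple[str, Dict[str, Any]]:
    """Return a Cypher query and its parameters for upserting the rows as nodes."""
    query = (
        "UNWIND $rows AS row "
        f"MERGE (n:{label} {{id: row.id}}) "
        "SET n += row, n.lastupdated = $update_tag"
    )
    params = {"rows": [to_node_props(r) for r in rows], "update_tag": update_tag}
    return query, params


if __name__ == "__main__":
    exported = '{"instance_id": "i-1", "region": "us-east-1", "extra": 1}\n'
    query, params = build_merge("EC2Instance", read_json_lines(exported), 5)
    print(query)
    print(params)
```

The query and parameters are returned rather than executed so that the mapping can be tested without a database; the reading from S3 and the write to Neo4j are left out on purpose.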

There are rough edges in this process, though, since cartography modules are basically hardcoded to each pull from the upstream resource themselves, and making this more modular would be a large rewrite.

I also don't know enough about how cartography users are deploying the tool. It was originally written to be fairly flexible so that a user didn't need a specific architecture or storage configuration to deploy it, but I'd love to learn more here because maybe cartography in the future can afford to be a bit more opinionated.

Anyway, happy to keep brainstorming for ideas here! I'll read through your docs too.

yevgenypats · 2023-01-30T17:13:14Z

Nice! Yeah, so this is how a lot of our users are running it:

ELT with CloudQuery to either one or multiple destinations
Run transformations on the data/database with dbt or any other custom tools
Query and profit.
Also, they expose the data with things like Grafana/Superset/Postgraphile or any visualization and access layers on top.

It's really kinda the standard way to go in the "modern data stack" approach where you separate the EL (Extract Load), Transformations and then analysis and visualization.

Re deployment - CloudQuery is a single binary, as are all its plugins, so you can literally deploy it anywhere, including EC2, ECS serverless or anywhere else.

How hard is it to adjust the current queries to work on top of the CQ schema? If it's not too hard, I think this might be the smoothest experience for users since there is no additional transform step, but even if one is needed we can add an additional step as you suggest.

achantavy added the discussion label Jan 30, 2023

lyft locked and limited conversation to collaborators Jul 7, 2023

achantavy converted this issue into discussion #1202 Jul 7, 2023
