
This issue was moved to a discussion.

Consolidate Power :) #1100

Closed
yevgenypats opened this issue Jan 30, 2023 · 2 comments

Comments

@yevgenypats

Hi folks and @achantavy 👋

I've been following this project for a few years, given that we work in very similar areas at CloudQuery.

We recently expanded our set of destinations significantly; it now includes Neo4j, BigQuery, Snowflake, MongoDB, and S3 (and soon Elasticsearch).

I know you are quite invested in this project, but I was wondering if it makes sense to use the underlying CloudQuery ELT engine and plugins and focus just on the analysis, which could free up a lot of your time?

If it's not a fit, totally understandable; I just wanted to share an idea. If it is interesting but there are blockers or missing resources on our end, we would love to help with that so the migration can be smooth.

Cheers!

@achantavy
Contributor

Hey @yevgenypats, sounds neat. We're in the process of rolling out our own light ORM (this, this, and this are part of it) to make writing nodes and relationships more standardized and less ad hoc, so maybe CloudQuery will find these libraries helpful once we get them a bit more solid.

I do think an architecture like this can make a lot of sense:

  1. use CloudQuery for ETL from, say, AWS to S3,
  2. then have a second job load the data from S3 into Neo4j, reshaping it to match cartography's schema.
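To make step 2 a bit more concrete, here's a rough sketch of what that second job could look like, under a pile of assumptions: the S3 export is newline-delimited JSON, the rows carry hypothetical CloudQuery-style columns `instance_id`, `region`, and `account_id`, and the target shape is modeled on cartography's AWS schema (`EC2Instance` nodes attached to an `AWSAccount` via `RESOURCE`). The Cypher and all function names here are illustrative, not cartography's actual ingestion code. `load` accepts anything with a `run(query, **params)` method, such as a Neo4j driver session.

```python
import json
from typing import Any, Iterable, Iterator

# Hypothetical Cypher modeled on cartography's AWS schema. Labels, relationship
# type, and property names are assumptions for illustration only.
INGEST_EC2_QUERY = """
UNWIND $instances AS inst
MERGE (i:EC2Instance {id: inst.id})
SET i.region = inst.region, i.lastupdated = $update_tag
WITH i, inst
MATCH (a:AWSAccount {id: inst.account_id})
MERGE (a)-[r:RESOURCE]->(i)
SET r.lastupdated = $update_tag
"""


def read_jsonl(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Parse newline-delimited JSON rows, skipping blank lines (e.g. an S3 export)."""
    return [json.loads(line) for line in lines if line.strip()]


def reshape(rows: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map the assumed CloudQuery-style columns onto the graph's property names."""
    return [
        {"id": r["instance_id"], "region": r["region"], "account_id": r["account_id"]}
        for r in rows
    ]


def batches(items: list[Any], size: int) -> Iterator[list[Any]]:
    """Yield fixed-size chunks so each UNWIND transaction stays small."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load(session: Any, rows: Iterable[dict[str, Any]], update_tag: int, size: int = 1000) -> int:
    """Write reshaped rows in batches via session.run; returns the number of batches sent."""
    sent = 0
    for chunk in batches(reshape(rows), size):
        session.run(INGEST_EC2_QUERY, instances=chunk, update_tag=update_tag)
        sent += 1
    return sent
```

The reshaping lives in one small function on purpose: if the upstream column names differ from these guesses, that mapping is the only thing that should need to change.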

There are rough edges in this process, though, since cartography modules are basically hardcoded to each pull from the upstream resource themselves, and making this more modular would be a large rewrite.

I also don't know enough about how cartography users are deploying the tool. It was originally written to be fairly flexible so that a user didn't need a specific architecture or storage configuration to deploy it, but I'd love to learn more here, because maybe in the future cartography can afford to be a bit more opinionated.

Anyway, happy to keep brainstorming for ideas here! I'll read through your docs too.

@yevgenypats
Author

yevgenypats commented Jan 30, 2023

Nice! Yeah, so this is how a lot of our users are running it:

  1. ELT with CloudQuery
  2. to one or more destinations.
  3. Run transformations on the data/database with dbt or other custom tools.
  4. Query and profit.
  5. Optionally, expose the data through visualization and access layers on top, such as Grafana, Superset, or PostGraphile.

It's really kinda the standard way to go in the "modern data stack" approach, where you separate EL (Extract, Load), transformations, and then analysis and visualization.
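A toy illustration of that separation, using an in-memory SQLite database as a stand-in destination: raw rows are landed unchanged (EL), a view derives a cleaned-up model from them (the role dbt would play), and the analysis only ever queries the view. The table `aws_s3_buckets` and its columns are hypothetical, not CloudQuery's actual schema.

```python
import sqlite3
from typing import Any


def extract_load(conn: sqlite3.Connection, rows: list[dict[str, Any]]) -> None:
    """EL step: land raw rows as-is, the way an ELT tool writes to a destination."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS aws_s3_buckets "
        "(account_id TEXT, name TEXT, block_public_acls INTEGER)"
    )
    conn.executemany(
        "INSERT INTO aws_s3_buckets VALUES (:account_id, :name, :block_public_acls)", rows
    )


def transform(conn: sqlite3.Connection) -> None:
    """T step: derive a model from the raw table (a dbt model would play this role)."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS public_acl_buckets AS "
        "SELECT account_id, name FROM aws_s3_buckets WHERE block_public_acls = 0"
    )


def analyze(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Analysis step: query only the transformed view, never the raw table."""
    return conn.execute(
        "SELECT account_id, COUNT(*) FROM public_acl_buckets "
        "GROUP BY account_id ORDER BY account_id"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    extract_load(conn, [
        {"account_id": "111", "name": "logs", "block_public_acls": 1},
        {"account_id": "111", "name": "site", "block_public_acls": 0},
        {"account_id": "222", "name": "tmp", "block_public_acls": 0},
    ])
    transform(conn)
    print(analyze(conn))  # [('111', 1), ('222', 1)]
```

Because each stage only depends on the output of the previous one, you can swap the destination or the transformation tool without touching the extraction.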

Re deployment: CloudQuery is a single binary, as are all its plugins, so you can deploy it literally anywhere, including EC2, serverless ECS, or elsewhere.

How hard would it be to adjust the current queries to work on top of the CQ schema? If it's not too hard, I think this might be the smoothest experience for users, since there would be no additional transform step; but even if one is needed, we can add it as you suggest.

@lyft lyft locked and limited conversation to collaborators Jul 7, 2023
@achantavy achantavy converted this issue into discussion #1202 Jul 7, 2023

2 participants