```sql
SELECT j.*, f.facets
FROM jobs_view AS j
LEFT OUTER JOIN job_versions AS jv ON jv.uuid = j.current_version_uuid
LEFT OUTER JOIN (
  SELECT run_uuid, JSON_AGG(e.facet) AS facets
  FROM (
    SELECT jf.run_uuid, jf.facet
    FROM job_facets_view AS jf
    INNER JOIN job_versions jv2 ON jv2.latest_run_uuid = jf.run_uuid
    INNER JOIN jobs_view j2 ON j2.current_version_uuid = jv2.uuid
    WHERE j2.namespace_name = :namespaceName
    ORDER BY lineage_event_time ASC
  ) e
  GROUP BY e.run_uuid
) f ON f.run_uuid = jv.latest_run_uuid
WHERE j.namespace_name = :namespaceName
ORDER BY j.name
LIMIT :limit OFFSET :offset
```
Problem Description:
For namespaceName = "MyNameSpace", which contains 15,650 jobs with a total of 854,743 facets, the query above used to take more than 10 minutes to execute because of a large join.
After we upgraded our infrastructure to a PostgreSQL cluster using the db.t4g.medium instance class (2 vCPUs, 4 GB RAM), the execution time improved. However, the query still takes about 11 seconds with a limit of 25. This is a concern because the query runs every time I open the Marquez web UI, which noticeably delays access to the interface.
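In the original query, the derived table `f` aggregates the facets of the latest run of every job in the namespace, not just the 25 jobs on the requested page. The outer `ORDER BY j.name LIMIT :limit OFFSET :offset` is applied only afterwards, so each page appears to pay for the whole namespace.

One alternative worth benchmarking is to select the page of jobs first and then aggregate facets per job with a `LATERAL` subquery. This is a minimal sketch that assumes the same views, columns and named parameters as the original. The CTE name `page` is illustrative. I have not run this against the Marquez schema or measured it.

```sql
-- Sketch: paginate jobs first, then aggregate facets only for that page.
-- Assumes the same views/columns and named parameters as the original query.
WITH page AS (
  SELECT j.*
  FROM jobs_view AS j
  WHERE j.namespace_name = :namespaceName
  ORDER BY j.name
  LIMIT :limit OFFSET :offset
)
SELECT p.*, f.facets
FROM page AS p
LEFT OUTER JOIN job_versions AS jv ON jv.uuid = p.current_version_uuid
-- One aggregate per job on the page; JSON_AGG over zero rows yields NULL,
-- matching the original LEFT JOIN when a run has no facets.
LEFT OUTER JOIN LATERAL (
  SELECT JSON_AGG(jf.facet ORDER BY jf.lineage_event_time ASC) AS facets
  FROM job_facets_view AS jf
  WHERE jf.run_uuid = jv.latest_run_uuid
) f ON TRUE
-- Re-apply the ordering: a CTE's row order is not guaranteed downstream.
ORDER BY p.name;
```

The per-job lookup should stay cheap only if the table behind `job_facets_view` can be searched by `run_uuid` through an index. That is worth confirming with `EXPLAIN (ANALYZE, BUFFERS)`.

Moving the ordering into `JSON_AGG(... ORDER BY ...)` also makes the facet order explicit. The original relies on the subquery's `ORDER BY`, which PostgreSQL does not guarantee to preserve as aggregate input order.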
Proposed Solution:
I believe this query could be optimized or, if feasible, some of its results could be cached, especially if they are accessed frequently and change rarely.
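One possible database-side form of the caching idea is a PostgreSQL materialized view. It would precompute the aggregated facets for the latest run of each job's current version and be refreshed on a schedule.

This is a sketch only, with these assumptions:
- The view name `latest_run_facets` and its index name are hypothetical.
- It assumes the same views and columns used by the original query.
- The page query would return facets as of the last refresh, not live data.

```sql
-- Hypothetical cache: aggregated facets of each current job version's latest run.
CREATE MATERIALIZED VIEW latest_run_facets AS
SELECT jf.run_uuid,
       JSON_AGG(jf.facet ORDER BY jf.lineage_event_time ASC) AS facets
FROM job_facets_view AS jf
WHERE jf.run_uuid IN (
  SELECT jv.latest_run_uuid
  FROM jobs_view AS j
  INNER JOIN job_versions AS jv ON jv.uuid = j.current_version_uuid
)
GROUP BY jf.run_uuid;

-- REFRESH ... CONCURRENTLY requires a unique index on the materialized view.
CREATE UNIQUE INDEX latest_run_facets_run_uuid_idx
  ON latest_run_facets (run_uuid);

-- Run periodically (e.g., from a scheduled job); readers are not blocked.
REFRESH MATERIALIZED VIEW CONCURRENTLY latest_run_facets;

-- The page query then joins the precomputed facets instead of aggregating them.
SELECT j.*, c.facets
FROM jobs_view AS j
LEFT OUTER JOIN job_versions AS jv ON jv.uuid = j.current_version_uuid
LEFT OUTER JOIN latest_run_facets AS c ON c.run_uuid = jv.latest_run_uuid
WHERE j.namespace_name = :namespaceName
ORDER BY j.name
LIMIT :limit OFFSET :offset;
```

The trade-off is staleness against read latency. The refresh interval would have to match how quickly new run facets need to appear in the web UI.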
I'd be more than happy to collaborate, provide more information, or help in any way to improve this.
Thank you!
Labels: performance, database
Now with this added context, it's clearer that even with the database upgrade, the query's performance is still suboptimal and requires attention.
Hello team,

I'd like to raise a performance concern regarding the SQL query at the top of this issue, which is defined in the JobDao.java file.

File & Line Reference: marquez/api/src/main/java/marquez/db/JobDao.java, line 126 in e85127c