Gitlab Source is only loading 50 rows per project #21076
Comments
hey @eflorico thanks for the feedback.
Hi @davydov-d, I can reliably reproduce this issue. I have a group with ~16.8K issues spread across a few projects. When I pull the group, I get a total of only 1923 issues. As a workaround, I currently have one source per project, which does fetch all issues. Not sure if you have access, but here is a job I executed on a brand-new source (using the new OAuth authentication, well done and congrats) a few minutes ago. (The source is deleted now, but you should probably still be able to see the logs.) I'd love to help you reproduce and resolve this.
@emilsedgh I got it, thanks. I'll take a deep dive into it then.
@davydov-d I checked and can confirm I'm not filtering the results by accident. You're right that there are 56 projects in total, but most pipelines, merge requests, and commits belong to only a handful of projects, each of which should have thousands of them.
* #21076 source gitlab: fix missing data issue
* #21076 source gitlab: upd changelog
* auto-bump connector version
Co-authored-by: Octavia Squidington III <[email protected]>
@eflorico @emilsedgh hey, the fix has been released. Could you please check it and provide your feedback?
Checking right now.
Fantastic job and well done! It synchronized all my 16K issues.
glad to know! thanks for your cooperation 🤝 |
Just tested it here as well, and at first glance everything looks fine! It does appear to synchronize all commits, MRs, and branches for my projects 😊 Thank you for the speedy fix @davydov-d, I appreciate it!
Environment
Current Behavior
For each GitLab project, at most 50 pipelines, merge requests, and commits are retrieved, so most rows are missing from these streams. In our case, tens of thousands of rows are missing, which makes the data retrieved from GitLab unusable.
I do see >50 branches for some projects, so I think branches may be fetched correctly.
Additionally, I see lots of duplicates in the users stream (all fields have the same values except for the airbyte* fields). In total, there are 617 rows for 12 distinct users. Mentioning this because it may be related; however, this is not the main issue for me.
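To check the duplicate count described above, a minimal Python sketch along these lines could be used. It treats two rows as the same user when every field matches except the Airbyte metadata fields. The function name `distinct_rows` and the `_airbyte` prefix are assumptions made for illustration; adjust the prefix to whatever the metadata columns are actually called in your destination.

```python
from typing import Dict, Iterable, List


def distinct_rows(rows: Iterable[Dict], ignore_prefix: str = "_airbyte") -> List[Dict]:
    """Return the first occurrence of each row, ignoring metadata fields.

    ignore_prefix is an assumed naming convention for the metadata
    columns; it is not taken from the connector's code.
    """
    seen = set()
    unique = []
    for row in rows:
        # Build a hashable key from every field that is not metadata.
        key = tuple(
            sorted(
                (name, repr(value))
                for name, value in row.items()
                if not name.startswith(ignore_prefix)
            )
        )
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Comparing `len(rows)` with `len(distinct_rows(rows))` gives the kind of "617 rows for 12 distinct users" figure quoted above.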
Expected Behavior
All pipelines, merge requests, and commits since the specified start date should be retrieved.
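In other words, the connector should keep requesting pages until none are left. The sketch below illustrates that loop in Python; it is not the connector's actual code or the released fix. `fetch_page` is a hypothetical callable standing in for one HTTP request, and `make_fake_fetcher` is a stand-in data source so the example runs without network access. Against the real GitLab REST API, the next page number would typically come from the `x-next-page` response header of an offset-paginated request. The page size of 50 is assumed only because it matches the row count reported in this issue.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: given a page number, returns the rows on
# that page and the number of the next page (None when there is none).
PageFetcher = Callable[[int], Tuple[List[Dict], Optional[int]]]


def fetch_all(fetch_page: PageFetcher, first_page: int = 1) -> Iterator[Dict]:
    """Yield rows from every page, not just the first one."""
    page: Optional[int] = first_page
    while page is not None:
        rows, page = fetch_page(page)
        yield from rows


def make_fake_fetcher(total_rows: int, per_page: int = 50) -> PageFetcher:
    """Build an in-memory fetcher that mimics offset pagination."""

    def fetch_page(page: int) -> Tuple[List[Dict], Optional[int]]:
        start = (page - 1) * per_page
        end = min(start + per_page, total_rows)
        rows = [{"id": i} for i in range(start, end)]
        # Advertise a next page only while rows remain beyond this one.
        next_page = page + 1 if end < total_rows else None
        return rows, next_page

    return fetch_page
```

Stopping after the first call to `fetch_page` would produce exactly the symptom reported here: a fixed number of rows per project regardless of how many exist.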
Logs
log.txt
Notable findings in the logs:
Steps to Reproduce
Other observations
I am currently evaluating both Airbyte and Meltano. Meltano seems to fetch data from Gitlab mostly fine after fixing some configuration issues. Interestingly, Meltano has the same issue with duplicate users.
This may be the same issue as #12476.
Are you willing to submit a PR?
Probably not.