Public repos in a GitHub org don't return all streams #3953
Comments
logs-2-0 (1).txt |
Just attempted this with a smaller repo, https://github.com/ablevets/test-public, and I was able to pull data into all expected tables. Is this a performance issue with Airbyte? The repo I mentioned in the initial issue, https://github.com/department-of-veterans-affairs/vets-website, is really large. |
@Moofasax from your logs
First, could you check the |
All stargazers tables are empty. The only two tables that have any data in them are issue_events and assignees; all other tables are empty. |
Are you seeing this if you try what was stated in the original issue: exporting data out of this public repo, https://github.com/department-of-veterans-affairs/vets-website, with the collaborators, projects, and events streams disabled? |
@Moofasax I may be experiencing the same thing. Try syncing the stargazers stream on its own (connection settings -> update latest source schema button -> deselect all except stargazers -> save changes and reset data button -> update latest source schema button in the pop-up -> save changes button -> reset button -> let the reset finish and start a new sync). Also, how many records does Airbyte say it synced when the task returns successful? |
With only stargazers checked, it does return 100 records for the _stargazers tables!! |
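A count of exactly 100 matches the largest page size GitHub's REST API returns per request (`per_page=100`), so it may be worth checking whether more pages exist. The thread does not confirm that pagination is the cause; this is only a way to rule it out. The sketch below parses the `Link` response header that GitHub sends on paginated endpoints. The function name `next_page_url` is hypothetical and is not part of Airbyte or the connector.

```python
import re


def next_page_url(link_header):
    """Return the rel="next" URL from a GitHub Link header, or None.

    A missing or empty header, or a header without rel="next",
    means there is no further page to fetch.
    """
    if not link_header:
        return None
    # Each entry in the header looks like: <url>; rel="name"
    for url, rel in re.findall(r'<([^>]+)>\s*;\s*rel="([^"]+)"', link_header):
        if rel == "next":
            return url
    return None
```

If the first stargazers response for the repo carries a `rel="next"` link, the repo has more than one page of stargazers, and a sync that stops at 100 records has not read them all.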
@Moofasax @marcosmarxm this lines up with my experience of this issue |
thanks for reporting, i am trying this with the newer github singer 0.2.8 and will report. Also nice username @garden-of-delete |
Same issue on 0.2.8 |
@Moofasax and @garden-of-delete I had run an integration sync yesterday. Is it possible to try again? |
What versions of everything? I can try!! |
Airbyte 0.24.7-alpha |
@marcosmarxm I am still seeing the failed behavior, no other tables get created. Did you test with this repo? https://github.com/department-of-veterans-affairs/vets-website |
I tested with the latest version too, same result. @marcosmarxm I don't seem to have the option for postgres 0.3.5 |
Worth adding to the conversation. I'm getting the same result as @Moofasax on github 0.2.8. I have tried with bigquery and postgres destinations and have the exact same result in both cases. Just FYI. |
Based on @garden-of-delete's comment, does that pinpoint it as a github singer issue and not a destination issue? |
@Moofasax probably. I'm running using apache/superset. The sync hasn't finished yet, but I got records in the _raw tables. After it finishes I'll update here. This is the current state of my sync connection btw, Github => Postgres. |
I'm not seeing what apache/superset has to do with this, but that data does not look right; there should not be that many stargazers... |
@Moofasax superset is the project @garden-of-delete works and it has 39k stars 😄 |
Oh I see, I'm sorry. So any idea why I can pull data from this public repo, https://github.com/ablevets/test-public, and not this public repo: https://github.com/department-of-veterans-affairs/vets-website? My access to both is exactly the same. |
@Moofasax #3695 is a WIP and will probably solve the issue. Is it possible to wait until this release? I'll keep you updated on any news. |
Understood, keep me posted and I'll test!! Appreciate the support! |
The permissions that the token has on the repo could be a factor, but they do not explain the full issue. Specifically, they don't explain why I pull empty tables when syncing all streams, but can pull a populated table with the same token when syncing one stream at a time. I pull data regularly from apache/superset, but I don't have write access to that repo, so my results should be the same as any of yours. |
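One way to separate token permissions from connector behavior is to call the GitHub REST API directly with the same token and see which endpoints answer. The sketch below is a minimal probe using only the Python standard library. The stream-to-path mapping in `STREAM_PATHS` is an assumption based on the public REST API and the stream names in this thread, not the connector's actual mapping, and the helper names `build_request` and `probe` are hypothetical.

```python
import json
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com"

# Assumed mapping from stream names used in this thread to GitHub REST paths.
# The connector's real mapping may differ.
STREAM_PATHS = {
    "stargazers": "stargazers",
    "issues": "issues",
    "pull_requests": "pulls",
    "assignees": "assignees",
    "issue_events": "issues/events",
    "collaborators": "collaborators",
}


def build_request(repo, stream, token=None):
    """Build a request for the first record of one stream of `repo` (owner/name)."""
    url = f"{API_ROOT}/repos/{repo}/{STREAM_PATHS[stream]}?per_page=1"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(url, headers=headers)


def probe(repo, stream, token=None):
    """Return (HTTP status, number of records in the first page)."""
    try:
        with urllib.request.urlopen(build_request(repo, stream, token), timeout=30) as resp:
            return resp.status, len(json.load(resp))
    except urllib.error.HTTPError as err:
        # 403 or 404 here points at permissions rather than at the connector.
        return err.code, 0
```

Calling `probe("department-of-veterans-affairs/vets-website", "issues", token)` for each stream shows which endpoints the token can read. A stream that returns 200 with data here but stays empty in Airbyte would point away from permissions.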
blocked by #3695 |
Just tested this with the new native github source, still having this issue. Streams that did not pull data: issues, pull_requests, reviews. Going to test with events streams enabled and see if that helps. |
@marcosmarxm with the events streams disabled I still do not get data for issues, pull_requests, or reviews with this repo: |
@Moofasax could you please reopen this issue if it persists? As far as we can tell, we cannot reproduce this on the new connector version. |
I've tested 0.1.2 and am getting data for all streams now. thanks for all the hard work! |
Expected Behavior
Syncing a public repo I don't have access to, with the collaborators, projects, events, and teams streams disabled, should still pull data. I expect the streams to output data into postgres tables.
Current Behavior
Only the issue_events and assignees tables are populated; every other table is blank. Not sure if this is because the repo I am pulling from is large and Airbyte can't handle it, or if it's because of permissions issues.
Logs
If applicable, please upload the logs from the failing operation. For sync jobs, you can download the full logs from the UI by going to the sync attempt page and clicking the download logs button at the top right of the logs display window.
Steps to Reproduce
Severity of the bug for you
Critical
Airbyte Version
0.24.7-alpha
Connector Version (if applicable)
Github singer: 0.2.7
Additional context
Environment, version, integration...