OJ-36874: introduce ado adapter to agent #354

Merged
gavinpitt-jf merged 5 commits into master from OJ-36874-introduce-ado-adpater-to-agent
Aug 6, 2024

Conversation

gavinpitt-jf
Contributor

@gavinpitt-jf gavinpitt-jf commented Aug 5, 2024

Description

Introduce the ADO adapter to Agent.

A lot of this diff seems to be isort and black reformatting.

The bulk of this logic lets somebody provide an ADO config/creds combo and run the Agent. For proof of testing, see this PR (which is in Jellyfish, to protect a customer name).

Testing

To test backwards compatibility, I ran the Agent for orthog for both Jira and Git. I also did more testing for a specific customer, but I am keeping that testing plan in this private PR for customer privacy reasons.

Logging setup complete with handlers for log file, stdout, and streaming.
Will write output files into ./output/20240805_175013
Running ingestion healthcheck validation!
Validating configuration...

Jira details:
  URL:      https://jelly-ai.atlassian.net
  Username: [email protected]
  Password: **********
==> Testing Jira connection...
Authenticating to Jira API at https://jelly-ai.atlassian.net using the username and password secrets for [email protected] of company orthogonal-networks
==> Getting Jira version...
Found Jira version as 1001.0.0-SNAPSHOT
==> Getting Jira deployment type...
Response headers does not contain X-ANODEID! Customer is NOT running Jira Data Center.
==> Getting Jira permissions...
Found granted permissions as ['DELETE_OWN_WORKLOGS', 'CREATE_ISSUES', 'WORK_ON_ISSUES', 'DELETE_OWN_COMMENTS', 'MODIFY_REPORTER', 'EDIT_ISSUES', 'ADD_COMMENTS', 'EDIT_OWN_COMMENTS', 'ASSIGN_ISSUES', 'BROWSE_PROJECTS', 'EDIT_OWN_WORKLOGS', 'EDIT_ALL_WORKLOGS', 'EDIT_ALL_COMMENTS', 'CLOSE_ISSUES', 'SET_ISSUE_SECURITY', 'SCHEDULE_ISSUES', 'USER_PICKER', 'ADMINISTER_PROJECTS', 'DELETE_ALL_COMMENTS', 'RESOLVE_ISSUES', 'DELETE_ISSUES', 'VIEW_READONLY_WORKFLOW', 'MOVE_ISSUES', 'ASSIGNABLE_USER', 'TRANSITION_ISSUES', 'DELETE_ALL_WORKLOGS', 'LINK_ISSUES']
==> Testing Jira user browsing permissions...
Downloading Users...
Done downloading Users! Found 505 users
We can access 505 Jira users.
==> Testing Jira project permissions...
With provided credentials, the following projects are discoverable: {'JFR', 'OJ'}.
Checking project access.
Testing access for project: "JFR"
With provided credentials, we can access issues, versions, and components within project JFR
Testing access for project: "OJ"
With provided credentials, we can access issues, versions, and components within project OJ
Checking access to fields
Checking access to resolutions
Checking access to issue types
Checking access to issue link types
Checking access to priorities
Checking access to boards
Checking access to sprints
. Skipping Git Validation.

Memory & Disk Usage:
  Available memory: 743.87 MB
  Disk usage for jf_agent/output: 299 GB / 460 GB
  Size of jf_agent/output dir:  24K
Attempting to upload healthcheck result to s3...
Successfully uploaded healthcheck.json
Successfully uploaded jf_agent.log
Successfully uploaded healthcheck result to s3!

Done
Obtained Jira configuration, attempting download...
Attempting to use JF Ingest for Jira Ingestion
Set global value INGESTION_TYPE to AGENT
Beginning load_and_push_jira_to_s3
Using local version of ingest
Authenticating to Jira API at https://jelly-ai.atlassian.net using the username and password secrets for [email protected] of company orthogonal-networks
Data will not be saved locally
Data will be submitted to jellyfish
Downloading Jira Projects...
Done downloading Projects!
Downloading Jira Project Components...
Done downloading Project Components!
Downloading Jira Versions...
Done downloading Jira Versions!
Done downloading Jira Project, Components, and Version. Found 2 projects
Downloading Jira Fields...
Done downloading Jira Fields! Found 220 fields
Downloading Users...
Done downloading Users! Found 505 users
Downloading Jira Resolutions...
Done downloading Jira Resolutions! Found 9 resolutions
Downloading IssueTypes...
Done downloading IssueTypes! found 34 Issue Types
Downloading IssueLinkTypes...
Done downloading IssueLinkTypes! Found 10 Issue Link Types
Downloading Jira Priorities...
Done downloading Jira Priorities! Found 5 priorities
Downloading Jira Statuses...
Done downloading Jira Statuses! Found 124
Downloading Boards...
Done downloading Boards! Found 58 boards
Downloading Sprints...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 58/58 [00:16<00:00,  3.48it/s]
Attempting to pull issue metadata for 2 projects, with a pull from date set as 2017-01-01 00:00:00+00:00
Getting total issue counts for 2 projects (Thread Count: 10): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  3.20it/s]
Pulling issue data across 2 projects by Date (Thread Count: 10): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37943/37943 [00:31<00:00, 1203.29it/s]
Attempting to pull metadata for an additional 13 issues, which represents issue parents that we need to potentially redownload. (Parent Search Depth = 1)
Pulling issue data for 13 Jira Issue IDs (Thread Count: 10): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 29.91it/s]
Only grabbing the first level of parents, because recursively_download_parents is False
Using IssueMetadata we have detected that 336 issues are missing, 745 issues are out of date, 0 issues need to be redownloaded (because of rekey and parent relations), for a total of 1081 issues to download
Attempting to pull 1081 full issues
Pulling issue data for 1081 Jira Issue IDs (Thread Count: 10): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1081/1081 [00:08<00:00, 126.51it/s]
Successfully saved 1081 Jira Issues in 2 separate batches, with each batch limited to 50MB per batch
22096 issues have been detected as being deleted
Downloading Jira Worklogs...
Fetching updated worklogs
Done fetching updated worklogs
Fetching deleted worklogs
Done fetching deleted worklogs
Done downloading Worklogs! Found 0 worklogs and 0 deleted worklogs
Data has not been saved locally, because save_locally was set to false in the ingest config!
Data has been submitted to jellyfish
Starting Git download for 1 provided git configurations
downloading github users... ✓
downloading github projects... ✓
downloading github repos... ✓
downloading commits on branch develop for jellyfish: 340commits [00:03, 104.12commits/s]
downloading PRs for jellyfish: 34prs [01:01,  1.81s/prs]
Shutting down Systems Diagnostics Thread
Closing Diagnostics file
Compressing ./output/20240805_175013/diagnostics.json
Compressing ./output/20240805_175013/healthcheck.json
Sending data to Jellyfish...
Starting 8 threads
Successfully uploaded healthcheck.json.gz
Successfully uploaded status.json.gz
Successfully uploaded git_8v1crHqhmq/bb_projects.json.gz
Successfully uploaded diagnostics.json.gz
Successfully uploaded git_8v1crHqhmq/bb_users.json.gz
Successfully uploaded git_8v1crHqhmq/bb_repos.json.gz
Successfully uploaded git_8v1crHqhmq/bb_prs.json.gz
Successfully uploaded git_8v1crHqhmq/bb_commits.json.gz
Successfully uploaded config.yml
Agent run succeeded: True
Successfully uploaded jf_agent.log
Successfully uploaded .done
Done!
Closing the agent log stream.
Log stream stopped.

pyproject.toml Outdated
@@ -18,7 +18,7 @@ dependencies = [
     "click~=8.0.4",
     "requests>=2.31.0",
     "python-dotenv>=1.0.0",
-    "jf-ingest==0.0.105",
+    "jf-ingest==0.0.120",
Contributor Author

TODO: After this PR gets approved and deployed, this should be bumped to 121: https://github.com/Jellyfish-AI/jf_ingest/pull/160

@gavinpitt-jf gavinpitt-jf requested a review from a team August 5, 2024 17:48
@@ -371,6 +388,24 @@ def get_ingest_config(
    if config.jira_url and (
        (creds.jira_username and creds.jira_password) or creds.jira_bearer_token
    ):
        issue_metadata: List[IssueMetadata] = IssueMetadata.from_json(
Contributor

@jruel4 jruel4 Aug 6, 2024

Q: Is this to handle a change in jf_ingest?

Contributor

Also, did we do a quick Jira test to validate this?

Contributor Author

This is logic we were already doing: extracting the issue_metadata.

Directly below it, though, we're building a new dictionary of "project IDs to pull froms", which is something the new Jira sync uses (Kirk's speedup). We need to supply JF Ingest with a pull_from date for each project in order to get all the updated issues for that project.

The new sync is currently not used in Agent, so this is pretty low impact. I did, however, do a Jira test with orthogonal networks just to make sure, and it behaved as expected.
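
For readers following along, here is a minimal sketch of what a "project IDs to pull froms" dictionary could look like. It is not the code from this PR or from jf_ingest: the `IssueMeta` type, its fields, the `build_project_pull_froms` helper, and the rule "use the newest `updated` timestamp we already hold, else fall back to the default" are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, NamedTuple


class IssueMeta(NamedTuple):
    """Hypothetical stand-in for jf_ingest's IssueMetadata (fields are assumed)."""

    project_id: str
    updated: datetime


def build_project_pull_froms(
    issue_metadata: Iterable[IssueMeta],
    project_ids: Iterable[str],
    default_pull_from: datetime,
) -> Dict[str, datetime]:
    """Map each project to the date from which updated issues should be pulled."""
    # Start every project at the configured default pull_from date.
    pull_froms = {pid: default_pull_from for pid in project_ids}
    # Assumed rule: advance a project's date to the newest 'updated' value
    # already present in the metadata for that project.
    for meta in issue_metadata:
        current = pull_froms.get(meta.project_id)
        if current is not None and meta.updated > current:
            pull_froms[meta.project_id] = meta.updated
    return pull_froms


# Illustrative inputs only; the project identifiers and dates are made up.
default = datetime(2017, 1, 1, tzinfo=timezone.utc)
metadata = [
    IssueMeta("OJ", datetime(2024, 8, 1, tzinfo=timezone.utc)),
    IssueMeta("OJ", datetime(2024, 7, 1, tzinfo=timezone.utc)),
]
pull_froms = build_project_pull_froms(metadata, ["OJ", "JFR"], default)
```

In this sketch a project with no known issues keeps the default date, so a first sync for that project would pull everything from the default onward.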

]
)
# Jira is supported by all customers, always skip it
directories_to_skip_uploading_for.add('jira')
Contributor

Q: Previously, were we dual-submitting, or is this due to changes in the jf_ingest version bump?

Contributor Author

Yeah, there was a bug where we were double submitting GitHub data. Not really a huge deal, but it's better practice not to do it. It might have had some run-time performance impact if a customer tried to upload a massive file, but this is SUPER threaded, so the impact was likely minimal.
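
A rough sketch of the skip-set idea, for illustration only: it assumes the upload step walks a list of relative output paths and drops anything whose top-level directory is in the skip set. The `files_to_upload` helper and the `jira/jira_issues.json.gz` path are hypothetical; only the `directories_to_skip_uploading_for` name and the other file names come from this PR.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def files_to_upload(paths: Iterable[str], directories_to_skip: Set[str]) -> List[str]:
    """Return the paths whose top-level directory is not in the skip set."""
    kept = []
    for path in paths:
        parts = PurePosixPath(path).parts
        # Files sitting directly in the output root have no top-level directory.
        top_level = parts[0] if len(parts) > 1 else None
        if top_level in directories_to_skip:
            continue  # assumed to have been submitted already, so skip it
        kept.append(path)
    return kept


directories_to_skip_uploading_for = {"jira"}
candidates = [
    "jira/jira_issues.json.gz",  # hypothetical file name
    "git_8v1crHqhmq/bb_repos.json.gz",
    "config.yml",
]
to_upload = files_to_upload(candidates, directories_to_skip_uploading_for)
```

Filtering before the threaded upload starts keeps the skip decision in one place, so a directory is either submitted by the ingest path or by the upload step, never both.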

@gavinpitt-jf gavinpitt-jf merged commit ca0617e into master Aug 6, 2024
5 checks passed
@gavinpitt-jf gavinpitt-jf deleted the OJ-36874-introduce-ado-adpater-to-agent branch August 6, 2024 20:15