Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GItlab Integatration #1022

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Blazingkevin
Copy link

@Blazingkevin Blazingkevin commented Sep 16, 2024

Description

Integration of Gitlab to Port
What -

Why -

How -

Type of change

Please leave one option from the following and delete the rest:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • New Integration (non-breaking change which adds a new integration)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Non-breaking change (fix of existing functionality that will not change current behavior)
  • Documentation (added/updated documentation)

All tests should be run against the port production environment(using a testing org).

Core testing checklist

  • Integration able to create all default resources from scratch
  • Resync finishes successfully
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Scheduled resync able to abort existing resync and start a new one
  • Tested with at least 2 integrations from scratch
  • Tested with Kafka and Polling event listeners
  • Tested deletion of entities that don't pass the selector

Integration testing checklist

  • Integration able to create all default resources from scratch
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Resync finishes successfully
  • If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • Docs PR link here

Preflight checklist

  • Handled rate limiting
  • Handled pagination
  • Implemented the code in async
  • Support Multi account

Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

Screenshot from 2024-09-20 13-42-44
Screenshot from 2024-09-20 13-43-02
Screenshot from 2024-09-20 13-43-11

Copy link
Contributor

@lordsarcastic lordsarcastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, however the class implementation is unnecessarily complex and could be cleaner. Please take a look at the changes

@@ -10,7 +10,7 @@ repos:
- id: check-merge-conflict
- id: check-executables-have-shebangs
- id: check-symlinks
- id: detect-aws-credentials
# - id: detect-aws-credentials
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not touch this file

Comment on lines 13 to 24
def is_retryable_status_code(status_code: Optional[int]) -> bool:
"""
Checks if the status code is retryable.
Handles rate limiting (HTTP 429) and transient errors (e.g., HTTP 500, 503).
"""
if status_code == 429:
return True

if status_code in RETRYABLE_ERROR_CODES:
return True

return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def is_retryable_status_code(status_code: Optional[int]) -> bool:
"""
Checks if the status code is retryable.
Handles rate limiting (HTTP 429) and transient errors (e.g., HTTP 500, 503).
"""
if status_code == 429:
return True
if status_code in RETRYABLE_ERROR_CODES:
return True
return False
def is_retryable_status_code(status_code: Optional[int]) -> bool:
return status_code in [429, *RETRYABLE_ERROR_CODES]

)


# Required
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take this out if you aren't using it

async def on_start() -> None:
# Something to do when the integration starts
# For example create a client to query 3rd party services - GitHub, Jira, etc...
print("Starting gitlab2 integration")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect you should handle webhooks creation here.

await process_webhook_event(event_type, payload)


async def create_webhook_with_semaphore(semaphore: asyncio.Semaphore, gitlab_client: GitlabClient, project_id: int, webhook_url: str ):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utility functions should either be part of your client class or in the client file

self.token = token
self.base_url = base_url
self.http_client = http_async_client
self.http_client.headers.update()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You aren't passing any headers to this line



# PaginatedData Class
class PaginatedData:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to handle pagination using a method in the client class. While this is a clever idea, it is good to keep things as simple as possible

self.app_host = app_host

@property
def api_auth_header(self) -> dict[str, Any]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass this directly to the client

@lordsarcastic
Copy link
Contributor

Add screenshot of the data on your dashboard

Copy link
Contributor

@PeyGis PeyGis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Please go over my comments and let me know if you need any further info

Comment on lines +1 to +8
# Changelog - Ocean - gitlab2

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

<!-- towncrier release notes start -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the changelog content for the feature implementation for GitLab.

something like

0.1.0 (2024-09-13)

Feature

  • Implemented GitLab integration using Ocean

"""Fetch paginated projects from GitLab and return an async generator of project lists."""
paginated_projects = PaginatedData(
url=f"{self.base_url}/projects",
query_data={"per_page": 5}, # You can adjust per_page as needed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we are fetching only 5 items per batch?

Comment on lines 201 to 202
url = f"{self.base_url}/projects/{project_id}/hooks"
webhook_url = f"{self.app_host}/integration/webhook"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating webhooks per project is going to be spammy. You can look into how to create it on a group level such as group hooks. this way, all projects under the group will inherit the hook

Comment on lines +203 to +208
data = {
"url": webhook_url,
"issues_events": True,
"merge_requests_events": True,
"push_events": True,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitLab has way more events you can subscribe to. Update it with more events

return GitlabClient(
ocean.integration_config["gitlab_token"],
ocean.integration_config["base_url"],
ocean.integration_config["app_host"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the user doesn't provide app_host? this will fail

Comment on lines 31 to 60
# Listen to the resync event of all the kinds specified in the mapping inside port.
# Called each time with a different kind that should be returned from the source system.
# @ocean.on_resync()
# async def on_resync(kind: str) -> list[dict[Any, Any]]:
# # 1. Get all data from the source system
# # 2. Return a list of dictionaries with the raw data of the state to run the core logic of the framework for
# # Example:
# # if kind == "project":
# # return [{"some_project_key": "someProjectValue", ...}]
# # if kind == "issues":
# # return [{"some_issue_key": "someIssueValue", ...}]

# # Initial stub to show complete flow, replace this with your own logic
# if kind == "gitlab2-example-kind":
# return [
# {
# "my_custom_id": f"id_{x}",
# "my_custom_text": f"very long text with {x} in it",
# "my_special_score": x * 32 % 3,
# "my_component": f"component-{x}",
# "my_service": f"service-{x %2}",
# "my_enum": "VALID" if x % 2 == 0 else "FAILED",
# }
# for x in range(25)
# ]

# return []


# The same sync logic can be registered for one of the kinds that are available in the mapping in port.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete all these commented code

Comment on lines 92 to 114
webhook_url = ""
semaphore = asyncio.Semaphore(10)


async for projects in gitlab_client.get_projects():
logger.info(f"Received project batch with {len(projects)}")
tasks = [
create_webhook_with_semaphore(
semaphore,
gitlab_client,
project['id'],
webhook_url
)
for project in projects
]

results = await asyncio.gather(*tasks, return_exceptions=True)

for result in results:
if isinstance(result, Exception):
logger.error(f"Error occurred during webhook creation: {result}")

yield projects
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the creation of webhook should be in the on_start event, instead of here

Comment on lines +141 to +167
async def process_webhook_event(event_type: str, payload: dict):
gitlab_client = init_client()

if event_type == "Issue Hook":
await handle_issue_event(payload, gitlab_client)
elif event_type == "Merge Request Hook":
await handle_merge_request_event(payload, gitlab_client)
elif event_type == "Push Hook":
await handle_push_event(payload, gitlab_client)
elif event_type == "Project Hook":
await handle_project_event(payload, gitlab_client)
elif event_type == "Group Hook":
await handle_group_event(payload, gitlab_client)
else:
logger.warning(f"Unhandled event type: {event_type}")

async def handle_issue_event(payload: dict, gitlab_client: GitlabClient):
issue_attributes = payload.get('object_attributes', {})
issue_id = issue_attributes.get('id')
project_id = payload.get('project', {}).get('id')
action = issue_attributes.get('action') # 'open', 'close', 'reopen', 'update'
logger.info(f"Issue {action}: {issue_id}")

if action in ['open', 'reopen', 'update']:
# Fetch the latest issue data
issue = await gitlab_client.get_single_issue(project_id, issue_attributes.get('iid'))
await ocean.register_raw(ObjectKind.ISSUE, [issue])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are great but I will prefer we keep the main.py clean and move these implementation to a separate file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants