#3042 python client updating data stream #3168

Open
wants to merge 3 commits into
base: dev

IsaakKrut
Contributor

@IsaakKrut IsaakKrut commented Aug 20, 2024

Purpose

Closes #3042

Remarks

When testing I noticed the Data Stream is being updated in the database, but not in the UI. Should I use a different method to update the Data Streams?
[Two screenshots attached]

Also, should any other functionality be implemented to automatically handle the changes?

PR introduces (a) breaking change(s): no

PR introduces (a) deprecation(s): no

@github-actions github-actions bot added the java, python, backend, and testing labels Aug 20, 2024
@tenthe
Contributor

tenthe commented Aug 21, 2024

@IsaakKrut, thank you for this PR. The code looks good, but before we proceed, I think we should clarify some concepts in StreamPipes that are not quite clear yet.

General Considerations:

  • Adapters vs. Data Streams: These are two distinct elements in our architecture, but they have interdependencies. Because of these dependencies, we do not yet apply the concept consistently.

Topic 1:

  • Data Source Changes: Your PR modifies the Data Source only. When we change Data Sources, we may also need to update the Pipelines. The functionality for handling this in Adapters can be found in AdapterResource.performPipelineMigrationPreflight. A similar approach might be required for Data Source updates.

Topic 2:

  • Event Schema Handling:
    • This is not directly related to your PR, but it is an open issue in our system that your PR brought to light.
    • Problem: An Adapter creates a Data Stream. A user then changes the Event Schema of this Data Stream via the API. This can lead to a situation where the Adapter and the Data Source have different Event Schemas.
    • I'm not yet sure how to resolve this. We can discuss this separately if needed.
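
To illustrate the problem described in Topic 2, here is a minimal sketch of how the two schemas can drift apart. It assumes a heavily simplified model in which a schema is just a set of property names. The classes and the `schemas_in_sync` helper are hypothetical and are not StreamPipes code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EventSchema:
    # Hypothetical simplification: a schema is only a set of property names.
    properties: set[str] = field(default_factory=set)


@dataclass
class DataStream:
    element_id: str
    schema: EventSchema


@dataclass
class Adapter:
    element_id: str
    schema: EventSchema
    stream_id: str  # id of the data stream this adapter created


def schemas_in_sync(adapter: Adapter, stream: DataStream) -> bool:
    """Return True if adapter and stream describe the same properties."""
    return adapter.schema.properties == stream.schema.properties


# The adapter creates a data stream that starts with a copy of its schema.
adapter = Adapter("adapter-1", EventSchema({"timestamp", "temperature"}), "stream-1")
stream = DataStream("stream-1", EventSchema(set(adapter.schema.properties)))
in_sync_before = schemas_in_sync(adapter, stream)

# A user then changes only the data stream's schema via the API.
stream.schema.properties.add("pressure")
in_sync_after = schemas_in_sync(adapter, stream)

print(in_sync_before, in_sync_after)
```

Because the data stream holds its own copy of the schema, nothing in this model forces the adapter to follow the change.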

I think Topic 2 should be discussed in a separate thread, as there are additional points that need clarification.

How should we proceed with Topic 1? We could either:

  • Option 1: Address it in this PR by implementing the necessary updates for Data Source changes.
  • Option 2: Leave it for now and create a separate issue and PR after we have a solution for Topic 2.

@IsaakKrut
Contributor Author

Hi @tenthe, I can work on Option 1. I have a couple of questions, though.

  1. When updating the data stream, should any changes be propagated to the corresponding AdapterDescription? They share fields inherited from NamedStreamPipesEntity. For example, I noticed that name and description are set to the same value when an adapter is created from the UI.
  2. The current update logic only updates the EventSchema on the pipeline DataStream. Should this new API do the same, or should it replace the whole DataStream object?
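
To make the second question concrete, here is a minimal sketch of the two behaviors, assuming a reduced data stream model with only four fields. The `DataStream` class and both functions are hypothetical illustrations, not the StreamPipes API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataStream:
    # Hypothetical, reduced model of a data stream for illustration only.
    element_id: str
    name: str
    description: str
    event_schema: tuple  # property names


def update_schema_only(stored: DataStream, incoming: DataStream) -> DataStream:
    """Keep all stored fields and take only the event schema from incoming."""
    return replace(stored, event_schema=incoming.event_schema)


def replace_whole(stored: DataStream, incoming: DataStream) -> DataStream:
    """Take every field from incoming, keeping only the stored identity."""
    return replace(incoming, element_id=stored.element_id)


stored = DataStream("stream-1", "Machine", "Machine data", ("timestamp",))
incoming = DataStream("stream-1", "Renamed", "", ("timestamp", "temperature"))

schema_only = update_schema_only(stored, incoming)
whole = replace_whole(stored, incoming)

print(schema_only.name, whole.name)
```

With the schema-only variant, the name and description stay as stored. With the full replacement, they are overwritten by whatever the client sends, including empty values.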

@tenthe
Contributor

tenthe commented Aug 23, 2024

Hi @IsaakKrut,

Good questions. Here are some thoughts from my side.

Propagating changes to AdapterDescription:
Propagating changes to the event schema directly to AdapterDescription could be problematic, as it might require modifications to the adapter itself, and these might differ for each adapter type. We are currently considering having adapters manage their own event schemas, which would change how we display event streams and adapter instances in the pipeline editor. Pipelines could then have either a data stream or an adapter as a data source. At the moment, each adapter creates a data stream as a separate object that needs to be kept in sync. I suggest we discuss this further in a separate issue.

Updating DataStream:
For this PR, I suggest propagating event schema changes in the DataStream to the affected pipelines, similar to how we handle adapter updates. See AdapterResource.performPipelineMigrationPreflight() for reference; the update logic is in AdapterUpdateManagement.updateAdapter at line 75.
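
As a rough illustration of what such a preflight check could look like for data streams, here is a minimal sketch. It is not the logic of AdapterResource.performPipelineMigrationPreflight(). The `Pipeline` class, its `required_fields` attribute, and the `preflight` function are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    name: str
    stream_id: str              # data stream the pipeline reads from
    required_fields: frozenset  # fields its processors depend on (hypothetical)


def preflight(pipelines, stream_id, new_fields):
    """Classify pipelines that read from stream_id against the new schema.

    Returns a tuple (compatible, incompatible) of pipeline name lists.
    """
    compatible, incompatible = [], []
    for pipeline in pipelines:
        if pipeline.stream_id != stream_id:
            continue  # pipeline does not use the changed stream
        if pipeline.required_fields <= new_fields:
            compatible.append(pipeline.name)
        else:
            incompatible.append(pipeline.name)
    return compatible, incompatible


pipelines = [
    Pipeline("alerts", "stream-1", frozenset({"temperature"})),
    Pipeline("dashboard", "stream-1", frozenset({"temperature", "humidity"})),
    Pipeline("other", "stream-2", frozenset({"speed"})),
]

# The updated schema of stream-1 no longer contains "humidity".
ok, broken = preflight(pipelines, "stream-1", frozenset({"timestamp", "temperature"}))
print(ok, broken)
```

Running the check before the update is stored would let the API report which pipelines can be migrated and which would break, instead of silently changing the stream.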

Feel free to share your thoughts or ask any questions. I'm also not entirely sure what the best solution is, so we can discuss this further.

public class DataStreamResource extends AbstractAdapterResource<AdapterMasterManagement> {

  public DataStreamResource() {
    super(() -> new AdapterMasterManagement(
Contributor

Can you explain why you added this code? I do not completely understand it.

Contributor Author

I needed access to an AdapterMasterManagement instance, so I used the same pattern as in AdapterResource to bring it in.

Labels
backend: Everything that is related to the StreamPipes backend
java: Pull requests that update Java code
python: Pull requests that update Python code
testing: Relates to any kind of test (unit test, integration, or E2E test).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatic Handling of Data Stream Changes in Python Client
2 participants