Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make session_id customisable. #1731

Open
merelcht opened this issue Jul 27, 2022 · 7 comments
Open

Make session_id customisable. #1731

merelcht opened this issue Jul 27, 2022 · 7 comments

Comments

@merelcht
Copy link
Member

merelcht commented Jul 27, 2022

Description

#1551

Implementation

  • Add a new argument session_id to the KedroSession.create() method
  • Verify the session_id against the VERSION_FORMAT format. Basically, it needs to be a timestamp of a specific format. This is to prevent users from picking a session id that will mess up the versioning system in Kedro.
  • Update deployment docs to describe how to customise the session id. In the docs we'll suggest users to use the generate_timestamp() method as session_id.
  • Be very clear in the documentation that we do not recommend people to change the session_id, because it's linked to the save_version id used for versioning.
@noklam
Copy link
Contributor

noklam commented Jul 27, 2022

I am still a bit concern about letting the user to overwrite the timestamp and have their own incremental id.

I think it will end up to be some incremental id or just appending on the timestamp(which is not a meaningful or human readable text anyway). If so, there are not much benefit added than by just letting user to "append" text to the id. This way we avoid the risk of messing up the order of version.

It would be particular confusing if user are using both custom id logic sometimes and the default timestamp, it could end up loading the wrong version even though both are sortable and incremental id.

@datajoely
Copy link
Contributor

I think we should have a customizable suffix and and a fixed sortable prefix

@merelcht merelcht added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Oct 17, 2022
@merelcht
Copy link
Member Author

This issue was discussed in Technical Design and we decided on the following implementation:

  • We'll add a new argument session_id to the KedroSession.create() method
  • This argument will be verified, because it needs to fit the VERSION_FORMAT format. Basically, it needs to be a timestamp of a specific format. This is to prevent users from picking a session id that will mess up the versioning system in Kedro.
  • In the docs we'll suggest users to use the generate_timestamp() method as session_id
  • We will only mention the customisation of the session_id in the deployment docs, because that's the only use case we recommend customising the session_id for.
  • We will be very clear in the documentation that we do not recommend people to change the session_id, because it's linked to the save_version id used for versioning.

@merelcht merelcht removed the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Oct 26, 2022
@fazilhero
Copy link

In order to use versioning and to be able to run nodes in a cloud - mostly one node at a time - current session approach doesn't work. We need a way to to be able to make sure we get to use the same session when i run different nodes in a different machines.

@noklam
Copy link
Contributor

noklam commented Jul 28, 2023

We need a way to to be able to make sure we get to use the same session when i run different nodes in a different machines.

@fazilhero Thanks for your comment, can you elaborate on what you need and what do you mean by "same session"?

I think we should have a customizable suffix and and a fixed sortable prefix

Would this be enough to solve your problem, or you need your own way of defining the session_id? One major concern we have is the session_id is couple with versioning, and it needs to be sortable otherwise incorrect version could be loaded.

@fazilhero
Copy link

Sure, @noklam ,

Imagine the following case, i have 10 noes which creates linear workflow. If I enable versioning and try to run kedro run --nodes=... in a single machine in cloud to execute a single node it creates a timestamp when that ran. When I want to run the second node, with the same command this time it creates another timestamp so data between don't work. The conclusion is the ability to replicate the Session across different machines. Does that help?

@astrojuanlu
Copy link
Member

For reference, we are talking about this piece of code:

session = cls(
package_name=package_name,
project_path=project_path,
session_id=generate_timestamp(),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

6 participants