Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kedro run CLI incorrectly splits the names of nodes at commas #1828

Closed
jmholzer opened this issue Sep 5, 2022 · 4 comments
Closed

kedro run CLI incorrectly splits the names of nodes at commas #1828

jmholzer opened this issue Sep 5, 2022 · 4 comments
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation

Comments

@jmholzer
Copy link
Contributor

jmholzer commented Sep 5, 2022

Description

Running kedro with the command kedro run --from-nodes "two_inputs([A0,B0]) -> [C1]" will cause an error: Pipeline does not contain nodes named ['B0]) -> [C1]', 'two_inputs([A0'].

Kedro has incorrectly split the name of the intended Node into the names of two Nodes that do not exist.

Context

This bug prevents the user from re-running kedro from any Node with a comma in its name. This is the case for any Node that is not explicitly given a name and has more than one input.

Steps to Reproduce

  1. Create a kedro project
  2. Attempt to run the pipeline from a node with a comma in its name (kedro run --from-nodes 'My node, a good node')
  3. Receive the error Pipeline does not contain nodes named ['a good node', 'My node'].

Expected Result

Kedro should try to run from the node with the specified name.

Actual Result

Kedro attempts to run from multiple nodes, with names drawn from the result of splitting the intended Node name at commas.

@noklam
Copy link
Contributor

noklam commented Sep 5, 2022

Linking this issue to #1795. This happens when node name is not specified and Kedro will generate a name automatically.

@antonymilne antonymilne added the Issue: Bug Report 🐞 Bug that needs to be fixed label Sep 5, 2022
@antonymilne
Copy link
Contributor

antonymilne commented Sep 5, 2022

As discussed in backlog grooming, this is a very long-standing issue bug and it would be nice to get it fixed. I've looked through our old Jira board to get the previous discussion for context too.

Possible solutions here:

  1. some syntax where you can escape a comma in a node name with a backslash so it doesn't get split by _split_string
  2. alter node name syntax to use ; rather than , - quick but hacky fix. This is a breaking change though (won't affect anyone using --from-nodes etc. arguments because those don't work, but would affect someone using an automatically-generated node name in --node though)
  3. @jmholzer idea of a fine-state machine csv-parsing type solution to do the string splitting rather than just blindly splitting on ,. This would basically mean that , inside () would basically not split the string. It wouldn't solve the case of a comma explicitly being used inside a node name, but it would solve the automatically-named node case, which is way more important
  4. change the resume scenario suggestion to use from-inputs instead. This should completely circumvent problems around commas (unless you choose to use a comma in your catalog entry name... but no one does that)
  5. change --from-nodes to be a multiple=True argument in click, just like --node is. This seems pretty inconsistent currently actually: you do --node node_1 --node node_2 but --from-nodes node_1, node_2. Why not --from-nodes node_1 --from-nodes node_2? At a glance this reads a bit weird to me though 🤔

Overall I like (in order): option 5/3, 4, 1, 2.


We don't need to solve this part at the same time, but just for completeness, let me note here part of the previous discussion that questioned how kedro automatically generates a name for a node. There was research piece on this done before but I'm not sure what the outcome was - we'd need @yetudada or @idanov for this I think. Some options I vaguely recall though:

  • hash the node somehow to generate a label
  • randomly generated words used to name nodes - like the above but more human-readable

Side-note: personally I'm not a big fan of exactly how the current automatic node name is formatted. What do the square brackets add? Why is there no space after the comma? I think this is a historical hangover from the days when kedro inputs had to be specified as a list.

If we decide we want to do one of these then fixing the case with commas will likely become irrelevant, because an automatically generated node name would no longer have a comma in it. Possibly we should still consider doing option 5 above though for consistency.

@antonymilne antonymilne added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Sep 5, 2022
@merelcht
Copy link
Member

merelcht commented Nov 9, 2022

This was discussed in Technical Design

  • The team agreed that we should fix the way commas are split and also alter the way we generate the automatic node name for 0.19.0.
  • @ankatiyar checked that nodes can't have commas in the name so we don't have to worry about the use case where users create nodes with commas.
  • We also discussed the idea of using --from-inputs instead of --from-nodes for the re-run suggestion but as @jmholzer pointed out the re-run logic is very complex, so while this is a good idea it would be a lot of work.

The actions to be done after this dicussion:

@merelcht
Copy link
Member

merelcht commented Nov 9, 2022

Closing this issue in favour of the follow up actions from Tech Design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation
Projects
Archived in project
Development

No branches or pull requests

4 participants