Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Choose optimal sync modes by default on UI #9625

Closed
sherifnada opened this issue Jan 20, 2022 · 20 comments · Fixed by #10320, #12411 or #12770
Closed

Choose optimal sync modes by default on UI #9625

sherifnada opened this issue Jan 20, 2022 · 20 comments · Fixed by #10320, #12411 or #12770
Assignees
Labels
area/frontend Related to the Airbyte webapp autoteam priority/high High priority project/onboarding-improvements type/enhancement New feature or request

Comments

@sherifnada
Copy link
Contributor

sherifnada commented Jan 20, 2022

Tell us about the problem you're trying to solve

Today a user needs to configure every sync mode by hand on the UI. If there is more than a handful of streams, this is a very arduous process.

Describe the solution you’d like

Most of the time we know the best sync mode to use is. In accordance with ux best practices, the UI should automatically choose these sync modes so the user doesn't have to configure it by hand.

If there is a source-defined cursor on a stream, the best sync modes, in order, are:

  • append dedupe
  • append

If there is no cursor defined:

  • full refresh overwrite

If we require a user defined cursor field, the user still needs to do this by hand, so I'm not sure there is any way around needing user input. The real solution is getting rid of user-specified cursors if possible :)

@sherifnada sherifnada added type/enhancement New feature or request area/frontend labels Jan 20, 2022
@sherifnada
Copy link
Contributor Author

cc @misteryeo

@jrhizor is this something that would make sense to put on the roadmap for GA? it's a high value low hanging fruit for usability.

@jrhizor
Copy link
Contributor

jrhizor commented Jan 24, 2022

@jamakase should we remove the default selection we talked about last week if this is in place?

@jamakase jamakase self-assigned this Feb 2, 2022
@jamakase
Copy link
Contributor

jamakase commented Feb 2, 2022

@jrhizor what to do with pkeys and cursors in this case? Should we then just pick any field? Or we should leave them empty and let users select them with bulk edit?

@jrhizor
Copy link
Contributor

jrhizor commented Feb 2, 2022

@sherifnada what do you think? Leave empty?

@sherifnada
Copy link
Contributor Author

Maybe this is too complex but magic wand version is

if an "optimal" setting requires a little bit of user input (e.g: incremental requires setting a cursor field but there's only like 10 streams in total), then I'd suggest we choose incremental, keep those fields blank, and nudge the user towards filling them out.

If the optimal setting requires a ton of user input (e.g: more than 10 such streams need input) then we just choose the easiest thing i.e: full refresh

@misteryeo do you have any thoughts on this user interaction?

@jrhizor
Copy link
Contributor

jrhizor commented Feb 3, 2022

@sherifnada since we have bulk edit can we just require cursor fields overall?

@sherifnada
Copy link
Contributor Author

the only issue is user-input cursor fields might be different for each stream, making it still cumbersome to do. But it's a two way door to do the approach you're describing. We can adjust from there if needed.

@misteryeo
Copy link
Contributor

misteryeo commented Feb 7, 2022

I love the premise behind this issue as we want to reduce the level of cognitive overhead for a user - so where possible, I think we should determine what makes the most sense as the optimal sync modes.

But I'm not sure I agree with this approach:

If the optimal setting requires a ton of user input (e.g: more than 10 such streams need input) then we just choose the easiest thing i.e: full refresh

The optimal sync should be decoupled from whether or not the user has to make more/less adjustments on the cursor fields. We should still choose whichever is the optimal sync mode (which is what's more important) and make sure the UI clearly indicates that they will need to fill in the user-defined cursor fields. And given we will have bulk edit soon, this seems reasonable for user input and we can iterate as needed later.

@sherifnada
Copy link
Contributor Author

Fwiw the reason I think that approach may be preferable is because Bulk edit is not useful in this particular case unless all cursor columns across all tables have the same name e.g updated_at otherwise the user still has to manually choose the cursor column for each table one by one. If I understand bulk edit correctly it's not a 100% solution because the user still needs to choose Which tables to bulk edit which means they potentially need to choose 100+ tables if there are that many

@jrhizor
Copy link
Contributor

jrhizor commented Feb 7, 2022

Number of streams often would correlate with the size of the db, so presumably it's even more important that they configure the correct/optimal settings?

@jamakase
Copy link
Contributor

@jrhizor @sherifnada Actually, what I have understood - syncMode and destinationSyncMode are already prefilled by backend and we just pick it from that values. Shouldn't we change it on the backend instead, that the catalog returns expected syncModes? Or it is fine to overwrite default values in create mode?

@timroes
Copy link
Collaborator

timroes commented Mar 12, 2022

Reopening, since we reverted the change.

@timroes
Copy link
Collaborator

timroes commented Mar 14, 2022

We had to revert this, since it caused a lot of confusion that the form didn't visually highlight that the cursor field and primary key field are required and were not set. Despite us needing to make this visually clearer and improve on the error message (tracked via #11084), I wanted to check in given this new information on the behavior we want to implement here for the next version of this.

Given the large amount of streams that a source can deliver, it feels for me (and came up in the discussion during our incident), that even with visual highlighting of those fields, it's quit a UX issue for users that they need to go and cnofigure all those fields themselves and doesn't contribute towards users being able to more easily setup a connection.

My suggestion would be, that we only apply the incremental sync mdoe by default, as long as we actually have a a source defined cursor field and/or primary field, and thus won't require further user input. As long as we require the user to configure additional fields, I'd suggest we stay with full sync - overwrite as a default, to make sure a user will be able to save this form with the minimum amount of steps to get a connection setup.

@pmossman @benmoriceau Since you were all involved in the incident I'd be happy to hear your thoughts around this.

@misteryeo I know that we above mentioned the bulk edit UI as a potential mitigation factor in why we're fine having users to need to configure these fields by themselves. Given the past confusion we've seen, I am not sure if we should stick to this opinion. Would be happy for your input.

@misteryeo
Copy link
Contributor

Issue was linked to Harvestr Discovery: Default Sync Modes

@sherifnada
Copy link
Contributor Author

@timroes That UX makes sense to me. So the new rules are:

if source has_incremental && source_defined_cursor
  if destination has_dedupe
    return incremental, append_dedupe
  else if destination has append
    return incremental, append
  else if destination has overwrite
    return full_refresh, overwrite
  else
    error
else 
  # if source only has FR or user-configured-incremental we prefer overwrite 
  if destination has overwrite
    return full_refresh, overwrite
  else if destination has incremental
    return full_refresh, append
  else
    error

@timroes timroes changed the title Choose optimal sync modes by default on UI t Apr 4, 2022
@timroes timroes changed the title t Choose optimal sync modes by default on UI Apr 4, 2022
@timroes
Copy link
Collaborator

timroes commented Apr 6, 2022

Talking offline to @nataliekwong and @sherifnada we decided to change the order here to:

if source has_incremental && source_defined_cursor
  if destination has_dedupe
    return incremental, append_dedupe
  else if destination has overwrite
    return full_refresh, overwrite
  else if destination has append
    return incremental, append
  else
    empty selection
else 
  # if source only has FR or user-configured-incremental we prefer overwrite 
  if destination has overwrite
    return full_refresh, overwrite
  else if destination has append
    return full_refresh, append
  else
    empty selection

@timroes
Copy link
Collaborator

timroes commented Apr 21, 2022

Refining that above, into a bit more natural logic/priority list:

if source.has_incremental && destination.has_dedupe && source.source_defined_cursor then
  incremental, append_dedupe (with source_defined_cursor)
else if destination.has_overwrite then
  full_refresh, overwrite
else if source.has_incremental && destination.has_append then
  incremental, append
else if destination.has_append then
  full_refresh, append
else
  no default selection (should in practice never happen)

@sherifnada @nataliekwong I don't think we talked the last time about that, but do we want to use incremental, append in the above logic even if there isn't a source_defined_cursor and thus the user would need to select the cursor field herself? Or should we only chose incremental, append if a source_defined_cursor is also available?

@nataliekwong
Copy link
Contributor

I prefer the latter, to only choose it if a cursor is defined.

As an aside, I also had thought from our conversation we would realistically never hit this scenario given we expect all connectors to have full refresh | overwrite, so this was just a fail safe?

@timroes
Copy link
Collaborator

timroes commented Apr 21, 2022

As an aside, I also had thought from our conversation we would realistically never hit this scenario given we expect all connectors to have full refresh | overwrite, so this was just a fail safe?

Yes that's still true, it should in practise not really happen.

@timroes
Copy link
Collaborator

timroes commented May 5, 2022

Reopening this (once more), since the change got reverted. The issue with the last implementation was, that it did always show the default optimal sync mode, even when editing a connection that already had different sync modes set. Our new implementation must make sure it's working also for existing streams and does not overwrite the already set stream sync mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment