Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean-up version history #2067

Open
1 of 2 tasks
ulliholtgrave opened this issue Feb 14, 2023 · 7 comments
Open
1 of 2 tasks

Clean-up version history #2067

ulliholtgrave opened this issue Feb 14, 2023 · 7 comments
Labels
effort: high Big change, which requires >12h feature New feature or request prio: medium Should be scheduled in the forseeable future.
Milestone

Comments

@ulliholtgrave
Copy link
Member

ulliholtgrave commented Feb 14, 2023

Motivation

If users are editing a single page for a couple of hours a lot of automatic saves are created which results in a very crowded versioning view and a lot of unnecessary database objects.

Proposed Solution

If the user triggers a manual save delete all automatic translations for this specific translations, but keep the latest 3 as a safeguard. (this is not wished anymore as of May 2024)
instead:

  • Only keep auto saves since the last manual save
    • Autosave field: translation.status == constants.status.AUTO_SAVE
  • Clean up of the current history
    • Past autosaves cannot be identified by the status field, thus:
      • We compare version to each immediate version whether they have the same content and only keep the newest version where we can't count on the auto save status to be accurate
      • Infer autosaves, which means: Delete all versions that are only one minute (+/- tolerance) apart because manual saves wouldn't happen in that speed
    • Write database migration to delete all versions we identified as auto saves

Alternatives

Adapt the view to only show a few automatic translations?

Additional Context

Design Requirements

@ulliholtgrave ulliholtgrave added feature New feature or request good first issue Good for newcomers prio: medium Should be scheduled in the forseeable future. labels Feb 14, 2023
@ulliholtgrave ulliholtgrave added this to the 23Q1 milestone Feb 14, 2023
@timobrembeck timobrembeck changed the title Delete all but the last 3 automatic saves after a manual save. Delete all but the last 3 automatic saves after a manual save Feb 16, 2023
@ulliholtgrave ulliholtgrave added prio: low Not urgent, can be resolved in the distant future. and removed prio: medium Should be scheduled in the forseeable future. labels Feb 17, 2023
@timobrembeck
Copy link
Member

Important note: We expect the version history to be consistent in several places of the CMS, so you would have to squash the version numbers to make sure there are no gaps.

PeterNerlich referenced this issue Mar 2, 2023
TODO: fix IntegrityError on update() (line 490):
  auto_saves.filter(version__gte=lastN).update(version=F('version') - deleted)
@PeterNerlich
Copy link
Collaborator

PeterNerlich commented Mar 21, 2023

We have probably identified a bug in the current logic: When saving a translation version as Draft or Published (if it is a minor edit), all versions of the translation (for Draft even all of the descendant languages in the language tree) are set to Draft/Published. We expect that only the status of the most recent version in each language is supposed to be updated.

If we get confirmation that this is the intended effect, we could cover this in our PR since we are already touching that code.

Edit: Summary from the team call: This emerged for historic reasons, the intention is for previously published versions to be reduced to Draft as well if the current translation is saved as Draft. We will adjust the queries to only affect published versions, keeping auto saves and versions with other status as they are.

@timobrembeck timobrembeck modified the milestones: 23Q1, 23Q2 Apr 3, 2023
@timobrembeck timobrembeck modified the milestones: 23Q2, 23Q3 Jul 2, 2023
@svenseeberg svenseeberg modified the milestones: 23Q3, 23Q4 Oct 4, 2023
@timobrembeck timobrembeck modified the milestones: 23Q4, 24Q1 Jan 10, 2024
@JoeyStk JoeyStk added the question Further information is requested label Apr 3, 2024
@JoeyStk
Copy link
Contributor

JoeyStk commented Apr 3, 2024

At the conference in March, we realized that we're stuck with this issue and will discuss it in a separate meeting again

@JoeyStk JoeyStk added prio: medium Should be scheduled in the forseeable future. and removed prio: low Not urgent, can be resolved in the distant future. labels Apr 3, 2024
@JoeyStk JoeyStk modified the milestones: 24Q1, 24Q2 Apr 3, 2024
@JoeyStk JoeyStk changed the title Delete all but the last 3 automatic saves after a manual save Clean-up version history May 25, 2024
@JoeyStk JoeyStk removed the question Further information is requested label May 25, 2024
@JoeyStk JoeyStk assigned JoeyStk and unassigned mael-fosso Jun 2, 2024
@timobrembeck
Copy link
Member

Infer autosaves, which means: Delete all versions that are only one minute (+/- tolerance) apart because manual saves wouldn't happen in that speed

This is not needed for a new feature, since the behavior will change anyway (see #2067 (comment)).

I'm against introducing complicated functionality only to comply to the historic database state. If we want to cleanup the past history, we should do so in a database migration, and then for the future we should just not change autosaves to normal draft/public versions anymore.

Add management commands

Why adding management commands for this when the functionality to do this as part of the form saving process & bulk actions is nearly finished? #2219

@PeterNerlich
Copy link
Collaborator

Infer autosaves

This is not needed for a new feature, since the behavior will change anyway (see #2067 (comment)).

If we want to cleanup the past history, we should do so in a database migration

Why adding management commands for this when the functionality to do this as part of the form saving process & bulk actions is nearly finished? #2219

The discussion at the conference was all about #2219 and how to deal with past history. #2219 is about fixing the behaviour for the future. The management commands and strategy mentioned above are to fix the past history and thin out the database, without deleting important data – whether in management commands, migrations or otherwise is not important at the end of the day. Feel free to suggest better technical ways to implement that strategy. About the strategy itself, we are only reflecting what was discussed and agreed upon at the conference with @dkehne.

@timobrembeck
Copy link
Member

The management commands and strategy mentioned above are to fix the past history and thin out the database, without deleting important data – whether in management commands, migrations or otherwise is not important at the end of the day.

Ok, I just wanted to point out that management commands are for recurring tasks, and not so useful for one-time tasks. So I'd prefer a migration script for this. I'll edit the issue description.

@timobrembeck timobrembeck added effort: high Big change, which requires >12h and removed good first issue Good for newcomers labels Jun 2, 2024
@david-venhoff
Copy link
Member

It would probably also make sense to cleanup some other pages. For example, we have over 1.7 million page translations about corona (including pages that were regularly updated with the latest numbers, for all languages) which are not really required anymore but still slow down every query that searches through page translations.

@MizukiTemma MizukiTemma modified the milestones: 24Q2, 24Q3 Jul 1, 2024
@PeterNerlich PeterNerlich modified the milestones: 24Q3, 24Q4 Aug 27, 2024
@JoeyStk JoeyStk modified the milestones: 24Q4, 25Q1 Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort: high Big change, which requires >12h feature New feature or request prio: medium Should be scheduled in the forseeable future.
Projects
None yet
Development

No branches or pull requests

8 participants