Adopting a (narrow) backward-compatibility standard; implications for 4.0.0 #2967
I'll read all that when I find some time. Until then, I'm strongly in favour of supporting old models, as far back as possible. Stability is important. The changes I'd propose would go toward making compatibility easier going forward: a dedicated "model migration script", pickling additional metadata inside models (see the sketch below), better testing, etc. |
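For instance, the metadata idea could be as small as this sketch (the `_saved_with_gensim` attribute is hypothetical, not an existing gensim field):

```python
# Stamp the saving library's version inside the model before pickling, so a
# future migration script knows exactly what it is loading.
import gensim

def save_with_provenance(model, path):
    model._saved_with_gensim = gensim.__version__  # travels with the pickle
    model.save(path)
```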
I made a detailed writeup & top-level issue because I'd prefer not to make a single-case decision, with some mumbled live-call compromises of the moment, where the full reasoning isn't captured and thus soon forgotten. I'd like to fix this as a source of bad code, technical debt, and drag on future contributions, not just rush out 4.0.0. I'd rather not have me, or any other contributor, waste time on backward-compatibility code that is very likely to never even run, on the unproven theory that someone, somewhere might still be using years-old version A, and iffy feature B, and need to bring that into the latest release. |
TL;DR pros/cons of the 3-point policy (now numbered to stick out more) I proposed above:

Pros:
Cons:
|
Great, thanks! I'll finish the tutorial migrations today and look into this (serialization, support) tomorrow, so we can move forward. |
Sorry, Radu got sick yesterday; I'll try to finish over the weekend. |
The proposed policy seems reasonable: a) no guarantees of forward compatibility (= loading of new models in older versions) + b) limited guarantee on backward compatibility (= loading of old models in newer versions). +1 on that.

In terms of practical execution, I conceptualize it in two parts below. @gojomo @mpenkov WDYT?

**1. Migrating 4.n objects to 4.n+1**

Each minor version ships a one-hop script that migrates models saved by the previous minor version. Upgrading across multiple hops is the users' responsibility. Ideally as simple as running multiple one-hop scripts back-to-back.

The idea of an "upgrade script" was Misha's and I like it because it allows us to move all the "old model compatibility" stuff out of the model classes.

**2. Migrating 3.x objects to 4.0.0**

Again, a migration script, but this time migrating any old model to 4.0.0 where possible + erroring out where impossible. "Any old model" = any model and version we already have the migration code for; I'd put no extra effort into it.

I'm not in favour of forcing users to install 3.8.3, load their models, store them in 3.8.3, then install gensim 4.0.0, and load those 3.8.3 models. What advantage would that give us, or them?

**Proposed roadmap**
Let me know if that makes sense. |
**Regarding general 4.0 roadmap**

This seems to be growing into a "steps to 4.0 checklist" issue, beyond one particular (back-compatibility) policy choice & supporting work. I'll again suggest there should be a top-level "release 4.0" issue, so that discussion/checklists/decisions spanning many concerns, that carry through until final release, don't get hidden deep in other issues/PRs that may be closed/deferred/etc.

I've put a temporary workaround for the real head-scratcher test failure into #2944 & marked it ready for merge. I think if #2944 is good enough to build off, it should just be merged in.

I've just made #2975 to describe the loose threads from many places that I've been trying to tie up, while also considering #2955. I hope the latest comment in #2955 clarifies what the mmap issue to-be-resolved entails.

**Regarding backward-compatibility questions**

A convenience "migrate forward" script is a good idea, but it doesn't really simplify the issue much, because there are many cases where it's trivial, but others where it just expands other complexity. In the pleasant cases where the design/intent of the classes hasn't changed, the script is essentially just a load followed by a re-save.
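A minimal sketch of that pleasant case (the model class and file names are illustrative, not from the original snippet):

```python
# Trivial migration: load (any per-version fixups run inside load()), then
# re-save, which writes the model out in the current version's format.
from gensim.models import Word2Vec

model = Word2Vec.load("model-from-3.8.3")
model.save("model-for-4.0.0")
```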
Voila! It will succeed in all the cases already covered by our tests. But that's deceptive, because in all the tricky cases:
And if we actually move all prior-version fixup knowledge outside the classes themselves, into dedicated-purpose migration module(s), that's helpful in some ways – but also a bunch of extra work up-front, and it spreads the historical class-format info across more disparate places. There will still often be unavoidable old-version cruft in the original packages, so that old types can still deserialize before fixup code runs. And I'd fear that letting arbitrary amounts of old-format special-casing code collect, far from the current classes, doesn't cap the growing tech-debt of dead-code/untested-code/unrealistic-expectations problems. (But if it meant I could just focus on the current functionality, and loading one release back for a static set of probe files, while others struggle with arbitrarily-in-the-past issues that only show up later, I'd love it!)

I'd prefer pruning the test directory, and methods, to only purport-to-test 3.8.3 models, because that creates a well-defined target for what must be kept, and what can start to be discarded. We have a supposed ability to load lots of older models, and models in less-common configurations, simply because (1) it hasn't errored/been reported as a failure yet; (2) we have some super-shallow tests of a bunch of older versions through to 1.0. (I disabled pre-1.0 loading tests at one point when they were failing & the fix wasn't clear.) But as noted above, those aren't strong guarantees.

If we want to start upgrading our guarantees to what is really tested, it will help to be focused on a smaller, sure-to-be-valuable reference set (some things we save out from 3.8.3 using a TBD "version save-out test-suite" routine). If it's instead "anything that seems to work now and someone might be using", we'll have to leave a lot of suspect, low-value code around. (For example, imagine we had a set of N save-files from …) |
Tasks are already collected in the 4.0 Milestone; what's needed here is the decision re. serialization & migrations. #2944 wasn't ready for merge; that's why I suggest branching off there. But that's a minor technicality, makes no difference. I'll take up #2863 + migration script + #2960 then, branching off of … |
Unfortunately, the Milestone can't hold discussion/decisions/reasoning like an issue can. The fact that analysis/rationales/counterproposals are springing up deep in unrelated transient issues is a hint that a necessary forum doesn't exist. I think #2944 is ready for merge & building off it after-merge is a less thorny approach. I'll get on those issues! (Though any progress on #2873 is unlikely until perhaps a final reordering.) |
Yes. I was also thinking of leaving it as "upgrade FROM any old (supported) version TO the current version". Kinda like it is now, but in a single explicit script, rather than scattered through the model classes' load code.

Or, slightly less messy: keep the model-upgrade code version-specific, but maintain a look-up table that automatically knows which version needs which upgrade. Example:
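Something like this sketch (the upgrade-function names and version strings are hypothetical):

```python
# Hypothetical look-up table: maps a model's saved-from version to the next
# version and the one-hop upgrade function that gets it there.
def upgrade_383_to_400(model):
    ...  # rename/convert attributes changed between 3.8.3 and 4.0.0
    return model

def upgrade_400_to_410(model):
    ...  # ditto for the 4.0.0 -> 4.1.0 hop
    return model

ONE_HOP = {
    "3.8.3": ("4.0.0", upgrade_383_to_400),
    "4.0.0": ("4.1.0", upgrade_400_to_410),
}

def migrate(model, version, target):
    """Chain one-hop upgrades until the model reaches the target version."""
    while version != target:
        version, upgrade = ONE_HOP[version]  # KeyError = unsupported version
        model = upgrade(model)
    return model
```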
Easy for users, but extra work for maintainers – that look-up table will be easy to forget to update. Not hard or time-consuming, just easy to miss. There must be other options too; this seems like a problem as old as time, probably solved already somewhere. |
Per my other comment, my concern is projecting (in docs or code tools) more confidence in such migrations than is warranted. And similarly, implying maintainers are likely to do new extra work that's (1) not historically been that important; (2) not yet covered by unit tests or clear, actually-elucidated user requests; (3) kinda boring work that risks being "done" to check off an issue, but in a rote-but-not-really-correct way. If the 'script' is more than...
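...roughly the following, to guess at the elided snippet (class lookup and argument handling are illustrative):

```python
# The entire hypothetical migration tool: load with the named model class,
# then re-save. Usage sketch: python migrate.py Word2Vec old.model new.model
import sys
import gensim.models

model_class = getattr(gensim.models, sys.argv[1])
model = model_class.load(sys.argv[2])
model.save(sys.argv[3])
```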
...it's probably overkill. And while such a simple script, as a command-line executable tool, could still be valuable, its docs/output should feature a warning: "Test your model after migration" – not just report a shallow "Success!". |
I re-read the whole discussion with fresh eyes. Thanks for your detailed inputs, @gojomo. Here's my conclusion:
Not sure whether I'll manage 1) and 3) in time for 4.0.0. Ideally yes, but I don't want to delay the release too much. Help welcome! |
OK, I updated 2) and 4). @mpenkov IMO 1) and 3) can wait for after the release, not blocking. WDYT? I've been sick since Friday, so I don't think I'll get to it now realistically. |
I agree that 1) and 3) are not blockers, but leaving 1) till later does introduce some risk. More specifically, if 4.0.0 introduces some kind of backwards incompatibility, people will complain, and we will have to address that in a bugfix release. If we do 1) before the release, then we find the bug first, at the cost of delaying the release. Not a huge risk, in my opinion, so I'd probably err on the side of "get this thing out the door". |
From #2939:
Originally posted by @piskvorky in #2939 (comment)
OK, compiled long thoughts here on both general issues & specifics:
As far as I can tell, Gensim hasn't had any formal backward-compatibility policy other than "we'll try to maintain the ability to load older models".
This is generous, and easy when things change very little.
But always considering older models makes larger improvements via refactoring & new ideas harder. In addition to making a system that's simply better for common/tested tasks, such work then also has to include translation code that brings old data into the newer format (perhaps with compromises/limits). That code may have to retain/alias/work around older class/property names, and include long blocks of special-case handling.
But our features are often a grab-bag of things of wildly-different importance: some flags/options are seldom used (or probably shouldn't be used), while others are central. And our `test_data` directory is filled with droppings of unclear origin & purpose. (What versions/features are tested by 'doc2vec_old' or 'ldamodel_python_2_7'?)

What attempts there have been to check backward-compatibility are far shallower than they may look. There's a bunch of version-labeled Word2Vec and Doc2Vec models, but only one per version, likely created with default parameters & tiny toy data. (I don't think the source code that created them is committed/referenceable anywhere.) The load-check they get is just a few rudimentary steps, so whether the real range of functionality has been maintained is unclear.
So this compatibility logic is error-prone, hard to test, and under-tested – even in the best of conditions. And such code just keeps growing, barely tested & maybe unused, as long as it's not directly implicated in some reported bug/test-failure. And sometimes the impulse to "just make the test pass" or "just get the PR merged" has led to unthinkingly adding & duplicating code until it superficially works - as when #1777 added a giant `deprecated` subdirectory (https://github.com/RaRe-Technologies/gensim/tree/3.8.3/gensim/models/deprecated) of entirely-copied obsolete code. This was just to make passing a few back-compatibility tests a little easier, without actually thinking through the object-shape changes.

And against this, there may not even be that many people using older models in a way that requires loading them in fresh code! If a model is trained & deployed and only uses older features, it can, and perhaps should, keep running on older library versions. Many newer optimizations/features are only relevant when repeating creation/training from scratch, not when simply consulting or incrementally changing an older model. Many users already have a workflow that refreshes models regularly, or would be better served by truly repeating model-creation in the latest code.
Or if they absolutely need to bring an old model forward, we should often accommodate that via multiple steps – load the model forward a short hop, then re-save, then another short hop. Theoretically, this should work. In real cases, this might already be broken. But ensuring at each new version that one hop works is a more realistic incremental goal than maintaining a many-versions-back growing hairball for (maybe, but probably not really) handling anything. Users could also ask for, or hire, specific help for thorny but important situations - if they even arise. But many-version-hop situations that aren't best served by "train a new model" might be entirely theoretical.
In the meantime, we can make beneficial changes faster, & ease maintenance by:
First, adopting general guidelines about project goals that (a) let users know that only smallish-version-increment later-loads are within project aims, so they have reasonable expectations; and (b) give developers freedom to focus on a well-defined, minimal back-compatibility target - and when that's met, aggressively remove older code debt/complications. I'll say more about the guideline I think would be best further below in the 1-2-3.
(As far back as January I requested: "in particular, it would help to say either: 'gensim 4.0 is only compatible with models saved from 3.x versions; if you have older models, load them in 3.x, re-save them, to bring them to 4.0' - or perhaps even more aggressively: 'gensim-4.0 only tries to support models that were saved from gensim-3.8.1; older models may be loaded & re-saved in 3.8.1 to be brought to 4.0'".
It's been frustrating that when I've been 'warm on the code' & ready to clear out low-value old cruft, I've repeatedly asked about making such a clarifying compatibility decision - but no opinions either way have been shared until just recently. Having a reasoned-out standing policy would avoid this frustration & stunted progress.)
Second, clearing out the old, poorly-labeled, poor-coverage test files and test methods, and replacing them with a tighter set of current tests. Some steps in this process could be:

- instrumenting test runs so that every file-touch in the `test_data` directory is logged - so that obsolete, untouched files can be finally discarded (a sketch of the logging idea follows this list)
- saving out a fresh reference set of models into `test_data`, with new load/functionality test methods against them added. But these files would only be retained as long as that specific source version is within the stated back-compatibility goals, and promptly pruned when it moves out of the window.
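A possible shape for that file-touch logging, as a sketch (the wrapper and the `touched` set are made up for illustration; `datapath` is gensim's existing test helper):

```python
# Wrap gensim's test datapath() helper so every test_data file a test run
# actually opens is recorded; files that never appear in `touched` across a
# full run are candidates for deletion.
from gensim.test.utils import datapath as _datapath

touched = set()

def datapath(fname):
    touched.add(fname)       # remember that this fixture file was used
    return _datapath(fname)
```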
So, what could the default cross-version compatibility policy be?

I suggest the following as a baseline, subject to only rare exceptions:
1. Loading of models saved from newer versions, backward into older versions, is never an explicit goal, and will only happen by luck. That is, a save from version MAJOR.MINOR.MICRO might load in MAJOR.MINOR.(MICRO-N), MAJOR.(MINOR-N).M, or (MAJOR-N).M.P. But it's never a requirement, and it's always OK for development-in-progress to assume saved models need only be loaded into the same or later versions. (I believe this is pretty much assumed already.)
2. Fully-functional loading of models saved from older versions, into newer versions, is a strong goal when the MICRO version has incremented by any amount, or the MINOR by 1. That is, if the save is from version MAJOR.MINOR.MICRO, users should expect, and developers should try to maintain, that it can be loaded into any MAJOR.MINOR.(MICRO+N) or MAJOR.(MINOR+1).N version. (Any exceptions, as perhaps when a feature is necessarily discontinued or onerous to maintain, would be the kind of thing to highlight prominently in release notes.) Many times – indeed most of the time, given that lots of code doesn't change much – models will successfully load in much-later versions. But this is the minimum we're aiming for, and it's generally OK when devs making other substantive improvements break loads into a (MINOR+2)-or-more future version.
3. When the MAJOR version increments, it is only a strong goal for saves from the final release of the prior MAJOR version to load with full functionality. That is, in versions MAJOR.0.N, only saves from (MAJOR-1).LAST_MINOR.LAST_MICRO are by default considered must-loads. Earlier saves might load, but if they don't, the recommended strategy will be to load them in the latest version that supports them, then re-save, & repeat until reaching MAJOR.0.N.
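Read mechanically, those three points could be encoded like this sketch (the version tuples and the final-release flag are assumptions supplied by the caller):

```python
# Does the proposed policy guarantee that a model saved by `saved` loads,
# fully functional, in `loading`? Versions are (major, minor, micro) tuples.
def load_guaranteed(saved, loading, saved_is_final_of_major=False):
    s_major, s_minor, s_micro = saved
    l_major, l_minor, l_micro = loading
    if loading < saved:
        return False                      # point 1: backward loads are luck
    if l_major == s_major:
        # point 2: any later MICRO in the same MINOR, or exactly MINOR+1
        return l_minor in (s_minor, s_minor + 1)
    # point 3: across a MAJOR bump, only the final release of the prior MAJOR
    return l_major == s_major + 1 and l_minor == 0 and saved_is_final_of_major
```

Anything returning False might still load; it's just not a promise developers must preserve.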
In the context of the current situation with a pending 4.0.0 release, adopting this policy as-is would mean:
In the future, having this as a standing policy – rather than something to be re-discussed each release or PR – will make it easier to improve things with confidence about how much backward-compatibility work is expected, and to know when older tests/test-files can be retired. It means any contributor, whether an old hand or a newbie, has a much narrower & more comprehensible set of things to consider. They then have a chance of improving things, rather than "typing & hoping" around scary blocks of "who knows what might break if I change this" code.