[discussion] unified scheme for snapshot versions #345

Open
AMDmi3 opened this issue Sep 28, 2017 · 32 comments
Comments

@AMDmi3
Member

AMDmi3 commented Sep 28, 2017

TL;DR: see Summary below

So, we have support for normal versions, and we now also have special support for prerelease versions. However, we still have to ignore a lot of packages, most of which are snapshots. Sometimes snapshots are a necessary evil and cannot be avoided, for example if the release has a fatal bug, or when upstream is dead but there are useful commits in the master branch.

Now I wonder if repology can improve the situation by suggesting some kind of unified unambiguous snapshot format, so snapshot versions from different repos COULD be comparable.

The ideas on format:

  • It must be post-known-version format. E.g. if the latest known version is 0.4.7, the snapshot must be 0.4.7something, not 0.4.8something, because there should be no guessing on what the next version would be.
  • It must contain a date. Git commit hashes are not monotonic so are meaningless in version numbers, revision numbers have gone with svn, and are difficult to count for DVCSes.
    • The date should be of some fixed monotonic format. YYYYMMDD is an obvious choice.
    • The date should refer to commit date, not package creation date (otherwise, it's meaningless to compare)
    • Since the date does not contain time, it should be in UTC to avoid misinterpretation due to timezone difference
  • It must support the case where there's no past version at all.

Well, I don't see many choices on a format here, it's obviously 1.2.3somethingYYYYMMDD (or somethingYYYYMMDD when there's no past version). From repology point of view, it's the same as 1.2.3.somethingYYYYMMDD, so distros may use additional dot on their discretion.

So, we have to decide what to use as something.

  • We can't use most of currently used keywords (git, svn, bzr) as they are ambiguous and often used with "pre" meaning.
  • The only unambiguous keywords with "post" meaning are currently patch and post (I've just discovered the latter, and added support to libversion), but they are sometimes used upstream (hdf5, some python ports)

So, either we have to invent a new keyword, which has an apparent "post" meaning and is not used upstream, or we could use one of post or patch, ignoring their use upstream (which is not that wide). Inventing a keyword seems to be preferable. So, the ideas?

  • postsnap
  • post* (e.g. allow postgit, postsvn and whatever which begins with post, and compare them equally)
  • plus

Additional thoughts:

  • I do not like the idea of forcing anything repology-specific on repos at all - it doesn't look right, it creates tension and it never works to 100%. Instead, I'd gladly leave all snapshots ignored or fix them with rules on a per-project on-demand basis.
  • However, I still think that suggesting and favoring explicit pre and post suffixes to git/svn/hg/cvs would be beneficial on a global scale.
  • Not all distros will be able to properly support this schema anyway. Some don't like letters and tend to avoid them. Some allow letters, but compare them to numbers in a fixed way, e.g. 1.2preXXX and 1.2postXXX would both be less than 1.2.

Summary

When packaging snapshots, let's

  • use last known official version (not the supposed next one) as a base
  • add explicit post keyword after it (anything allowed after post, e.g. postgit, postsvn)
  • use UTC date of a snapshotted commit (not the date of packaging!)

The version of snapshot which comes after official 4.7 version may thus look like

  • 4.7postgit20170928 or
  • 4.7.postgit20170928 or
  • 4.7post20170928

See how it's better than:

  • 4.7git20170928 (it is not known whether snapshot is taken before or after 4.7)
  • 4.7git1234f6a (commit hashes are meaningless in versions, as they are not monotonic; however you may still append it: 4.7postgit20170928.1234f6a and I think we can still make it comparable in a sane way)
  • 4.8git20170928, 4.8pre20160928 (you are guessing which the next version would be, and you may be mistaken. For instance, it may be 4.7.1 which would make the version go backwards)

Note that this schema is not something synthetic and new; it's just a refinement of the widely used VERSIONwordDATE schema which provides explicit and unambiguous information on the snapshot which was packaged. As a side effect, it makes it possible for repology to compare these snapshots.
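The three rules in the summary can be sketched in a few lines of Python. This is only an illustration of the proposed format, not code from Repology or libversion; the function name `snapshot_version` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def snapshot_version(base: str, commit_time: datetime, vcs: str = "git") -> str:
    """Build BASEpostVCSYYYYMMDD from the last official version and the commit time.

    base is the last known upstream release (empty string if there is none),
    commit_time is the timestamp of the snapshotted commit (not of packaging).
    """
    if commit_time.tzinfo is None:
        # A naive timestamp cannot be converted to UTC reliably.
        raise ValueError("commit_time must be timezone-aware")
    # The date carries no time component, so normalize to UTC first.
    utc_date = commit_time.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{base}post{vcs}{utc_date}"


if __name__ == "__main__":
    noon_utc = datetime(2017, 9, 28, 12, 0, tzinfo=timezone.utc)
    print(snapshot_version("4.7", noon_utc))

    # A late-evening commit in UTC-5 already belongs to the next UTC day.
    late_local = datetime(2017, 9, 28, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
    print(snapshot_version("4.7", late_local))
```

The second call shows why the UTC rule matters: two packagers in different time zones would otherwise derive different dates from the same commit.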

@blshkv

blshkv commented Sep 29, 2017

Nice write up, seems like a proper solution.

I have one comment regarding "guessing" of a next version. Often, 4.8git20170928 is "guessed" based on the source code, where the author has changed the version from the last release 4.7, and it is reflected by the --version parameter or displayed when you run the program. I agree that there is still no guarantee that the next version will be called 4.8, but there is a hope that it will at least not be below that version. 4.7post20170928 is a more universal and straightforward solution for this problem, although the "official" version might be higher.

@AMDmi3
Member Author

AMDmi3 commented Sep 29, 2017

Well, having the next version explicitly defined in the upstream code/documentation justifies using 'pre' somewhat, but there is still no guarantee that another version will not be released instead, messing everything up. The "post" way is bulletproof though.

@AMDmi3
Member Author

AMDmi3 commented Dec 20, 2017

I've just run into a post suffix used in an actual official version:

https://pypi.python.org/pypi/flake8-builtins/1.0.post0

Which makes me think that the only option is a really verbose unique suffix such as V.V.VpostsnapshotYYYYMMDD

@blshkv

blshkv commented Dec 20, 2017

I think you should take any standard version scheme and normalise all software to it.
Software authors have way too many different creative ideas how to call their releases.

@AMDmi3
Member Author

AMDmi3 commented Dec 20, 2017

It is not possible.

@davidak
Contributor

davidak commented Apr 29, 2018

What about using YYYY-MM-DD as a more human readable date format? Just 2 more characters, but way more readable.

or somethingYYYYMMDD when there's no past version

Do we really need something in that case?

We in nixpkgs often use just YYYY-MM-DD. We should use something like post when there is a past version, but is there any problem for snapshot versions?

What if you want to package a second change at one day? Maybe YYYY-MM-DD-1?

@AMDmi3 do you plan to create a page with suggestions for package maintainers like this?

I think this is a great initiative to align versioning in software repositories! Have you invited the packaging community of the major repositories? Some more opinions and ideas might be helpful and they might be more willing to adopt this when they had a chance to participate in the discussion.

@blshkv

blshkv commented Apr 29, 2018

I disagree with "-" ideas. For long file names it might break to a second line in some WM and it will become unreadable. Also, these two extra chars do not bring any value.

@davidak
Contributor

davidak commented Apr 29, 2018

these two extra chars do not bring any value

It brings the value that it is more readable to humans #accessibility

Also, i just randomly found this XKCD comic about ISO 8601 again.

[image: XKCD comic "ISO 8601"]

@blshkv

blshkv commented Apr 30, 2018

Well yeah, but i feel like you didn't read my reasons. We are talking about version numbers, not about date standards

@AMDmi3
Member Author

AMDmi3 commented May 5, 2018

What about using YYYY-MM-DD as a more human readable date format? Just 2 more characters, but way more readable.

It doesn't have to be readable (though I don't see any readability problems with YYYYMMDD), it must be simple, unambiguous and close to schemes which are already widely used.

repology=> select count(*) from packages where version ~ '20[0-9]{2}-[0-9]{2}-[0-9]{2}';
 count
-------
  1859
(1 row)

repology=> select count(*) from packages where version ~ '20[0-9]{6}';
 count
-------
 66379
(1 row)

Also, some repositories do not support dashes in versions.
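For readers without database access, the same two patterns can be tried on individual version strings. A minimal sketch, assuming Python's `re.search` as a stand-in for the unanchored PostgreSQL `~` operator; the function name `date_style` is hypothetical.

```python
import re

# The same two patterns as in the queries above.
DASHED = re.compile(r"20[0-9]{2}-[0-9]{2}-[0-9]{2}")
COMPACT = re.compile(r"20[0-9]{6}")


def date_style(version: str):
    """Return 'dashed', 'compact' or None depending on which date pattern occurs."""
    if DASHED.search(version):
        return "dashed"
    if COMPACT.search(version):
        return "compact"
    return None


if __name__ == "__main__":
    for v in ("2018-04-29", "1.2.3.20190101", "4.7postgit20170928", "4.7"):
        print(v, date_style(v))
```

Like the queries, this only detects something that looks like a date; it cannot tell a snapshot date from an upstream version that happens to have the same shape.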

or somethingYYYYMMDD when there's no past version

Do we really need something in that case?

Yes, because when the actual version is released, it would automatically be ordered after somethingYYYYMMDD, but not after YYYYMMDD, and also for uniformity's sake.

Actually, I've just found out that from the libversion perspective something1 is less than 0something1, while I'd expect them to be equal. This may be related to repology/libversion#14, but anyway we may want to require 0something to make it miscomparison-proof and less ambiguous. Or not, depending on how we and others do/want to handle versions like alpha1 (see below).

We in nixpkgs often use just YYYY-MM-DD. We should use something like post when there is a past version, but is there any problem for snapshot versions?

These cases should not be separated, as the proposed snapshot scheme must coexist with past and future versions. Any scheme without prefix will break as soon as first official version is released. So the proposal is to treat all snapshots based on some upstream version, 0 if there isn't one. Actually even that will break if upstream releases e.g. alpha1, unless something is treated very specially (everywhere) which I'd like to avoid, to make the scheme usable with any generic version comparison algorithm, even not as elaborate as libversion.

What if you want to package a second change at one day? Maybe YYYY-MM-DD-1?

That's a very good question. Naive answer would be YYYYMMDD.1, but that can no longer be compared across different repositories. It seems to me that it can't be solved with the scheme at all, as any local suffix will break cross-repository comparison, and complicating the scheme by adding more time resolution would hinder its adoption.

Actually, most repositories have local package revisions which could be used for this purpose. I guess the scheme should suggest using revisions, while libversion could handle snapshots specially and ignore everything past the date. This is OK, since the special handling would only be required in libversion, all local algorithms will still be OK with handling suffixes normally.
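The idea of ignoring everything past the date can be illustrated with a small comparison key. This is a hypothetical sketch, not libversion's algorithm: the regular expression, the function name `snapshot_key`, and the choice of `0` as the base when there is no upstream version are all assumptions made for the example.

```python
import re

# BASE (optional), optional dot, "post" plus optional letters, 8-digit date,
# then any repository-local suffix, which is captured but never compared.
SNAPSHOT = re.compile(
    r"^(?P<base>[0-9]+(?:\.[0-9]+)*)?\.?post[a-z]*(?P<date>[0-9]{8})(?P<local>.*)$"
)


def snapshot_key(version: str):
    """Return (base version tuple, date) for a post-snapshot version, else None."""
    m = SNAPSHOT.match(version)
    if m is None:
        return None
    # Treat a missing base as version 0 (assumption taken from the discussion).
    base = tuple(int(part) for part in (m.group("base") or "0").split("."))
    return base, int(m.group("date"))


if __name__ == "__main__":
    # Same commit date, different local suffixes: equal across repositories.
    print(snapshot_key("4.7postgit20170928.1") == snapshot_key("4.7.post20170928"))
    # A later snapshot of the same base sorts higher.
    print(snapshot_key("4.7postgit20170928") < snapshot_key("4.7post20171001"))
```

Local algorithms can keep comparing the full string, so a second same-day package still sorts correctly inside one repository.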

@AMDmi3 do you plan to create a page with suggestions for package maintainers like this?

I think this is a great initiative to align versioning in software repositorys! Have you invited the packaging community of the major repositorys? Some more opinions and ideas might be helpful and they might be more willing to adopt this when they had a chance to participate in the discussion.

Not yet. I'm sure this topic will come up when repology is used by more people.

@AMDmi3
Member Author

AMDmi3 commented Aug 21, 2019

Returning to this, an alternative solution would be for individual repositories to convey the information that they are packaging a snapshot. As soon as we have this flag and a snapshot date, we could compare snapshots specially by comparing dates instead of versions.

It could be further improved:

  • Introduce a grace period, and don't consider snapshots that are older than the freshest one by less than, say, a few weeks as outdated. This would prevent Repology from encouraging races and too frequent updates (which IMO is bad)
    • May also take this period (or introduce another, longer one) from current time instead of the latest snapshot time, to encourage eventual infrequent snapshot updates (e.g. to latest commit or to the latest existing snapshot)
  • Allow official releases with accurate date available to outdate all snapshots immediately.
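A rough sketch of how such a grace period could work. This is not Repology code: the function name, the four-week default, and the rule that only a release dated after the snapshot outdates it are assumptions chosen for illustration.

```python
from datetime import date, timedelta
from typing import Optional

GRACE = timedelta(weeks=4)  # assumed value; the proposal only says "some weeks"


def snapshot_is_outdated(
    snapshot: date,
    freshest: date,
    latest_release: Optional[date] = None,
    grace: timedelta = GRACE,
) -> bool:
    """Decide whether a snapshot should be flagged as outdated.

    snapshot       - commit date of the packaged snapshot
    freshest       - commit date of the newest snapshot known in any repository
    latest_release - date of the latest official release, if accurately known
    """
    # An official release newer than the snapshot outdates it immediately.
    if latest_release is not None and latest_release > snapshot:
        return True
    # Otherwise tolerate snapshots that lag behind by no more than the grace period.
    return freshest - snapshot > grace


if __name__ == "__main__":
    print(snapshot_is_outdated(date(2019, 8, 1), date(2019, 8, 20)))
    print(snapshot_is_outdated(date(2019, 6, 1), date(2019, 8, 20)))
    print(snapshot_is_outdated(date(2019, 8, 1), date(2019, 8, 20), date(2019, 8, 10)))
```

Because only day-level differences against a window of weeks are involved, YYYYMMDD precision in any time zone is sufficient here.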

After repology/repology-rules#20 is done (not even started yet), we'll have all snapshots which use date version marked up, so we can extract this information from them. However if any repository wishes to convey this information directly, it's most welcome and could be used right away.

Repology would need, roughly,

  • an indication that the package is a snapshot. It itself is enough to not introduce fake versions and upset users.
  • a snapshot date (since we're going to use grace period, it doesn't need to be accurate and YYYYMMDD is quite enough with any time zone; however need to note that some repos, namely openSUSE, provide snapshot versions with second accuracy (ISO8601 time format or epoch seconds))
  • last official version before snapshot

There are multiple ways to convey this data. The simplest one would be to just use a date suffix to the version (1.2.3.20190101) like most repositories already do (however it needs to be used consistently) and introduce a snapshot flag. This would be enough for Repology to handle snapshots consistently.

@blshkv

blshkv commented Aug 22, 2019

Gentoo has a very clear policy:
https://wiki.gentoo.org/wiki/Project:ComRel/Developer_Handbook/Ebuild_policy

foo-x.y_preYYYYMMDD.ebuild
foo-x.y_pYYYYMMDD.ebuild

BUT ;-) there is an exception when the upstream did not release any version and x.y is not specified. In this case, the foo-YYYYMMDD.ebuild is used.
I could not find any place for the "snapshot" flag.

So as a generic rule, you can search for the suffix YYYYMMDD.ebuild

@AMDmi3
Member Author

AMDmi3 commented Aug 22, 2019

Gentoo's policy is no better than other repositories using random suffixes - it mixes up with upstream versions using p with snapshots, it allows pre with nonexisting upstream versions, and YYYYMMDD is indistinguishable from upstream versions looking the same way.

@mikhailnov
Contributor

I disagree with "-" ideas. For long file names it might break to a second line in some WM and it will become unreadable. Also, these two extra chars do not bring any value.

Also, - is a separator equivalent to . in RPM: it would split the version in unwanted places and lead to part-by-part comparison of the date's components and whatever follows them, instead of treating the date as a whole.
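The splitting effect can be shown with a simplified segmenter. It only mimics how an rpmvercmp-style comparison breaks a string into alternating numeric and alphabetic runs, treating every other character as a separator; special handling of characters such as `~` and `^` is deliberately left out, and the function name is hypothetical.

```python
import re


def rpm_like_segments(version: str):
    """Split a version into digit runs and letter runs; everything else separates."""
    return re.findall(r"[0-9]+|[A-Za-z]+", version)


if __name__ == "__main__":
    # The dashed date falls apart into three independent components...
    print(rpm_like_segments("4.7.2018-04-29"))
    # ...exactly as if it had been written with dots.
    print(rpm_like_segments("4.7.2018.04.29"))
    # The compact date stays a single numeric component.
    print(rpm_like_segments("4.7.20180429"))
```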

@ldv-alt

ldv-alt commented Aug 19, 2020

FYI, in ALT we promote the following versioning scheme of git snapshots which is based on the idea implemented in https://git.savannah.gnu.org/cgit/gnulib.git/plain/build-aux/git-version-gen (which in turn is used in many projects):
If "git describe --abbrev=1" of the upstream commit is VERSION-NUMBER-gHASH, then the package version has to be VERSION.0.NUMBER.HASH .
Simples!
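For illustration, the mapping described above can be written down as follows. This is a sketch of the stated rule only, not ALT's tooling; the function name is hypothetical and the sample `git describe` string is made up, assuming a tag without a `v` prefix.

```python
import re

# git describe prints TAG-NUMBER-gHASH when the commit is NUMBER commits past TAG.
DESCRIBE = re.compile(r"^(?P<version>.+)-(?P<number>[0-9]+)-g(?P<hash>[0-9a-f]+)$")


def alt_snapshot_version(describe_output: str) -> str:
    """Turn VERSION-NUMBER-gHASH into VERSION.0.NUMBER.HASH."""
    m = DESCRIBE.match(describe_output.strip())
    if m is None:
        raise ValueError(f"not in VERSION-NUMBER-gHASH form: {describe_output!r}")
    return "{version}.0.{number}.{hash}".format(**m.groupdict())


if __name__ == "__main__":
    print(alt_snapshot_version("4.8-7-gb352"))
```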

@AndersonTorres

AndersonTorres commented Sep 7, 2021

Apologies for necro-bumping!

I will mark myself here, because we at Nixpkgs are struggling at a similar problem. The format I am using is something like x.y.z+unstable=YYYY-MM-DD, however it is still in "brainstorm phase".

(late edits to reflect the current state - thanks @davidak for the reminder)

@AMDmi3
Member Author

AMDmi3 commented Sep 8, 2021

@ldv-alt there's a rule back from 2018 which marks that specific scheme as incorrect. Thankfully that scheme hasn't gained wide adoption, as it's horrible in all aspects: it does not separate the upstream and snapshot parts, is needlessly long, and uses commit hashes. It also violates RPM version policy. Actually, the whole sisyphus is currently pessimized for providing an intolerable amount of fake versions (apart from snapshots, for which there's also nothing close to a single format).

@AndersonTorres that's good, but as far as I can see, YYYY-MM-DD scheme is still prevalent.

@davidak
Contributor

davidak commented Sep 8, 2021

I think there is still no decision which format should be used in NixOS. It would be great if this issue results in a recommendation.

@AMDmi3
Member Author

AMDmi3 commented Sep 8, 2021

The recommendation is in the issue body.

@ldv-alt

ldv-alt commented Sep 8, 2021

Also violates RPM version policy.

@AMDmi3 Please elaborate.

@mikhailnov
Contributor

mikhailnov commented Sep 9, 2021 via email

@ldv-alt

ldv-alt commented Sep 9, 2021

@mikhailnov @ldv-alt

It is mentioned in ALT own docs:
https://www.altlinux.org/Spec#Промежуточные_upstream-релизы

I'm sorry to correct you, but the wiki page you're referencing is not a policy, let alone an RPM policy.

It was mentioned in Fedora packaging guidelines, but it turns out it's now thankfully deprecated.
https://docs.fedoraproject.org/en-US/packaging-guidelines/Versioning/#_traditional_versioning_with_part_of_the_upstream_version_information_in_the_release_field
https://web.archive.org/web/20181211075036/https://fedoraproject.org/wiki/Packaging:Versioning#Prerelease_versions

I'm sorry to correct you, but the Fedora document you're referencing is not an RPM policy.

Anyway, RPM permits the kind of versioning I recommend for use in case of git snapshots, and ALT packaging policies have nothing against it.

Like it or not, but the versioning scheme I recommend for git snapshots has its benefits and its users.
Your opposition to this scheme is clear, but I respectfully disagree.
Anyway, it's up to distros to choose their packaging policies, and ALT has chosen the scheme you don't like.
Let's agree to disagree on this subject.

@AMDmi3
Member Author

AMDmi3 commented Sep 9, 2021

Well, all I'm going to say is that this scheme will never be honored by Repology because it cannot be meaningfully compared to upstream, to other repositories, or to other sources such as vulnerability databases.

@AndersonTorres

@AndersonTorres that's good, but as far as I can see, YYYY-MM-DD scheme is still prevalent.

I am formulating a RFC to the NixOS community/organization. Until then, the mess will be there.

@ldv-alt

ldv-alt commented Sep 9, 2021

Well, all I'm going to say is that this scheme will never be honored by Repology because it cannot be meaningfully compared to upstream, to other repositories, or to other sources such as vulnerability databases.

Since versions produced by this versioning scheme are as easy to recognize as versions produced by other versioning schemes, I do not agree that they cannot be meaningfully compared with upstream versions, and you do not compare different snapshots between each other anyway.

BTW, how can you explain the following: https://repology.org/project/hasher-priv/versions ?
Is it the result of "the whole sisyphus is currently pessimized"?

@mikhailnov
Contributor

Adding YYYY-MM-DD to the version requires manual work.

Here is an example of how git snapshot can be packaged:
https://abf.io/import/gimagereader/blob/a83f21be3b/gimagereader.spec

%define commit d3cdd00b3e848867d95db28354afc41814d5dd0c
%define commit_short %(echo %{commit} | head -c 5)
Version:	3.3.1
Release:	2.git%{commit_short}.3
Source0:	https://github.com/manisandro/gImageReader/archive/%{commit}.tar.gz?/gImageReader-%{commit}.tar.gz

Release tag consists of 3 parts. When upgrading to a new git snapshot, the first number is increased, when rebuilding an existing snapshot, the last number is increased.
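The three-part Release layout can be taken apart mechanically, which is also how a tool could recognize it as a git snapshot. A minimal sketch under the assumption that the tag always has the shape `N.gitHASH.M` as in the spec above; the function and field names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SnapshotRelease(NamedTuple):
    snapshot: int      # increased when moving to a new git snapshot
    commit_short: str  # abbreviated commit hash
    rebuild: int       # increased when rebuilding the same snapshot


RELEASE = re.compile(r"^(?P<snapshot>[0-9]+)\.git(?P<hash>[0-9a-f]+)\.(?P<rebuild>[0-9]+)$")


def parse_release(release: str) -> Optional[SnapshotRelease]:
    """Parse an N.gitHASH.M release tag; return None for anything else."""
    m = RELEASE.match(release)
    if m is None:
        return None
    return SnapshotRelease(int(m.group("snapshot")), m.group("hash"), int(m.group("rebuild")))


if __name__ == "__main__":
    commit = "d3cdd00b3e848867d95db28354afc41814d5dd0c"
    # Mirrors `head -c 5` from the spec file.
    release = f"2.git{commit[:5]}.3"
    print(release, parse_release(release))
```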

As a package maintainer, I just go to github or another place, study the commit history, then copy the commit hash, change it in the spec file, then run spectool -g *.spec && rm -fv .abf.yml && abf put and that's all. I have neither the wish nor the time to maintain a correct date of the git commit from which the snapshot was built. I would probably maintain it, but it would not actually help either users or projects like repology (or am I wrong, will it help?).

I think other maintainers have a similar way of thinking and that is why I would not expect a wide adoption of naming schemes which require additional useless work like tracking date.

@AMDmi3
Member Author

AMDmi3 commented Sep 10, 2021

Since versions produced by this versioning scheme are as easy to recognize as versions produced by other versioning schemes

No, they are not. Unlike any other snapshot schemes I've seen, they are completely indistinguishable. There is not a single property which can be reliably used to tell them from official versions.

BTW, how can you explain the following: https://repology.org/project/hasher-priv/versions ?
Is it the result of "the whole sisyphus is currently pessimized"?

Yes.

Adding YYYY-MM-DD to the version requires manual work.

I've never required to add YYYY-MM-DD to the version.

Here is an example of how git snapshot can be packaged:

There is no problem with this specific case at all, as
a) It's based upon official version
b) It's clearly distinguishable as a snapshot (by presence of git in Release)

For instance, Repology can (and does) safely treat it as the unmodified version, which won't generate a nonexistent release, will be marked newest/outdated correctly and can be compared to NVD with release granularity. The lack of a date prevents it from being compared with higher granularity, but we don't do that anyway and I don't think we should or will.

However, it is still pessimized in the sense that this version will not be treated as new if it only comes from an RPM distro. Because the above-mentioned "not policies" are widely used, there's no telling whether the snapshot is based upon a real release or a fake "next" release as the not policies suggest. There's no way to tell that by Release starting with 0 either, because these are not policies.

@mikhailnov
Contributor

Ah, thanks, I think I understood, so if ALT's version-release was VERSION.0.NUMBER.gitHASH instead of VERSION.0.NUMBER.HASH, it would be recognizable as a git snapshot.

@AMDmi3
Member Author

AMDmi3 commented Sep 10, 2021

While the other problems with it remain, yes, at least it would be possible to reliably tell that it's not an upstream version. It won't allow to tell it from snapshots which can be compared to upstream though.

@ldv-alt

ldv-alt commented Sep 10, 2021 via email

@AMDmi3
Member Author

AMDmi3 commented Sep 10, 2021

These versions are upstream versions followed by .0.distance.digest suffix
where distance is a decimal number and digest consists of at least 4
hexadecimal digits, so they are clearly recognizable.

No they are not. As can be seen by the link already given above, the probability is quite high for these hexadecimal digits to only consist of decimal digits, making a snapshot indistinguishable from a legal dot-separated numeric version:

1.18.0.27.0405
4.8.0.0.10.1157
2.6.4.0.88.9801
1.0.1.0.8.5087
2.13.0.5.8107
0.12.0.3.4174

Versions like these are used in the wild, in case you wonder.

In some other cases, even if a hash contains [a-f], it's still indistinguishable from legal prerelease or letter-suffixed version:

0.185.0.54.b561
4.06.0.7.100b
4.8.0.7.b352
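To put a rough number on "quite high": assuming each hexadecimal digit of an abbreviated hash is uniformly distributed (an idealization), the chance that a 4-digit digest contains only decimal digits is (10/16)^4, about 15%. The check below is a hypothetical sketch, not Repology's classifier.

```python
import re

# Probability that all four hex digits fall in 0-9, under the uniformity assumption.
P_ALL_DECIMAL = (10 / 16) ** 4

PLAIN_NUMERIC = re.compile(r"[0-9]+(?:\.[0-9]+)*")


def looks_like_plain_numeric(version: str) -> bool:
    """True if the string is nothing but dot-separated decimal numbers."""
    return PLAIN_NUMERIC.fullmatch(version) is not None


if __name__ == "__main__":
    print(f"{P_ALL_DECIMAL:.4f}")
    for v in ("2.13.0.5.8107", "1.0.1.0.8.5087", "4.8.0.7.b352"):
        print(v, looks_like_plain_numeric(v))
```

The versions that pass this check carry no marker at all, and the one that fails still reads like an ordinary letter-suffixed version, which is the point being made above.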

Unfortunately, such a blanket ban approach makes the whole repology.org untrustworthy.

The very first thing Repology must do to be trustworthy is to prevent garbage from a misbehaving repository from being reported as a new upstream version to all other maintainers, and that we do. As I've already mentioned though, the discussed scheme is not the only and not the main cause for the ban. The main cause is the number of random made-up versions from Sisyphus, as the repository is the worst by the number of ignore rules I've had to add and maintain

% grep -R sisyphus repology-rules/900.version-fixes | wc -l
     301

by the number of known incorrect versions

repology=> select repo, count(distinct effname) from packages where versionclass = INCORRECT() group by repo order by count desc limit 10;
       repo       | count 
------------------+-------
 alt_p9           |    90
 alt_p10          |    86
 altsisyphus      |    83
 funtoo_1.4       |    69
 nix_unstable     |    68
 raspbian_testing |    67
 nix_stable       |    67
 gentoo           |    66
 raspbian_stable  |    65
 debian_unstable  |    61
(10 rows)

and by the number of complaints, i.e. cases which actually affect users:

repology=> select count(*) from reports where comment ilike '%sisyphus%';
 count 
-------
    47
(1 row)

So please don't mention untrustworthiness.
