Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continued retagging of the albumtype field (and potentially other fields) #4715

Open
JOJ0 opened this issue Mar 17, 2023 · 19 comments
Open

Continued retagging of the albumtype field (and potentially other fields) #4715

JOJ0 opened this issue Mar 17, 2023 · 19 comments

Comments

@JOJ0
Copy link
Member

JOJ0 commented Mar 17, 2023

After merging of #4582 there is a related issue remaining that was intentionally not addressed in that pull request.

We still see continued retagging of the albumtype (singular) field in certain cases.

Note: This bug is not about the ['a', 'l', ...] problem! This has been fixed in #4582 already! You might still have broken data in your database and thus see it. Follow this description to clean up your library: #4582 (comment)

The problem

The MusicBrainz API provides albumtypes and albumtype separately, but beets tries to derive the albumtype when reading files as being the first element of albumtypes. This can lead to the problem where beet write tries to modify the albumtype every time.

If an album has multiple types the MusicBrainz API seems to indicate that one of those is the "primary" type, and beets stores that in the db as the albumtype, but parsing the ID3 tag of the file it seems like it just uses the first element of albumtypes as the albumtype.

Writing the albumtype of those files doesn't actually write the file's tag, so beet write will always report trying to write it. It will update the timestamp of the file though, which could confuse other tools, making them "think" the file received relevant changes.

Reproduction of the problem

Timestamp of file is 10:34

$ ls -l "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac"
-rw-r--r--  1 jojo  staff  46857531 Dec 19 10:34 /Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac

Albumtype file tag is album / compilation:

$ mediainfo "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac" | grep -i type
Cover type                               : Cover (front)
RELEASETYPE                              : album / compilation
MUSICBRAINZ_ALBUMTYPE                    : album / compilation

which in reality is two values, each being set to a single tag:

$ mutagen-inspect "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac" | grep -i type
RELEASETYPE=album
RELEASETYPE=compilation
MUSICBRAINZ_ALBUMTYPE=album
MUSICBRAINZ_ALBUMTYPE=compilation

Now writing shows changes but they only show a single value instead of both values:

$ beet write body heat
Fockewulf 190 - The Dark Side of Italo Disco - Body Heat (vocal) - src:$data_source |Pop
  albumtype: album -> compilation

The file's timestamp gets updated:

$ ls -l "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac"
-rw-r--r--  1 jojo  staff  46857531 Dec 19 10:56 /Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac

The file's tag is still the same, beet write shouldn't have bothered to update the file!

$ mediainfo "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac" | grep -i type
Cover type                               : Cover (front)
RELEASETYPE                              : album / compilation
MUSICBRAINZ_ALBUMTYPE                    : album / compilation

$ mutagen-inspect "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac" | grep -i type
RELEASETYPE=album
RELEASETYPE=compilation
MUSICBRAINZ_ALBUMTYPE=album
MUSICBRAINZ_ALBUMTYPE=compilation

Setup

  • OS: ---
  • Python version: ---
  • beets version: 1.6.1 (dev)
  • Turning off plugins made problem go away (yes/no): no

My configuration (output of beet config) is:

not relevant

Related iterations of this bug

We might have a related issue if not exactly the same issue with other multi-valued fields (mb_albumartistids, ...) that have a single representation of that same field mb_albumartistid, ...). The logic in mediafile is identical. A detailed report of a user is found here: #5045

@judemille
Copy link

judemille commented Mar 18, 2023

Writing the albumtype of those files doesn't do anything

That's not entirely true. When beets does this rigmarole, it changes the file's timestamp, which will cause tools like Nextcloud to go through the whole procedure as if there's a whole new version of the file (Nextcloud, and WebDAV-based tools in general IIRC, do not allow incremental file updates.)

@JOJ0
Copy link
Member Author

JOJ0 commented Mar 18, 2023

Thanks a lot, important detail! I'll add that to the description later today!!

@judemille
Copy link

judemille commented Mar 20, 2023

A potential hacky fix could be to copy the albumtype from the file on disk to the tags to be written, so the comparison thinks all is well. Perhaps I'll give that a shot.

EDIT: removed question about what files are affected: inspected mediafile code, conclusion: all files affected. Also changing proposed solution.

@judemille
Copy link

judemille commented Mar 20, 2023

Can confirm that solution fixes the issue. Is it the best solution? Hell if I know. Does it stop the continued retagging? Yes.

E: Really, I think the best solution would be to make the code simply ignore the albumtype field in all cases where it would be trying to write to disk, not just this command, as well as not displaying the extraneous "changes" that would be patched out.

@dr-waterstorm

This comment has been minimized.

@judemille

This comment has been minimized.

@JOJ0

This comment has been minimized.

@dr-waterstorm
Copy link

dr-waterstorm commented Mar 22, 2023

Thank you @JOJ0 @judemille you were right about the issue with the release type. I did not update to GIT version, but only pulled the latest version from the Linux repo before. My apologies for not trying that :). That issue is fixed.

With the release types corrected, now I do observe the issue with he albumtype.
Beets keeps showing vast amounts of

albumtype: album -> compilation

and similar. Checking the file with ffmpeg for the tag still shows the old one, so it did not update the actual tag.

@gamoeri
Copy link

gamoeri commented Apr 24, 2023

I am running into a similar issue - it does not update the albumtype, even if reimported.

@ilmc888
Copy link

ilmc888 commented Jul 8, 2023

I have the same issue. Knowing next to nothing about the code but after some debugging I think the problem lies in mediafile.py (version 0.12 = same as master):

def update(self, dict):
        """Set all field values from a dictionary.

        For any key in `dict` that is also a field to store tags the
        method retrieves the corresponding value from `dict` and updates
        the `MediaFile`. If a key has the value `None`, the
        corresponding property is deleted from the `MediaFile`.
        """
        for field in self.sorted_fields():
            if field in dict:
                if dict[field] is None:
                    delattr(self, field)
                else:
                    setattr(self, field, dict[field])

When the albumtype tag is set correctly in this routine (to e.g., compilation) it gets immediately reverted back to album the next iteration of that loop which happen to set albumtypeS. My guess is that somehow mediafile overrides albumtype with the first entry of albumtypes again (by some operator overloading or such, haven't digged deeper yet).

Albumtype is also defined lower down in mediafile.py as a single field of albumtypes, which I assume will be the first entry of albumtypes:['album', 'compilation'].

    albumtypes = ListMediaField(
        MP3ListDescStorageStyle('MusicBrainz Album Type', split_v23=True),
        MP4ListStorageStyle('----:com.apple.iTunes:MusicBrainz Album Type'),
        ListStorageStyle('RELEASETYPE'),
        ListStorageStyle('MUSICBRAINZ_ALBUMTYPE'),
        ASFStorageStyle('MusicBrainz/Album Type'),
    )
    albumtype = albumtypes.single_field()

@ilmc888
Copy link

ilmc888 commented Jul 8, 2023

Reading some other issues, the cause might already been known :-)

Is there any agreement on how to handle this? Just write albumtype as is, or check whether albumtype is in albumtypes? Both reading and writing.

@JOJ0
Copy link
Member Author

JOJ0 commented Jul 28, 2023

@ilmc888 I'd love to finally get to fixing this but I'm not yet sure what's the best way. If you feel like it help me brainstorm. The following ideas and I didn't check any of it in code:

  • If the tagging source is MusicBrainz we should always ignore the single albumtype field. It should always be written from the multi field albumtypes
  • With other source, like Discogs for example, there is no multi field albumtypes, there will always only be a single albumtype field. It should be written as-is / as received from the Discogs API

Take everything with a grain of salt and double check. Probably I'm just thinking out loud, being too lazy to read code atm (other music coding things in mind right now ;-)) I'd love to get help with tackling that issue, it's been long due....

@ilmc888
Copy link

ilmc888 commented Aug 15, 2023

I might not be aware of all the intricacies yet, but I do understand that Musicbrainz has both albumtype and albumtypes information, but we can have only one tag 'MusicBrainz Album Type'?

Does it make sense to take a look at Musicbrainz Picard and how that information is handled there (just tag one track and ffprobe it)? Those are their own tags after all:
https://github.com/metabrainz/picard
https://picard-docs.musicbrainz.org/downloads/MusicBrainz_Picard_Tag_Map.html

In general I would just think about how it will be used by music player. If albumtype tag can / would be a list (or we make an extra albumtypes tag; if this is even possible), will this actually be used by music players? I suspect not (yet), so this would not be really practically useful?

On the other hand, I agree using the first entry of albumtypes might not be always informative (e.g., often just 'album' instead of 'live', 'compilation', etc.), so I understand why there was some logic included in beets to handle this. Still, I see this more as a flaw of Musicbrainz than anything else and wouldn't mind a one-on-one mapping ignoring albumtypes all together (until there is an accepted solution in the audio community). Including such logic will always be prone to errors (I believe I saw albumtypes of album / live / compilation / soundtrack, what would be the first one?).

Actually not sure if there is extra logic to extract one type from albumtypes? Does MB expose the primary release type (or can we find how they do it in the Picard code?)?

If the tagging source is MusicBrainz we should always ignore the single albumtype field. It should always be written from the multi field albumtypes

Well this seems like a solution to make the albumtype useful, but wasn't the problem that with a beet update you would then have to compare it to the internal albumtypes data (using the same approach to extract it). Otherwise you get the infinite update / write loop. Or just ignore the whole albumtypes in the database. On beet import / mbsync, use the MB albumtypes to extract a useful albumtype for the database and write it to the tag field (ignore the MB albumtype itself). A subsequent beet update will just compare the albumtype one-on-one and thus does not need any change.

This reply turned out longer than I expected, with likely a lot of (false) assumptions, but I hope it helps.

@ilmc888
Copy link

ilmc888 commented Aug 15, 2023

I only had time to give it a brief look, so unsure if I'm looking at the right snippet but
https://github.com/metabrainz/picard/blob/3b20cb2f2d3fb1df14e7d69a96d9280e4f219c15/picard/mbjson.py#L607

Here I see that releasetype, which is presumably written as a tag (see id3.py), is a concatenation of primary and secondary releasetypes. What would be the problem if beets is doing the exact same thing?

And on beet update, if a MB id exist (or a albumtypes is in the DB) compare the MB albumtype tag, which is a list then (again, assumption Im making), to that same concatenation.

@ilmc888
Copy link

ilmc888 commented Aug 17, 2023

@JOJ0 I'm probably overthinking things as it seems the only problem at the moment is writing the right tag to disk. In that case your solution will work fine. I think that change has to be in mediafile.py (?) as there a single album type is extracted from album types.

However, what about just changing the order of album types when pulling the data from MB. Reorder it so the most important type is in front (compilation, live, ...) instead of just album. In that case mediafile.py will just work as is (I think it simply extracts the first element)? Other sources will also just work, and the change is contained within the MB code itself?

@JOJ0
Copy link
Member Author

JOJ0 commented Dec 12, 2023

@arogl you could try quickfixing your issue #5042 by following my ideas in above comment. I still think that (at least) part of the problem could be solved like this: #4715 (comment)

So what I'm trying to say is: Try ignoring the (single) albumtype field in the musicbrainz autotag code and see if that changes anything.

We won't loose the information since musicbrainz also provides a multi field called albumtypes (I'm not sure of the name that comes from the MB api!) which will then be handled by mediafile when we write it to disk (#4715 (comment)). It could be though, that the useless writing of the albumtype file tag will still happen, regardless of the exclusion of the single albumtype field before (when getting it from the api and assigning it to the library objects) - I don't know - It would be super helpful to experiment with that!

Even a dirty quickfix would help us to draft a proper PR at some point!

@JOJ0 JOJ0 changed the title Continued retagging of the albumtype field. Continued retagging of the albumtype field (and potentially other fields) Dec 21, 2023
@celynw
Copy link

celynw commented Jan 3, 2024

Sorry to jump in, but this is driving me crazy so I thought I'd try something to help.
If I'm following correctly, I tried commenting these lines:

beets/beets/autotag/mb.py

Lines 555 to 558 in 296f01b

if "type" in release["release-group"]:
reltype = release["release-group"]["type"]
if reltype:
info.albumtype = reltype.lower()

Behaviour before:

$ beet ls -a -f '$album [$albumtype] [$albumtypes]'
Monstercat 001 - Launch Week [compilation] [album; compilation]
$ beet write
Arion - Monstercat 001 - Launch Week - Cold Blood & Ice Cream Cones
  albumtype: album -> compilation
$ beet write
Arion - Monstercat 001 - Launch Week - Cold Blood & Ice Cream Cones
  albumtype: album -> compilation

Behaviour after:

$ beet ls -a -f '$album [$albumtype] [$albumtypes]'
Monstercat 001 - Launch Week [album] [album; compilation]
$ beet write
$

So it fixes my infinite beet write loop.
I'm on the most recent beets commit, 296f01b (after version 1.6.1)

Based on mediafile, it seems like it's returning the first field for the singular albumtypes, which I also think is supposed to be the primary field in MusicBrainz.

Oh, and in both cases, metaflac tells me that there are two RELEASETYPE tags:

    comment[15]: RELEASETYPE=album
    comment[16]: RELEASETYPE=compilation

@JOJ0
Copy link
Member Author

JOJ0 commented Jan 4, 2024

Thanks @celynw yeah it helps.

I have something very similar in the queue, which basically just ignores albumtype and never uses it anywhere in beets in favor of albumtypes.

It's coded finished but too busy to file the PR this week but hopefully during the next month.

@JOJ0
Copy link
Member Author

JOJ0 commented Mar 14, 2024

Some notes on a very high level for this issue:

  • We need to decide on the source of truth for albumtypes and albumtype, I think it should be saved in the beets lib/db exclusively in one place, I suggest in the multifield albumtypes
  • If we want to derive a single albumtype (which quite a lot of plugins actually currently are using and requiring) we need to find a way to fetch it from albumtypes.
  • The current approach though, if I'm not mistaken, fetches albumtype from albumtypes on a very low level, i.e. it is implemented in beetbox/mediafile.git
  • Another problem of the current implementation is, that we save a single-valuealbumtype from various sources (Discogs, MusicBrainz, Spotify...) in the DB separately, which clashes with the information we have in the multi-field albumtypes
  • Maybe fetching albumtype from albumtypes should be handled on a higher level, i.e. on the beets library level / the database level.

Some of these issues I tried to address here: #5075

but I think that getting rid of albumtype (single-field) altogether is not the best solution. It breaks existing plugins that want to use it. We still need to find a working way of fetching a single (main) albumtype from albumtypes

daniele-athome added a commit to daniele-athome/beets that referenced this issue Aug 15, 2024
daniele-athome added a commit to daniele-athome/beets that referenced this issue Aug 15, 2024
daniele-athome added a commit to daniele-athome/beets that referenced this issue Aug 15, 2024
daniele-athome added a commit to daniele-athome/beets that referenced this issue Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants