Glyphs file format #332

Closed
schriftgestalt opened this issue Mar 24, 2018 · 30 comments

Comments

@schriftgestalt
Collaborator

I’m working on a future version of Glyphs. That will bring some changes to the file format. I will help implement the changes in glyphsLib in time.

  • The axes will be stored in a dedicated key, and the weight/width/custom keys will be gone.
  • One big change will be a more compact format to store outlines. Each path will store its nodes in a string similar to an SVG path.
  • There are some structural changes, mostly about font info. I will publish that as soon as it is finalised.
  • Next to the .appVersion key, there will be a .formatVersion = 2 key. glyphsLib could already look for that key and stop processing if .formatVersion > 1.
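
A minimal sketch of the version guard suggested in the last point, assuming the file has already been parsed into a Python dict. The function name, the exception class, and the choice to treat a missing key as version 1 are illustrative assumptions, not glyphsLib API.

```python
# Assumption of this sketch: files without the key count as format version 1.
MAX_SUPPORTED_FORMAT_VERSION = 1


class UnsupportedFormatVersion(ValueError):
    """Raised when a file declares a newer format than this reader understands."""


def check_format_version(font_data: dict) -> int:
    """Return the declared format version, or raise if it is too new."""
    # ".formatVersion" is the key named in the comment above.
    version = int(font_data.get(".formatVersion", 1))
    if version > MAX_SUPPORTED_FORMAT_VERSION:
        raise UnsupportedFormatVersion(
            f"file uses format version {version}, but only versions up to "
            f"{MAX_SUPPORTED_FORMAT_VERSION} are supported"
        )
    return version
```

Failing early like this gives users a clear message instead of a parse error deep inside the reader.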

Do you have any suggestions/requests about the file format?

@belluzj
Collaborator

belluzj commented Mar 24, 2018

Having worked with format v1, here are my recommendations:

  • Please use a standard & currently supported format like JSON/XML, and do not add any tricks or extensions to it (like the quotes that are sometimes there or not). Also about quotes, if we were able to use a standard parser/serializer from the python stdlib, all the quoting would be handled automatically.
  • Please use a standard date format as well, and don't try to drop the timezone sometimes to save bytes.
  • For the outline data, I think relying more on strings goes in the wrong direction. Instead, I would suggest that each node in the path be an object with attributes "x", "y", "smooth", "userData", "hints" etc. That would considerably reduce the number of exceptions that currently exist around the userData of a single node, for example.
  • Same for all the data elements that are an object or array disguised as a string, like the list of unicodes or some other lists of numbers that currently look like "{1,2}". Make them real arrays. That would suppress several double parsing steps.
  • In general, I think life would be easier for everyone involved if everything that is not part of Glyphs' "core business" of font editing was standard. Like, we could spend time on interpolating smart components instead of trying to guess whether some numbers are hexadecimal or base64.
  • And since a lot of changes to the current format are driven by "saving bytes", maybe provide an option to gzip the output? For example, have ".glyphs", which is standard plain-text JSON/XML, and ".glyphz", which is the same plain text but gzipped? That way, saving bytes does not come at the cost of working around custom parsing stuff, and can be implemented trivially using stdlib components of any language.
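
To illustrate the last point, here is a rough sketch of the "plain JSON plus optional gzip" idea using only the Python standard library. The sample data layout and the function names are made up for illustration; they do not describe the real Glyphs format.

```python
import gzip
import json


def dumps_compressed(font_data: dict) -> bytes:
    """Serialize to minified JSON, then gzip the UTF-8 bytes."""
    text = json.dumps(font_data, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


def loads_compressed(blob: bytes) -> dict:
    """Reverse of dumps_compressed: gunzip, then parse JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical sample data: real arrays and node objects, as suggested above.
font = {
    "unicode": [65, 97],
    "nodes": [{"x": 476, "y": 99, "type": "line", "smooth": True}] * 1000,
}
blob = dumps_compressed(font)
assert loads_compressed(blob) == font
```

Quoting, escaping, and nesting are all handled by the stock serializer, so neither side needs a custom parser.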

@schriftgestalt
Collaborator Author

Many thanks for the comments. You are right about the exceptions. I am trying to get rid of them.

@schriftgestalt
Collaborator Author

I have played around with JSON a bit. It produces only slightly bigger files (150MB as a .glyphs file) when written minified. Writing nodes as objects, produces 80% bigger files. ;( For big CJK files like this, the raw file size has a big impact on the saving speed. And another 100MB will take some time.

Zipping helps with the file size, for sure, but it is quite slow. Just zipping the .json file takes 5 seconds.

I need the file writing to be really quick, as it happens quite often for autosaving. And waiting for 2 seconds every 10 or 30 seconds is "annoying".

@belluzj
Collaborator

belluzj commented Mar 25, 2018

Hum, maybe try faster/more compact standard formats like:

Also there are faster compression algorithms than gzip:

Also, about the wait time: it might be possible to "mask" it by running the serialization/compression in a different thread than the UI? (I don't know how Glyphs works internally, so maybe you already do that.)
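
A rough sketch of that idea in Python, purely as an illustration (Glyphs itself is not written in Python, and nothing here reflects how it saves files). The function name and the deep-copy snapshot are assumptions of this sketch.

```python
import copy
import json
import os
import threading


def save_in_background(font_data: dict, path: str) -> threading.Thread:
    """Serialize and write font_data on a worker thread; return the thread."""
    # Snapshot first so edits made while the save is running cannot change
    # what gets written. A real editor would likely want something cheaper
    # than a deep copy; this is only a placeholder.
    snapshot = copy.deepcopy(font_data)

    def worker() -> None:
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(snapshot, f, separators=(",", ":"))
        # Swap the finished file into place in one step so readers never
        # see a half-written file.
        os.replace(tmp_path, path)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The caller can keep handling UI events and only `join()` the returned thread when it must be sure the save has finished, for example before quitting.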

@schriftgestalt
Collaborator Author

Protobuf looks very promising but has the same problem as .zip: it doesn’t play well with git.

I will investigate this and will make it as compatible as possible.

@madig
Collaborator

madig commented Mar 26, 2018

Please also consider or give the option to write out the file into a UFO-like file-directory. The biggest annoyance in my design work is that I have to use external tools to split the changes up. With UFOs, I can stage single files (= glyphs). Doing it this way may also reduce the writing pressure, as you don't have to re-dump those 150 MB again and again -- just the glyphs that changed ;) This would make the format super git-friendly.

Edit: Maybe this would even kill two stones with one bird: file directory for normal design work, compressed single file for e-mailing it to someone?
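
As a sketch of what "just the glyphs that changed" could look like, assuming one JSON file per glyph in a directory: the layout, the function name, and the use of JSON are all hypothetical, and a real implementation would need a safe glyph-name-to-filename mapping (as UFO has) plus dirty flags instead of re-reading files.

```python
import json
from pathlib import Path


def save_glyphs(glyphs: dict, directory: Path) -> list:
    """Write one file per glyph; return the names that were (re)written."""
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for name, glyph in glyphs.items():
        # Stable key order and indentation keep diffs small and readable.
        data = json.dumps(glyph, sort_keys=True, indent=1).encode("utf-8")
        # Assumption: the glyph name is already a safe, unique file name.
        target = directory / f"{name}.json"
        if target.exists() and target.read_bytes() == data:
            continue  # unchanged glyph: leave the file alone
        target.write_bytes(data)
        written.append(name)
    return written
```

Untouched glyph files keep their contents and timestamps, so a git diff only shows the glyphs that were actually edited.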

@madig
Collaborator

madig commented Mar 26, 2018

Actually, I have to word this more strongly: going the file-directory route is essential when more than one person is working on a font. You don't want to make font designers deal with merge conflicts just because something unrelated to their changes flipped around.

@schriftgestalt
Collaborator Author

I heard that there are plans for a one-file approach with .ufo. The multiple-file route has advantages but also problems. Both have to do with the tools that use the files. Git is a bit easier, but Dropbox sucks.

@madig
Collaborator

madig commented Mar 26, 2018

It might make sense to give people the option to do both -- e.g. give them a ".glyphsdir" and ".glyphs" save option so they take whatever suits their workflow best. Both formats should just be different ways to save the exact same information.

@schriftgestalt
Collaborator Author

Or, what I wanted to add: fix the tools ;)

@madig
Collaborator

madig commented Mar 26, 2018

I suggest fixing Dropbox then 😁

@davelab6
Member

davelab6 commented Mar 27, 2018 via email

@schriftgestalt
Collaborator Author

I would love to use any of the Proto things. But then git is out of the door.

@madig
Collaborator

madig commented Mar 27, 2018

I guess you can use text representations if you really want to? https://stackoverflow.com/questions/18873924/what-does-the-protobuf-text-format-look-like

But then you might as well experiment with JSON some more I guess. What if you represent nodes as simple lists with the line drawing operations represented as numbers? What did your node objects look like?

@schriftgestalt
Collaborator Author

I played around a bit. It seems like that plist is still the best choice. It is very similar to JSON; it just uses some different characters. It can be ‘minified’ and doesn’t need quotes.

My current approach is to use shorter keys and collapse the element objects to one line. I thought about using a list for the node, too. That would mean I could just replace the quotes on the node string with parentheses ;)

@madig
Collaborator

madig commented Mar 27, 2018

Hm. Can you paste a snippet that shows how standard JSON increases file size by 80%? Really curious to see what the difference is.

@anthrotype
Member

It seems like that plist is still the best choice

Really? You mean XML plist, or your own customized NeXTStep-like ASCII plist?
It would be so much easier if you opted for some standard serialization format.

@schriftgestalt
Collaborator Author

Writing nodes as objects, produces 80% bigger files.

It was not about JSON but about writing "proper" node objects instead of the "string" format that is currently used. Something like this: "476 99 LINE SMOOTH" > {x = 476; y = 99; type = "line smooth"} (the type could be made a bit better, I know)
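
For anyone who wants to reproduce this kind of comparison, here is a small sketch that measures both node representations as minified JSON. The sample coordinates and the helper name are made up, and the result for this toy data says nothing about the exact figure for a real font.

```python
import json


def encoded_sizes(nodes):
    """Return (size with string nodes, size with object nodes) in characters."""
    # String form, as in "476 99 LINE SMOOTH".
    as_strings = [
        f"{x} {y} {kind}" + (" SMOOTH" if smooth else "")
        for x, y, kind, smooth in nodes
    ]
    # Object form, mirroring {x = 476; y = 99; type = "line smooth"}.
    as_objects = [
        {"x": x, "y": y, "type": kind.lower() + (" smooth" if smooth else "")}
        for x, y, kind, smooth in nodes
    ]
    compact = {"separators": (",", ":")}
    return (
        len(json.dumps(as_strings, **compact)),
        len(json.dumps(as_objects, **compact)),
    )


sample = [(476, 99, "LINE", True), (512, 140, "CURVE", False)] * 1000
string_size, object_size = encoded_sizes(sample)
print(f"strings: {string_size}, objects: {object_size}")
```

Most of the overhead comes from repeating the key names for every node, which is why shorter keys or list-based nodes come up later in the thread.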

I checked again with some smaller and bigger files. After minifying them the same way, the difference between plist and JSON is actually not that big. So JSON doesn't look that bad after all.

you mean xml plist,

That produces huge files in comparison.

@madig
Collaborator

madig commented Mar 27, 2018

You could try assigning integers to types. That would save a few bytes and maybe up parsing performance?
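
A quick sketch of that suggestion, with nodes stored as plain lists and the type as a small integer. The numeric codes and the field order are invented for illustration; they are not part of any Glyphs format.

```python
import json
from enum import IntEnum


class NodeType(IntEnum):
    # Hypothetical numbering, chosen only for this example.
    LINE = 0
    CURVE = 1
    OFFCURVE = 2


def pack_node(x, y, node_type, smooth=False):
    """Encode a node as [x, y, type code, smooth flag]."""
    return [x, y, int(node_type), int(smooth)]


def unpack_node(packed):
    """Decode a packed node back into a readable dict."""
    x, y, type_code, smooth = packed
    return {"x": x, "y": y, "type": NodeType(type_code), "smooth": bool(smooth)}


encoded = json.dumps(pack_node(476, 99, NodeType.LINE, smooth=True),
                     separators=(",", ":"))
node = unpack_node(json.loads(encoded))
```

The trade-off is readability: the file gets smaller, but a reader needs the code table to understand a diff.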

@madig
Collaborator

madig commented Apr 24, 2018

One more thing: adding a fixed version number to the file might be a good idea, seeing as build numbers overlap between stable and dev releases (#352 (comment)). So, every time you make even the tiniest change to the file format, you increase the number. That would make it much easier for everyone and you to correctly parse things now and in the future.

@schriftgestalt
Collaborator Author

What about 3.1-1203? Or is that too complicated?

@madig
Collaborator

madig commented Apr 24, 2018

Well, if there's a difference between 3.1-1203 and 3.1-1204, you might as well call it 3.2 ;)

This would require that you really only change the file format if it is necessary for a feature or it gets unwieldy quickly.

@schriftgestalt
Collaborator Author

I do not change the version number for each build or cutting edge release.

@madig
Collaborator

madig commented Apr 24, 2018 via email

@madig
Collaborator

madig commented Sep 12, 2018

I suppose this can be closed, as the format won't change soon?

@madig madig closed this as completed Sep 12, 2018
@schriftgestalt
Collaborator Author

I’m heavily working on the new version. I hope I have an early beta by the end of the year.

@madig
Collaborator

madig commented Sep 12, 2018

Alright, then I'll reopen this? Please keep us in the loop about any format change :)

@schriftgestalt
Collaborator Author

schriftgestalt commented Sep 29, 2020

I just made a JSON schema for the Glyphs 3 file format.
https://github.com/schriftgestalt/GlyphsSDK/tree/Glyphs3/GlyphsFileFormat/validator
The file will still be plist but the script demonstrates how to use it. I hope that makes it easier to implement the new format.

I can help with implementing the new format. It would help a lot if someone could give me some hints on how it would best fit into the code base.

@madig
Collaborator

madig commented Sep 30, 2020

Interesting. Do you use the same schema internally in G3? I'm still a bit sad that we will still have to have a plist parser :(

@schriftgestalt
Collaborator Author

I don’t use it internally, as I just made it. I’ll use it in my tests, as it already found some problems, and also to keep the file spec up to date.

The parser is there and works well. When building the schema I wrote a bunch of JSON, and it has its quirks (leave a comma after the last element and it will not load, sometimes without a proper error message).
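
The trailing-comma quirk is easy to reproduce. As one example, Python's standard `json` module rejects it as well; the helper name below is just for this illustration.

```python
import json


def try_parse(text):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


ok_value, ok_error = try_parse("[1, 2]")
bad_value, bad_error = try_parse("[1, 2,]")  # trailing comma is invalid JSON
```

Here `bad_value` is `None` and `bad_error` holds the parser's message, whose wording differs between Python versions.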

And I removed deviations from the plist format (unicode is stored as plain integers now).

@moyogo moyogo closed this as completed Feb 23, 2021