Improve efficiency of entry-point parsing #283
Comments
You need to decide how well you want to handle malformed metadata (is misparsing it OK? is throwing some random unhelpful error OK?); obviously, the more care you take over it, the slower things will be. If you are OK with "we make no guarantees for malformed metadata", then the second patch at #281 (comment) should basically be usable as-is. Likewise, it is up to you to decide how to handle distributions with different non-normalized names but identical normalized names. If you're fine with confusing them, then the first patch in the linked issue should likewise be close to usable.
Given that most metadata is mechanically generated, I'm okay with weak error handling, but I also have a good deal of respect for regularity in parsing. That is, I'd like to avoid routines that are heavily imperative and difficult to reason about. The normalized-names challenge concerns me more, mainly because it demands consideration for distributions that don't present normalized names at all. A normalized name is currently not part of the protocol; it is simply an implementation detail of PathDistributions. Distributions from another source might not have a normalized name at all. I'm less worried about uniqueness variance between PathDistributions with normalized and non-normalized names. If there's a difference, that's going to cause trouble elsewhere. Also, relying on the normalized name as found in the filesystem adds a new dependency on that form, making it more difficult to change that implementation detail later. The only specified, reliable place to retrieve the name is the metadata. This makes me wonder whether another approach could optimize loading the proper name from the distribution. I think it's anything but straightforward.
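To make the normalized-name concern concrete, here is a minimal sketch of how distinct raw names can collapse to the same normalized name. It assumes the PEP 503 normalization rule; the project's own normalization of filesystem names may differ, and `normalize` and `find_collisions` are hypothetical helpers, not part of this library.

```python
import re
from collections import defaultdict


def normalize(name):
    # Assumption: PEP 503 normalization (runs of "-", "_", "." become "-",
    # then lowercase). The project's internal rule may not match this.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_collisions(names):
    # Group raw names by normalized form and keep only the groups where
    # more than one distinct raw name maps to the same normalized name.
    buckets = defaultdict(set)
    for name in names:
        buckets[normalize(name)].add(name)
    return {key: sorted(raw) for key, raw in buckets.items() if len(raw) > 1}


collisions = find_collisions(["Foo.Bar", "foo-bar", "foo_bar", "baz"])
```

Deduplicating on the normalized key alone would treat all three `foo` variants as one distribution, which is the confusion discussed above.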
OK, let's deal with the simpler (first) problem first. I'd say the simple parser I wrote in the other thread is really as simple as it gets ("skip empty lines; if a line starts with a bracket it's a new group (record it); else it should have shape '{key} = {value}' and corresponds to a
In #281, this project added support for uniqueness of distributions when parsing entry points. This change degraded performance when parsing entry points (due to the need to load and inspect the metadata for every project). In that PR, a couple of suggestions were made to improve performance:
Rely on a custom, optimized parser instead of ConfigParser for parsing the entry points themselves.
Let's consider those two suggestions.
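For illustration, a custom parser along the lines described in the comments (skip empty lines, treat a bracketed line as a new group, otherwise expect `key = value`) might look like the sketch below. This is not the patch from #281; `parse_entry_points` is a hypothetical name, and the sketch makes no guarantees for malformed input and ignores comment lines and other ConfigParser features.

```python
def parse_entry_points(text):
    """Return (group, name, value) tuples from entry_points.txt-style text."""
    entries = []
    group = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith("["):
            # A bracketed line opens a new group; remember it.
            group = line[1:].rstrip("]").strip()
            continue
        # Otherwise the line should have the shape "key = value".
        name, sep, value = line.partition("=")
        if not sep or group is None:
            continue  # malformed line: silently ignored in this sketch
        entries.append((group, name.strip(), value.strip()))
    return entries


sample = (
    "[console_scripts]\n"
    "foo = pkg.mod:main\n"
    "\n"
    "[gui_scripts]\n"
    "bar=pkg.gui:run [extra]\n"
)
parsed = parse_entry_points(sample)
```

Whether silently dropping malformed lines is acceptable is exactly the error-handling trade-off raised in the comments; a stricter variant could raise instead.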