Simpler #11
This sounds reasonable. I am 🆗 with those changes. I am wondering about something, though: since each language implementation defines its own `maxValue`, what happens when an ID generated in one implementation is decoded in another? For example, using the default config in the Haskell version I can generate an ID that the JavaScript version cannot decode.
Update: The idea below probably doesn't solve anything, since the fundamental problem here is more about integer precision, i.e., unless the number is encoded as a string, there would need to be a suitable data type (a big-integer type, say) available on the decoding side. So maybe not. 🤔 But nevertheless, it could be nice to distinguish, when decoding, between invalid IDs and IDs that are (technically) valid but too large for the language to handle. Would it be possible, and useful, to support arbitrarily large numbers by introducing a preliminary step such as the following:
The number in my previous example would first be split into chunks small enough for every implementation to handle. With the default alphabet, each chunk is then encoded as a separate ID. These IDs are then combined into the final ID, intercalating a dedicated separator character.
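A minimal sketch of the chunk-and-intercalate idea described above. This is not the Sqids algorithm: a toy base-26 converter stands in for the real encoder, and the `SEPARATOR`, `ALPHABET`, and `MAX_SAFE` names are assumptions chosen for illustration only (the 2^53 − 1 limit mirrors JavaScript's safe-integer boundary).

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # stand-in alphabet (assumption)
SEPARATOR = "-"                          # hypothetical chunk separator
MAX_SAFE = 2**53 - 1                     # e.g. JavaScript's largest safe integer

def to_base(n: int) -> str:
    """Toy encoder: plain base-26 conversion (not the Sqids algorithm)."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, r = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def from_base(s: str) -> int:
    """Inverse of to_base."""
    n = 0
    for ch in s:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n

def encode_large(n: int, limit: int = MAX_SAFE) -> str:
    """Split n into chunks that each fit below `limit`, encode each chunk
    separately, and intercalate the separator."""
    chunks = []
    while True:
        n, r = divmod(n, limit)
        chunks.append(r)
        if n == 0:
            break
    return SEPARATOR.join(to_base(c) for c in reversed(chunks))

def decode_large(id_: str, limit: int = MAX_SAFE) -> int:
    """Reassemble the original number from the separated chunks."""
    n = 0
    for part in id_.split(SEPARATOR):
        n = n * limit + from_base(part)
    return n

big = 2**80 + 12345  # too large for a 64-bit (or JS Number) consumer
assert decode_large(encode_large(big)) == big
```

The decoding side only ever sees per-chunk values below `limit`, which is what would let a small-integer language at least parse the pieces, at the cost of a reserved separator character and longer IDs.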
@peterhellberg Cool.

@miquelfire Question for you: for the PHP version, either one of the two extensions is required, but we still use …

@laserpants Yeah, good thought process. This was actually considered in the very first version of Hashids: what if we split up large numbers and use a dedicated separator? But then the cost slowly started outweighing the benefit.

That's why we kinda rolled back to keeping the lib clean and simple, and letting the user handle the upper limits of ints.

Edit: And to answer your question about the Haskell version generating an ID that JS can't decode: to me it doesn't feel like it should be the library's issue, because if the user has two systems (one encoding via Haskell, the other decoding via JS), they should account for scenarios like these on their end. It would be our problem if we didn't communicate clearly what each language supports.
Good morning everyone (it's morning here in Japan).

Regarding …: as a further extension, I had considered that it might be interesting to "specify a switch that represents the maximum value or range of values during configuration". For example, if you set …

I will check the new code and specifications when I have time.
Thanks for clarifying this @4kimov. I had a feeling this was something you'd thought about already, and I agree that it should be up to the user to account for these inconsistencies. But doesn't this mean that we still need to make it possible to distinguish between invalid IDs and IDs that fail to decode because of overflow? What if the implementation of `decode` …
In client code, one could then do:
If this creates a performance hit, then another option is to not do this check in `decode` itself, but …
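The distinction being proposed above can be sketched as two separate error types raised during decoding. This is an illustrative stand-in, not the Sqids decoder: the base-26 loop, the exception names, and the 2^53 − 1 `MAX_VALUE` are all assumptions made for the example.

```python
class InvalidId(ValueError):
    """The ID contains characters the encoder could never have produced."""

class NumberOutOfRange(OverflowError):
    """The ID is well-formed, but decodes to a value above this platform's limit."""

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
MAX_VALUE = 2**53 - 1  # assumed per-implementation limit (e.g. JavaScript)

def decode(id_: str, max_value: int = MAX_VALUE) -> int:
    """Toy base-26 decode that separates the two failure modes."""
    n = 0
    for ch in id_:
        if ch not in ALPHABET:
            raise InvalidId(f"character {ch!r} not in alphabet")
        n = n * len(ALPHABET) + ALPHABET.index(ch)
        if n > max_value:
            raise NumberOutOfRange("technically valid, but too large here")
    return n
```

In client code one could then catch `InvalidId` and `NumberOutOfRange` separately, which is exactly the distinction between "algorithmically invalid" and "valid but overflowing" discussed in this thread. The overflow check costs one comparison per character, which is where the performance concern above comes from.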
I'm a bit concerned about this turning into an XY problem: do we even want to be able to decode IDs from any unknown source? My understanding is that you are primarily meant to decode "your own" IDs, making the cross-language argument a bit less critical (since you then fully control what numbers you are encoding into your IDs).

Also, we are still talking about unsigned integers, right? From the point of view of Go, that would mean being able to encode/decode `uint64` values.

Another approach would be to decide (in the spec) on some lowest common denominator for the maximum value all implementations should be able to encode/decode, and then not support languages with smaller (unsigned) integer types. This is not an avenue I'd prefer to go down, but thought I should mention it at least.
@peterhellberg: One example would be a solution stack where the back-end application generates IDs from (potentially) large integers, and these IDs are then passed via the URL down to a front-end app written in another language. Clearly, there is a bug here if the back-end is generating IDs from integers that are too large for the front-end to handle, but it is still useful to be able to distinguish between this situation and IDs that are algorithmically invalid.
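For context on the size of the cross-language gap above, the two limits most often contrasted in this thread can be written out directly. The Go/JavaScript pairing is just an example; the constants themselves are the standard ones for `uint64` and for IEEE-754 doubles.

```python
GO_MAX_UINT64 = 2**64 - 1  # Go: largest uint64
JS_MAX_SAFE = 2**53 - 1    # JavaScript: Number.MAX_SAFE_INTEGER

# Any value in this gap encodes fine on a uint64 back-end but cannot
# round-trip through a JavaScript Number on the front-end:
gap_example = JS_MAX_SAFE + 1
assert JS_MAX_SAFE < gap_example <= GO_MAX_UINT64
```

The gap spans eleven orders of magnitude, which is why "document each implementation's limit" keeps coming up as the practical answer.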
This is absolutely genius and perfect. I'm totally on board. It solves precisely the few qualms I had with the previous algorithm (i.e. rough min-length handling and growing blocked-ID re-generation), and does so brilliantly. Thank you @4kimov! Pure perfection. Removing …

I'll start re-working sqids-dotnet as soon as this gets finalized.
@antimon2 Ok, I see. Yes, it sounds like …

@miquelfire I see, makes sense. Sounds like from the 32/64 argument alone, the PHP implementation might benefit from exposing …

@laserpants Before we discuss overflow, I figured it's important to clarify: in the playground, when it says "Error: Invalid ID", that's a very poor description of a message that comes from re-encoding the decoded numbers and seeing that the result doesn't match the input ID. Basically a simple way of saying "something went wrong, but we don't know what".

@peterhellberg Like @laserpants mentioned, I've also seen people talk about how they might encode with one language and decode with another. I wouldn't be surprised to find these use-cases in larger systems.

@laserpants Thanks for the code samples, they were helpful. First of all, I don't think the …

If I were to design a split system like the above, and my decoding microservice got the "Number out of range" error message, there's nothing I'd be able to do programmatically to address the issue (other than notifying APM / DevOps). That would mean my encoding microservice went over the boundaries of what my decoding microservice is capable of handling, which theoretically is my fault for designing it that way. IMO, the proper way to design it would be to set an upper limit on the encoding microservice outside of the library, up to the number the decoding microservice can support.

So, the above makes me think the distinction would be helpful in finding out what the error is, but not much else. Which kind of circles me back to proper documentation on int limits. I'd appreciate some feedback on this in case I'm missing the point.

@aradalvand Thank you for the kind words, and the efforts.
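The re-encode check mentioned above (the source of the playground's "Error: Invalid ID" message) can be sketched as follows. The toy base-26 codec here is a stand-in for Sqids; the point is only the mechanics of the check: decode, re-encode, and compare against the input string.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def encode(n: int) -> str:
    """Toy base-26 encode (stand-in for the real Sqids encoder)."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, len(ALPHABET))
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def decode(id_: str) -> int:
    """Toy base-26 decode; accepts any string over the alphabet."""
    n = 0
    for ch in id_:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n

def is_canonical(id_: str) -> bool:
    """True iff re-encoding the decoded value reproduces the input exactly.
    Several strings can decode to the same value; only one is canonical."""
    return encode(decode(id_)) == id_

assert is_canonical("ba")       # decodes to 26, which re-encodes to "ba"
assert not is_canonical("aba")  # also decodes to 26, but with a leading pad
```

As noted above, a failed comparison only tells you "something went wrong", not what: a corrupted ID, a non-canonical spelling, and (in a fixed-width language) silent overflow during decode all surface the same way.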
@4kimov My point was not that you would use a single language, but rather that you would be able to control the range of numeric values encoded into the IDs used in your own system, even if using more than one language.
With the logic we're using for the PHP library (I just looked at the code), JavaScript might want the `BigInt` equivalent.
@peterhellberg Ah I see, sorry, I misunderstood. Good, thanks for clarifying.

@miquelfire Yeah, `BigInt` in JS is something I might need help with.

Just pushed a few more cleanup items:
I've reviewed the new code and specifications thoroughly, and I largely agree with them. I also understand and concur with the philosophy that the upper limit of the value should be set outside the library.

@4kimov There's just one point I'd like to raise regarding the code (and the specifications presented at the beginning of this issue). There's a mention of increasing the min-length limit (1_000 for example), and I've noticed that the magic number is set in more than one place.
@antimon2 Thank you for reviewing. Do you mean that the variable should be set in one place for the purposes of the spec's readability and cleanliness, or for the purposes of exporting it to the end user so they have access to it?
@4kimov It's the former. I think there's no need to export it (I forgot to mention that earlier).
@antimon2 Ah ok. Yeah, that spec repo is optimized for readability, so it'll have a few awkward places, I'm sure. I cleaned it up a bit (it's set twice only because I don't want to export it from the library and give the impression it's needed externally).
@4kimov I misunderstood one thing. The sqids-spec code is merely a specification, so it doesn't necessarily prioritize practicality or performance, correct?

One more point to confirm, regarding this item from the list:

> 7. Increase min length limit from alphabet-length to an arbitrary value

It adds "(1_000 for example)", but as I've reiterated, this is just an example, right? Is there a need to set a specific value? My understanding is that the practical upper limit for …
@antimon2 My train of thought was simple: I can't imagine anyone needing super-long IDs, especially since the library's goal is to generate short IDs, not long ones. So I'm not even sure why someone would set it to 1_000 (to me personally, it becomes useless after 6 or 8). We can certainly entertain the idea of raising it or dropping it, but what would be the benefit?
@4kimov I see no benefit in increasing the limit; I never said "it would be better to increase it." The benefit of removing the check is that it simplifies the specification, although that benefit might be minimal. I also can't imagine using this tool to generate long IDs. My main points were:

Especially regarding the latter, there used to be a need for the check since …

If the approach is "let's just start with 1_000 (or whatever number you guys think is more appropriate) and see how it goes", that works for me too.
I also think it makes sense to just drop the check as it's technically no longer necessary — whereas it used to be in the previous algorithm. Otherwise, any number we choose would be arbitrary and subject to "why this one?". To my understanding, there's not really an actual limit as far as the new algorithm is concerned, so the check just seems like some residue from the previous algorithm, which we could simply get rid of.
Well, the benefit to dropping it is that it would simplify the spec + code, no?
@antimon2 Sorry, didn't mean to ignore your questions. Ultimately, your last sentence was what I was trying to convey: "let's just start with 1_000 (or whatever number you guys think is more appropriate) and see how it goes". I also like 255 more, because it can fit into a single byte (`u8`).
Nope. Was just a nice round number.
I'll answer these two together, as these are the things I'm concerned about:
Anyway, as far as anyone asking "why such a number?" goes, I don't have a problem answering that it was chosen as a reasonable default for a limit. For now, I'll lower the number to 255.

At the end of the day, if users find some legit reason to need a bigger limit, it won't be that much work for all of us to raise it to …
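The appeal of 255 as the limit, as discussed above, is that it is the largest value of an unsigned 8-bit integer, so implementations in languages with a `u8` type can store the option without widening. A minimal sketch of the resulting option validation (the function name, message, and accepted range are assumptions, not the spec's actual code):

```python
MIN_LENGTH_LIMIT = 255  # fits in a single unsigned byte (u8)

def check_min_length(min_length: int) -> None:
    """Reject minLength values outside the assumed 0..255 range."""
    if not isinstance(min_length, int) or not (0 <= min_length <= MIN_LENGTH_LIMIT):
        raise ValueError(f"minLength must be between 0 and {MIN_LENGTH_LIMIT}")

check_min_length(8)  # typical short-ID setting: accepted
try:
    check_min_length(1_000)  # the old example value: now rejected
except ValueError:
    pass
```

Raising the limit later (say to a `u16` range) would then be a one-constant change per implementation, consistent with the "see how it goes" approach above.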
@4kimov Thank you for your clear and detailed response. For the Julia implementation I'm working on, I will proceed with …

Just for reference: in Julia, there is no explicit limit on string length; the documentation mentions it can handle a string "as long as memory permits". Indeed, if one tries to generate a string of an exceedingly large size (for example, 2^53B = 8PB), the library will reject it and throw an …
ref: [Simpler · Issue #11 · sqids/sqids-spec](sqids/sqids-spec#11)
Good. Thank you all for the comments and the feedback. I'll merge these changes to the main branch.

I'll update the implementations that I've done over the next few days; feel free to join me with yours. Thank you for your continued work and efforts.

To make it a bit easier: the majority of the changes can be seen in this commit: dbb119a (the rest are minor changes that can be seen in the main branch).

P.S: @laserpants I'll close this issue for now, but let me know if you'd like to discuss max integer encoding further.
@aradalvand @laserpants @antimon2 @peterhellberg Minor FYI: feel free to remove the "uniques" test from the individual repos (if you'd like). I've noticed most of us have been lowering it just so development goes quicker. I've made the spec's uniques test a bit more comprehensive, and it runs for a while. Since it tests the algorithm itself, it should be safe to remove from individual implementations.
Since the initial library has been out for a few weeks, I got to mess around with it a bit more and realized there are a few rough edges we could iron out:
`minValue` and `maxValue` are not that useful, let's remove? (cc @peterhellberg)

Pros:

Cons:

Unknowns:
Current: https://sqids.org/playground (algorithm code / explanation)
Proposed: https://sqids.org/playground/simpler (algorithm code / explanation)
P.S: Also worth noting that I have no intention of changing the algorithm over and over; I figured this might be cleaner + address some current issues/questions, while we're still technically mostly pre-prod.
Thoughts/feedback?
cc @laserpants @vinkla @niieani @antimon2