-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TypeError when trying to sort a string of a number and a string #7
Comments
I was expecting this bug report sooner or later... The problem unfortunately is that Python3.x changed how it handles sorting objects of different (and unorderable) type. In Python2.x, if
It was decided that for Python3.x, instead of sorting objects that have no intrinsic order but something arbitrary like the name of the object type, it returns an error. The thinking is that if you are trying sort objects that have no intrinsic order, then you are probably doing something wrong and you should modify your input data. (We could argue if this is the correct approach or not, but either way we must live with this reality). This affects
Because of this fact, Python3.x is going to complain if you try and sort I am reluctant to try and modify I am open to suggestions on how to approach this problem. Perhaps you have a more specific example of what your problem is that can more easily be solved? Otherwise, I am afraid that the best solution is to think about what data you are actually trying to sort (as apparently the Python developers feel is the better approach). It might be possible that for this particular set of data |
We are sorting user input, names of files/directories. They are best off naturally-sorted — something Windows XP, Mac OS X and other modern OSes do. They can be anything. |
I don't understand the problem. If you want the same behaviour as in python 2, then just make it int < float < string, it's not arbitrary (or at least not more arbitrary than it was in python 2 :-) |
Unfortunately, implementing the Python2.x behavior on Python3.x will not be easy. The comparisons between the objects are done inside the Python3.x also did away with the |
You can use key(). For example, you can always compare tuples. The behaviour of sorted() is to compare elements left-to-right, and decide on the first one that's not equal. So, to compare floats and strings, you could compare ('string', '1') to ('float', 2.0). You can even generate the first element by using type() |
Are you saying that I should take the type of the first element in the tuple and place that in front? I ask because |
I think the first option you give would make it work pretty much like python 2 :-) |
Another option might be to make the following transformations: I'll give both options a try, and see which is most efficient. Thanks for all your input! |
OK, I have uploaded a new version of natsort (3.2.0, commit e69383f) that passes the unit tests that I have created to stress the unorderable types issue. Thanks for all the suggestions! I was definitely going to try to solve this in a much more convoluted manner before this discussion. Can you please try this out and let me know if it solves your problems? |
It looks sane: >>> natsort.natsorted(['a2', 'a1', 'a20', 'a11', 'a', '1', '11', '2'])
['1', '2', '11', 'a', 'a1', 'a2', 'a11', 'a20'] The same output is produced by natsort 3.1.1 in Python 2.7. All seems well. |
@SethMMorton cool, thanks for fixing this! |
I'm sorry to say this issue is not fixed, at least for certain pathological cases: For the minimal test-case: import natsort
def go():
print("Natsort version:", natsort.__version__)
data = ['c1 -', 'c1-1']
data = natsort.natsorted(data)
if __name__ == "__main__":
go() The output is:
Under python 2.7:
|
I don't have an opportunity to fix now (I'm at work and don't have access to python3), but I know what is going on and will be able to fix it (the previous fix only solved the problem if it occurred at the front of the string, but this issue is in the middle of the string). Having said that, in your case you might reconsider what it is that you are trying to sort, and consider using one of the natsort options to tell it to interpret your data differently. As you well know because you read the documentation, natsort assumes by default that the The reason I suggest that the problem is less the >>> a = ['c1 - 4', 'c1 -', 'c1-5', 'c1-1']
>>> natsort.natsorted(a)
['c1-5', 'c1-1', 'c1 -', 'c1 - 4'] I'll wager that you would not have wanted the >>> natsort.natsorted(a, signed=False)
['c1 -', 'c1 - 4', 'c1-1', 'c1-5'] The In summary, I will fix natsort so that the behavior on python3.x mimics that on python2.x, but I would reconsider the data that you are trying to sort and possibly pre-process it so that you are comparing apples to apples when the sorting occurs. |
I have made the fix. Using your test file, I get the following output:
Unittests have been added to test this functionality. |
Cool. To be honest, for certain edge-cases like this, I don't care too much if the sort is correct, just that it actually sorts. In this case I'm sorting file-names for presentation to the user, and if they have to deal with a slightly awkward sort order because they're using bizarre ordering, that's fine. |
I'm not sure if you're interested, but I wrote a little fuzz-tester to stress test import string
import natsort
import random
import traceback
random.seed()
def go():
loops = 0
while 1:
stLen = random.randint(3,20)
str1 = random.sample(string.printable,stLen)
str2 = str1.copy()
for dummy_x in range(random.randint(1,5)):
str2[random.randint(0,stLen-1)] = random.sample(string.printable, 1)[0]
str1 = "".join(str1)
str2 = "".join(str2)
try:
natsort.natsorted([str1, str2])
except KeyboardInterrupt:
print("Breaking")
break
except:
traceback.print_exc()
print("Str1 '%s'" % str1)
print("Str2 '%s'" % str2)
break
loops += 1
if loops % 5000 == 0:
print("Looped", loops)
if __name__ == "__main__":
go() It basically generates a random string, permutes a copy of the string, then tries to sort the two strings. 3 million sorted items, and no crashes yet. It's not ideal, but it should probably turn up edge-cases like the one I encountered eventually. |
I really like that. As of right now, I use doctest to unit test natsort, and things are starting to get a little out of hand. I've been meaning to switch to a legit unit test framework at some point, and with the inclusion of a stress-tester like this natsort could really be rock-solid. I really appreciate you writing this and letting me know... I can sleep better knowing that it hasn't failed after about 3 million items. I can add it to a more proper test suite by just copying your example, or if you want to make a pull request that's cool too. |
I actually left it running for a while, and it made it to ~35 million items before I got bored and stopped it. It doesn't seem like it's really worth bothering pull requesting for, just feel free to copy & paste it wherever. I'm not sure how one would integrate a actual random-fuzzer into a unit-testing system. They're kind of intrinsically non-repeatable, so they're not guaranteed to catch issues. I guess you could set it up so it just runs a number of iterations. You would need some mechanism to log the string that triggers failure, though. |
The fix for issue #7 (preventing the "unorderable types" TypeError between numbers and strings on Python 3) can raise an "unorderable types" TypeError between str and bytes on Python 3 if LOCALE is enabled. This is because the strxfrm function converts str to bytes when performing the string collation. This is because the empty string that is inserted for numbers is of the str type but did not get sent to strxfrm and thus did not get converted to bytes. The fix was to add either an empty bytes string or str string depending on if LOCALE is enabled or not. This fixes #22.
Python 3.3 and Python 3.4 fail to sort collections that contain strings that are also numbers:
This works just fine in Python 2.7.6.
via getnikola/nikola#1275 (cc @devurandom)
The text was updated successfully, but these errors were encountered: