
Python integers turn into Numpy integers when fetched. #690

Closed
mohammadbashiri opened this issue Nov 6, 2019 · 11 comments


@mohammadbashiri

My table has a non-primary-key column of type longblob that stores dictionaries. Some of the values inside the dictionary are Python integers. However, when fetched, these Python integers come back as NumPy integers. It would be helpful if the types were preserved between insert and fetch.

DataJoint version = 0.12.0

Below is a toy example:

@schema
class MyTable(dj.Manual):
    definition = """
    prim_key: varchar(32)
    ---
    attr: longblob
    """

MyTable().insert1(dict(prim_key='0', attr=dict(a=1)))

print(type(MyTable().fetch1()['attr']['a']))
# the output is <class 'numpy.int64'>
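The distinction matters because NumPy integer scalars are not instances of the built-in int in Python 3, even though they compare equal to native ints. A standalone illustration, independent of DataJoint:

```python
import numpy as np

x = 1            # native Python int, as inserted
y = np.int64(1)  # what comes back from fetch in the report above

print(isinstance(x, int))  # True
print(isinstance(y, int))  # False: np.int64 does not subclass int in Python 3
print(x == y)              # True: the values still compare equal
```

This is why code that checks types strictly (as PyTorch does for some arguments, per the comment below) can break on fetched values.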
@dimitri-yatsenko
Member

That's true. Python 3 integers have arbitrary precision, and we chose to encode them as 64-bit integers. Hm... let's consider whether it makes sense to encode them as arbitrary-precision integers.

@dimitri-yatsenko
Member

We are also missing the native float and complex. Hm... is it okay to just rely on the NumPy datatypes?

@mohammadbashiri
Author

mohammadbashiri commented Nov 19, 2019

This problem came to my attention when I was using a DJ table to store the configuration (i.e., args and kwargs) for training a neural net, and I was getting an error for the batch size: PyTorch will only accept a native integer as the batch-size value. Of course, the workaround is trivial, and in most cases NumPy vs. native int does not even matter. But the main issue I am pointing to here is the inconsistency between what goes into and what comes out of a DJ table, and its dependency on assumptions about how third-party libraries interpret datatypes. This can end up forcing changes in non-DJ-related parts of the code because of DJ.

At the moment we use a function that takes the fetched entry from the DJ table and converts any NumPy int it finds to a native int, which avoids changing the code in other non-DJ-related parts.
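A minimal sketch of such a workaround, assuming the fetched attribute is a (possibly nested) structure of dicts, lists, and tuples; the name `to_native` is illustrative, not part of DataJoint:

```python
import numpy as np

def to_native(obj):
    """Recursively convert NumPy scalars in a fetched structure to native Python types."""
    if isinstance(obj, dict):
        return {k: to_native(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_native(v) for v in obj)
    if isinstance(obj, np.generic):  # covers np.int64, np.float64, np.bool_, ...
        return obj.item()            # .item() returns the equivalent native scalar
    return obj

cfg = {"batchsize": np.int64(128), "lr": np.float64(1e-3)}
clean = to_native(cfg)
print(type(clean["batchsize"]))  # <class 'int'>
```

`np.generic` is the common base class of all NumPy scalar types, so one isinstance check handles ints, floats, bools, and complex values alike.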

@guzman-raphael
Collaborator

@mohammadbashiri Thank you for your report. While we work to accommodate such use cases, it might interest you to have a look at Adapted Types, a new feature released in preview in 0.12.x. It allows you to cast any type you wish into an appropriate DJ type.

Though formal documentation is still forthcoming, you may quickly see how it works by following along this Jupyter Notebook: https://github.com/datajoint/dj-python-101/blob/master/ch1/Adapted-Types.ipynb. Since the feature is in preview, do make sure to set the environment variable DJ_SUPPORT_ADAPTED_TYPES=TRUE prior to working with the feature.

@ixcat
Contributor

ixcat commented Nov 19, 2019

@guzman-raphael Adapted types is a good suggestion, but this issue points at lower-level serialization concerns with blobs themselves that may need investigation as well.

@guzman-raphael
Collaborator

@ixcat Certainly agree. I merely offered it as a short-term suggestion to fill the gap while we prioritize this issue in our development schedule.

@dimitri-yatsenko
Member

ok, let's add support for the native int, float, and complex.

@dimitri-yatsenko
Member

The same thing happens to bool.

@dimitri-yatsenko
Member

Here is the big question. For native int, do we need to support very large values that don't fit in int64?

@eywalker
Copy link
Contributor

While I personally feel that int64 is good enough, I would be curious to see what actual unbounded-int support in serialization would look like.

@dimitri-yatsenko
Copy link
Member

Length followed by bytes. It would just require encoding the length first. I suspect few would get mad if we raised an error for numbers that don't fit in int64.
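The length-prefixed scheme described above can be sketched in plain Python; the names `pack_bigint`/`unpack_bigint` and the exact layout (sign byte, 4-byte length, little-endian payload) are illustrative assumptions, not DataJoint's actual blob format:

```python
def pack_bigint(x: int) -> bytes:
    """Serialize an arbitrary-precision int: sign byte, then length, then payload bytes."""
    sign = b"\x01" if x < 0 else b"\x00"
    magnitude = abs(x)
    # At least one payload byte, even for zero.
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8 or 1, "little")
    return sign + len(payload).to_bytes(4, "little") + payload

def unpack_bigint(buf: bytes) -> int:
    """Inverse of pack_bigint."""
    sign = -1 if buf[0] == 1 else 1
    n = int.from_bytes(buf[1:5], "little")
    return sign * int.from_bytes(buf[5:5 + n], "little")
```

Python's built-in `int.to_bytes`/`int.from_bytes` make the payload step trivial, which is presumably why the encoding "would just require encoding the length first."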

guzman-raphael added a commit that referenced this issue Jan 14, 2020
fix #690 -- blob packing/unpacking of native python bool, int, float, and complex.