
Python integers turn into Numpy integers when fetched. #690

Closed
mohammadbashiri opened this issue Nov 6, 2019 · 11 comments


@mohammadbashiri

My table has a non-primary-key column of type longblob that stores dictionaries. Some of the values inside the dictionary are Python integers. However, when fetched, these Python integers come back as NumPy integers. It would be helpful if the types were preserved between insert and fetch.

DataJoint version = 0.12.0

Below is a toy example:

@schema
class MyTable(dj.Manual):
    definition = """
    prim_key: varchar(32)
    ---
    attr: longblob
    """

MyTable().insert1(dict(prim_key='0', attr=dict(a=1)))

print(type(MyTable().fetch1()['attr']['a']))
# the output is <class 'numpy.int64'>
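The distinction matters because NumPy integer scalars are not instances of the built-in int in Python 3, even though they compare equal to native ints. A standalone illustration, independent of DataJoint:

```python
import numpy as np

x = 1            # native Python int, as inserted
y = np.int64(1)  # what comes back from fetch in the report above

print(isinstance(x, int))  # True
print(isinstance(y, int))  # False: np.int64 does not subclass int in Python 3
print(x == y)              # True: the values still compare equal
```

This is why code that checks types strictly (as PyTorch does for some arguments, per the comment below) can break on fetched values.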
@dimitri-yatsenko
Member

That's true. Python 3 integers have arbitrary precision, and we chose to encode them as 64-bit integers. Hm... let's consider whether it makes sense to encode them as arbitrary-precision integers.

@dimitri-yatsenko
Member

We are also missing the native float and complex. Hm... is it okay to just rely on the NumPy datatypes?

@mohammadbashiri
Author

mohammadbashiri commented Nov 19, 2019

This problem came to my attention when I was using a DJ table to store the configuration (i.e., args and kwargs) for training a neural net, and I was getting an error for the batch size: PyTorch will only accept a native integer as the batch-size value. Of course, the workaround is trivial, and in most cases NumPy vs. native int does not even matter. But the main issue I am pointing to here is the inconsistency between what goes into and what comes out of a DJ table, and its dependency on assumptions about how third-party libraries interpret datatypes. This can end up forcing changes in non-DJ-related parts of the code because of DJ.

At the moment we use a function that takes the fetched entry from the DJ table and converts any NumPy int it finds to a native int, which avoids changing the code in other non-DJ-related parts.
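A minimal sketch of such a workaround, assuming the fetched attribute is a (possibly nested) structure of dicts, lists, and tuples; the name `to_native` is illustrative, not part of DataJoint:

```python
import numpy as np

def to_native(obj):
    """Recursively convert NumPy scalars in a fetched structure to native Python types."""
    if isinstance(obj, dict):
        return {k: to_native(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_native(v) for v in obj)
    if isinstance(obj, np.generic):  # covers np.int64, np.float64, np.bool_, ...
        return obj.item()            # .item() returns the equivalent native scalar
    return obj

cfg = {"batchsize": np.int64(128), "lr": np.float64(1e-3)}
clean = to_native(cfg)
print(type(clean["batchsize"]))  # <class 'int'>
```

`np.generic` is the common base class of all NumPy scalar types, so one isinstance check handles ints, floats, bools, and complex values alike.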

@guzman-raphael
Collaborator

@mohammadbashiri Thank you for your report. While we work to accommodate such use cases, it might interest you to have a look at Adapted Types, a new feature released in preview in 0.12.x. It allows you to cast any type you wish into an appropriate DJ type.

Though formal documentation is still forthcoming, you may quickly see how it works by following along this Jupyter Notebook: https://github.com/datajoint/dj-python-101/blob/master/ch1/Adapted-Types.ipynb. Since the feature is in preview, do make sure to set the environment variable DJ_SUPPORT_ADAPTED_TYPES=TRUE prior to working with the feature.

@ixcat
Contributor

ixcat commented Nov 19, 2019

@guzman-raphael Adapted types is a good suggestion, but this issue points at lower-level serialization concerns with blobs themselves that may need investigation as well.

@guzman-raphael
Collaborator

@ixcat Certainly agree. I merely offered it as a short-term suggestion to fill the gap while we prioritize this issue in our development schedule.

@dimitri-yatsenko
Member

ok, let's add support for the native int, float, and complex.

@dimitri-yatsenko
Member

The same thing happens to bool.

@dimitri-yatsenko
Member

Here is the big question. For native int, do we need to support very large values that don't fit in int64?

@eywalker
Copy link
Contributor

While I personally feel that int64 is good enough, I would be curious to see what actual unbounded-int support in serialization would look like.

@dimitri-yatsenko
Copy link
Member

Length followed by bytes. It would just require encoding the length first. I suspect few would get mad if we raised an error for numbers that don't fit in int64.
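The length-prefixed scheme described above can be sketched in plain Python; the names `pack_bigint`/`unpack_bigint` and the exact layout (sign byte, 4-byte length, little-endian payload) are illustrative assumptions, not DataJoint's actual blob format:

```python
def pack_bigint(x: int) -> bytes:
    """Serialize an arbitrary-precision int: sign byte, then length, then payload bytes."""
    sign = b"\x01" if x < 0 else b"\x00"
    magnitude = abs(x)
    # At least one payload byte, even for zero.
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8 or 1, "little")
    return sign + len(payload).to_bytes(4, "little") + payload

def unpack_bigint(buf: bytes) -> int:
    """Inverse of pack_bigint."""
    sign = -1 if buf[0] == 1 else 1
    n = int.from_bytes(buf[1:5], "little")
    return sign * int.from_bytes(buf[5:5 + n], "little")
```

Python's built-in `int.to_bytes`/`int.from_bytes` make the payload step trivial, which is presumably why the encoding "would just require encoding the length first."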

guzman-raphael added a commit that referenced this issue Jan 14, 2020
fix #690 -- blob packing/unpacking of native python bool, int, float, and complex.