pandas insert should use `index=False` to prevent extra `index` field generation #666

ixcat · 2019-10-01T22:33:10Z

No description provided.

ixcat · 2019-10-02T17:57:03Z

conversation snippet:

eywalker: we'd actually want to do to_records() after we reset index, I'd say
index=False simply drops the index field if I recall correctly and that we don't want
so something like rows.reset_index().to_records()
maybe there is also a smart option in to_records() to do this in one step
chris: would reset_index ‘do’ for user-created DataFrames? (or conversely, for DJ fetched ones?)
eywalker: I'd say so. Most people don't even have any index
and for ones returned by DJ fetch, the DataFrame has the primary keys as multi-index, and this certainly needs to go through reset_index()
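
To make the trade-off in this snippet concrete, here is a minimal pandas-only sketch. The frame `fetched` is a hypothetical stand-in for what DJ fetch returns (primary keys as a multi-index), and the column names are made up for illustration. One thing it suggests: reset_index() on its own leaves a fresh default index, which to_records() emits as a field named `index`, so the two steps likely need to be combined.

```python
import pandas as pd

# Hypothetical frame shaped like a fetch(format='frame') result:
# the primary key attributes form a MultiIndex (names are illustrative).
fetched = pd.DataFrame(
    {"subject": [1, 1], "session": [1, 2], "score": [0.5, 0.7]}
).set_index(["subject", "session"])

# Default conversion keeps the named index levels as record fields.
default_names = fetched.to_records().dtype.names

# index=False alone drops the index, and with it the primary key.
dropped_names = fetched.to_records(index=False).dtype.names

# reset_index() moves the key back into columns, but the frame then has a
# fresh default index, which to_records() emits as a field named 'index'.
reset_names = fetched.reset_index().to_records().dtype.names

# Combining both keeps the key columns and omits the synthetic index.
combined_names = fetched.reset_index().to_records(index=False).dtype.names

for label, names in [
    ("to_records()", default_names),
    ("to_records(index=False)", dropped_names),
    ("reset_index().to_records()", reset_names),
    ("reset_index().to_records(index=False)", combined_names),
]:
    print(f"{label}: {names}")
```

Only the last variant yields exactly the key and value columns for this kind of frame.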

dimitri-yatsenko · 2019-11-19T18:49:34Z

Will you please describe the scenario in a bit more detail? When is the bothersome extra index field generated? And why is it a problem?

ixcat · 2019-11-19T20:06:31Z

The script below illustrates an insert that should work but instead results in a KeyError with 'unknown column index'. This should be fixed in some manner that does not break Dest.insert(Source.fetch(format='frame')).

I don't know all of the situations where an index is generated (I'm not so familiar with Pandas here). It's a problem because we claim to support pandas insert, and dealing with indexes correctly will likely be a common scenario. It would be best if we could transparently strip any notion of index and still allow pandas insert.

#! /usr/bin/env python

import datajoint as dj
import numpy as np
import pandas as pd

from code import interact

dj.config['enable_python_native_blobs'] = True


schema = dj.schema('test_pd_insert')


@schema
class TestPdInsert(dj.Manual):
    definition = """
    desc:    char(16)
    ---
    data:    longblob
    """



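def test_dataframe_index_strip_fails():
    # A user-created DataFrame has a default index; to_records() turns it
    # into an extra `index` field that is not in the table heading.
    try:
        TestPdInsert.insert(
            pd.DataFrame(data=[('error', np.zeros(10, dtype=np.int64))],
                         columns=['desc', 'data']))
    except KeyError as e:
        if '`index` is not in the table heading' in repr(e):
            print('fails as expected')
        else:
            raise  # some other KeyError: do not hide it


if __name__ == '__main__':
    test_dataframe_index_strip_fails()
    # drop into an interactive shell to inspect the table
    interact('test_dataframe_index_strip_fails', local=locals())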
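
The source of the extra field can also be seen without a database connection. The following is a minimal sketch that builds the same frame as the script above and compares the record field names against the two attributes in the table definition. The `heading` tuple is a hand-written stand-in for the table heading, not a DataJoint object.

```python
import numpy as np
import pandas as pd

# Attribute names from the table definition above (hand-written stand-in
# for the real table heading).
heading = ("desc", "data")

# Same user-created frame as in the script: no explicit index is given,
# so pandas assigns a default one.
frame = pd.DataFrame(
    data=[("error", np.zeros(10, dtype=np.int64))],
    columns=["desc", "data"],
)

# to_records() includes the index by default, as a field named 'index'.
records = frame.to_records()
extra = [name for name in records.dtype.names if name not in heading]

print(records.dtype.names)
print("fields not in heading:", extra)
```

The leftover `index` entry is the field the insert rejects.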

ixcat · 2020-04-14T22:46:22Z

todo: reconfirm, report back

ixcat · 2020-04-22T17:56:55Z

confirmed still valid. will adjust per comments above, ensure pandas test coverage for this case

ixcat · 2020-04-30T23:51:00Z

Currently running with:

diff --git a/datajoint/table.py b/datajoint/table.py
index e9bacd4..be79832 100644
--- a/datajoint/table.py
+++ b/datajoint/table.py
@@ -193,7 +193,9 @@ class Table(QueryExpression):
         """
 
         if isinstance(rows, pandas.DataFrame):
-            rows = rows.to_records()
+            rows = rows.reset_index(
+                drop=isinstance(rows.index, pandas.RangeIndex)).to_records(
+                    index=False)
 
         # prohibit direct inserts into auto-populated tables
         if not allow_direct_insert and not getattr(self, '_allow_insert', True):  # allow_insert is only used in AutoPopulate

It would be good to get more confirmation from pandas users that this makes sense. It also probably needs a cross-check on composite keys.

For the single-key example (see attached), user-created frames without an explicit index get a RangeIndex, which can then be dropped by reset_index, allowing to_records to work with index=False. DJ-created frames do not have a RangeIndex, so their index is reset into regular columns and the leftover default index is then dropped within the to_records call.

See attached for a test/interaction example (rename to .py if needed).
issue666.txt
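
The conversion in the diff above can be tried in isolation. Below is a minimal sketch: `frame_to_records` is a hypothetical helper that copies the patched expression (it is not part of DataJoint), and the frames are made-up examples. It also shows one case that may deserve the extra confirmation asked for above: an unnamed index that is not a RangeIndex is kept as a column named `index`.

```python
import pandas as pd


def frame_to_records(rows):
    """Hypothetical helper that mirrors the patched conversion."""
    # Drop the index only when it is the default RangeIndex; any other
    # index is assumed to carry data and is moved back into columns.
    drop = isinstance(rows.index, pd.RangeIndex)
    return rows.reset_index(drop=drop).to_records(index=False)


# User-created frame without an explicit index (gets a RangeIndex).
user_frame = pd.DataFrame({"desc": ["a", "b"], "value": [1, 2]})

# Frame keyed on a column, similar in shape to a fetched frame.
keyed_frame = user_frame.set_index("desc")

# User-created frame with explicit, unnamed integer labels.
labeled_frame = pd.DataFrame(
    {"desc": ["a", "b"], "value": [1, 2]}, index=[10, 20]
)

user_names = frame_to_records(user_frame).dtype.names
keyed_names = frame_to_records(keyed_frame).dtype.names
labeled_names = frame_to_records(labeled_frame).dtype.names

print(user_names, keyed_names, labeled_names)
```

The first two frames convert to the same two fields, while the third still carries an `index` field, so whether that case should be handled is a separate judgment call.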

dimitri-yatsenko · 2020-05-16T00:10:47Z

Fixed with #776

eywalker added this to the Release 0.12.6 milestone Apr 14, 2020

ixcat self-assigned this Apr 14, 2020

ixcat mentioned this issue Apr 30, 2020

datajoint/table.py: smarter dataframe conversion (#666) #776

Merged

eywalker self-assigned this May 8, 2020

ixcat added a commit to ixcat/datajoint-python that referenced this issue May 15, 2020

tests/test_relation.py: add user-created pd.DataFrame test for datajo…

a3b7c65

…int#666

ixcat changed the title ~~[dev] pandas insert should use index=False to prevent extra index field generation~~ pandas insert should use index=False to prevent extra index field generation May 15, 2020

eywalker added a commit that referenced this issue May 15, 2020

Merge pull request #776 from ixcat/issue-666

518b882

datajoint/table.py: smarter dataframe conversion (#666)

dimitri-yatsenko closed this as completed May 16, 2020

dimitri-yatsenko modified the milestones: Release 0.12.6, Release 0.13.0 May 16, 2020
