Potential conflict for non alphabetic character leading schema #294

Closed
chris920820 opened this issue Jul 8, 2020 · 1 comment

Contributor

chris920820 commented Jul 8, 2020

Hey @xitongsys!
As of a7314a1, it seems we add the prefix P_ to any schema name that starts with a non-alphabetic character.
However, if a Parquet file has both columns P__x and _x, the two names collide, because we can no longer tell whether data came from P__x or _x. The convention is also hard for consumers to discover. For example, a consumer that expects a column named _x and tries to read it by that name will fail, because the name has been converted internally to P__x.
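
To make the collision concrete, here is a minimal sketch. `toGoFieldName` is a hypothetical helper that only mimics the rule as described above (prefix P_ when the first character is not a letter); it is not the library's actual code.

```go
package main

import (
	"fmt"
	"unicode"
)

// toGoFieldName is a hypothetical stand-in for the renaming rule discussed
// in this issue: a name whose first character is not a letter gets "P_"
// prepended. It is an assumption for illustration, not the library's code.
func toGoFieldName(column string) string {
	r := []rune(column)
	if len(r) > 0 && !unicode.IsLetter(r[0]) {
		return "P_" + column
	}
	return column
}

func main() {
	// seen maps a generated field name back to the column that produced it.
	seen := map[string]string{}
	for _, col := range []string{"P__x", "_x"} {
		name := toGoFieldName(col)
		if prev, ok := seen[name]; ok {
			fmt.Printf("conflict: columns %q and %q both map to %q\n", prev, col, name)
			continue
		}
		seen[name] = col
	}
}
```

Under this assumed rule, P__x is left unchanged and _x becomes P__x, so the second column is reported as a conflict.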

Is there somewhere that enforces this naming convention (no leading non-alphabetic character)? Does the Go compiler enforce it anywhere?

Is there a way to handle this more gracefully? To avoid non-alphabetic leading characters in variable names, could we add a global prefix to every column instead of prefixing only certain columns?
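
A rough sketch of the global-prefix idea, assuming a hypothetical prefix constant and helper names (`globalPrefix`, `toFieldName`, `toColumnName` are made up for illustration and are not part of the library). Because every column gets the same prefix, the mapping is one-to-one and can be reversed by stripping the prefix:

```go
package main

import (
	"fmt"
	"strings"
)

// globalPrefix is a hypothetical prefix applied to every column name.
// It starts with an uppercase letter, so the result is a valid exported
// Go identifier whenever the rest of the name uses identifier characters.
const globalPrefix = "P_"

// toFieldName prefixes every column name unconditionally.
func toFieldName(column string) string {
	return globalPrefix + column
}

// toColumnName recovers the original column name from a field name.
func toColumnName(field string) string {
	return strings.TrimPrefix(field, globalPrefix)
}

func main() {
	for _, col := range []string{"P__x", "_x", "x"} {
		field := toFieldName(col)
		fmt.Printf("%q -> %q -> %q\n", col, field, toColumnName(field))
	}
}
```

With this scheme P__x maps to P_P__x and _x maps to P__x, so the two columns stay distinct. The trade-off is that every field name changes, not just the unusual ones.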

xitongsys added a commit that referenced this issue Sep 10, 2020
@xitongsys
Owner

Hi @chris920820,
Sorry for the late response.
For now I have only mitigated this issue in pull request #310 and added a note to the README.

@xitongsys xitongsys reopened this Sep 10, 2020
durango pushed a commit to edms/parquet-go that referenced this issue Apr 14, 2021
durango pushed a commit to edms/parquet-go that referenced this issue Apr 14, 2021
zolstein pushed a commit to zolstein/parquet-go that referenced this issue Jun 23, 2023
…itongsys#289)

* refactor packages to use encoding.Values container

* refactor page and dictionary creation to use encoding.Values

* go vet fix

* reduce memory footprint of encoding.Values

* refactor encoding.Encoding to use simple Go types

* port parquet-go package to use pair of values+offsets to represent byte arrays

* add fuzz tests back

* optimize DELTA_LENGTH_BYTE_ARRAY decoding (xitongsys#291)

* optimize DELTA_LENGTH_BYTE_ARRAY decoding

* add link to online documentation

* fix

* add a unit test for decodeByteArrayLengths

* Update encoding/delta/length_byte_array_amd64.s

Co-authored-by: Kevin Burke <[email protected]>

* optimize DELTA_LENGTH_BYTE_ARRAY encoding (xitongsys#292)

Co-authored-by: Kevin Burke <[email protected]>

* account for size of offsets buffer when benchmarking throughput

* optimize DELTA_BYTE_ARRAY decoding (xitongsys#294)

* PR feedback

Co-authored-by: Kevin Burke <[email protected]>