improve handling of charset during import and export of shapefiles #216

Merged
merged 3 commits into from
Mar 13, 2017

Conversation

nprigour
Contributor

@nprigour nprigour commented Mar 7, 2017

Allows setting a default charset via a system variable, which is then used during shapefile import into uDig.
Provides the ability to set the desired charset during export of shapefiles in uDig (see attached screenshot).
charsethanding-export

Signed-off-by: Nikolaos Pringouris [email protected]

@fgdrf
Contributor

fgdrf commented Mar 8, 2017

Thanks for this improvement, I am going to test it in the next few days.

Triggered a build : https://hudson.locationtech.org/udig/job/uDig-PR-NG/37/ (login with your Eclipse Account)

@nprigour Could you have a look at the docs, if and how this Operation can co-exist

and P_DEFAULT_CHARSET preference value  

Signed-off-by: Nikolaos Pringouris <[email protected]>
@nprigour
Contributor Author

nprigour commented Mar 8, 2017

Hi Frank,
I was not aware that this preference value also controls the default shp file encoding. Anyway, I have amended the pull request with a proposal on how to address the coexistence of these variables. The addition concerns only the import of shp files and suggests that the system variable takes precedence over the UI preference option (which is quite rational, since the UI option affects every aspect of the UI, which we usually want to be UTF-8, while shp files usually use another encoding).

Setting the desired charset during the export operation is not affected by the issue you mention or by the amendment (the initial enhancement provides the ability to export using a specific charset and does not touch the source shapefile).
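
The precedence proposed above for import (system variable first, then the UI preference) can be sketched roughly as follows. This is only an illustration, not the code from the pull request: the class `ImportCharsetPrecedence`, its `resolve` method, and the explicit fallback parameter are hypothetical names introduced here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the uDig code base. */
final class ImportCharsetPrecedence {

    /**
     * Picks the charset for shapefile import.
     * The system variable wins over the UI preference; the fallback is used
     * only when neither of them is set.
     */
    static Charset resolve(Optional<Charset> systemVariable,
                           Optional<Charset> uiPreference,
                           Charset fallback) {
        return systemVariable.orElse(uiPreference.orElse(fallback));
    }

    public static void main(String[] args) {
        // Assumed situation: the UI preference is UTF-8, the system variable is Greek.
        Charset chosen = resolve(
                Optional.of(Charset.forName("ISO-8859-7")),
                Optional.of(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println("Import charset: " + chosen);
    }
}
```

Export is deliberately left out of this sketch, since the export charset is chosen separately and does not depend on this precedence.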

@fgdrf fgdrf added this to the uDig-2.0.0 milestone Mar 10, 2017
@fgdrf
Contributor

fgdrf commented Mar 11, 2017

I tested a bit .. and have some questions:

  • Which system variable can I use to switch the default charset?
  • Why is that extra option required if a system variable is responsible for charset handling on import/export?
  • If it is changed (from any default) to a specific charset, why should the user change it for export only? On the other hand, why can't the user choose the charset during import (Add Layer)?
  • If exported in a charset other than the default, how can I choose the charset during re-import of the exported shapefile?

@fgdrf
Contributor

fgdrf commented Mar 11, 2017

It would be great if we had a sample/test dataset to test charset handling.
@nprigour Could you provide a shapefile with demo data and strings in a different charset?

@nprigour
Contributor Author

Please find the answers below:

  • What System-Variable I can use to switch the default charset? --> The variable is shp.encoding. You can specify it as a Java start option in the .ini file, e.g. -Dshp.encoding=iso-8859-7
  • Why is that extra option required if a system-variable is responsible for charset handling on import/export? --> I do not know. Is this UI option used only for shp file encoding? Looking at it, it is not obvious whether it refers to shp encoding only or can also be used in other cases.
  • If it is changed (from any default) to a specific, why should the user change it for export only. On the other hand, why can't the user choose the charset during import (Add Layer)? --> This is very common. For example, in our application the MySQL database layers used UTF-8 encoding, while we wanted to export shp data in ISO-8859-7 (Greek) encoding, since other applications (ArcMap) did not understand UTF-8. Choosing the charset during import can be a good option but requires further implementation work.
  • If exported in an other charset than the default, how can I choose charset during re-import of the exported shapefile? --> You can't. In that case you should explicitly change it via the Operations-->Change Character Set command, which is available by right-clicking on the shapefile row in the Catalog view.
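
For readers unfamiliar with Java start options: a value passed as -Dshp.encoding=iso-8859-7 becomes visible to the application as a system property. A minimal sketch of reading and validating such a value is shown below; the class `ShpEncodingProperty` and its methods are hypothetical and only illustrate the mechanism, they are not the code added by this pull request.

```java
import java.nio.charset.Charset;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the uDig code base. */
final class ShpEncodingProperty {

    /** Name of the system variable mentioned in the answers above. */
    static final String KEY = "shp.encoding";

    /** Reads the start option, e.g. -Dshp.encoding=iso-8859-7. */
    static Optional<Charset> read() {
        return parse(System.getProperty(KEY));
    }

    /** Converts a charset name to a Charset; empty if missing or unusable. */
    static Optional<Charset> parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Charset.forName(value.trim()));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Run with: java -Dshp.encoding=iso-8859-7 ShpEncodingProperty.java
        System.out.println(read().map(Charset::name).orElse("shp.encoding not set"));
    }
}
```

Returning an empty result for a missing or invalid value is a design choice of this sketch; it keeps the existing behavior untouched when the option is not specified.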

To summarize, we defined the shp.encoding variable because we wanted to ensure that multiple users (in their local installations) always use a specific encoding when importing shp files into uDig (that is the main reason for setting this variable). We could not rely on the UI option, since most of the time users forgot to change it and we ended up with erroneously encoded shp data. In any case, shp.encoding affects the encoding only if it is specified as a Java startup option. So I think it is just another feature that imposes a certain encoding during import and suggests an encoding during export. If it is not specified, the existing behavior remains unaffected. I am attaching 2 example shp files using ISO-8859-7 and UTF-8 encoding (check the address field).
shp_encodings.zip

@fgdrf
Contributor

fgdrf commented Mar 12, 2017

Very interesting! The pull request provides an option to set the encoding for the Shapefile-Writer. I expect that, for example, String attributes are written with this encoding.

Whenever these exported resources are imported again, shp.encoding determines which encoding is used; if it is not set, the default ISO8859-1 is used to read elements from dbf files. The user has the chance to change it, let's say to UTF-8, with the Operation on the Catalog entry in the Catalog view.
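
Why the re-import charset matters can be shown with plain strings, without any shapefile library. The sketch below assumes a Greek attribute value written with ISO-8859-7 and then read back once with ISO8859-1 (the default described above) and once with the charset used for export. The class `DbfRoundTripDemo` and the sample value are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo for illustration; it does not read or write real dbf files. */
final class DbfRoundTripDemo {

    /** Stands in for the writer: turns an attribute value into raw bytes. */
    static byte[] encode(String value, Charset charset) {
        return value.getBytes(charset);
    }

    /** Stands in for the reader: interprets raw bytes with the given charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        Charset greek = Charset.forName("ISO-8859-7");
        // Greek sample word, written with Unicode escapes to keep the source ASCII-only.
        String address = "\u039F\u03B4\u03CC\u03C2";

        byte[] exported = encode(address, greek);

        // Re-import with the default: every byte is read as a Latin-1 character.
        String withDefault = decode(exported, StandardCharsets.ISO_8859_1);
        // Re-import after switching to the charset that was used for export.
        String withGreek = decode(exported, greek);

        System.out.println("default matches original: " + address.equals(withDefault));
        System.out.println("greek matches original:   " + address.equals(withGreek));
    }
}
```

The bytes on disk are the same in both cases; only the interpretation changes, which is why changing the character set on the Catalog entry after import can repair the displayed values.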

Two things and it is ready for merge:

  • Could you update the docs for the export functionality as well?
  • I tested on Mac OS X and the property was not shown because the columns were pretty wide; could you improve this too?

@nprigour
Contributor Author

I have updated the documentation.
However, since I do not have a Mac, I cannot address your second comment. On Windows and Linux everything seems to be perfectly visible.

@fgdrf fgdrf merged commit ea45b5f into locationtech:master Mar 13, 2017
@fgdrf
Contributor

fgdrf commented Mar 13, 2017

Agreed, we can address the column width independently of this pull request. I really appreciate the documented shp.encoding property! Many thanks for this improvement!

@fgdrf fgdrf requested review from fgdrf and removed request for fgdrf March 13, 2017 08:13