Preserve Global IDs #341
Yes, I do use it when pulling data from Internal into an FGDB and it works as advertised: global ids are preserved. Thanks for getting this going. I think preserving these ids is a huge step in getting us to the next step with data collaboration.
Looks like we may run into issues if the source dataset doesn't have the appropriate index on the global id field. Here's my test:

```python
import arcpy

dest = r'W:\forklift\samples\data\sampledestination.gdb\Counties'
source = r'W:\forklift\samples\data\internal.sde\SGID.BOUNDARIES.Counties'

arcpy.env.preserveGlobalIds = True

print('truncating')
arcpy.TruncateTable_management(dest)
print('appending')
arcpy.Append_management(source, dest, 'NO_TEST')
```

which threw this error:
Maybe a try/catch that searches for the error number and retries the append without the env variable set? I also realized that forklift only uses truncate table when it's wired up to change detection. For everything else, it uses update and insert cursors, so I ran this test:

```python
import arcpy

dest = r'W:\forklift\samples\data\sampledestination.gdb\Counties'
source = r'W:\forklift\samples\data\internal.sde\SGID.BOUNDARIES.Counties'

arcpy.env.preserveGlobalIds = True

print('truncating')
arcpy.TruncateTable_management(dest)

print('insert cursor')
fields = [f.name for f in arcpy.ListFields(source) if not f.name.startswith('Shape')] + ['Shape@']
source_rows = arcpy.da.SearchCursor(source, fields)
with arcpy.da.InsertCursor(dest, fields) as cursor:
    for row in source_rows:
        cursor.insertRow(row)

source_rows.reset()
print(next(source_rows))
print(next(arcpy.da.SearchCursor(dest, fields)))
```

which didn't throw any errors, but the global ids were not preserved:
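The try/catch-and-retry idea suggested above could be sketched roughly like this. Everything here is hypothetical: the helper name, the injected callables, and the error code (`'003381'` is a stand-in for whatever error number the failed append actually reports) are assumptions, not forklift's real API. With arcpy, `do_append` would wrap `arcpy.Append_management` and `set_preserve_ids` would set `arcpy.env.preserveGlobalIds`.

```python
def append_with_retry(do_append, set_preserve_ids, index_error_code='003381'):
    """Try the append with preserveGlobalIds on; if it fails with the
    missing-unique-index error, retry once with the setting off.

    do_append and set_preserve_ids are injected (hypothetical) so the retry
    logic can be illustrated without an arcpy install; index_error_code is an
    assumed stand-in for the real error number.
    """
    set_preserve_ids(True)
    try:
        do_append()
    except Exception as error:
        # only swallow the specific missing-index error; re-raise anything else
        if index_error_code not in str(error):
            raise
        set_preserve_ids(False)
        do_append()
```

The dependency injection is only there to keep the sketch self-contained; in forklift the two callables would close over the real source and destination paths.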
That makes sense. Speed bump. I wonder if we iterate through all the layers in Internal where GlobalID has no unique index and then assign a unique index to that field. Also, should we ensure all layers in Internal have a GlobalID?
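The iterate-and-index idea could look something like this sketch. The three callables are hypothetical stand-ins for the real arcpy calls (roughly `arcpy.ListFeatureClasses`, `arcpy.ListIndexes`, and `arcpy.AddIndex_management` with the `UNIQUE` option); this is not forklift code.

```python
def ensure_global_id_indexes(list_layers, has_unique_index, add_unique_index):
    """For every layer whose GlobalID field lacks a unique index, add one.

    Returns the list of layers that were fixed. All three callables are
    injected (hypothetical) so the loop can be shown without arcpy.
    """
    fixed = []
    for layer in list_layers():
        if not has_unique_index(layer, 'GlobalID'):
            add_unique_index(layer, 'GlobalID')
            fixed.append(layer)
    return fixed
```

Running this once over Internal would make the preserve-ids environment variable safe to set for every subsequent append, at least for layers that already carry a GlobalID field.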
If you did that, the truncate and load update methods would keep creating larger and larger ids. I think some of our workflows would need to change.
Creating larger objectids?
Global ids. Which kind of makes them useless, right?
Yeah, I see your point: if the data is updated in Internal via truncate and load, then the global ids continually get recreated, making them only useful within the current update cycle. Hmm...

I'm trying to tie this idea of consistent global ids across datasets (Internal, AGOL/OpenData, fgdb downloads, etc.) into your chalkdust remake, as well as into promoting more collaboration/feedback with our users: using a stable id to keep us all on the same page as to what feature we're all talking about.

For the roads data, the global id works because I'll be preserving the global ids in Internal based on what's in UTRANS, where they are generated. This will allow us to have a consistent and stable global id across UTRANS, Internal, and the NG911 datasets. This global id is used in the NG911 error reporting, and it helps tie the feedback/fixes back to the source UTRANS data. It would be nice if the roads global id was consistent in AGOL/OpenData as well. Again, thinking about feedback and collaboration in the AGOL/OpenData world: users can recommend changes/fixes, and we can tie them back to the UTRANS segment based on the global id.

Longwinded, I know, but your point is taken. Aside from the roads data, the current update process of truncate and load kind of throws a monkey wrench into this for most other datasets. Granted, it would still be nice to preserve the ids (keep them consistent between Internal and AGOL/OpenData) for the roads dataset, as those will remain stable. To make that happen, I guess that means we'd have to ensure that all existing data in Internal that has a GlobalID field has a unique index on it, right? That would prevent the preserve ids environment variable from erroring, right (assuming the roads data is updated in AGOL/OpenData via the append method @stdavis described above, and not the insert cursor method)?
I think that there is a possibility of making this work for roads, at least in the short term. I think that we should add some checking to forklift for global id fields and the required index, and then apply the preserve ids env var for each append individually. This wouldn't fix it for stuff that's updated via the insert cursor, but that definitely doesn't apply to roads or anything else coming out of Internal. I did verify that global ids are preserved when forklift pushes the data to AGOL, by checking that they match between features in the forklift staging FGDBs and the corresponding hosted feature layer in AGOL. I'll work on a PR to make this happen.
We are running into this same issue in the WRI forklift instance. Global ids are not being preserved in the Range Trends data (SiteInfo). This is a problem because they are used to link to the range trends site (e.g. …). This instance does not use change detection. I wonder if the fix is to apply the same logic used in change detection (forklift/src/forklift/change_detection.py, lines 71 to 84 in d415ed4) to the core forklift update (line 105 in d415ed4).
Could we calculate a field with those values as a string or something to preserve them for now? |
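That fallback might be sketched like this: before a truncate-and-load, copy each GlobalID into an ordinary text field so the old values survive the reload. The field names and the rows-as-dicts shape are hypothetical; with arcpy this would be an `arcpy.da.UpdateCursor` over `['GlobalID', 'GlobalID_str']`, and the text field would need to be added to the schema first.

```python
def snapshot_global_ids(rows, source_field='GlobalID', text_field='GlobalID_str'):
    """Copy global id values into a plain text field so a truncate-and-load
    can't regenerate them away. Rows are dicts here (a hypothetical shape,
    standing in for an arcpy update cursor); field names are assumptions.
    """
    for row in rows:
        row[text_field] = str(row[source_field])
    return rows
```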
I think that could be a good fallback but I think that I've come up with a solution. I'm about to submit a PR. |
From the illustrious @gregbunce:
I wonder if this setting should be enabled globally in the base forklift project, or just in the AGOL pallet.
The docs mention that this setting only applies to enterprise geodatabase data when using the append tool. I'm not sure if that refers to the input, output, or both parameters. Forklift does use the append tool to update destination data.
@gregbunce Have you tried this setting when using the append tool to pull data from internal into a FGDB?