Upgrade WDC connector to support Tableau v2 #1777

Closed
WinnyTroy opened this issue Jan 31, 2020 · 7 comments · Fixed by #1880

@WinnyTroy
Contributor

WinnyTroy commented Jan 31, 2020

Environmental Information

production, stage, local

Expected behavior

The open-data endpoint should be able to import data into Tableau for visualization.

Additional Information

A WDC (Web Data Connector) is an HTML page with JavaScript code that connects to web data (for example, by means of a REST API), converts the data to a JSON format, and passes the data to Tableau.

The current connector was created using WDC version 1.x. This connector does not work with newer versions of Tableau.

To support newer versions of Tableau, we could:

  1. Upgrade the connector to support WDC v2.x as explained here

Unfortunately, we might need to keep the version 1.x connectors, i.e. not modify the open-data endpoint, if we hope to support earlier versions of Tableau.

  2. Alternatively, based on the discussion on Add Form Schema Endpoint #1296, I believe we could use the same connector but define our schema not only from the XForm spec but also from the OSM data. This approach is untested at the moment, but it could maybe help us maintain support for all versions of Tableau.

I feel this issue could be handled adequately by implementing the 2nd option. Are there any reservations I might have missed? Hoping to get people's thoughts on this.
Cc: @ukanga @pld @ivermac @lincmba @DavisRayM

Aha! Link: https://ona.aha.io/features/PROD-172

@pld
Member

pld commented Jan 31, 2020

This analysis makes sense to me

  • if our clients are using them, we want to keep support for earlier Tableau versions; can check w/@msschroeder on that

Next step sounds like testing that the 2nd option would work, and if so we can commit to it, unless anyone sees a problem w/that

@ukanga
Member

ukanga commented Apr 3, 2020

With the schema:

['_duration',
 '_id',
 '_id',
 '_media_all_received',
 '_media_count',
 '_notes',
 '_review_comment',
 '_review_status',
 '_submission_time',
 '_submitted_by',
 '_tags',
 '_total_media',
 '_uuid',
 '_version',
 'meta_instanceID',
 'sec1_county',
 'sec1_hh_num',
 'sec1_location',
 'sec1_state']

And the row data having

['_attachments',
 '_bamboo_dataset_id',
 '_duration',
 '_edited',
 '_geolocation',
 '_id',
 '_last_edited',
 '_media_all_received',
 '_media_count',
 '_notes',
 '_status',
 '_submission_time',
 '_submitted_by',
 '_tags',
 '_total_media',
 '_uuid',
 '_version',
 '_xform_id',
 '_xform_id_string',
 'formhub_uuid',
 'meta_deprecatedID',
 'meta_instanceID',
 'sec1_county',
 'sec1_hh_num',
 'sec1_location',
 'sec1_state']

I understand that this difference is what causes the failure in Tableau. If we had more column headers than row headers (the reverse of the above), would we still experience the issue?
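The mismatch between the two lists above can be checked mechanically. The following is a minimal sketch that uses only the column names quoted in this comment; the variable names are illustrative and are not taken from the codebase.

```python
from collections import Counter

# Column headers reported by the schema (copied from the comment above).
schema_columns = [
    '_duration', '_id', '_id', '_media_all_received', '_media_count',
    '_notes', '_review_comment', '_review_status', '_submission_time',
    '_submitted_by', '_tags', '_total_media', '_uuid', '_version',
    'meta_instanceID', 'sec1_county', 'sec1_hh_num', 'sec1_location',
    'sec1_state',
]

# Keys present in a row of data (copied from the comment above).
row_keys = [
    '_attachments', '_bamboo_dataset_id', '_duration', '_edited',
    '_geolocation', '_id', '_last_edited', '_media_all_received',
    '_media_count', '_notes', '_status', '_submission_time',
    '_submitted_by', '_tags', '_total_media', '_uuid', '_version',
    '_xform_id', '_xform_id_string', 'formhub_uuid', 'meta_deprecatedID',
    'meta_instanceID', 'sec1_county', 'sec1_hh_num', 'sec1_location',
    'sec1_state',
]

# Keys the rows carry that the schema does not declare.
only_in_rows = sorted(set(row_keys) - set(schema_columns))
# Columns the schema declares that the rows do not carry.
only_in_schema = sorted(set(schema_columns) - set(row_keys))
# Column names listed more than once in the schema.
duplicated = sorted(n for n, count in Counter(schema_columns).items() if count > 1)

print("in rows but not in schema:", only_in_rows)
print("in schema but not in rows:", only_in_schema)
print("duplicated in schema:", duplicated)
```

Running this shows ten row keys with no matching column, two schema columns with no matching row key, and one column name that the schema lists twice.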

I still recommend that we use the column headers as defined by the header. We may need to add some additional columns (meta_deprecatedID) and perhaps remove others (underscore fields perhaps with the exception of _id) as they may not be necessary. We can achieve returning the correct data by relying on a serializer at the data endpoint that only returns JSON data fields with only the defined columns. We can and should use serializer classes for this work which we may either extend to support dynamic fields or create dynamic serializer classes based on the headers generated for the form.
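To illustrate the idea of returning only the defined columns, here is a minimal sketch in plain Python. In the project this would presumably live in a serializer class at the data endpoint; `restrict_to_columns` is a hypothetical helper, and filling absent columns with `None` is an assumption, not something decided in this thread.

```python
from typing import Any, Dict, Iterable


def restrict_to_columns(row: Dict[str, Any], columns: Iterable[str]) -> Dict[str, Any]:
    """Return a new dict holding exactly the keys named in `columns`.

    Keys in `row` that are not declared columns are dropped. Declared
    columns missing from `row` are filled with None (an assumption), so
    every output row has the same shape as the column headers.
    """
    return {name: row.get(name) for name in columns}


# Example row with one undeclared key (_attachments) and one declared
# column that is absent from the data (_review_status).
row = {"_id": 1, "_attachments": [], "sec1_state": "x"}
columns = ["_id", "sec1_state", "_review_status"]
filtered = restrict_to_columns(row, columns)
print(filtered)
```

The point of the design is that the column headers become the single source of truth: the row data is shaped to match them rather than the other way round.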

I disagree that we should pick columns from the first row of the data, as PR #1686 has done.

@WinnyTroy
Contributor Author

Next step sounds like testing that the 2nd option would work, and if so we can commit to it, unless anyone sees a problem w/that

Davis and I looked into using OSM data to generate the column schema and found that not all forms have data generated for them in the OSM model.

Only submissions that had geom data collected here had OSM data generated.

@WinnyTroy
Contributor Author

WinnyTroy commented Apr 6, 2020

@ukanga @ivermac

If we have more column headers than row headers (if the above was reversed), would we still experience the issue?

I don't believe we would. Previously, we experimented with defining the column headers using all the fields in the row headers. With this implementation, we were able to create a connection in Tableau.

We can achieve returning the correct data by relying on a serializer at the data endpoint that only returns JSON data fields with only the defined columns. We can and should use serializer classes for this work which we may either extend to support dynamic fields or create dynamic serializer classes based on the headers generated for the form.

I believe this could be handled by the open-data endpoint. Testing this on stage by manually including the fields {'_xform_id', 'meta_deprecatedID', 'formhub_uuid', '_xform_id_string'}, as was done here, creates a Tableau connection for Tableau v2019.1.14. So including these columns in the column_headers seems to work.

I'm not sure whether other fields are required to get this running; I'm testing it with a couple of forms at the moment.
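As a rough sketch of what adding those fields to the column headers could look like, the helper below merges extra field names into an existing header list. `extend_headers` is a hypothetical name, and dropping repeated names is an assumption made here for tidiness; this thread does not establish whether duplicates matter to Tableau.

```python
from typing import Iterable, List


def extend_headers(headers: Iterable[str], extra: Iterable[str]) -> List[str]:
    """Append `extra` field names to `headers`, keeping first occurrences only.

    Existing headers keep their order. Extra fields are appended in
    sorted order so the result is deterministic even when `extra` is a set.
    """
    seen = set()
    merged = []
    for name in list(headers) + sorted(extra):
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged


# The four fields that were added manually on stage.
extra_fields = {'_xform_id', 'meta_deprecatedID', 'formhub_uuid', '_xform_id_string'}
column_headers = extend_headers(['_id', '_uuid', 'sec1_state'], extra_fields)
print(column_headers)
```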

@ukanga
Member

ukanga commented Apr 6, 2020

I fail to see how OSM fields have anything to do with this issue. Not all forms have OSM fields, so this may not be applicable.

@ukanga
Member

ukanga commented Apr 6, 2020

I am open to reviewing another PR that addresses this issue adequately. PR #1686 is not the answer.

@WinnyTroy
Contributor Author

WinnyTroy commented May 20, 2020

Noting here the pending issues with the Tableau connector:

  1. Data collected within nested repeat groups is not downloaded. This is shown as null in Tableau.
  2. No data is downloaded for select_multiple questions. These also appear as null.
  3. Data does not appear in the same order as the questions in the XLSForm.
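For the third item, one possible approach is to sort the column headers by the position of each question in the form. This is only a sketch under assumptions: `order_columns` is a hypothetical helper, the question order used in the example is made up for illustration, and placing columns that are not form questions (such as metadata fields) at the end is a choice made here, not a decision from this thread.

```python
from typing import Iterable, List


def order_columns(columns: Iterable[str], question_order: Iterable[str]) -> List[str]:
    """Sort `columns` by each name's position in `question_order`.

    Columns that do not appear in `question_order` are placed after all
    form questions. sorted() is stable, so those columns keep their
    original relative order.
    """
    position = {name: index for index, name in enumerate(question_order)}
    return sorted(columns, key=lambda name: position.get(name, len(position)))


# Illustrative question order; the real order would come from the XLSForm.
form_questions = ['sec1_hh_num', 'sec1_county', 'sec1_state']
ordered = order_columns(['_id', 'sec1_state', 'sec1_county', 'sec1_hh_num'], form_questions)
print(ordered)
```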
