Custom Data Exports #2881

Ig-Rebollo · 2020-11-13T18:49:37Z

Goals

Ensure that exported data only includes relevant columns to conform with OSM requirements.
Create different data exports for ‘private’ and ‘public’ data, which are used by different audiences.

Background

There are columns that are irrelevant for OSM, both as part of the survey and as metadata (start, username, deviceid). For OSM-compatible output, exports should always drop the metadata columns and have a way to filter out questions/columns that are not meant to be imported, as well as a toggle to exclude the select_multiple split columns.
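As a rough illustration of the filtering described above, here is a minimal sketch. It is not KPI's or formpack's actual API: `filter_export_row`, its parameters, and the `<question>/<choice>` naming of split columns are assumptions made for the example; only the metadata names (start, username, deviceid) come from this issue.

```python
# Hypothetical helper; not part of KPI or formpack.
METADATA_COLUMNS = {"start", "username", "deviceid"}  # metadata named in this issue


def filter_export_row(row, excluded_questions=frozenset(),
                      select_multiple_questions=frozenset(),
                      drop_split_columns=True):
    """Return a copy of `row` keeping only the columns meant for an OSM import."""
    kept = {}
    for column, value in row.items():
        # Always drop metadata and any question the user marked as excluded.
        if column in METADATA_COLUMNS or column in excluded_questions:
            continue
        # Assumption: split select_multiple columns are named "<question>/<choice>".
        is_split = any(column.startswith(q + "/") for q in select_multiple_questions)
        if drop_split_columns and is_split:
            continue
        kept[column] = value
    return kept


row = {
    "start": "2020-11-13T18:49:37Z",
    "username": "surveyor1",
    "deviceid": "abc123",
    "amenity": "school",
    "notes": "internal remark",
    "colors": "cyan black",
    "colors/cyan": "1",
    "colors/black": "1",
}
filtered = filter_export_row(
    row,
    excluded_questions={"notes"},
    select_multiple_questions={"colors"},
)
print(filtered)
```

Passing the select_multiple question names explicitly avoids mistaking a group-prefixed header for a split column when groups are included in headers.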

In other circumstances there is a need to share data publicly with information that is non-sensitive, while creating separate data exports with sensitive data.

User stories

As a survey administrator, I’d like to pre-define data exports that I or my colleagues can run regularly but that do not contain sensitive data, in order to protect our beneficiaries.

As a survey administrator, I’d like to create two data sets from one form, with some of the data being sensitive and other data being for OSM and public use. I need to retain the same ID numbers in each export so that we can link public data back to private data when needed. In other situations I simply need to separate OSM data from other survey data that isn’t appropriate for OSM (even though it may not be sensitive). A way to mark which questions or columns contain which type of data for export would help very much.
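The second user story can be sketched as follows, assuming each submission is a dictionary with an ID column. The function name `split_submission`, the `sensitive_columns` parameter, and the `_id` default are hypothetical choices for this example, not an existing interface.

```python
# Hypothetical helper; column names below are made up for the example.
def split_submission(submission, sensitive_columns, id_column="_id"):
    """Split one submission into (public, private) dicts that share the ID column."""
    public = {id_column: submission[id_column]}
    private = {id_column: submission[id_column]}
    for column, value in submission.items():
        if column == id_column:
            continue  # the ID is already present in both exports
        target = private if column in sensitive_columns else public
        target[column] = value
    return public, private


submission = {"_id": 7, "amenity": "clinic", "patient_name": "Jane Doe"}
public, private = split_submission(submission, sensitive_columns={"patient_name"})
print(public)
print(private)
```

Because both outputs carry the same ID value, a public row can be joined back to its private counterpart later.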

Ig-Rebollo · 2020-11-13T19:00:17Z

@tinok

As per our call yesterday with MapKibera for the KoBo/OSM integration, here is the initial proposal for their main request: custom data exports / sharing a subset of columns.

As you can see here I updated the interface slightly, hiding some options under 'advanced' so regular and casual users seeking to export everything as usual don't find it overwhelming. The main ideas we covered are there, though this is just the first idea and we might need to move things around a bit to make sure it makes sense. Hopefully this could give us a general idea so we can work on a timeline, estimate, etc.

You should also be able to click through the prototype yourself using this link: https://www.figma.com/proto/AzJPYVjf2CDBk1PGLv62sC/Project-Downloads?node-id=9%3A3&viewport=428%2C870%2C0.27639657258987427&scaling=min-zoom

cc @jnm @magicznyleszek

tinok · 2020-11-13T19:44:10Z

@Ig-Rebollo I really like this approach. 4 suggestions and 1 question:

1. Move the 'group separator' into the advanced options; it's not that important for most users to change that. (Alternatively you could switch its place with 'include groups in headers'.)
2. Don't show the 'use predefined selection' field unless a custom export exists.
3. Rename 'use predefined selection' to 'Custom exports'.
4. Make the 'save selection as template' field stand out more; it looks like one more thing users can choose to modify their exports, when in reality it is a completely different feature that should be more visible. I also suggest renaming it to 'Save as custom export'.

Q: How does a user delete a custom export?

Ig-Rebollo · 2020-11-13T21:07:38Z

@tinok thanks for the comments.

1. Great, much better.
2. Sure. Didn't want to overcomplicate it, but I think that would make it even more intuitive.
3. No problem.
4. Maybe you are right and it needs to be a button. Otherwise you need to click on the 'export' button for the 'save selection as a template' to actually be saved.

Q: Well, they don't. I didn't think they needed to delete them. It would only be stored for each project, right? So I'm guessing you wouldn't have a big list... but we can add that option as well.

jnm · 2020-11-19T07:00:16Z

For "Don't split select_multiple questions", could we change the label to "Split select_multiple questions" and have it be checked by default?

For what it's worth, I think the behind-the-scenes export machinery (formpack) supports three ways of handling select_multiples:

1. a single column where each cell has all of the selected choices separated by spaces, called summary in the code:

   | which colors? |
   | --- |
   | cyan black |

2. multiple columns, one for each choice, where the cells in each column indicate (with a 1 or 0) whether or not that choice was selected, called details in the code:

   | which colors?/cyan | which colors?/magenta | which colors?/yellow | which colors?/black |
   | --- | --- | --- | --- |
   | 1 | 0 | 0 | 1 |

3. including both (1) and (2) together, called both in the code:

   | which colors? | which colors?/cyan | which colors?/magenta | which colors?/yellow | which colors?/black |
   | --- | --- | --- | --- | --- |
   | cyan black | 1 | 0 | 0 | 1 |

The only one of these that KPI currently offers is (3).
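To make the three modes described above concrete, here is a small standalone sketch. It does not call formpack; the function `export_select_multiple` and its signature are assumptions for illustration, and only the mode names (summary, details, both) come from the comment above.

```python
# Standalone illustration; this is not formpack's implementation.
def export_select_multiple(question, choices, answer, mode="both"):
    """Return (headers, cells) for one select_multiple answer.

    `answer` is the space-separated string of selected choice names.
    """
    if mode not in ("summary", "details", "both"):
        raise ValueError(f"unknown mode: {mode!r}")
    selected = set(answer.split())
    headers, cells = [], []
    if mode in ("summary", "both"):
        # (1) one column holding every selected choice
        headers.append(question)
        cells.append(answer)
    if mode in ("details", "both"):
        # (2) one 1/0 column per choice
        for choice in choices:
            headers.append(f"{question}/{choice}")
            cells.append("1" if choice in selected else "0")
    return headers, cells


choices = ["cyan", "magenta", "yellow", "black"]
for mode in ("summary", "details", "both"):
    print(mode, export_select_multiple("which colors?", choices, "cyan black", mode))
```

A three-valued `mode` like this maps naturally onto a dropdown in the UI, whereas a checkbox can only expose two of the three.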

I believe that the old KoBoCAT double-negative "DONT [sic] split select multiple choice answers into separate columns" (please, let's not do this again 🙏) effectively presents a choice between (3) when unchecked and (1) when checked.

I don't know if anyone would have a use for (2), but it should already work on the back end if we want to make it available.

Ig-Rebollo · 2020-11-20T18:18:04Z

Thanks for the comments @jnm. From my point of view, it doesn't look like there is a meaningful need for (2), so leaving the check/uncheck box for options (1) and (3) still seems like the best way to go. This is a question we could put to users in the forum, though. If we had to provide all 3 options, the UI would get a bit trickier.

@tinok I also updated the animation and prototype with some of the earlier comments:

You should be able to see the updated prototype following the earlier link

tinok · 2020-11-20T20:57:30Z

@Ig-Rebollo To be sure, what @jnm meant is that we have more than an on/off available as options. The current default is in fact (3), while some users may prefer to have only (1) or (2). For example, for our own research we never use the option from (1) but only use (2), so having both isn't necessary.

The question then would be whether this new setting would be a choice between (1) and (2) and not (1) and (3).

Ig-Rebollo · 2020-11-20T21:16:08Z

@tinok understood. I guess what I'm trying to say is that from the UI perspective it doesn't really matter whether we go with 'between (1) and (3)' or 'between (2) and (3)'. If there are any doubts, we could ask users (specifically those interested in this feature) to get a better idea. At this stage what I'm interested to know is if we need to provide 2 options (on/off) or three options, in which case maybe a dropdown would be a better alternative... even though it would be complicated to explain in a clear, intuitive way.

tinok · 2021-01-19T20:18:58Z

@Ig-Rebollo I personally think we should have all three options available. A dropdown could accommodate all three without needing more space than a single checkbox.

Option (3), which is currently our only available option for users, is the least useful, but people may have built workflows around having this available, so we can't do away with it. We could write to our user testers to see whether (1) or (2) should be the default. But for now let's include this element in Figma so the UI development can go ahead.

Ig-Rebollo · 2021-01-20T16:17:56Z

Thanks for your comments @tinok (both here and in the Figma file). I made a quick update to show the changes in the mockup below. I think that should be enough for implementation but do let me know if you want me to update the whole prototype.

Final arrangement:

tinok · 2021-01-22T20:35:30Z

I think this looks good, thanks.

@joshuaberetta please check the slightly updated UI design ^

joshuaberetta · 2021-01-22T20:40:40Z

@tinok, I think @magicznyleszek will be implementing the FE changes, I'll be handling the BE 👍

jnm · 2021-01-29T06:16:01Z

@magicznyleszek, @joshuaberetta, could we handle #2517 here as well, i.e. include __version__, _submitted_by, _status (and perhaps others) in the export column choices? KoBoCAT's to_dict_for_mongo() may reveal some other, useful fields that get added to every submission's JSON. Or, perhaps the front end has a list of these already for the table view.

__version__ is a little tricky: when that's desired as an export column, the back end ought to include either the inferred version (my preference) or every __version__, _version_, _version__001, _version__002, … column.
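One possible reading of "inferred version" is sketched below. The thread does not define how the inference should work, so the rule used here (take the first non-empty value among the version-like columns, in sorted column order) is purely an assumption, as is the helper name `infer_version`.

```python
import re

# Matches __version__, _version_, _version__001, _version__002, ...
VERSION_COLUMN = re.compile(r"^(__version__|_version_|_version__\d+)$")


def infer_version(submission):
    """Return the first non-empty version value in the submission, or None.

    Assumption: "first non-empty, in sorted column order" stands in for
    whatever inference the back end would actually perform.
    """
    version_columns = sorted(c for c in submission if VERSION_COLUMN.match(c))
    for column in version_columns:
        if submission[column]:
            return submission[column]
    return None


print(infer_version({"_version_": "", "_version__001": "vA", "name": "x"}))
```

Collapsing the columns this way yields a single export column instead of one per historical version field.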

tinok · 2021-01-29T14:12:48Z

I wonder whether we should not treat the __version_ and __version__nnn fields differently from any other calculation columns, which is to include all of them separately. Once we fix #1465 these multiple columns won't exist anymore.

magicznyleszek · 2021-02-04T15:31:27Z

@tinok @Ig-Rebollo I have a question
cc @jnm

What is the user input for in "Include data from all # versions"? My guess is that we want to give users the option to choose which form versions the exported data should come from. But this isn't clear now, as I don't know what it would mean if I have e.g. 20 versions and I type 5 here. Is it the last 5 versions? Is it the first 5? Maybe it would be better to present a multiple-selection dropdown and let users pick and choose versions? I imagine someone would like to get data only from some older version, and the current input doesn't provide such versatility.

Ig-Rebollo · 2021-02-04T21:12:45Z

@magicznyleszek sorry, I can see how it might be confusing. I think that is not an input but rather an indicator. It tells you the number of versions, but you can't type and change the number. It would probably be better to style it like other indicators, such as the number of deployed, draft, and archived projects: very round corners, light grey fill and no border.

However, your comment makes me hesitate. Maybe Tino's original suggestion was meant to do what you just described (being able to select the number of versions I want data from, such as the last 5 from a total of 20). If that is the case, we would leave it as is but remove the 'all' right before the input box and add a 'latest' before 'versions'.

@tinok can you clarify this just to be sure?

jnm · 2021-02-12T03:25:01Z

@Ig-Rebollo @magicznyleszek let's not attempt to include a subset of the versions, i.e. let's keep it as either the latest version or all versions.

It's actually going to be a bit weird to offer the "Custom selection export" along with "Include data from all n versions", because the list of questions shown will only draw from the most recent version of the form. Of course, we can improve that later, but it's not a trivial thing to do.

jnm · 2021-02-15T17:17:29Z

@jnm todo: propose basic UI for legacy exports
and leszek's idea to show a warning when legacy exports are used

Ig-Rebollo self-assigned this Nov 13, 2020

Ig-Rebollo added the needs-design and UI & UX labels Nov 13, 2020

jnm mentioned this issue Jan 19, 2021

Custom data exports UI implementation #2963

Closed

jnm mentioned this issue Jan 22, 2021

Custom data exports API implementation #2967

Closed

joshuaberetta mentioned this issue Feb 5, 2021

Custom data exports feature #2975

Merged

joshuaberetta linked a pull request Feb 5, 2021 that will close this issue

Custom data exports feature #2975

Merged

magicznyleszek mentioned this issue Feb 18, 2021

Project Downloads improvements - custom data exports UI #3012

Merged

jnm mentioned this issue Mar 10, 2021

Incorporate previously reviewed changes #3056

Merged

JacquelineMorrissette closed this as completed Aug 30, 2023
