Injecting mandatory FMTM fields into custom XLSForms #1722

spwoodcock · 2024-07-31T10:13:35Z

Is your feature request related to a problem? Please describe.

Currently our XLSForm has hidden fields that are mandatory: feature / task selection, digitization_correct questions.
This is prone to breakage if the user uploads a custom XLSForm and accidentally deletes fields (despite being hidden).
We need a more resilient mechanism to allow for custom XLSForm upload flexbility, but also impose our restrictions / requirements that FMTM needs to function well.

Describe the solution you'd like

Option 1: append the fields in the XLSForm, either via pyxlsx or possibly via Pandas, prior to XLS --> XML.
- Load the XLSForm as a dataframe.
- Have pre-prepared separate XLSForms with building questions & digitization_correct questions.
- Read the building selection questions into a dataframe and append to start.
- Read the digitization_correct questions into a dataframe and append to end.
- However, this will introduce a big dependency: Pandas, which is not ideal.
Option 2: use the current XML parsing / manipulation method to inject all required params, after XLS --> XML.
- This method is possible, but may have many fields we need to insert into specific places.
- Needs testing and is more prone to breakage.

Additional context

This first option would probably require we update all form handling to use Pandas, which would be quite an improvement to XLS wrangling.

Note

See comment below. We should not use Pandas, but instead python-calamine.

The text was updated successfully, but these errors were encountered:

spwoodcock · 2024-08-02T12:04:14Z

This would make the XLSForms a whole lot more maintainable.

Currently if we have say 10 categories = 10 XLSForms, we need to keep the extra fields we use in FMTM in sync for all of them.

Ideally we have:

1 XLSForm with mandatory fields for FMTM.
10 XLSForms with questions only related to the specific mapping category.
On project creation we merge the selected form with the 'FMTM mandatory fields' XLSForm.

spwoodcock · 2024-08-03T16:03:15Z

I just did a small assessment of package size between various XLS and XLSX parser libraries:

Pandas is nice to work with, but pulls in so many unnecessary libraries it's not worth it.
openpyxl and xlrd would be required together to parse XLSX and XLS files respectively.
python-calamine seems like the most sustainable choice (it was recently integrated into Pandas, so should have good community support). At the low level, it's actually using a compiled rust binary, making it up to 5x faster than openpyxl and 1.5x faster than xlrd. It's also the smallest, adding only 12MB to the build!

We should use python-calamine for XLS and XLSX parsing.

Sujanadh · 2024-08-15T10:04:50Z

I found that python-calamine is only for reading xls files but can't use it to manipulate the form fields. We would need another package, for example, openpyxl to modify the form in order to inject fmtm mandatory fields.

spwoodcock · 2024-08-15T12:13:09Z

Oh no! That's a pain 😅

I considered Pandas but it's a huge library (200MB).
Also Polars is similar.

Openpyxl is ancient now and unmaintained - but the xlsx spec probably hasn't changed and perhaps it's just feature complete.

It would mean we need to use it alongside xlrd to read xls though!

Two other options:

Use Pandas or Polars for the simplicity?
Just deal with the XML instead of XLS(X)?
(This isn't ideal, but perhaps we can make it easier by parsing the XML to dict)

spwoodcock · 2024-08-20T13:25:40Z

Acknowledgement I have been a bit silly

Pandas is large, but it's also going to be supported for a long time & is easy to use.
While it takes up a lot of space on a fresh install, I tested adding it to our current FMTM stack & it adds very little...
I assume this is because our other dependencies already install the large subdependencies
We can safely add pandas to the backend dependencies - sorry for the mess around!
While python-calamine is definitely a lot faster, it's overkill for our small xlsx files (I was mainly looking at it as an alternative for pandas)

If we ever need to revisit this

If we ever want to remove pandas for any reason, we can consider calamine again for writing, in parallel with XlsxWriter for writing.

python-calamine for reading. We get a list of lists for the XLSX data.
https://github.com/jmcnamara/XlsxWriter for writing. We can iterate through the list of lists and write a simple XLSX file.

We would need to iterate through the survey and choices sheets to combine our defaults with the custom uploaded XLSX.

from xlsxwriter import Workbook
from python_calamine import CalamineWorkbook

workbook = CalamineWorkbook.from_path("form.xlsx")
xlsx_data = workbook.get_sheet_by_name("survey").to_python()
# [
# ["1",  "2",  "3",  "4",  "5",  "6",  "7"],
# ["1",  "2",  "3",  "4",  "5",  "6",  "7"],
# ["1",  "2",  "3",  "4",  "5",  "6",  "7"],
# ]

workbook = Workbook('combined.xlsx')
worksheet = workbook.add_worksheet("survey")
for row_index, row_data in enumerate(xlsx_data):
    for column_index, column_data in enumerate(row_data):
        # Get the column letter from column_index
        column_letter = xxx
        worksheet.write(f'{column_letter}{row_index}', column_data)

spwoodcock · 2024-08-22T11:47:31Z

Possible issue with translations

Interesting issue I just encountered!

For the choice sheet we have a placeholder:
task_filter 1 1 1 1 1

The 1 values are for list_name name and label
But they are also required in our workflow for label::English(en) and the other 3 languages.
If they are missing, then the value is displayed at a null value in ODK Collect!

I'm raising this because it may cause issues with merging!
If our mandatory fields form has the fields: label::English(en) label::Spanish(es) etc
But the user uploaded form doesn't have these fields, and just uses a default label field, then it looks like the value for specific languages will take priority: in this case they would be null / empty.

Possible solution

It's quite hard to automatically merge translations, as the user may use the label field with a language other than English.
We probably should address this complexity later for now, and assume the default label will be English, with other languages being in fields like label::Spanish(es).

Scenario 1:

The user uploads a form with label field only.
We remove our label::English(en) label::Spanish(es) etc fields from the mandatory fields form.
We move label::English(en) contents to the label field for consistency.
There is only one language for the form - all users will see this.

Scenario 2:

The user uploads a form with ``label::Spanish(es)` etc fields.
We try to match these fields with the mandatory field form fields.
We remove any translation fields from the mandatory field form that are not present in the user uploaded form.
Example:
- User uploads a form with label::Spanish(es) label::French(fr) fields
- We match and keep the label::Spanish(es) label::French(fr) fields form our mandatory fields form
- We remove the additional label::English(en) label::Swahili(sw) fields.

Note

In future we should translation our mandatory field questions into all available languages in ODK.
Then we can handle merging for every possible field.

spwoodcock · 2024-08-22T11:51:23Z

Current mandatory fields in survey sheet:

Current mandatory fields in choices sheet:

Current mandatory fields in entities sheet:

spwoodcock · 2024-09-04T15:20:50Z

For future reference, openpyxl is still maintained, but is available here instead! https://foss.heptapod.net/openpyxl/openpyxl

openpyxl can be used to read AND write XLSX...

My bad for missing that. Although it makes no difference now, as we are using higher level Pandas anyway.

This may be useful for the future though, if one day we remove our reliance on Pandas in osm-fieldwork.

spwoodcock added enhancement New feature or request priority:low Backlog of tasks that will be addressed in time backend Related to backend code effort:medium Likely a day or two labels Jul 31, 2024

manjitapandey mentioned this issue Aug 2, 2024

Direct upload of final submission to OSM, attributed to user #1550

Open

spwoodcock mentioned this issue Aug 2, 2024

Add additional XLSForm categories (feature types for mapping) #1689

Open

spwoodcock added this to the (2024.4.0) XLSForm handling updates milestone Aug 2, 2024

manjitapandey added priority:high Should be addressed as a priority and removed priority:low Backlog of tasks that will be addressed in time labels Aug 2, 2024

spwoodcock mentioned this issue Aug 7, 2024

Project creation workflow 1: Advanced #1741

Open

spwoodcock changed the title ~~Assess possibility of injecting mandatory FMTM fields into XLSForm~~ Injecting mandatory FMTM fields into custom XLSForms Aug 21, 2024

manjitapandey assigned Sujanadh Aug 22, 2024

Sujanadh mentioned this issue Aug 29, 2024

Update form by injecting mandatory fields and validate it #1763

Merged

7 tasks

NSUWAL123 added the testing:ready Ready for testing label Sep 11, 2024

spwoodcock mentioned this issue Sep 17, 2024

Custom XLSForm uploads should have required fields injected automatically #1634

Closed

spwoodcock closed this as completed Sep 17, 2024

spwoodcock reopened this Sep 17, 2024

spwoodcock mentioned this issue Sep 17, 2024

Use XLSForm injection during project creation #1792

Merged

7 tasks

spwoodcock closed this as completed in #1792 Sep 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Injecting mandatory FMTM fields into custom XLSForms #1722

Injecting mandatory FMTM fields into custom XLSForms #1722

spwoodcock commented Jul 31, 2024 •

edited

Loading

spwoodcock commented Aug 2, 2024

spwoodcock commented Aug 3, 2024 •

edited

Loading

Sujanadh commented Aug 15, 2024

spwoodcock commented Aug 15, 2024

spwoodcock commented Aug 20, 2024

spwoodcock commented Aug 22, 2024

spwoodcock commented Aug 22, 2024 •

edited

Loading

spwoodcock commented Sep 4, 2024 •

edited

Loading

Injecting mandatory FMTM fields into custom XLSForms #1722

Injecting mandatory FMTM fields into custom XLSForms #1722

Comments

spwoodcock commented Jul 31, 2024 • edited Loading

spwoodcock commented Aug 2, 2024

spwoodcock commented Aug 3, 2024 • edited Loading

Sujanadh commented Aug 15, 2024

spwoodcock commented Aug 15, 2024

spwoodcock commented Aug 20, 2024

Acknowledgement I have been a bit silly

If we ever need to revisit this

spwoodcock commented Aug 22, 2024

Possible issue with translations

Possible solution

spwoodcock commented Aug 22, 2024 • edited Loading

spwoodcock commented Sep 4, 2024 • edited Loading

spwoodcock commented Jul 31, 2024 •

edited

Loading

spwoodcock commented Aug 3, 2024 •

edited

Loading

spwoodcock commented Aug 22, 2024 •

edited

Loading

spwoodcock commented Sep 4, 2024 •

edited

Loading