Paginated XML endpoint serializes invalid XML for file attachments #2072

Closed
gstuder-ona opened this issue May 6, 2021 · 2 comments · Fixed by #2079


@gstuder-ona

gstuder-ona commented May 6, 2021

Environmental Information

  • Onadata version: stage.ona.io

Problem description

As mentioned here: https://onaio.slack.com/archives/C04DRCMRD/p1620306800044000

https://stage-api.ona.io/api/v1/data/2632/145875.xml - this gives you:

<q0_08_passport_pic type="file">20210109_102230-9_6_2.jpg</q0_08_passport_pic>

https://stage-api.ona.io/api/v1/data/2632.xml?query={"_date_modified": {"$gt": "2021-05-06T06:20:50Z"}}&limit=1 - this gives you:

<q0_08_passport_pic type="file"><#text>20210109_102230-9_6_2.jpg
</#text>
</q0_08_passport_pic>

(the paginated result is centered on the same 145875.xml entry)

gregs 45 minutes ago
the ETL fails because the <#text> tag isn't valid XML and so can't be parsed. We can hack something to catch this, but it would be good to understand exactly what's going on here and whether it's fixable on the OnaData side anytime soon.
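On the ETL side, a stopgap like the one mentioned above could unwrap the invalid wrapper before parsing. The sketch below is a hypothetical workaround (the helper name `unwrap_text_nodes` is illustrative, not part of OnaData or any existing ETL), and it assumes `<#text>` is the only invalid node name in the response.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical ETL-side workaround: matches a "<#text>...</#text>" wrapper,
# including the stray whitespace the paginated endpoint emits before the
# closing tag. Assumes "#text" is the only invalid node name.
_TEXT_WRAPPER = re.compile(r"<#text>(.*?)\s*</#text>", re.DOTALL)


def unwrap_text_nodes(xml_str: str) -> str:
    """Replace <#text>value</#text> with just value so the document parses."""
    return _TEXT_WRAPPER.sub(r"\1", xml_str)


broken = (
    '<q0_08_passport_pic type="file"><#text>20210109_102230-9_6_2.jpg\n'
    "</#text>\n"
    "</q0_08_passport_pic>"
)

# "#" cannot start an XML name, so the raw response is rejected by the parser.
try:
    ET.fromstring(broken)
except ET.ParseError as exc:
    print("raw response fails to parse:", exc)

elem = ET.fromstring(unwrap_text_nodes(broken))
print(elem.get("type"), elem.text.strip())
```

A regex pass is only a stopgap; it should be removed once the endpoint returns well-formed XML.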

Expected behavior

The paginated XML endpoint should return valid XML identical to the individual XML endpoint.

Steps to reproduce the behavior

See above.

Additional Information

This seems to be a serialization issue: #text is the node name the DOM uses for text nodes, not a valid XML element name.
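One plausible mechanism, offered as a hypothesis rather than a description of OnaData's code: dict-based XML libraries such as xmltodict represent an element's attributes as `@name` keys and its character data as a `#text` key. A serializer that writes every non-attribute key back out as a tag would then emit exactly the output seen above. The record shape and the `naive_to_xml` helper below are assumptions for illustration.

```python
# A record shaped the way xmltodict-style parsers represent an element that
# has both an attribute and text: "@name" keys hold attributes and "#text"
# holds the character data. (Assumed shape, for illustration only.)
record = {
    "q0_08_passport_pic": {
        "@type": "file",
        "#text": "20210109_102230-9_6_2.jpg",
    }
}


def naive_to_xml(data: dict) -> str:
    """Hypothetical serializer that turns every non-attribute key into a tag."""
    parts = []
    for key, value in data.items():
        if isinstance(value, dict):
            attrs = "".join(
                f' {k[1:]}="{v}"' for k, v in value.items() if k.startswith("@")
            )
            children = {k: v for k, v in value.items() if not k.startswith("@")}
            parts.append(f"<{key}{attrs}>{naive_to_xml(children)}</{key}>")
        else:
            # Bug being illustrated: "#text" is emitted as if it were a tag.
            parts.append(f"<{key}>{value}</{key}>")
    return "".join(parts)


print(naive_to_xml(record))
# <q0_08_passport_pic type="file"><#text>20210109_102230-9_6_2.jpg</#text></q0_08_passport_pic>
```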

@gstuder-ona
Author

@DavisRayM I'm sure this will take a dev cycle and QA to fix, but if you can give us a good guess as to whether #text is the only problem or whether there are other DOM nodes to suppress, it would help us get a faster QA turnaround.

@DavisRayM DavisRayM self-assigned this May 6, 2021
@WinnyTroy WinnyTroy added this to the Week 20 - 21 (2021) milestone May 13, 2021
@DavisRayM
Contributor


After further investigation, I believe the #text element is the only problem. I'm working on removing invalid XML elements from the XML output in general.
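A minimal sketch of the kind of fix described, assuming the same hypothetical dict shape (`@name` for attributes, `#text` for character data): map `#text` to the element's text instead of emitting it as a child tag. `dict_to_element` is a hypothetical helper, not the change merged in #2079. Building the tree with `xml.etree.ElementTree` also escapes text and attribute values, so the output stays well-formed.

```python
import xml.etree.ElementTree as ET


def dict_to_element(tag: str, value) -> ET.Element:
    """Build an element, mapping "@attr" keys to attributes and "#text" to text.

    Hypothetical helper for illustration, not OnaData's implementation.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if key.startswith("@"):
                elem.set(key[1:], str(child))
            elif key == "#text":
                # Character data becomes the element's text, never a tag.
                elem.text = str(child)
            elif isinstance(child, list):
                # Lists become repeated sibling elements with the same tag.
                for item in child:
                    elem.append(dict_to_element(key, item))
            else:
                elem.append(dict_to_element(key, child))
    elif value is not None:
        elem.text = str(value)
    return elem


record = {"@type": "file", "#text": "20210109_102230-9_6_2.jpg"}
print(ET.tostring(dict_to_element("q0_08_passport_pic", record), encoding="unicode"))
# <q0_08_passport_pic type="file">20210109_102230-9_6_2.jpg</q0_08_passport_pic>
```

The result matches what the individual XML endpoint returns for the same field. Treating lists as repeated siblings is one common convention for repeat groups; the real serializer may handle them differently.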
