Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression fix on using relative paths #248

Merged
merged 9 commits into from
Jan 6, 2019

Conversation

ukanga
Copy link
Contributor

@ukanga ukanga commented Jan 3, 2019

Cuts down performance hit introduced by the relative path functionality. We still get an additional 1 second in my test environment between v0.11.5 and this PR's branch, which is way better than the 38s+ introduced in v0.12.0.

for ((i=0; i<10; i++)); do time xls2xform /tmp/soar.xlsx /tmp/soar.xml > /dev/null; done

This branch has the following times on my test environment:

real    0m3.136s	user    0m5.228s	sys     0m0.124s
real    0m3.225s	user    0m5.224s	sys     0m0.208s
real    0m3.247s	user    0m5.448s	sys     0m0.184s
real    0m3.184s	user    0m5.124s	sys     0m0.144s
real    0m3.206s	user    0m5.168s	sys     0m0.156s
real    0m3.185s	user    0m5.232s	sys     0m0.176s
real    0m3.175s	user    0m5.076s	sys     0m0.152s
real    0m3.528s	user    0m5.696s	sys     0m0.172s
real    0m3.302s	user    0m5.388s	sys     0m0.156s
real    0m3.312s	user    0m5.420s	sys     0m0.156s

v0.11.5 has:

real    0m2.294s	user    0m4.300s	sys     0m0.140s
real    0m2.184s	user    0m3.928s	sys     0m0.164s
real    0m2.247s	user    0m4.104s	sys     0m0.164s
real    0m2.287s	user    0m4.296s	sys     0m0.148s
real    0m2.300s	user    0m4.240s	sys     0m0.136s
real    0m2.227s	user    0m4.220s	sys     0m0.148s
real    0m2.346s	user    0m4.580s	sys     0m0.132s
real    0m2.378s	user    0m4.500s	sys     0m0.160s
real    0m2.617s	user    0m4.720s	sys     0m0.180s
real    0m2.350s	user    0m4.440s	sys     0m0.144s

v0.12.0 has:

real    0m40.083s	user    0m42.208s	sys     0m0.168s
real    0m45.988s	user    0m47.340s	sys     0m0.232s
real    0m41.181s	user    0m43.160s	sys     0m0.212s
real    0m40.310s	user    0m42.412s	sys     0m0.160s
real    0m40.561s	user    0m42.572s	sys     0m0.176s
real    0m41.495s	user    0m43.396s	sys     0m0.176s
real    0m42.968s	user    0m45.012s	sys     0m0.200s
real    0m39.686s	user    0m41.640s	sys     0m0.184s
real    0m40.599s	user    0m42.576s	sys     0m0.236s
real    0m39.867s	user    0m41.820s	sys     0m0.144s

Fix #247

Class method __name_ should be a property.
Will allow use of lru_cache devorator.
This reduces the time by more than half i.e 46s down to 16s. It is still
4 times more than v0.11.5.

Fix XLSForm#247
Returns True on first repeat occurance and caches with lru_cache.

Part fix for XLSForm#247
@yanokwa
Copy link
Contributor

yanokwa commented Jan 3, 2019

Thanks for the quick turnaround on this, @ukanga. If you need a beefier form to test with, HHS_FORM.xlsx is good too.

@ukanga
Copy link
Contributor Author

ukanga commented Jan 4, 2019

@yanokwa Does relative referencing not apply to indexed-repeat() parameters? I am getting the exception below for the form HHS_FORM.xlsx.

java.lang.RuntimeException: Error evaluating field 'p1': The problem was located in calculate expression for /HHS_FORM/g4/ts/cu5/u5_sick/g15/g16/g18/p1
XPath evaluation: type mismatch
indexed-repeat(): parameter 2 must be a parent of the field in parameter 1

The current release is generating the XML:

<bind calculate="indexed-repeat( /HHS_FORM/g4/ts/cu5/u5_sick/g15/g16/g18/ros3/q38 ,  ../ros3 , 1)" nodeset="/HHS_FORM/g4/ts/cu5/u5_sick/g15/g16/g18/p1" type
      ="string"/>

Should it always be?

<bind calculate="indexed-repeat( /HHS_FORM/g4/ts/cu5/u5_sick/g15/g16/g18/ros3/q38 ,  /HHS_FORM/g4/ts/cu5/u5_sick/g15/g16/g18/ros3 , 1)" nodeset="/HHS_FORM/g4/ts/cu5/u5_sick/g15/g16/g18/p1" type
      ="string"/>

@MartijnR Any thoughts on the above?

Update:

  • This seems to be happening when the indexed-repeat() has references to a repeat within another repeat and the node with the indexed-repeat() is in a repeat group itself.

@MartijnR
Copy link
Contributor

MartijnR commented Jan 4, 2019

Good discovery about indexed-repeat(). I think paths within parameters of this function are best excluded from the relative path feature, since indices are part of this function. So that would keep your second XML output.

I think one of the reasons indexed-repeat() was invented was because of the lack of relative path support in pyxform output. In my opinion it's a very awkward function (especially on the XForm side actually). Now that we have relative paths, I think we might want to start thinking about additional pyxform features to create a better non-xpath-function alternative for the remaining use cases that convert directly into native XPath, using ${node}, ${repeat} and [position].

@ukanga ukanga merged commit d40c759 into XLSForm:master Jan 6, 2019
@ukanga ukanga deleted the performance-regression-fix branch January 6, 2019 19:47
@KeynesYouDigIt
Copy link
Contributor

Im pretty new to xpath so pardon my ignorance - "would a repeat" be a shared piece of the path or represent the same path entirely? Whats a good resource to get more familiar with XML/Xpath? I know what they are but not in great depth

@MartijnR
Copy link
Contributor

MartijnR commented Jan 7, 2019

@KeynesYouDigIt if you have a model like this:

<data>
    <rep>
        <questionA>yes</questionA>
    </rep>
    <rep>
        <questionA>no</questionA>
    </rep>
</data>

The path /data/rep/questionA returns both nodes if a nodeset is requested, e.g. in count(/data/rep/questionA). If a string is requested, it returns the first (using the string function rules), e.g. as in calculate="/data/rep/questionA".

The easiest way to refer to the second questionA node directly is /data/rep[2]/questionA using a predicate.

If you want to dynamically populate the position (which is the most common), you'd have to do either:

/data/rep[ position() = /data/pos ]/questionA

or

/data/rep[ number(/data/pos) ]/questionA (very tricky but the number() casting is required here!)

(you've probably seen https://opendatakit.github.io/xforms-spec/#repeats)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants