SI-9047: Extract attributes from multiple elements #43

josephw · 2014-12-17T07:25:30Z

Ensure that \ "@attr" works as expected when the NodeSeq has more than a single element in it. Make the existing implementation more general and remove the one-element special case.

Allow querying for attributes when a NodeSeq has more than one element.

som-snytt · 2014-12-17T08:38:53Z

src/main/scala/scala/xml/NodeSeq.scala

@@ -96,8 +96,7 @@ abstract class NodeSeq extends AbstractSeq[Node] with immutable.Seq[Node] with S
  def \(that: String): NodeSeq = {
    def fail = throw new IllegalArgumentException(that)
    def atResult = {
-      lazy val y = this(0)
-      val attr =
+      this flatMap (y => {


Instead of x map (y => { ... }) you can x map { y => ... }.

josephw · 2015-07-18T07:07:26Z

A comment on the original issue says that all issues are now tracked in this project; is there any more detail I can add for this to be merged?

biswanaths · 2015-07-18T12:31:03Z

Hey @josephw , thank you for the submission of the pull request and patiently waiting for long time. Please give me some time to go through the request and get back to you.

biswanaths · 2015-07-20T06:55:37Z

This looks good to me for merge. Please let us know if the community has any concerns and questions regarding this PR.

ashawley · 2015-07-21T01:35:52Z

Wouldn't this change likely break existing code?

josephw · 2015-07-21T09:12:40Z

@ashawley Yes, it's a behavioural change, which could affect code that's relying on getting the empty list in the case of multiple elements. Does that seem like a likely case? We hit this when we were surprised by it not working when multiple elements were present so, in the absence of intentional tests to show the current behaviour is intentional, I think it's a fixable bug.

(If the behavioural change is unacceptable, it would still be valuable to drop that part and change this into a test for the current behaviour.)

biswanaths · 2015-07-23T17:47:58Z

I agree with @josephw and we should go with this fix rather than leaving the current behaviour. This current erroneous behaviour is surprising and shouldn't be default.

Let us know if others have any thoughts about this.

ashawley · 2015-08-04T05:24:39Z

@josephw Yes, although the intentions here are good, the examples given in SI-9047 and carried over in to the unit tests show a slight misunderstanding of how the projection operator currently works in scala-xml.

This is understandable because of how scala-xml's permissive type hierarchy is structured: The sequence types live alongside scalar values. This isn't scala-xml's fault really, since it's trying to acquiesce to XML. The methods are written in ways that are not always predictable for users.

Here is projection for one child node:

scala> val x = <x><a b="1"/></x>
x: scala.xml.Elem = <x><a b="1"/></x>

scala> x \ "a"
res0: scala.xml.NodeSeq = NodeSeq(<a b="1"/>)

Here is projection for two child nodes:

scala> val x = <x><a b="1"/><a b="2"/></x>
x: scala.xml.Elem = <x><a b="1"/><a b="2"/></x>

scala> x \ "a"
res1: scala.xml.NodeSeq = NodeSeq(<a b="1"/>, <a b="2"/>)

The above results seem predictable and reasonable.

Things go a little awry when you use projection on different sized NodeSeqs.

scala> res0 \ "@b"
res2: scala.xml.NodeSeq = 1

scala> res1 \ "@b"
res3: scala.xml.NodeSeq = NodeSeq()

Apparently, what you are supposed to do, is iterate on both results.

scala> res0.map (_ \ "@b")
res4: scala.collection.immutable.Seq[scala.xml.NodeSeq] = List(1)

scala> res1.map (_ \ "@b")
res5: scala.collection.immutable.Seq[scala.xml.NodeSeq] = List(1, 2)

It's not clear what the motivation of this was, if it was intentional.

I'm going to add some tests in a separate pull request.

josephw · 2015-08-04T06:44:25Z

Thanks for the details.

iterate on both results.

That's exactly the change we made; it struck me as a workaround that should be unnecessary. So I can see how they work, but I wasn't convinced it was intentional or correct.

It's not clear what the motivation of this was, if it was intentional.

Agreed; I believe this version is less surprising.

biswanaths · 2015-08-06T12:24:47Z


scala> val x = <x><a><b c="1">Good</b></a><a><b c="1">Bad</b><b c="1">Ugly</b></a></x>
x: scala.xml.Elem = <x><a><b c="1">Good</b></a><a><b c="1">Bad</b><b c="1">Ugly</b></a></x>

scala> x \ "a" 
res5: scala.xml.NodeSeq = NodeSeq(<a><b c="1">Good</b></a>, <a><b c="1">Bad</b><b c="1">Ugly</b></a>)

scala> x \ "a" \ "b"
res6: scala.xml.NodeSeq = NodeSeq(<b c="1">Good</b>, <b c="1">Bad</b>, <b c="1">Ugly</b>)

scala> x \ "a" \ "@c"
res7: scala.xml.NodeSeq = NodeSeq()

I think this example, for me, shows the current behaviour might not be intended one. Look at the difference between quieries x \ "a" \ "b" and x \ "a" \ "@b" . If the former query is supposed to work on nodeseq with working on each individuals again, then why not the same should be applied to the later ?

Also

scala> val x = <x><b c="1">Good</b><b c="1">Bad</b><b c="1">Ugly</b></x>
x: scala.xml.Elem = <x><b c="1">Good</b><b c="1">Bad</b><b c="1">Ugly</b></x>

scala> x \ "@b"
res9: scala.xml.NodeSeq = NodeSeq()

Should returning empty node seq for this case is fine ?

Please let me know, if there are any wrong assumption and/or analysis I have made.

ashawley · 2015-08-09T05:48:14Z

The assumption is that an XML node (scala.xml.Node) can have an attribute and have its value retrieved with the "@a" projection syntax. However, a sequence of XML nodes (scala.xml.NodeSeq) would not have an attribute or have each node's attribute value returned through projection. The reasoning is sound. Assuming, that's what it is.

Unfortunately, an attribute value is retrievable with this method from a sequence of XML nodes of length one. It has been for quite some time. This seems unintentional. I assume it is an accident of the implementation. It would not be caught by the compiler: The projection method is available from all types (including Elem and Node and Group), the return type of the projection method is a NodeSeq and nearly all types inherit from NodeSeq (including Elem and Node and Group).

It seems like there are two choices available here:

Return all the attributes available in a sequence, always, and break the existing behavior.
Stop providing the attribute values when requested from a sequence of length one and break the existing behavior.

biswanaths · 2015-08-09T09:08:50Z

Now lets see what is happening for operator "\" when working with attribute, for extra fun.

scala> val x = <x><a><b c="1">Good</b></a><a><b c="1">Bad</b><b c="1">Ugly</b></a></x>
x: scala.xml.Elem = <x><a><b c="1">Good</b></a><a><b c="1">Bad</b><b c="1">Ugly</b></a></x>

scala> x \\ "@c"
res0: scala.xml.NodeSeq = NodeSeq(1, 1, 1)

scala> x \\ "a" \\ "@c"
res1: scala.xml.NodeSeq = NodeSeq(1, 1, 1)

scala> x \ "a" \\ "@c"
res4: scala.xml.NodeSeq = NodeSeq(1, 1, 1)

I assume at best we can say there is inconsistency between "" and "\" , both being used for projection.

ashawley · 2015-08-10T03:45:31Z

Yes, indeed. I've added tests on descendant attributes to issue 78. Thank you.

As noted in a comment on scala/scala#1679, the \\@ projection was seen as being to ambiguous to be useful, FWIW.

biswanaths · 2015-08-11T18:23:40Z

Was going through the history of scala(library\xml) to see when this behaviour( extract attribute if node seq has single node) was introduced and it was around 08, introduced by @burakemir . I am still not able to dig any discussion around those. In essence this behaviour is there from a long time (from 08).

biswanaths · 2015-09-13T14:09:11Z

Please let me know any reason community thinks about whether to take this patch or not. I am in two minds at the moment between the not institutive behaviour vs. long used behaviour.

ashawley · 2015-09-13T21:20:51Z

What would you merge? As it stands, this pull request has conflicts. After that, it would not pass a Travis build. It's hard to review something that doesn't work.

Has anyone studied what the consequences of this change to scala-xml will be in either their own or in other projects that depend on scala-xml? I haven't. That would be good to know to make a decision.

In the interest of clarity and getting more visibility on the consequences of this change, perhaps a new pull request should be started?

…cting-attributes-from-multiple-elements

biswanaths · 2015-09-14T07:24:10Z

The larger point is, if this is a issue at all to be fixed or not. Patch is minimal, in the sense that reproducing it is not big deal.

As for consequence, How we figure it out who is using the above mentioned feature, I am pretty unsure how we can go to estimated the number of users of the number of users for this particular feature.

Let me see at least we can come up with hypothetical use cases and the effect of the changes to them.

burakemir · 2015-09-14T19:28:38Z

Hey all. It's been a very long time that I haven't worked with scala-xml, so I am not sure how useful this will be... Let me start by stating the things I can remember clearly from that time:

the "dynamic typing" that I introduced through NodeSeq was a way to define operations that would be more aligned with XML data model proposals (e.g. XQuery 1.0 and XPath 2.0 Data Model http://www.w3.org/TR/xpath-datamodel/#concepts). Consider this, from that doc:

"""An important characteristic of the data model is that there is no distinction between an item (a node or an atomic value) and a singleton sequence containing that item. An item is equivalent to a singleton sequence containing that item and vice versa."""

This (let us call it the NodeSeq principle) leads to many lines of conflict with the rest of Scala's expressive static type system.

I had added special cases to the implementations of \ and \ because performance of the library was a topic and these special cases improved performance significantly (measured with micro-benchmarks FWIW).

Now for the issue at hand, it does not really matter what I thought when I implemented "@" at the time. There were a few places where I had chosen to be inconsistent on purpose. I think, this might have been my way of easing the discrepancy between elements and objects, or it may have simplified some common use cases, and there were certainly some inconsistencies that I would not introduce again today (equality on NodeSeq comes to mind)... but then many people changed the library after me, too.

Now that we got that out of the way, I do like the consistency argument that @ should behave the same for sequences of any length. This would be a natural consequence of the NodeSeq principle.

Speaking without any authority, since I am not involved in scala-xml maintenance in any way: a change isn't necessarily bad just because it breaks some code, this is simply a matter of policy and choice of the maintainers. A "no breaking changes ever" policy seems overly restrictive. I'd think introducing breaking changes should be fine (after due discussion on mailing list and whatever decision-finding process the current maintainers are following), as long as you bump the major revision number ("semantic versioning"). If you don't have any releases, then it should be at least preceded by an announcement on the community mailing list.

Finally, at the time, I had tried hard to keep documentation up-to-date. This change sounds like it would make some documentation obsolete as well. It seems consistency in a library like this will only really appreciated if users can perceive it easily by looking at code examples.

Hope this helps!
-- Burak

som-snytt · 2015-09-14T19:50:14Z

@burakemir Just very nice of you to take the time to chime in! Also glad that the NodeSeq principle now has a name. And that it is a principle.

ashawley · 2015-09-14T20:04:22Z

Thanks for the additional context! We all agreed it was awkward. Glad to have it blessed by @burakemir as a bug.

Indeed, the fix will require some analysis of what nature the breaking change is, and deciding about whether it is a major revision number bump, requires announcements, changing documentation and so on.

Maven reports that there are 286 uses of this library!

http://mvnrepository.com/artifact/org.scala-lang.modules/scala-xml_2.11/usages

This only counts the latest versions, and I'm sure there are many hundreds more unpublished projects with it as a dependency as well.

Thanks again!

hrj · 2015-09-25T06:54:55Z

I vote for a major version bump for this. Path of least cost for everyone involved, AFAICT.

…cting-attributes-from-multiple-elements

Ensure that Group is returned, rather than NodeSeq, and adjust the newer test now that multiple attributes are successfully returned.

josephw · 2017-06-09T04:42:39Z

It's a shame to leave this unresolved. I've taken another look.

As it stands, this pull request has conflicts. After that, it would not pass a Travis build. It's hard to review something that doesn't work.

I've resolved the merge conflict. I've also made a couple of fixes necessitated by the tests added in 72b11f6 (good -- they caught bugs), as well as intentionally adjusting one of those tests to match the newer behaviour.

SethTisue · 2020-01-29T01:43:50Z

@ashawley is this eligible to progress in 2.0, or should we just close it?

SethTisue · 2020-12-05T19:52:14Z

closing for inactivity — but at least based on skimming the discussion, it seems like someone could revive it

josephw added 2 commits December 17, 2014 17:39

SI-9047: Add unit tests for the working and non-working cases.

7512a57

SI-9047: Extract attributes from NodeSeqs with more than one element.

444fcae

Allow querying for attributes when a NodeSeq has more than one element.

som-snytt reviewed Dec 17, 2014
View reviewed changes

SI-9047: Remove redundant braces.

fc27be1

SethTisue mentioned this pull request Jul 20, 2015

NodeSeq attribute search does not work as expected #63

Open

ashawley mentioned this pull request Aug 4, 2015

Add more XML attribute unit tests for SI-9047 #78

Merged

Merge remote-tracking branch 'origin/master' into SI-9047-allow-extra…

e8b598e

…cting-attributes-from-multiple-elements

adriaanm mentioned this pull request Apr 22, 2016

Create and publish release v1.0.6 #97

Closed

josephw added 2 commits June 8, 2017 17:33

Merge remote-tracking branch 'origin/master' into SI-9047-allow-extra…

d28fc1a

…cting-attributes-from-multiple-elements

SI-9047: Ensure Group is returned, and adjust added tests.

9aabe50

Ensure that Group is returned, rather than NodeSeq, and adjust the newer test now that multiple attributes are successfully returned.

SethTisue closed this Dec 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SI-9047: Extract attributes from multiple elements #43

SI-9047: Extract attributes from multiple elements #43

josephw commented Dec 17, 2014

som-snytt Dec 17, 2014

josephw commented Jul 18, 2015

biswanaths commented Jul 18, 2015

biswanaths commented Jul 20, 2015

ashawley commented Jul 21, 2015

josephw commented Jul 21, 2015

biswanaths commented Jul 23, 2015

ashawley commented Aug 4, 2015

josephw commented Aug 4, 2015

biswanaths commented Aug 6, 2015

ashawley commented Aug 9, 2015

biswanaths commented Aug 9, 2015

ashawley commented Aug 10, 2015

biswanaths commented Aug 11, 2015

biswanaths commented Sep 13, 2015

ashawley commented Sep 13, 2015

biswanaths commented Sep 14, 2015

burakemir commented Sep 14, 2015

som-snytt commented Sep 14, 2015

ashawley commented Sep 14, 2015

hrj commented Sep 25, 2015

josephw commented Jun 9, 2017

SethTisue commented Jan 29, 2020

SethTisue commented Dec 5, 2020

SI-9047: Extract attributes from multiple elements #43

SI-9047: Extract attributes from multiple elements #43

Conversation

josephw commented Dec 17, 2014

som-snytt Dec 17, 2014

Choose a reason for hiding this comment

josephw commented Jul 18, 2015

biswanaths commented Jul 18, 2015

biswanaths commented Jul 20, 2015

ashawley commented Jul 21, 2015

josephw commented Jul 21, 2015

biswanaths commented Jul 23, 2015

ashawley commented Aug 4, 2015

josephw commented Aug 4, 2015

biswanaths commented Aug 6, 2015

ashawley commented Aug 9, 2015

biswanaths commented Aug 9, 2015

ashawley commented Aug 10, 2015

biswanaths commented Aug 11, 2015

biswanaths commented Sep 13, 2015

ashawley commented Sep 13, 2015

biswanaths commented Sep 14, 2015

burakemir commented Sep 14, 2015

som-snytt commented Sep 14, 2015

ashawley commented Sep 14, 2015

hrj commented Sep 25, 2015

josephw commented Jun 9, 2017

SethTisue commented Jan 29, 2020

SethTisue commented Dec 5, 2020