Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SI-9047: Extract attributes from multiple elements #43

Conversation

josephw
Copy link

@josephw josephw commented Dec 17, 2014

Ensure that \ "@attr" works as expected when the NodeSeq has more than a single element in it. Make the existing implementation more general and remove the one-element special case.

@@ -96,8 +96,7 @@ abstract class NodeSeq extends AbstractSeq[Node] with immutable.Seq[Node] with S
def \(that: String): NodeSeq = {
def fail = throw new IllegalArgumentException(that)
def atResult = {
lazy val y = this(0)
val attr =
this flatMap (y => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of x map (y => { ... }) you can x map { y => ... }.

@josephw
Copy link
Author

josephw commented Jul 18, 2015

A comment on the original issue says that all issues are now tracked in this project; is there any more detail I can add for this to be merged?

@biswanaths
Copy link
Contributor

Hey @josephw , thank you for the submission of the pull request and patiently waiting for long time. Please give me some time to go through the request and get back to you.

@biswanaths
Copy link
Contributor

This looks good to me for merge. Please let us know if the community has any concerns and questions regarding this PR.

@ashawley
Copy link
Member

Wouldn't this change likely break existing code?

@josephw
Copy link
Author

josephw commented Jul 21, 2015

@ashawley Yes, it's a behavioural change, which could affect code that's relying on getting the empty list in the case of multiple elements. Does that seem like a likely case? We hit this when we were surprised by it not working when multiple elements were present so, in the absence of intentional tests to show the current behaviour is intentional, I think it's a fixable bug.

(If the behavioural change is unacceptable, it would still be valuable to drop that part and change this into a test for the current behaviour.)

@biswanaths
Copy link
Contributor

I agree with @josephw and we should go with this fix rather than leaving the current behaviour. This current erroneous behaviour is surprising and shouldn't be default.

Let us know if others have any thoughts about this.

@ashawley
Copy link
Member

ashawley commented Aug 4, 2015

@josephw Yes, although the intentions here are good, the examples given in SI-9047 and carried over in to the unit tests show a slight misunderstanding of how the projection operator currently works in scala-xml.

This is understandable because of how scala-xml's permissive type hierarchy is structured: The sequence types live alongside scalar values. This isn't scala-xml's fault really, since it's trying to acquiesce to XML. The methods are written in ways that are not always predictable for users.

Here is projection for one child node:

scala> val x = <x><a b="1"/></x>
x: scala.xml.Elem = <x><a b="1"/></x>

scala> x \ "a"
res0: scala.xml.NodeSeq = NodeSeq(<a b="1"/>)

Here is projection for two child nodes:

scala> val x = <x><a b="1"/><a b="2"/></x>
x: scala.xml.Elem = <x><a b="1"/><a b="2"/></x>

scala> x \ "a"
res1: scala.xml.NodeSeq = NodeSeq(<a b="1"/>, <a b="2"/>)

The above results seem predictable and reasonable.

Things go a little awry when you use projection on different sized NodeSeqs.

scala> res0 \ "@b"
res2: scala.xml.NodeSeq = 1

scala> res1 \ "@b"
res3: scala.xml.NodeSeq = NodeSeq()

Apparently, what you are supposed to do, is iterate on both results.

scala> res0.map (_ \ "@b")
res4: scala.collection.immutable.Seq[scala.xml.NodeSeq] = List(1)

scala> res1.map (_ \ "@b")
res5: scala.collection.immutable.Seq[scala.xml.NodeSeq] = List(1, 2)

It's not clear what the motivation of this was, if it was intentional.

I'm going to add some tests in a separate pull request.

@josephw
Copy link
Author

josephw commented Aug 4, 2015

Thanks for the details.

iterate on both results.

That's exactly the change we made; it struck me as a workaround that should be unnecessary. So I can see how they work, but I wasn't convinced it was intentional or correct.

It's not clear what the motivation of this was, if it was intentional.

Agreed; I believe this version is less surprising.

@biswanaths
Copy link
Contributor


scala> val x = <x><a><b c="1">Good</b></a><a><b c="1">Bad</b><b c="1">Ugly</b></a></x>
x: scala.xml.Elem = <x><a><b c="1">Good</b></a><a><b c="1">Bad</b><b c="1">Ugly</b></a></x>

scala> x \ "a" 
res5: scala.xml.NodeSeq = NodeSeq(<a><b c="1">Good</b></a>, <a><b c="1">Bad</b><b c="1">Ugly</b></a>)

scala> x \ "a" \ "b"
res6: scala.xml.NodeSeq = NodeSeq(<b c="1">Good</b>, <b c="1">Bad</b>, <b c="1">Ugly</b>)

scala> x \ "a" \ "@c"
res7: scala.xml.NodeSeq = NodeSeq()

I think this example, for me, shows the current behaviour might not be intended one. Look at the difference between quieries x \ "a" \ "b" and x \ "a" \ "@b" . If the former query is supposed to work on nodeseq with working on each individuals again, then why not the same should be applied to the later ?

Also

scala> val x = <x><b c="1">Good</b><b c="1">Bad</b><b c="1">Ugly</b></x>
x: scala.xml.Elem = <x><b c="1">Good</b><b c="1">Bad</b><b c="1">Ugly</b></x>

scala> x \ "@b"
res9: scala.xml.NodeSeq = NodeSeq()

Should returning empty node seq for this case is fine ?

Please let me know, if there are any wrong assumption and/or analysis I have made.

@ashawley
Copy link
Member

ashawley commented Aug 9, 2015

The assumption is that an XML node (scala.xml.Node) can have an attribute and have its value retrieved with the "@a" projection syntax. However, a sequence of XML nodes (scala.xml.NodeSeq) would not have an attribute or have each node's attribute value returned through projection. The reasoning is sound. Assuming, that's what it is.

Unfortunately, an attribute value is retrievable with this method from a sequence of XML nodes of length one. It has been for quite some time. This seems unintentional. I assume it is an accident of the implementation. It would not be caught by the compiler: The projection method is available from all types (including Elem and Node and Group), the return type of the projection method is a NodeSeq and nearly all types inherit from NodeSeq (including Elem and Node and Group).

It seems like there are two choices available here:

  1. Return all the attributes available in a sequence, always, and break the existing behavior.
  2. Stop providing the attribute values when requested from a sequence of length one and break the existing behavior.

@biswanaths
Copy link
Contributor

Now lets see what is happening for operator "\" when working with attribute, for extra fun.

scala> val x = <x><a><b c="1">Good</b></a><a><b c="1">Bad</b><b c="1">Ugly</b></a></x>
x: scala.xml.Elem = <x><a><b c="1">Good</b></a><a><b c="1">Bad</b><b c="1">Ugly</b></a></x>

scala> x \\ "@c"
res0: scala.xml.NodeSeq = NodeSeq(1, 1, 1)

scala> x \\ "a" \\ "@c"
res1: scala.xml.NodeSeq = NodeSeq(1, 1, 1)

scala> x \ "a" \\ "@c"
res4: scala.xml.NodeSeq = NodeSeq(1, 1, 1)

I assume at best we can say there is inconsistency between "" and "\" , both being used for projection.

@ashawley
Copy link
Member

Yes, indeed. I've added tests on descendant attributes to issue 78. Thank you.

As noted in a comment on scala/scala#1679, the \\@ projection was seen as being to ambiguous to be useful, FWIW.

@biswanaths
Copy link
Contributor

Was going through the history of scala(library\xml) to see when this behaviour( extract attribute if node seq has single node) was introduced and it was around 08, introduced by @burakemir . I am still not able to dig any discussion around those. In essence this behaviour is there from a long time (from 08).

@biswanaths
Copy link
Contributor

Please let me know any reason community thinks about whether to take this patch or not. I am in two minds at the moment between the not institutive behaviour vs. long used behaviour.

@ashawley
Copy link
Member

What would you merge? As it stands, this pull request has conflicts. After that, it would not pass a Travis build. It's hard to review something that doesn't work.

Has anyone studied what the consequences of this change to scala-xml will be in either their own or in other projects that depend on scala-xml? I haven't. That would be good to know to make a decision.

In the interest of clarity and getting more visibility on the consequences of this change, perhaps a new pull request should be started?

@biswanaths
Copy link
Contributor

The larger point is, if this is a issue at all to be fixed or not. Patch is minimal, in the sense that reproducing it is not big deal.

As for consequence, How we figure it out who is using the above mentioned feature, I am pretty unsure how we can go to estimated the number of users of the number of users for this particular feature.

Let me see at least we can come up with hypothetical use cases and the effect of the changes to them.

@burakemir
Copy link

Hey all. It's been a very long time that I haven't worked with scala-xml, so I am not sure how useful this will be... Let me start by stating the things I can remember clearly from that time:

  1. the "dynamic typing" that I introduced through NodeSeq was a way to define operations that would be more aligned with XML data model proposals (e.g. XQuery 1.0 and XPath 2.0 Data Model http://www.w3.org/TR/xpath-datamodel/#concepts). Consider this, from that doc:

"""An important characteristic of the data model is that there is no distinction between an item (a node or an atomic value) and a singleton sequence containing that item. An item is equivalent to a singleton sequence containing that item and vice versa."""

This (let us call it the NodeSeq principle) leads to many lines of conflict with the rest of Scala's expressive static type system.

  1. I had added special cases to the implementations of \ and \ because performance of the library was a topic and these special cases improved performance significantly (measured with micro-benchmarks FWIW).

Now for the issue at hand, it does not really matter what I thought when I implemented "@" at the time. There were a few places where I had chosen to be inconsistent on purpose. I think, this might have been my way of easing the discrepancy between elements and objects, or it may have simplified some common use cases, and there were certainly some inconsistencies that I would not introduce again today (equality on NodeSeq comes to mind)... but then many people changed the library after me, too.

Now that we got that out of the way, I do like the consistency argument that @ should behave the same for sequences of any length. This would be a natural consequence of the NodeSeq principle.

Speaking without any authority, since I am not involved in scala-xml maintenance in any way: a change isn't necessarily bad just because it breaks some code, this is simply a matter of policy and choice of the maintainers. A "no breaking changes ever" policy seems overly restrictive. I'd think introducing breaking changes should be fine (after due discussion on mailing list and whatever decision-finding process the current maintainers are following), as long as you bump the major revision number ("semantic versioning"). If you don't have any releases, then it should be at least preceded by an announcement on the community mailing list.

Finally, at the time, I had tried hard to keep documentation up-to-date. This change sounds like it would make some documentation obsolete as well. It seems consistency in a library like this will only really appreciated if users can perceive it easily by looking at code examples.

Hope this helps!
-- Burak

@som-snytt
Copy link
Contributor

@burakemir Just very nice of you to take the time to chime in! Also glad that the NodeSeq principle now has a name. And that it is a principle.

@ashawley
Copy link
Member

Thanks for the additional context! We all agreed it was awkward. Glad to have it blessed by @burakemir as a bug.

Indeed, the fix will require some analysis of what nature the breaking change is, and deciding about whether it is a major revision number bump, requires announcements, changing documentation and so on.

Maven reports that there are 286 uses of this library!

http://mvnrepository.com/artifact/org.scala-lang.modules/scala-xml_2.11/usages

This only counts the latest versions, and I'm sure there are many hundreds more unpublished projects with it as a dependency as well.

Thanks again!

@hrj
Copy link

hrj commented Sep 25, 2015

I vote for a major version bump for this. Path of least cost for everyone involved, AFAICT.

Ensure that Group is returned, rather than NodeSeq, and adjust
the newer test now that multiple attributes are successfully
returned.
@josephw
Copy link
Author

josephw commented Jun 9, 2017

It's a shame to leave this unresolved. I've taken another look.

As it stands, this pull request has conflicts. After that, it would not pass a Travis build. It's hard to review something that doesn't work.

I've resolved the merge conflict. I've also made a couple of fixes necessitated by the tests added in 72b11f6 (good -- they caught bugs), as well as intentionally adjusting one of those tests to match the newer behaviour.

@SethTisue
Copy link
Member

@ashawley is this eligible to progress in 2.0, or should we just close it?

@SethTisue
Copy link
Member

closing for inactivity — but at least based on skimming the discussion, it seems like someone could revive it

@SethTisue SethTisue closed this Dec 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants