Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functions or expressions raises exception in chained xpath call #656

Closed
Fitzsimmons opened this issue Apr 16, 2012 · 3 comments
Closed

Functions or expressions raises exception in chained xpath call #656

Fitzsimmons opened this issue Apr 16, 2012 · 3 comments

Comments

@Fitzsimmons
Copy link
Contributor

I think this test really says it all.

require 'rubygems'
require 'nokogiri'

fail_xml = <<-XML
<root>
  <stuff>thing</stuff>
</root>
XML

fail_doc = Nokogiri::XML(fail_xml)

puts fail_doc.xpath("//root").xpath("./stuff/text()").inspect #works
puts fail_doc.xpath("//root/stuff/text() = 'thing'").inspect #works
puts fail_doc.xpath("//root").xpath("./stuff/text() = 'thing'").inspect #crashes

Stacktrace:

/home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:135:in `|': node_set must be a Nokogiri::XML::NodeSet (ArgumentError)
    from /home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:135:in `block in xpath'
    from /home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:239:in `block in each'
    from /home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:238:in `upto'
    from /home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:238:in `each'
    from /home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:134:in `xpath'
    from test_constraint.rb:14:in `<main>'
@flavorjones
Copy link
Member

Hi!

Thanks for reporting this issue. I'd like to point out, though, that this is not a crash. This is an exception being raised. Precise language is important, so I'm going to change the title of this ticket from "causes crash" to "raises exception".

I'll take a look later today.

@flavorjones
Copy link
Member

Howdy,

Thanks for your patience. Looking at this, there's a bit of a semantic issue with NodeSet. Let's start the explanation with a slightly more complicated example:

require 'rubygems'
require 'nokogiri'

fail_xml = <<-XML
<root>
  <stuff>thing</stuff>
  <stuff>thang</stuff>
</root>
XML

fail_doc = Nokogiri::XML(fail_xml)

# baseline. returns a NodeSet containing two Elements
puts fail_doc.xpath("//root/stuff").inspect
# => [
#      #<Nokogiri::XML::Element:0xcc8120 name="stuff" children=[
#        #<Nokogiri::XML::Text:0xcc7f18 "thing">
#      ]>,
#      #<Nokogiri::XML::Element:0xcc7d10 name="stuff" children=[
#        #<Nokogiri::XML::Text:0xcc7b08 "thang">
#      ]>
#    ]

# returns a NodeSet, which contains two Text nodes
puts fail_doc.xpath("//root/stuff/text()").inspect #works
# => [#<Nokogiri::XML::Text:0x605680 "thing">, #<Nokogiri::XML::Text:0x6055a4 "thang">]

# also returns a NodeSet, which contains two Text nodes. there's magic here.
puts fail_doc.xpath("//root/stuff").xpath("./text()").inspect
# => [#<Nokogiri::XML::Text:0x605680 "thing">, #<Nokogiri::XML::Text:0x6055a4 "thang">]

# returns a bare boolean, answering the question "is there a 'stuff' node which contains text == 'thing'?"
puts fail_doc.xpath("//root/stuff/text() = 'thing'").inspect
# => true

This example demonstrates that the result of Node#xpath can be a NodeSet, or it could be a boolean (TrueClass or FalseClass).

(The result could also be nil, a String, a Float or a Fixnum. See the method Nokogiri_marshal_xpath_funcall_and_return_values in xml_xpath_context.c on master for details.)

There's some unobvious magic (really, an API design choice) involved in the third query. We're calling NodeSet#xpath on the result of Node#xpath and getting back the same result that we did when calling Node#xpath on the composite query.

There's some coercion going on here to make sure that the resulting NodeSet is flattened. I think we could arguably have chosen to return this:

puts fail_doc.xpath("//root/stuff").xpath("./text()").inspect
# => [[#<Nokogiri::XML::Text:0x605680 "thing">], [#<Nokogiri::XML::Text:0x6055a4 "thang">]]

And it would have been arguably correct (perhaps more semantic).

So here's my question for you: if we invoke a boolean xpath query on a NodeSet, what should we get back?

# what should this return?
puts fail_doc.xpath("//root/stuff").xpath("./text() = 'thing'").inspect
# option A => true
# option B => [true, false]

I'd argue that option B is probably what you expect. But that breaks from both of the established API conventions:

  1. boolean xpath expressions are return bare (that is, not contained by a NodeSet)
  2. NodeSet#xpath returns a flattened set of collected results from each node in the NodeSet.

And then there's the real complexity here: currently, NodeSet is an Enumerable that only knows how to contain Nodes. That is, we can't return a NodeSet containing Booleans. There's some nontrivial work to make that happen, if we decide that's the right thing to do.

So, I'm inclined defer a decision for a bit. Let's talk and make a decision, but wait until 2.0 to address this particular wart.

@tenderlove - thoughts?

@flavorjones
Copy link
Member

Closing, captured this wart in ROADMAP.md.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants