<noscript> causes selector to fail #139
Comments
After further investigation, the problem appeared to originate in Cascadia:

package main

import (
	"fmt"
	"strings"

	"github.com/andybalholm/cascadia"
	"golang.org/x/net/html"
)

const data = `<noscript><a href="http://example.org">click</a></noscript>`

func main() {
	n, err := html.Parse(strings.NewReader(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	s, err := cascadia.Compile("noscript a")
	if err != nil {
		fmt.Println(err)
		return // s is unusable if the selector failed to compile
	}
	// Number of nodes matching "noscript a" in the parsed tree.
	fmt.Println(len(s.MatchAll(n)))
}

Before I could file a bug there, however, I came across this: andybalholm/cascadia#14
Now it looks like the bug exists in the HTML parser itself. Sadly, it hasn't been fixed yet. 😢 |
Hello Nathan, Thanks for looking into this. It makes sense that this is at the HTML parser level; it would be nice if the parser provided an option to turn JavaScript on or off for parsing. I'll keep the issue open until some decision is made in the parser. Martin |
just noticed the same issue :) |
For those looking for a workaround, re-parsing the content of the noscript tag seems to do the trick:

s.Find("noscript").SetHtml(s.Find("noscript").Text()) |
@machinae cool thanks i will try it :) do you have an example which i can run? |
package main

import (
	"fmt"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

const data = `<noscript><a href="http://example.org">click this link</a></noscript>`

func main() {
	d, err := goquery.NewDocumentFromReader(strings.NewReader(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	// Re-parse the text content of <noscript> as HTML so its children become real nodes.
	d.Find("noscript").SetHtml(d.Find("noscript").Text())
	a, ok := d.Find("noscript a").Attr("href")
	fmt.Printf("URL: '%s', %t\n", a, ok)
}
|
@machinae wouldn't this set the contents of the first
(I don't use goquery so the above is just a guess) |
I resolved it with code below:
|
Looks like there was a partial fix in the referenced issue, i.e. ParseOptionEnableScripting(bool), which supports disabling script emulation mode. From the last issue comment, it only works when |
Consider the following program:
The expected output is:
But instead the output is:
Changing noscript to div in both the document and selector produces the expected output, so the problem seems to affect only <noscript> elements.