Googler no results on 3.9 #306

amitai · 2019-11-22T21:45:49Z

Output of googler -d:

$ googler hello --debug
[DEBUG] googler version 3.9
[DEBUG] Python version 3.6.8
[DEBUG] Connecting to new host www.google.com
[DEBUG] Fetching URL /search?ie=UTF-8&oe=UTF-8&q=hello&sei=MMvdbA1wEeq7OH1_4X723w
[DEBUG] Cookie: 1P_JAR=2019-11-22-21
[DEBUG] Response body written to '/tmp/googler-response-_sjb5zc5.html'.
No results.
googler (? for help)

Link to the response body: https://gist.github.com/amitai/c840955133e1938d4369eafdbd1232a7

Details of operating system, Python version used, terminal emulator and shell:
Python 3.6.8, ubuntu 18.04.3, bash 4.4.20(1)


jarun · 2019-11-22T22:03:54Z

@zmwangx I've been noticing this today too.

webctrl · 2019-11-23T03:13:53Z

I'm having the same results.

zmwangx · 2019-11-23T05:49:17Z

This is in fact the same problem as #299, and it's getting a bit ridiculous. The markup is pretty damn hard to parse as discussed before.

Again, we wait maybe 48 hours. If things don't go back to normal by then, we move to a modern UA and update the parser.

Until then, here's a patch (with modern UA) that works:

diff --git a/googler b/googler
index 460350e..20698c7 100755
--- a/googler
+++ b/googler
@@ -102,7 +102,7 @@ COLORMAP = {k: '\x1b[%sm' % v for k, v in {
     'x': '0', 'X': '1', 'y': '7', 'Y': '7;1',
 }.items()}
 
-USER_AGENT = 'googler/%s (like MSIE)' % _VERSION_
+USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
 
 text_browsers = ['elinks', 'links', 'lynx', 'w3m', 'www-browser']
 
@@ -2192,13 +2192,18 @@ class GoogleParser(object):
                 # Skip smart cards.
                 continue
             try:
-                h3 = div_g.select('h3.r')
-                a = h3.select('a')
-                title = a.text
-                mime = div_g.select('.mime')
-                if mime:
-                    title = mime.text + ' ' + title
-                url = self.unwrap_link(a.attr('href'))
+                h3 = div_g.select('div.r h3')
+                if h3:
+                    title = h3.text
+                    url = self.unwrap_link(h3.parent.attr('href'))
+                else:
+                    h3 = div_g.select('h3.r')
+                    a = h3.select('a')
+                    title = a.text
+                    mime = div_g.select('.mime')
+                    if mime:
+                        title = mime.text + ' ' + title
+                    url = self.unwrap_link(a.attr('href'))
                 matched_keywords = []
                 abstract = ''
                 for childnode in div_g.select('.st').children:
@@ -2233,10 +2238,12 @@ class GoogleParser(object):
         # Search instead for ...
         spell_orig = tree.select("span.spell_orig")
         if spell_orig:
-            self.autocorrected = True
-            self.showing_results_for = next(
+            showing_results_for_link = next(
                 filter(lambda el: el.tag == "a", spell_orig.previous_siblings()), None
-            ).text
+            )
+            if showing_results_for_link:
+                self.autocorrected = True
+                self.showing_results_for = showing_results_for_link.text
 
         # No results found for ...
         # Results for ...:
@@ -2252,14 +2259,14 @@ class GoogleParser(object):
         self.filtered = tree.select('p#ofr') is not None
 
     # Unwraps /url?q=http://...&sa=...
-    # May raise ValueError.
+    # TODO: don't unwrap if URL isn't in this form.
     @staticmethod
     def unwrap_link(link):
         qs = urllib.parse.urlparse(link).query
         try:
             url = urllib.parse.parse_qs(qs)['q'][0]
         except KeyError:
-            raise ValueError(link)
+            return link
         else:
             if "://" in url:
                 return url

If it doesn't work, show me the markup and I'll fix it.
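To make the `unwrap_link` change in the patch concrete, here is a standalone sketch of the patched behaviour: a `/url?q=...&sa=...` redirect is unwrapped, while a link with no `q` parameter (e.g. a direct URL) is returned unchanged instead of raising `ValueError`. The diff is cut off after `return url`, so the final fallback for a `q` value without `://` is an assumption, not googler's actual code.

```python
import urllib.parse


def unwrap_link(link):
    """Unwrap Google's /url?q=<target>&sa=... redirect links (sketch)."""
    qs = urllib.parse.urlparse(link).query
    try:
        url = urllib.parse.parse_qs(qs)['q'][0]
    except KeyError:
        # Patched behaviour: no 'q' parameter, so pass the link through.
        return link
    if "://" in url:
        return url
    # Assumption: googler's real fallback is not shown in the diff;
    # here we simply return the original link.
    return link
```

For example, `unwrap_link('/url?q=https://example.com/&sa=U')` gives `'https://example.com/'`, and `unwrap_link('https://example.com/page')` comes back as-is.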

jarun · 2019-11-23T22:33:21Z

The patch works fine for me. Is there a way to auto-detect if the results are in markup?

What if we use the FF user agent and this patch? Looks like we are detecting whether the results are in new markup or earlier.

ajithkumar-natarajan · 2019-11-25T00:04:47Z

The patch provided by @zmwangx works for vanilla searches. Can you also please provide the patch for retrieving news (-N argument) results? It gives the same "No results" error.

Thank you.

zmwangx · 2019-11-25T04:17:59Z

Problem still not resolved. I'll turn the patch into a PR soonish and we'll probably need to cut a release.

@jarun

> Is there a way to auto-detect if the results are in markup?

> Looks like we are detecting whether the results are in new markup or earlier.

If you're talking about the if h3 conditional: both layouts could appear in a modern UA response.

Yeah, we can possibly maintain compatibility with the older layout we were targeting, but since the older layout appears to be gone, there's no point.
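The `if h3` conditional in the patch tries the new layout first (`div.r h3`, with the link on the `h3`'s parent `<a>`) and falls back to the old one (`h3.r a`). The sketch below mimics that order using only the standard library's `html.parser`. googler has its own DOM and selector code, so the tiny tree builder and the helper names (`Node`, `TreeBuilder`, `extract_title_url`, `results`) are hypothetical illustrations, not googler's API.

```python
from html.parser import HTMLParser

VOID = {"br", "img", "input", "meta", "link", "hr"}


class Node:
    """Minimal element node; children are Nodes or text strings."""

    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []

    @property
    def classes(self):
        return (self.attrs.get("class") or "").split()

    def text(self):
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

    def walk(self):
        """Yield this node and all descendant elements in document order."""
        yield self
        for c in self.children:
            if isinstance(c, Node):
                yield from c.walk()


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; ignore stray end tags.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        self.cur.children.append(data)


def extract_title_url(div_g):
    """Return (title, href) for one result, trying the new layout first."""
    # New layout: div.r > a > h3; the href lives on the h3's parent.
    for div in div_g.walk():
        if div.tag == "div" and "r" in div.classes:
            for h3 in div.walk():
                if h3.tag == "h3":
                    return h3.text(), h3.parent.attrs.get("href")
    # Old layout: h3.r > a.
    for h3 in div_g.walk():
        if h3.tag == "h3" and "r" in h3.classes:
            for a in h3.walk():
                if a.tag == "a":
                    return a.text(), a.attrs.get("href")
    return None


def results(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    out = []
    for node in builder.root.walk():
        if node.tag == "div" and "g" in node.classes:
            hit = extract_title_url(node)
            if hit:
                out.append(hit)
    return out
```

Feeding it a new-layout snippet and an old-layout snippet yields the same `(title, href)` shape for both, which is why the parser can handle either layout without needing to know in advance which one Google served.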

Note that we used this `googler (like MSIE)` user agent on the assumption that Google would serve a classic, stable layout to that UA instead of frequently A/B testing it the way it does on modern browsers. However, that assumption now seems broken beyond repair, so there's no point in sticking with a non-modern UA.

> we use the FF user agent

I propose we use a Chrome UA. It is said that FF is more likely to be reCAPTCHA'ed than Chrome (although it's not clear whether that's based on UA detection).
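Switching the UA only changes one request header. The sketch below builds (but does not send) a search request carrying the Chrome UA string from the patch, with the query parameters seen in the `--debug` log (`ie`, `oe`, `q`; the `sei` token is omitted). It uses `urllib.request` purely for illustration; googler issues its requests through its own connection code, and `build_search_request` is a hypothetical helper.

```python
import urllib.parse
import urllib.request

# UA strings copied from the patch in this thread.
OLD_USER_AGENT = 'googler/%s (like MSIE)' % '3.9'
CHROME_USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) '
                     'AppleWebKit/537.36 (KHTML, like Gecko) '
                     'Chrome/78.0.3904.108 Safari/537.36')


def build_search_request(query, user_agent=CHROME_USER_AGENT):
    """Build (without sending) a Google search request using the given UA."""
    params = urllib.parse.urlencode({'ie': 'UTF-8', 'oe': 'UTF-8', 'q': query})
    url = 'https://www.google.com/search?' + params
    return urllib.request.Request(url, headers={'User-Agent': user_agent})
```

Note that `urllib.request.Request` normalizes header names, so the header is read back with `req.get_header('User-agent')`.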

@ajithkumar-natarajan I did test my patch with -N and it was working for me, and it still does. Please use --debug and share the markup like OP did.

jarun · 2019-11-25T04:21:48Z

Please go ahead. The Chrome UA sounds good.

jarun · 2019-11-25T04:22:13Z

I'll make a release this evening if things are good.

jarun · 2019-11-25T19:08:26Z

Tracking update: the patch works for me so far.

amitai · 2019-11-25T19:27:48Z

Hi, I know I opened this ticket but I will not have access to my affected workstation until late in the week. Just want to make sure you don't wait on me for testing! :-D

jarun · 2019-11-25T19:30:34Z

No problem! Looks like it's reproducible globally. I just came across a post on HN saying that Google no longer works in Lynx.

bpalmer7440 · 2019-11-25T22:18:55Z

Just FYI, the patch works for me too on OS X.
Thanks!

mikeaich · 2019-11-26T04:59:57Z

The patch works here on Ubuntu 18.04 as well.


zmwangx · 2019-11-26T17:29:47Z

Turns out I have more pressing matters and didn't have time to refine and test the patch... Instead of delaying the fix further, I just pushed the patch to #307.

I'll refine it and rewrite our currently useless testing system later, but let's have a working release first...

jarun · 2019-11-26T19:09:50Z

I'll make a release today.

Bo98 mentioned this issue Nov 23, 2019 in Homebrew/homebrew-core#46728, "Broken bottles and conflicts discovered in Python 3.8 CI builds" (Closed).

zmwangx added a commit to zmwangx/googler that referenced this issue Nov 26, 2019

Switch to modern UA and fix parser

1b78ddb

Fixes jarun#306, hopefully. Not refined (even left a TODO), not extensively tested against edge cases.

zmwangx added a commit to zmwangx/googler that referenced this issue Nov 26, 2019

Switch to modern UA and fix parser

1b48db7

Fixes jarun#306, hopefully. Not refined (even left a TODO), not extensively tested against edge cases.

zmwangx mentioned this issue Nov 26, 2019

Switch to modern UA and fix parser #307

Merged

jarun closed this as completed in #307 Nov 26, 2019

Repository owner locked as resolved and limited conversation to collaborators Dec 9, 2019
