Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add output param #9

Merged
merged 9 commits into from
Jan 21, 2022
Merged

Add output param #9

merged 9 commits into from
Jan 21, 2022

Conversation

KennBro
Copy link
Contributor

@KennBro KennBro commented Dec 25, 2021

I needed the list of titles, description and urls of the searches

I added the output parameter that in case of taking the value "complete" returns a list [{"title": title, "desc": description, "url": url}, ...], for any other value returns the original output

I tested it with the following code

import yagooglesearch

query = "site:github.com"

client = yagooglesearch.SearchClient(
    query,
    tbs="li:1",
    max_search_result_urls_to_return=100,
    http_429_cool_off_time_in_minutes=45,
    http_429_cool_off_factor=1.5,
    verbosity=5,
    output="complete"  # "normal" : Only url list // "complete" : List of {title, desc, url}
)
client.assign_random_user_agent()

urls = client.search()

len(urls)

for url in urls:
    print(url)

The default is "normal" which does not modify the original behavior of the program
I also added the output parameter to the documentation

I'll use the modification from my repo, but if you are interested you can merge it

Thank you very much for this software and Merry Christmas

KennBro

@opsdisk
Copy link
Owner

opsdisk commented Dec 26, 2021

Thanks for taking the time to submit a PR @KennBro! It might be a few days before I can take a good look at it.

@KennBro
Copy link
Contributor Author

KennBro commented Dec 26, 2021

I found that google returns different structures, so I added code to take into account the different outputs
Take as much time as you need
Regards

except Exception:
ROOT_LOGGER.warning(f"No title and desc for link")
title = ''
continue
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this continue, it will go to the top of the for loop and not populate a "complete" dictionary for the result found. So setting title="" does nothing.

The logic is if there is no title (same for the desc found later in the code), that search result would be discarded. Are you looking to have a result populated even if the title or desc isn't set found? I'd say yes (option 1), but wanted to confirm.

  1. Option 1, despite not all values being populated, the dictionary will be still added to the unique_complete_result list.
{
    "url": "https://github.com",
    "title": "GitHub site",
    "desc": "",
}
  1. Option 2, not all values are populated, discard and don't return as part of the unique_complete_result list.
{
    "url": "https://example.com",
    "title": "",
    "desc": "Example site description",
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need it to behave like option 1.
You are right, both 'continue' interfere with the behavior of option 1. I would only discard if the link does not exist.

@opsdisk
Copy link
Owner

opsdisk commented Jan 4, 2022

Hey @KennBro - Had some changes to the PR that would be hard to go over through comments. Mind adding the option to allow me to commit changes to your PR? I've never done this, but did find this:

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork

If that doesn't work, I can just attach the updated files to this thread and have you copy/pasta them into your repo's files.

@KennBro
Copy link
Contributor Author

KennBro commented Jan 4, 2022

The "Allow edits by maintainers" checkbox was checked since the beginning of the PR, as it says in the documentation.
I can't attach an image to show you.
If it doesn't work I could resolve the conflict.

@opsdisk
Copy link
Owner

opsdisk commented Jan 6, 2022

Hey @KennBro - Kept the spirit of your PR, but made a few tweaks:

  • Changed variable name to verbose_output and made it a bool type to prevent typo issues
  • Added rank to the result dictionary. It's implied with the order of the result in the list, but wanted to make it explicit as a key/value pair.
  • Consolidated to just have 1 variable (search_result_list) to return

Test it out and let me know what you think

@KennBro
Copy link
Contributor Author

KennBro commented Jan 9, 2022

Awesome @opsdisk
I just tested by simply doing a copy/paste of the README.md example and I have two suggestions :

In the "Usage" example

import yagooglesearch

query = "site:github.com"

client = yagooglesearch.SearchClient(
    query,
    tbs="li:1",
    max_search_result_urls_to_return=100,
    http_429_cool_off_time_in_minutes=45,
    http_429_cool_off_factor=1.5,
    proxy="socks5h://127.0.0.1:9050",
    verbosity=5,
)
client.assign_random_user_agent()

urls = client.search()

len(urls)

for url in urls:
    print(url)

I would comment out the proxy line because many people do copy/paste to test and that line would cause an error if they don't have the service (TOR for example) running on port 9050

And the other suggestion is to add the pysocks library because it is not in the requirements.txt and that causes the error...

requests.exceptions.InvalidSchema: Missing dependencies for SOCKS support.

If you want right now I make a new push with these suggestions
Huge hug
Kenn

@opsdisk
Copy link
Owner

opsdisk commented Jan 11, 2022

Good suggestion...commented out the proxy line and pushed up to the PR. Also added you as a contributor to the README.

For the SOCKS support, are you installing the dependencies through:

  1. requirements.txt - https://github.com/opsdisk/yagooglesearch/blob/master/requirements.txt
pip install -r requirements.txt

or

  1. setup.py - https://github.com/opsdisk/yagooglesearch/blob/master/setup.py#L17
python setup.py install

Either way, PySocks should be installed (because of requests[socks])

https://2.python-requests.org/en/master/user/advanced/#socks

image

I'd try it with a fresh virtual environment to doublecheck. I'll hold off on merging.

@opsdisk opsdisk merged commit ab59ec3 into opsdisk:master Jan 21, 2022
@opsdisk
Copy link
Owner

opsdisk commented Jan 21, 2022

Going to merge this one in. Appreciate the PR and working with you @KennBro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants