[BUG] -t day does not seem to be working. downloads older than a day are happening. #549

Closed
3 tasks done
jrwren opened this issue Nov 8, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@jrwren
Contributor

jrwren commented Nov 8, 2021

  • I am reporting a bug.
  • I am running the latest version of BDfR
  • I have read the Opening an issue

Description

I am downloading daily updates of a user's posts with a command like this: bin/python -m bdfr download "$ruser"/ --user "$ruser" -t day --authenticate --no-dupes --submitted --folder-scheme '' --file-scheme '{TITLE}_{SUBREDDIT}_{POSTID}'

I expect the -t day to limit this to the posts for the last 24hrs.

Instead, I see downloads of posts that are much older than a day.

Command

bin/python -m bdfr download "$ruser"/ --user "$ruser" -t day --authenticate --no-dupes --submitted --folder-scheme '' --file-scheme '{TITLE}_{SUBREDDIT}_{POSTID}'

Environment (please complete the following information):

  • OS: [Linux]
  • Python version: [3.9.7]

Logs

$ ruser=Pluto_and_Charon; bin/python -m bdfr download "$ruser"/ --user "$ruser" -t day --authenticate --no-dupes --submitted --folder-scheme '' --file-scheme '{TITLE}_{SUBREDDIT}_{POSTID}'
[2021-11-08 08:54:21,918 - bdfr.downloader - ERROR] - Could not download submission qmm3za: No downloader module exists for url https://twitter.com/thesheetztweetz/status/1456269808811405317?s=20
[2021-11-08 08:54:22,050 - bdfr.downloader - INFO] - Downloaded submission po5ibz from space
[2021-11-08 08:54:22,329 - bdfr.downloader - ERROR] - Could not download submission p9lnxx: No downloader module exists for url https://pbs.twimg.com/media/E9bBUV0XIAgZN6M?format=jpg&name=large
[2021-11-08 08:54:31,443 - bdfr.downloader - ERROR] - Failed to download resource https://www.youtube.com/watch?v=4B2_dfvRZ4M&ab_channel=NASASpaceflight in submission oz5bme with downloader Youtube: Youtube download failed: ERROR: giving up after 0 retries
[2021-11-08 08:54:31,710 - bdfr.downloader - INFO] - Downloaded submission oe7fju from Paleontology
[2021-11-08 08:54:31,933 - bdfr.downloader - INFO] - Downloaded submission o0ehs3 from space
[2021-11-08 08:54:32,251 - bdfr.downloader - ERROR] - Could not download submission ncn38p: No downloader module exists for url https://twitter.com/AJ_FI/status/1393361768081248263
[2021-11-08 08:54:33,753 - bdfr.downloader - ERROR] - Could not download submission mty41o: No downloader module exists for url https://www.bbc.co.uk/news/science-environment-56799755
[2021-11-08 08:54:33,886 - bdfr.downloader - INFO] - Downloaded submission mtm7u0 from space
[2021-11-08 08:54:34,146 - bdfr.downloader - INFO] - Downloaded submission m5504j from space
[2021-11-08 08:54:35,659 - bdfr.downloader - ERROR] - Could not download submission lnnrdx: No downloader module exists for url https://twitter.com/NASAPersevere/status/1362839907824136193
[2021-11-08 08:54:35,819 - bdfr.downloader - INFO] - Downloaded submission lnekrk from space
[2021-11-08 08:54:35,978 - bdfr.downloader - INFO] - Downloaded submission lmml4z from space
[2021-11-08 08:54:36,099 - bdfr.downloader - INFO] - Downloaded submission lkcz9j from space
[2021-11-08 08:54:40,400 - bdfr.downloader - ERROR] - Could not download submission lga2gy: No downloader module exists for url https://www.trtworld.com/turkey/turkey-s-president-erdogan-unveils-national-space-program-44025
[2021-11-08 08:54:40,722 - bdfr.downloader - INFO] - Downloaded submission lelrwh from space
[2021-11-08 08:54:40,862 - bdfr.downloader - INFO] - Downloaded submission k4ksz9 from space
[2021-11-08 08:54:41,006 - bdfr.downloader - INFO] - Downloaded submission i084u4 from space
[2021-11-08 08:54:41,190 - bdfr.downloader - INFO] - Downloaded submission hd9vuw from space
[2021-11-08 08:54:41,322 - bdfr.downloader - INFO] - Downloaded submission gxzyzy from Paleontology
[2021-11-08 08:54:41,456 - bdfr.downloader - INFO] - Downloaded submission gukb5d from AskHistorians
[2021-11-08 08:54:41,579 - bdfr.downloader - INFO] - Downloaded submission gtd7ob from space
[2021-11-08 08:54:41,769 - bdfr.downloader - INFO] - Downloaded submission gpnsn8 from space
[2021-11-08 08:54:42,092 - bdfr.downloader - ERROR] - Could not download submission ga9qj9: No downloader module exists for url https://twitter.com/TomHoltzPaleo/status/1255488851893800961
[2021-11-08 08:54:43,726 - bdfr.downloader - INFO] - Downloaded submission gaaizw from space
[2021-11-08 08:54:44,591 - bdfr.downloader - ERROR] - Could not download submission fwtchk: No downloader module exists for url https://www.nasa.gov/directorates/spacetech/niac/2020_Phase_I_Phase_II/Direct_Multipixel_Imaging_and_Spectroscopy_of_an_Exoplanet
[2021-11-08 08:54:44,827 - bdfr.downloader - INFO] - Downloaded submission fjpgp5 from Paleontology
[2021-11-08 08:54:44,940 - bdfr.downloader - INFO] - Downloaded submission fgh7hs from ac_newhorizons
[2021-11-08 08:54:45,029 - bdfr.downloader - INFO] - Resource hash 4d467af0743b725a09f89ee41742e64b from submission fgh98f downloaded elsewhere
[2021-11-08 08:54:45,369 - bdfr.downloader - INFO] - Downloaded submission e8fepl from Paleontology
# At this point I CTRL-C to break the execution.
$ ls -alrt Pluto_and_Charon/
total 15600
-rw-r--r--   1 jrwren jrwren 2737876 Dec  9  2019 'The Evolution of Reptiles [OC Infographic]_Paleontology_e8fepl.jpg'
-rw-r--r--   1 jrwren jrwren  163232 Mar 10  2020 'My quick concept for a village built on the sides of a canyon, with a river running down the middle_ac_newhorizons_fgh7hs.png'
-rw-r--r--   1 jrwren jrwren 1132769 Mar 16  2020 'An infographic I made showing the evolution of Mammals_Paleontology_fjpgp5.jpg'
-rw-r--r--   1 jrwren jrwren 6238143 Apr 29  2020 'NASA’s Mars Helicopter, Ingenuity (UHD Trailer)_space_gaaizwwebm'
-rw-r--r--   1 jrwren jrwren    1831 May 24  2020 'NASA-SpaceX crewed launch to the Space Station - Pre-launch Megathread_space_gpnsn8.txt'
-rw-r--r--   1 jrwren jrwren    1618 May 30  2020 'NASA-SpaceX crewed launch to the Space Station - Megathread - ATTEMPT 2_space_gtd7ob.txt'
-rw-r--r--   1 jrwren jrwren     277 Jun  1  2020 'The Royal Road was a vast ancient highway built by Persian king Darius the Great, said to connect the capital Susa to Sardis in Anatolia. Is it preserved, do we know the route? Can I go and walk along the road today?_AskHistorians_gukb5d.txt'
-rw-r--r--   1 jrwren jrwren    1097 Jun  6  2020 'Looking for Plant phylogeny stuff_Paleontology_gxzyzy.txt'
-rw-r--r--   1 jrwren jrwren  407517 Jun 21  2020 'The exact location of the lunar south pole; on the rim of Shackleton Crater. This has been proposed as the location for NASA'\''s planned Lunar Base_space_hd9vuw.png'
-rw-r--r--   1 jrwren jrwren    2424 Jul 29  2020 'NASA launches its next rover to Mars - Megathread_space_i084u4.txt'
-rw-r--r--   1 jrwren jrwren    1370 Dec  1  2020 "When this post is about 2 hours old, China's Chang ' e 5 spacecraft will touch down on the surface of the Moon. If successful, this robotic mission will return the first lunar rock samples since 1976._space_k4ksz9.txt"
-rw-r--r--   1 jrwren jrwren 3930257 Feb  7  2021 'I made a timeline showing the biggest space events of the next decade_space_lelrwh.png'
-rw-r--r--   1 jrwren jrwren    1345 Feb 15  2021 'Official subreddit COMPETITION! - NASA Perseverance Rover Landing Bingo 🚀_space_lkcz9j.txt'
-rw-r--r--   1 jrwren jrwren    2424 Feb 18  2021 'NASA Mars Rover Landing - rSpace Megathread_space_lmml4z.txt'
-rw-r--r--   1 jrwren jrwren    2840 Feb 19  2021 'NASA Perseverance Rover : First Week on Mars Megathread_space_lnekrk.txt'
-rw-r--r--   1 jrwren jrwren  669079 Mar 14  2021 'I made a poster showing what the Moon might look like in the year 2050_space_m5504j.png'
-rw-r--r--   1 jrwren jrwren    1942 Apr 18  2021 'NASA Ingenuity Helicopter Megathread - First Powered Flight On Another Planet_space_mtm7u0.txt'
-rw-r--r--   1 jrwren jrwren    2280 Jun 15 09:34 'With the Artemis program and the recently announced missions, the decade from 2025-2035 will be the greatest period of solar system exploration since the Apollo Age!_space_o0ehs3.txt'
-rw-r--r--   1 jrwren jrwren  626000 Jul  5 09:30 'Following all the cool finds of early human fossils recently, I made an infographic summarising our current best knowledge about the evolution of our genus (Homo)_Paleontology_oe7fju.png'
-rw-r--r--   1 jrwren jrwren    2111 Sep 14 11:52 'Retired CSA astronaut Chris Hadfield will be giving a Reddit Talk on rspace on Thursday 5-6 PM PST, post your questions in advance here!_space_po5ibz.txt'
drwxrwxr-x 135 jrwren jrwren    4096 Nov  8 08:54  ..
drwxr-xr-x   2 jrwren jrwren    4096 Nov  8 08:54  .

Notice many downloads older than 1 day.

@jrwren jrwren added the bug Something isn't working label Nov 8, 2021
@Serene-Arc Serene-Arc self-assigned this Dec 3, 2021
@Serene-Arc
Owner

Alright, I had to go through a bunch of the documentation for the Reddit interface to narrow it down. First off, by default the BDFR gets the 'hot' list, which cannot be filtered by time. So in the BDFR code, your 'day' option is just ignored. If you use 'top', 'controversial', or the search function, then the option will be considered.

Even then, though, the time filter doesn't actually filter on the creation date of the post. It's vaguer than that. If you use 'top' sorting and a 'day' filter, then you get the top posts in the last day, not necessarily posts made in the last day that are at the top of the upvoted list. These lists may have a lot of overlap, but that isn't guaranteed, especially on weird subreddits with abnormal patterns of viewing and upvoting.

Specify the 'top' sort option and you'll get what you're trying to get through the BDFR. My testing didn't turn up any examples where the creation date fell outside the hour limit, or whatever time limit was set, but apparently it can happen, so be aware of that.
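
Since the time filter is not a strict creation-date cutoff, one way to get a hard "last 24 hours" limit is to check each post's creation timestamp on the client side. The following is a minimal sketch of that idea, not something the BDFR does. The `Post` class, the `created_within` helper, and the post IDs are hypothetical; the only thing assumed about real data is that a submission carries a UTC creation timestamp in seconds (PRAW exposes this as `created_utc`).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Optional


@dataclass
class Post:
    """Hypothetical stand-in for a Reddit submission (only the fields used here)."""
    id: str
    created_utc: float  # creation time, seconds since the Unix epoch (UTC)


def created_within(
    posts: Iterable[Post],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> Iterator[Post]:
    """Yield only the posts created no earlier than `now - max_age`."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - max_age).timestamp()
    for post in posts:
        if post.created_utc >= cutoff:
            yield post


# Illustrative data: one post from 2019 and one from three hours before "now".
now = datetime(2021, 11, 8, 8, 54, tzinfo=timezone.utc)
posts = [
    Post("old", datetime(2019, 12, 9, tzinfo=timezone.utc).timestamp()),
    Post("recent", (now - timedelta(hours=3)).timestamp()),
]
print([p.id for p in created_within(posts, timedelta(days=1), now)])  # ['recent']
```

Passing `now` explicitly keeps the check reproducible; leaving it out uses the current UTC time.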

@jrwren
Contributor Author

jrwren commented Dec 9, 2021

What about the combination of --user, -S new, and -t day? Maybe --user is like one of those weird subreddits with abnormal patterns. Bummer. Thank you.

@Serene-Arc
Owner

@jrwren 'new' isn't a sort type that the time filter can be applied to. It's just 'top' and 'controversial'.
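
The rule described in this thread can be summarised in a few lines. This is only an illustration of the behaviour as explained here, not the BDFR's actual code. The `effective_time_filter` function and the `TIME_FILTERED_SORTS` set are hypothetical names, and the sketch covers listing sorts only (the search function, which also honours the time filter, is left out).

```python
from typing import Optional

# Assumption taken from the discussion above: only these listing sorts
# honour a time filter; for any other sort the option is ignored.
TIME_FILTERED_SORTS = {"top", "controversial"}


def effective_time_filter(sort: str, time_filter: str) -> Optional[str]:
    """Return the time filter that actually applies, or None if it is ignored."""
    if sort.lower() in TIME_FILTERED_SORTS:
        return time_filter
    return None


for sort in ("hot", "new", "top", "controversial"):
    print(sort, "->", effective_time_filter(sort, "day"))
```

With `-t day`, this prints `None` for 'hot' and 'new' and `day` for 'top' and 'controversial', which matches the original report: the default 'hot' listing returned posts of any age.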

Botts85 added a commit to Botts85/bulk-downloader-for-reddit that referenced this issue Jan 2, 2023
Serene-Arc determined that the time filter does not apply if the reddit API is not providing the "Top" or "Controversial" list.

As such, I have updated the time option documentation to clarify that.