[Feature Request] Download clips by Stream ID #42

Open
ndv400 opened this issue Nov 10, 2020 · 23 comments

@ndv400

ndv400 commented Nov 10, 2020

Hey, so with all this DMCA stuff going on many streamers nuked their clips.
However, it's still possible to download them by using a Stream ID like this:
https://pastebin.com/H25NPesp
Could it be possible to implement such a useful feature? Check stream IDs on this website: twitchtracker.com

@ihabunek
Owner

Looking into it.

@ndv400
Author

ndv400 commented Nov 10, 2020

@ihabunek here's an example of stream ids I've scraped from a certain streamer:
test.txt

@ihabunek
Owner

How do you determine which offsets to scan? I found clips at pretty high offsets, 5000+.

@ndv400
Author

ndv400 commented Nov 10, 2020

If I understand it right, Rydan's solution on twitter works like this:
'Take that id and then loop through every offset until stream end (for a 1h long stream that would mean offset 1 to 3600) and then request each URL via get request and check the response for a status code of 200. All the ones with that status code are existing clips that can be downloaded.'
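
The approach quoted above can be sketched roughly like this. It is an illustration only, not the code clips-dl uses: the URL pattern is copied from the clip URLs posted in this thread, and `candidate_urls`, `head_returns_200` and `find_existing` are hypothetical names. The HTTP check is passed in as a function so the loop can be tried without touching the network.

```python
import urllib.request
from typing import Callable, Iterable, Iterator, List

# URL pattern as it appears in the clip URLs posted in this thread.
CLIP_URL = "https://clips-media-assets2.twitch.tv/{stream_id}-offset-{offset}.mp4"


def candidate_urls(stream_id: str, min_offset: int, max_offset: int) -> Iterator[str]:
    """Yield one candidate clip URL per offset (in seconds), both ends inclusive."""
    for offset in range(min_offset, max_offset + 1):
        yield CLIP_URL.format(stream_id=stream_id, offset=offset)


def head_returns_200(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the server answered with 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # HTTPError, URLError and socket timeouts are all OSError subclasses.
        return False


def find_existing(urls: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Keep only the URLs for which the check reports success."""
    return [url for url in urls if exists(url)]
```

For a real scan you would call `find_existing(candidate_urls(stream_id, 1, 3600), head_returns_200)`. That is 3600 sequential requests for a one hour stream, so some form of concurrency is needed in practice.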

@ihabunek
Owner

I don't want to add this to twitch-dl until I fully understand the logic so I've made a new project to play around with it.

You need Python 3.5+. Download the package here.

To test, try this out:

python3 ./clips-dl.0.2.0.pyz 787034579 --min-offset 2400 --max-offset 3300 > urls.txt

Note that 787034579 is the video ID, NOT the stream ID. I fetch the stream ID from twitch.

urls.txt should contain the URLs of clips in the given offset range (I get five of them for the above command). You can download them with wget or similar:

wget -i urls.txt

You can leave out the min and max offset, and they will be set to 0 and video length in seconds respectively.

@ndv400
Author

ndv400 commented Nov 10, 2020

hmm, tried it out with this vod:

python3 ./clips-dl.0.2.0.pyz 798052582 --min-offset 2400 --max-offset 3300 > urls.txt
INFO:__main__:Fetching stream id for video ID: 798052582
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "./clips-dl.0.2.0.pyz/__main__.py", line 108, in <module>
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "./clips-dl.0.2.0.pyz/__main__.py", line 79, in main
  File "./clips-dl.0.2.0.pyz/__main__.py", line 43, in get_video_info
IndexError: list index out of range

Works with your example, though

@ndv400
Author

ndv400 commented Nov 10, 2020

same output with a different vod without offset parameters:

python3 ./clips-dl.0.2.0.pyz 798052582 > urls.txt
INFO:__main__:Fetching stream id for video ID: 798052582
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "./clips-dl.0.2.0.pyz/__main__.py", line 108, in <module>
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "./clips-dl.0.2.0.pyz/__main__.py", line 79, in main
  File "./clips-dl.0.2.0.pyz/__main__.py", line 43, in get_video_info
IndexError: list index out of range

@ihabunek
Owner

I think I know what the issue is, but I don't have any more time to work on this today. Paid job beckons. I'll get around to this later.

@ndv400
Author

ndv400 commented Nov 10, 2020

Okay, thanks for helping out! I'm currently playing around with getting stream IDs from old VODs which are gone, to scan them for clips which are still there. Is there any way to convert an old VOD ID (e.g. 78349737) into a stream ID? For me it returns {'data': {'video': None}}.
Despite not knowing Python, I tried to debug the clips-dl bundle you've provided. It appears that it didn't work with 798052582 because the stream was ongoing. I also made a mistake and checked this VOD ID twice, I was a little distracted. The response didn't contain the stream ID:

INFO:__main__:Fetching stream id for video ID: 798052582
INFO:__main__:{'data': {'video': {'previewThumbnailURL': 'https://vod-secure.twitch.tv/_404/404_processing_{width}x{height}.png', 'lengthSeconds': 7799}}, 'extensions': {'durationMilliseconds': 17, 'requestID': '01EPSD3A4ZTNX9SDNY89JMPWDQ'}}

So that wasn't an issue, sorry.
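
A caller could check for this case before trying to pull a stream ID out of the thumbnail URL. The sketch below only looks at the two response shapes quoted in this thread; `vod_is_still_processing` is a hypothetical name, and treating the `404_processing` placeholder as "stream still ongoing" is an assumption based on the single example above.

```python
from typing import Any, Dict


def vod_is_still_processing(payload: Dict[str, Any]) -> bool:
    """Return True when the thumbnail is the '404_processing' placeholder."""
    video = (payload.get("data") or {}).get("video")
    if video is None:
        # No video object at all, e.g. the {'data': {'video': None}} response.
        return False
    thumbnail = video.get("previewThumbnailURL") or ""
    return "/_404/404_processing" in thumbnail
```

With a check like this the tool could print a readable message instead of failing with an IndexError.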
I also get a bunch of errors on an unstable connection, but it's not a big deal:

DEBUG:__main__:GET https://clips-media-assets2.twitch.tv/1603370954-offset-10219.mp4
Traceback (most recent call last):
  File "clip-dl.pyz/httpx/_exceptions.py", line 326, in map_exceptions
  File "clip-dl.pyz/httpx/_client.py", line 1502, in _send_single_request
  File "clip-dl.pyz/httpcore/_async/connection_pool.py", line 218, in arequest
  File "clip-dl.pyz/httpcore/_async/connection.py", line 105, in arequest
  File "clip-dl.pyz/httpcore/_async/http11.py", line 72, in arequest
  File "clip-dl.pyz/httpcore/_async/http11.py", line 133, in _receive_response
  File "clip-dl.pyz/httpcore/_async/http11.py", line 172, in _receive_event
  File "clip-dl.pyz/httpcore/_backends/asyncio.py", line 150, in read
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "clip-dl.pyz/httpcore/_exceptions.py", line 12, in map_exceptions
httpcore.ReadError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "clip-dl.pyz/__main__.py", line 108, in <module>
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "clip-dl.pyz/__main__.py", line 87, in main
  File "clip-dl.pyz/__main__.py", line 73, in find_clips
  File "clip-dl.pyz/__main__.py", line 52, in process_segment
  File "clip-dl.pyz/httpx/_client.py", line 1602, in head
  File "clip-dl.pyz/httpx/_client.py", line 1371, in request
  File "clip-dl.pyz/httpx/_client.py", line 1406, in send
  File "clip-dl.pyz/httpx/_client.py", line 1444, in _send_handling_auth
  File "clip-dl.pyz/httpx/_client.py", line 1476, in _send_handling_redirects
  File "clip-dl.pyz/httpx/_client.py", line 1502, in _send_single_request
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "clip-dl.pyz/httpx/_exceptions.py", line 343, in map_exceptions
httpx.ReadError: [Errno 104] Connection reset by peer

DEBUG:__main__:GET https://clips-media-assets2.twitch.tv/1603370954-offset-17993.mp4
Traceback (most recent call last):
  File "clip-dl.pyz/httpx/_exceptions.py", line 326, in map_exceptions
  File "clip-dl.pyz/httpx/_client.py", line 1502, in _send_single_request
  File "clip-dl.pyz/httpcore/_async/connection_pool.py", line 218, in arequest
  File "clip-dl.pyz/httpcore/_async/connection.py", line 105, in arequest
  File "clip-dl.pyz/httpcore/_async/http11.py", line 72, in arequest
  File "clip-dl.pyz/httpcore/_async/http11.py", line 133, in _receive_response
  File "clip-dl.pyz/httpcore/_async/http11.py", line 172, in _receive_event
  File "clip-dl.pyz/httpcore/_backends/asyncio.py", line 150, in read
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "clip-dl.pyz/httpcore/_exceptions.py", line 12, in map_exceptions
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "clip-dl.pyz/__main__.py", line 108, in <module>
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "clip-dl.pyz/__main__.py", line 87, in main
  File "clip-dl.pyz/__main__.py", line 73, in find_clips
  File "clip-dl.pyz/__main__.py", line 52, in process_segment
  File "clip-dl.pyz/httpx/_client.py", line 1602, in head
  File "clip-dl.pyz/httpx/_client.py", line 1371, in request
  File "clip-dl.pyz/httpx/_client.py", line 1406, in send
  File "clip-dl.pyz/httpx/_client.py", line 1444, in _send_handling_auth
  File "clip-dl.pyz/httpx/_client.py", line 1476, in _send_handling_redirects
  File "clip-dl.pyz/httpx/_client.py", line 1502, in _send_single_request
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "clip-dl.pyz/httpx/_exceptions.py", line 343, in map_exceptions
httpx.ReadTimeout

@ndv400
Author

ndv400 commented Nov 10, 2020

Everything works very well on a good connection, albeit only for one vod at a time... I hope this isn't asking for too much, but could there be an option to pass an array of stream ids from a txt file in which every stream id is on a new line? With the program writing outputs to urls-{stream_id}.txt. Tried to do it myself but couldn't figure it out, would really appreciate help with this.
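
One way the request above could look, as a rough sketch rather than anything clips-dl provides: read one stream ID per line and write the results for each to its own `urls-{stream_id}.txt`. The function names are hypothetical, and `find_clip_urls` is a placeholder for whatever does the actual scanning.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def read_stream_ids(path: Path) -> List[str]:
    """Read one stream ID per line, skipping blank lines."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def write_url_files(
    stream_ids: Iterable[str],
    find_clip_urls: Callable[[str], Iterable[str]],
    out_dir: Path,
) -> List[Path]:
    """Write the URLs found for each stream to out_dir/urls-<stream_id>.txt."""
    written = []
    for stream_id in stream_ids:
        target = out_dir / f"urls-{stream_id}.txt"
        urls = list(find_clip_urls(stream_id))
        target.write_text("".join(url + "\n" for url in urls), encoding="utf-8")
        written.append(target)
    return written
```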

@ihabunek
Owner

@Daniil1288 I added some retries and hopefully fixed the stream_id parsing logic.
https://git.sr.ht/~ihabunek/clips-dl/refs/0.2.1/clips-dl.0.2.1.pyz

About your request, I can do that. Could you also provide the stream duration? I don't have a simple way to get stream duration from a stream_id.

@ihabunek
Owner

BTW, have you noticed that all clips downloaded this way have a duration of 30 seconds? I'm also getting some duplication, e.g. in the example I sent before I get these clips:

INFO:clips-dl:Found clip: https://clips-media-assets2.twitch.tv/40284574190-offset-2496.mp4
INFO:clips-dl:Found clip: https://clips-media-assets2.twitch.tv/40284574190-offset-2694.mp4
INFO:clips-dl:Found clip: https://clips-media-assets2.twitch.tv/40284574190-offset-2698.mp4
INFO:clips-dl:Found clip: https://clips-media-assets2.twitch.tv/40284574190-offset-3078.mp4
INFO:clips-dl:Found clip: https://clips-media-assets2.twitch.tv/40284574190-offset-3244.mp4

Clips at offsets 2694 and 2698 overlap.
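
To spot overlaps like this automatically, the found offsets could be grouped by distance. This is a small sketch under two assumptions: every clip found this way is 30 seconds long (as observed above), and the offset marks the same point in each clip, so two offsets less than 30 seconds apart overlap. `group_overlapping` is a hypothetical helper name.

```python
from typing import Iterable, List

CLIP_SECONDS = 30  # Duration observed in this thread for clips found this way.


def group_overlapping(offsets: Iterable[int], clip_seconds: int = CLIP_SECONDS) -> List[List[int]]:
    """Group offsets whose clip would overlap the previous clip in the group."""
    groups: List[List[int]] = []
    for offset in sorted(offsets):
        if groups and offset - groups[-1][-1] < clip_seconds:
            groups[-1].append(offset)
        else:
            groups.append([offset])
    return groups


print(group_overlapping([2496, 2694, 2698, 3078, 3244]))
# [[2496], [2694, 2698], [3078], [3244]]
```

Any group with more than one offset is a candidate for deduplication, for example by keeping only the first clip of each group.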

@ndv400
Author

ndv400 commented Nov 11, 2020

Hey @ihabunek, I guess it would be easier to add an argument that takes the stream length in hours and converts it to seconds. In my case the average stream length is 8 hours, so I have to check offsets from 1 to 28800. The only way to get the length accurately would be to use a third-party website like twitchtracker.com, which is where I got my stream IDs from. Their https://twitchtracker.com/{streamername}/streams page gives all the required information for every stream of any streamer on the website in one request, like this:

<div class="hidden text-uppercase">
  | <span class="to-date">2016-11-17</span>
  | <br>Live for 7 hour(s)
  | <br>Viewers: <span class="to-number">18928</span>
  | <br><small class="text-warning">click to show details</small>
</div>

The clips downloaded this way have a duration of 30 seconds because this is how non-customized clips are stored. Customized clips with custom length have this format: AT-cm_363621067 but I don't think there's a way to bruteforce these easily if at all since the number looks to be random and isn't associated with any streamer. Here's Rydan explaining it on twitter.
As he says in that conversation, there's also a third type of clip link which has /raw_media/ in it and is formed when you create a clip. It's the 90 second version which you crop when customizing a clip. Here's an example: https://clips-media-assets2.twitch.tv/raw_media/39947855852-offset-10942.mp4 - just made one now. I haven't been able to find working ones like this on the internet, though... I might play around with it to see if it's possible to scan for a 'raw media' version on old streams, too.
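
The "Live for 7 hour(s)" line in the twitchtracker snippet above could be turned into a maximum offset along these lines. This is a sketch against that one snippet only: the regular expression, the function name and the optional decimal part are assumptions, and the real page markup may differ.

```python
import re
from typing import Optional

# Pattern taken from the "Live for 7 hour(s)" text in the snippet above.
LIVE_FOR = re.compile(r"Live for (\d+(?:\.\d+)?) hour\(s\)")


def max_offset_from_html(html: str) -> Optional[int]:
    """Return the stream length in seconds, or None if no duration is found."""
    match = LIVE_FOR.search(html)
    if match is None:
        return None
    return round(float(match.group(1)) * 3600)
```

If the page rounds the figure to whole hours, it is probably worth padding the result by an hour so that clips near the end of the stream are not missed.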

@ndv400
Author

ndv400 commented Nov 11, 2020

The way you get the stream id which you've called 'hacky' in the recent commit is actually perfectly normal, that's how most twitch clip downloaders used to work iirc, i.e. by splitting the thumbnail URL.
In fact, I did something similar when I was downloading what's left of my clips collection over at https://dashboard.twitch.tv/u/daniil1288/content/clips. There was no button or program online to download all the clips in one go if you're a viewer, which is why I'd been putting it off until it was too late and most of them got nuked. I had to scroll down until it loaded about 950 clips and refused to go any further, save the HTML, reverse the 'Created' order, load 950 clips and save the HTML again. Then I split the lines to get a list of thumbnails like https://clips-media-assets2.twitch.tv/vod-85620741-offset-17528-preview-260x147.jpg, turned each into https://clips-media-assets2.twitch.tv/vod-85620741-offset-17528.mp4, and finally threw the links into JDownloader to get them all. Thankfully I had only 1.7k clips left; I'm not sure it's even possible to load clips past the 950-1000 mark, and the website lagged and often refused to load more than 100 or 150 clips at once.
Also, there are clips out there (not deleted ones) which don't have a thumbnail for some reason and return https://static-cdn.jtvnw.net/ttv-static/404_preview-160x90.jpg, just thought I'd let you know. Twitch is weird.
Anyway, thanks for taking your time to help me out with this! The bruteforce actually helps a lot for those who watch smaller (sub 1k viewers) streamers who had to delete all clips and didn't back up the years of content they've produced on the platform either.
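
The thumbnail-to-clip conversion described above can be written down in a few lines. The sketch relies only on the two example URLs in this comment; the function name is made up, and thumbnails that don't follow the `-preview-<width>x<height>.jpg` pattern (such as the 404 placeholder) are simply skipped.

```python
import re
from typing import Optional

# Matches the "-preview-<width>x<height>.jpg" suffix seen in the example above.
PREVIEW_SUFFIX = re.compile(r"-preview-\d+x\d+\.jpg$")


def thumbnail_to_clip_url(thumbnail_url: str) -> Optional[str]:
    """Turn a clip thumbnail URL into the matching .mp4 URL, or None if it does not fit."""
    if not PREVIEW_SUFFIX.search(thumbnail_url):
        return None
    return PREVIEW_SUFFIX.sub(".mp4", thumbnail_url)


print(thumbnail_to_clip_url(
    "https://clips-media-assets2.twitch.tv/vod-85620741-offset-17528-preview-260x147.jpg"
))
# https://clips-media-assets2.twitch.tv/vod-85620741-offset-17528.mp4
```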

@ihabunek
Owner

https://git.sr.ht/~ihabunek/clips-dl/refs/0.3.0/clips-dl.0.3.0.pyz

I added searching for clips across a whole channel.

λ ./clips-dl.0.3.0.pyz channel -h
usage: clips-dl channel [-h] [-s SKIP] [-w WORKERS] [-v] [-q] channel_name

positional arguments:
  channel_name          Channel name

optional arguments:
  -h, --help            show this help message and exit
  -s SKIP, --skip SKIP  Number of streams to skip (default 0)
  -w WORKERS, --workers WORKERS
                        Number of concurrent downloads (default: 25)
  -v, --verbose         Verbose logging
  -q, --quiet           Disable logging

For example:

λ ./clips-dl.0.3.0.pyz channel bananasaurus_rex > urls.txt

Still just outputs urls, does not download.

You can still download for a single video id:

λ ./clips-dl.0.3.0.pyz video 12345

@ihabunek
Owner

Tips:

  • use -v to show each URL checked.
  • use -w to tweak the number of concurrent downloads, optimal value may depend on your network connection
  • use -s to skip a number of streams, e.g. if the program breaks and you want to resume where you left off

@ndv400
Author

ndv400 commented Nov 13, 2020

You've hardcoded the bananasaurus_rex argument by mistake, I think:
streams = await find_streams(client, "bananasaurus_rex")
Otherwise it works fine after I changed that, thanks a lot! I started getting 503 at some point, though, which is unhandled and throws an exception. I had to decrease the number of workers to 20 or 15 but I'm still getting it. Could you handle that exception so it retries if it encounters 503? It would also be cool to be able to specify the timeout and the number of retries by passing an argument.

@ndv400
Author

ndv400 commented Nov 13, 2020

503'd on the fourth stream after letting it run for an hour with -w 10.

@ihabunek
Owner

LOL about hardcoded channel. I'll handle the exception. Don't know if I'll have time today. Probably over the weekend.

@ndv400
Author

ndv400 commented Nov 30, 2020

Hello, hope you're doing well. Is there a possibility you'll handle the exception any time soon? It would be very usable if it weren't for it crashing every now and then.

@ihabunek
Owner

ihabunek commented Dec 1, 2020

@Daniil1288 Sorry, got distracted by other stuff. Here's a new version which should fix the retries (it now retries on any error, not only timeouts), removes the hardcoded channel name and adds options for retries and timeout.

https://git.sr.ht/~ihabunek/clips-dl/refs/0.4.0/clips-dl.0.4.0.pyz

λ ./clips-dl.0.4.0.pyz channel --help
usage: clips-dl channel [-h] [-s SKIP] [-w WORKERS] [-v] [-q] [-t TIMEOUT] [-r RETRIES] channel_name

positional arguments:
  channel_name          Channel name

optional arguments:
  -h, --help            show this help message and exit
  -s SKIP, --skip SKIP  Number of streams to skip (default 0)
  -w WORKERS, --workers WORKERS
                        Number of concurrent downloads (default: 25)
  -v, --verbose         Verbose logging
  -q, --quiet           Disable logging
  -t TIMEOUT, --timeout TIMEOUT
                        HTTP timeout in seconds (default: 10
  -r RETRIES, --retries RETRIES
                        How many times to retry failed requests (default: 5
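
For readers wondering what "retry on any error" with a timeout can look like, here is a generic asyncio sketch. It is not taken from clips-dl: `with_retries` and its `delay` parameter are hypothetical, and only the defaults for `retries` and `timeout` mirror the help text above.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    operation: Callable[[], Awaitable[T]],
    retries: int = 5,
    timeout: float = 10.0,
    delay: float = 1.0,
) -> T:
    """Run operation, retrying on any exception up to `retries` extra times."""
    attempt = 0
    while True:
        try:
            # wait_for raises a timeout error if the operation takes too long.
            return await asyncio.wait_for(operation(), timeout=timeout)
        except Exception:
            if attempt >= retries:
                raise
            attempt += 1
            # Wait a little longer after each failed attempt.
            await asyncio.sleep(delay * attempt)
```

Catching `Exception` covers connection resets, read timeouts and errors raised for unexpected HTTP statuses alike, at the cost of also retrying failures that will never succeed.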

@ndv400
Author

ndv400 commented Jan 3, 2021

Hey, ran it for a while, it managed to do thirty or so vods and then errored out with this:

Traceback (most recent call last):
  File "C:\Users\...\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\...\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\clips\clips-dl.0.4.0.pyz\__main__.py", line 225, in <module>
  File "C:\Users\...\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "C:\clips\clips-dl.0.4.0.pyz\__main__.py", line 178, in channel
  File "C:\clips\clips-dl.0.4.0.pyz\__main__.py", line 131, in find_clips
  File "C:\clips\clips-dl.0.4.0.pyz\__main__.py", line 117, in process_segment
ValueError: Unhandled HTTP status: 500

Twitch was down for a while, this is why.
Kind of a shame that it also didn't save the output anywhere after erroring out.
Interestingly enough, I've also found that my old clip urls that I grabbed using version 0.2.0 don't work any more, instead displaying AccessDenied... hopefully twitch didn't patch this, I'll report back once the script gets to older vods.
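
On losing the output after a crash: one possible approach, sketched here with a hypothetical `append_urls` helper, is to append each URL to the output file as soon as it is found instead of collecting everything first. Whatever was written before the error then survives.

```python
from pathlib import Path
from typing import Iterable


def append_urls(path: Path, urls: Iterable[str]) -> int:
    """Append each URL to `path` as soon as it is produced; return the count written."""
    count = 0
    with path.open("a", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")
            handle.flush()  # Push the line out so a later crash does not lose it.
            count += 1
    return count
```

Because the file is opened in append mode, a rerun (for example with -s to skip finished streams) adds to the same file rather than overwriting it.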

@ndv400
Author

ndv400 commented Jan 4, 2021

Okay, it still works, it's just that some older clips got deleted, it looks like.
Still, for me it now errors out on HTTP 500 after a while. I just let it run again and it happened again, without saving the output anywhere. Could you handle HTTP 500 too, so it retries when it encounters this error? Thanks!
