Download subtitles from videos that have embedded subtitles and option to render it in video #948

Open
2 tasks done
MrDummyNL opened this issue Jan 18, 2024 · 27 comments · May be fixed by #970
Labels
enhancement New feature or request

Comments

@MrDummyNL

Checklist

Write your feature request here

More streamers are using subtitles that are sent with the video. This is already possible and I use it too, as do some of my friends. That is why it would be a shame if the subtitle data cannot be stored in an SRT file.

That is why I ask you to add a function to read the subtitle data from the TS video (a logical step, because you can only read it from the full TS video, and then you can render it). Optionally, you can render the subtitles together with the video. So you have 2 options:

  • store the SRT file separately
  • option to render it with the mp4 video too (of course you need the SRT file for it)

I know tools exist to bake subtitles into a video, so it should not be too hard.
I might save some interesting VODs, but the subtitle data will be lost here. That is kind of sad.

Can you make it possible? Thanks in advance.

@MrDummyNL MrDummyNL added the enhancement New feature or request label Jan 18, 2024
@superbonaci
Contributor

The subtitles are not in the TS video; I have no idea where you got that from. It is a different matter if the subs are encoded into the video picture itself, but that's up to the streamer using OBS or other software.

Baking subs into a video requires more knowledge than it looks like; that's usually a whole different matter.

If you mean extracting the chat from the video, you need some OCR, and that is still alpha software.

@ScrubN
Collaborator

ScrubN commented Jan 22, 2024

Someone already requested including subtitles in #750, and I made some very minor progress on it.

> The subtitles are not in the TS video

The subtitles are stored in the TS video chunks. My initial implementation used FFmpeg to extract the subtitles, which is extremely slow (~1.6s per 10 seconds of video). I felt this was unacceptable performance, so I wanted to make a custom solution that is significantly faster; however, other bug reports took priority at the time.
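
To put the quoted rate in perspective, here is a rough back-of-the-envelope sketch. It assumes the ~1.6 s per 10 s figure scales linearly with VOD length, which the comment does not state; the function and constant names are purely illustrative.

```python
# Rate quoted in the comment above: about 1.6 s of FFmpeg time per 10 s of video.
# Assumption: the cost scales linearly with the length of the VOD.
SECONDS_PER_CHUNK = 1.6
CHUNK_LENGTH_SECONDS = 10.0


def estimated_extraction_seconds(vod_seconds: float) -> float:
    """Estimate total caption-extraction time for a VOD of the given length."""
    chunks = vod_seconds / CHUNK_LENGTH_SECONDS
    return chunks * SECONDS_PER_CHUNK


if __name__ == "__main__":
    three_hours = 3 * 60 * 60
    minutes = estimated_extraction_seconds(three_hours) / 60
    print(f"Estimated extraction time for a 3 h VOD: {minutes:.1f} min")
```

Under that assumption, a 3-hour VOD would spend roughly half an hour on caption extraction alone, which helps explain why a faster custom scanner was preferred.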

@MrDummyNL
Author

MrDummyNL commented Jan 31, 2024

There is a free plugin for OBS (it is also the ONLY such plugin) that sends text data inside the video stream to Twitch. The Twitch player supports CC inside the video and will show it.

There are other CC solutions, but they do not put the captions inside the video; they send them over a second text stream. Those captions are produced by a browser that is open on the streamer's side. They will not show up in the VOD (when watching the video back), so they won't work here: they cannot be watched back later because Twitch doesn't save the separate text stream.

But as @ScrubN says, the captions are inside the TS video, so it can be scanned for them to first generate an SRT file based on the timestamps, and then, as an option, the subtitles can be encoded into the mp4 video with ffmpeg OR kept as a separate SRT file.
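
As an illustration of the "generate an SRT file based on timestamps" step, here is a minimal sketch. It assumes the captions have already been decoded into (start, end, text) cues, which is the hard part and is not shown; the `Cue` type and function names are hypothetical and not part of TwitchDownloader.

```python
from typing import Iterable, Tuple

# Hypothetical cue shape: (start seconds, end seconds, caption text).
Cue = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def cues_to_srt(cues: Iterable[Cue]) -> str:
    """Render cues as SRT text: a counter, a time range, then the caption."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(cues_to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "World")]))
```

The resulting text can be written next to the video as a separate `.srt` file, or handed to a muxing or burn-in step afterwards.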

It's not impossible; some video players even support it. We need to write the correct code to extract the text data from the TS video.
It's something I miss. I make backups of my videos, and my videos have CC inside the TS stream. I know the captions exist.

Edit: the plugin is this one: https://github.com/ratwithacompiler/OBS-captions-plugin

@superbonaci
Contributor

superbonaci commented Jan 31, 2024

Do you mean the chats of subscriber-only VODs are stored inside the TS parts, with the emojis, colours, GIFs and everything?

So the VOD https://www.twitch.tv/videos/1980035805 has embedded subs in the TS? I'll check it; I didn't know that was possible.

@ScrubN
Collaborator

ScrubN commented Jan 31, 2024

> @MrDummyNL which software can be used to extract the subs from the TS?

See this earlier comment #948 (comment). I still want to implement custom caption extraction (if possible) because FFmpeg is so slow.

@MrDummyNL
Author

MrDummyNL commented Feb 1, 2024

Even if it's slow, it's always welcome. I am happy to be able to extract something. I will search around for other solutions and let you know soon; otherwise we have @ScrubN's option.

All my latest Twitch streams (the 2024 videos on the MrDummy_NL channel) have CC inside the video, so you can test on mine. My friend MarukaKou has also been using CC in video lately, so you can use his VODs to test as well.

Update: there is a tool on GitHub: https://github.com/kanongil/telxcc and it's also used in other tools like CCExtractor. It seems you can use it.
Video editor https://www.nikse.dk/subtitleedit seems to be able to read CC from TS files.
Python script: https://pypi.org/project/ts-cc-extractor/

It would be nice to link to some 3rd-party program and run it when the TS file is complete, or to use the GitHub code and of course credit the creator.

@superbonaci
Contributor

@MrDummyNL did you try whether any of these actually work with the Twitch sample? Some of these programs are 10 years old.

@superbonaci
Contributor

None of them work, I've reported the issue to all of them.

@ScrubN
Collaborator

ScrubN commented Feb 2, 2024

> Update: there is a tool on GitHub: https://github.com/kanongil/telxcc and it's also used in other tools like CCExtractor. It seems you can use it.
> Video editor https://www.nikse.dk/subtitleedit seems to be able to read CC from TS files.
> Python script: https://pypi.org/project/ts-cc-extractor/

telxcc is written in C and provides no prebuilt binaries, and I really don't want to deal with makefiles or linking on linux.
subtitleedit is licensed as GPL, so I cannot use their source code as reference without relicensing TwitchDownloader as GPL.
ts-cc-extractor is licensed as BSD 2-Clause, so I can reference their source code provided I include a copy of the license with TwitchDownloader.

Also @superbonaci, you cannot extract the subtitles from the concatenated TS file because it is produced by concatenating the raw bytes from all of the parts together. It's honestly a miracle to me that FFmpeg can read the concatenated file because of how f-ed the metadata keys probably are.
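
For readers unfamiliar with what "concatenating the raw bytes" means in practice, here is a minimal sketch. It is not TwitchDownloader's actual code; the function names are illustrative. The only facts it relies on are that MPEG-TS packets are 188 bytes long and start with the sync byte 0x47. The alignment check looks at packet framing only and says nothing about timestamps or other stream metadata.

```python
import shutil
from pathlib import Path
from typing import Iterable

TS_PACKET_SIZE = 188  # MPEG-TS packets are 188 bytes long
TS_SYNC_BYTE = 0x47   # and each packet starts with this sync byte


def concatenate_parts(parts: Iterable[Path], output: Path) -> None:
    """Append the raw bytes of each part to output, in the order given."""
    with output.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)


def is_packet_aligned(path: Path) -> bool:
    """Return True if the file is a whole number of 188-byte packets,
    each starting with the sync byte. Only the framing is checked."""
    with path.open("rb") as f:
        while True:
            packet = f.read(TS_PACKET_SIZE)
            if not packet:
                return True  # reached end of file on a packet boundary
            if len(packet) != TS_PACKET_SIZE or packet[0] != TS_SYNC_BYTE:
                return False
```

A merged file can pass this framing check and still carry discontinuities between parts, which would be consistent with some tools reading it happily while others struggle.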

@superbonaci
Contributor

@ScrubN HandBrake is able to detect the Twitch subtitles as Closed Caption CC608, and embed them in the video (if you wish).

Here's the video: https://www.twitch.tv/videos/1923916260

[Screenshot: handbrake_twitch]

If you save it as MKV and choose not to burn the subtitles into the video, you can turn them on or off as a subtitle track in VLC:

[Screenshot: handbrake_mkv]

What I don't know is whether HandBrake uses some external command to do it or it is all built in, but it works great.

As I said, the other tools don't work: telxcc, subtitleedit, ts-cc-extractor.

@ScrubN
Collaborator

ScrubN commented Feb 2, 2024

> @ScrubN HandBrake is able to detect the Twitch subtitles as Closed Caption CC608, and embed them in the video (if you wish).

Oh strange. I was only able to detect the subtitles from the individual parts with both FFmpeg and MPV. It seems that VLC also detects the subtitles from the merged video though. Again, this is probably a result of their parsers being overly forgiving to non-standard/corrupted data.

@superbonaci
Contributor

> > @ScrubN HandBrake is able to detect the Twitch subtitles as Closed Caption CC608, and embed them in the video (if you wish).
>
> Oh strange. I was only able to detect the subtitles from the individual parts with both FFmpeg and MPV. It seems that VLC also detects the subtitles from the merged video though. Again, this is probably a result of their parsers being overly forgiving to non-standard/corrupted data.

I'm not sure there is actually any corrupt data, because Video DownloadHelper downloads the same m2ts file as the one merged by TwitchDownloader, so it must be correct.

Yes, VLC shows several subtitle tracks, but only one works (or there is only one). I'll have to report the issue to VLC and see what they say.

@MrDummyNL
Author

Any progress on extracting the CC from TS videos?

@ScrubN ScrubN linked a pull request Feb 14, 2024 that will close this issue
@ScrubN
Collaborator

ScrubN commented Feb 14, 2024

> Any progress on extracting the CC from TS videos?

Sorry, I have been taking a break from TD to work on a private project with another developer. I have cleaned up a subtitle scanner I had written some time ago and committed it to a draft PR for transparency. Hopefully it should not take too long to finish and get into a working state.

@MrDummyNL
Author

That is great news! Thank you for making it possible soon!

@ScrubN
Collaborator

ScrubN commented Jun 18, 2024

I'm sorry for the wait. I was having a bit of a hard time when I learned that by changing how we concatenate the downloaded parts, the subtitles can be naturally preserved without any extra work. The only issue is that I need to rewrite how trimming is handled, so it may take a little while.

More good news though, this alternative approach will make video finalization MUCH faster and possibly also fix some other issues.

@ScrubN
Collaborator

ScrubN commented Jun 19, 2024

Good news:

  • The new method works and the subtitle stream is detected in VLC, handbrake, FFmpeg, and mpv
  • Finalization is indeed faster.

Bad news:

  • Only VLC actually displays the subtitles, and they're a little scuffed.
    Note:

    • The subtitles delay is present in the original vod. The flashing subtitles are not.
    • The short artifacting on Ray's face at the start was caused by VLC.

    [Video: Subtitles example (vlc_N495MjWQjm.mp4)]

  • Despite FFmpeg seeing the subtitle stream, it refuses to let me extract them, so implementing your request of exporting an srt file will still require some more effort.

@ScrubN
Collaborator

ScrubN commented Jun 28, 2024

I'm really annoyed right now. The ffmpeg command I was using that was extracting the subtitles is no longer working. Handbrake does recognize the subtitles, but annoyingly it only lets me burn in the subtitles, not export them. I might actually need to write a custom subtitles parser and I'm not very happy about it.
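
The exact FFmpeg command used in this thread is not given. For reference, one invocation commonly used for CEA-608 captions goes through FFmpeg's lavfi `movie` source with the `out0+subcc` output, sketched below. Treat it as an assumption rather than the command ScrubN ran; as reported above, it may well fail on these files. The helper names are hypothetical, and paths containing characters that are special in filtergraph syntax (colons, backslashes, quotes) would need escaping that this sketch does not do.

```python
import subprocess
from typing import List


def build_caption_command(ts_path: str, srt_path: str) -> List[str]:
    """Build an FFmpeg argument list that tries to dump closed captions to SRT.

    "movie=<file>[out0+subcc]" asks the lavfi movie source to expose the
    closed captions carried in the video stream as a subtitle stream.
    Assumption: ts_path needs no filtergraph escaping.
    """
    return [
        "ffmpeg",
        "-f", "lavfi",
        "-i", f"movie={ts_path}[out0+subcc]",
        "-map", "s",  # keep only the subtitle stream
        srt_path,
    ]


def extract_captions(ts_path: str, srt_path: str) -> None:
    """Run the command; raises CalledProcessError if FFmpeg exits non-zero."""
    subprocess.run(build_caption_command(ts_path, srt_path), check=True)
```

Running this per downloaded part rather than on the merged file matches the earlier observation that FFmpeg only detected the subtitles in the individual parts.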

@ScrubN
Collaborator

ScrubN commented Jun 29, 2024

@MrDummyNL you said you currently extract the subtitles from the download cache. What tool(s) do you use to do that?
