Extracted YouTube 8M URLs and Labels without all the TF Record parsing/features

danielgordon10/youtube8m-data

youtube8m-data

Extracted YouTube 8M URLs and Labels without all the TF Record parsing/features.

YouTube8M is nice, but it ships with precomputed features packed into TFRecord files that you might not want. If you just want the video URLs and the labels, then you're in luck.

Video IDs and labels can be downloaded from:

Alternatively, run this script: python download_dataset.py

You can look at the videos and labels easily using the provided script:

pip install -r requirements.txt
python examine_videos.py
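
To give a rough idea of what looking at a video and its labels involves, here is a minimal sketch that turns a video ID into a watch URL and prints it next to its labels. It is not the code in examine_videos.py. The function names `video_url` and `format_record`, the placeholder ID, and the assumption that each record is a plain YouTube video ID plus a list of label names are all hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode

YOUTUBE_WATCH = "https://www.youtube.com/watch"


def video_url(video_id):
    """Build a watch URL from a YouTube video ID (assumed to be a real ID)."""
    return YOUTUBE_WATCH + "?" + urlencode({"v": video_id})


def format_record(video_id, labels):
    """Return one tab-separated line: the watch URL, then the label names."""
    return "{}\t{}".format(video_url(video_id), ", ".join(labels))


if __name__ == "__main__":
    # Placeholder ID and labels, not taken from the dataset.
    print(format_record("abcdefghijk", ["Label A", "Label B"]))
```

Going through `urlencode` rather than plain string concatenation keeps the URL valid even if an ID contains unexpected characters.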

The script used to generate the files is also included in the repo.

If for whatever reason you want to regenerate the data yourself, you can run something like the following, adjusting the paths to match your setup.

mkdir -p ~/data/yt8m/video && cd ~/data/yt8m/video
pip install tensorflow==1.14.0

curl data.yt8m.org/download.py | partition=2/video/train mirror=us python
curl data.yt8m.org/download.py | partition=2/video/validate mirror=us python
curl data.yt8m.org/download.py | partition=2/video/test mirror=us python
python parse_tfrecord.py
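
As a rough illustration of what the TFRecord parsing step has to deal with, the sketch below walks the record framing of a `.tfrecord` file using only the standard library. It is not parse_tfrecord.py, and the function name `iter_tfrecord_payloads` is hypothetical. It skips the CRC checks and yields each payload as raw bytes. Decoding those bytes into IDs and labels would still need `tf.train.Example` or another protobuf parser.

```python
import struct


def iter_tfrecord_payloads(path):
    """Yield the raw payload bytes of each record in a TFRecord file.

    Each record is framed as: uint64 length, uint32 CRC of the length,
    the payload itself, then uint32 CRC of the payload (little-endian).
    The CRCs are read and ignored here, not verified.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte length CRC
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            footer = f.read(4)  # payload CRC, ignored in this sketch
            if len(payload) < length or len(footer) < 4:
                raise ValueError("truncated record body")
            yield payload
```

Counting the records in one downloaded shard is then `sum(1 for _ in iter_tfrecord_payloads(shard_path))`, where `shard_path` points at whichever file the download step produced.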
