parallel segment push #129
Conversation
transcode/transcode.go
Outdated
var completed sync.WaitGroup
completed.Add(config.TranscodingParallelJobs)
for index := 0; index < config.TranscodingParallelJobs; index++ {
	go transcodeSegment(streamName, manifestID, transcodeRequest, transcodeProfiles, queue, errors, sourceSegmentURLs, targetOSURL, &completed)
I don't love this pattern of passing channels around, because it makes the flow of control harder to follow and also makes it difficult to test. I'd prefer to keep transcodeSegment
as a simple function that knows how to transcode a segment, and then have all of the management of the segment queue and allocation of work to goroutines sitting up at this level.
Also there's enough going on here that it's probably worth moving out of the main Transcode body, so the flow would be something like
<Transcode - high level iterative transcode process> -> <Parallel Segment Transcoder> -> <transcodeSegment>
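For illustration, a minimal sketch of the suggested split, with hypothetical names and signatures (segmentInfo, transcodeSegmentsInParallel, and the transcodeOne callback are stand-ins, not code from this repository): the queue handling and goroutine management live in one place, and the per-segment transcode stays a plain, easily testable function.

package transcode

import "sync"

// segmentInfo is a hypothetical stand-in for whatever per-segment data
// transcodeSegment needs (index, source URL, and so on).
type segmentInfo struct {
	Index     int
	SourceURL string
}

// transcodeSegmentsInParallel owns the segment queue and the worker pool, so
// that the per-segment transcode function stays simple and easy to test.
func transcodeSegmentsInParallel(segments []segmentInfo, parallelJobs int, transcodeOne func(segmentInfo) error) error {
	queue := make(chan segmentInfo, len(segments))
	for _, s := range segments {
		queue <- s
	}
	close(queue)

	errs := make(chan error, parallelJobs)
	var wg sync.WaitGroup
	wg.Add(parallelJobs)
	for i := 0; i < parallelJobs; i++ {
		go func() {
			defer wg.Done()
			for s := range queue {
				if err := transcodeOne(s); err != nil {
					errs <- err
					return
				}
			}
		}()
	}
	wg.Wait()
	close(errs)
	// Returns the first error any worker reported, or nil if errs is empty.
	return <-errs
}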
transcode/transcode.go
Outdated
completed.Add(config.TranscodingParallelJobs)
for index := 0; index < config.TranscodingParallelJobs; index++ {
	go transcodeSegment(streamName, manifestID, transcodeRequest, transcodeProfiles, queue, errors, sourceSegmentURLs, targetOSURL, &completed)
	time.Sleep(713 * time.Millisecond) // Add some desync interval to avoid load spikes on segment-encode-end
Nice, good thinking - worth introducing some randomness?
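One possible way to add that randomness, as a hedged sketch only (the 500ms–1000ms jitter window is an arbitrary example, not a value from this PR):

package transcode

import (
	"math/rand"
	"time"
)

// desyncSleep staggers worker start-up by a random amount so that workers do
// not all hit segment-encode-end at the same time. The 500ms-1000ms window
// below is purely illustrative.
func desyncSleep() {
	jitter := 500*time.Millisecond + time.Duration(rand.Int63n(int64(500*time.Millisecond)))
	time.Sleep(jitter)
}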
Looks really good - I've left one comment about the general code structure, but other than that the only thing I think this needs is some new test coverage.
Codecov Report
@@             Coverage Diff              @@
##               main        #129        +/-  ##
=================================================
+ Coverage   41.43750%   42.07617%   +0.63867%
=================================================
  Files             26          26
  Lines           1600        1628         +28
=================================================
+ Hits             663         685         +22
- Misses           857         862          +5
- Partials          80          81          +1

Continue to review the full report at Codecov.
}
}()
// Add some desync interval to avoid load spikes on segment-encode-end
time.Sleep(713 * time.Millisecond)
Feels pretty arbitrary - are the load spikes on GPU or CPU? Did you see any issues with the load, especially if it's GPU?
Would also recommend building the linux targets and copying over the
	return
}
var completedRatio = calculateCompletedRatio(len(sourceSegmentURLs), segment.Index+1)
if err = clients.DefaultCallbackClient.SendTranscodeStatus(transcodeRequest.CallbackURL, clients.TranscodeStatusTranscoding, completedRatio); err != nil {
hmm won't we get multiple completed callbacks now instead of just the one that Studio was expecting?
This is actually a progress callback, which looks like:
{
"completion_ratio": 0.6666666666666667,
"status": "transcoding",
"timestamp": 1667312537
}
And the final completed callback looks like this:
{
  "completion_ratio": 1,
  "status": "success",
  "timestamp": 1667317010,
  "type": "video",
  "video_spec": {
    "format": "mp4",
    "tracks": [
      {
        "type": "video",
        "codec": "H264",
        "bitrate": 7508656,
        "duration": 184.541,
        "size": 0,
        "start_time": 0,
        "width": 2048,
        "height": 858
      },
      {
        "type": "audio",
        "codec": "AAC",
        "bitrate": 256000,
        "duration": 184.661,
        "size": 0,
        "start_time": 0,
        "channels": 2,
        "sample_rate": 48000,
        "sample_bits": 16
      }
    ],
    "duration": 184.64,
    "size": 183755148
  },
  "outputs": [
    {
      "type": "google-s3",
      "manifest": "s3+https:/GOOG1EXCXWNKVLKBEJS62OMGXMQ5C5FH67YZMQXYGFBQYQRBYXQQJT5OM6L4I:rzM9CFtiunoflGzBwU3c%[email protected]/alexk-dms-upload-test/bbb/long/transcoded/index.m3u8",
      "videos": null
    }
  ]
}
Manually tested this on Canary with Alex today and it seems to work well. Progress callbacks are occasionally out of order, as expected, but this shouldn't be an issue (and we'll be refactoring the progress stuff to reduce the number of callbacks soon anyway).
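For context, a minimal sketch of what calculateCompletedRatio presumably does, assuming it simply maps finished segments onto a 0–1 value (the real implementation may differ):

package transcode

// calculateCompletedRatio maps the number of finished segments onto a 0-1
// progress value, which is what gets sent as completion_ratio in the
// progress callback.
func calculateCompletedRatio(totalSegments, completedSegments int) float64 {
	return float64(completedSegments) / float64(totalSegments)
}

With 3 source segments and 2 completed, this yields roughly 0.67, in line with the completion_ratio shown in the progress callback above.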
[BOT] catalyst-api: parallel segment push (#129)
Files changed: M manifest.yaml
Co-authored-by: livepeer-docker <[email protected]>
Submitting multiple segments to the local B node in parallel. config.TranscodingParallelJobs defines the number of segments being transcoded in parallel. Using TranscodingParallelJobs int = 15 plus an additional interval to misalign the start of each parallel worker, we get a nice GPU load graph without spikes on the local B node.

This resolves #111
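As a rough sketch of the launch pattern described above (the work callback and the surrounding wiring are assumptions; only the 15 parallel jobs and the 713 ms stagger come from this PR):

package transcode

import (
	"sync"
	"time"
)

// transcodingParallelJobs mirrors the value quoted in the description.
const transcodingParallelJobs = 15

// startStaggeredWorkers launches the workers with a fixed stagger so they do
// not all reach segment-encode-end at the same moment.
func startStaggeredWorkers(work func()) {
	var completed sync.WaitGroup
	completed.Add(transcodingParallelJobs)
	for i := 0; i < transcodingParallelJobs; i++ {
		go func() {
			defer completed.Done()
			work()
		}()
		// Misalign worker start to avoid load spikes on segment-encode-end.
		time.Sleep(713 * time.Millisecond)
	}
	completed.Wait()
}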