parallel segment push #129
Conversation
transcode/transcode.go
Outdated
var completed sync.WaitGroup
completed.Add(config.TranscodingParallelJobs)
for index := 0; index < config.TranscodingParallelJobs; index++ {
	go transcodeSegment(streamName, manifestID, transcodeRequest, transcodeProfiles, queue, errors, sourceSegmentURLs, targetOSURL, &completed)
I don't love this pattern of passing channels around, because it makes the flow of control harder to follow and also makes it difficult to test. I'd prefer to keep transcodeSegment
as a simple function that knows how to transcode a segment, and then have all of the management of the segment queue and allocation of work to goroutines sitting up at this level.
Also there's enough going on here that it's probably worth moving out of the main Transcode body, so the flow would be something like
<Transcode - high level iterative transcode process> -> <Parallel Segment Transcoder> -> <transcodeSegment>
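For illustration, a minimal sketch of the suggested split, with hypothetical names and signatures (segmentInfo, transcodeSegmentsInParallel, and the transcodeOne callback are stand-ins, not code from this repository): the queue handling and goroutine management live in one place, and the per-segment transcode stays a plain, easily testable function.

package transcode

import "sync"

// segmentInfo is a hypothetical stand-in for whatever per-segment data
// transcodeSegment needs (index, source URL, and so on).
type segmentInfo struct {
	Index     int
	SourceURL string
}

// transcodeSegmentsInParallel owns the segment queue and the worker pool, so
// that the per-segment transcode function stays simple and easy to test.
func transcodeSegmentsInParallel(segments []segmentInfo, parallelJobs int, transcodeOne func(segmentInfo) error) error {
	queue := make(chan segmentInfo, len(segments))
	for _, s := range segments {
		queue <- s
	}
	close(queue)

	errs := make(chan error, parallelJobs)
	var wg sync.WaitGroup
	wg.Add(parallelJobs)
	for i := 0; i < parallelJobs; i++ {
		go func() {
			defer wg.Done()
			for s := range queue {
				if err := transcodeOne(s); err != nil {
					errs <- err
					return
				}
			}
		}()
	}
	wg.Wait()
	close(errs)
	// Returns the first error any worker reported, or nil if errs is empty.
	return <-errs
}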
transcode/transcode.go
Outdated
completed.Add(config.TranscodingParallelJobs)
for index := 0; index < config.TranscodingParallelJobs; index++ {
	go transcodeSegment(streamName, manifestID, transcodeRequest, transcodeProfiles, queue, errors, sourceSegmentURLs, targetOSURL, &completed)
	time.Sleep(713 * time.Millisecond) // Add some desync interval to avoid load spikes on segment-encode-end
Nice, good thinking - worth introducing some randomness?
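One possible way to add that randomness, as a hedged sketch only (the 500ms–1000ms jitter window is an arbitrary example, not a value from this PR):

package transcode

import (
	"math/rand"
	"time"
)

// desyncSleep staggers worker start-up by a random amount so that workers do
// not all hit segment-encode-end at the same time. The 500ms-1000ms window
// below is purely illustrative.
func desyncSleep() {
	jitter := 500*time.Millisecond + time.Duration(rand.Int63n(int64(500*time.Millisecond)))
	time.Sleep(jitter)
}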
Looks really good - I've left one comment about the general code structure, but other than that the only thing I think this needs is some new test coverage.
Codecov Report
@@             Coverage Diff              @@
##               main        #129        +/-  ##
=================================================
+ Coverage   41.43750%   42.07617%   +0.63867%
=================================================
  Files             26          26
  Lines           1600        1628         +28
=================================================
+ Hits             663         685         +22
- Misses           857         862          +5
- Partials          80          81          +1

Continue to review the full report at Codecov.
}
}()
// Add some desync interval to avoid load spikes on segment-encode-end
time.Sleep(713 * time.Millisecond)
Feels pretty arbitrary - are the load spikes on GPU or CPU? Did you see any issues with the load, especially if it's GPU?
Would also recommend building the linux targets and copying over the
	return
}
var completedRatio = calculateCompletedRatio(len(sourceSegmentURLs), segment.Index+1)
if err = clients.DefaultCallbackClient.SendTranscodeStatus(transcodeRequest.CallbackURL, clients.TranscodeStatusTranscoding, completedRatio); err != nil {
hmm won't we get multiple completed callbacks now instead of just the one that Studio was expecting?
This is actually a progress callback, which looks like:
{
"completion_ratio": 0.6666666666666667,
"status": "transcoding",
"timestamp": 1667312537
}
And the final completed callback looks like this:
{
  "completion_ratio": 1,
  "status": "success",
  "timestamp": 1667317010,
  "type": "video",
  "video_spec": {
    "format": "mp4",
    "tracks": [
      {
        "type": "video",
        "codec": "H264",
        "bitrate": 7508656,
        "duration": 184.541,
        "size": 0,
        "start_time": 0,
        "width": 2048,
        "height": 858
      },
      {
        "type": "audio",
        "codec": "AAC",
        "bitrate": 256000,
        "duration": 184.661,
        "size": 0,
        "start_time": 0,
        "channels": 2,
        "sample_rate": 48000,
        "sample_bits": 16
      }
    ],
    "duration": 184.64,
    "size": 183755148
  },
  "outputs": [
    {
      "type": "google-s3",
      "manifest": "s3+https:/GOOG1EXCXWNKVLKBEJS62OMGXMQ5C5FH67YZMQXYGFBQYQRBYXQQJT5OM6L4I:rzM9CFtiunoflGzBwU3c%[email protected]/alexk-dms-upload-test/bbb/long/transcoded/index.m3u8",
      "videos": null
    }
  ]
}
Manually tested this on Canary with Alex today and it seems to work well. Progress callbacks are occasionally out of order, as expected, but this shouldn't be an issue (and we'll be refactoring the progress stuff to reduce the number of callbacks soon anyway).
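For context, a minimal sketch of what calculateCompletedRatio presumably does, assuming it simply maps finished segments onto a 0–1 value (the real implementation may differ):

package transcode

// calculateCompletedRatio maps the number of finished segments onto a 0-1
// progress value, which is what gets sent as completion_ratio in the
// progress callback.
func calculateCompletedRatio(totalSegments, completedSegments int) float64 {
	return float64(completedSegments) / float64(totalSegments)
}

With 3 source segments and 2 completed, this yields roughly 0.67, in line with the completion_ratio shown in the progress callback above.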
[BOT] catalyst-api: parallel segment push (#129)
Files changed: M manifest.yaml
Co-authored-by: livepeer-docker <[email protected]>
Submitting multiple segments to the local B node in parallel. config.TranscodingParallelJobs defines the number of segments being transcoded in parallel. Using TranscodingParallelJobs int = 15 plus an additional interval to misalign the start of each parallel worker, we get a nice GPU load graph without spikes on the local B node.

This resolves #111
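As a rough sketch of the launch pattern described above (the work callback and the surrounding wiring are assumptions; only the 15 parallel jobs and the 713 ms stagger come from this PR):

package transcode

import (
	"sync"
	"time"
)

// transcodingParallelJobs mirrors the value quoted in the description.
const transcodingParallelJobs = 15

// startStaggeredWorkers launches the workers with a fixed stagger so they do
// not all reach segment-encode-end at the same moment.
func startStaggeredWorkers(work func()) {
	var completed sync.WaitGroup
	completed.Add(transcodingParallelJobs)
	for i := 0; i < transcodingParallelJobs; i++ {
		go func() {
			defer completed.Done()
			work()
		}()
		// Misalign worker start to avoid load spikes on segment-encode-end.
		time.Sleep(713 * time.Millisecond)
	}
	completed.Wait()
}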