parallel segment push #129

Merged: 5 commits merged into main from ak/parallel_seg_push on Nov 2, 2022
Conversation

@AlexKordic (Contributor) commented Nov 1, 2022

Submitting multiple segments to the local B (Broadcaster) node in parallel.

config.TranscodingParallelJobs defines the number of segments being transcoded in parallel.

[Screenshot: parallel_local_b_load]

Using TranscodingParallelJobs int = 15, plus an additional interval to misalign the start of each parallel worker, we get a smooth GPU load graph on the local B node, without spikes:

[Screenshot: desynced_parallel_local_b_load]

This resolves #111
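For orientation, here is a minimal sketch of the flow described above, with simplified types and a placeholder for the real per-segment call (the segmentJob type, the function names, and the queue layout are illustrative assumptions, not the PR's exact code):

package transcode

import (
	"sync"
	"time"
)

// segmentJob is a simplified stand-in for the PR's per-segment work item.
type segmentJob struct {
	Index int
	URL   string
}

// transcodeOneSegment is a placeholder for the real call that pushes a
// single segment to the local Broadcaster (B) node.
func transcodeOneSegment(job segmentJob) error {
	return nil
}

// parallelPush loads every source segment into a shared queue and starts
// parallelJobs workers, staggering their start times so that their
// segment-encode-end moments do not line up and spike the GPU.
func parallelPush(segments []segmentJob, parallelJobs int) error {
	queue := make(chan segmentJob, len(segments))
	for _, s := range segments {
		queue <- s
	}
	close(queue)

	errs := make(chan error, len(segments))
	var completed sync.WaitGroup
	completed.Add(parallelJobs)
	for i := 0; i < parallelJobs; i++ {
		go func() {
			defer completed.Done()
			for job := range queue {
				if err := transcodeOneSegment(job); err != nil {
					errs <- err
				}
			}
		}()
		// Desync interval so workers do not start (and finish) in lockstep.
		time.Sleep(713 * time.Millisecond)
	}
	completed.Wait()
	close(errs)
	if err, ok := <-errs; ok {
		return err
	}
	return nil
}

In this shape, config.TranscodingParallelJobs (15 in the graphs above) would be passed in as parallelJobs.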

var completed sync.WaitGroup
completed.Add(config.TranscodingParallelJobs)
for index := 0; index < config.TranscodingParallelJobs; index++ {
go transcodeSegment(streamName, manifestID, transcodeRequest, transcodeProfiles, queue, errors, sourceSegmentURLs, targetOSURL, &completed)
Contributor commented on this diff:

I don't love this pattern of passing channels around: it makes the flow of control harder to follow and also makes it difficult to test. I'd prefer to keep transcodeSegment as a simple function that knows how to transcode a segment, and have all of the management of the segment queue and allocation of work to goroutines sit up at this level.

@thomshutt (Contributor) commented Nov 1, 2022:

Also, there's enough going on here that it's probably worth moving it out of the main Transcode body, so the flow would be something like:

<Transcode - high level iterative transcode process> -> <Parallel Segment Transcoder> -> <transcodeSegment>
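A rough sketch of that suggested shape, reusing the simplified segmentJob type from the sketch above (the signatures are assumptions about what such a refactor could look like, not the actual catalyst-api code): transcodeSegment knows nothing about channels or WaitGroups, and a middle layer owns the queue and the workers.

// Sketch; same hypothetical package as the earlier sketch (imports sync).

// transcodeSegment only knows how to transcode a single segment; it takes no
// channels or WaitGroups.
func transcodeSegment(job segmentJob) error {
	// ... push the segment to the local B node and upload the results ...
	return nil
}

// parallelSegmentTranscoder owns the queue, the worker goroutines and the
// WaitGroup. Taking the per-segment function as a parameter keeps the
// concurrency easy to test in isolation.
func parallelSegmentTranscoder(segments []segmentJob, jobs int, transcodeFn func(segmentJob) error) error {
	queue := make(chan segmentJob, len(segments))
	for _, s := range segments {
		queue <- s
	}
	close(queue)

	var wg sync.WaitGroup
	errs := make(chan error, len(segments))
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range queue {
				if err := transcodeFn(job); err != nil {
					errs <- err
				}
			}
		}()
	}
	wg.Wait()
	close(errs)
	if err, ok := <-errs; ok {
		return err
	}
	return nil
}

// Transcode stays the high-level, iterative driver and simply delegates.
func Transcode(segments []segmentJob, jobs int) error {
	return parallelSegmentTranscoder(segments, jobs, transcodeSegment)
}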

completed.Add(config.TranscodingParallelJobs)
for index := 0; index < config.TranscodingParallelJobs; index++ {
go transcodeSegment(streamName, manifestID, transcodeRequest, transcodeProfiles, queue, errors, sourceSegmentURLs, targetOSURL, &completed)
time.Sleep(713 * time.Millisecond) // Add some desync interval to avoid load spikes on segment-encode-end
@thomshutt (Contributor) commented Nov 1, 2022:

Nice, good thinking - worth introducing some randomness?
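One way that randomness could look (illustrative only; the PR as merged keeps the fixed 713ms sleep):

// Jittered desync interval: somewhere between 500ms and 1s per worker
// instead of a fixed 713ms. Sketch only; uses math/rand and time.
func desyncDelay() time.Duration {
	return 500*time.Millisecond + time.Duration(rand.Intn(500))*time.Millisecond
}

The worker-start loop would then call time.Sleep(desyncDelay()) instead of the constant sleep.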

@thomshutt (Contributor) commented:

Looks really good - I've left one comment about the general code structure, but other than that the only thing I think this needs is some new test coverage.
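As an illustration of the kind of coverage this could get, here is a sketch of a test against the hypothetical parallelSegmentTranscoder from the earlier sketch (the real test would target the actual transcode package): inject a counting function and assert that every segment is transcoded exactly once.

// Sketch only; uses the hypothetical parallelSegmentTranscoder and segmentJob
// from the earlier sketch, plus the standard sync and testing packages.
func TestParallelSegmentTranscoderProcessesEverySegmentOnce(t *testing.T) {
	segments := make([]segmentJob, 10)
	for i := range segments {
		segments[i] = segmentJob{Index: i}
	}

	var mu sync.Mutex
	seen := map[int]int{}

	err := parallelSegmentTranscoder(segments, 3, func(job segmentJob) error {
		mu.Lock()
		defer mu.Unlock()
		seen[job.Index]++
		return nil
	})
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	for i := range segments {
		if seen[i] != 1 {
			t.Errorf("segment %d transcoded %d times, want 1", i, seen[i])
		}
	}
}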

@codecov (bot) commented Nov 1, 2022:

Codecov Report

Merging #129 (da5e999) into main (491de10) will increase coverage by 0.63867%.
The diff coverage is 53.84615%.


@@                 Coverage Diff                 @@
##                main        #129         +/-   ##
===================================================
+ Coverage   41.43750%   42.07617%   +0.63867%     
===================================================
  Files             26          26                 
  Lines           1600        1628         +28     
===================================================
+ Hits             663         685         +22     
- Misses           857         862          +5     
- Partials          80          81          +1     
Impacted Files           Coverage Δ
transcode/transcode.go   63.12500% <53.84615%> (+3.27652%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

}
}()
// Add some desync interval to avoid load spikes on segment-encode-end
time.Sleep(713 * time.Millisecond)
Collaborator commented on this diff:

Feels pretty arbitrary - are the load spikes on GPU or CPU? Did you see any issues with the load, especially if it's GPU?

@emranemran (Collaborator) commented:

Would also recommend building the Linux targets and copying catalyst-api over to canary for some quick testing prior to merging, to see what perf (or top) shows when a server is being used instead of a local machine.

return
}
var completedRatio = calculateCompletedRatio(len(sourceSegmentURLs), segment.Index+1)
if err = clients.DefaultCallbackClient.SendTranscodeStatus(transcodeRequest.CallbackURL, clients.TranscodeStatusTranscoding, completedRatio); err != nil {
Collaborator commented on this diff:

hmm won't we get multiple completed callbacks now instead of just the one that Studio was expecting?

Contributor (Author) replied:

This is actually the progress callback, which looks like:

{
  "completion_ratio": 0.6666666666666667,
  "status": "transcoding",
  "timestamp": 1667312537
}

The final completed callback looks like this:

{
  "completion_ratio": 1,
  "status": "success",
  "timestamp": 1667317010,
  "type": "video",
  "video_spec": {
    "format": "mp4",
    "tracks": [
      {
        "type": "video",
        "codec": "H264",
        "bitrate": 7508656,
        "duration": 184.541,
        "size": 0,
        "start_time": 0,
        "width": 2048,
        "height": 858
      },
      {
        "type": "audio",
        "codec": "AAC",
        "bitrate": 256000,
        "duration": 184.661,
        "size": 0,
        "start_time": 0,
        "channels": 2,
        "sample_rate": 48000,
        "sample_bits": 16
      }
    ],
    "duration": 184.64,
    "size": 183755148
  },
  "outputs": [
    {
      "type": "google-s3",
      "manifest": "s3+https:/GOOG1EXCXWNKVLKBEJS62OMGXMQ5C5FH67YZMQXYGFBQYQRBYXQQJT5OM6L4I:rzM9CFtiunoflGzBwU3c%[email protected]/alexk-dms-upload-test/bbb/long/transcoded/index.m3u8",
      "videos": null
    }
  ]
}
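For reference, a plausible shape for calculateCompletedRatio, inferred from the call site quoted above rather than taken from the repository's actual code: the ratio of the just-finished segment's 1-based index to the total segment count, which is how a mid-job value like the 0.6666666666666667 above (e.g. 2 of 3 segments) can appear.

// Assumed from the call calculateCompletedRatio(len(sourceSegmentURLs), segment.Index+1):
// the first argument is the total number of source segments, the second is
// one past the index of the segment that just finished.
func calculateCompletedRatio(totalSegments, segmentsCompleted int) float64 {
	return float64(segmentsCompleted) / float64(totalSegments)
}

Since the workers pull segments from the queue concurrently, these ratios can arrive slightly out of order, as noted further down.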

@thomshutt (Contributor) commented:

Manually tested this on Canary with Alex today and it seems to work well. Progress callbacks are occasionally out of order, as expected, but this shouldn't be an issue (and we'll be refactoring the progress logic to reduce the number of callbacks soon anyway).

@AlexKordic AlexKordic merged commit fe520dd into main Nov 2, 2022
@AlexKordic AlexKordic deleted the ak/parallel_seg_push branch November 2, 2022 13:12
iameli pushed a commit that referenced this pull request Feb 7, 2023
* apis/livepeer: Allow specifying record object store ID

* cmd/record-tester: Record object store ID cli flag

* Fix logs

* test-streamer2: Wait a little longer before ending the test

Will probably show other errors apart from the stream health one.

* test-streamer: Sort errors for the alerts
iameli pushed a commit that referenced this pull request Feb 7, 2023
Files changed:
M	manifest.yaml

Co-authored-by: livepeer-docker <[email protected]>
iameli pushed a commit that referenced this pull request Feb 7, 2023
[BOT] catalyst-api: parallel segment push (#129)
Development

Successfully merging this pull request may close these issues:

Parallelise pushing of segments to Broadcaster
3 participants