Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AV1 samplebuilder support (WIP) #264

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

jech
Copy link
Member

@jech jech commented Apr 15, 2024

  • Make AV1Packet implement Depacketizer
  • Implement AppendToSample for AV1Packet

An attempt to implement the required hooks for the samplebuilder
to work with AV1. This approach is likely to be both faster and
simpler than the one in #238.

COMPLETELY UNTESTED, DO NOT COMMIT.

@alexpokotilo
Copy link

alexpokotilo commented Apr 15, 2024

  1. func (p *AV1Packet) IsPartitionHead(payload []byte) bool is called before func (p *AV1Packet) Unmarshal(payload []byte) ([]byte, error) { and hence you cannot use !p.Z && p.N and should use something like

    if len(payload) == 0 {
    return false
    }
    return (payload[0] & byte(0b10000000)) == 0

I was not able to test further as IsPartitionHead always false

This adds IsPartitionHead and IsPartitionTail to AV1Packet.
AV1 RTP payload cannot be concatenated without first recovering
the omitted OBU length.
@jech
Copy link
Member Author

jech commented Apr 15, 2024

Fixed. However, as I've mentioned, this is completely untested, and I'd be very surprised if it worked. The question is whether you agree that this could work in principle, since it is both more efficient and simpler than what you propose.

Copy link

codecov bot commented Apr 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 31 lines in your changes are missing coverage. Please review.

Project coverage is 84.70%. Comparing base (74a9dc7) to head (89c2b75).

Files Patch % Lines
codecs/av1_packet.go 0.00% 31 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #264      +/-   ##
==========================================
- Coverage   86.21%   84.70%   -1.52%     
==========================================
  Files          22       22              
  Lines        1734     1765      +31     
==========================================
  Hits         1495     1495              
- Misses        203      234      +31     
  Partials       36       36              
Flag Coverage Δ
go 84.70% <0.00%> (-1.52%) ⬇️
wasm 84.24% <0.00%> (-1.51%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alexpokotilo
Copy link

Fixed. However, as I've mentioned, this is completely untested, and I'd be very surprised if it worked. The question is whether you agree that this could work in principle, since it is both more efficient and simpler than what you propose.

AppendToSample implementation is wrong. There are maybe several OBUS in AV1Packet and AppendToSample should run the similar loop as func (p *AV1Packet) Unmarshal

But there is much simplier approach much better than my and yours: just return

OBULen+OBU list right from Unmarshal
I checked and nobody cares about func (p *AV1Packet) Unmarshal. Who use AV1Packet without samplebuilder, they use AV1Packet for it's OBUelements map. They don't care about payload format at all.

All my fix was to workaround current useless av1 format. your AppendToSample introduce to keep current format but to produce right format after Unmarshal. Shouldn't we just return all we need from Unmarshal. Yes, we need to allocate for sample but your implementation do the same, but for only one OBU. This is wrong. There maybe multiple OBUs in Av1Packet.
I'd bet to return correct av1 bitstreama directly from Unmarshal

@jech
Copy link
Member Author

jech commented Apr 15, 2024

AppendToSample implementation is wrong.

Interesting.

There are maybe several OBUS in AV1Packet

That's right, and my implementation builds a sample as a sequence of LE128-prefixed OBUs. Are you suggesting a different representation?

I checked and nobody cares about func (p *AV1Packet) Unmarshal.

https://github.com/jech/galene/blob/master/codecs/codecs.go#L48

This code does the equivalent of AV1Packer.Unmarshal in an efficient manner. I had to write it because Unmarshal was written by people who don't care about efficiency.

Shouldn't we just return all we need from Unmarshal.

Unmarshal is a low-leve function that should run in constant time and perform no allocations. Check the H264 version of Unmarshal, it's done right, it doesn't try to parse the list of NALUs. Please also read the comments in #265 and #266.

@jech
Copy link
Member Author

jech commented Apr 15, 2024

@alexpokotilo Another point is that you cannot reassemble fragmented OBUs in Unmarshal, since the packets haven't been reordered at this point: it's the samplebuilder's (or the jitter buffer's) job to reorder packets. So there's really no alternative to doing the parsing later than at Unmarshal time.

@alexpokotilo
Copy link

alexpokotilo commented Apr 16, 2024

AppendToSample implementation is wrong.

Interesting.

There are maybe several OBUS in AV1Packet

That's right, and my implementation builds a sample as a sequence of LE128-prefixed OBUs. Are you suggesting a different representation?

@jech, maybe my problem is that I see AppendToSample implementation but don't see its usage in your branch. Could you please finish your version so I can just use it and check it really support all cases.
Now I see a part of job. I don't understand how AppendToSample do the same OBUs processing as Unmarshal because you didn't add AppendToSample call to sample builder. If you call it in loop till all data is processed then ok, I got the point. If you call it without loop like you pointed in your #238 (comment) example I don't get the point then. Again speaking about remained fragment of last OBU. I don't see how you are going to process it.

Long story short. You are right and my version is not ideal. I don't like it either.
When I did it I tried to close the gap in AV1Packet implementation. I had to use AV1Frame to correctly process last remained OBU as AV1Packet drops OBUElements on unmarshal and don't save last not complete OBU. For example H264Packet save fragmented nal for further processing. AV1Packet doesn't do that so your AppendToSample will not work.

From performance pov it doesn't matter where you process OBUs: in unmarshal method or in AppendToSample right next to it.
I can see AV1Packet is used only in samplebuffer and some tests.
if we use AppendToSample or return processed data from unmarshal the total performance will be the same. You will not be able to eliminate allocation in unmarshal if you add fragmented obu support there. Take a look H264Packet. If you have fuaBuffer from previous packet then "p.fuaBuffer = append(p.fuaBuffer, payload[fuaHeaderSize:]...)" performaed. and this is allocation.

Now AV1Packet just drop OBUElements on every unmarshal. it doesn't care about previous fragments. I had to use AV1Frame to combine OBUs from AV1Packets as AV1Packet doesn't care about it.

What you propose is :

  • add new interface AppendToSample
  • add check of it into sample buffer
  • call this AppendToSample in loop.
    And all of this because you don't want allocations in AV1Packets's unmarshal. Maybe you are right and unmarshal shouldn't do allocations but look at H264Packet unmarshal. It does allocation. and it handles tail from previous unsharshal itself. So it's not just av1 problem after all. To be honest I don't see any problem with this approash at all. If you want to check just rtp headers you can just process packet payload or we can add new method to all Packets like unmarshalHeader. we can use it in unmarshal as well.

Don't get me wrong please. I admit my initial approach is wrong and I'm starving to test your version but it doesn't exist.

I still don't see a problem in adding tail support into AV1Packets the same way H264Packet do and return necessary format from it without adding new interface AppendToSample. at the end of the day you will have to allocate somewhere. Why not in unmarshal ? What so special in this method so that you don't want to do the job there? these Packet structures are used mainly in samplebuilder. You don't add any gain moving processing in new method AppendToSample.

If you complain about unmarshal-only approach only from design POV - I'm sorry I don't completely confident to speak from design POV. I'm not Packet approach designer. But If you think your approach gain something to samplebuffer users - please complete it and I will create performance test to check where you are right or not.

I think it's time to complete AppendToSample so I can do tests and check whether it works or not. Now I cannot and don't see how I can help to prove your points or argue with them.

My main problem with AppendToSample approach is that I don't get what is so special with av1 so we need to handle it different way from h264? do we have tails in h264? yes, we do. How we process them? in unmarshal. Do we do allocation in h264's unmarshals? Yes, in case we have previous tail. Does h264' unmarshal return ready to append buffer? Yes, it is.

You want to have lightweight header parser without all body processing - go ahead but this is a different story.
I don't see how you improve performance by moving processing into AppendToSample and call it right after unmarshal. What samplebuffer users gain from this ?

Do I agree that AV1Packet made in different way than others Packets? Yes.
Should we add tail support in it to not rely on Av1Frame for assembling? 100%
Should we add AppendToSample to offload unmarshal? For what ?
Should we remove OBUElements processing? Well yes, but Av1Frame users will not be happy right now. this is legasy, We can add flag like @Sean-Der proposed in one of your pulls.

This is all I can say about it.

@alexpokotilo
Copy link

alexpokotilo commented Apr 16, 2024

@alexpokotilo Another point is that you cannot reassemble fragmented OBUs in Unmarshal, since the packets haven't been reordered at this point: it's the samplebuilder's (or the jitter buffer's) job to reorder packets. So there's really no alternative to doing the parsing later than at Unmarshal time.

Please look at H264Packet. it does process tails from previous unmarshal call relying on the fact that samplebuffer call unmarshal in correct order. This is what I see in the code at least. Check func (p *H264Packet) Unmarshal(payload []byte) ([]byte, error) and it's p.fuaBuffer processing. If unmarshal didn't rely on packet order this approach would not work for h264.

@jech
Copy link
Member Author

jech commented Apr 16, 2024

Please look at H264Packet. it does process tails from previous unmarshal call relying on the fact that samplebuffer call unmarshal in correct order.

Yes, and that's wrong, for at least three reasons:

  • people expect Unmarshal and Marshal should be (at least logically) inverses; if you defragment at Unmarshal time, you loose this property, which causes subtle bugs;
  • there is no place where it is documented that Unmarshal must be colled in sequential orde (the VP8 and VP9 formats don't require it), so the requirement causes subtle bugs;
  • there are applications that want to call Unmarshal just to check header flags, these applications do not want to pay the whole price of parsing the OBU or NALU list.

Alex, let us please do the right thing in AV1, and let's fix H.264. I've started here:

at the end of the day you will have to allocate somewhere. Why not in unmarshal ? What so special in this method

Because Unmarshal is not optional: if you want to work with a packet, you must call Unmarshal. Any other methods are optional, you only call them if you need the functionality.

Pion is designed to be suitable for many different applications, from full-featured multimedia applications (that do encoding, decoding and so on) to fast media servers (that want to push massive amounts of packets around while paying as little overhead as possible. We must endeavour to make sure that Pion remains widely usable, and doesn't become optimised for one specific class of applications.

@alexpokotilo
Copy link

Please look at H264Packet. it does process tails from previous unmarshal call relying on the fact that samplebuffer call unmarshal in correct order.

Yes, and that's wrong, for at least three reasons:

* people expect Unmarshal and Marshal should be (at least logically) inverses; if you defragment at Unmarshal time, you loose this property, which causes subtle bugs;

* there is no place where it is documented that Unmarshal must be colled in sequential orde (the VP8 and VP9 formats don't require it), so the requirement causes subtle bugs;

* there are applications that want to call Unmarshal just to check header flags, these applications do not want to pay the whole price of parsing the OBU or NALU list.

So maybe add Another method to help these application and keep current interface to others ?
I respect your passion and willing to change things you consider not right. I just don't see anything wrong with the way it is for h264 and others. I vote to fix AV1Packet. If you can convince maintainers then you are right. I don't see any reason for this. But I could be wrong

@lebedyncrs
Copy link

@jech any plans to continue on this one? If not can you please sum up what needs to be done? maybe somebody take over

@jech
Copy link
Member Author

jech commented Sep 11, 2024

Yes, I'm planning to work on it. I'm actually planning a fairly comprehensive rework of the packet code:

  • add a new interface, so that the zero-alloc mode has feature-parity with the (currently default) wasteful mode;
  • modify the sample-builder to work with the zero-alloc mode;
  • add support for the new interface to the AV1 codec (zero-alloc mode only, since I'm planning to deprecate the wasteful mode);
  • convince Sean to make zero-alloc the default (incompatible change).

Unfortunately, I won't be able to work on that before the end of October.

@lebedyncrs
Copy link

Yes, I'm planning to work on it. I'm actually planning a fairly comprehensive rework of the packet code:

  • add a new interface, so that the zero-alloc mode has feature-parity with the (currently default) wasteful mode;
  • modify the sample-builder to work with the zero-alloc mode;
  • add support for the new interface to the AV1 codec (zero-alloc mode only, since I'm planning to deprecate the wasteful mode);
  • convince Sean to make zero-alloc the default (incompatible change).

Unfortunately, I won't be able to work on that before the end of October.

Sounds exciting, please let me know if you would need help with testing. Does this PR is going to solve #273 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants