Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize m3d mesh creation #3363

Merged
merged 4 commits into from
Oct 7, 2023
Merged

Optimize m3d mesh creation #3363

merged 4 commits into from
Oct 7, 2023

Conversation

DaveH355
Copy link
Contributor

@DaveH355 DaveH355 commented Oct 2, 2023

Hello, I've changed the m3d loader to sort faces by material. The old code assumes the faces are already grouped together by material, which may not be the case for every model.

This will reduce the number of meshes being created and uploaded to the gpu for those cases

src/rmodels.c Show resolved Hide resolved
src/rmodels.c Show resolved Hide resolved
@orcmid
Copy link
Contributor

orcmid commented Oct 2, 2023

I reviewed the insertion sort and confirmed that it is correct.

It has the unfortunate characteristics of having a worst-case on the order of numface squared compare-and-move operations and that might be messy if numface can be large. The best case is for already-sorted, the worst case is for initially in reverse order.

Although this might work well under likely conditions, I have a small concern about the added burden on understandability and maintenance risks, in comparison with using a standard-library function.

@bztsrc
Copy link

bztsrc commented Oct 3, 2023

The old code assumes the faces are already grouped together by material, which may not be the case for every model.

Yes, because according to the M3D specification they must be stored like that in the file. The overall M3D policy here is, use as much of the time consuming operations as possible in m3d_save, and keep m3d_load fast.

(FYI: the SDK explicitly calls qsort the same way like your patch, and Blender's internal representation stores only one material per face, so in the Blender plugin all polygons are already grouped together by material. In both cases the model file is saved with a mesh grouped by material.)

Considering that M3D files could be saved by other (non-SDK and non-Blender plugin) applications as well and if the mesh is already sorted (as it should be) then this adds just a very-very small overhead to the model load time, so all in all I think it's safe to do this in raylib. Making sure that the mesh is sorted won't hurt, but might be useful in some special cases with buggy model files.

It is not necessary for model files that comply with the M3D specification though.

I try to make functions as much self-contained as possible, avoiding dependencies to other functions, could it be implemented directly inside the function?

I'm afraid that's not possible, qsort explicitly requires a comparator function as an argument.

Cheers,
bzt

@bztsrc
Copy link

bztsrc commented Oct 3, 2023

Although this might work well under likely conditions

No, this insertion sort patch has a terrible performance and must not be used (O(numface factorial) in best case).

It adds a significant overhead even when the mesh is already sorted, this is the worst possible solution! Using qsort is much-much better and uncomparably more efficient (in both best case and worst case), so having a comparator function might not be nice, but definitely worth it.

But as I've said, sorting the mesh isn't required with models that comply with the M3D specification in the first place. If this patch is going to be added, I'd like to ask to add a comment as well that states this is just a workaround, not required for valid M3D model files.

-  // Sort faces by material.
+ // just a workaround for buggy models, NOT required as valid M3D model files must store faces sorted
+ qsort(m3d->face, m3d->numface, sizeof(m3df_t), m3d_compare_faces);

Cheers,
bzt

@DaveH355
Copy link
Contributor Author

DaveH355 commented Oct 3, 2023

@bztsrc

Blender's internal representation stores only one material per face, so in the Blender plugin all polygons are already grouped together by material.

It is true Blender only allows 1 material per face. But there is no guarantee Blender stores the faces by order of material. The order can be changed by creating new faces, subdividing, triangulating etc.

The Blender plugin doesn't sort the faces either. It just checks if the material index of a face is greater than number of materials.
Perhaps it is a bug in the M3D Blender addon?

I've also tested this to make sure. Here is a plane with 2 materials assigned to some faces.
screenshot

Exporting this using the addon and loading it using Raylib from master branch gives
Model has 16 meshes and 3 materials
If Raylib sorts the faces by material. Either insertion or quicksort
Model has 2 meshes and 3 materials

Thus evidence Blender and the M3D Blender addon doesn't sort faces by material.

The results are the same for Blender 3.1 and 3.6. I haven't tested it on more versions yet
Here is the blend file used
test_plane.zip

@bztsrc
Copy link

bztsrc commented Oct 3, 2023

Perhaps it is a bug in the M3D Blender addon?

Indeed! Thanks for the heads up! Right now I'm very busy with another project, but I'll fix the Blender plugin as soon as I can.

Cheers,
bzt

@raysan5
Copy link
Owner

raysan5 commented Oct 3, 2023

@bztsrc Thank you very much for your answer, I presumed it would already be contemplated at m3d side.

If m3d specs already consider that sorting, I prefer to avoid that extra sort in raylib side. In any case, I think it can be added with a note and the qsort() commented line, in case of future users digging into it.

@DaveH355
Copy link
Contributor Author

DaveH355 commented Oct 3, 2023

I've reverted back to qsort()
I think it is a good failsafe with little overhead

@orcmid
Copy link
Contributor

orcmid commented Oct 3, 2023

@bztsrc @DaveH355

Although this might work well under likely conditions

No, this insertion sort patch has a terrible performance and must not be used (O(numface factorial) in best case).

Don't be so hasty Dave. The insertion sort you wrote works very quickly when the data is already sorted. It is essentially a linear check in that case. If there are a few items out of order, it also won't be terrible.

The worst case for the insertion comes with the items in reverse sequence. But the cost is not n!, it is more like n(n+1)/2 (i.e., O(n^2)). This is determined by the worst number of shifts of items in the array to make the hole for the next one to be moved earlier in what is already sorted. (@bztsrc This is an addition of cases, not a product).

@bztsrc
Copy link

bztsrc commented Oct 4, 2023

This doesn't matter anymore, because I've added the sort to the Blender plugin too (see latest commit). Needs some more testing to be sure, but it's already there and it is to stay.

@raysan5: you can safely remove the sort from raylib now (or if you decide to keep it to be on the safe side, then please add a comment // just a workaround, sorting not needed for valid M3D model files, thanks!).

Cheers,
bzt

@orcmid
Copy link
Contributor

orcmid commented Oct 4, 2023

[2023-10-05 edit: Big simplification]

For those playing along at home, there are two things to observer here. One is algorithmic performance analysis. The second is actual performance in practice. I will address the second first.

B. We know in the already-sorted case that the insertion-sort method on n items will make n-1 comparisons and conclude without altering anything. (Actually, it picks up and puts back all but one, leaving things unchanged. We don't count that.) In this case we have no idea how much work qsort will do with that case. We also don't know what it will take in the worst-case for qsort.

A. The worst case for the insertion case is when the original n values are in descending order. Let's see what happens. Let's keep it simple. Let's use {4, 3, 2, 1, 0}, for 5 values.

a. To see how the algorithm works, consider that the values are printed on little square tiles and the tiles are placed in sequence in some given order. We want to arrange them in order by the numerals printed on them. The way insertion sort works is we want to walk through the tiles from first to last in the sequence and make rearrangements in such a way that all tiles to the left of the one we are looking at next are arranged in ascending sequence.

b. We start by picking up the second tile (tile[1]) and setting aside that as the candidate to be placed in correct sequence. We start at tile[1] because the single tile initially at tile[0] is already in perfect sequence by itself. There is now a gap where the candidate tile was sitting. If the tile to the left of the gap has a numeral that is greater than that of the candidate, We move that tile into the gap. We have effectively moved the gap one space to the left. As soon as there is no tile to the left of the gap or there is one whose numeral is not greater than that of the candidate, we have found the insertion point. The candidate is now dropped into the gap.

c. In this manner we advance to successive candidates, rinse-repeat. The tiles to the left of the candidate are already sorted and the candidate drops into the gap that is left when all tiles with a greater number (if any) have been shifted to have a gap where the candidate will fit in correct non-decreasing sequence.
(Note that duplicates are allowed and don't cause any unnecessary movement.)

d. Here is the procedure with the example.

i. {4, ...} is fine without looking farther.

ii. With {4, 3, ...} it's seen that 3 is out of order. Pick it up and slide the larger of the already-sorted larger ones down the list to make a gap, and drop the 3 in it. So we are now at {3,4,...} and can continue. For this to happen there was one compare and one shift of a value. That is, the gap was moved one time. We'll call that one step.

iii. With {3, 4, 2, ...} the same thing happens but the 2 is compared with two values and two values are shifted to arrive at {2, 3, 4, ...}. That a gap movement of two places. Call that two steps.

iv. With {2, 3, 4, 1, ...} The rules get us to {1, 2, 3, 4, ...} and there were three steps.

v. Finally, with {1, 2, 3, 4, 0} we take four steps and arrive at {0, 1, 2, 3, 4}.

What's the pattern?

The worst-case number of steps taken for n items is the sum

1 + 2 + ..., + (n-1).

And that sum is the value n(n-1)/2. We speak of that as a complexity of order n × n or O(n^2), because that's what dominates the growth as n is increases.

That's the algorithmic analysis for complexity. We haven't addressed the comparative cost of different approaches. The key thing to notice for the insertion sort is that if the list is already sorted, there are just n-1 simple comparisons and no gap-movement steps.

In the algorithm as described, Each candidate is always picked up and a gap created even if the tile immediately to the left of the candidate is not greater than the candidate, so the candidate is immediately put back into the same place. We could eliminate that by first noticing whether the tile immediately to the left of the candidate does not have a greater number. That might be valuable if the tiles are "heavy." Seeing what the difference is in that case might be worth testing. It would make insert-sorting an already sorted sequence a bit faster.

@orcmid
Copy link
Contributor

orcmid commented Oct 7, 2023

@raysan5 @bztsrc @DaveH355

BREAKING NEWS: The worst-case time for Quicksort is when the items are already sorted.

That statement is made by Donald Knuth in remarks following Algorithm 4.2.2Q (Quicksort) in "The Art of Computer Programming," Volume 3, Sorting and Searching.

Knuth's observation is from algorithmic analysis. It does not tell us what the comparative performance between an insertion sort and a Quicksort in actual code might be, and when (as the number of items increases) does the better performance change from one to the other.

However, since we know that the input to the m3d loader is almost always already sorted, with perhaps a few out-of-order, a defensive approach would be to use an insertion sort that exits very quickly if the requirement for an already-sorted list is satisfied. Quicksort is not the answer for that case.

In the spirit of defensive programming, I have developed an improvement to the previous insertion sort that is blazingly fast when the items are already sorted. It doesn't monkey with the list at all except when it finds an out-of-order item.

Adopting the @ubkp playbook, I have created a stress test, TrySorts.c (using simple arrays of integers) to identify where the performance of qsort surpasses that of my sped-up insertion sort. The code is at orcmid/myCleanC.

I have run TrySorts (built using VC/C++) on two systems. Both of them show the crossover where qsort becomes faster to be around 250 items.

Here's the run on my 12-year-old desktop.

TrySorts-2023-10-06-2145-XPS8500

Here's the run on my new laptop.

TrySorts-2023-10-06-2151-Laptop

@ghost
Copy link

ghost commented Oct 7, 2023

@orcmid That's pretty cool. These benchmarks are awesome.

Since you're on Windows, would like to suggest also testing with mingw (that's packed with raylib distribution), since I guess that's what a lot of users use.

@orcmid
Copy link
Contributor

orcmid commented Oct 7, 2023

@ubkp

Since you're on Windows, would like to suggest also testing with mingw (that's packed with raylib distribution), since I guess that's what a lot of users use.

That's probably the case for many raylib users. I'd welcome seeing the results that anyone obtains with different compilers/platforms. I'm not prepared to work with the mingw setup myself.

PS: Anyone building and running the TrySorts program needs to expect that it will take over 2 hours to get through to the final case.

@ghost
Copy link

ghost commented Oct 7, 2023

@orcmid Luckly that's super easy for Windows users: https://www.youtube.com/watch?v=R1iN3j7mAos

@raysan5 raysan5 merged commit da5407b into raysan5:master Oct 7, 2023
@raysan5
Copy link
Owner

raysan5 commented Oct 7, 2023

I'm merging this change BUT commenting it. The specs already specify data MUST be sorted.

raysan5 added a commit that referenced this pull request Oct 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants