-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize m3d mesh creation #3363
Conversation
I reviewed the insertion sort and confirmed that it is correct. It has the unfortunate characteristics of having a worst-case on the order of Although this might work well under likely conditions, I have a small concern about the added burden on understandability and maintenance risks, in comparison with using a standard-library function. |
Yes, because according to the M3D specification they must be stored like that in the file. The overall M3D policy here is, use as much of the time consuming operations as possible in m3d_save, and keep m3d_load fast. (FYI: the SDK explicitly calls qsort the same way like your patch, and Blender's internal representation stores only one material per face, so in the Blender plugin all polygons are already grouped together by material. In both cases the model file is saved with a mesh grouped by material.) Considering that M3D files could be saved by other (non-SDK and non-Blender plugin) applications as well and if the mesh is already sorted (as it should be) then this adds just a very-very small overhead to the model load time, so all in all I think it's safe to do this in raylib. Making sure that the mesh is sorted won't hurt, but might be useful in some special cases with buggy model files. It is not necessary for model files that comply with the M3D specification though.
I'm afraid that's not possible, Cheers, |
No, this insertion sort patch has a terrible performance and must not be used (O(numface factorial) in best case). It adds a significant overhead even when the mesh is already sorted, this is the worst possible solution! Using qsort is much-much better and uncomparably more efficient (in both best case and worst case), so having a comparator function might not be nice, but definitely worth it. But as I've said, sorting the mesh isn't required with models that comply with the M3D specification in the first place. If this patch is going to be added, I'd like to ask to add a comment as well that states this is just a workaround, not required for valid M3D model files. - // Sort faces by material.
+ // just a workaround for buggy models, NOT required as valid M3D model files must store faces sorted
+ qsort(m3d->face, m3d->numface, sizeof(m3df_t), m3d_compare_faces); Cheers, |
It is true Blender only allows 1 material per face. But there is no guarantee Blender stores the faces by order of material. The order can be changed by creating new faces, subdividing, triangulating etc. The Blender plugin doesn't sort the faces either. It just checks if the material index of a face is greater than number of materials. I've also tested this to make sure. Here is a plane with 2 materials assigned to some faces. Exporting this using the addon and loading it using Raylib from master branch gives Thus evidence Blender and the M3D Blender addon doesn't sort faces by material. The results are the same for Blender 3.1 and 3.6. I haven't tested it on more versions yet |
Indeed! Thanks for the heads up! Right now I'm very busy with another project, but I'll fix the Blender plugin as soon as I can. Cheers, |
@bztsrc Thank you very much for your answer, I presumed it would already be contemplated at m3d side. If m3d specs already consider that sorting, I prefer to avoid that extra sort in raylib side. In any case, I think it can be added with a note and the qsort() commented line, in case of future users digging into it. |
This reverts commit dc1bd55.
I've reverted back to qsort() |
Don't be so hasty Dave. The insertion sort you wrote works very quickly when the data is already sorted. It is essentially a linear check in that case. If there are a few items out of order, it also won't be terrible. The worst case for the insertion comes with the items in reverse sequence. But the cost is not n!, it is more like n(n+1)/2 (i.e., O(n^2)). This is determined by the worst number of shifts of items in the array to make the hole for the next one to be moved earlier in what is already sorted. (@bztsrc This is an addition of cases, not a product). |
This doesn't matter anymore, because I've added the sort to the Blender plugin too (see latest commit). Needs some more testing to be sure, but it's already there and it is to stay. @raysan5: you can safely remove the sort from raylib now (or if you decide to keep it to be on the safe side, then please add a comment Cheers, |
[2023-10-05 edit: Big simplification] For those playing along at home, there are two things to observer here. One is algorithmic performance analysis. The second is actual performance in practice. I will address the second first. B. We know in the already-sorted case that the insertion-sort method on n items will make n-1 comparisons and conclude without altering anything. (Actually, it picks up and puts back all but one, leaving things unchanged. We don't count that.) In this case we have no idea how much work qsort will do with that case. We also don't know what it will take in the worst-case for qsort. A. The worst case for the insertion case is when the original n values are in descending order. Let's see what happens. Let's keep it simple. Let's use {4, 3, 2, 1, 0}, for 5 values. a. To see how the algorithm works, consider that the values are printed on little square tiles and the tiles are placed in sequence in some given order. We want to arrange them in order by the numerals printed on them. The way insertion sort works is we want to walk through the tiles from first to last in the sequence and make rearrangements in such a way that all tiles to the left of the one we are looking at next are arranged in ascending sequence. b. We start by picking up the second tile (tile[1]) and setting aside that as the candidate to be placed in correct sequence. We start at tile[1] because the single tile initially at tile[0] is already in perfect sequence by itself. There is now a gap where the candidate tile was sitting. If the tile to the left of the gap has a numeral that is greater than that of the candidate, We move that tile into the gap. We have effectively moved the gap one space to the left. As soon as there is no tile to the left of the gap or there is one whose numeral is not greater than that of the candidate, we have found the insertion point. The candidate is now dropped into the gap. c. In this manner we advance to successive candidates, rinse-repeat. The tiles to the left of the candidate are already sorted and the candidate drops into the gap that is left when all tiles with a greater number (if any) have been shifted to have a gap where the candidate will fit in correct non-decreasing sequence. d. Here is the procedure with the example. i. {4, ...} is fine without looking farther. ii. With {4, 3, ...} it's seen that 3 is out of order. Pick it up and slide the larger of the already-sorted larger ones down the list to make a gap, and drop the 3 in it. So we are now at {3,4,...} and can continue. For this to happen there was one compare and one shift of a value. That is, the gap was moved one time. We'll call that one step. iii. With {3, 4, 2, ...} the same thing happens but the 2 is compared with two values and two values are shifted to arrive at {2, 3, 4, ...}. That a gap movement of two places. Call that two steps. iv. With {2, 3, 4, 1, ...} The rules get us to {1, 2, 3, 4, ...} and there were three steps. v. Finally, with {1, 2, 3, 4, 0} we take four steps and arrive at {0, 1, 2, 3, 4}. What's the pattern? The worst-case number of steps taken for n items is the sum 1 + 2 + ..., + (n-1). And that sum is the value n(n-1)/2. We speak of that as a complexity of order n × n or O(n^2), because that's what dominates the growth as n is increases. That's the algorithmic analysis for complexity. We haven't addressed the comparative cost of different approaches. The key thing to notice for the insertion sort is that if the list is already sorted, there are just n-1 simple comparisons and no gap-movement steps. In the algorithm as described, Each candidate is always picked up and a gap created even if the tile immediately to the left of the candidate is not greater than the candidate, so the candidate is immediately put back into the same place. We could eliminate that by first noticing whether the tile immediately to the left of the candidate does not have a greater number. That might be valuable if the tiles are "heavy." Seeing what the difference is in that case might be worth testing. It would make insert-sorting an already sorted sequence a bit faster. |
BREAKING NEWS: The worst-case time for Quicksort is when the items are already sorted. That statement is made by Donald Knuth in remarks following Algorithm 4.2.2Q (Quicksort) in "The Art of Computer Programming," Volume 3, Sorting and Searching. Knuth's observation is from algorithmic analysis. It does not tell us what the comparative performance between an insertion sort and a Quicksort in actual code might be, and when (as the number of items increases) does the better performance change from one to the other. However, since we know that the input to the m3d loader is almost always already sorted, with perhaps a few out-of-order, a defensive approach would be to use an insertion sort that exits very quickly if the requirement for an already-sorted list is satisfied. Quicksort is not the answer for that case. In the spirit of defensive programming, I have developed an improvement to the previous insertion sort that is blazingly fast when the items are already sorted. It doesn't monkey with the list at all except when it finds an out-of-order item. Adopting the @ubkp playbook, I have created a stress test, I have run TrySorts (built using VC/C++) on two systems. Both of them show the crossover where qsort becomes faster to be around 250 items. Here's the run on my 12-year-old desktop. Here's the run on my new laptop. |
@orcmid That's pretty cool. These benchmarks are awesome. Since you're on Windows, would like to suggest also testing with mingw (that's packed with raylib distribution), since I guess that's what a lot of users use. |
@ubkp
That's probably the case for many raylib users. I'd welcome seeing the results that anyone obtains with different compilers/platforms. I'm not prepared to work with the mingw setup myself. PS: Anyone building and running the |
@orcmid Luckly that's super easy for Windows users: https://www.youtube.com/watch?v=R1iN3j7mAos |
I'm merging this change BUT commenting it. The specs already specify data MUST be sorted. |
Hello, I've changed the m3d loader to sort faces by material. The old code assumes the faces are already grouped together by material, which may not be the case for every model.
This will reduce the number of meshes being created and uploaded to the gpu for those cases