Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactoring] Improve spline/bezier. #11425

Merged
merged 72 commits into from
Nov 4, 2018

Conversation

xebra
Copy link
Contributor

@xebra xebra commented Sep 27, 2018

This is a little big rewrite of spline/bezier include many improvements, bug fixes and refactoring.

Performance improvements

  • Improve the speed of software tessellation approximately 15-25 fps on PC/Android. (For example, FF4 thundaga on Android: 5 fps to 20 fps)
  • Hardware tessellation worked on my "weak" GPU(adreno 305), also the speed improved approximately 40 fps(20 fps to 60 fps) with B-spline demo(bezier in pspgl).
  • [Edit] Unfortunately HW tessellation is not so fast on weak GPU like old mobile phone. It seems just get almost same speed as SW tessellation. Anyway SW tessellation seems faster enough for now.
  • [Edit] Pursuit Force is a bit slower when using HW tessellation on GLES.

Feature improvements

  • Get rid of divisions in spline weights calculation.
  • Use derivatives for B-spline normals generation.
  • Use same weights if u == v to avoid duplicated calculations.
  • Use cached weights.
  • Pre-convert the control points to avoid conversion in hot loops.
  • Use real tessellation for low-quality.
    Before(spline/bezier is same geometory)
    bezi00678_00000
    bezi00678_00001
    After(Bezier)
    bezi00678_00000
    bezi00678_00001
    High quality(bezier)
    bezi00678_00002
    After(Spline)
    bezi00678_00002
    bezi00678_00003
    High quality(spline)
    bezi00678_00003
  • Some SIMD optimizations(correctly optimized?).
  • Managing a raw memory buffer.
  • Add support for changing the quality of hardware tessellation.
  • Disable instanced tessellation.

Bug fixes

  • Fix texture coordinates of low-quality.
    Before
    ulus10560_00000
    After
    ulus10560_00008

Refactoring

  • Unify spline/bezier tessellation logic using template.

Known issue [Edit]

  • SplineSurface demo(from jpcsp) is something wrong with SW tessellation. However, HW tessellation seems correct. This issue is known since before these commits.
    HW tess
    spli01764_00001
    SW tess
    spli01764_00004

Future plan(TODO) [Edit]

  • Implement HW tessellation for DX9.
  • Hand optimization with ARM NEON if I can.

I just wrote something that is remembered for now, so I may add the description later.
Sorry for the huge pull request.

@hrydgard
Copy link
Owner

Whoa. Won't have time to look at this until the weekend but sounds promising :)

@marosis
Copy link

marosis commented Sep 27, 2018

Owhdusbduxbfzykqhxkabw WHAAAAAAAT? You have JUST REALLY said FPS improve over 15-25???????? I don't have words

@LunaMoo
Copy link
Collaborator

LunaMoo commented Sep 28, 2018

Nice overall, but hardware tesselation seems to still be buggy in "Juiced 2: Hot Import Nights" so this also breaks it when spline/bezier curves are set to low now. The game uses curves to draw road and since it's a racing game people with low end hardware might abuse setting the curves to low here. Anyway the glitch of hardware should be easily noticeable in this dump:

ULES00928_0001.zip

And probably shouldn't be a blocking case for this PR since hardware tesselation is still buggy same way on master.

Edit:
Also tested CODED ARMS which is that one game where whole levels are made via curves, seems to be perfectly fine quality wise and uncapped speed using curves on high on desktop pc increased from ~ 4000% to ~ 5000% so a pretty solid 25% increase in performance in that game assuming cpu bottleneck.

Copy link
Collaborator

@unknownbrackets unknownbrackets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of changes. Caching the curves seems to make sense, I wonder how much vectorization we get from MSVC and GCC now on Intel/ARM without the specializations...

Slightly concerned about the FF4 note on matUpdate. It'd be nice to collect some bezier/spline GE dumps to test things like #7525. I'll test more later.

-[Unknown]

@@ -109,7 +109,7 @@ GPU_GLES::GPU_GLES(GraphicsContext *gfxCtx, Draw::DrawContext *draw)
if (g_Config.bHardwareTessellation) {
// Disable hardware tessellation if device is unsupported.
bool hasTexelFetch = gl_extensions.GLES3 || (!gl_extensions.IsGLES && gl_extensions.VersionGEThan(3, 3, 0)) || gl_extensions.EXT_gpu_shader4;
if (!gstate_c.SupportsAll(GPU_SUPPORTS_INSTANCE_RENDERING | GPU_SUPPORTS_VERTEX_TEXTURE_FETCH | GPU_SUPPORTS_TEXTURE_FLOAT) || !hasTexelFetch) {
if (!gstate_c.SupportsAll(GPU_SUPPORTS_VERTEX_TEXTURE_FETCH | GPU_SUPPORTS_TEXTURE_FLOAT) || !hasTexelFetch) {
// TODO: Check unsupported device name list.(Above gpu features are supported but it has issues with weak gpu, memory, shader compiler etc...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is less relevant now?

-[Unknown]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?
Well, I decided to disable instanced rendering because instanced tessellation to be very slow with B-Spline surface on weak GPU.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so maybe this comment is less relevant (the device name list blacklist.)

-[Unknown]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, device name list...
Yeah, I believe so.

ambientStr = (matUpdate & 1) && hasColor ? "col" : "u_matambientalpha";
diffuseStr = (matUpdate & 2) && hasColor ? "col.rgb" : "u_matdiffuse";
specularStr = (matUpdate & 4) && hasColor ? "col.rgb" : "u_matspecular.rgb";
// TODO: Probably, should use hasColorTess but FF4 has a problem with drawing the background.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. hasColor should always be true because we normalize the vertices. In theory, this should only affect diffuse and specular. For ambient, it should be the same either way, right?

-[Unknown]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. hasColor should always be true because we normalize the vertices.
Yes, so we don't need to check the flag hasColor.
However, actually, I'm not really sure that is doing what.

@unknownbrackets
Copy link
Collaborator

Just wanted to say these performance improvements look very cool.

But @marosis note that it only affects certain games / certain areas of games. Most of the FPS notes are using tests that render only spline/bezier. Don't expect it to make every game way faster - it's a very cool update that will make some games way faster, though, so it's still something to be excited about.

-[Unknown]

@marosis
Copy link

marosis commented Sep 28, 2018

Yeah i know this, but will i get some performance improvements when my android don't have HW tesselation in settings?

@xebra
Copy link
Contributor Author

xebra commented Sep 28, 2018

Thanks all.
@hrydgard @unknownbrackets Sorry for the huge commits in one time. I kept in mind to simple, clean, clear, easy to read, so I hope you can read easily whole code.

@LunaMoo I'm not sure what is the problem about hw tess. Also I don't know how can I get any information from the dump file. Thanks anyway to some tests.

@marosis I'm sure you can get any effects when you play games in very heavy spline/bezier situation.

@unknownbrackets
Copy link
Collaborator

To use a GE dump - just open it like a game. I usually drag them into the PPSSPP window.

It will run all the same GE commands that the game sent for that frame, just over and over again each frame. Note that anything that was render-to-texture or any block transfers will now come from RAM, so it's not perfect, but it does get you most of the frame behavior.

-[Unknown]

@xebra
Copy link
Contributor Author

xebra commented Sep 28, 2018

Thanks @unknownbrackets !
Oh, it's very easy way to use it. And really useful feature!
I will try to look into it.

@marosis
Copy link

marosis commented Sep 28, 2018

Pls Onkel hrydgard will be this in 1.7.0? 😂😂😂😂😂

@zminhquanz
Copy link
Contributor

@xebra Hello , long time no see , i'll expect about this . How about HW Tesellation on GLES 2 and DX9 ?

@weihuoya
Copy link
Contributor

oh, Pursuit Force use a lot of bezier, Metal Gear Acid 2 use a lot of spline, and will get performance improvement.

@xebra
Copy link
Contributor Author

xebra commented Sep 29, 2018

@zminhquanz Hi! I will try to implement for DX9 after this. Also, GLES2 is... I need more research on it.

@xebra
Copy link
Contributor Author

xebra commented Sep 29, 2018

BTW, I have a question.
We call Flush() twice soon in spline/bezier.
This seems wasted. So, can we get rid of one of these?

void GPUCommon::Execute_Bezier(u32 op, u32 diff) {
	drawEngineCommon_->DispatchFlush();
	...
	drawEngineCommon_->SubmitBezier(...);
}
...
void DrawEngineCommon::SubmitBezier(...) {
	PROFILE_THIS_SCOPE("bezier");
	DispatchFlush();
	...
}

@unknownbrackets
Copy link
Collaborator

Well, the first flush is in case we have other prims/triangles/etc., which can't be combined with the bezier. The second one is to draw the actual bezier because it can't be combined with later drawing.

-[Unknown]

@xebra
Copy link
Contributor Author

xebra commented Sep 29, 2018

Really?
Actually, Flush() is called three times in this case because it's called twice in SubmitBeizer().

void DrawEngineCommon::SubmitBezier(...) {
	PROFILE_THIS_SCOPE("bezier");
	DispatchFlush();
	...
	// Tessellation...
	...
	DispatchSubmitPrim(...);

	DispatchFlush();

	if (origVertType & GE_VTYPE_TC_MASK) {
		gstate_c.uv = prevUVScale;
	}
}

I think you said

The second one is to draw the actual bezier because it can't be combined with later drawing.

is meaning last one.

[Edit]
I mean the first one is called twice, so I said it seems wasted.

@unknownbrackets
Copy link
Collaborator

Ah, you're right. We only need two, I think the one outside SubmitBezier was added when removing the always flush before flag.

-[Unknown]

@xebra
Copy link
Contributor Author

xebra commented Sep 29, 2018

Thanks!
I will try to fix it later.

@xebra
Copy link
Contributor Author

xebra commented Sep 29, 2018

Pursuit Force on Android seems almost don't gain performance improvements.
Probably, it's not bezier's issue, it's because of heavy(approx. 1000) draw calls(4x4 bezier patch) in one frame, I think.

@hrydgard
Copy link
Owner

For Pursuit Force we might then want to try to combine draw calls even with beziers.. Should be possible, just need a little extra state tracking I guess..

@weihuoya
Copy link
Contributor

weihuoya commented Sep 29, 2018

I had try to combine beziers and splines drawcalls, it do not have any improvement. only drawcalls down a lot.

@xebra
Copy link
Contributor Author

xebra commented Sep 29, 2018

Yeah, I was just thinking same thing but it seems not easy way for me.

@xebra
Copy link
Contributor Author

xebra commented Sep 30, 2018

I guess GLES2 devices have old and weak GPU, so even if we can support GLES2 for HW tess, maybe it's slower than SW tess(CPU).
Also If we decide to support GLES3 or higher, maybe we can use more strong GL features.
How about it?

@unknownbrackets
Copy link
Collaborator

I personally think these improvements are great, and GLES2 (using software) will already benefit a lot from them. I also think GLES2 might perform worse with hardware (even my old NVIDIA desktop GPU did, iirc.)

I tested this with a couple games, but starting to need to keep a database of GE dumps with tags for things they do... I can't remember which games that I have use bezier/spline anymore.

Anyway, seems great so far from what I've tried.

-[Unknown]

@xebra
Copy link
Contributor Author

xebra commented Sep 30, 2018

@unknownbrackets Thank you for testing these. Your opinion is helpful to me.

Current HW tess on OpenGL is a little slower than Vulkan/DX11. It's probably because of using texture as tessellation data transfer. So, if we can use any generic buffer like TBO/UBO instead texture, maybe we can boost speed more.

@weihuoya
Copy link
Contributor

UBO is too small for this.

@zminhquanz
Copy link
Contributor

@xebra if HW Tessellation on GLES 2 are too slow , you can make a check to enable SW Tessellation on these

@LunaMoo
Copy link
Collaborator

LunaMoo commented Oct 1, 2018

I think the option of hardware tesselation should be renamed to something like "hardware spline/bezier curves" or "generate spline/bezier curves on gpu", as seeing the above comment and similar confusion in the past on the forums, people still think this is some kind of graphic enhancement or a speedhack that they absolutely have to use. While actually it's not even as compatible as software curves yet and might reduce quality in affected games, performance might also be pretty terrible for example on amd gpu's or older gpu's in general compared to software curves:].

@hrydgard
Copy link
Owner

hrydgard commented Oct 1, 2018

Agree that renaming it makes sense. Or removing it and autodetect whether to use it somehow... although that might be a bit unrealistic still.

@ghost
Copy link

ghost commented Oct 1, 2018

@xebra this could help #10842 or not?

@xebra
Copy link
Contributor Author

xebra commented Oct 7, 2018

I tried fixing it. Thanks to the note @unknownbrackets .

@hrydgard
Copy link
Owner

hrydgard commented Nov 4, 2018

I'm happy with it too - let's go for it.

@hrydgard hrydgard merged commit 22c0665 into hrydgard:master Nov 4, 2018
@ghost
Copy link

ghost commented Nov 5, 2018

@xebra what are those games that really benefit this Refactoring and even my android phone (Mali-450 MP) doesn't support HW Tesselation I can feel a little bit improvement in the performance?

Just Asking :)

@xebra
Copy link
Contributor Author

xebra commented Nov 5, 2018

Thank you for approval and merging. I'm happy that these were merged.
And, now I'm trying to fix the low-quality issue but it needs more time.

@Emulatorer Only on games using Spline/Bezier.
I don't know what game is using these but I just testing with FF4, Loco Roco, Pursuit Force, and some demos.
I think you should search in this issue/PR lists, and then you can get some information about those.
Please don't get your hopes up about HW Tessellation because this is still a highly experimental feature.

@ghost
Copy link

ghost commented Nov 7, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Spline/Bezier tesselation Issues related to the PSP's hardware tesselation support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants