Skip to content

Commit

Permalink
Don't use aligned loads in non-inlined funcs.
Browse files Browse the repository at this point in the history
I'm wanting things to stay in registers, but that's not realistic for
arguments.  Force inline the others.  May help #5699.
  • Loading branch information
unknownbrackets committed Mar 23, 2014
1 parent a26e6ce commit 56b83af
Show file tree
Hide file tree
Showing 2 changed files with 13 additions and 10 deletions.
15 changes: 9 additions & 6 deletions GPU/Math3D.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,10 +24,11 @@ float Vec2<float>::Length() const
{
#if defined(_M_SSE)
float ret;
__m128 sq = _mm_mul_ps(vec, vec);
__m128 xy = _mm_loadu_ps(&x);
__m128 sq = _mm_mul_ps(xy, xy);
const __m128 r2 = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(0, 0, 0, 1));
const __m128 res = _mm_add_ss(sq, r2);
_mm_store_ps(&ret, _mm_sqrt_ss(res));
_mm_store_ss(&ret, _mm_sqrt_ss(res));
return ret;
#else
return sqrtf(Length2());
Expand Down Expand Up @@ -71,11 +72,12 @@ float Vec3<float>::Length() const
{
#if defined(_M_SSE)
float ret;
__m128 sq = _mm_mul_ps(vec, vec);
__m128 xyz = _mm_loadu_ps(&x);
__m128 sq = _mm_mul_ps(xyz, xyz);
const __m128 r2 = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(0, 0, 0, 1));
const __m128 r3 = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(0, 0, 0, 2));
const __m128 res = _mm_add_ss(sq, _mm_add_ss(r2, r3));
_mm_store_ps(&ret, _mm_sqrt_ss(res));
_mm_store_ss(&ret, _mm_sqrt_ss(res));
return ret;
#else
return sqrtf(Length2());
Expand Down Expand Up @@ -185,10 +187,11 @@ float Vec4<float>::Length() const
{
#if defined(_M_SSE)
float ret;
__m128 sq = _mm_mul_ps(vec, vec);
__m128 xyzw = _mm_loadu_ps(&x);
__m128 sq = _mm_mul_ps(xyzw, xyzw);
const __m128 r2 = _mm_add_ps(sq, _mm_movehl_ps(sq, sq));
const __m128 res = _mm_add_ss(r2, _mm_shuffle_ps(r2, r2, _MM_SHUFFLE(0, 0, 0, 1)));
_mm_store_ps(&ret, _mm_sqrt_ss(res));
_mm_store_ss(&ret, _mm_sqrt_ss(res));
return ret;
#else
return sqrtf(Length2());
Expand Down
8 changes: 4 additions & 4 deletions GPU/Math3D.h
Original file line number Diff line number Diff line change
Expand Up @@ -919,7 +919,7 @@ inline Vec3<int> Vec3<int>::FromRGB(unsigned int rgb)
}

template<>
inline unsigned int Vec3<float>::ToRGB() const
__forceinline unsigned int Vec3<float>::ToRGB() const
{
#if defined(_M_SSE)
__m128i c = _mm_cvtps_epi32(_mm_mul_ps(vec, _mm_set_ps1(255.0f)));
Expand All @@ -933,7 +933,7 @@ inline unsigned int Vec3<float>::ToRGB() const
}

template<>
inline unsigned int Vec3<int>::ToRGB() const
__forceinline unsigned int Vec3<int>::ToRGB() const
{
#if defined(_M_SSE)
__m128i c16 = _mm_packs_epi32(ivec, ivec);
Expand Down Expand Up @@ -973,7 +973,7 @@ inline Vec4<int> Vec4<int>::FromRGBA(unsigned int rgba)
}

template<>
inline unsigned int Vec4<float>::ToRGBA() const
__forceinline unsigned int Vec4<float>::ToRGBA() const
{
#if defined(_M_SSE)
__m128i c = _mm_cvtps_epi32(_mm_mul_ps(vec, _mm_set_ps1(255.0f)));
Expand All @@ -988,7 +988,7 @@ inline unsigned int Vec4<float>::ToRGBA() const
}

template<>
inline unsigned int Vec4<int>::ToRGBA() const
__forceinline unsigned int Vec4<int>::ToRGBA() const
{
#if defined(_M_SSE)
__m128i c16 = _mm_packs_epi32(ivec, ivec);
Expand Down

4 comments on commit 56b83af

@solarmystic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a very, very minor performance regression in certain games (Tekken 6, Monhun series) traced back to this commit. I guess the performance gains in 4772852 were nullfied to an extent.

I guess it's unavoidable to fix the crashes in #5699 (which I didn't experience on my Windows 64bit system).

@unknownbrackets
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will naturally hurt performance. That's why I was hoping for it to stick into a register rather than having to use a load.

-[Unknown]

@hrydgard
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I wonder if the crash culprit wasn't actually your _store_ps to a local float, which could have been aligned however the compiler felt like. But should use loads properly anyway I guess.

@solarmystic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@unknownbrackets
Hrmm... apparently I can reproduce the crash described in #5699 with Tekken DR if Hardware Transform is off, but only on the Windows x64 debug builds.

in the release builds it still doesn't happen. I discovered this when trying to acquire the debug stacktrace as per your request here (#5782 (comment)), and it crashed the moment i disabled HW Transform instead of the usual point (after the 3rd stage) during Armor King's Story Mode. A stack trace of that particular crash pointed to the Math3D.cpp file as the culprit, and a git reset --hard fixed it.

I was using a customized build with this particular commit reverted for better performance. It seems like this commit may be needed after all even for Windows if one were to use SW Transform with a Debug Build for a game that needs HW Transform off.

Please sign in to comment.