Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build on AIX fails because of "restrict" pointers #620

Closed
smuehlst opened this issue Oct 7, 2015 · 5 comments
Closed

Build on AIX fails because of "restrict" pointers #620

smuehlst opened this issue Oct 7, 2015 · 5 comments
Labels
Milestone

Comments

@smuehlst
Copy link
Contributor

smuehlst commented Oct 7, 2015

I'm trying to build OpenJPEG on AIX with the xlc compiler. I invoked cmake like this:

CC=xlc cmake .

Running make fails with the following error:

(1)$ make
[  2%] Building C object src/lib/openjp2/CMakeFiles/openjp2.dir/dwt.c.o
"/home/stm/src/openjpeg_aix/src/lib/openjp2/openjpeg.h", line 75.9: 1506-224 (I) Incorrect pragma ignored.
"/home/stm/src/openjpeg_aix/src/lib/openjp2/dwt.c", line 764.20: 1506-1418 (E) Assignment between restrict pointers "fl" and "fw" is not allowed. Only outer-to-inner scope assignments between restrict pointers are allowed.
make: 1254-004 The error code from the last command is 1.


Stop.
make: 1254-004 The error code from the last command is 2.


Stop.
make: 1254-004 The error code from the last command is 2.

The problematic source code is this:

static void opj_v4dwt_decode_step2(opj_v4_t* l, opj_v4_t* w, OPJ_INT32 k, OPJ_INT32 m, OPJ_FLOAT32 c)
{
        OPJ_FLOAT32* restrict fl = (OPJ_FLOAT32*) l;
        OPJ_FLOAT32* restrict fw = (OPJ_FLOAT32*) w;
        OPJ_INT32 i;
        for(i = 0; i < m; ++i){
                ...
                fl = fw;
                fw += 8;
        }

If I understand the C99 standard correctly, then the assignment fl = fw is undefined behavior because fl and fw are declared at the same level. See example 4 in "6.7.3.1 Formal definition of restrict" in the C99 standard:

{
    int * restrict p1;
    int * restrict q1;
    p1 = q1; // undefined behavior
    {
        int * restrict p2 = p1; // valid
        int * restrict q2 = q1; // valid
        p1 = q2; // undefined behavior
        p2 = q2; // undefined behavior
    }
}
@malaterre
Copy link
Collaborator

Code first appeared in: dbeebe72#diff-1ea8331903552f12aee1bc3924cd0780R658

@fodevaux do you remember what the code was supposed to do. AFAIK I do not see the point of restrict keywords at the beginning of the funtion.

@fodevaux
Copy link
Collaborator

Sorry, I won't be able to help you in this one...

@Dzonatas
Copy link

The original patch I submitted had no change to the lossless compress/decompress and did not have the restrict keywords. I wrote in the (GCC specific) single vector variable before that was wrote out with the expanded, autovectorized code and wrapped with SSE intrinsics. Note: I did not apply it to the lossless code due to the 80/20 rule, anchored to specific integer math.

@SegHaxx
Copy link

SegHaxx commented Oct 13, 2015

I refactored those loops a bit from what Dzonatas wrote. The restrict pointers were an attempt to make GCC's autovectorizer happy, at the time if all pointers were not restrict then it would refuse to vectorize. Current GCC is smarter now and may not require this.

I guess for now, you could just drop the 'restrict'. This code is all a bit hairy and I should probably re-work it anyway. It was all an attempt to keep the icache/BTB footprint down by merging some of the original loops, but the optimizer seems insistent on duplicating them back anyway. :P

@SegHaxx
Copy link

SegHaxx commented Oct 13, 2015

Re-reading the spec, it seems using restrict is indeed invalid here. If that kills autovectorization, then oh well...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants