Compiling hyper 0.12 on armv7-linux-androideabi with target-features=+neon fails with LLVM ERROR: ran out of registers during register allocation #55105

Open
Eijebong opened this issue Oct 15, 2018 · 13 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. C-bug Category: This is a bug. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Eijebong
Contributor

root@7f26157a3837:~/hyper# cargo rustc --release -v --target "armv7-linux-androideabi" -- -C target-feature=+neon                                             
     Running `rustc --crate-name hyper src/lib.rs --crate-type lib --emit=dep-info,link -C opt-level=3 -C codegen-units=1 -C target-feature=+neon --cfg 'feature="__internal_flaky_tests"' --cfg 'feature="default"' --cfg 'feature="futures-cpupool"' --cfg 'feature="net2"' --cfg 'feature="runtime"' --cfg 'feature="tokio"' --cfg 'feature="tokio-executor"' --cfg 'feature="tokio-reactor"' --cfg 'feature="tokio-tcp"' --cfg 'feature="tokio-timer"' -C metadata=5c7c44dab8eed49e -C extra-filename=-5c7c44dab8eed49e --out-dir /root/hyper/target/armv7-linux-androideabi/release/deps --target armv7-linux-androideabi -L dependency=/root/hyper/target/armv7-linux-androideabi/release/deps -L dependency=/root/hyper/target/release/deps --extern bytes=/root/hyper/target/armv7-linux-androideabi/release/deps/libbytes-ddc5925e1332c4e2.rlib --extern futures=/root/hyper/target/armv7-linux-androideabi/release/deps/libfutures-169e26de4e2e0883.rlib --extern futures_cpupool=/root/hyper/target/armv7-linux-androideabi/release/deps/libfutures_cpupool-a7ff7f77e82e2fd1.rlib --extern h2=/root/hyper/target/armv7-linux-androideabi/release/deps/libh2-a6ab8093d4aaef1e.rlib --extern http=/root/hyper/target/armv7-linux-androideabi/release/deps/libhttp-a998474d6086d0cc.rlib --extern httparse=/root/hyper/target/armv7-linux-androideabi/release/deps/libhttparse-fe59ea708b984d2c.rlib --extern iovec=/root/hyper/target/armv7-linux-androideabi/release/deps/libiovec-0fbee7688c5d5997.rlib --extern itoa=/root/hyper/target/armv7-linux-androideabi/release/deps/libitoa-6968692ba2ead4d2.rlib --extern log=/root/hyper/target/armv7-linux-androideabi/release/deps/liblog-982ca16cb4a51ba3.rlib --extern net2=/root/hyper/target/armv7-linux-androideabi/release/deps/libnet2-eca709a2543aac40.rlib --extern time=/root/hyper/target/armv7-linux-androideabi/release/deps/libtime-d2ab1c7a701d0c5c.rlib --extern tokio=/root/hyper/target/armv7-linux-androideabi/release/deps/libtokio-f318b1cbbd06c200.rlib --extern 
tokio_executor=/root/hyper/target/armv7-linux-androideabi/release/deps/libtokio_executor-a34c5d48733eb612.rlib --extern tokio_io=/root/hyper/target/armv7-linux-androideabi/release/deps/libtokio_io-615c1ba46c31b299.rlib --extern tokio_reactor=/root/hyper/target/armv7-linux-androideabi/release/deps/libtokio_reactor-5f33b571b5aa2433.rlib --extern tokio_tcp=/root/hyper/target/armv7-linux-androideabi/release/deps/libtokio_tcp-dcd0b854ed32b20c.rlib --extern tokio_timer=/root/hyper/target/armv7-linux-androideabi/release/deps/libtokio_timer-b8925bce2a2b5c77.rlib --extern want=/root/hyper/target/armv7-linux-androideabi/release/deps/libwant-c71f954f09dd620c.rlib`                                                                         
LLVM ERROR: ran out of registers during register allocation
error: Could not compile `hyper`.
@est31
Member

est31 commented Oct 15, 2018

Can reproduce this locally.

$ git clone https://github.com/hyperium/hyper
$ cd hyper
$ git checkout v0.12.10
$ cargo +nightly rustc --release -v --target "armv7-linux-androideabi" -- -C target-feature=+neon
[...]
LLVM ERROR: ran out of registers during register allocation                                                      
error: Could not compile `hyper`

Rustc version:

$ rustc +nightly -vV
rustc 1.31.0-nightly (4699283c5 2018-10-13)
binary: rustc
commit-hash: 4699283c5b549d1559c198123a67fef498aa6a44
commit-date: 2018-10-13
host: x86_64-unknown-linux-gnu
release: 1.31.0-nightly
LLVM version: 8.0

@SimonSapin
Contributor

(This is blocking the Hyper upgrade in Servo.)

@est31
Member

est31 commented Oct 15, 2018

I've traced the bug to the record_header_indices function in src/proto/h1/role.rs. If I remove all of its contents, compilation succeeds.

@est31
Member

est31 commented Oct 15, 2018

This gives us this self-contained example:

pub struct Header<'a> {
    pub name: &'a [u8],
    pub value: &'a [u8],
}

pub struct HeaderIndices {
    name: (usize, usize),
    value: (usize, usize),
}


pub fn record_header_indices(bytes: &[u8], headers: &[Header], indices: &mut [HeaderIndices]) {
    let bytes_ptr = bytes.as_ptr() as usize;
    for (header, indices) in headers.iter().zip(indices.iter_mut()) {
        let name_start = header.name.as_ptr() as usize - bytes_ptr;
        let name_end = name_start + header.name.len();
        indices.name = (name_start, name_end);
        let value_start = header.value.as_ptr() as usize - bytes_ptr;
        let value_end = value_start + header.value.len();
        indices.value = (value_start, value_end);
    }
}

Compiled with the same invocation as above: cargo +nightly rustc --release -v --target "armv7-linux-androideabi" -- -C target-feature=+neon
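For readers unfamiliar with the function, here is a small usage sketch of what it computes: for every header, the start and end offsets of its name and value relative to the start of the buffer. The sample buffer, the slice ranges, and the extra derives on HeaderIndices are assumptions added for illustration and are not part of the original report.

```rust
pub struct Header<'a> {
    pub name: &'a [u8],
    pub value: &'a [u8],
}

// The derives are added only so the example can create and print values.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct HeaderIndices {
    name: (usize, usize),
    value: (usize, usize),
}

pub fn record_header_indices(bytes: &[u8], headers: &[Header], indices: &mut [HeaderIndices]) {
    let bytes_ptr = bytes.as_ptr() as usize;
    for (header, indices) in headers.iter().zip(indices.iter_mut()) {
        let name_start = header.name.as_ptr() as usize - bytes_ptr;
        let name_end = name_start + header.name.len();
        indices.name = (name_start, name_end);
        let value_start = header.value.as_ptr() as usize - bytes_ptr;
        let value_end = value_start + header.value.len();
        indices.value = (value_start, value_end);
    }
}

fn main() {
    // Hypothetical input: two headers that are sub-slices of one buffer.
    let bytes = b"Host: example.com\r\nAccept: */*\r\n";
    let headers = [
        Header { name: &bytes[0..4], value: &bytes[6..17] },
        Header { name: &bytes[19..25], value: &bytes[27..30] },
    ];
    let mut indices = [HeaderIndices::default(); 2];
    record_header_indices(bytes, &headers, &mut indices);

    for (header, idx) in headers.iter().zip(indices.iter()) {
        // The recorded offsets select the same bytes as the original slices.
        assert_eq!(&bytes[idx.name.0..idx.name.1], header.name);
        assert_eq!(&bytes[idx.value.0..idx.value.1], header.value);
        println!("{:?}", idx);
    }
}
```

Each loop iteration reads four 32-bit fields and writes four 32-bit fields on this target, which is the access pattern the loop vectorizer later picks up.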

As a workaround, splitting the loop in two seems to avoid the bug:

pub fn record_header_indices(bytes: &[u8], headers: &[Header], indices: &mut [HeaderIndices]) {
    let bytes_ptr = bytes.as_ptr() as usize;
    for (header, indices) in headers.iter().zip(indices.iter_mut()) {
        let name_start = header.name.as_ptr() as usize - bytes_ptr;
        let name_end = name_start + header.name.len();
        indices.name = (name_start, name_end);
    }
    for (header, indices) in headers.iter().zip(indices.iter_mut()) {
        let value_start = header.value.as_ptr() as usize - bytes_ptr;
        let value_end = value_start + header.value.len();
        indices.value = (value_start, value_end);
    }
}
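A quick way to convince yourself that the split-loop workaround does not change behavior is to run both variants on the same input and compare the results. The sketch below does that; the function names record_single_loop and record_split_loops, the sample data, and the derives are hypothetical and only serve the comparison.

```rust
pub struct Header<'a> {
    pub name: &'a [u8],
    pub value: &'a [u8],
}

#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct HeaderIndices {
    name: (usize, usize),
    value: (usize, usize),
}

// Original shape: one loop writes both the name and the value offsets.
pub fn record_single_loop(bytes: &[u8], headers: &[Header], indices: &mut [HeaderIndices]) {
    let bytes_ptr = bytes.as_ptr() as usize;
    for (header, out) in headers.iter().zip(indices.iter_mut()) {
        let name_start = header.name.as_ptr() as usize - bytes_ptr;
        out.name = (name_start, name_start + header.name.len());
        let value_start = header.value.as_ptr() as usize - bytes_ptr;
        out.value = (value_start, value_start + header.value.len());
    }
}

// Workaround shape: the name pass and the value pass are separate loops.
pub fn record_split_loops(bytes: &[u8], headers: &[Header], indices: &mut [HeaderIndices]) {
    let bytes_ptr = bytes.as_ptr() as usize;
    for (header, out) in headers.iter().zip(indices.iter_mut()) {
        let name_start = header.name.as_ptr() as usize - bytes_ptr;
        out.name = (name_start, name_start + header.name.len());
    }
    for (header, out) in headers.iter().zip(indices.iter_mut()) {
        let value_start = header.value.as_ptr() as usize - bytes_ptr;
        out.value = (value_start, value_start + header.value.len());
    }
}

fn main() {
    // Hypothetical sample data: three headers taken from one buffer.
    let bytes = b"A: 1\r\nBB: 22\r\nCCC: 333\r\n";
    let headers = [
        Header { name: &bytes[0..1], value: &bytes[3..4] },
        Header { name: &bytes[6..8], value: &bytes[10..12] },
        Header { name: &bytes[14..17], value: &bytes[19..22] },
    ];
    let mut a = [HeaderIndices::default(); 3];
    let mut b = [HeaderIndices::default(); 3];
    record_single_loop(bytes, &headers, &mut a);
    record_split_loops(bytes, &headers, &mut b);
    assert_eq!(a, b);
    println!("both variants agree: {:?}", a);
}
```

The two variants write the same fields; the split only changes how many fields each loop touches per iteration, which presumably changes what the vectorizer does with it.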

@est31
Member

est31 commented Oct 16, 2018

I've tried to make a C version that reproduces this, but failed. This is a Rust version very close to C that reproduces the bug:

pub struct Header {
    pub name: * const u8,
    pub name_len: usize,
    pub value: * const u8,
    pub value_len: usize,
}

pub struct HeaderIndices {
    name_a :usize,
    name_b: usize,
    value_a: usize,
    value_b: usize,
}


pub fn record_header_indices(bytes_ptr: usize, headers: * const Header, indices: *mut HeaderIndices, len: isize) {
    for i in 0 .. len {
        let mut indices = unsafe { &mut *indices.offset(i) };
        let header = unsafe { &*headers.offset(i) };
        let name_start = header.name as usize - bytes_ptr;
        let name_end = name_start + header.name_len;
        indices.name_a = name_start;
        indices.name_b = name_end;
        let value_start = header.value as usize - bytes_ptr;
        let value_end = value_start + header.value_len;
        indices.value_a = value_start;
        indices.value_b = value_end;
    }
}

But this C version doesn't (clang invocation was clang -cc1 test.c -triple armv7-linux-androideabi -O3 -target-feature +neon):

typedef unsigned __INT32_TYPE__ size_t;

typedef struct header {
    char *name;
    size_t name_len;
    char *val;
    size_t val_len;
} header;

typedef struct header_indices {
    size_t name_a;
    size_t name_b;
    size_t val_a;
    size_t val_b;
} header_indices;

void record_header_indices(size_t bytes_ptr, const header *headers, header_indices *indices, size_t len) {
    for (size_t i = 0; i < len; i++) {
        header_indices *idxs = &indices[i];
        const header *hdr = &headers[i];
        size_t name_start = ((size_t)hdr->name) - bytes_ptr;
        size_t name_end = name_start + hdr->name_len;
        idxs->name_a = name_start;
        idxs->name_b = name_end;
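        size_t val_start = ((size_t)hdr->val) - bytes_ptr;
        /* Note: the next three lines use name_start/name_end rather than
           val_start/val_end, so this loop does not mirror the Rust version exactly. */
        size_t val_end = name_start + hdr->val_len;
        idxs->val_a = name_start;
        idxs->val_b = name_end;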
    }
}

@Eijebong
Contributor Author

Eijebong commented Oct 16, 2018

Here's the output of --emit=llvm-ir:

; ModuleID = 'foo.3vww23kh-cgu.0'
source_filename = "foo.3vww23kh-cgu.0"
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "armv7-none-linux-android"

%Header = type { [0 x i32], i8*, [0 x i32], i32, [0 x i32], i8*, [0 x i32], i32, [0 x i32] }
%HeaderIndices = type { [0 x i32], i32, [0 x i32], i32, [0 x i32], i32, [0 x i32], i32, [0 x i32] }
%"unwind::libunwind::_Unwind_Exception" = type { [0 x i64], i64, [0 x i32], void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [0 x i32], [20 x i32], [1 x i32] }
%"unwind::libunwind::_Unwind_Context" = type { [0 x i8] }

; foo::record_header_indices
; Function Attrs: nounwind nonlazybind uwtable
define void @_ZN3foo21record_header_indices17h1de73b969897f0d0E(i32 %bytes_ptr, %Header* nocapture readonly %headers, %HeaderIndices* nocapture %indices, i32 %len) unnamed_addr #0 personality i32 (i32, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  %0 = icmp sgt i32 %len, 0
  br i1 %0, label %bb6.preheader, label %bb4

bb6.preheader:                                    ; preds = %start
  %min.iters.check = icmp ult i32 %len, 4
  br i1 %min.iters.check, label %bb6.preheader21, label %vector.memcheck

bb6.preheader21:                                  ; preds = %middle.block, %vector.memcheck, %bb6.preheader
  %iter.sroa.0.010.ph = phi i32 [ 0, %vector.memcheck ], [ 0, %bb6.preheader ], [ %n.vec, %middle.block ]
  br label %bb6

vector.memcheck:                                  ; preds = %bb6.preheader
  %scevgep = getelementptr %HeaderIndices, %HeaderIndices* %indices, i32 %len
  %scevgep14 = getelementptr %Header, %Header* %headers, i32 %len
  %1 = bitcast %Header* %scevgep14 to %HeaderIndices*
  %bound0 = icmp ugt %HeaderIndices* %1, %indices
  %2 = bitcast %HeaderIndices* %scevgep to %Header*
  %bound1 = icmp ugt %Header* %2, %headers
  %found.conflict = and i1 %bound0, %bound1
  br i1 %found.conflict, label %bb6.preheader21, label %vector.ph

vector.ph:                                        ; preds = %vector.memcheck
  %n.vec = and i32 %len, -4
  %broadcast.splatinsert19 = insertelement <4 x i32> undef, i32 %bytes_ptr, i32 0
  %broadcast.splat20 = shufflevector <4 x i32> %broadcast.splatinsert19, <4 x i32> undef, <4 x i32> zeroinitializer
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %3 = getelementptr inbounds %Header, %Header* %headers, i32 %index, i32 0, i32 0
  %4 = bitcast i32* %3 to <16 x i32>*
  %wide.vec = load <16 x i32>, <16 x i32>* %4, align 4
  %strided.vec = shufflevector <16 x i32> %wide.vec, <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>
  %strided.vec16 = shufflevector <16 x i32> %wide.vec, <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 9, i32 13>
  %strided.vec17 = shufflevector <16 x i32> %wide.vec, <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 10, i32 14>
  %strided.vec18 = shufflevector <16 x i32> %wide.vec, <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 11, i32 15>
  %5 = sub <4 x i32> %strided.vec, %broadcast.splat20
  %6 = add <4 x i32> %5, %strided.vec16
  %7 = sub <4 x i32> %strided.vec17, %broadcast.splat20
  %8 = add <4 x i32> %7, %strided.vec18
  %9 = getelementptr inbounds %HeaderIndices, %HeaderIndices* %indices, i32 %index, i32 7
  %10 = getelementptr inbounds i32, i32* %9, i32 -3
  %11 = bitcast i32* %10 to <16 x i32>*
  %12 = shufflevector <4 x i32> %5, <4 x i32> %6, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %13 = shufflevector <4 x i32> %7, <4 x i32> %8, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %interleaved.vec = shufflevector <8 x i32> %12, <8 x i32> %13, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>
  store <16 x i32> %interleaved.vec, <16 x i32>* %11, align 4
  %index.next = add i32 %index, 4
  %14 = icmp eq i32 %index.next, %n.vec
  br i1 %14, label %middle.block, label %vector.body, !llvm.loop !1

middle.block:                                     ; preds = %vector.body
  %cmp.n = icmp eq i32 %n.vec, %len
  br i1 %cmp.n, label %bb4, label %bb6.preheader21

bb4:                                              ; preds = %bb6, %middle.block, %start
  ret void

bb6:                                              ; preds = %bb6.preheader21, %bb6
  %iter.sroa.0.010 = phi i32 [ %15, %bb6 ], [ %iter.sroa.0.010.ph, %bb6.preheader21 ]
  %15 = add nuw nsw i32 %iter.sroa.0.010, 1
  %16 = getelementptr inbounds %Header, %Header* %headers, i32 %iter.sroa.0.010, i32 0, i32 0
  %17 = load i32, i32* %16, align 4
  %18 = sub i32 %17, %bytes_ptr
  %19 = getelementptr inbounds %Header, %Header* %headers, i32 %iter.sroa.0.010, i32 3
  %20 = load i32, i32* %19, align 4
  %21 = add i32 %18, %20
  %22 = getelementptr inbounds %HeaderIndices, %HeaderIndices* %indices, i32 %iter.sroa.0.010, i32 0, i32 0
  store i32 %18, i32* %22, align 4
  %23 = getelementptr inbounds %HeaderIndices, %HeaderIndices* %indices, i32 %iter.sroa.0.010, i32 3
  store i32 %21, i32* %23, align 4
  %24 = getelementptr inbounds %Header, %Header* %headers, i32 %iter.sroa.0.010, i32 5
  %25 = bitcast i8** %24 to i32*
  %26 = load i32, i32* %25, align 4
  %27 = sub i32 %26, %bytes_ptr
  %28 = getelementptr inbounds %Header, %Header* %headers, i32 %iter.sroa.0.010, i32 7
  %29 = load i32, i32* %28, align 4
  %30 = add i32 %27, %29
  %31 = getelementptr inbounds %HeaderIndices, %HeaderIndices* %indices, i32 %iter.sroa.0.010, i32 5
  store i32 %27, i32* %31, align 4
  %32 = getelementptr inbounds %HeaderIndices, %HeaderIndices* %indices, i32 %iter.sroa.0.010, i32 7
  store i32 %30, i32* %32, align 4
  %exitcond = icmp eq i32 %15, %len
  br i1 %exitcond, label %bb4, label %bb6, !llvm.loop !3
}

; Function Attrs: nonlazybind uwtable
declare i32 @rust_eh_personality(i32, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #1

attributes #0 = { nounwind nonlazybind uwtable "target-features"="+v7,+thumb-mode,+thumb2,+vfp3,+d16,-neon,+neon" }
attributes #1 = { nonlazybind uwtable "target-cpu"="generic" "target-features"="+v7,+thumb-mode,+thumb2,+vfp3,+d16,-neon,+neon" }

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"RtLibUseGOT", i32 1}
!1 = distinct !{!1, !2}
!2 = !{!"llvm.loop.isvectorized", i32 1}
!3 = distinct !{!3, !2}
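To make the vector.body block in the IR above easier to follow, here is a scalar Rust model of one vectorized iteration. It processes four headers at once: one 16-lane load, four stride-4 deinterleaving shuffles, the sub/add arithmetic, and a final interleave before one 16-lane store. The function names and the [u32; 4] layout (name pointer, name length, value pointer, value length) are assumptions made for illustration; this is a reading aid, not what LLVM executes.

```rust
/// Scalar model of one `vector.body` iteration (four headers at a time).
/// Each header is modeled as four 32-bit fields:
/// [name ptr, name len, value ptr, value len].
fn vector_body_model(bytes_ptr: u32, headers: &[[u32; 4]; 4]) -> [[u32; 4]; 4] {
    // %wide.vec: a single <16 x i32> load covering four consecutive headers.
    let mut wide = [0u32; 16];
    for (j, h) in headers.iter().enumerate() {
        wide[j * 4..j * 4 + 4].copy_from_slice(h);
    }

    // %strided.vec*: stride-4 shuffles that gather field k of every header.
    let strided = |k: usize| [wide[k], wide[k + 4], wide[k + 8], wide[k + 12]];
    let (s0, s1, s2, s3) = (strided(0), strided(1), strided(2), strided(3));

    // %5..%8: lane-wise arithmetic; LLVM `sub`/`add` wrap on overflow.
    let (mut v5, mut v6, mut v7, mut v8) = ([0u32; 4], [0u32; 4], [0u32; 4], [0u32; 4]);
    for lane in 0..4 {
        v5[lane] = s0[lane].wrapping_sub(bytes_ptr); // name_start
        v6[lane] = v5[lane].wrapping_add(s1[lane]); // name_end
        v7[lane] = s2[lane].wrapping_sub(bytes_ptr); // value_start
        v8[lane] = v7[lane].wrapping_add(s3[lane]); // value_end
    }

    // %interleaved.vec: put the four results back into per-header order.
    let fields = [v5, v6, v7, v8];
    let mut out = [[0u32; 4]; 4];
    for j in 0..4 {
        for k in 0..4 {
            out[j][k] = fields[k][j];
        }
    }
    out
}

/// The plain per-header loop (`bb6` in the IR), used as a reference.
fn scalar_model(bytes_ptr: u32, headers: &[[u32; 4]; 4]) -> [[u32; 4]; 4] {
    let mut out = [[0u32; 4]; 4];
    for (o, h) in out.iter_mut().zip(headers.iter()) {
        let name_start = h[0].wrapping_sub(bytes_ptr);
        let value_start = h[2].wrapping_sub(bytes_ptr);
        *o = [
            name_start,
            name_start.wrapping_add(h[1]),
            value_start,
            value_start.wrapping_add(h[3]),
        ];
    }
    out
}

fn main() {
    // Hypothetical pointer and length values.
    let headers = [
        [100, 4, 106, 11],
        [119, 6, 127, 3],
        [200, 5, 207, 1],
        [300, 2, 304, 9],
    ];
    assert_eq!(vector_body_model(100, &headers), scalar_model(100, &headers));
    println!("{:?}", vector_body_model(100, &headers));
}
```

A <16 x i32> value is 512 bits, which is four 128-bit NEON Q registers. With +d16 in effect only d0 to d15 are available, which is eight Q registers, so a single wide load or interleaved store already occupies half of the vector register file. That makes it plausible, though not confirmed in this thread, that the allocator runs out of registers once the strided and intermediate vectors are live at the same time.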

SimonSapin added a commit to servo/hyper that referenced this issue Oct 16, 2018
rustc issue: rust-lang/rust#55105

Steps to reproduce:

```
rustup target add armv7-linux-androideabi
RUSTFLAGS="-Ctarget-feature=+neon" cargo build --target armv7-linux-androideabi --release
```

Output without this change:

```
   Compiling hyper v0.12.11 (/home/simon/projects/servo-deps/hyper)
LLVM ERROR: ran out of registers during register allocation
error: Could not compile `hyper`.
```
@SimonSapin
Contributor

Thanks @est31 for reducing and finding a work-around! I’ve filed it at hyperium/hyper#1671

SimonSapin added a commit to servo/hyper that referenced this issue Oct 16, 2018
…levant targets

bors-servo pushed a commit to servo/servo that referenced this issue Oct 16, 2018
WIP: Update hyper to 0.12

Left to do:
 - servo/webrender#3034
 - servo/hyper_serde#20
 - rust-lang/rust#55105
seanmonstar pushed a commit to hyperium/hyper that referenced this issue Oct 16, 2018
…M bug

@seanmonstar
Contributor

Just a nudge that this should probably be triaged or labeled, as it's a bug in the LLVM output of the compiler. Once fixed, hyper can remove its workaround that split the loop into two loops.

@parched
Contributor

parched commented Jan 13, 2019

Have you tried with target-feature=-d16? NEON isn't supported with only 16 registers.

@ishitatsuyuki ishitatsuyuki added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state C-bug Category: This is a bug. labels Jan 13, 2019
@mati865
Contributor

mati865 commented Mar 20, 2019

-C target-feature=-d16 solves the issue.

@seanmonstar
Contributor

So should that be automatically assumed for this target? I'd prefer to remove the hack from hyper.

@mati865
Contributor

mati865 commented Jun 27, 2019

IIUC hyper didn't build because of user error (using the neon CPU feature with only 16 registers), so the workaround shouldn't be necessary.
The question remains whether Rust should act on it (for example by printing a warning or enabling more registers).
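As an illustration of the kind of check being discussed, the sketch below resolves a target-features string such as the one in the IR attributes above ("+v7,+thumb-mode,+thumb2,+vfp3,+d16,-neon,+neon") and flags the combination of neon with d16. This is purely hypothetical: the function names are invented, it is not how rustc or LLVM handle features, and it assumes that a later entry for the same feature overrides an earlier one.

```rust
use std::collections::BTreeMap;

/// Resolve a comma-separated "+feat,-feat" string into a final on/off state.
/// Assumption: entries are applied left to right, so the last one wins.
fn resolve_features(spec: &str) -> BTreeMap<&str, bool> {
    let mut state = BTreeMap::new();
    for item in spec.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        if let Some(name) = item.strip_prefix('+') {
            state.insert(name, true);
        } else if let Some(name) = item.strip_prefix('-') {
            state.insert(name, false);
        }
    }
    state
}

/// True when neon ends up enabled while d16 is still enabled.
fn neon_with_d16(spec: &str) -> bool {
    let state = resolve_features(spec);
    let on = |name: &str| state.get(name).copied().unwrap_or(false);
    on("neon") && on("d16")
}

fn main() {
    // Feature string copied from the function attributes in the IR dump.
    let from_ir = "+v7,+thumb-mode,+thumb2,+vfp3,+d16,-neon,+neon";
    if neon_with_d16(from_ir) {
        eprintln!("warning: neon is enabled together with d16; consider -d16");
    }
    // Adding -d16, as suggested earlier in the thread, clears the condition.
    assert!(!neon_with_d16("+v7,+vfp3,+d16,-neon,+neon,-d16"));
}
```

A real diagnostic would need to live in the compiler and account for features implied by the target CPU, which this sketch ignores.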

@clintfred
Contributor

clintfred commented Mar 26, 2020

This error went away for me this week, but I'm not sure why. I did apply some system updates (Kubuntu 19.10), so perhaps clang fixed this upstream? I am able to use -C lto inside of cross without error.

cross rustc --target i686-linux-android --release -p ironoxide-android -- -C lto now works and was giving me "ran out of registers" earlier.

$ apt search clang | grep install

clang/eoan,now 1:9.0-49~exp1 amd64 [installed]
clang-9/eoan,now 1:9-2 amd64 [installed,automatic]
libclang-common-9-dev/eoan,now 1:9-2 amd64 [installed,automatic]
libclang-cpp9/eoan,now 1:9-2 amd64 [installed,automatic]
libclang1-9/eoan,now 1:9-2 amd64 [installed,automatic]

Edit: We do depend on reqwest/hyper

@workingjubilee workingjubilee added the A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. label Mar 3, 2023
@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023

10 participants