WarcParser: Improve compatibility with ARC variants
This makes it so we can read the warcio example.arc file and the example in the ARC file format reference.

* Ignore up to 3 spurious linefeeds at the start of ARC records.
* Accept ARC records with the trailing linefeed missing.
* Accept (but currently ignore) the extra URL-record-v2 fields.
* Accept "0" in the ARC IP address field.
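The leniency rules listed above can be illustrated with a small standalone sketch. This is not the jwarc parser itself: the class `ArcLeniencySketch` and its method names are hypothetical, and the field handling assumes the record length is the last space-separated field of the header line, as in the v1 and v2 URL-record layouts of the ARC file format reference.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; the real WarcParser is structured differently.
class ArcLeniencySketch {
    // The commit message states that up to 3 spurious linefeeds are ignored.
    static final int MAX_SPURIOUS_LINEFEEDS = 3;

    /** Advances past at most three leading '\n' bytes and returns how many were skipped. */
    static int skipLeadingLinefeeds(ByteBuffer buffer) {
        int skipped = 0;
        while (skipped < MAX_SPURIOUS_LINEFEEDS
                && buffer.hasRemaining()
                && buffer.get(buffer.position()) == '\n') {
            buffer.position(buffer.position() + 1);
            skipped++;
        }
        return skipped;
    }

    /**
     * Splits an ARC URL-record header line and returns
     * {url, ip, date, contentType, length}. Any extra v2 fields between the
     * content type and the length are accepted but ignored.
     */
    static String[] parseUrlRecord(String headerLine) {
        String[] fields = headerLine.trim().split(" ");
        if (fields.length < 5) {
            throw new IllegalArgumentException("Too few ARC header fields: " + fields.length);
        }
        String length = fields[fields.length - 1];
        Long.parseLong(length); // fail early if the length is not numeric
        // The IP field is passed through unvalidated, so a placeholder "0" is accepted.
        return new String[] {fields[0], fields[1], fields[2], fields[3], length};
    }
}
```

With this sketch, a v1 line such as `http://example.org/ 0 20240209000000 text/html 42` and a ten-field v2 line both yield the same five values. The missing-trailing-linefeed case is not covered here.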

Fixes #82
ato committed Feb 9, 2024
1 parent efcb28f commit c4e3ab7
Showing 5 changed files with 411 additions and 172 deletions.
2 changes: 1 addition & 1 deletion src/org/netpreserve/jwarc/LengthedBody.java
@@ -142,7 +142,7 @@ public synchronized void consume() throws IOException {
             position += buffer.remaining();
             buffer.clear();
             if (channel.read(buffer) < 0) {
-                throw new EOFException();
+                throw new EOFException("Expected to read " + (size - position) + " more bytes");
             }
             buffer.flip();
         }
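
The effect of the new exception message can be seen in a simplified, standalone sketch. This is not the `LengthedBody` class: `EofMessageSketch` and its `consume(channel, size)` signature are hypothetical, and only the message format is taken from the diff above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical stand-in for a length-delimited body reader.
class EofMessageSketch {
    /** Reads and discards exactly size bytes, or fails with a descriptive EOFException. */
    static void consume(ReadableByteChannel channel, long size) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        long position = 0;
        while (position < size) {
            buffer.clear();
            // Never read past the declared end of the body.
            buffer.limit((int) Math.min(buffer.capacity(), size - position));
            int n = channel.read(buffer);
            if (n < 0) {
                // Same message format as the change in this commit.
                throw new EOFException("Expected to read " + (size - position) + " more bytes");
            }
            position += n;
        }
    }
}
```

When a body declares 10 bytes but the channel ends after 4, the exception reports how many bytes were still outstanding, which makes truncated records easier to diagnose than a bare `EOFException`.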
