Improved seekable format ingestion speed for small frame size
As reported by @P-E-Meunier in #2662 (comment),
seekable format ingestion speed can be particularly slow
when the selected `FRAME_SIZE` is very small,
especially in combination with the recent row_hash compression mode.
The specific scenario mentioned was `pijul`,
using frame sizes of 256 bytes and level 10.

This is improved in this PR,
by providing approximate parameter adaptation to the compression process.

Tested locally on an M1 laptop,
ingestion of `enwik8` using `pijul` parameters
went from 35 sec (before this PR) to 2.5 sec (with this PR).
For the specific corner case of a file full of zeroes,
this is even more pronounced, going from 45 sec to 0.5 sec.

These benefits are unrelated to (and come on top of) other improvement efforts currently being made by @yoniko for the row_hash compression method specifically.

The `seekable_compress` test program has been updated to allow setting the compression level,
in order to produce these performance results.
Cyan4973 committed Mar 10, 2023
1 parent d55a648 commit 1df9f36
Showing 7 changed files with 18 additions and 13 deletions.
2 changes: 1 addition & 1 deletion contrib/seekable_format/examples/parallel_compression.c
@@ -25,7 +25,7 @@

 #include "pool.h" // use zstd thread pool for demo

-#include "zstd_seekable.h"
+#include "../zstd_seekable.h"

 static void* malloc_orDie(size_t size)
 {
2 changes: 1 addition & 1 deletion contrib/seekable_format/examples/parallel_processing.c
@@ -29,7 +29,7 @@

 #include "pool.h" // use zstd thread pool for demo

-#include "zstd_seekable.h"
+#include "../zstd_seekable.h"

 #define MIN(a, b) ((a) < (b) ? (a) : (b))

17 changes: 10 additions & 7 deletions contrib/seekable_format/examples/seekable_compression.c
@@ -13,7 +13,7 @@
 #define ZSTD_STATIC_LINKING_ONLY
 #include <zstd.h> // presumes zstd library is installed

-#include "zstd_seekable.h"
+#include "../zstd_seekable.h"

 static void* malloc_orDie(size_t size)
 {
@@ -112,20 +112,23 @@ static char* createOutFilename_orDie(const char* filename)
     return (char*)outSpace;
 }

-int main(int argc, const char** argv) {
+#define CLEVEL_DEFAULT 5
+int main(int argc, const char** argv)
+{
     const char* const exeName = argv[0];
-    if (argc!=3) {
-        printf("wrong arguments\n");
-        printf("usage:\n");
-        printf("%s FILE FRAME_SIZE\n", exeName);
+    if (argc<3 || argc>4) {
+        printf("wrong arguments \n");
+        printf("usage: \n");
+        printf("%s FILE FRAME_SIZE [LEVEL] \n", exeName);
         return 1;
     }

     {   const char* const inFileName = argv[1];
         unsigned const frameSize = (unsigned)atoi(argv[2]);
+        int const cLevel = (argc==4) ? atoi(argv[3]) : CLEVEL_DEFAULT;

         char* const outFileName = createOutFilename_orDie(inFileName);
-        compressFile_orDie(inFileName, outFileName, 5, frameSize);
+        compressFile_orDie(inFileName, outFileName, cLevel, frameSize);
         free(outFileName);
     }

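The updated `main` reads `FRAME_SIZE` and the optional `LEVEL` with `atoi`, which returns 0 for non-numeric input and gives the caller no way to detect the error. A stricter variant could use `strtol` instead. The following is only a sketch: the helper name `parseInt_orDefault` is hypothetical and is not part of `seekable_compression.c`.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define CLEVEL_DEFAULT 5

/* Hypothetical helper, not part of seekable_compression.c.
 * Parses `arg` as a base-10 int. Returns `fallback` when `arg` is NULL,
 * empty, not fully numeric, or out of range for an int. */
static int parseInt_orDefault(const char* arg, int fallback)
{
    char* end = NULL;
    long value;
    if (arg == NULL || *arg == '\0') return fallback;
    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno == ERANGE || *end != '\0') return fallback;   /* overflow or trailing junk */
    if (value < INT_MIN || value > INT_MAX) return fallback; /* does not fit in an int */
    return (int)value;
}
```

In `main`, the level could then be read as `(argc==4) ? parseInt_orDefault(argv[3], CLEVEL_DEFAULT) : CLEVEL_DEFAULT`, so that a malformed `LEVEL` falls back to the default rather than silently becoming level 0.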
2 changes: 1 addition & 1 deletion contrib/seekable_format/examples/seekable_decompression.c
@@ -16,7 +16,7 @@
 #include <zstd.h> // presumes zstd library is installed
 #include <zstd_errors.h>

-#include "zstd_seekable.h"
+#include "../zstd_seekable.h"

 #define MIN(a, b) ((a) < (b) ? (a) : (b))

2 changes: 1 addition & 1 deletion contrib/seekable_format/tests/seekable_tests.c
@@ -4,7 +4,7 @@
 #include <stdio.h>
 #include <assert.h>

-#include "zstd_seekable.h"
+#include "../zstd_seekable.h"

 /* Basic unit tests for zstd seekable format */
 int main(int argc, const char** argv)
4 changes: 2 additions & 2 deletions contrib/seekable_format/zstd_seekable.h
@@ -15,8 +15,8 @@ extern "C" {

 #define ZSTD_SEEKABLE_MAXFRAMES 0x8000000U

-/* Limit the maximum size to avoid any potential issues storing the compressed size */
-#define ZSTD_SEEKABLE_MAX_FRAME_DECOMPRESSED_SIZE 0x80000000U
+/* Limit maximum size to avoid potential issues storing the compressed size */
+#define ZSTD_SEEKABLE_MAX_FRAME_DECOMPRESSED_SIZE 0x40000000U

 /*-****************************************************************************
 * Seekable Format
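The lowered limit `0x40000000U` (1 GiB) fits in a signed 32-bit `int`, while the previous `0x80000000U` (2 GiB) is one more than `INT_MAX` on platforms where `int` is 32 bits. This is relevant to the same commit, because `zstdseek_compress.c` now casts `maxFrameSize` to `int` when setting `ZSTD_c_srcSizeHint`. Whether that cast motivated the change is an assumption; the commit message does not say. A small check, using the hypothetical helper `fitsInInt` and locally named copies of the two values, illustrates the difference:

```c
#include <limits.h>

#define OLD_MAX_FRAME_DECOMPRESSED_SIZE 0x80000000U  /* value before this commit */
#define NEW_MAX_FRAME_DECOMPRESSED_SIZE 0x40000000U  /* value after this commit */

/* Hypothetical helper: returns 1 if `size` can be cast to int without overflow. */
static int fitsInInt(unsigned size)
{
    return size <= (unsigned)INT_MAX;
}
```

With a 32-bit `int`, `fitsInInt(NEW_MAX_FRAME_DECOMPRESSED_SIZE)` is 1 and `fitsInInt(OLD_MAX_FRAME_DECOMPRESSED_SIZE)` is 0.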
2 changes: 2 additions & 0 deletions contrib/seekable_format/zstdseek_compress.c
@@ -230,6 +230,8 @@ size_t ZSTD_seekable_compressStream(ZSTD_seekable_CStream* zcs, ZSTD_outBuffer*
     const BYTE* const inBase = (const BYTE*) input->src + input->pos;
     size_t inLen = input->size - input->pos;

+    assert(zcs->maxFrameSize < INT_MAX);
+    ZSTD_CCtx_setParameter(zcs->cstream, ZSTD_c_srcSizeHint, (int)zcs->maxFrameSize);
     inLen = MIN(inLen, (size_t)(zcs->maxFrameSize - zcs->frameDSize));

     /* if we haven't finished flushing the last frame, don't start writing a new one */
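The `MIN` clamp in this hunk caps how much input a single call may add to the current frame, so a stream is cut into frames of at most `maxFrameSize` bytes. The sketch below mirrors that arithmetic outside the library to count the frames a given input produces. `FrameState` and `countFrames` are hypothetical names, and the loop assumes each frame is filled completely before the next one starts, which simplifies what the real streaming code does.

```c
#include <stddef.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the two fields used by the clamp. */
typedef struct {
    unsigned maxFrameSize;  /* frame size chosen by the caller */
    unsigned frameDSize;    /* decompressed bytes already in the current frame */
} FrameState;

/* Counts how many frames `totalSize` input bytes are split into. */
static size_t countFrames(size_t totalSize, unsigned maxFrameSize)
{
    FrameState st = { maxFrameSize, 0 };
    size_t frames = 0;
    if (maxFrameSize == 0) return 0;   /* guard against an endless loop */
    while (totalSize > 0) {
        /* same clamp as in ZSTD_seekable_compressStream() */
        size_t const inLen = MIN(totalSize, (size_t)(st.maxFrameSize - st.frameDSize));
        st.frameDSize += (unsigned)inLen;
        totalSize -= inLen;
        if (st.frameDSize == st.maxFrameSize || totalSize == 0) {
            frames++;            /* frame is full, or input ended */
            st.frameDSize = 0;   /* next frame starts empty */
        }
    }
    return frames;
}
```

With the `pijul` parameters from the commit message, `countFrames(100000000, 256)` gives 390625 frames for the 100,000,000-byte `enwik8`. This suggests why any cost paid once per frame becomes significant at very small frame sizes, and why hinting the expected source size to the compressor can help.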

1 comment on commit 1df9f36

@P-E-Meunier

Another thing, I believe the documentation could mention the trade-offs of small frame sizes in terms of memory use, speed and compression ratio.
