
cdc: add apache arrow parquet library and writer #99288

Merged
3 commits merged on Apr 6, 2023

Conversation

jayshrivastava
Contributor

@jayshrivastava jayshrivastava commented Mar 22, 2023

cdc: add apache arrow parquet library

This commit installs the apache arrow parquet library for Go
at version 11. The release can be found here:
https://github.com/apache/arrow/releases/tag/go%2Fv11.0.0

This library is licensed under the Apache License 2.0.

Informs: #99028
Epic: None
Release note: None


util/parquet: create parquet writer library

This change implements a Writer struct in the new util/parquet package.
This Writer writes datums to the io.Writer sink
using a configurable parquet version (defaults to v2.6).

The package implements several features internally required to write in the parquet format:

  • schema creation
  • row group / column page management
  • encoding/decoding of CRDB datums to parquet datums
    Currently, the writer only supports types found in the TPCC workload, namely INT, DECIMAL, STRING,
    UUID, TIMESTAMP, and BOOL.

This change also adds a benchmark, tests which verify the correctness of the
writer, and test utils for reading datums from parquet files.

Informs: #99028
Epic: None
Release note: None


changefeedccl: add parquet writer

This change adds the file parquet.go, which contains
helper functions for creating parquet writers
and exporting data via cdcevent.Row structs.

This change also adds tests to ensure rows are written
to parquet files correctly.

Epic: None
Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@jayshrivastava jayshrivastava force-pushed the parquet-perf branch 2 times, most recently from e6caf6e to af6cc60 on March 23, 2023 15:56
@jayshrivastava jayshrivastava marked this pull request as ready for review March 23, 2023 17:18
@jayshrivastava jayshrivastava requested review from a team as code owners March 23, 2023 17:18
@jayshrivastava jayshrivastava requested review from samiskin and removed request for a team March 23, 2023 17:18
@miretskiy miretskiy self-requested a review March 23, 2023 17:18
@jayshrivastava jayshrivastava force-pushed the parquet-perf branch 3 times, most recently from a10630a to cffa798 on March 23, 2023 20:45
Contributor

@miretskiy miretskiy left a comment


Lots of comments, but most are nits. I think this is an excellent start.

Reviewed 4 of 5 files at r1, 3 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava and @samiskin)


-- commits line 26 at r2:
is the version mention important?


pkg/ccl/changefeedccl/parquet.go line 31 at r2 (raw file):

// MaxParquetRowGroupSize is the maximal number of rows which can be written out to
// a single row group in a parquet file.
// TODO(jayant): customers may want to adjust the row group size based on rows instead of bytes

Meh... it's a bit of math... I don't know if you need this TODO.


pkg/ccl/changefeedccl/parquet.go line 34 at r2 (raw file):

var MaxParquetRowGroupSize = settings.RegisterIntSetting(
	settings.TenantWritable,
	"changefeed.format.parquet.max_row_group_size",

Do we want this to be a cluster setting? Customers already specify the format=parquet option... we can just have
a format-specific option rows_per_group or some such.


pkg/ccl/changefeedccl/parquet.go line 75 at r2 (raw file):

// files. It should not be required for reading files since all the necessary
// metadata to interpret columns will be written out to the file and read by a
// reader.

❤️ this comment.


pkg/ccl/changefeedccl/parquet.go line 94 at r2 (raw file):

func newParquetSchema(row cdcevent.Row) (*parquetSchemaDefinition, error) {
	fields := make([]schema.Node, 0)
	cols := make([]parquetColumn, 0)

you can probably allocate both of these as

cols := make([]parquetColumn, 0, len(row.ResultColumns()))

pkg/ccl/changefeedccl/parquet.go line 125 at r2 (raw file):

	})
	groupNode, err := schema.NewGroupNode("schema", parquet.Repetitions.Required,
		fields, defaultParquetSchemaFieldID)

you probably don't need to build up fields as you're building up columns, do you?
You could just iterate columns and extract node.
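Something like this, roughly (a sketch, assuming parquetColumn keeps its node field):

fields := make([]schema.Node, len(cols))
for i := range cols {
	fields[i] = cols[i].node
}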


pkg/ccl/changefeedccl/parquet.go line 161 at r2 (raw file):

	}
	opts := []parquet.WriterProperty{parquet.WithCreatedBy("cockroachdb"),
		parquet.WithMaxRowGroupLength(1)}

I know we have discussed replacing the existing parquet encoding library used for export some time in the future.
And, I certainly do not expect this replacement to happen as part of this PR. I do think, though, that we can make
this future replacement much easier by introducing a standalone parquet encoder/writer package that does not
have any dependency on CDC code.

There are multiple benefits to this approach:

  1. As mentioned above, it's easy to swap out the export implementation.
  2. Better-defined API -- the parquet library deals with taking cockroach-specific types (tree.Datum) and producing parquet files.
  3. Better testing -- it's just a lot easier to test encoding/decoding by throwing a bunch of tree.Datums at it and seeing what happens.

The final benefit is that I don't think it's such a tall order -- the code as it exists right now is already pretty clean and nice.
With that in mind, here are my concrete suggestions:

  • Create a parquet directory where the parquet writer and related stuff will live (under changefeedccl for now). This entire package will eventually be moved
    under the sql or util package.

  • Just like the underlying arrow library supports options (parquet.WithMaxRowGroupLength), your NewParquetWriter should also take options:

type config struct {
	maxRowGroupLength int
}

type Option interface {
	apply(*config)
}

Note: by moving this stuff under parquet package, you can clean up naming a lot -- by removing parquet prefixes on many variables.

  • You will need to define a small Schema struct which just has a method Add(name, *types.T) to add columns of a specific type
    (the newParquetSchema will of course use this Schema object to convert cdcevent.Row -> Schema).
    The writer constructor will then take the Schema; a rough sketch follows this list.
  • Similar change to the AddData method -- we don't want to take cdcevent.Row -- either take tree.Datums or even EncDatumRow, or define an iterator-like interface.
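Rough sketch of that Schema idea (names approximate; makeParquetColumn here is a hypothetical helper that maps a name and CRDB type to a parquet schema node):

type Schema struct {
	fields []schema.Node
}

func (s *Schema) Add(name string, typ *types.T) error {
	// makeParquetColumn is hypothetical -- today newParquetColumn takes a
	// cdcevent.ResultColumn, which is exactly the dependency we want to drop.
	col, err := makeParquetColumn(name, typ)
	if err != nil {
		return err
	}
	s.fields = append(s.fields, col.node)
	return nil
}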

This file -- i.e. changefeed specific wrappers for parquet will still remain because we do want to AddRow, and add event type stuff, etc. But that's a higher level package
that will be used by changefeeds.


pkg/ccl/changefeedccl/parquet.go line 171 at r2 (raw file):

}

// nonNilDefLevel represents a def level of 1, meaning that the value is non-nil.

would be nice to add a bit more context as to what those def levels are.
Perhaps link https://github.com/apache/parquet-format/blob/master/README.md#nested-encoding
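For example (a sketch; for a non-repeated optional column the max def level is 1, so 1 = value present, 0 = NULL; w here would be an Int64ColumnChunkWriter):

// Writing the logical column [7, NULL, 8]: values holds only the
// non-NULL entries, while defLevels has one entry per logical value.
values := []int64{7, 8}
defLevels := []int16{1, 0, 1}
_, err := w.WriteBatch(values, defLevels, nil /* repLevels: nil since the column is not repeated */)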


pkg/ccl/changefeedccl/parquet.go line 178 at r2 (raw file):

// writeColBatch writes a value to the provided column chunk writer.
func writeColBatch(colWriter file.ColumnChunkWriter, value interface{}) (int64, error) {

what's the int result? number of items written?


pkg/ccl/changefeedccl/parquet.go line 191 at r2 (raw file):

		return w.WriteBatch([]parquet.FixedLenByteArray{value.(parquet.FixedLenByteArray)}, nonNilDefLevel, nil)
	default:
		panic("unimplemented")

let's return an assertion failed error instead of panicking.
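e.g. (sketch):

default:
	return 0, errors.AssertionFailedf("unimplemented value type %T", value)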


pkg/ccl/changefeedccl/parquet.go line 208 at r2 (raw file):

		return w.WriteBatch([]parquet.FixedLenByteArray{}, nilDefLevel, nil)
	default:
		panic("unimplemented")

s/panic/error/


pkg/ccl/changefeedccl/parquet.go line 238 at r2 (raw file):

// AddData writes the updatedRow. There is no guarantee that the row will
// immediately be flushed to the output sink.
func (w *ParquetWriter) AddData(updatedRow cdcevent.Row, prevRow cdcevent.Row) error {

would it be better to pass in event type instead of the full prevRow?


pkg/ccl/changefeedccl/parquet.go line 270 at r2 (raw file):

func getEventTypeDatum(updatedRow cdcevent.Row, prevRow cdcevent.Row) tree.Datum {
	eventTypeDatum := tree.NewDString(parquetEventInsert)

you probably want to define insert/update dstrings as variables here to avoid creating new dstrings for those.
Also, see setupContextForRow in expr_eval where cdc expression computes op type.
It's okay to repeat the code, but maybe there is a way not to... More importantly, see that function regarding
differentiating between update and insert -- which requires withDiff option.
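e.g. (sketch; parquetEventUpdate is assumed to exist alongside parquetEventInsert):

var (
	// Allocated once, instead of on every call to getEventTypeDatum.
	parquetEventInsertDatum = tree.NewDString(parquetEventInsert)
	parquetEventUpdateDatum = tree.NewDString(parquetEventUpdate)
)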


pkg/ccl/changefeedccl/parquet.go line 291 at r2 (raw file):

// By default, a column's repetitions are set to parquet.Repetitions.Optional,
// which means that the column is nullable.
func newParquetColumn(column cdcevent.ResultColumn) (parquetColumn, error) {

nit: perhaps makeParquetColumn is a better name (usually new returns *)


pkg/ccl/changefeedccl/parquet.go line 310 at r2 (raw file):

	case types.BoolFamily:
		result.node = schema.NewBooleanNode(colName, defaultRepetitions, defaultParquetSchemaFieldID)
		result.node.LogicalType()

is this a no-op call?


pkg/ccl/changefeedccl/parquet.go line 394 at r2 (raw file):

var parquetBoolEncoder parquetEncodeFn = func(d tree.Datum) (interface{}, error) {
	return bool(*d.(*tree.DBool)), nil

let's be a lot more verbose. Let's make sure we return an error if d is not DBool.
When we AddData, we are adding tree.Datums -- those are interfaces, so it's quite possible to make a mistake
(that's why, say, EncDatumRow has EnsureDecoded to guarantee that an encoded datum can be decoded as the target type).

Same for all of the encoders below.

(Another option: if this is only intended to be used in tests... you could move these to a test function... and then
keep those type assertions. But I do think having encoder/decoder support is good.)
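For example (sketch):

var parquetBoolEncoder parquetEncodeFn = func(d tree.Datum) (interface{}, error) {
	db, ok := d.(*tree.DBool)
	if !ok {
		return nil, errors.AssertionFailedf("expected *tree.DBool, got %T", d)
	}
	return bool(*db), nil
}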


pkg/ccl/changefeedccl/parquet_test.go line 122 at r2 (raw file):

			},
		},
	} {

nice, high level tests. I think if the underlying parquet writer only concerned itself with datum -> parquet conversion, then
a lower-level test, plus a lower-level benchmark, could be written using randgen.RandDatum to generate random datums for testing.

@jayshrivastava jayshrivastava requested a review from a team as a code owner March 27, 2023 21:13
@jayshrivastava jayshrivastava force-pushed the parquet-perf branch 14 times, most recently from 1d47bf5 to 9831f4d on March 28, 2023 20:28
Contributor

@miretskiy miretskiy left a comment


Reviewed 17 of 22 files at r4, 1 of 1 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava and @samiskin)


pkg/util/parquet/writer.go line 23 at r5 (raw file):

// Config stores configurable options for the Writer.
type Config struct {

config does not need to be exported.
External callers should use WithXXX to construct config.


pkg/util/parquet/writer.go line 35 at r5 (raw file):

}

type option interface {

since option will be used by external callers, please export it.
apply does not need to be exported though.


pkg/util/parquet/writer.go line 56 at r5 (raw file):

type WithVersion string

func (v WithVersion) apply(c *Config) error {

it's an interesting way to define options... I guess it's fine, though it's quite common to have something like:

type funcOpt func(c *Config) error

func (f funcOpt) apply(c *Config) error {
	return f(c)
}

func WithVersion(v string) option {
	return funcOpt(func(c *Config) error {
		...
	})
}

pkg/util/parquet/writer.go line 156 at r5 (raw file):

		w.currentRowGroupWriter = w.writer.AppendBufferedRowGroup()
		w.currentRowGroupSize = 0
	}

I was thinking about how to better integrate this AddData method with the existing iterator in cdcevent.
One way is to... change the cdcevent iterator somehow...

Here is another idea: instead of AddData(), we do something like:

type Row struct {
	w *Writer
}

func (r Row) SetCol(idx int, d tree.Datum) error {
	// write d to column idx...
	return nil
}

func (r Row) Close() error {
	r.w.currentRowGroupSize += 1
	if err := r.w.currentRowGroupWriter.Close(); err != nil {
		return err
	}
	r.w.currentRowGroupWriter = r.w.writer.AppendBufferedRowGroup()
	r.w.currentRowGroupSize = 0
	return nil
}

func (w *Writer) AddRow() Row {
	if w.currentRowGroupWriter == nil {
		w.currentRowGroupWriter = w.writer.AppendBufferedRowGroup()
	}
	return Row{w: w}
}

pkg/util/parquet/writer_bench_test.go line 16 at r4 (raw file):

Previously, jayshrivastava (Jayant) wrote…

Here's the benchmark output.

BenchmarkParquetWriter-10        3896122              3112 ns/op       2958 B/op       48 allocs/op

Per our discussion, you have a plan for how to reduce allocs:
use an unsafe string conversion to get bytes from the datum, and use an idea similar to
tree.DatumAlloc to remove allocations for single-element batch arrays.
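A sketch of the DatumAlloc-like idea (field names illustrative): keep single-element arrays on the writer and slice into them, instead of allocating a fresh one-element slice per WriteBatch call.

type batchAlloc struct {
	boolBatch  [1]bool
	int64Batch [1]int64
}

// Writing a bool datum d then becomes:
a.boolBatch[0] = bool(*d.(*tree.DBool))
_, err := w.WriteBatch(a.boolBatch[:], nonNilDefLevel, nil)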


pkg/util/parquet/writer_bench_test.go line 35 at r5 (raw file):

	for i := 0; i < numCols; i++ {
		sch.columnTypes[i] = types.String
		sch.columnNames[i] = fmt.Sprintf("col%d", i)

Do we want to test different types? I suspect we do, and I also suspect this isn't done yet because we don't support all types.
Perhaps allocate a few of the types you already support. Or... leave a TODO.

@jayshrivastava jayshrivastava force-pushed the parquet-perf branch 2 times, most recently from 716083d to 4ab442b on April 5, 2023 19:46
Contributor Author

@jayshrivastava jayshrivastava left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @samiskin)


pkg/util/parquet/writer.go line 23 at r5 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

config does not need to be exported.
External callers should use WithXXX to construct config.

Done.


pkg/util/parquet/writer.go line 35 at r5 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

since option will be used by external callers, please export it.
apply does not need to be exported though.

Done.


pkg/util/parquet/writer.go line 56 at r5 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

it's an interesting way to define options... I guess it's fine, though it's quite common to have something like:

type funcOpt func(c *Config) error

func (f funcOpt) apply(c *Config) error {
	return f(c)
}

func WithVersion(v string) option {
	return funcOpt(func(c *Config) error {
		...
	})
}

Done.


pkg/util/parquet/writer.go line 156 at r5 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

I was thinking about how to better integrate this AddData method with the existing iterator in cdcevent.
One way is to .. change cdcevent iterator somehow...

Here is another idea: instead of AddData(), we do something like:

type Row struct {
	w *Writer
}

func (r Row) SetCol(idx int, d tree.Datum) error {
	// write d to column idx...
	return nil
}

func (r Row) Close() error {
	r.w.currentRowGroupSize += 1
	if err := r.w.currentRowGroupWriter.Close(); err != nil {
		return err
	}
	r.w.currentRowGroupWriter = r.w.writer.AppendBufferedRowGroup()
	r.w.currentRowGroupSize = 0
	return nil
}

func (w *Writer) AddRow() Row {
	if w.currentRowGroupWriter == nil {
		w.currentRowGroupWriter = w.writer.AppendBufferedRowGroup()
	}
	return Row{w: w}
}

Done. I added many assertions and tests that break them. I ended up using a []uint64 bitmap because resetting the util.FastIntMap would require reallocating it... I think. Using a []uint64 instead of a uint64 lets us support more than 64 cols.


pkg/util/parquet/writer_bench_test.go line 16 at r4 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Per our discussion, you have a plan for how to reduce allocs:
use an unsafe string conversion to get bytes from the datum, and use an idea similar to
tree.DatumAlloc to remove allocations for single-element batch arrays.

Done.


pkg/util/parquet/writer_bench_test.go line 35 at r5 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Do we want to test different types? I suspect we do, and I also suspect this isn't done yet because we don't support all types.
Perhaps allocate a few of the types you already support. Or... leave a TODO.

Added a TODO.

@jayshrivastava jayshrivastava force-pushed the parquet-perf branch 3 times, most recently from bd19035 to b824800 on April 5, 2023 21:27
Contributor

@miretskiy miretskiy left a comment


Reviewed 1 of 9 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava and @samiskin)


pkg/ccl/changefeedccl/parquet.go line 54 at r7 (raw file):

func AddData(
	writer *parquet.Writer, updatedRow cdcevent.Row, prevRow cdcevent.Row, returnDatums bool,
) ([]tree.Datum, error) {

I only see returnDatums parameter being used in the tests.
I'd rather not. It's okay if in the test you have a helper that iterates the row again to extract datums.


pkg/util/json/parser.go line 335 at r7 (raw file):

// (i.e. jsonString([]byte)).
// See https://groups.google.com/g/golang-nuts/c/Zsfk-VMd_fU/m/O1ru4fO-BgAJ
func UnsafeGetBytes(s string) ([]byte, error) {

Let's not export this. It's fine to copy this utility method into your library.
While doing this, we should add a TODO to replace it with a less unsafe version once we switch to go 1.20.
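Something like this, once we're on go 1.20 (sketch):

// unsafeGetBytes returns the underlying bytes of s without copying.
// The result must not be mutated.
func unsafeGetBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}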


pkg/util/parquet/writer.go line 97 at r7 (raw file):

// compression schemes, allocator, batch size, page size etc
func NewWriter(sch *SchemaDefinition, sink io.Writer, opts ...Option) (*Writer, error) {
	cfg := newConfig()

I would inline it; and I don't think you need to make a "new config" Just regular, stack allocated value is fine
(you can pass opt.apply(&cfg) below)


pkg/util/parquet/writer.go line 154 at r7 (raw file):

func (r *RowWriter) writeIdx(idx int) {
	r.colIdxMap[idx/64] = r.colIdxMap[idx/64] | (1 << (idx % 64))

r.colIdxMap[idx>>6] |= 1 << (idx & 63)  // idx>>6 is idx/64; idx&63 is idx%64


pkg/util/parquet/writer.go line 170 at r7 (raw file):

	}
	if r.idxIsWritten(idx) {
		return errors.AssertionFailedf("previously wrote datum to row at idx %d", idx)

I don't know if this is worth it... I mean, if the only reason we keep this bitmap is to tell
caller not to override the same column multiple times -- I'd say let the caller go for it.
Why not?
If you don't set anything, or you only set some columns, you wind up writing tree.DNull...
It's one thing to check bounds, and/or types. But I feel like this is a bit too much. And, there could be an argument
that if the caller knows that it only has 1 column, why force it to set more columns?


pkg/util/parquet/writer.go line 200 at r7 (raw file):

// returned RowWriter must be used to write all datums in the row before
// the next call to AddData.
func (w *Writer) AddData() (*RowWriter, error) {

rename to AddRow() perhaps?


pkg/util/parquet/write_functions.go line 34 at r7 (raw file):

//
// This means any array below will not be in use outside the writeBatch
// function below.

very nice.

Contributor Author

@jayshrivastava jayshrivastava left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @samiskin)


pkg/ccl/changefeedccl/parquet.go line 54 at r7 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

I only see returnDatums parameter being used in the tests.
I'd rather not. It's okay if in the test you have a helper that iterates the row again to extract datums.

Done.


pkg/ccl/changefeedccl/parquet.go line 74 at r8 (raw file):

// populateDatums writes the appropriate datums into the datumAlloc slice.
func populateDatums(updatedRow cdcevent.Row, prevRow cdcevent.Row, datumAlloc []tree.Datum) error {

This is its own function so that tests can use this production code rather than copying it.


pkg/ccl/changefeedccl/parquet.go line 75 at r8 (raw file):

// populateDatums writes the appropriate datums into the datumAlloc slice.
func populateDatums(updatedRow cdcevent.Row, prevRow cdcevent.Row, datumAlloc []tree.Datum) error {
	datums := datumAlloc[:0]

I prefer this only because I dislike using an idx when you have an iterator, like so:

idx := 0
if err := updatedRow.ForEachColumn().Datum(func(d tree.Datum, _ cdcevent.ResultColumn) error {
	datums[idx] = d
	idx += 1
	return nil
}); err != nil {
	...
}

datums[idx] = eventType

pkg/util/parquet/writer.go line 97 at r7 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

I would inline it; and I don't think you need a newConfig helper. A regular, stack-allocated value is fine
(you can pass opt.apply(&cfg) below).

Done.


pkg/util/parquet/writer.go line 154 at r7 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

r.colIdxMap[idx>>6] |= 1 << (idx & 63)  // idx>>6 is idx/64; idx&63 is idx%64

Done (and deleted)


pkg/util/parquet/writer.go line 170 at r7 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

I don't know if this is worth it... I mean, if the only reason we keep this bitmap is to tell
caller not to override the same column multiple times -- I'd say let the caller go for it.
Why not?
If you don't set anything, or you only set some columns, you wind up writing tree.DNull...
It's one thing to check bounds, and/or types. But I feel like this is a bit too much. And, there could be an argument
that if the caller knows that it only has 1 column, why force it to set more columns?

Per our discussion, we will now pass an []tree.Datum


pkg/util/parquet/writer.go line 200 at r7 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

rename to AddRow() perhaps?

Done.

Code quote:

AddData

pkg/util/json/parser.go line 335 at r7 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Let's not export this. It's fine to copy this utility method into your library.
While doing this, we should add a TODO to replace it with a less unsafe version once we switch to go 1.20.

Done.

Contributor

@miretskiy miretskiy left a comment


Reviewed 2 of 16 files at r3, 1 of 22 files at r4, 1 of 9 files at r7, 7 of 8 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava and @samiskin)


pkg/ccl/changefeedccl/parquet.go line 21 at r8 (raw file):

type parquetWriter struct {
	inner      *parquet.Writer

you could just embed *parquet.Writer if you want (so that you don't need to reference inner)
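i.e. (sketch):

type parquetWriter struct {
	*parquet.Writer
	datumAlloc []tree.Datum
}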


pkg/ccl/changefeedccl/parquet.go line 65 at r8 (raw file):

	}

	return w.inner.AddRow(w.datumAlloc)

perfect.


pkg/sql/sem/tree/datum.go line 1043 at r8 (raw file):

// AsDDecimal attempts to retrieve a DDecimal from an Expr, returning a DDecimal and
// a flag signifying whether the assertion was successful.
func AsDDecimal(e Expr) (*DDecimal, bool) {

surprised it wasn't there before.


pkg/util/parquet/write_functions.go line 123 at r8 (raw file):

		return []byte{}, nil
	}
	const maxStrLen = 1 << 30 // Really, can't see us supporting input JSONs that big.

s/JSONs/string/

Contributor Author

@jayshrivastava jayshrivastava left a comment


Thanks for the reviews and for teaching me your ways 🙏

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @samiskin)

@jayshrivastava
Contributor Author

bors r=miretskiy

@jayshrivastava
Contributor Author

bors r-

@craig
Contributor

craig bot commented Apr 6, 2023

Canceled.

@jayshrivastava
Contributor Author

bors r=miretskiy

@craig
Contributor

craig bot commented Apr 6, 2023

Build succeeded.
