Batch Ingestion Job rewritten on Spark #1020

Merged 15 commits on Oct 6, 2020

pyalex
Collaborator

@pyalex pyalex commented Sep 29, 2020

As part of simplifying the ingestion flow, this PR proposes creating a separate job for each FeatureSet and for each mode (Batch / Streaming).

The Spark implementation of Batch Ingestion has the following flow:

  1. Read from BQ or File (parquet)
  2. Map source columns
  3. Validate
  4. Write to Redis / deadletter files
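
The four steps above can be sketched in plain Scala on in-memory rows. This is only an illustration of the map, validate, and route stages, not the code in this PR: the `Row` alias, the column mapping, and the "required fields are non-null" validation rule are all assumptions, and the real job would apply the same idea to a Spark DataFrame.

```scala
// Plain-Scala sketch of the map -> validate -> route steps.
// A Row is just a Map here; it stands in for a Spark DataFrame row.
object IngestionFlowSketch {
  type Row = Map[String, Any]

  // Step 2: rename source columns to feature names (hypothetical mapping).
  def mapColumns(row: Row, mapping: Map[String, String]): Row =
    row.map { case (column, value) => mapping.getOrElse(column, column) -> value }

  // Step 3: assumed rule: a row is valid when every required field is present and non-null.
  def isValid(row: Row, required: Set[String]): Boolean =
    required.forall(field => row.get(field).exists(_ != null))

  // Step 4: split rows into (valid rows for the store, rejected rows for the deadletter sink).
  def route(
      rows: Seq[Row],
      mapping: Map[String, String],
      required: Set[String]
  ): (Seq[Row], Seq[Row]) =
    rows.map(mapColumns(_, mapping)).partition(isValid(_, required))
}
```

Calling `IngestionFlowSketch.route(rows, mapping, required)` returns the valid rows first and the deadletter rows second, mirroring the Redis / deadletter split in step 4.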

What's not implemented (in comparison to Beam job):

  1. FeatureSet update (not needed: since there is only one FeatureSet per job, we can just restart the job)
  2. Feature values metrics / In-flight metrics
  3. BQ sink (support for ingesting into BQ is being deprecated)
  4. Redis TTL

The new Spark jobs use Spark's standard metrics reporting. However, I had to fork the StatsD reporter to support metrics with tags, which are an extension of the standard protocol (see https://github.com/prometheus/statsd_exporter#tagging-extensions).
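
To illustrate what a tagged metric looks like on the wire, here is a minimal sketch that formats a gauge in the DogStatsD-style layout, one of the tagging extensions listed at the link above. Which tag format the forked reporter actually emits is not stated here, so the format choice, the `TaggedStatsD` object, and the metric and tag names are assumptions.

```scala
// Sketch: format a gauge with tags as name:value|g|#k1:v1,k2:v2 (DogStatsD-style).
// Untagged metrics fall back to the plain StatsD form name:value|g.
object TaggedStatsD {
  def gauge(name: String, value: Long, tags: Seq[(String, String)]): String = {
    val base = s"$name:$value|g"
    if (tags.isEmpty) base
    else base + "|#" + tags.map { case (k, v) => s"$k:$v" }.mkString(",")
  }
}
```

For example, `TaggedStatsD.gauge("rows", 42L, Seq("feature_set" -> "driver"))` produces the line `rows:42|g|#feature_set:driver`.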

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes https://github.com/feast-dev/feast/projects/9#card-45746519

Does this PR introduce a user-facing change?:


ingestion-spark/pom.xml (outdated, resolved)
implicit val modesRead: scopt.Read[Modes.Value] = scopt.Read.reads(Modes withName _.capitalize)

val parser = new scopt.OptionParser[IngestionJobConfig]("IngestionJon") {
head("feast.ingestion.IngestionJob", "0.8")
Member

Can the version come from build-info?

Collaborator Author

Not solved yet. Added a TODO.
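
One possible way to avoid the hardcoded version, sketched under the assumption that the job JAR's manifest declares `Implementation-Version` (this may not be what "build-info" refers to above): read it through the standard `java.lang.Package` API and fall back to a placeholder when it is absent. The `JobVersion` object is hypothetical.

```scala
// Sketch: resolve the job version from the JAR manifest instead of a string literal.
object JobVersion {
  // getImplementationVersion returns null when the manifest does not declare a
  // version (for example when running from class files), so both lookups are
  // wrapped in Option and a fallback is used.
  def current(fallback: String = "unknown"): String =
    Option(getClass.getPackage)
      .flatMap(pkg => Option(pkg.getImplementationVersion))
      .getOrElse(fallback)
}
```

The parser header could then pass `JobVersion.current()` in place of the literal `"0.8"`.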

@pyalex pyalex changed the title WIP Offline Ingestion Job rewritten on Spark Batch Ingestion Job rewritten on Spark Oct 5, 2020
@pyalex pyalex added the kind/feature New feature or request label Oct 5, 2020
@pyalex pyalex force-pushed the spark branch 2 times, most recently from 58adb85 to f92c4f0 Compare October 5, 2020 10:05
Signed-off-by: Oleksii Moskalenko <[email protected]>
@pyalex pyalex force-pushed the spark branch 7 times, most recently from 2a6ec88 to f8772d1 Compare October 6, 2020 04:37
Signed-off-by: Oleksii Moskalenko <[email protected]>
@woop
Member

woop commented Oct 6, 2020

/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pyalex, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit e47903f into feast-dev:master Oct 6, 2020
3 participants