
Metrics for Gobblin ETL

Issac Buenrostro edited this page Aug 21, 2015 · 7 revisions

Gobblin ETL comes equipped with instrumentation built on [Gobblin Metrics](Gobblin-Metrics), as well as extension points that make it easy to add further instrumentation.

Operational Metrics

Each construct in a Gobblin ETL run computes metrics about its performance and progress. By default, each metric is tagged with the following tags:

  • jobName: Gobblin-generated name for the job.
  • jobId: Gobblin-generated ID for the job.
  • clusterIdentifier: string identifying the cluster or host where the job ran, obtained from the resource manager, the job tracker, or the host name.
  • taskId: Gobblin-generated ID for the task that generated the metric.
  • construct: type of construct that generated the metric (e.g. extractor, converter).
  • class: specific class of the construct that generated the metric.
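
Because every metric carries these tags, metrics from different jobs or tasks can be told apart and then aggregated. The sketch below is a hypothetical illustration only: it assumes a metrics snapshot has been exported as plain `(name, tags, count)` tuples (not a Gobblin data structure), and the helpers `make_tags` and `total_by`, as well as all tag values, are invented for the example.

```python
from collections import defaultdict


def make_tags(job_id, task_id, construct, cls):
    """Build the default tag set described above (all values are made up)."""
    return {
        "jobName": "ExampleJob",
        "jobId": job_id,
        "clusterIdentifier": "host-a",
        "taskId": task_id,
        "construct": construct,
        "class": cls,
    }


# Hypothetical exported snapshot of (metric name, tags, count) tuples.
SNAPSHOT = [
    ("gobblin.extractor.records.read",
     make_tags("job_1", "task_0", "extractor", "ExampleExtractor"), 120),
    ("gobblin.extractor.records.read",
     make_tags("job_1", "task_1", "extractor", "ExampleExtractor"), 80),
    ("gobblin.extractor.records.read",
     make_tags("job_2", "task_0", "extractor", "ExampleExtractor"), 50),
    ("gobblin.writer.records.written",
     make_tags("job_1", "task_0", "writer", "ExampleWriter"), 118),
]


def total_by(snapshot, metric_name, group_tag, **filters):
    """Sum the counts of one metric, grouped by the value of `group_tag`,
    keeping only entries whose tags match every key=value in `filters`."""
    totals = defaultdict(int)
    for name, tags, count in snapshot:
        if name == metric_name and all(tags.get(k) == v for k, v in filters.items()):
            totals[tags[group_tag]] += count
    return dict(totals)
```

For example, `total_by(SNAPSHOT, "gobblin.extractor.records.read", "jobId")` gives `{"job_1": 200, "job_2": 50}`. Filtering on `jobId` matters when grouping by `taskId`: both made-up jobs have a task named `task_0`, and without the filter their counts would be merged.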

This is the list of operational metrics implemented by default, grouped by construct.

Extractor Metrics

  • gobblin.extractor.records.read: meter for records read.
  • gobblin.extractor.records.failed: meter for records that failed to be read.
  • gobblin.extractor.extract.time: timer for reading records.
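
A meter counts events and tracks how fast they occur, while a timer records how long an operation takes; this is how the two types behave in the Dropwizard (Codahale) Metrics library that Gobblin Metrics builds on. The following minimal, dependency-free sketch mimics those semantics to show how the three extractor metrics relate. `Meter`, `Timer`, `read_record`, and `extract` are hypothetical stand-ins rather than Gobblin classes, and timing each record individually (including failed reads) is an assumption of the sketch.

```python
import time


class Meter:
    """Minimal stand-in for a meter: counts events and reports the mean rate."""

    def __init__(self):
        self.count = 0
        self._start = time.monotonic()

    def mark(self, n=1):
        self.count += n

    def mean_rate(self):
        """Events per second since the meter was created."""
        elapsed = time.monotonic() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


class Timer:
    """Minimal stand-in for a timer: records the duration of each timed call."""

    def __init__(self):
        self.durations = []

    def time(self, fn, *args):
        start = time.perf_counter()
        try:
            return fn(*args)
        finally:
            # Recorded even if fn raises, so failed reads are timed too.
            self.durations.append(time.perf_counter() - start)


# Hypothetical extractor instrumentation; names mirror the metrics above.
records_read = Meter()     # gobblin.extractor.records.read
records_failed = Meter()   # gobblin.extractor.records.failed
extract_time = Timer()     # gobblin.extractor.extract.time


def read_record(raw):
    """Hypothetical record parser: raises ValueError on non-numeric input."""
    return int(raw)


def extract(raw_records):
    records = []
    for raw in raw_records:
        try:
            records.append(extract_time.time(read_record, raw))
            records_read.mark()
        except ValueError:
            records_failed.mark()
    return records
```

For example, `extract(["1", "2", "x", "4"])` returns `[1, 2, 4]`, leaving `records_read.count == 3` and `records_failed.count == 1`, while all four attempts are timed.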

Converter Metrics

  • gobblin.converter.records.in: meter for records going into the converter.
  • gobblin.converter.records.out: meter for records emitted by the converter.
  • gobblin.converter.records.failed: meter for records that failed to be converted.
  • gobblin.converter.convert.time: timer for converting each record.
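
Since a converter may turn one input record into zero, one, or several output records, `gobblin.converter.records.out` need not equal `gobblin.converter.records.in`. The sketch below illustrates this with a hypothetical one-to-many converter, `split_csv_line`, driven by a hypothetical `convert_all` loop that updates plain counters keyed by the metric names above. The timer is omitted for brevity, and none of these helpers are Gobblin APIs.

```python
from collections import Counter

# Hypothetical counters keyed by the converter metric names.
metrics = Counter()


def split_csv_line(line):
    """Hypothetical one-to-many converter: one CSV line becomes one record
    per field. Raises ValueError on an empty line."""
    if not line:
        raise ValueError("empty line")
    return line.split(",")


def convert_all(lines):
    out = []
    for line in lines:
        metrics["gobblin.converter.records.in"] += 1
        try:
            converted = split_csv_line(line)
        except ValueError:
            metrics["gobblin.converter.records.failed"] += 1
            continue
        # Every emitted record counts toward records.out, so out can exceed in.
        metrics["gobblin.converter.records.out"] += len(converted)
        out.extend(converted)
    return out
```

With `convert_all(["a,b,c", "", "d"])`, three records go in, one fails, and four come out (`["a", "b", "c", "d"]`).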

Fork Operator Metrics

  • gobblin.fork.operator.records.in: meter for records going into the fork operator.
  • gobblin.fork.operator.forks.out: meter for records going out of the fork operator (each record is counted once for each fork it is emitted to).
  • gobblin.fork.operator.fork.time: timer for forking each record.
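
The counting rule for `gobblin.fork.operator.forks.out` is easiest to see with numbers. This sketch assumes the fork operator produces one boolean per branch saying whether a record is emitted to that branch. `fork_mask` and `fork_all` are hypothetical helpers, and their routing rule (even numbers to every branch, odd numbers only to branch 0) is arbitrary.

```python
from collections import Counter

# Hypothetical counters keyed by the fork operator metric names.
metrics = Counter()


def fork_mask(record, num_forks):
    """Hypothetical fork operator: one boolean per branch. Even numbers go
    to every branch; odd numbers go only to branch 0."""
    if record % 2 == 0:
        return [True] * num_forks
    return [i == 0 for i in range(num_forks)]


def fork_all(records, num_forks):
    branches = [[] for _ in range(num_forks)]
    for record in records:
        metrics["gobblin.fork.operator.records.in"] += 1
        mask = fork_mask(record, num_forks)
        # A record counts once for each branch it is emitted to.
        metrics["gobblin.fork.operator.forks.out"] += sum(mask)
        for branch, selected in zip(branches, mask):
            if selected:
                branch.append(record)
    return branches
```

With `fork_all([1, 2, 3, 4], 3)`, four records go in, but `forks.out` reaches 8 (1 + 3 + 1 + 3) because the even records are emitted to all three branches.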

Row-Level Policy Metrics

  • gobblin.qualitychecker.records.in: meter for records going into the row-level policy.
  • gobblin.qualitychecker.records.passed: meter for records passing the row-level policy check.
  • gobblin.qualitychecker.records.failed: meter for records failing the row-level policy check.
  • gobblin.qualitychecker.check.time: timer for the row-level policy check of each record.
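
In the hypothetical sketch below, every record entering the check is counted exactly once as passed or failed, so `records.in` equals `records.passed + records.failed`. The policy `non_negative` and the driver `check_all` are invented for illustration, and simply dropping failed records is a choice of the sketch, not a statement about how Gobblin handles them.

```python
from collections import Counter

# Hypothetical counters keyed by the row-level policy metric names.
metrics = Counter()


def non_negative(record):
    """Hypothetical row-level policy: a record passes if it is >= 0."""
    return record >= 0


def check_all(records, policy):
    passed = []
    for record in records:
        metrics["gobblin.qualitychecker.records.in"] += 1
        if policy(record):
            metrics["gobblin.qualitychecker.records.passed"] += 1
            passed.append(record)
        else:
            # This sketch drops failing records.
            metrics["gobblin.qualitychecker.records.failed"] += 1
    return passed
```

For example, `check_all([3, -1, 0, -7], non_negative)` returns `[3, 0]`, with two records passed and two failed.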

Data Writer Metrics

  • gobblin.writer.records.in: meter for records requested to be written.
  • gobblin.writer.records.written: meter for records actually written.
  • gobblin.writer.records.failed: meter for records that failed to be written.
  • gobblin.writer.write.time: timer for writing each record.
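
The writer counters can be combined into simple health indicators. The sketch below assumes a snapshot of counts has already been read into a plain dictionary keyed by metric name, and that every record requested to be written eventually ends up either written or failed. `writer_report` is a hypothetical helper, not part of Gobblin.

```python
def writer_report(counts):
    """Summarize writer counts from a hypothetical {metric name: count} dict.

    Assumes each requested record ends up either written or failed, so
    records.in should equal records.written + records.failed."""
    requested = counts.get("gobblin.writer.records.in", 0)
    written = counts.get("gobblin.writer.records.written", 0)
    failed = counts.get("gobblin.writer.records.failed", 0)
    return {
        # Records requested but neither written nor failed (yet).
        "unaccounted": requested - written - failed,
        "failure_ratio": failed / requested if requested else 0.0,
    }
```

For example, 200 requested, 190 written, and 10 failed gives `{"unaccounted": 0, "failure_ratio": 0.05}`. A nonzero `unaccounted` value in a mid-run snapshot could simply mean some records have not been written yet; under the assumption above, a nonzero value at the end of a run would point to a bookkeeping gap.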