RDB Shredder: switch to a HOCON config #256

Closed
benjben opened this issue Dec 17, 2020 · 1 comment

benjben commented Dec 17, 2020

Sister ticket to #250

RDB Shredder and Loader will use the same config file.

Compression format from the config will have to be added to the SQS message.
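
A minimal sketch of what carrying the compression setting in the message could look like. The message shape is an assumption: the `base` field and the function name `build_shredding_message` are hypothetical and not taken from this ticket; only the `compression` values (GZIP or NONE) come from the config below.

```python
import json

# Values allowed by the "compression" setting in the shared config
ALLOWED_COMPRESSION = {"GZIP", "NONE"}


def build_shredding_message(config: dict, base_path: str) -> str:
    """Serialize a hypothetical Shredder-to-Loader message as a JSON string."""
    compression = config["compression"]
    if compression not in ALLOWED_COMPRESSION:
        raise ValueError(f"compression must be GZIP or NONE, got {compression!r}")
    message = {
        "base": base_path,           # hypothetical: S3 folder holding shredded output
        "compression": compression,  # copied from the shared config
    }
    return json.dumps(message, sort_keys=True)
```

The resulting string could then be used as the SQS message body, so the Loader does not need to guess how the Shredder output was compressed.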

Format:

{
  # Human-readable identifier, can be random
  "name": "Acme Redshift",
  # Machine-readable unique identifier, must be a UUID
  "id": "123e4567-e89b-12d3-a456-426655440000",

  # Data Lake (S3) region
  "region": "us-east-1",
  # Shredder output compression, GZIP or NONE
  "compression": "GZIP",
  # SQS queue name used by Shredder and Loader to communicate
  "messageQueue": "messages",

  # Schema-specific format settings (recommended to leave all three groups empty and use TSV as default)
  "formats": {
    # Format used by default (TSV or JSON)
    "default": "TSV",
    # Schemas to be shredded as JSONs, corresponding JSONPath files must be present. Automigrations will be disabled
    "json": [ ],
    # Schemas to be shredded as TSVs, presence of the schema on Iglu Server is necessary. Automigrations enabled
    "tsv": [ ],
    # Schemas that won't be loaded
    "skip": [ ]
  },

  # Warehouse connection details
  "storage": {
    # Database, redshift is the only acceptable option
    "type": "redshift",
    # Redshift hostname
    "host": "redshift.amazon.com",
    # Database name
    "database": "snowplow",
    # Database port
    "port": 5439,
    # AWS Role ARN allowing Redshift to load data from S3
    "roleArn": "arn:aws:iam::123456789012:role/RedshiftLoadRole",
    # DB schema name
    "schema": "atomic",
    # DB user with permissions to load data
    "username": "storage-loader",
    # DB password
    "password": "secret",
    # Custom JDBC configuration
    "jdbc": {"ssl": true},
    # MAXERROR, number of acceptable loading errors
    "maxError": 10,
    # COMPROWS, number of rows sampled for compression analysis
    "compRows": 100000
  },

  # Additional steps. analyze, vacuum and transit_load are valid values
  "steps": ["analyze"],

  # Observability and logging options
  "monitoring": {
    # Snowplow tracking (optional)
    "snowplow": null,
    # Sentry (optional)
    "sentry": null
  }
}
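
The constraints stated in the comments above can be checked mechanically. This is a sketch only, assuming the HOCON file has already been parsed into a plain Python dict by some HOCON parser; the function name `validate_config` and the error strings are hypothetical, and the real Shredder and Loader may validate differently.

```python
import uuid

# Allowed values, taken from the comments in the config format above
VALID_COMPRESSION = {"GZIP", "NONE"}
VALID_FORMATS = {"TSV", "JSON"}
VALID_STEPS = {"analyze", "vacuum", "transit_load"}


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problems found."""
    errors = []

    # "id" must be a UUID
    try:
        uuid.UUID(str(config.get("id")))
    except ValueError:
        errors.append("id must be a UUID")

    # "compression" is GZIP or NONE
    if config.get("compression") not in VALID_COMPRESSION:
        errors.append("compression must be GZIP or NONE")

    # "formats.default" is TSV or JSON
    if config.get("formats", {}).get("default") not in VALID_FORMATS:
        errors.append("formats.default must be TSV or JSON")

    # redshift is the only acceptable storage type
    if config.get("storage", {}).get("type") != "redshift":
        errors.append("storage.type must be redshift")

    # only analyze, vacuum and transit_load are valid steps
    unknown = [s for s in config.get("steps", []) if s not in VALID_STEPS]
    if unknown:
        errors.append(f"unknown steps: {unknown}")

    return errors
```

Collecting all problems in a list, rather than failing on the first one, lets a user fix a config file in a single pass.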

benjben added this to the Release 35 milestone Dec 17, 2020
chuwy changed the title from "RDB shredder: swtich to a HOCON config" to "RDB Shredder: swtich to a HOCON config" Dec 24, 2020

chuwy commented Dec 24, 2020

The format here isn't final. It is still missing the buckets for stateless discovery, which we'll implement in #263.

benjben changed the title from "RDB Shredder: swtich to a HOCON config" to "RDB Shredder: switch to a HOCON config" Jan 11, 2021
chuwy closed this as completed in 9dfcc82 Jan 27, 2021