
Business Object Format

Nate Weisz edited this page Jan 22, 2016 · 6 revisions

Registers a new Business Object Format (BOF) which is identified by usage, file type and a previously registered Business Object Definition.

Think of each format as providing all the details needed to read and process a particular set of data files (i.e., file format, file structure, etc.). Each format registered in Herd is a metadata object that describes the data layout for a previously registered Business Object Definition, down to the details of each field in the schema.

Business Object Format is derived from Business Object Definition with the following relationships:

  • For one Business Object Definition, there can be many Business Object Formats defined.
  • For any given Business Object Format, there is only one Business Object Definition.

The only data that can be updated are the format description, the schema, and the attributes. All other fields cannot be updated.

Updates to the schema are highly discouraged. They are intended only to correct initial mistakes or typos. All other cases should be handled by creating a new format version.

New format versions must be "additive" to the previous version. This means that a new format version's schema can only add regular (non-partitioning) columns, and all existing regular and partition columns must remain the same.
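The additive rule can be illustrated with a small check. This is only a sketch and not Herd's actual validation: the `Column` and `Schema` types and the `is_additive` function are hypothetical names introduced here, and the sketch assumes the stricter reading that existing regular columns must keep their original order, with new columns appended after them.

```python
from typing import List, NamedTuple


class Column(NamedTuple):
    """Hypothetical, simplified column: just a name and a type."""
    name: str
    type: str


class Schema(NamedTuple):
    """Hypothetical, simplified schema: regular columns plus partition columns."""
    columns: List[Column]
    partitions: List[Column]


def is_additive(old: Schema, new: Schema) -> bool:
    """Return True if `new` only adds regular columns to `old`."""
    # Partition columns must remain exactly the same.
    if new.partitions != old.partitions:
        return False
    # Existing regular columns must remain the same (assumed: same order,
    # with any new regular columns appended after them).
    return new.columns[:len(old.columns)] == old.columns


v0 = Schema(columns=[Column("avgsize", "DOUBLE")],
            partitions=[Column("trade_rpt_dt", "DATE")])
v1 = Schema(columns=[Column("avgsize", "DOUBLE"), Column("minsize", "DOUBLE")],
            partitions=[Column("trade_rpt_dt", "DATE")])
v_bad = Schema(columns=[Column("avgsize", "INT")],
               partitions=[Column("trade_rpt_dt", "DATE")])
```

Here `is_additive(v0, v1)` holds because `v1` only appends `minsize`, while `is_additive(v0, v_bad)` does not because the type of an existing column changed.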

A Business Object Format must be associated with an existing Partition Key Group in order to use partition value ranges with the Business Object Data Availability and Business Object Data Generate DDL services.

Generate DDL:

Retrieves the DDL to initialize the specified type of database system (e.g. Hive) by creating a table for the requested business object format. The DDL service is designed to provide platform-specific DDL and is currently limited to Hive 13. Future enhancements will provide DDL for other DBMSs, which may work differently and therefore require different DDL. Notes:

  • If a generate DDL request is made for a business object format with partitionKey="partition" (case insensitive), then DDL is generated for a non-partitioned table.
  • If outputFormat = HIVE_13_DDL
    • If a partition column is also specified as a regular schema column (meaning it is present in the relative data file(s)), the schema column will be listed in the generated Hive DDL under a different name, which is created by prepending the ORGNL_ prefix to the original column name.
    • The table name and all column names will be escaped using Hive's proprietary backtick escaping to avoid conflicts with Hive reserved words in the generated DDL statement.
    • A single quote character, if not already escaped, will be escaped with an extra backslash in the DDL when used in the delimiter, null value, escaped-by character, or column description.
    • A single backslash character will be escaped with an extra backslash in the DDL when used in the delimiter or escaped-by character. Any backslashes in the null value or column description will not be escaped.
  • If the specified custom DDL name does not exist for the relative business object format, then a "Not Found" (status code 404) error will be returned.
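The escaping rules in the notes above can be sketched as follows. This is an illustration rather than Herd's implementation: the function names are hypothetical, and the sketch assumes that "a single backslash character" refers to a value consisting of exactly one backslash.

```python
import re


def escape_single_quotes(value: str) -> str:
    """Put a backslash before every single quote not already preceded by one."""
    return re.sub(r"(?<!\\)'", r"\\'", value)


def ddl_character_value(value: str, escape_backslash: bool) -> str:
    """Render a delimiter, escape character, null value, or description for DDL.

    Pass escape_backslash=True for the delimiter and escaped-by character,
    and False for the null value and column description.
    """
    # Assumption: only a value that is exactly one backslash gets doubled.
    if escape_backslash and value == "\\":
        value = "\\\\"
    return escape_single_quotes(value)


def quote_identifier(name: str) -> str:
    """Wrap a table or column name in backticks."""
    return f"`{name}`"
```

With these helpers, an escaped-by character of `\` is rendered as `\\`, a null value of `\N` is left unchanged, and a description such as `an 'average size'` becomes `an \'average size\'`, which is consistent with the sample Hive DDL on this page.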

Supported Business Object Format File Types: Please see Business Object Data Generate DDL Post.

Conversion of Simple Data Types: Please see Business Object Data Generate DDL Post.

Sample Hive DDL output for a table with multiple partitions and with "includeDropTableStatement" set to "true":

DROP TABLE IF EXISTS `exectest`;
  
CREATE EXTERNAL TABLE `exectest` (
    `ORGNL_trade_exec_dt` DATE,
    `avgsize` DOUBLE COMMENT 'This is an \'average size\' of the "test exec" data.',
    `totalsize` DOUBLE,
    `minsize` DOUBLE,
    `maxsize` DOUBLE,
    `totalcount` INT,
    `belowavg` INT)
PARTITIONED BY (`trade_rpt_dt` DATE, `trade_exec_dt` DATE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' ESCAPED BY '\\' NULL DEFINED AS '\N'
STORED AS TEXTFILE;
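The `ORGNL_` renaming visible in the sample above (`trade_exec_dt` is both a schema column and a partition column) can be sketched as follows. The function names and the `(name, type)` tuple representation are hypothetical, and the sketch assumes column names are compared exactly as written.

```python
from typing import List, Tuple

ColumnDef = Tuple[str, str]  # (column name, Hive type); hypothetical representation


def hive_column_lines(schema_columns: List[ColumnDef],
                      partition_columns: List[ColumnDef]) -> List[str]:
    """Build the regular column definitions for a CREATE TABLE statement."""
    partition_names = {name for name, _ in partition_columns}
    lines = []
    for name, hive_type in schema_columns:
        # A schema column that is also a partition column gets the ORGNL_ prefix.
        ddl_name = f"ORGNL_{name}" if name in partition_names else name
        lines.append(f"`{ddl_name}` {hive_type}")
    return lines


def partitioned_by_clause(partition_columns: List[ColumnDef]) -> str:
    """Build the PARTITIONED BY clause; partition columns keep their names."""
    body = ", ".join(f"`{name}` {hive_type}" for name, hive_type in partition_columns)
    return f"PARTITIONED BY ({body})"


schema = [("trade_exec_dt", "DATE"), ("avgsize", "DOUBLE")]
partitions = [("trade_rpt_dt", "DATE"), ("trade_exec_dt", "DATE")]
```

Calling `hive_column_lines(schema, partitions)` yields `` `ORGNL_trade_exec_dt` DATE `` for the duplicated column and leaves `avgsize` untouched, while `partitioned_by_clause(partitions)` keeps the original partition column names.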

Sample Hive DDL output for a non-partitioned table with "includeIfNotExistsOption" set to "true":

The "${non-partitioned.table.location}" token below is a placeholder for the actual location of the relative business object data for this non-partitioned table.

CREATE EXTERNAL TABLE IF NOT EXISTS `exectest` (
    `mydate` INT,
    `avgsize` DOUBLE,
    `totalsize` DOUBLE,
    `minsize` DOUBLE,
    `maxsize` DOUBLE,
    `totalcount` INT,
    `belowavg` INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\' NULL DEFINED AS '\N'
STORED AS TEXTFILE
LOCATION '${non-partitioned.table.location}';
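A consumer of this DDL might detect the non-partitioned case and fill in the location token before running the statement. The sketch below assumes the token is left in the returned DDL for the caller to substitute; the function names and the `s3://example-bucket/exectest` location are hypothetical.

```python
LOCATION_TOKEN = "${non-partitioned.table.location}"


def is_non_partitioned(partition_key: str) -> bool:
    """A partition key of "partition" (case insensitive) marks a non-partitioned table."""
    return partition_key.lower() == "partition"


def fill_location(ddl: str, location: str) -> str:
    """Replace the location placeholder with the actual data location."""
    return ddl.replace(LOCATION_TOKEN, location)


ddl_tail = "STORED AS TEXTFILE\nLOCATION '${non-partitioned.table.location}';"
filled = fill_location(ddl_tail, "s3://example-bucket/exectest")
```

After the call, `filled` ends with `LOCATION 's3://example-bucket/exectest';`.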