
Business Object Data

Nate Weisz edited this page Mar 28, 2016 · 15 revisions

Registers a new instance of Business Object Data (BOD) in the Herd system. A BOD represents a block of actual data as stored in a set of data files. Each BOD instance is identified by a primary partition value (required), optional sub-partition values, and a previously registered business object format.

BOD is a logical segment of a business object definition in a specified business object format defined by a primary partition value along with optional sub-partition values. Each BOD instance is associated with one or more storage units. Each Storage Unit is an abstract metadata object associated with one of your previously registered BOD instances and storages. BOD can also be associated with a list of custom metadata (attributes).

Some BOD attributes may be required based on the Business Object Format registration.

The call to this endpoint fails when business object data is being registered against S3 storage that has enabled path validation without the S3 key prefix velocity template configured. For more information on storage attributes, see Storage Post.

Note: When registering business object data in S3_MANAGED storage, the following validations may be performed for each file being registered, depending on the Storage Attributes present on the destination Storage:

  • Ensure the S3 file path adheres to the S3 Naming Convention
  • Ensure the file is not referenced by another business object data in this storage
  • Ensure the file actually exists on S3 (see the note on S3 consistency for important information)

S3 Consistency

This service reads from S3 in two cases: when auto-discovering files and when validating the presence of files. Users should be aware of the S3 read consistency behavior discussed at https://forums.aws.amazon.com/ann.jspa?annID=3112 and use the endpoint discussed there that guarantees read-after-write to avoid consistency issues. The read operations in this service also use the endpoint that guarantees read-after-write.

Availability

Performs a search and returns a list of business object data status information for a range of requested business object data in one or more optionally specified storages.

The Business Object Format must be associated with an existing partition key group in order to use partition value ranges with this service.

The request must contain either a single partition value filter or up to 5 partition value filters passed inside the partitionValueFilters wrapper. Only a single partition value range is allowed per request. A partition key is required when passing multiple partition value filters inside the partitionValueFilters wrapper. When the request contains multiple partition value filters, the system checks business object data availability for the n-fold Cartesian product of the partition values specified, where n is the number of partition value filters (partition value sets).
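
The Cartesian product expansion can be pictured with a short Python sketch. This is only an illustration and not Herd code: the dictionary field names (`partitionKey`, `partitionValues`), the sample keys and values, and the `expand_filters` helper are all assumptions made for the example.

```python
from itertools import product

# Hypothetical partition value filters (field names and values are
# illustrative assumptions, not taken from the Herd request schema).
partition_value_filters = [
    {"partitionKey": "TRADE_DT", "partitionValues": ["2016-03-01", "2016-03-02"]},
    {"partitionKey": "REGION", "partitionValues": ["US", "EU", "ASIA"]},
]


def expand_filters(filters):
    """Return every combination of partition values, one value per filter."""
    if len(filters) > 5:
        raise ValueError("at most 5 partition value filters are allowed")
    keys = [f["partitionKey"] for f in filters]
    value_sets = [f["partitionValues"] for f in filters]
    # product(*value_sets) yields the n-fold Cartesian product, n = len(filters).
    return [dict(zip(keys, combo)) for combo in product(*value_sets)]


combinations = expand_filters(partition_value_filters)
```

With two filters holding 2 and 3 values, the sketch yields 2 x 3 = 6 combinations, each of which would be checked for availability.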

If the same business object data is registered in multiple storages and no storage is specified, then a "Bad Request" (status code 400) error will be returned. To resolve this, specify one or more storages in the request.

When neither businessObjectFormatVersion nor business object data version is specified, businessObjectFormatVersion takes precedence. For each partition value, as a first step, the latest businessObjectFormatVersion is determined by a sub-query, which does the following:

  • selects all available data for the specified business object format (disregarding businessObjectFormatVersion), the relative partition value, and the storage name
  • gets the latest "VALID" businessObjectFormatVersion value from the records selected in the above step
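
The two sub-query steps above can be sketched in Python over an in-memory list. This is a rough illustration only: the record layout, the `status` field, the sample rows, and the `latest_valid_format_version` helper are hypothetical and do not reflect how Herd stores or queries its data.

```python
# Hypothetical records standing in for registered business object data of one
# business object format (all field names and values are assumptions).
records = [
    {"partitionValue": "2016-03-01", "storageName": "S3_MANAGED",
     "businessObjectFormatVersion": 0, "status": "VALID"},
    {"partitionValue": "2016-03-01", "storageName": "S3_MANAGED",
     "businessObjectFormatVersion": 1, "status": "VALID"},
    {"partitionValue": "2016-03-01", "storageName": "S3_MANAGED",
     "businessObjectFormatVersion": 2, "status": "INVALID"},
]


def latest_valid_format_version(rows, partition_value, storage_name):
    """Return the highest format version that has VALID data, or None."""
    # Step 1: select data for the partition value and storage name,
    # disregarding the format version.
    selected = [
        r for r in rows
        if r["partitionValue"] == partition_value
        and r["storageName"] == storage_name
    ]
    # Step 2: take the latest format version among the VALID records.
    versions = [
        r["businessObjectFormatVersion"] for r in selected
        if r["status"] == "VALID"
    ]
    return max(versions) if versions else None
```

In this sample data, format version 2 exists but only has INVALID data, so the sketch selects version 1.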

Partition Value Tokens

This endpoint supports special tokens for minimum and maximum partition values, which return the actual business object data key for the relative partitions. The minimum and maximum are determined by a string comparison between the partition values, so the format of a partition value such as a trade date must match the standard for the comparison to be effective.

| Token | Case Sensitive | Description |
| --- | --- | --- |
| ${maximum.partition.value} | Y | The maximum available partition value for the relative primary or sub-partition that the business object data is registered with. When business object data version is not specified, the maximum available partition value for "VALID" business object data is returned back. |
| ${minimum.partition.value} | Y | The minimum available partition value for the relative primary or sub-partition that the business object data is registered with. When business object data version is not specified, the minimum available partition value for "VALID" business object data is returned back. |
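
Because the tokens are resolved by string comparison, the choice of partition value format matters. The following Python sketch illustrates the effect; the `resolve_token` helper and the sample values are assumptions for the example and are not part of the Herd API.

```python
MAX_TOKEN = "${maximum.partition.value}"
MIN_TOKEN = "${minimum.partition.value}"


def resolve_token(value, available_values):
    """Replace a special token with the min or max available partition value.

    Hypothetical helper: the comparison is a plain string comparison, and the
    tokens are matched case-sensitively, as described in the table above.
    """
    if value not in (MAX_TOKEN, MIN_TOKEN):
        return value  # not a token (or wrong case): used as a literal value
    if not available_values:
        raise LookupError("no business object data registered")
    return max(available_values) if value == MAX_TOKEN else min(available_values)


# ISO-style dates sort the same way as strings and as dates.
iso_dates = ["2015-09-01", "2016-10-01"]
latest_iso = resolve_token(MAX_TOKEN, iso_dates)

# M/D/YYYY dates do not: "9/..." compares greater than "10/...".
us_dates = ["9/1/2015", "10/1/2016"]
latest_us = resolve_token(MAX_TOKEN, us_dates)
```

Here `latest_iso` is the chronologically latest date, while `latest_us` is "9/1/2015" even though "10/1/2016" is later, which is why a sortable format is needed.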

The special tokens cannot be specified with a partition value range. A "not found" (status code 404) error will be returned when:

  • a special token and business object data version are both specified and there are no business object data registered using the relative partition values
  • a special token is specified without a business object data version and there are no "VALID" business object data registered using the relative partition values

Business Object Data Availability and Availability Reason

The table below captures how business object data availability and reason are determined based on the request and the relative business object data and storage unit existence and statuses.

Request Storage Non-Glacier Storage Unit Existence and/or Status Glacier Storage Unit Existence and/or Status Response Business Object Data Availability Response Business Object Data Availability Reason
ENABLED ENABLED or or does not exist Available
ENABLED Not available ARCHIVED
or does not exist Not available NO_ENABLED_STORAGE_UNIT
Does not exist ENABLED or or does not exist Not available NOT_REGISTERED
ENABLED ENABLED or or does not exist Available
ENABLED Not available ARCHIVED
or does not exist Not available NO_ENABLED_STORAGE_UNIT
Does not exist ENABLED or or does not exist Not available NOT_REGISTERED
ENABLED or or does not exist ENABLED Not available ARCHIVED
ENABLED or or does not exist Not available NO_ENABLED_STORAGE_UNIT
ENABLED or or does not exist Does not exist Not available NOT_REGISTERED

Generate DDL

This API requires the Business Object Data to be registered according to Herd S3 Naming Convention. *** UPDATE?

The Business Object Format for this Business Object Data must be associated with an existing Partition Key Group in order to use Partition Value Ranges with this service.

The request must contain either a single Partition Value Filter or up to 5 Partition Value Filters passed inside the partitionValueFilters wrapper. Only a single Partition Value Range is allowed per request. A partition key is required when passing multiple partition value filters inside the partitionValueFilters wrapper. When the request contains multiple Partition Value Filters, the system checks Business Object Data Availability for the n-fold Cartesian product of the Partition Values specified, where n is the number of Partition Value Filters (Partition Value sets).

Notes:

  1. When neither businessObjectFormatVersion nor Business Object Data Version is specified, businessObjectFormatVersion takes precedence. For each partition value, as a first step, the latest businessObjectFormatVersion is determined by a subquery, which does the following:
  • selects all available data for the specified Business Object Format (disregarding businessObjectFormatVersion), the associated Partition Value, and the Storage Name
  • gets the latest "VALID" businessObjectFormatVersion value from the records selected in the above step
  2. If all or some Business Object Data is not available for the specified partition values and the allowMissingData flag is not set to "true", then a "Not Found" (status code 404) error will be returned. Please use Business Object Data Availability Post to check for availability prior to generating DDL.
  3. If a Generate DDL request is made for a Business Object Format with partitionKey="partition" (case insensitive) and business object data partitionValue="none" (case insensitive), then the generated DDL treats this business object data as a table, not a partition.
  4. If a partition column is also specified as a regular schema column (meaning it is present in the data file(s)), the schema column will be listed in the generated Hive DDL under a different name, which is created by prepending the ORGNL_ prefix to the original column name.
  5. If using data which is NOT stored in the S3_MANAGED bucket, then the naming convention must adhere to the S3 managed bucket naming convention (S3 Key Prefix Get). *** NEEDS UPDATE This will allow the services to discover the partitions and generate the DDL properly.
  6. The table name and all column names will be escaped using Hive's proprietary back tick escaping to avoid conflicts with Hive reserved words in DDL statement generation.
  7. A single quote character, if not already escaped, will be escaped with an extra backslash in the DDL when used in the delimiter, null value, escaped-by character, and column description.
  8. A single backslash character will be escaped with an extra backslash in the DDL when used in the delimiter and escaped-by character. Any backslashes in the null value and column description will not get escaped.
  9. If the specified custom DDL name does not exist for the relative business object format, then a "Not Found" (status code 404) error will be returned.
  10. If the same Business Object Data is registered in multiple Storages and no Storages are specified, then a "Bad Request" (status code 400) error will be returned. To resolve this, please specify specific storage(s) of the "S3" Storage Platform type in the storageName collection of the request.
  11. If the same Business Object Data is registered in multiple Storages and these Storages have been provided in the request, the generated DDL will point to the first Storage based on the order of the list provided in the storageName collection.
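
A few of the naming and escaping rules in the notes above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Herd implementation: all three helper names are hypothetical, the column name comparison is assumed to be case-insensitive, and backslash escaping is deliberately left out.

```python
import re


def escape_identifier(name):
    """Wrap a table or column name in back ticks (hypothetical helper)."""
    return "`" + name + "`"


def escape_single_quotes(text):
    """Add a backslash before each single quote not already preceded by one."""
    # (?<!\\)' matches a quote whose previous character is not a backslash.
    return re.sub(r"(?<!\\)'", r"\\'", text)


def rename_duplicated_columns(schema_columns, partition_columns):
    """Prefix schema columns that are also partition columns with ORGNL_.

    Assumption: names are compared case-insensitively.
    """
    partition_names = {name.lower() for name in partition_columns}
    return [
        "ORGNL_" + column if column.lower() in partition_names else column
        for column in schema_columns
    ]


renamed = rename_duplicated_columns(["TRADE_DT", "PRICE"], ["TRADE_DT"])
quoted = escape_identifier("date")
description = escape_single_quotes("trader's desk")
```

The sketch treats any quote preceded by a backslash as already escaped, which is a simplification; the service may apply a stricter rule.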

Supported Business Object Format File Types

  1. Currently, only Hive version 0.13.0 is supported (outputFormat=HIVE_13_DDL).
  2. Business object format file type values are case insensitive.
  3. When the business object format has a file type not listed below, the DDL generation will fail with an "Unsupported format file type" error.

| Business Object Format File Type | Description | Stored As value for Hive 0.13.0 |
| --- | --- | --- |
| BZ | Bzip2 compressed text files | TEXTFILE |
| GZ | Gzip compressed text files | TEXTFILE |
| ORC | Optimized Row Columnar (ORC) file format | ORC |
| PARQUET | Parquet (http://parquet.io/) is an ecosystem wide columnar format for Hadoop | PARQUET |
| TXT | Plain text files | TEXTFILE |
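
The file type lookup described above can be sketched as a small Python mapping. The `stored_as` function name and the use of `ValueError` are assumptions for illustration; only the mapping values and the error text come from this page.

```python
# Mapping taken from the table above.
STORED_AS = {
    "BZ": "TEXTFILE",
    "GZ": "TEXTFILE",
    "ORC": "ORC",
    "PARQUET": "PARQUET",
    "TXT": "TEXTFILE",
}


def stored_as(file_type):
    """Return the Hive STORED AS value for a business object format file type."""
    try:
        # File type values are case insensitive, so normalize before lookup.
        return STORED_AS[file_type.upper()]
    except KeyError:
        raise ValueError("Unsupported format file type") from None
```

For example, `stored_as("gz")` and `stored_as("GZ")` both return `"TEXTFILE"`, while an unlisted type such as `"CSV"` raises the error.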

Conversion of Simple Data Types

Notes:

  1. Currently, only Hive version 0.13.0 is supported (outputFormat=HIVE_13_DDL).
  2. Data types are case insensitive.
  3. When the business object format schema uses data types not listed below, the DDL generation will fail with "Column "<COLUMN_NAME>" has an unsupported data type" error.

| Data Type | Size | Notes | Hive 0.13.0 |
| --- | --- | --- | --- |
| Numeric Types | | | |
| TINYINT | | 1-byte signed integer, from -128 to 127 | TINYINT |
| SMALLINT | | 2-byte signed integer, from -32,768 to 32,767 | SMALLINT |
| INT | | 4-byte signed integer, from -2,147,483,648 to 2,147,483,647 | INT |
| BIGINT | | 8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | BIGINT |
| FLOAT | | 4-byte single precision floating point number | FLOAT |
| DOUBLE | | 8-byte double precision floating point number | DOUBLE |
| DECIMAL | | Note: Introduced in Hive 0.11.0 with a precision of 38 digits | DECIMAL |
| DECIMAL | p,s | Note: Hive 0.13.0 introduced user definable precision and scale | DECIMAL(p,s) |
| NUMBER | | | DECIMAL |
| NUMBER | p | | DECIMAL(p) |
| NUMBER | p,s | | DECIMAL(p,s) |
| Date/Time Types | | | |
| TIMESTAMP | | Note: Only available starting with Hive 0.8.0. Strings: JDBC compliant java.sql.Timestamp format "YYYY-MM-DD HH:MM:SS.fffffffff" (9 decimal place precision) | TIMESTAMP |
| DATE | | Note: Only available starting with Hive 0.12.0 | DATE |
| String Types | | | |
| STRING | | Variable-length character data. The maximum length is 2 gigabytes. | STRING |
| VARCHAR | n | Note: Only available starting with Hive 0.12.0. Note: Varchar types are created with a length specifier (between 1 and 65535) | VARCHAR |
| VARCHAR2 | n | | VARCHAR |
| CHAR | n | Note: Only available starting with Hive 0.13.0 | CHAR |
| Misc Types | | | |
| BOOLEAN | | | BOOLEAN |
| BINARY | | Note: Only available starting with Hive 0.8.0 | BINARY |
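
The conversion table can be read as a lookup keyed by data type. The Python sketch below mirrors the table as written and is not the service's code: the `to_hive_type` name and signature are assumptions, and because the table lists VARCHAR and CHAR without a length in the Hive column, the sketch does not append one.

```python
# Types that map to a Hive type of the same (or a fixed) name.
SIMPLE_TYPES = {
    "TINYINT": "TINYINT", "SMALLINT": "SMALLINT", "INT": "INT",
    "BIGINT": "BIGINT", "FLOAT": "FLOAT", "DOUBLE": "DOUBLE",
    "TIMESTAMP": "TIMESTAMP", "DATE": "DATE", "STRING": "STRING",
    "VARCHAR": "VARCHAR", "VARCHAR2": "VARCHAR", "CHAR": "CHAR",
    "BOOLEAN": "BOOLEAN", "BINARY": "BINARY",
}

# Types whose size (p or p,s) is carried over into the Hive type.
SIZED_TYPES = {"DECIMAL": "DECIMAL", "NUMBER": "DECIMAL"}


def to_hive_type(column_name, data_type, size=None):
    """Convert a schema data type to its Hive 0.13.0 type per the table."""
    key = data_type.upper()  # data types are case insensitive
    if key in SIZED_TYPES:
        hive_type = SIZED_TYPES[key]
        return f"{hive_type}({size})" if size else hive_type
    if key in SIMPLE_TYPES:
        return SIMPLE_TYPES[key]
    raise ValueError(f'Column "{column_name}" has an unsupported data type')
```

For example, `to_hive_type("PRICE", "number", "10,2")` returns `"DECIMAL(10,2)"`, and a type outside the table raises the unsupported data type error for that column.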

Multiple Partitioning Support (Sub-partition Auto-discovery)

When the relative business object format has more than one partition column, each sub-partition will be added via a separate ALTER TABLE <table_name> ADD PARTITION ... statement. The relative sub-partitions will be auto-discovered using the storage files registered with this business object data in the relative storage location.

For the multiple partitioning support, all sub-partition data (that excludes top level partition) shall be placed in the relative sub-directories that are created based on the format /=<sub-partition-value>. For example, if a business object has a three level partitioning, the relative data files shall be placed in S3 according to the following directory format:

s3://<bucket name>/ /=/=/

where

= /<data provider>/<usage>/<file type>/<business object definition name>/schm-v<format version number>/data-v<data version number>/=

Notes:

  • The partition column names in the directory path are searched in a case-insensitive manner.
  • The auto-discovery also has support to discover sub-partitions when all underscores are replaced with hyphens in the partition column names used in the directory paths.
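
The column name matching used during auto-discovery can be sketched in Python. This is an illustration under a loud assumption: each sub-directory is taken to be named `<partition column name>=<partition value>`, which is inferred from the mention of partition column names in the directory path and is not spelled out in the format above. The `discover_sub_partitions` helper and the sample names are hypothetical.

```python
def discover_sub_partitions(sub_directories, sub_partition_columns):
    """Extract sub-partition values from directory names.

    Assumption: each directory is named "<column name>=<value>", listed in
    the same order as the sub-partition columns of the format.
    """
    values = []
    for column, directory in zip(sub_partition_columns, sub_directories):
        name, separator, value = directory.partition("=")
        if not separator:
            raise ValueError(f"directory {directory!r} has no '=' separator")
        # Column names match case-insensitively, and also when every
        # underscore in the column name is replaced with a hyphen.
        accepted = {column.lower(), column.lower().replace("_", "-")}
        if name.lower() not in accepted:
            raise ValueError(f"directory {directory!r} does not match {column!r}")
        values.append(value)
    return values


found = discover_sub_partitions(
    ["region=US", "desk-id=7"],   # hypothetical sub-directory names
    ["REGION", "DESK_ID"],        # hypothetical sub-partition columns
)
```

Each discovered combination of values would then feed one ALTER TABLE <table_name> ADD PARTITION ... statement.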
