
Storage

Nate Weisz edited this page Jan 10, 2016 · 2 revisions

A Storage registered in Herd is a metadata object that describes a specific physical location that can be used to store data (i.e. data files). Each Storage has a unique name.

Storage is derived from Storage Platform with the following relationships:

  • for one Storage Platform, there could be many Storages defined
  • for any given Storage, there is only one Storage Platform
  • Storage corresponds to an S3 bucket, an HDFS cluster, a database, or an NFS server.
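The relationships above can be sketched as a small data model. This is only an illustration, not Herd's actual implementation: the class names `StoragePlatform`, `Storage`, and `StorageRegistry`, the platform name "S3", and the storage name "MY_BUCKET" are all hypothetical, chosen to show the one-to-many relationship and the unique-name rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class StoragePlatform:
    """Hypothetical stand-in for a Storage Platform (name is illustrative)."""
    name: str


@dataclass
class Storage:
    """Hypothetical stand-in for a Storage: one name, exactly one platform."""
    name: str
    platform: StoragePlatform
    attributes: Dict[str, str] = field(default_factory=dict)


class StorageRegistry:
    """Hypothetical in-memory registry that enforces unique storage names."""

    def __init__(self) -> None:
        self._storages: Dict[str, Storage] = {}

    def register(self, storage: Storage) -> None:
        # Each Storage has a unique name, so reject duplicates.
        if storage.name in self._storages:
            raise ValueError(f"storage name already registered: {storage.name}")
        self._storages[storage.name] = storage

    def by_platform(self, platform: StoragePlatform) -> List[Storage]:
        # One Storage Platform can have many Storages.
        return [s for s in self._storages.values() if s.platform == platform]


# Usage: two storages that share one (assumed) platform.
s3 = StoragePlatform("S3")
registry = StorageRegistry()
registry.register(Storage("S3_MANAGED", s3))
registry.register(Storage("MY_BUCKET", s3, {"bucket.name": "my-bucket"}))
```

Because a `Storage` holds a single `platform` field, the "only one Storage Platform per Storage" rule is enforced by the shape of the data, while `by_platform` recovers the "many Storages per platform" direction.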

"S3_MANAGED" is the name of the special S3 Storage instance that contains business object data controlled by Herd. Any changes to business object data in the managed storage should be done via the Herd system.

Storage can define a list of custom metadata (a set of attributes) that can be used to include credentials required to connect to the storage. For example, for an S3 bucket, the corresponding Storage can define "bucket.name" and "key.prefix.velocity.template" attributes.

Four storage attributes have special meaning to the system. The attribute names listed here are the defaults; they can be changed by updating the Herd Web Application Configuration Options.

  • bucket.name - the name of the bucket associated with the storage. This is required if file existence validation is configured for the storage.
  • key.prefix.velocity.template - specifies the S3 key prefix velocity template. This is required if path prefix validation is configured for the storage.
  • validate.path.prefix - optional boolean that determines whether path prefix validation is performed on this storage.
  • validate.file.existence - optional boolean that determines whether file existence validation is performed on this storage.
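The dependencies between these four attributes can be sketched as a small check. This is a minimal sketch, not Herd's own validation code: the function names are hypothetical, and it assumes that a missing boolean attribute means "disabled" and that the value "true" is matched case-insensitively.

```python
from typing import Dict, List

# Default attribute names, as listed above.
BUCKET_NAME = "bucket.name"
KEY_PREFIX_TEMPLATE = "key.prefix.velocity.template"
VALIDATE_PATH_PREFIX = "validate.path.prefix"
VALIDATE_FILE_EXISTENCE = "validate.file.existence"


def is_enabled(attributes: Dict[str, str], name: str) -> bool:
    # Assumption: an absent attribute is treated as false, and
    # "true" is accepted in any letter case.
    return attributes.get(name, "").strip().lower() == "true"


def missing_required_attributes(attributes: Dict[str, str]) -> List[str]:
    """Return the attribute names that the enabled validations require
    but that the storage does not define (or defines as empty)."""
    missing: List[str] = []
    # File existence validation needs the bucket name.
    if is_enabled(attributes, VALIDATE_FILE_EXISTENCE) and not attributes.get(BUCKET_NAME):
        missing.append(BUCKET_NAME)
    # Path prefix validation needs the key prefix template.
    if is_enabled(attributes, VALIDATE_PATH_PREFIX) and not attributes.get(KEY_PREFIX_TEMPLATE):
        missing.append(KEY_PREFIX_TEMPLATE)
    return missing
```

For example, a storage that sets only "validate.file.existence" to "true" would be reported as missing "bucket.name", while a storage with no validation flags set has no required attributes.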