Hashmark is a MySQL time-series database and PHP library for data point insertion and analytic queries.
- Numeric and string data types.
- PHP client library for collecting data points in preexisting apps.
- Custom scripts for analysis and periodic data point collection.
- SQL macros allowing queries to reference intermediate results from prior statements.
- Configurable date-based partitioning.
- Cache and database adapters provided by bundled Zend Framework 1.x components.
- High unit test coverage.
- MySQL aggregate functions: AVG, SUM, COUNT, MAX, MIN, STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP
- MySQL aggregate functions eligible for DISTINCT selection: AVG, SUM, COUNT, MAX, MIN
- Time intervals for aggregates: hour, day, week, month, year
- MySQL time functions for aggregates of recurrence groups (e.g. "1st of the month"): HOUR, DAYOFMONTH, DAYOFYEAR, MONTH
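The difference between interval grouping and recurrence grouping can be sketched in plain PHP. This is only an illustration, not how Hashmark does it (Hashmark groups inside MySQL). The function names intervalKey and recurrenceKey and the two sample dates are made up for the example.

```php
<?php
// Illustration only: mimics how samples fall into groups, using PHP date
// formats in place of MySQL functions. All names here are hypothetical.

// Interval grouping: one group per calendar period, e.g. each distinct month.
function intervalKey(DateTime $date, $interval)
{
    $formats = array(
        'hour'  => 'Y-m-d H',
        'day'   => 'Y-m-d',
        'week'  => 'o-\WW', // ISO year and week number
        'month' => 'Y-m',
        'year'  => 'Y',
    );
    return $date->format($formats[$interval]);
}

// Recurrence grouping: one group per recurring slot, e.g. every "1st of the month".
function recurrenceKey(DateTime $date, $recurFunc)
{
    $formats = array(
        'HOUR'       => 'G', // 0-23, like MySQL HOUR()
        'DAYOFMONTH' => 'j', // 1-31
        'DAYOFYEAR'  => 'z', // PHP is 0-based; MySQL DAYOFYEAR() is 1-based
        'MONTH'      => 'n', // 1-12
    );
    $key = (int) $date->format($formats[$recurFunc]);
    return $recurFunc === 'DAYOFYEAR' ? $key + 1 : $key;
}

$utc = new DateTimeZone('UTC');
$a = new DateTime('2012-01-01 08:15:00', $utc);
$b = new DateTime('2012-02-01 08:45:00', $utc);

// Different months, so different interval groups...
$sameInterval = intervalKey($a, 'month') === intervalKey($b, 'month');
// ...but both fall on the 1st and in the 8am hour, so the same recurrence groups.
$sameDayOfMonth = recurrenceKey($a, 'DAYOFMONTH') === recurrenceKey($b, 'DAYOFMONTH');
$sameHour = recurrenceKey($a, 'HOUR') === recurrenceKey($b, 'HOUR');
```

Here an interval aggregate would report January and February separately, while a DAYOFMONTH recurrence aggregate would fold both samples into the "1st of the month" group.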
multiQuery($scalarId, $start, $end, $stmts)
Perform multiple queries using macros to reference prior intermediate result sets. Internally supports many of the functions below.
values($scalarId, $limit, $start, $end)
Return samples within a date range.
valuesAtInterval($scalarId, $limit, $start, $end, $interval)
Return the most recent sample from each interval within a date range.
valuesAgg($scalarId, $start, $end, $aggFunc, $distinct)
E.g. return "average value between date X and Y" or "volume of distinct values between date X and Y."
valuesAggAtInterval($scalarId, $start, $end, $interval, $aggFunc, $distinct)
Similar to valuesAgg except that results are grouped into a given interval, e.g. "average weekly value between date X and Y."
valuesNestedAggAtInterval($scalarId, $start, $end, $interval, $aggFuncOuter, $distinctOuter, $aggFuncInner, $distinctInner)
Aggregate values returned by valuesAggAtInterval, e.g. "average weekly high between date X and Y."
valuesAggAtRecurrence($scalarId, $start, $end, $recurFunc, $aggFunc, $distinct)
E.g. "peak value in the 8-9am hour between date X and Y."
changes($scalarId, $limit, $start, $end)
Return from a date range each sample's date, value, and change in value from the prior sample.
changesAtInterval($scalarId, $limit, $start, $end, $interval)
Similar to changes except that valuesAtInterval provides the source data, e.g. "weekly value and its change (week-over-week) between date X and Y."
changesAgg($scalarId, $start, $end, $aggFunc, $distinct)
E.g. "peak value change between date X and Y."
changesAggAtInterval($scalarId, $start, $end, $interval, $aggFunc, $distinct)
Similar to changesAgg except that changes provides the source data, e.g. "weekly peak value change (week-over-week) between date X and Y."
changesNestedAggAtInterval($scalarId, $start, $end, $interval, $aggFuncOuter, $distinctOuter, $aggFuncInner, $distinctInner)
Aggregate values returned by changesAggAtInterval, e.g. "average of weekly peak value changes (week-over-week) between date X and Y."
changesAggAtRecurrence($scalarId, $start, $end, $recurFunc, $aggFunc, $distinct)
E.g. "peak value change on Black Friday between year X and year Y."
frequency($scalarId, $limit, $start, $end, $descOrder)
Return unique values and their frequency between date X and Y.
moving($scalarId, $limit, $start, $end, $aggFunc, $distinct)
Return from a date range each sample's date, value, and the aggregate value at sample-time. E.g. "values and their moving averages between date X and Y."
movingAtInterval($scalarId, $limit, $start, $end, $interval, $aggFunc, $distinct)
Similar to valuesAtInterval except that moving provides the data source, e.g. "the last value and its moving average from each week between date X and Y."
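To make the "changes" and "moving" result shapes concrete, here is a minimal in-memory sketch. It is not Hashmark's implementation, which computes these in MySQL. The function names sketchChanges and sketchMovingSum, the row keys, and the null change for the first row are all assumptions made for the example.

```php
<?php
// Illustration only: in-memory versions of the "changes" and "moving" ideas.
// Function names and row keys are hypothetical.

// Each sample's date, value, and change from the prior sample.
function sketchChanges(array $samples)
{
    $rows = array();
    $prior = null;
    foreach ($samples as $date => $value) {
        $rows[] = array(
            'date'   => $date,
            'value'  => $value,
            // The first row has no prior sample (assumed here to yield null).
            'change' => $prior === null ? null : $value - $prior,
        );
        $prior = $value;
    }
    return $rows;
}

// Each sample's date, value, and the running SUM at sample-time.
function sketchMovingSum(array $samples)
{
    $rows = array();
    $total = 0;
    foreach ($samples as $date => $value) {
        $total += $value;
        $rows[] = array('date' => $date, 'value' => $value, 'agg' => $total);
    }
    return $rows;
}

// Samples keyed by date, oldest first.
$samples = array(
    '2012-01-01 00:00:00' => 10,
    '2012-01-02 00:00:00' => 15,
    '2012-01-03 00:00:00' => 12,
);

$changes = sketchChanges($samples);
$moving = sketchMovingSum($samples);
```

The "AtInterval" variants would apply the same logic to one representative sample per interval, and the "Agg" variants would then reduce the resulting rows with a function such as MAX or AVG.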
Main database tables:
- scalars: Metadata and current value of a named string or number, e.g. "featureX:optOut".
- samples_decimal: Historical values of a numeric data point in scalars.
- samples_string: Historical values of a string data point in scalars.
Hashmark_Client supplies methods for updating a current value (in scalars) and adding a historical sample (in samples_decimal or samples_string).
- incr($name, $amount = 1, $newSample = false)
- decr($name, $amount = 1, $newSample = false)
- set($name, $amount, $newSample = false)
- get($name)
<?php
if ($userOptedOutOfFeatureX) {
    $client->incr('featureX:optOut', 1, true);
}
To let drop-in client calls work without any prior setup, e.g. if "featureX:optOut" above did not yet exist, call $client->createScalarIfNotExists(true).
Each script is just a class that implements the small Hashmark_Agent interface.
The Agent/StockPrice.php demo fetches AAPL's price from Google Finance and creates a historical data point.
Cron/runAgents.php normally runs each agent on a configured schedule, but a manual run might look like:
<?php
$agent = Hashmark::getModule('Agent', 'StockPrice');
$price = $agent->run($scalarId);
$partition = Hashmark::getModule('Partition', '', $db);
$partition->createSample($scalarId, $price, time());
<?php
$core = Hashmark::getModule('Core', '', $db);
$scalarFields = array();
$scalarFields['name'] = 'featureX:optOut';
$scalarFields['type'] = 'decimal';
$scalarFields['value'] = 0; // Initial value.
$scalarFields['description'] = 'Opt-out requests for featureX.';
$scalarId = $core->createScalar($scalarFields);
$savedScalarFields = $core->getScalarById($scalarId);
$savedScalarFields = $core->getScalarByName('featureX:optOut');
<?php
$categoryId = $core->createCategory('Feature Trackers');
if (!$core->scalarHasCategory($scalarId, $categoryId)) {
    $core->addScalarCategory($scalarId, $categoryId);
}
<?php
$milestoneId = $core->createMilestone('featureX initial release');
$core->setMilestoneCategory($milestoneId, $releaseCategoryId);
<?php
$analyst = Hashmark::getModule('Analyst', 'BasicDecimal', $db);
$sampleDateMin = '2012-01-01 00:00:00';
$sampleDateMax = '2012-02-01 00:00:00';
$limit = 10;
// Returns first 10 samples: their dates, values, and running/cumulative totals
$analyst->moving($scalarId, $limit, $sampleDateMin, $sampleDateMax, 'SUM');
// Now only distinct values affect aggregates
$analyst->moving($scalarId, $limit, $sampleDateMin, $sampleDateMax, 'SUM', true);
// Returns first 10 samples: their dates and values
$analyst->values($scalarId, $limit, $sampleDateMin, $sampleDateMax);
// Returns first 10 samples: their dates, values, and difference from prior sample
$analyst->changes($scalarId, $limit, $sampleDateMin, $sampleDateMax);
Most recently tested with PHP 5.4.0beta1, PHPUnit 3.6.0RC4, and MySQL 5.5.16.
- PHP 5.2+
- MySQL 5.1+
- PDO or MySQL Improved
- apc, xcache or memcache
For tests:
- PHPUnit 3+
- bcmath
1. CREATE DATABASE hashmark DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_unicode_ci;
2. Import Sql/Schema/hashmark.sql
3. Optionally repeat 1 and 2 for a separate unit test DB.
Hashmark uses Zend Framework's database component. Refer to the ZF guide for option values. Example:
<?php
$config['DbHelper']['profile']['unittest'] = array(
    'adapter' => 'Mysqli',
    'params' => array(
        'host' => '127.0.0.1',
        'port' => 5516,
        'dbname' => 'hashmark_test',
        'username' => 'msandbox',
        'password' => 'msandbox'
    )
);
Config/Hashmark-dist.php only includes a database config profile for cron scripts and unit tests. Normally the client app will supply its own connection instance. For example:
<?php
$this->hashmark = Hashmark::getModule('Client', '', $db);
...
$this->hashmark->incr('featureX:optOut', 1, true);
Hashmark also uses Zend Framework's cache component. Refer to the ZF guide for option values.
Using Memcache as an example, you might update $config['Cache'] in Config/Hashmark-dist.php:
$config['Cache'] = array(
    'backEndName' => 'Memcached',
    'frontEndOpts' => array(),
    'backEndOpts' => array(
        'servers' => array(
            array('host' => 'localhost', 'port' => 11211)
        )
    )
);
See Config/Hashmark-dist.php comments.
$ php -f Test/Install.php
pass: Connected to DB with 'cron' profile in Config/DbHelper.php
pass: Found all Hashmark tables with 'cron' profile in Config/DbHelper.php
pass: Connected to DB with 'unittest' profile in Config/DbHelper.php
pass: Found all Hashmark tables with 'unittest' profile in Config/DbHelper.php
pass: Loaded Hashmark_BcMath module.
pass: Loaded Hashmark_Cache module.
pass: Loaded Hashmark_Client module.
pass: Loaded Hashmark_Core module.
pass: Loaded Hashmark_DbHelper module.
pass: Loaded Hashmark_Partition module.
pass: Loaded Hashmark_Agent_YahooWeather module.
pass: Loaded Hashmark_Test_FakeModuleType module.
pass: Built samples_1234_20111000 partition name with 'm' setting in Config/Partition.php.
- agents: Available Agent classes.
- agents_scalars: Agent's schedules and last-run metadata.
- categories: Groups to support front-end browsing, searches, visualization, etc.
- categories_milestones: For example, to link category "ShoppingCart" with milestone "site release 2.1.2".
- categories_scalars: For example, to link category "ShoppingCart" with data point "featureX:optOut".
- milestones: Events to correlate with scalar histories, e.g. to visualize "featureX:optOut" changes across site releases that tweak "featureX".
- samples_analyst_temp: When Hashmark creates temporary tables to hold intermediate aggregates, it copies this table's definition.
- samples_decimal and samples_string: Identical except for one column. Hashmark copies their definitions when creating new partitions. id auto-increment values are seeded from the associated scalar's sample_count column.
- scalars: The table holds columns that define each data point's type (string or decimal), current value, and other metadata.
Zend Framework's style is followed pretty closely. Parent classes, some abstract, live in the root directory. Child classes live in directories named after their parents. Class names predictably indicate ancestors, e.g. Hashmark_Analyst_BasicDecimal, and file names mirror the class name's last part, e.g. Analyst/BasicDecimal.php.
Analyst/
    BasicDecimal.php
Analyst.php
...
Agent/
    YahooWeather.php
...
Agent.php
...
- Analyst.php: Abstract base. For example, the BasicDecimal.php implementation performs list and statistical queries.
- Cache.php: Zend_Cache wrapper that adds namespaces.
- Client.php: Input API for client apps to update scalars and add historical data points.
- Core.php: Internal API to manage scalars, categories, milestones, etc.
- DbHelper.php: Abstract base for Zend_Db adapter wrappers.
- Hashmark.php: Defines the getModule() factory.
- Module.php: Abstract base for classes produced by factory Hashmark::getModule().
- Partition.php: Management and querying of MyISAM and MRG_MyISAM tables holding scalars' historical values.
- Agent.php: Interface relied upon by Cron/runAgents.php.
- Util.php: Static/stateless helper class with methods like randomSha1().
Most test-related files live under Test/, but a few like Config/Test.php live outside so cases can cover code relying on naming conventions.
Contains SQL templates. For example, Sql/Analyst/BasicDecimal.php templates allow Analyst/BasicDecimal.php to reuse and combine statements as intermediate results toward final aggregates.
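The idea of chaining statements through intermediate results can be sketched as follows. This is a hypothetical illustration, not Hashmark's actual macro syntax or template code. The :prior token, the tmp_step_ table prefix, the samples table, and its sampled_at and value columns are all invented for the example.

```php
<?php
// Illustration only: the ':prior' token and temp-table naming are hypothetical,
// not Hashmark's real macro syntax.
function expandStatements(array $stmts, $tablePrefix)
{
    $expanded = array();
    foreach (array_values($stmts) as $i => $stmt) {
        // Statement N reads from the temp table that holds statement N-1's result.
        if ($i > 0) {
            $stmt = str_replace(':prior', $tablePrefix . ($i - 1), $stmt);
        }
        // Every statement stores its own result in a new temp table.
        $expanded[] = 'CREATE TEMPORARY TABLE ' . $tablePrefix . $i . ' AS ' . $stmt;
    }
    return $expanded;
}

// "Average daily high": an inner MAX per day, then an outer AVG over those highs.
$stmts = array(
    'SELECT DATE(sampled_at) AS day, MAX(value) AS high FROM samples GROUP BY day',
    'SELECT AVG(high) AS avg_daily_high FROM :prior',
);
$sql = expandStatements($stmts, 'tmp_step_');
```

Each expanded statement can then be run in order on the same connection, with the final temp table holding the nested aggregate.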
- gcMergeTables.php: Drops merge tables based on hard limits defined in Config/Cron.php.
- gcUnitTestTables.php: Drops test-created tables and runs FLUSH TABLES.
- runAgents.php: Finds and runs all agent scripts due for execution based on their configured frequency.
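A "due for execution" check like the one runAgents.php performs might look roughly like this. It is a sketch under assumptions, not the script's actual logic: the function name isAgentDue, the frequency expressed in minutes, and treating a never-run agent as due are all hypothetical.

```php
<?php
// Illustration only: the function name and the minutes-based frequency are
// assumptions, not taken from runAgents.php.
function isAgentDue($lastRunTime, $frequencyMinutes, $now)
{
    // An agent that has never run is treated as due.
    if ($lastRunTime === null) {
        return true;
    }
    // Due once at least one full frequency period has elapsed.
    return ($now - $lastRunTime) >= $frequencyMinutes * 60;
}

$now = strtotime('2012-01-01 12:00:00 UTC');
$neverRun = isAgentDue(null, 60, $now);
$ranRecently = isAgentDue($now - 10 * 60, 60, $now);
$overdue = isAgentDue($now - 2 * 3600, 60, $now);
```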
First run php -f Test/Analyst/BasicDecimal/Tool/writeProviderData.php, which writes Test/Analyst/BasicDecimal/Data/provider.php. The BasicDecimal suite relies on bcmath and a series of generators in Test/Analyst/BasicDecimal/Tool/ to calculate a comprehensive set of expected test results.
- Run suites for all modules: phpunit [--group name] Test/AllTests.php
- Run a specific module's suite: phpunit [--group name] Test/[module]/AllTests.php