Working with pivot tables
Given the following input, e.g. from a logfile:
2015-05-10 00:00:09 OK: 304 100 bytes
2015-05-10 00:00:48 GOOD: 200 102 bytes (ETag: 214ceb4b-980-3a7bbd9630480)
2015-05-10 03:00:49 ERROR: 404 512 bytes (Not found)
2015-05-11 00:00:17 OK: 304 102 bytes
2015-05-11 02:01:01 ERROR: 500 0 bytes (Internal Server Error)
2015-05-11 02:01:02 ERROR: 500 256 bytes (Internal Server Error)
...
We will parse this using sscanf()
, transforming the lines into arrays like the following:
["2015-05-10", "00:00:48", "GOOD", 200, 95, "ETag: 214ceb4b-980-3a7bbd9630480"]
We can the load this into our pivot table using the array offsets (if we had a map, we could use its string keys; for objects we'll pass references to the getters and for more complex situations we can pass closures). Putting it together, we get the following:
use io\streams\{TextReader, FileInputStream};
use util\data\PivotCreation;
$pivot= (new PivotCreation())
->groupingBy(2) // category
->groupingBy(3) // code
->spreadingBy(0) // date
->summing(4, 'bytes') // bytes
->create()
);
$reader= new TextReader(new FileInputStream('measures.log'));
while (null !== ($line= $reader->readLine())) {
$pivot->add(sscanf($line, '%[0-9-] %[0-9:] %[^:]: %d %d bytes (%[^)])'));
}
The resulting table will look something like this (using "b:" as an abbreviation for bytes - this becomes relevant once we sum on multiple columns):
.------------------------------------------------------- ~ ---------------------------.
| || Columns | |
| ||--------------------------------- ~ -| |
| Category | Count || 2015-05-10 | 2015-05-11 |- ~ -| Sum | Average |
|-----------|--------||---------------|---------------|- ~ -|------------|------------|
| OK | 2 || 1, b:100 | 1, b:102 |- ~ -| b:202 | b:101 |
| GOOD | 1 || 1, b:102 | |- ~ -| b:102 | b:102 |
| ERROR | 3 || 2, b:512 | 1, b:256 |- ~ -| b:768 | b:256 |
| ^- client | ^- 1 || ^- 1, b:512 | |- ~ -| ^- b:512 | ^- b:512 |
| ^- 404 | ^- 1 || ^- 1, b:512 | |- ~ -| ^- b:512 | ^- b:512 |
| ^- server | ^- 2 || ^- 1, b:0 | ^- 1, b:256 |- ~ -| ^- b:256 | ^- b:128 |
| ^- 500 | ^- 2 || ^- 1, b:0 | ^- 1, b:256 |- ~ -| ^- b:256 | ^- b:128 |
|-----------|--------||---------------|---------------|- ~ -|------------|------------|
| Total | 6 || b:714 | b:358 |- ~ -| b:1072 | b:178.7 |
`------------------------------------------------------- ~ ---------------------------´
The number of records grouped by the grouping columns can be retrieved via count()
. The aggregates can be accessed by passing the category to the respective methods.
$count= $pivot->count('OK'); // 2
$count= $pivot->count(); // 6
$count= $pivot->records('2015-05-10', 'OK'); // 1
$count= $pivot->records('2015-05-10'); // 4
$transferred= $pivot->column('2015-05-10', 'OK')['bytes']; // 100
$transferred= $pivot->column('2015-05-10')['bytes']; // 714
$transferred= $pivot->sum('OK')['bytes']; // 202
$transferred= $pivot->sum()['bytes']; // 1072
$average= $pivot->average('OK')['bytes']; // 101.0
$average= $pivot->average()['bytes']; // 178.7
We can dril down by the categories we grouped on by using the rows()
method. To calculate the distribution of categories in percent of the total, we'll use the count()
method.
$rows= $pivot->rows(); // ['OK', 'GOOD', 'ERROR']
// OK: 2 / 6 = 33.3%
// GOOD: 1 / 6 = 16.7%
// ERROR: 3 / 6 = 50.0%
$total= $pivot->count();
foreach ($rows as $cat) {
$count= $pivot->count($cat);
printf("%s: %d / %d = %.1f%%\n", $cat, $count, $total, $count / $total * 100);
}
// client: 1
// server: 2
foreach ($pivot->rows('ERROR') as $code) {
printf("ERROR %s: %dx\n", $row, $pivot->count('ERROR', $code));
}
It can also interesting to see a development over time, so we'll drill down based on the columsn instead.
$columns= $pivot->columns(); // ['2015-05-10', '2015-05-11']
// 2015-05-10: 714 / 1072 bytes = 66.6%
// 2015-05-11: 358 / 1072 bytes = 33.4%
$total= $pivot->total()['bytes'];
foreach ($columns as $date) {
$bytes= $pivot->column($date)['bytes'];
printf("%s: %d / %d bytes = %.1f%%\n", $date, $bytes, $total, $bytes / $total * 100);
}