HADOOP-16830. IOStatistics API. #2069
Conversation
latest patch wires up stats collection from the workers on an s3a committer job, marshals them as JSON in .pending/.pendingset files, and then finally aggregates them into the _SUCCESS job summary file. Here's an example of a test run:

2020-07-03 16:47:08,981 [JUnit-ITestMagicCommitProtocol-testOutputFormatIntegration] INFO commit.AbstractCommitITest (AbstractCommitITest.java:loadSuccessFile(503)) - Loading committer success file s3a://stevel-ireland/test/ITestMagicCommitProtocol-testOutputFormatIntegration/_SUCCESS. Actual contents=
{
"name" : "org.apache.hadoop.fs.s3a.commit.files.SuccessData/1",
"timestamp" : 1593791227415,
"date" : "Fri Jul 03 16:47:07 BST 2020",
"hostname" : "stevel-mbp15-13176.local",
"committer" : "magic",
"description" : "Task committer attempt_200707120821_0001_m_000000_0",
...
"diagnostics" : {
"fs.s3a.authoritative.path" : "",
"fs.s3a.metadatastore.impl" : "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore",
"fs.s3a.committer.magic.enabled" : "true",
"fs.s3a.metadatastore.authoritative" : "false"
},
"filenames" : [ "/test/ITestMagicCommitProtocol-testOutputFormatIntegration/part-m-00000" ],
"iostatistics" : {
"counters" : {
"committer_bytes_committed" : 4,
"committer_bytes_uploaded" : 0,
"committer_commits_aborted" : 0,
"committer_commits_completed" : 1,
"committer_commits_created" : 0,
"committer_commits_failed" : 0,
"committer_commits_reverted" : 0,
"committer_jobs_completed" : 1,
"committer_jobs_failed" : 0,
"committer_tasks_completed" : 1,
"committer_tasks_failed" : 0,
"stream_write_block_uploads" : 1,
"stream_write_block_uploads_data_pending" : 0,
"stream_write_bytes" : 4,
"stream_write_exceptions" : 0,
"stream_write_exceptions_completing_uploads" : 0,
"stream_write_queue_duration" : 0,
"stream_write_total_data" : 4,
"stream_write_total_time" : 0
},
"gauges" : {
"stream_write_block_uploads_data_pending" : 4,
"stream_write_block_uploads_pending" : 0
},
"minimums" : { },
"maximums" : { },
"meanStatistics" : { }
}
}
I'm in a good mood here. Time for others to look at.
| Category | Aggregation |
|------------------|-------------|
| `counter` | `min(0, x) + min(0, y)` |
Counters should be a simple addition right?
yeah, but if something negative has got in I don't want it tainting the results
don't you mean `max(0, x)`?
yeah -fixed
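The fix under discussion, clamping negative inputs with `max(0, x)` so a corrupted counter cannot drag down the aggregate, might be sketched like this (illustrative code with hypothetical names, not the actual Hadoop implementation):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of counter aggregation with a max(0, x) guard,
// so a stray negative value cannot taint the merged totals.
// Class and method names here are hypothetical, not Hadoop's.
public class CounterAggregation {

    // Merge two counter maps, clamping each input value to >= 0
    // before summing values that share a key.
    public static Map<String, Long> aggregate(Map<String, Long> left,
                                              Map<String, Long> right) {
        Map<String, Long> result = new TreeMap<>();
        left.forEach((k, v) -> result.merge(k, Math.max(0, v), Long::sum));
        right.forEach((k, v) -> result.merge(k, Math.max(0, v), Long::sum));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> taskA = Map.of("stream_write_bytes", 4L,
                                         "stream_write_exceptions", -1L);
        Map<String, Long> taskB = Map.of("stream_write_bytes", 8L);
        // The -1 is clamped to 0; the byte counters sum to 12.
        System.out.println(aggregate(taskA, taskB));
    }
}
```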
/**
 * Builder of the CounterIOStatistics class.
 */
public interface CounterIOStatisticsBuilder {
nit: don't you think the name is a bit confusing here? We're providing support for min, max, and mean here as well.
yeah, it's obsolete. Removed
snapshotIOStatistics(IOStatistics statistics) {
    IOStatisticsSnapshot stats = new IOStatisticsSnapshot(statistics);
    stats.snapshot(statistics);
Why do we need to call snapshot() again? I can see the snapshot calculation just happened in the previous call during constructor initialisation.
you are right. cut this.
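The pattern being pointed out, a snapshot class whose constructor already copies the source's values so a second explicit snapshot call is redundant, can be sketched as follows (simplified stand-in types, not Hadoop's actual classes):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a snapshot copies its source's counters in the
// constructor, so callers must not snapshot again afterwards.
public class SnapshotDemo {

    /** Minimal stand-in for an IOStatistics source. */
    public interface Stats {
        Map<String, Long> counters();
    }

    /** Snapshot that captures the counters exactly once, at construction. */
    public static class Snapshot implements Stats {
        private final Map<String, Long> counters;

        public Snapshot(Stats source) {
            // The copy happens here; a follow-up snapshot() call would
            // just repeat this work.
            this.counters = new TreeMap<>(source.counters());
        }

        @Override
        public Map<String, Long> counters() {
            return counters;
        }
    }

    public static void main(String[] args) {
        Stats live = () -> Map.of("stream_write_bytes", 4L);
        Snapshot snap = new Snapshot(live);   // one copy, no extra call
        System.out.println(snap.counters());
    }
}
```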
+ ability to set gauges, min, max, means
+ methods to access/update these, and to get the raw refs to ease migration, pass them around, and get better performance on oft-updated values (no map lookup)
S3A instrumentation uses this to update gauges on S3A block output stats; adding a base class for all those stats impls now they have a lot more commonality than a set of long counters. Change-Id: I0456b200509ffdb184c4ae86976b0b2789670c82
really rounding this out and moving s3a instrumentation to it Change-Id: I749a8ef78b33b4ea85de4d93eeefc7c7d71294a8
Change-Id: Id1d47856ce57ea8515b2a3d1de33cf88cd934389
all streams which serve up statistics declare themselves as having the stream capability "iostatistics"; all those which forward getIOStatistics also do the same for stream capabilities. This gives a consistent way to probe for streams supporting the statistics even without invoking it. BufferedIOStatisticsOutputStream forwards Syncable calls after flushing itself. This benefits work in HADOOP-13327, where we need a BufferedOutputStream subclass to forward Syncable operations. Change-Id: Ib015f48396819873859b3fa644d4afd133e576d8
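From application code, that capability probe might look like the following. This is a sketch with simplified stand-in interfaces; the "iostatistics" capability string comes from this PR, but the class and method names below are illustrative, not Hadoop's real API:

```java
// Illustrative sketch of probing a stream for the "iostatistics"
// capability before asking it for statistics. The interface here is a
// simplified stand-in for Hadoop's StreamCapabilities.
public class CapabilityProbeDemo {

    public interface StreamCapabilities {
        boolean hasCapability(String capability);
    }

    // Consistent probe: works for streams that expose statistics
    // directly and for wrappers that merely forward another
    // stream's statistics.
    public static boolean supportsIOStatistics(Object stream) {
        return stream instanceof StreamCapabilities
            && ((StreamCapabilities) stream).hasCapability("iostatistics");
    }

    public static void main(String[] args) {
        StreamCapabilities statsStream = cap -> "iostatistics".equals(cap);
        System.out.println(supportsIOStatistics(statsStream));
        System.out.println(supportsIOStatistics(new Object()));
    }
}
```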
Change-Id: I1dd0b9e46d3e24c19f29f8a9e0bd16e762b74345
WiP there, but making it part of the public API so it can be used in other classes (e.g. pendingset files). The snapshot implements the aggregate action covered in the specification. Change-Id: I4dfc04f4fe8f97a9c64f4fa5a636fa1da68bbb43 TODO: full round trip of a statistics instance
It's feeling way overengineered here, but at least now I can embed it in something like my committer stats. Which, if we get off the output streams on the write, isn't completely useless. Change-Id: Ifbeb3f546d1884cb92a853da5a54139cb98c03b2
This does full end-to-end integration, with the stats from individual file uploads (magic only) and task/job commit counters being aggregated through .pending and .pendingset writes, before making it through to _SUCCESS. Most of the work is in the hadoop-aws module; changes done in hadoop-common are focused on making that implementation straightforward, identifying trouble spots etc. Overall, other than the fact we don't have worker-thread-level statistic collection (hence no input stream stats), we do now have the ability to collect and aggregate statistics. Change-Id: I2cd82f7cfb5b1937cb69fb85555836a2b7b20f98
fully embracing the new stats in the S3A input stream, including unbuffer. Some test failures there related to total counts: something is overcounting. Lots of tuning of the implementation classes here; the main API seems OK. Change-Id: I0a627a54b421594c941458b82a12925383d865e3
Rebase onto trunk and merge changes which went in with HDFS-13934. That's already had a stats API intended for integration, but this wraps up. That uploader is going to be the first place where I add some min/max/mean tracking of upload performance, because it is simple code. This is driving the changes in CounterStatisticsImpl (TODO: rename?) to make it easier to track and update these. New unit tests. Change-Id: I66dab375cfe4d0ec386521dbfd83534f0ec32f40
* easy to track duration of operations
* instrumented S3AInputStream to track time of GET requests
* instrumented S3A list operations to track time/count of LIST requests; the remote iterators returned in the S3A code all now serve these results
To aid with collection in applications, more classes implement IOStatisticsSource and pass down to their inner streams:
* CompressionInputStream
* CompressionOutputStream
* LineReader
LocatedFileStatusFetcher will aggregate stats coming from its filesystem list operations, so give total count/performance of operations against a store. See ITestLocatedFileStatusFetcher for use. Change-Id: Iebe49341ae515c387194040c861e3002e1f366a2
- Improving API for duration tracking; testing it too
- SelectInputStream counting the right stats
- both min and max stats have -1 as unset. Mixed feelings here, but setting min to 2^32 sucks too
Change-Id: I4b1ba62dc619ea65308614f1a930642d4ce9c59b
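The "-1 as unset" convention mentioned above, avoiding a huge sentinel value for an empty minimum, could work roughly like this (a minimal sketch with hypothetical names, not the PR's actual code):

```java
// Sketch of the "-1 means unset" convention for a minimum statistic,
// instead of seeding it with Long.MAX_VALUE or 2^32.
// Names are illustrative, not Hadoop's.
public class MinStatDemo {

    public static final long UNSET = -1L;

    // Combine a new sample with the current minimum, where -1 means
    // "no samples yet" rather than an actual observed value.
    public static long updateMin(long current, long sample) {
        return current == UNSET ? sample : Math.min(current, sample);
    }

    public static void main(String[] args) {
        long min = UNSET;
        for (long sample : new long[] {7, 3, 9}) {
            min = updateMin(min, sample);
        }
        System.out.println(min); // 3
    }
}
```

The trade-off discussed above: -1 must then be treated as a marker everywhere the value is read, whereas a large sentinel silently behaves like a real value.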
still failing: ITestS3AUnbuffer.testUnbufferStreamStatistics. Issue: valid regression, or was the test previously broken? Change-Id: I218b484347737ef38822f9bf78fa18218254f4a7
- fixed failing unbuffer test
- checkstyles
- fighting findbugs
- listing iterators' toString methods print stats, as do others
Change-Id: I06b1b21b2f6686d18ed80de24d4c30835a7521ba
This is actually based on my PoC of using this code through Parquet: we need to move it to using RemoteIterable through its chained work, and we want to keep collecting statistics. This is a movement and extension of the S3A listing code to do this. Making this available for Parquet &c forced me to make some of the functional classes in fs.impl public (FunctionsRaisingIOE and WrappedIOException).
- new package fs.functional for this stuff
- moved the function declarations
- created superclass for WrappedIOE, in case anything was using it
Really needs some tests of this stuff, as while Listing validates the correct codepaths, they don't try to break them. Change-Id: I178509cc60e19b6f2fab09ffd15c6b6986c79095
Change-Id: I31a7a25624f6e10711ac6e7fa205be4bd7515a85
+ marker tools will show stream stats @ verbose; shows that we can pick this up in our own tooling. Change-Id: I4b0437b5179a43a65f2c5cc6d604dfe3593653f8
By making the methods to create a DurationTracker their own interface, and by moving that and the DurationTracker interface into the public package, it becomes possible to pass references to a factory into other classes without having to know whether it's been implemented by an IOStatisticsStore or a simple stub. This is used in Listing to call into ListingCallbacks; it would also be how something like the SemaphoredExecutor could track wait times. Change-Id: If75eb753bc84777aaaeb6e99a2ce1dad07eec8f6
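The factory idea described above, where callers measure durations without knowing whether anything real records them, might be sketched like this (hypothetical interfaces modelled on the description, not the PR's actual signatures):

```java
// Hypothetical sketch of a duration-tracker factory: callers receive a
// factory, so they need not know whether durations land in a full
// statistics store or a no-op stub. Names are illustrative.
public class DurationDemo {

    /** Tracks one operation; closing it records the elapsed time. */
    public interface DurationTracker extends AutoCloseable {
        @Override
        void close();   // no checked exception: recording cannot fail
    }

    /** Creates trackers; an IOStatistics store or a stub can implement this. */
    public interface DurationTrackerFactory {
        DurationTracker trackDuration(String operation);
    }

    /** A stub factory for contexts with no statistics collection. */
    public static final DurationTrackerFactory STUB =
        operation -> () -> { /* discard the measurement */ };

    // The tracked work runs inside try-with-resources, so the duration
    // is recorded (or discarded) however the operation exits.
    public static long timedOperation(DurationTrackerFactory factory) {
        try (DurationTracker ignored = factory.trackDuration("object_list_request")) {
            return 42L;   // stand-in for the tracked work
        }
    }

    public static void main(String[] args) {
        System.out.println(timedOperation(STUB));
    }
}
```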
SemaphoredExecutor does take a duration tracker factory, and most of the S3A statistics are now such factories, so we can track the duration of block acquisition time, etc. We're lagging here on how well this stuff is being incorporated back into the core filesystem statistics. I'm thinking:
* S3A FS implements IOStatsSource
* there's an uber source which tracks everything
* on close() of a stats source, it's updated
* and S3A Instrumentation and Statistics are updated to match. Really, Statistics should just serve up the counters.
? What does this do for Metrics? Unsure.
? How to propagate stats from long-lived S3 subsidiary classes? We do that already in places. For duration tracking though, we'd want something which could either create two DurationTrackers (inefficient) or chain back into a second one which would let us forward this stuff from the inner to the outer, so using exactly the same System.currentTime values.
Change-Id: If4b404e45129fea4384b1943d3993d42e2474853
Change-Id: Ib54ff0505cf6f98c308da553f6c5b2b9b0e11daf
* If we add a forEach() method, we can log @ debug any IOStatistics, without the calling code needing to know anything about the interface.
* If this is done for operations which go through remote iterators in our own code, we start logging operation costs ourselves. Slick.
* Done that switch in S3AUtils, but not in production code elsewhere.
Change-Id: I31167419c7a2755d6910127400820d6fee4945d6
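The forEach() idea above, letting any statistics holder be rendered for a debug log with no knowledge of its concrete type, could be sketched as (stand-in interface, illustrative names):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.StringJoiner;
import java.util.function.BiConsumer;

// Sketch of the forEach idea: the logging side only needs a callback
// entry point, not the statistics interface itself. Illustrative names,
// not Hadoop's actual API.
public class ForEachLoggingDemo {

    /** Anything that can replay its counters into a callback. */
    public interface CounterSource {
        void forEach(BiConsumer<String, Long> action);
    }

    // Render all counters as a single debug-friendly line.
    public static String describe(CounterSource stats) {
        StringJoiner joiner = new StringJoiner(", ", "(", ")");
        stats.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new TreeMap<>(
            Map.of("stream_write_bytes", 4L, "committer_jobs_completed", 1L));
        // Map.forEach has a compatible shape, so a method reference works.
        System.out.println(describe(counters::forEach));
    }
}
```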
* Docs
* Javadocs
* Ongoing work on making multithreading resilient to changes in the maps, so lining up for support for dynamic map expansion
Change-Id: Id8220e40cdba9c91137bf0229f0daea067387103
- move to hadoop.util.functional
- add unit tests
- use in more code to see how well they work
- tune close() and factory names
I'm happy with them internally; we do need to mark as unstable at first though. Change-Id: I6493412c2de53468f8c6f6f2184be07db12c00b0
💔 -1 overall
Closing this; best to rebuild as a new PR atop trunk. For the next PR I plan to split into: the IOStatistics API in hadoop-common, and the s3a fs impl. @mehakmeet has been working on an abfs version. Trickier than I'd like, but it means we can get the common one reviewed and in early.
Superseded by #2323 and #2324