Add hash.* field set #426
Conversation
```yaml
type: group
short: Hashes, usually file hashes.
description: >
  The hash fields represent different hash algorithms and their values.
```
I recommend establishing an expected format of the hashes such as lowercase hex.
Do different implementations for a given hash algorithm sometimes use upper and lower case? If this happens, perhaps we need to address it.
But we also have to think of people copying a hash in their search box. If we mandate simple lowercasing on ingest, searching for an uppercase hash will not yield results.
I think we may want to leave this unaddressed in this PR, and explore the idea of using the `lowercase` normalizer (`normalizer: lowercase`). It wasn't a big priority when we determined that protocol names should be lowercased, as they're typically only a few characters long, so it's no big deal re-typing them.
But lowercasing hashes by hand is another scale of annoyance :-)
Looking at our current lowercasing guidance, I realize that post-asciidoc move, it refers to a documentation section that no longer exists. We'll address that shortly.
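For reference, the `lowercase` normalizer idea discussed above could be declared roughly like this in an index mapping (a sketch, not part of this PR; the normalizer name and the single `hash.md5` field shown are illustrative):

```json
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "hash": {
        "properties": {
          "md5": { "type": "keyword", "normalizer": "lowercase_normalizer" }
        }
      }
    }
  }
}
```

Because a keyword normalizer is applied both at index time and at query time, a user pasting an uppercase hash into the search box would still get matches, which addresses the copy/paste concern above.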
How would the file hash be different from the process hash? Are they both hashes of files?
Thanks @cwurm!
Looking at these field definitions, I wonder if we could trim down the amount of hashes to only the most widely used. We can document the field set so that it's clear people can add their own hashes as well there, if they use other ones (guidance: lowercase, underscores between words). This way we will avoid having to add more algorithms every month :-)
I think we should also consider giving guidance on how to query for a hash if someone is uncertain which algo was used to produce the hash. Of course in most cases (e.g. with a threat feed in hand) you'll know which type it is. But I'm thinking of people sending each other hashes. Not sure if this scenario is likely, though.
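If we do document a query pattern for the "unknown algorithm" case, one possibility is a simple OR across the candidate fields; digests produced by different algorithms have different lengths and effectively never collide, so at most one clause matches. A hedged sketch (field names assume this proposal's `hash.*` set, value is illustrative):

```json
{
  "query": {
    "bool": {
      "should": [
        { "term": { "hash.md5":    "5d41402abc4f2a3f6b816a0c82dda35e" } },
        { "term": { "hash.sha1":   "5d41402abc4f2a3f6b816a0c82dda35e" } },
        { "term": { "hash.sha256": "5d41402abc4f2a3f6b816a0c82dda35e" } }
      ],
      "minimum_should_match": 1
    }
  }
}
```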
@andrewkroh Also brings up a good point about differences in hashing different kinds of payloads. Within ECS, I definitely want hashes to be usable for files and processes, but also for network payloads. If the FIM tells me the hash of a file which is available for download via a webserver: would a network monitoring agent compute the same hash for the payload of that file transfer?
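Whether a FIM hash and a network monitor's payload hash agree comes down to whether both ends hash exactly the same bytes. A small illustration using only `hashlib` (not tied to any agent; the payloads are made up):

```python
import hashlib

file_bytes = b"#!/bin/sh\necho hello\n"    # bytes as stored on disk
transferred = b"#!/bin/sh\necho hello\n"   # byte-identical payload -> same digest
modified = b"#!/bin/sh\r\necho hello\r\n"  # e.g. line-ending rewriting in transit

# Identical bytes always produce identical digests, regardless of source.
print(hashlib.sha256(file_bytes).hexdigest() == hashlib.sha256(transferred).hexdigest())  # True

# Any transformation of the payload (CRLF conversion, compression, etc.) breaks the match.
print(hashlib.sha256(file_bytes).hexdigest() == hashlib.sha256(modified).hexdigest())     # False
```

So for raw binary transfers the hashes should line up, but anything that rewrites the payload in transit (as can happen with text files) will produce a different digest.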
If we agree to trim down the list of hashes a bit, we'd need to get the most widely used ones in there. MD5 and SHA1, I suppose. But other than that, I'll have to rely on other people's take.
Perhaps @MikePaquette you can pitch in on this?
The subtext of my question about FIM vs network monitoring agent is: do we need to support additional meta-data per hashing result?
My idea was that process hashes would be the hashes of their executables. To make the expectations clear, it would be nice if we could have separate documentation for each.
I would think it's unlikely for hashes to be represented in different encodings. Hexadecimal is the only one I've ever seen, so from my perspective, it would be reasonable to expect that. Happy to add a note to the documentation as @andrewkroh suggested.
Ah, I should have been a bit clearer about what I meant here. I'm pretty sure it wouldn't differ for binary transfers, but perhaps this can happen for text files (think code files: js, php, etc).
Another thing I'm wondering is whether we already need to make it reusable. Is there a case where one would have both a file hash and a process hash in the same event?
@MikePaquette I'd like your take on a few things here, when you have a moment:

**Official hashes**

We want to trim down the list of hashes in this proposal, and tell people how to add additional hashes. The goal is that we don't want to maintain a complete list of hashes (aka add new hashes every week). What's your take on which ones we should include? The idea would also be to document the hash field set such that if people use a different hash, they just nest it as well under `hash.*`.

**Reusable**

The current proposal is to introduce hash as reusable under `file` and `process`. I'm questioning this, and wondering whether we actually expect both. Events would look like:

```json
{
  "process": { "executable": "/usr/bin/bash" },
  "hash": { "md5": "..." }
}
```

and

```json
{
  "file": { "path": "/usr/bin/bash" },
  "hash": { "md5": "..." }
}
```
I think so. An example would be a process executable hash and a file hash representing either one or more files specified on the command line, or a process reading/writing file(s) while it's running. I'm not sure if anything is generating this kind of data at the moment, but it's something I've thought about. Just in general, there can be multiple files in an event, and in fact we already have this situation in Auditbeat.
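With the reusable nesting, the "both in one event" case described above could look like this (a sketch; the path and placeholder values are illustrative):

```json
{
  "process": {
    "executable": "/usr/bin/bash",
    "hash": { "md5": "..." }
  },
  "file": {
    "path": "/etc/passwd",
    "hash": { "md5": "..." }
  }
}
```

Nesting under each entity keeps it unambiguous which hash belongs to the process executable and which to the file being accessed.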
Since these are all "extended" level fields, and Beats implementations that use them already exist, I am not opposed to including the entire set in the ECS spec. I think this will result in more compatibility than if we encourage users to define their own custom hash fields.
From a search perspective, I can see both positions here. I think the multiple hashes in one event is the case that could tip the odds one way or the other. How are we going to handle that case?
Heads up on the leading wildcard queries: they are purely a Kibana trick, not a core Elasticsearch feature.

With that said, I think I've been convinced that we need the reusability. If an event describes a file interaction by a process, and a hash for each is calculated, it should not be ambiguous which hash matches which entity.

So with that said, I do think there would be value in trimming down the list somewhat, because the list is quite long, and I wonder how many of these hashes have been added only for completeness' sake. Or are all of these shades of SHA hashes actually being used by many threat indicator sources / detection systems? If they are all legitimately used, I could go with adding them all. But I'd need to be convinced :-)
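For the record, querying across all hash subfields without naming each algorithm can also be done with a field wildcard in `query_string`, which is a core Elasticsearch feature (a sketch; the field names assume the nested `file.hash.*` layout and the value is illustrative):

```json
{
  "query": {
    "query_string": {
      "fields": ["file.hash.*"],
      "query": "5d41402abc4f2a3f6b816a0c82dda35e"
    }
  }
}
```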
@cwurm When you fix the changelog conflict, please keep your line above the line for PR 439, to reduce chances of further conflicts, if we merge other things before this. It will also keep the entries loosely ordered by PR. |
LGTM. I am OK with the longer list or the shorter list.
Ok, so my takeaway is three things to be done. I'll make those changes.
I've trimmed down the list of hashes to MD5, SHA1, SHA256, and SHA512.
@webmat @andrewkroh @MikePaquette Let me know if this looks good now.
schemas/hash.yml (outdated)

```yaml
  The hash fields represent different hash algorithms and their values.

  Field names for common hashes (e.g. MD5, SHA1) are predefined. Add fields for other hashes
  by lowercasing the hash algorithm and using underscore separators as appropriate (snake case).
```
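The naming rule in this description ("lowercase the hash algorithm and use underscore separators") can be sketched mechanically. The helper below is hypothetical, shown only to make the convention concrete; it is not part of the schema or any tooling:

```python
import re

def hash_field_name(algorithm: str) -> str:
    """Lowercase the algorithm name and replace separator runs with underscores (snake case)."""
    return re.sub(r"[^a-z0-9]+", "_", algorithm.lower()).strip("_")

print(hash_field_name("SHA3-512"))  # sha3_512
print(hash_field_name("TLSH"))      # tlsh
```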
Small clarification: "the hash algorithm name". I'd also give an example at the end of the sentence, something like "for example `sha3_512`".
I'm good with this trimmed down list. Thanks @cwurm!
One small nit above, not the end of the world. Then I'm good to merge this.
Thanks @webmat, I fixed the description. @MikePaquette you requested changes to this, does this look good now?
Adds a `hash.*` field set containing various hash algorithms and their values. It is meant to be nested under other entities, notably `file` (file hashes) and `process` (hash of the process executable).

These fields are already used by the Auditbeat FIM (`file_integrity`) module (see docs), and in elastic/beats#11722 I'm proposing to add them to the Auditbeat System process dataset as well.

Note: The Auditbeat FIM module puts these in a top-level `hash.*` object. I personally think it makes more sense to have it nested under the entity that is hashed (there might even be multiple hashes in one event), which is why I marked it as `reusable.top_level: false`. File hashes should then go under `file.hash.*`.

Resolves #388.