Gramify is an analysis tool built with the purpose of enhancing attacks on complex passwords through the use of n-grams. Gramify offers three types of n-gram analysis
- Word
- Character
- Charset
These each perform n-grams at their respective levels.
Those unfamiliar with the term will most easily understand it as the n words that follow each other naturally. At a word level the sentence: "I am writing a program" can be split at 2-gram level into: ["I am", "am writing", "writing a", "a program"]
. at 3-gram level into: ["I am writing", "am writing a", "writing a program"]
. This can also be done at a character level for example with "abc defg" into the 3-gram ["abc", "bc ", "c d", " de", "def", "efg"]
. Logically you can imagine that using this on books, or song lyrics can turn into a powerful analytical form where you can extract quotes or find words commonly used together such as the words: "I am", "He is" instead of: "capricorn icecream".
The options offered by the gramify as of version 0.8 are as follows:
gramify.py word <input_file> <output_file> [--min-length=<int>] [--max-length=<int>]
--min-length refers to the minimum amount of words that ngrams should be. Ergo. at least <int>
words (Default: 1)
--max-length refers to the maximum amount of words that ngrams should be. Ergo. at least <int>
words (Default: 10)
Expecting long quotes or lyrics? Increase the --max-length, the penalty is often minor.
Example of input file format:
But now that I'm home feels like I'm in heaven
See I been travelin'
Ooh, whoah ooh whoah
I'm in heaven
Oh and I'm, feeling right at home
Feeling right at home
Feeling like I'm in heaven
It's like I'm in heaven
It's like I'm in heaven, ooh we oh
Output is unsorted data containing duplicates like the word "It's" and "I'm" as 1-gram or "I'm in heaven" as 3-gram. Sorting them as recommended (sort by occurrence) will put the best at the top, so you can HEAD the output file if the data is too much. Read more: https://en.wikipedia.org/wiki/N-gram
Some recommended commands would be:
gramify.py word <input_file> <output_file>
gramify.py word <input_file> <output_file> --max-length=32
hashcat -a0 -m0 <hashlist> <output_file> -r <sentence_mangling_rules>
hashcat -a1 -m0 <hashlist> <output_file> <output_file> -j"$ "
The options offered by the gramify as of version 0.8 are as follows:
gramify.py character <input_file> <output_file> [--min-length=<int>] [--max-length=<int>] [--rolling]
--min-length refers to the minimum amount of characters that ngrams should be. Ergo. at least <int>
characters (Default: 4)
--max-length refers to the maximum amount of characters that ngrams should be. Ergo. at least <int>
characters (Default: 8, or 32 if rolling)
--rolling explained later
k-grams or character-based n-grams are great for analyzing what passwords start or end with. Therefore it is split up into start_ mid_ and end_. The start_ is ideal with hashcat -a6 using it as the input dictionary with mask appended to it On the opposite end is the end_ which is great with hashcat -a7 using it as the end of the word with a mask prepended to it. Additionally you can use start_, mid_ and end_ in any combination using -a1 or combinatorX.exe (from hashcat-utils) to combine them back together in any combination available, resulting in probable passwords.
The start_ will contain ^.{--min-length, --max-length}
or everything from the start of the line until everything between --min-length and --max-length.
By default this would be the regex: ^.{4,8}
The end_ will contain .{--min-length, --max-length}$
or everything from the start of the line until everything between --min-length and --max-length.
By default this would be the regex: .{4,8}$
The mid_ section will contain the remainder of whatever is availabe: Seeing start_, mid_ and end_ as separate regex groups you could represent it as this: ^(.{--min-length, --max-length})(.*?)(.{--min-length, --max-length})$
--rolling addresses some of the limitations this has. The benefit of having them split is that you have 3 different parts that each have a specific function. But sometimes you're not looking for the specific start, mid, end but more the classic k-gram as specified before. This would be it. It produces one file that has character-based ngram for all lengths.
Some recommended commands would be:
gramify.py character <input_file> <output_file>
gramify.py character <input_file> <output_file> --max-length=128 (this would essentially empty out the mid_ file.
gramify.py character <input_file> <output_file> --rolling
hashcat -a0 -m0 <hashlist> start_<output_file> mid_<output_file> end_<output_file> -r <popular rules>
hashcat -a0 -m0 <hashlist> start_<output_file> mid_<output_file> end_<output_file> -r <top_1500.rule> -r <top_1500.rule>
hashcat -a6 -m0 <hashlist> start_<output_file> ?a?a?a?a -i
hashcat -a7 -m0 <hashlist> ?a?a?a?a end_<output_file> -i
hashcat -a1 -m0 <hashlist> start_<output_file> mid_<output_file>
hashcat -a1 -m0 <hashlist> start_<output_file> end_<output_file>
hashcat -a1 -m0 <hashlist> mid_<output_file> start_<output_file>
hashcat -a1 -m0 <hashlist> mid_<output_file> end_<output_file>
hashcat -a1 -m0 <hashlist> end_<output_file> start_<output_file>
hashcat -a1 -m0 <hashlist> end_<output_file> mid_<output_file>
Read more: https://nlp.stanford.edu/IR-book/html/htmledition/k-gram-indexes-for-wildcard-queries-1.html
gramify.py charset <input_file> <output_file> [--min-length=<int>] [--max-length=<int>] [--mixed]
--min-length refers to the minimum amount of characters that each word should have. Ergo. at least <int>
characters (Default: 4)
--max-length refers to the maximum amount of characters that each word should have. Ergo. at least <int>
characters (Default: 32)
--mixed do not make a distinction between upper and lowercase or upper, lowercase and numeric in two different passes
--filter make use of the start, mid and end filters to combine and only grab the first element, mid elements, or last element.
This type of n-gram is more focused on character set boundries. Moving from UPPERCASE to lowercase. From Digits to lowercase or vice versa. This way you're able to take the passwords (assuming a default --min-length of 4):
password123456 -> [password, 123456]
PASSword123123magicman -> [PASS, word, 123123, magicman]
THEBESTINTHEWORLD54321 -> [THEBESTINTHEWORLD, 54321]
there are a lot of things to say -> [there, things]
This can be great for extracting words or common patterns out of passwords, removing punctuation or discovering common themes. From here on we can use rules on our newly discovered words to find new passwords. Gramify builds on this concept by allowing an Uppercase character to go to a lowercase character, but only if it's the first in the word allowing the capture of items like PassWord123456 -> [Pass, Word, 123456]
.
An example of --filter using the above examples and the following available filters: solo, start, mid, and end, startmid, midend, startend:
python3 gramify.py charset hashmob.net_2022-07-03.found test.txt --filter 'start,mid,end,startmid,midend'
password123456 will be converted into [password, 123456] which will have:
password => start
password123456 => startend
123456 => end
PASSword123123magicman will be converted into [PASS, word, 123123, magicman] which will have:
PASS => start
word123123 => mid
magicman => end
PASSword123123 => startmid
word123123magicman => midend
password will be converted into [password]: which falls under 'solo' and therefore isn't written to a file as it's not part of our filters.
If you want even more options: using the --mixed will help with short words with many upper and lowercase like:
PaSSwOrd123123 -> [PaSSwOrd, 123123]
These will come in addition to the other passwords. Current settings do not allow for exclusive mixed generation.
Inspired by: https://github.com/hops/pack2 (https://github.com/hops/pack2/blob/master/src/cgrams.rs)