After feedback about how long it took to create the edges in the NetworkX graph, we modified the `FreqDist` object to memoize calls to `N`, `B`, and `M`. This means that, on a per-edge basis, far fewer complete traversals of the distribution are carried out. We have already observed performance improvements on the order of minutes as a result. The graph also now carries more information: edges are weighted by frequency, by count, and by L1 norm, and the graph itself carries the email count and file size alongside its other data.
Released: Wednesday, July 6, 2016
Contributors: Benjamin Bengfort
Changes
- Added a `norm` method to the FreqDist and an associated `M` property for maximal value
- Memoized the frequency distribution so multiple calls to `M`, `B`, and `N` would not be recomputed
- Added more data to graph including filesize of mbox and number of emails
- Added more weights to edges including freq (weight), norm, and count
- Added tests for the stats module
- Added an fdel to the memoize descriptor
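
The memoization described in this release can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `memoized` descriptor, the `_memo_` attribute prefix, the `add` method, and the dict-backed `FreqDist` are all hypothetical, and the sketch assumes that `N` is the total count, `B` is the number of distinct samples, `M` is the maximal count, and `norm` divides a count by `M`.

```python
class memoized:
    """Descriptor that caches a computed value on the instance.

    Deleting the attribute (the "fdel" behavior) clears the cached value
    so that it is recomputed on the next access.
    """

    def __init__(self, fget):
        self.fget = fget
        self.name = "_memo_" + fget.__name__  # hypothetical cache key

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.name not in instance.__dict__:
            # First access: traverse the distribution once and cache.
            instance.__dict__[self.name] = self.fget(instance)
        return instance.__dict__[self.name]

    def __delete__(self, instance):
        # Invalidate the cache; a missing entry is not an error.
        instance.__dict__.pop(self.name, None)


class FreqDist:
    """Simplified stand-in for a frequency distribution."""

    def __init__(self, samples=()):
        self.counts = {}
        for sample in samples:
            self.add(sample)

    def add(self, sample):
        self.counts[sample] = self.counts.get(sample, 0) + 1
        # The counts changed, so drop any cached statistics.
        del self.N, self.B, self.M

    @memoized
    def N(self):
        """Total number of observations (assumed meaning)."""
        return sum(self.counts.values())

    @memoized
    def B(self):
        """Number of distinct samples (assumed meaning)."""
        return len(self.counts)

    @memoized
    def M(self):
        """Maximal count of any single sample (assumed meaning)."""
        return max(self.counts.values()) if self.counts else 0

    def freq(self, sample):
        """Count relative to the total number of observations."""
        return self.counts.get(sample, 0) / self.N if self.N else 0.0

    def norm(self, sample):
        """Count relative to the maximal count (assumed definition)."""
        return self.counts.get(sample, 0) / self.M if self.M else 0.0


fd = FreqDist(["a", "b", "a", "c", "a"])
print(fd.N, fd.B, fd.M)            # each statistic is computed once, then cached
print(fd.freq("a"), fd.norm("b"))  # reuses the cached N and M
```

With this arrangement, computing `freq` or `norm` for every edge reuses the cached `N` and `M` instead of traversing the distribution again, and any mutation deletes the cached values so they cannot go stale.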