Version 1.3

@bbengfort bbengfort released this 06 Jul 17:50

After feedback that creating the edges in the NetworkX graph was taking too long, we modified the FreqDist object to memoize calls to N, B, and M. As a result, far fewer complete traversals of the distribution are carried out per edge, and we have already observed run times improve by minutes. The graph also now carries more information: edges are weighted by frequency, by count, and by L1 norm, and the graph itself records the email count and file size alongside its other data.
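
To illustrate the memoization described above, here is a minimal sketch and not the project's actual code. It assumes that FreqDist behaves like a `collections.Counter`, that N is the total count, B the number of distinct samples, and M the maximal count. The `memoized` descriptor and its cache key naming are hypothetical.

```python
from collections import Counter


class memoized(object):
    """Hypothetical caching descriptor (an illustration only)."""

    def __init__(self, fget):
        self.fget = fget
        self.key = "_memo_" + fget.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.key not in instance.__dict__:
            # First access: compute once and store the result on the instance.
            instance.__dict__[self.key] = self.fget(instance)
        return instance.__dict__[self.key]

    def __delete__(self, instance):
        # Deleting the attribute discards the cached value, so the next
        # access recomputes it.
        instance.__dict__.pop(self.key, None)


class FreqDist(Counter):
    """Sketch of a frequency distribution with memoized statistics."""

    @memoized
    def N(self):
        # Assumed meaning: total number of observations (full traversal).
        return sum(self.values())

    @memoized
    def B(self):
        # Assumed meaning: number of distinct samples.
        return len(self)

    @memoized
    def M(self):
        # Maximal count in the distribution (full traversal).
        return max(self.values()) if self else 0
```

In this sketch the cache is not invalidated automatically: after mutating the distribution, `del dist.N` (and likewise for B and M) forces the next access to recompute. When the distribution is read once per edge and rarely changed, each statistic costs one traversal in total rather than one per edge.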

Released: Wednesday, July 6, 2016
Contributors: Benjamin Bengfort

Changes

  • Added a norm method to the FreqDist and associated M property for maximal value
  • Memoized the frequency distribution so that repeated calls to M, B, and N are not recomputed
  • Added more data to graph including filesize of mbox and number of emails
  • Added more weights to edges including freq (weight), norm, and count
  • Added tests for the stats module
  • Added an fdel to the memoize descriptor
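
The edge weights listed above could be derived from a frequency distribution roughly as follows. This is a sketch under stated assumptions: the release notes do not give the formulas, so `weight` is taken to be the relative frequency (count divided by N) and `norm` the count divided by the maximal value M. The actual norm used by the project may differ. The helper name `edge_weights` is hypothetical.

```python
from collections import Counter


def edge_weights(dist, key):
    """Return assumed weight attributes for one edge of the graph.

    dist maps edge keys to raw counts; the formulas below are guesses
    at what "freq", "norm", and "count" mean, not the project's code.
    """
    if not dist:
        raise ValueError("empty distribution")
    count = dist[key]
    total = sum(dist.values())  # N: total count over all edges
    peak = max(dist.values())   # M: maximal count
    return {
        "count": count,            # raw number of observations
        "weight": count / total,   # assumed: relative frequency
        "norm": count / peak,      # assumed: count scaled by the maximum
    }


# Example distribution keyed by (sender, recipient) pairs.
emails = Counter({("alice", "bob"): 3, ("alice", "carol"): 1})
attrs = edge_weights(emails, ("alice", "bob"))
```

A dictionary like `attrs` can be passed to NetworkX as keyword arguments, for example `G.add_edge(u, v, **attrs)`, so that each edge carries all three weights. Note that this helper recomputes the total and the maximum on every call, which is exactly the per-edge cost that memoizing N and M avoids.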