Switch to classifier that supports Complement Model #2

dgourd · 2019-10-09T03:03:55Z

I'm not sure, but it looks like ClassifierReborn uses either a Gaussian or a Multinomial model for classification. I think a Complement model is better for our purposes. Essentially, a complement model estimates each class's word weights from the training documents of all the other classes (the class's complement) and assigns a document to the class whose complement matches it least, while the standard models only look at the documents within each class.

Complement models also work better with skewed training datasets: the decision boundaries learned by the standard models get biased toward the classes that have more training examples. The paper linked here goes into more detail on the advantages.

https://github.com/id774/naivebayes
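
A minimal Ruby sketch of the complement idea, assuming the multinomial Complement Naive Bayes formulation (Rennie et al., 2003): for each class, word weights are estimated from the training documents of every other class, and a new document goes to the class whose complement fits it worst. The class name `ComplementNB`, its `train`/`classify` methods, and the `alpha` smoothing parameter are hypothetical; this is not the API of ClassifierReborn or id774/naivebayes, and it omits the paper's TF-IDF transform, weight normalization, and class prior. Tokens are assumed to be pre-split strings.

```ruby
# Minimal Complement Naive Bayes sketch (multinomial, after Rennie et al., 2003).
# ComplementNB is a hypothetical name, not the API of ClassifierReborn or
# id774/naivebayes. TF-IDF, weight normalization, and the prior are omitted.
class ComplementNB
  def initialize(alpha: 1.0)
    @alpha = alpha.to_f                              # additive smoothing (float avoids integer division)
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) } # label => { token => count }
    @vocab = {}                                      # every token seen in training
  end

  # tokens: array of strings for one document; label: its class
  def train(tokens, label)
    tokens.each do |t|
      @counts[label][t] += 1
      @vocab[t] = true
    end
  end

  # Returns the label whose complement explains the document worst
  # (lowest log-likelihood under the complement's word distribution).
  def classify(tokens)
    freq = tokens.tally
    scores = complement_log_weights.transform_values do |w|
      freq.sum { |t, f| w.key?(t) ? f * w[t] : 0.0 } # unseen tokens are ignored
    end
    scores.min_by { |_label, score| score }.first
  end

  private

  # For each class c: w[c][t] = log((N(~c, t) + alpha) / (N(~c) + alpha * |V|)),
  # where N(~c, ...) counts tokens in the training docs of every class except c.
  def complement_log_weights
    labels = @counts.keys
    labels.to_h do |c|
      comp = Hash.new(0)
      (labels - [c]).each { |o| @counts[o].each { |t, n| comp[t] += n } }
      denom = comp.values.sum + @alpha * @vocab.size
      [c, @vocab.keys.to_h { |t| [t, Math.log((comp[t] + @alpha) / denom)] }]
    end
  end
end
```

Usage would be one `train(tokens, :relevant)` or `train(tokens, :not_relevant)` call per labelled paper, then `classify(tokens)` on new papers (the label symbols are placeholders).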

dgourd · 2019-10-09T17:06:13Z

{:Accuracy=>0.2127193721157409, :Precision=>1.0, :Recall=>0.8509634820105108, :F1=>0.9194816540477577}

For testing, I randomly divided the sample dataset into four parts (quarters, not quartiles). I used the first three for training and the last one as the test set, which gave the results above. I only ran this once; more reliable metrics would come from 4-fold cross-validation, i.e. aggregating 4 runs in which each quarter is used once as the test set.
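
A sketch of the 4-fold procedure described above, assuming the examples fit in memory as an array. `k_folds`, `cross_validate`, and the fixed `seed` are hypothetical helpers, not part of any library: the data is shuffled once, dealt round-robin into `k` folds, and the caller's block runs once per fold with that fold as the test set.

```ruby
# Hypothetical helpers for k-fold cross-validation (k = 4 matches the four
# quarters described above).
def k_folds(examples, k: 4, seed: 42)
  shuffled = examples.shuffle(random: Random.new(seed))   # reproducible random split
  folds = Array.new(k) { [] }
  shuffled.each_with_index { |ex, i| folds[i % k] << ex } # deal round-robin into k folds
  folds
end

# Yields (train, test) once per fold, so every example is tested exactly once;
# returns the block's result for each fold.
def cross_validate(examples, k: 4, seed: 42)
  folds = k_folds(examples, k: k, seed: seed)
  folds.each_index.map do |i|
    train = (folds[0...i] + folds[(i + 1)..-1]).flatten(1) # all other folds
    yield train, folds[i]
  end
end
```

To aggregate, the block could return confusion counts for its fold; summing those across the 4 folds and computing the metrics once (micro-averaging) is one option, averaging the per-fold metrics is another.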

A precision of 100% means that everything classified as relevant was actually cancer relevant. A recall of ~0.85 means that ~15% of the cancer-relevant articles were classified as not relevant. Accuracy is very low because our dataset is very skewed. It isn't part of the F1 score anyway, since F1 is the harmonic mean of precision and recall only.
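
For reference, a sketch of the standard binary metrics computed from confusion-matrix counts, treating "cancer relevant" as the positive class (the `metrics` helper is hypothetical). With the reported precision 1.0 and recall 0.8509634820105108, F1 = 2PR/(P+R) ≈ 0.91948, matching the value above. Under these definitions, precision 1.0 means fp = 0, so accuracy = 1 − fn/N, which can never be below recall; the reported 0.21 accuracy may therefore come from a different definition in the evaluation code.

```ruby
# Standard binary-classification metrics from confusion-matrix counts,
# treating "cancer relevant" as the positive class.
def metrics(tp:, fp:, fn:, tn:)
  precision = tp.fdiv(tp + fp) # share of predicted-relevant papers that really are relevant
  recall = tp.fdiv(tp + fn)    # share of truly relevant papers that were found
  {
    Accuracy: (tp + tn).fdiv(tp + fp + fn + tn),
    Precision: precision,
    Recall: recall,
    F1: 2 * precision * recall / (precision + recall) # harmonic mean of P and R
  }
end
```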

Results are already very good, and I only tokenized individual words in the combined title and abstract. They could be improved with some pre-processing for things like chemical formulas, or by switching to n-gram tokenization. If we consistently get a precision of 1, we can automatically classify the papers predicted as relevant as cancer relevant and only have to adjudicate the papers predicted as not relevant, which is where the missed ~15% of relevant articles end up.
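
A possible n-gram tokenizer for the combined title and abstract, as a sketch only: the `tokenize` name, the regex, and the default `n: 2` are assumptions. Keeping hyphenated or apostrophe-joined terms such as "p53-null" as single tokens is a crude stand-in for the domain-specific pre-processing mentioned above, and the output is a flat bag of tokens that any bag-of-words classifier can consume.

```ruby
# Hypothetical tokenizer: lowercase word unigrams plus contiguous word n-grams
# (up to n words) from the combined title and abstract.
def tokenize(title, abstract, n: 2)
  # Alphanumeric runs, keeping internal hyphens/apostrophes (e.g. "p53-null")
  words = "#{title} #{abstract}".downcase.scan(/[a-z0-9]+(?:[-'][a-z0-9]+)*/)
  grams = words.dup
  (2..n).each do |size|
    words.each_cons(size) { |gram| grams << gram.join(" ") } # consecutive word windows
  end
  grams
end
```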

dgourd added a commit that referenced this issue Oct 9, 2019

Quick Complement Naive Bayes implementation. #2

17cf394