New classifier for alcoholic beverages #511

cuducos · 2020-01-03T21:01:38Z

What is the problem?

The problem was that digitized receipts were not machine-readable, and we could not afford to run OCR properly on all the images we had (although we tried). However, a couple of months ago the Chamber of Deputies started to offer electronic receipts.

Since we know their URL (thanks @giovanisleite for #501) and they are structured HTML documents (that is to say, machine-readable), we can now try a classifier that identifies alcoholic beverages in the reimbursements (which is not allowed).

We just need to take extra care to check whether the full amount of the electronic receipt was actually reimbursed (sometimes the Chamber of Deputies cuts alcoholic beverages from the reimbursements without any remark).
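To make that check concrete, here is a minimal sketch. It assumes we can obtain three amounts for a receipt: its total, the sum of the items flagged as alcoholic, and the value actually reimbursed. The function name and its parameters are hypothetical, not part of the existing codebase.

```python
from decimal import Decimal


def alcohol_was_reimbursed(receipt_total, alcohol_total, reimbursed):
    """Hypothetical helper: True if the reimbursed value can only be
    explained by at least part of the alcoholic items being paid.

    If the Chamber had cut every alcoholic item, the reimbursed value
    would be at most the receipt total minus the alcoholic items.
    """
    return reimbursed > receipt_total - alcohol_total


# Full receipt reimbursed, so the alcoholic items were paid too.
full = alcohol_was_reimbursed(Decimal("100.00"), Decimal("20.00"), Decimal("100.00"))

# Exactly the alcoholic share was cut, so nothing suspicious here.
cut = alcohol_was_reimbursed(Decimal("100.00"), Decimal("20.00"), Decimal("80.00"))
```

Using `Decimal` rather than `float` avoids rounding surprises when comparing monetary values.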

How can this be addressed?

I think the classifier should:

get the contents of the available electronic receipts
parse them
test them against a dictionary of possible names for alcoholic beverages (@brunopazzim drafted one in the early days of Serenata)
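The parsing and matching steps could look roughly like the sketch below, which uses only the Python standard library. It assumes nothing about the actual layout of the electronic receipts and simply collects every text node in the HTML. The term list is a hypothetical sample, not the dictionary drafted by @brunopazzim, and fetching the document is left out.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Hypothetical sample terms, for illustration only.
ALCOHOL_TERMS = {"cerveja", "chopp", "vinho", "whisky", "cachaca", "vodka"}


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def normalize(text):
    """Lowercases and strips accents, so 'Cachaça' becomes 'cachaca'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def find_alcohol_terms(html, terms=ALCOHOL_TERMS):
    """Returns the sorted terms that appear as whole words in the receipt."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = set(re.findall(r"[a-z0-9]+", normalize(" ".join(parser.chunks))))
    return sorted(words & set(terms))


sample = (
    "<table>"
    "<tr><td>CERVEJA LATA 350ML</td><td>5,00</td></tr>"
    "<tr><td>Cachaça</td><td>12,00</td></tr>"
    "<tr><td>Agua mineral</td><td>3,00</td></tr>"
    "</table>"
)
matches = find_alcohol_terms(sample)
```

Matching whole words keeps the sketch simple, but it misses multi-word names and abbreviations, which is one of the things an exploratory notebook could measure.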

Of course, we could start with an exploratory notebook at github.com/okfn-brasil/notebooks to test whether the results are worth it!

Who could help with this issue?

Anyone 💜

cuducos added analysis data collection labels Jan 3, 2020
