
When we run an analysis, what do we want to get back? #3

Open
cgreene opened this issue Jul 19, 2016 · 11 comments
@cgreene
Member

cgreene commented Jul 19, 2016

We need to design our results JSON so that we can later visualize the most important results via the results viewer from the UI team.

@autokad

autokad commented Jul 19, 2016

F1 Score

@autokad

autokad commented Jul 19, 2016

Confusion Matrix

@autokad

autokad commented Jul 19, 2016

Y Hat

@cgreene
Member Author

cgreene commented Jul 19, 2016

prediction scores

@cgreene cgreene added the task label Jul 20, 2016
@yl565
Contributor

yl565 commented Jul 20, 2016

Feature ranking and a list of selected features. For GLM: F-stat/t-stat and p-values of predictors, plus model goodness of fit.

@dhimmel
Member

dhimmel commented Jul 27, 2016

We should probably save the sklearn estimators representing any transformations and the classifier. The sklearn doc recommends pickle for estimator persistence. Pickle is a binary serialization format in Python. @dcgoss, @awm33, and others -- can we store binary files in our database?
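As a sketch, the pickle round-trip for estimator persistence could look like the following (the dict here is a stand-in for a fitted sklearn estimator, since the actual model object isn't specified in this thread):

```python
import pickle

# Stand-in for a fitted sklearn estimator (the real object would be,
# e.g., a trained classifier or transformer pipeline).
estimator = {'coef_': [0.5, -1.2], 'intercept_': 0.1}

# Serialize the object to bytes with pickle (protocol 4 is efficient
# and handles large objects).
blob = pickle.dumps(estimator, protocol=4)

# Later: restore the object from the stored bytes.
restored = pickle.loads(blob)
assert restored == estimator
```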

@dcgoss
Member

dcgoss commented Jul 28, 2016

@dhimmel
Member

dhimmel commented Jul 28, 2016

Python object serialization to base64 encoded text

@dcgoss cool. I think the following solution will work:

```python
import base64
import pickle

payload = ['a', 'list', 2, 'encode']
byte_pickle = pickle.dumps(payload, protocol=4)
base64_text = base64.b64encode(byte_pickle).decode()
# Save `base64_text` using a text field in the database
byte_pickle = base64.b64decode(base64_text.encode())
pickle.loads(byte_pickle)
```

FYI, `base64_text`, which would be saved in the database, is `gANdcQAoWAEAAABhcQFYBAAAAGxpc3RxAksCWAYAAABlbmNvZGVxA2Uu`.

@awm33
Member

awm33 commented Jul 28, 2016

@dhimmel base64 text is usually fine for small sizes. It can also be stored as text in JSON fields. How big are the binaries? Is `gANdcQAoWAEAAABhcQFYBAAAAGxpc3RxAksCWAYAAABlbmNvZGVxA2Uu` a typical example?

@dhimmel
Member

dhimmel commented Jul 28, 2016

I ran the pickle --> base64 --> text conversion on `best_clf` from the example notebook. The resulting string had 219,788 characters. I assume different types of classifiers will produce different sizes.

If I add an extra compression step, so the entire pipeline becomes:

```python
import base64
import pickle
import zlib

byte_pickle = pickle.dumps(best_clf, protocol=4)
byte_pickle = zlib.compress(byte_pickle)
base64_text = base64.b64encode(byte_pickle).decode()
```

Then base64_text is only 11,468 characters. @awm33, is that okay?
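For completeness, a sketch of the matching decode path, reversing each step (base64 decode, then decompress, then unpickle); the payload here is an illustrative stand-in for `best_clf`:

```python
import base64
import pickle
import zlib

# Encode: pickle -> compress -> base64 text (as in the comment above),
# using a stand-in payload rather than a real fitted classifier.
payload = {'model': 'stand-in', 'coef': [0.1, 0.2]}
base64_text = base64.b64encode(
    zlib.compress(pickle.dumps(payload, protocol=4))
).decode()

# Decode: base64 text -> decompress -> unpickle.
restored = pickle.loads(zlib.decompress(base64.b64decode(base64_text.encode())))
assert restored == payload
```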

@awm33
Member

awm33 commented Jul 28, 2016

@dhimmel Compressing is a good move. If we think this would go into the tens of megabytes or more, we may want to consider using blob storage such as S3 or GCS. Postgres can handle gigabytes of text, but it's not great for performance.
