Readable model dump. #43

BaiGang · 2015-11-18T02:11:21Z

Hi,

Currently all learning methods in wormhole save the resulting models in a binary format. This works well for machine learning competitions, i.e. when both training and prediction use wormhole components. However, in more general cases, where we train the models offline and want to apply them in an online component (in our case a server running on the JVM), the binary format is inconvenient. A readable model output in text format (or another exchangeable format such as protobuf) would be very welcome.

Thanks,
Gang

BaiGang · 2015-11-20T05:29:58Z

I addressed the readable dump of the DiFacto model by parsing the binary file saved via SaveModel, i.e. Save in KVStore and IVal AdaGradEntry in DiFacto.

Ideally we can abstract the Entry data and the internal storage in KVStore using protobuf. This would keep the I/O implementations neat and make our model results exchangeable across languages and platforms.
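
To illustrate the parse-and-dump approach described above, here is a minimal sketch. The record layout (a uint64 key, an int32 count, then that many floats) and the names DumpModelAsText and DumpModelToString are assumptions made up for this example; they are not the actual format written by SaveModel, so check the Save code in KVStore for the real layout before adapting it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// HYPOTHETICAL record layout, not the real wormhole/ps-lite format:
//   uint64_t key; int32_t n; float values[n];
// Reads complete records from `in` and writes one text line per record:
//   key<TAB>v0<TAB>v1...
// Returns the number of records written. Stops at the first truncated
// or malformed record.
inline std::size_t DumpModelAsText(std::istream& in, std::ostream& out) {
  std::size_t count = 0;
  std::uint64_t key = 0;
  std::int32_t n = 0;
  while (in.read(reinterpret_cast<char*>(&key), sizeof(key)) &&
         in.read(reinterpret_cast<char*>(&n), sizeof(n))) {
    if (n < 0) break;  // negative length: treat as corrupt input
    std::vector<float> vals(static_cast<std::size_t>(n));
    if (n > 0 &&
        !in.read(reinterpret_cast<char*>(vals.data()),
                 static_cast<std::streamsize>(vals.size() * sizeof(float)))) {
      break;  // truncated value array
    }
    out << key;
    for (float v : vals) out << '\t' << v;
    out << '\n';
    ++count;
  }
  return count;
}

// Convenience wrapper for in-memory buffers.
inline std::string DumpModelToString(const std::string& bytes) {
  std::istringstream in(bytes, std::ios::binary);
  std::ostringstream out;
  DumpModelAsText(in, out);
  return out.str();
}
```

For a file on disk, the same function can be called with a std::ifstream opened in binary mode and a std::ofstream for the text output. A text dump of this shape is straightforward to read from a JVM service, which is the use case raised at the top of this issue.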

BaiGang · 2015-11-20T05:31:49Z

So my proposal above is mainly related to ps-lite. I'll try it out and make a WIP pull request there.

mli · 2015-11-28T22:34:06Z

yeah, that's a good suggestion.

i'll add a tool to convert the binary model into an ascii format.

at the same time, i'm trying to refactor fm into a separate repo called dmlc/difacto, with two major changes:

1. having a single machine multiple threads implementation, which should process data <100GB easily on a single machine. it will also make python/R bindings easy to add
2. switching to the dev branch of ps-lite, which is a simplified version of the master branch. mxnet is using it now and it works well

i hope to get it done in a week.

CNevd · 2015-11-29T00:09:49Z

Very nice, Look forward to the changes :)

BaiGang · 2015-11-30T13:26:33Z

Thanks and looking forward to the changes. : )

BaiGang · 2015-12-30T09:40:50Z

Any update on this?

I'm also interested in the refactor of ps-lite. It has had no updates for two months. Is it finalized?

formath · 2016-07-01T10:52:51Z

@BaiGang "I address the readable dump of DiFacto model by parsing the binary file saved via SaveModel". Can you share the parsing method with me? Thanks.

CNevd · 2016-07-01T12:14:55Z

see dump.cc

toughJack · 2016-08-24T10:37:58Z

@BaiGang @mli
When I dump the model to text format, I found original feature ids are converted into new ids (large numbers). If I want to keep the original feature ids in model, how do I make it work?
Thanks!

mli · 2016-08-24T15:38:27Z

there is a revert key id function, I guess it is called in the data reader
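
As a rough illustration of why the dumped ids look like large numbers and how they can be mapped back: if the key transform is a plain byte-order reversal of a 64-bit id (an assumption; the real ReverseBytes in localizer.h may differ in detail), then applying the same transform a second time returns the original id. ReverseByteOrder and RecoverOriginalId below are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// HYPOTHETICAL stand-in for the key transform: reverses the byte order
// of a 64-bit key, so small ids spread over the whole uint64 range.
inline std::uint64_t ReverseByteOrder(std::uint64_t x) {
  x = (x << 32) | (x >> 32);  // swap 32-bit halves
  x = ((x & 0x0000FFFF0000FFFFULL) << 16) |
      ((x & 0xFFFF0000FFFF0000ULL) >> 16);  // swap 16-bit pairs
  x = ((x & 0x00FF00FF00FF00FFULL) << 8) |
      ((x & 0xFF00FF00FF00FF00ULL) >> 8);   // swap adjacent bytes
  return x;
}

// A byte reversal is its own inverse, so recovering the original feature
// id from a dumped key is just a second application.
inline std::uint64_t RecoverOriginalId(std::uint64_t dumped_key) {
  return ReverseByteOrder(dumped_key);
}
```

Under that assumption, feature id 1 is stored as 0x0100000000000000, and passing each dumped key through RecoverOriginalId in a post-processing step gives back the original ids without touching the training code.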

formath · 2016-08-25T02:50:13Z

@toughJack Maybe you should change code in localizer.h like this.

else if (sizeof(I) == 8) {
#pragma omp parallel for num_threads(nt_)
    for (size_t i = 0; i < idx_size; ++i) {
      // pair_[i].k = ReverseBytes(blk.index[i]);  // original: transformed key
      pair_[i].k = blk.index[i];  // keep the original feature id as the key
      pair_[i].i = i;
    }
}

CNevd · 2016-08-25T04:41:33Z

@formath @toughJack see issues/8 (CNevd/Difacto_DMLC#8)
Just commenting out pair_[i].k = ReverseBytes(blk.index[i]); will make the servers' key ranges imbalanced if your max key is small.

mli · 2016-08-25T04:43:06Z

you can manually set the max_key, so the servers will only partition that key range
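
To make the imbalance concrete, here is a small sketch that assumes the servers split the key range [0, max_key) into equal contiguous pieces. ServerForKey is a hypothetical helper written for this example, not a ps-lite function, and the real partitioning logic may differ.

```cpp
#include <cassert>
#include <cstdint>

// HYPOTHETICAL helper: index of the server that owns `key` when
// [0, max_key) is split into `num_servers` equal contiguous ranges.
// The last server also takes any remainder of the division.
inline int ServerForKey(std::uint64_t key, std::uint64_t max_key,
                        int num_servers) {
  assert(num_servers > 0);
  assert(max_key >= static_cast<std::uint64_t>(num_servers));
  assert(key < max_key);
  const std::uint64_t width = max_key / static_cast<std::uint64_t>(num_servers);
  std::uint64_t idx = key / width;
  if (idx >= static_cast<std::uint64_t>(num_servers)) {
    idx = static_cast<std::uint64_t>(num_servers) - 1;
  }
  return static_cast<int>(idx);
}
```

With 4 servers and max_key left at the full uint64 range, every raw id below 1000 lands on server 0. With max_key set to 1000, the same ids spread over servers 0 to 3 in blocks of 250.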

CNevd · 2016-08-25T04:45:25Z

@mli yes:)

formath · 2016-08-25T06:39:50Z

@CNevd Good suggestion. I always generate balanced uint64 feature ids offline, so I missed that. If the max key is small, setting max_key is indeed the right approach.

toughJack · 2016-08-25T09:11:34Z

@mli
I noticed that you mentioned a single-machine, multi-threaded implementation of FM:
"1. having a single machine multiple threads implementation, which should process data <100GB easily on a single machine. and also will be easy to have python/R bindings"
I could not find any documentation for the single-machine, multi-threaded version.
Does it work? If so, how do I set the relevant parameters and run it?
Thanks

mli · 2016-08-25T17:41:54Z

just run multiple workers on the same machine
try to use lbfgs implemented on dmlc/difacto

