
Bad performance on big amount of training data #164

Closed
sntp opened this issue May 9, 2016 · 17 comments

@sntp

sntp commented May 9, 2016

I trained the bot with ~17k conversations and now it takes a long time to respond. Are there ways to avoid this?
Training data: https://gist.github.com/sntp/221f53c48bec929ac36d0951b496fcbd

@gunthercox
Owner

Commit e5a9869 makes one small change to start addressing this by reducing the number of read and write transactions made to the database. I will continue to post updates on this ticket to track performance improvement changes.

gunthercox added the bug label May 24, 2016
@gunthercox
Copy link
Owner

Pull request #173 allows storage adapters to override an expensive method with a more efficient implementation. The get_response_statements method has been overridden on the MongoDB storage adapter, which should yield a significant improvement in performance.

@Nixellion

Nixellion commented Aug 19, 2016

What about using SQLite? Would that speed up the process? Is there an SQLite adapter?

I tried to do the same: I fed in a 3.5 MB training file with conversations from a social network, curious what kind of answers I'd get from that :D

At first it took about 40 minutes to train, and now it just gets stuck trying to answer.
I tried using MongoDB but got an error: pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [WinError 10061] No connection could be made because the target machine actively refused it

I guess that's because I need to download it and run the server, huh?

@sntp
Author

sntp commented Aug 19, 2016

Btw, why not use a SQL database?

@Nixellion

Oh, okay, MongoDB works fine now. Much faster. But that bulk insert error is annoying.

And I still think having SQLite as a standard option would be nice: it's much faster than JSON, but it's also just a single file and doesn't require installing anything but Python. Just a thought. No rush though.

@gunthercox
Owner

@Nixellion I'm glad you are getting better results with the MongoDB adapter. The JSON file adapter is really just meant for testing and development because it is limited by the fact that it has to write to the hard disk each time it needs to save.

Still looking into the bulk insert error, and I've opened a ticket to track the addition of a new SQLite storage adapter, #241.
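
For anyone who finds this thread later: once a SQLite-backed adapter exists, configuring it should look roughly like the sketch below. The adapter name (SQLStorageAdapter) and the database_uri parameter are taken from later ChatterBot releases and are an assumption here, not part of this discussion.

```python
# Sketch: a SQLite-backed bot, which keeps everything in a single local
# file without requiring a separate database server. The adapter name and
# the database_uri parameter may differ between ChatterBot versions.
from chatterbot import ChatBot

bot = ChatBot(
    "Example Bot",
    storage_adapter="chatterbot.storage.SQLStorageAdapter",
    database_uri="sqlite:///chatterbot.sqlite3",
)
```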

@Nixellion

Nixellion commented Aug 20, 2016

Cool, thanks!

Also, I found some old discussions back from 2014–2015 about making this bot smart enough to pass at least some Turing test questions, building sentences from words, etc. I hope you're still on it :)

@chenjun0210

Does it support parallel training?

@gunthercox
Owner

Parallel training is only supported if the database being used supports concurrent writes. The default file database that ChatterBot uses does not support concurrent writes, but if you use MongoDB it will.
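
As a point of reference, switching to the MongoDB adapter looks roughly like the sketch below. It assumes a local mongod instance is running on the default port; the adapter import path and parameter names vary between ChatterBot versions, so treat this as an illustration rather than an exact recipe.

```python
# Sketch: point ChatterBot at MongoDB so that several training processes
# can write to the same database concurrently. Assumes a local mongod on
# the default port; adjust the adapter path and parameters to your version.
from chatterbot import ChatBot

bot = ChatBot(
    "Example Bot",
    storage_adapter="chatterbot.storage.MongoDatabaseAdapter",
    database_uri="mongodb://localhost:27017/chatterbot-database",
)
```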

@chenjun0210

My data size is about...
I use MongoDB, but I don't know how to set the training parameters. Or is parallel training the default when I use MongoDB? Thanks a lot.

@chenjun0210

My data size is about 2 GB.

@gunthercox
Owner

You will probably need to do a bit of work to get the import process ready to bring in 2 GB of data in parallel. I would recommend breaking it up, if possible, into a few files of manageable size. You will then have to use Python's multiprocessing capabilities to start training processes on each subset of the data. This functionality isn't built into ChatterBot at the moment; if you are unsure how to accomplish this, feel free to ask any questions. Otherwise, I have opened a ticket to get support for this functionality added to ChatterBot (#354).
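
A rough sketch of that approach is below. It assumes the corpus has been split into line-delimited chunk files (the file names are hypothetical), a MongoDB-backed bot so concurrent writes are possible, and a trainer API that may differ between ChatterBot versions.

```python
# Sketch of the approach described above: train on several chunk files in
# parallel, one process per file. Assumes a one-statement-per-line format;
# adjust the parsing and the trainer calls to your ChatterBot version.
from multiprocessing import Process

from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer


def train_chunk(path):
    bot = ChatBot(
        "Example Bot",
        storage_adapter="chatterbot.storage.MongoDatabaseAdapter",
        database_uri="mongodb://localhost:27017/chatterbot-database",
    )
    with open(path, encoding="utf-8") as f:
        conversation = [line.strip() for line in f if line.strip()]
    ListTrainer(bot).train(conversation)


if __name__ == "__main__":
    # Hypothetical chunk files produced by splitting the large corpus.
    chunks = ["corpus_part_01.txt", "corpus_part_02.txt", "corpus_part_03.txt"]
    workers = [Process(target=train_chunk, args=(path,)) for path in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```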

@Martmists-GH
Contributor

I've noticed that #597, which uses ujson, has sped up processing a lot, though my training data is only ~300 MB in size. I recommend trying it out to see how much faster it goes.
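
For context, ujson exposes the same basic loads/dumps calls as the stdlib json module, so it can usually be swapped in as a drop-in replacement on I/O-heavy code paths. The snippet below just illustrates that idea; it is not the actual change made in #597.

```python
# ujson is a faster C implementation of the basic json loads/dumps calls,
# so it can often replace the stdlib module where only those two functions
# are used. Fall back to the stdlib module if ujson is not installed.
try:
    import ujson as json
except ImportError:
    import json

record = json.loads('{"text": "Hello, how are you?"}')
print(json.dumps(record))
```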

@jxfruit

jxfruit commented Oct 9, 2017

@Martmists Hi, have you solved the efficiency problems with the bot's training and testing? Can you share some thoughts on improving efficiency? Thanks.

@Martmists-GH
Contributor

One thing to note is to NOT use the default JSON storage. It's slow due to constant I/O, it's relatively unoptimized, and it uses the stdlib JSON module. I recommend writing your own or trying to find one online.

@jxfruit

jxfruit commented Oct 10, 2017

@Martmists I have used MongoDB as the storage adapter. However, responses are still very slow: with about 70k entries of data, a response takes 41 seconds. I am working on finding other ways to improve efficiency. How about you?

@gunthercox
Owner

I'm going to close this issue off; I don't believe there are any remaining actionable items here. Tickets have been created to implement changes that will help to improve response times. See #925 and its related tickets for further details.
