SampleApp goes to break mode in the last version (CNTK backend) #13
Comments
Hi @sharpwood,

Thanks for opening the issue! As you have noticed, this exception only happens when using the CNTK backend. Sadly, the fact that this impossible-to-catch exception has been raised means that something has corrupted the memory of the process, including internal data structures used by the CLR. The CLR has detected this corruption and stopped executing code because it would be unsafe to continue. As such, it is very likely that whatever is causing this memory corruption is unrelated to the line of code currently highlighted in your screenshot; the corruption was merely detected while that line was executing.

Since Keras# does not use unsafe operations and therefore never touches memory directly, the only possibility is that the memory corruption is being caused by CNTK itself. The most likely explanation is that Keras# is calling some of the CNTK APIs in an unexpected manner, CNTK is not performing enough argument checks to detect those incorrect calls, and it proceeds to execute those incorrectly defined operations, resulting in memory corruption.

The CNTK project should be releasing a new version tomorrow. We can try again with the new version to see if at least the error message improves, but for the time being, I would suggest experimenting with the TensorFlow backend instead.

Regards,
Hi @sharpwood,

Thanks for the update. I've just updated to CNTK 2.3, but the issue is still the same. In fact, if you comment out the lines in file CNTKFunction.cs (keras-sharp/Backends/CNTK.CPU/CNTKFunction.cs, lines 173 to 176 in 8134e8e) and replace them with

// this.trainer.TrainMinibatch(input_dict, isSweepEndInarguments: false, computeDevice: DeviceDescriptor.CPUDevice);
updated.Add(c.constant(this.trainer.PreviousMinibatchLossAverage()));
updated.Add(c.constant(this.trainer.PreviousMinibatchEvaluationAverage()));

thereby disabling the mini-batch training while still letting Keras# do everything else, you will see that the issue disappears. Keras# will still prepare the mini-batches, copy memory from C# to CNTK's NDArrayView/Value, and build the execution graph, just as before. The only difference is that this time we do not let CNTK update the model, and as you will see, we no longer run into memory issues during the training part (the model will never learn anything, though, since no weight update is being made).

Regards,