Examples: add ggml training for MNIST #908
Conversation
I imagine that ultimately your goal is to implement GPU support, but none of the functions in […]. It might be a good time to move the CPU backend code in […]. Alternatively, it might also be a good time to start porting the code in ggml to C++.
I don't really have strong feelings regarding C vs. C++ in either direction; the transformation of uniformly distributed random numbers to normally distributed numbers is pretty easy, though. I don't know if the C++ std library has support for SIMD instructions which could be used to speed up the transformation (but I think this will not be relevant for the current use cases of ggml).
Probably not, but since this is only done once during initialization, I don't think it needs to be very optimized. Mostly I was thinking about simplifying the implementation.
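As a side note, a minimal sketch of the uniform-to-normal transformation discussed above, using the C++ standard library; this is illustrative only, not code from the PR:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fill a buffer with normally distributed values for weight initialization.
// std::normal_distribution internally transforms uniformly distributed
// numbers from the engine, so no hand-written Box-Muller step is needed.
std::vector<float> random_normal(size_t n, float mean, float stddev, uint32_t seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> dist(mean, stddev);
    std::vector<float> out(n);
    for (float & x : out) {
        x = dist(rng);
    }
    return out;
}
```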
I refactored the MNIST code for the fully connected model to be mostly concentrated in […]. The workflow that I envision for the MNIST example is that someone would first run the PyTorch training example (which downloads the MNIST dataset and writes a GGUF model to disk) and then the ggml train and eval examples. There are issues with downloading the MNIST dataset from https://yann.lecun.com/exdb/mnist/ that would be fixed this way, without the need to add the dataset to the ggml repository.
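To illustrate the eval side of that workflow, here is a rough sketch of opening a GGUF file with ggml's GGUF API; the file name and tensor name are assumptions for illustration, not necessarily what this PR uses:

```cpp
#include <cstdio>
#include "ggml.h"

int main(void) {
    struct ggml_context * ctx_data = NULL;

    struct gguf_init_params params = {
        /*.no_alloc =*/ false,     // also allocate and read the tensor data
        /*.ctx      =*/ &ctx_data, // receives a context holding the tensors
    };

    // "mnist-fc-f32.gguf" is a hypothetical name for the file written
    // by the PyTorch training script
    struct gguf_context * gguf_ctx = gguf_init_from_file("mnist-fc-f32.gguf", params);
    if (gguf_ctx == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // look up a tensor by name ("fc1.weight" is an assumed name)
    struct ggml_tensor * fc1_weight = ggml_get_tensor(ctx_data, "fc1.weight");
    if (fc1_weight != NULL) {
        printf("fc1.weight: %lld x %lld\n",
               (long long) fc1_weight->ne[0], (long long) fc1_weight->ne[1]);
    }

    gguf_free(gguf_ctx);
    ggml_free(ctx_data);
    return 0;
}
```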
Thanks for the update. The implementation looks nice so far.
Any guess why that would be?
I'm not sure why the performance with a batch size of 100 is worse; my best guess is that there is for some reason more overhead in the ggml code.
I re-added support for ggml graph export/import. I wrote a function […]. I also added more information on loss/accuracy.
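For reference, a rough sketch of what the graph export/import round trip looks like; the exact signatures here are assumptions and have varied between ggml versions:

```cpp
#include "ggml.h"

// Assumed API shape, for illustration only.
void roundtrip_graph(struct ggml_cgraph * gf) {
    // serialize the computation graph to disk ...
    ggml_graph_export(gf, "mnist.ggml");

    // ... and load it back; ggml fills two contexts, one holding the
    // tensor data and one holding the graph/eval metadata
    struct ggml_context * ctx_data = NULL;
    struct ggml_context * ctx_eval = NULL;
    struct ggml_cgraph * gf_imported = ggml_graph_import("mnist.ggml", &ctx_data, &ctx_eval);
    (void) gf_imported;
}
```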
I pushed training support for a convolutional neural network. There is a single binary each for ggml training/evaluation. For the training binary the user specifies […]. I implemented backwards passes for […]. I changed […]. I implemented support for batch sizes > 1 for […]. I added a fix for non-contiguous gradients to […]. I am close to finishing the minimum features that I had planned for the MNIST example. I think changes to ggml will be fairly minimal from now on. One issue that I still need to resolve is what to do with the Metal files for MNIST. The README doesn't say anything about them but I would assume that they are there to demonstrate Metal inference. I think it would make more sense to handle this via the ggml backends interface (which I plan to do in the future for CUDA).
Let's remove the Metal files. These were needed at the start as a PoC and demonstration, but currently don't serve any purpose.
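As a rough illustration of what the convolutional model described above is assembled from, here is a sketch of one conv block built from ggml ops; the kernel sizes and channel counts are illustrative assumptions, not necessarily the exact architecture in this PR:

```cpp
#include "ggml.h"

// One conv block of a small MNIST CNN, assembled from ggml ops.
// Kernel sizes and channel counts are illustrative assumptions.
struct ggml_tensor * conv_block(struct ggml_context * ctx, int nbatch) {
    // input images: 28x28, 1 channel, nbatch images
    struct ggml_tensor * images = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 28, 28, 1, nbatch);

    // conv kernel: 3x3, 1 input channel, 8 output channels
    struct ggml_tensor * conv1_w = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 3, 3, 1, 8);

    // convolution with stride 1 and padding 1, then ReLU, then 2x2 max pooling
    struct ggml_tensor * x = ggml_conv_2d(ctx, conv1_w, images, 1, 1, 1, 1, 1, 1);
    x = ggml_relu(ctx, x);
    x = ggml_pool_2d(ctx, x, GGML_OP_POOL_MAX, 2, 2, 2, 2, 0, 0);
    return x;
}
```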
I would like help with the web demo. For me it's already broken on master: it needs an additional source to compile and even then the demo is not functional when I run it locally. With this PR I have the same issue, so I can't check whether I have broken anything. I adapted the README to the new code. I did not add images of the graphs because they will be subject to change once there is broadcasting support for the backwards pass of […]. I did not add the MNIST dataset or any pretrained models to the ggml repository. They could in principle be added, but if these files are iterated upon, each iteration would increase the repository size. I don't have a strong opinion either way. Other than the things mentioned, I think this PR is in a state where it can be merged (after a rebase).
@jboero now would be a good time to help with code review if you're interested.
Make sure that you have […]. Install Emscripten by following their instructions and then:

$ source $EMSDK_PATH/emsdk_env.sh
$ emcc -I../../include -I../../include/ggml -I../../examples ../../src/ggml.c ../../src/ggml-quants.c ../../src/ggml-aarch64.c main.cpp -o web/mnist.js -s EXPORTED_FUNCTIONS='["_wasm_eval","_wasm_random_digit","_malloc","_free"]' -s EXPORTED_RUNTIME_METHODS='["ccall"]' -s ALLOW_MEMORY_GROWTH=1 --preload-file models/mnist

(I have no idea why we need […].)

If you still have problems, check the JS console for errors.
Yes, I'll test this out in the airport today. I see the comment on performance. That's been my experience too, though purely CPU training would always be slow.
@rgerganov thank you, I was able to get the web code to work. The problem was that I was trying to run it with […].
I should maybe also mention that the MNIST example on master (and the web demo in this PR) are not 100% correct. The PyTorch code implicitly normalizes the MNIST data to the interval […].
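For context, raw MNIST pixels are unsigned bytes in [0, 255], and torchvision's ToTensor scales them by 1/255. A minimal sketch of applying the same normalization on the ggml side; this is a hypothetical helper, not code from the PR:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scale byte pixels to floats, matching what the PyTorch pipeline
// does implicitly via torchvision's ToTensor.
std::vector<float> normalize_pixels(const std::vector<uint8_t> & pixels) {
    std::vector<float> out(pixels.size());
    for (size_t i = 0; i < pixels.size(); i++) {
        out[i] = pixels[i] / 255.0f;
    }
    return out;
}
```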
Looks like the CI is failing because of the removed Metal sources.
Is it just me or is the MNIST download site in the docs returning a 403?
For the direct download I had the same issue. Download via PyTorch worked.
OK, I see it attempts a backup link and works. I'm having build issues though - maybe my env is too new? Fedora 40, gcc 14: […]
I had at some point accidentally pushed a version that did not compile. Make sure you use the latest commit I force-pushed.
Right, much better. I can see it's slow and single-threaded but it works. Also, I can crank up the CPU speed live because it's just one thread and it doesn't overheat me too much. (No CUDA on this ultrabook, but I'll retry with CUDA next week.) Looks good to me, but I will review line by line during my flight.
ggml CNN training performance is quite bad, ~10x slower than TensorFlow (presumably because the im2col -> matrix multiplication approach is inefficient)
The pool and repeat_back ops could benefit from multi-threading; that should bring a factor of ~2x to the ggml CPU performance. The rest of the performance gap is indeed likely due to the inefficient convolution implementation.
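For context on the im2col point: ggml lowers 2D convolution to an im2col rearrangement (which materializes every kernel-sized patch as a column) followed by one large matrix multiplication, trading extra memory traffic for a fast GEMM. A rough sketch of that lowering, with illustrative stride/padding values and the final reshape back to image layout omitted:

```cpp
#include "ggml.h"

// im2col unfolds the input so that each column holds one receptive field,
// turning the convolution into a single matrix multiplication.
struct ggml_tensor * conv_via_im2col(struct ggml_context * ctx,
                                     struct ggml_tensor * kernel,  // [KW, KH, Cin, Cout]
                                     struct ggml_tensor * input) { // [W, H, Cin, N]
    struct ggml_tensor * im = ggml_im2col(ctx, kernel, input,
                                          1, 1,  // stride
                                          1, 1,  // padding
                                          1, 1,  // dilation
                                          true, GGML_TYPE_F32);
    // kernel viewed as a 2D matrix: [KW*KH*Cin, Cout]
    struct ggml_tensor * kernel_2d = ggml_reshape_2d(ctx, kernel,
            kernel->ne[0]*kernel->ne[1]*kernel->ne[2], kernel->ne[3]);
    return ggml_mul_mat(ctx, kernel_2d, im);
}
```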
Feel free to merge - I sent you a collab invite.
Yes, I agree the code looks fine, just slow and not very parallelized. Anyway, it's a great example. Thanks @JohannesGaessler
I just realized that I pressed the wrong button when merging. I only rebased my commits but did not squash them. Notably, this means that there are now several commits on master that will not compile due to another rebase that I had done. There have not been any other commits to master since, so if we're going to fix this we should do it now.
Can you try to force push on master? I'm not at a PC and probably will be at one tomorrow.
This PR aims to add training to the MNIST example. The current state is that it seems to work (with bad performance). My current plans include: […] mnist-common.h […], consistently use GGUF files instead of binary files.

Please provide feedback, especially regarding whether the last two points should be part of ggml.