ANE support #18

tmc · 2023-12-06T12:30:42Z

The top level readme mentions that current device support is limited to COU and GPU, is ANE support in the works?

vade · 2023-12-07T22:11:10Z

This is a bit above my pay grade, but my understanding is that

The ANE is mostly designed as an inference only device that supports only forward prop
The ANE has layer support implemented in hardware and can't be easily extended (?)
The ANE only supports half Float (Float 16) accelerated compute, everything is managed / converted to it by the runtime.
The ANE requires talking to the OS runtime for scheduling, and there's never a guarantee that you can be resident on the ANE, you can just request it, and hope you get a time slice
The API is only exposed via CoreML / Swift
Internally the API is Espresso (? IIRC from stack traces) / a C++ library which isn't public
CoreMLTools, the public pythonic way to create CoreML models has a CoreML runtime exposed, but requires CoreML model specs (protobufs) to run on the ANE

I doubt that Apple would let an open source project leak the internal tooling of the ANE Runtime (Espresso?)

There's some ANE reverse engineering work that's sporadically happening, but I suspect this will be Metal / GPU for a while unless Apple exposes some cool new ways to publicly run arbitrary programs on the ANE (which would be dope)

Sorry to pop bubbles, and apologies if any of this is factually incorrect!

vade · 2023-12-07T22:14:08Z

One thought which would be cool however, to get both MLX and ANE inference would be:

Implement MLX via swift runtime.
Expose custom layers to your CoreML model export
Implement those layers via MLX in a Swift native app
Use CoreML runtime to load the model, request ANE inference

You could also implement MLX preprocessing to get IOSurface backed memory buffers in half float easily which would grant your app the same unified memory access and avoid a ton of overhead of moving data to the ANE which is default path without IOSurface backed buffers.

In theory you'd get:

Fast MLX pre processing with unified memory
CoreML ANE acceleration on native layers
MLX layer ops with unified memory outputs

That actually sounds fucking awesome.

awni · 2023-12-08T03:22:14Z

@vade basically said it all already, but at the moment we don't have plans to support ANE in MLX given it is a closed source API.

If / when that changes we will be first in line to add it as a supported device.

MikeyBeez · 2024-01-07T20:44:03Z

Apple blocks its developers. It always has. That way, only Apple can write good modern code. Then they don't and ignore their desktop anyway. New features are always only for integration with devices. I call their development environment block-ware, and Apple excels at it. Developers can't use the ANE. They can't use native TTS or STT. So how can one write a modern app? Pyobjc is a mess. Apple breaks their own peripherals with new versions of MacOS, etc. So just buy their new stuff, if they bother to create it, and forget about developing anything meaningful on their desktop platform.

vade · 2024-01-07T20:47:39Z

What are you talking about?

I ship / have shipped code for ANE via CoreML.

You can use TTS via NSSpeechSynthesizer or 3rd party APIs. You can use STT with NSSpeechRecognizer or via 3rd party tools like Whisper which have ironically been leveraged to use CoreML or Metal (see Whisper.cpp)

I'm not sure what your problem is other than not having accurate information.

MikeyBeez · 2024-01-07T20:52:07Z

@vade, On Apple Silicon? Neither API works on my M1 Mac.

vade · 2024-01-07T20:52:46Z

Yes, on Apple Silicon.

MikeyBeez · 2024-01-07T20:57:01Z

BTW, yes pywhispercpp does work, but I don't think that uses NSSpeechRecognizer. Whispercpp uses its own model which means running one on precious unified memory. If you have a code sample that does work on Apple silicon through NSSpeechRecognizer, I'd love to see it.

vade · 2024-01-07T20:59:34Z

This is getting off topic. I never claimed Whisper CPP uses apple native API. it clearly doesnt. The point I was making is there are both working native and 3rd party solutions for TTS and STT.

MikeyBeez · 2024-01-07T21:07:08Z

@vade, I misread you. Yes CoreML does work, but I've been unable to convert Huggingface models to .mlmodel format. There is one example for doing this, but I have not been able to extend the method to converting other models. And the example says the new model won't be as good anyway because the conversion process is lossy.

vade · 2024-01-07T21:12:00Z

CoreML is def a bit of a black art for conversion. We've had to learn a ton. Best to check Apples CoreML Tools repo / examples and git issues for guidance. The conversion process is only lossy if you choose to natively support the neural engine which as stated in this issue only supports 16 bit float natively. You an run CoreML on CPU or GPU at native 32 bit float however.

MikeyBeez · 2024-01-07T21:23:45Z

@vade, I appreciate your reply. Thank you, but I think I'm done trying to get Apple's block-ware to run. I don't want to "learn a ton" for something that should only require a simple API call. But that's what Apple does to its developers. Something that takes five minutes on Linux takes months on Apple. Why? Because it's block-ware. It's intended to be impossible or nearly impossible. Apple gives users the best user experience, but it screws its would-be developers. As I said, Apple wants a monopoly on meaningful development. Then it doesn't bother. I loved xcode years ago. Now it's a nightmare. They discontinued their best developer tools like QuartzComposer. Why? Because that's for in-house developers.

RahulBhalley · 2024-01-30T16:42:22Z

@MikeyBeez I agree with you somewhat. I did had trouble converting my model to CoreML because it's impossible to implement SVD op using whatever basic op implementations exist in CoreMLTools. I was stuck at this problem for ~8 months. It took me 2-3 days to do the same with LibTorch-Lite library. There was no support for FFT ops for 3 years since requested. Still there's no support 5-dimensional arrays in CoreML.

CoreML is hard to use unless Apple's in-house developers build with it.

I was super surprised to see Apple released Stable Diffusion models converted to CoreML to run on-device on iPhone while I couldn't run my comparatively lightweight model (<300 MBs) on 1024x1024 images!

fakerybakery · 2024-02-21T00:19:01Z

What about https://github.com/apple/ml-ane-transformers and https://github.com/tinygrad/tinygrad/tree/master/extra/accel/ane?

vade · 2024-02-21T15:38:44Z

Top is inference only as ANE doesnt support back prop as stated earlier

Second is private API / reverse engineering of ANE, which - if you think about it, wont be sanctioned, supported by Apple in any real scenario.

Now that MLX Swift exists, in theory there ways of doing zero copy CoreML Custom layers implemented in MLX, so you can take a model, cut it up so the graph that has operations that can run on ANE run on ANE, and layers that cant, can be implemented in MLX

In theory its best of both worlds, but requires ad hoc support on a per model implementation (or at perhaps better phrased as per architecture)

vade mentioned this issue Dec 7, 2023

Swift bindings? #15

Closed

awni added the wontfix This will not be worked on label Dec 8, 2023

awni closed this as completed Dec 8, 2023

briancpark mentioned this issue Dec 12, 2023

What is the Expected Inference Performance #49

Open

awni mentioned this issue Dec 13, 2023

Whether mlx will support AI Engine for training or inference？ #152

Closed

Willian-Zhang mentioned this issue Dec 31, 2023

keep dtype on model conversion ml-explore/mlx-examples#186

Merged

This was referenced Jan 3, 2024

Examples to utilize ANE (i.e. NPU) in Apple Silicon ml-explore/mlx-examples#217

Closed

[BUG] Doesn't run on the ANE #393

Closed

ZachNagengast mentioned this issue Feb 20, 2024

MLX argmaxinc/WhisperKit#33

Open

thinkverse mentioned this issue Jun 4, 2024

Apple neural engine ollama/ollama#4817

Closed

awni mentioned this issue Jun 25, 2024

Does mlx support Apple m chip npu #1229

Closed

awni mentioned this issue Aug 9, 2024

How to use the Apple Neural Engine with MLX? ml-explore/mlx-examples#925

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ANE support #18

ANE support #18

tmc commented Dec 6, 2023

vade commented Dec 7, 2023

vade commented Dec 7, 2023

awni commented Dec 8, 2023 •

edited

Loading

MikeyBeez commented Jan 7, 2024 •

edited

Loading

vade commented Jan 7, 2024

MikeyBeez commented Jan 7, 2024

vade commented Jan 7, 2024

MikeyBeez commented Jan 7, 2024 •

edited

Loading

vade commented Jan 7, 2024

MikeyBeez commented Jan 7, 2024

vade commented Jan 7, 2024

MikeyBeez commented Jan 7, 2024

RahulBhalley commented Jan 30, 2024

fakerybakery commented Feb 21, 2024 •

edited

Loading

vade commented Feb 21, 2024

ANE support #18

ANE support #18

Comments

tmc commented Dec 6, 2023

vade commented Dec 7, 2023

vade commented Dec 7, 2023

awni commented Dec 8, 2023 • edited Loading

MikeyBeez commented Jan 7, 2024 • edited Loading

vade commented Jan 7, 2024

MikeyBeez commented Jan 7, 2024

vade commented Jan 7, 2024

MikeyBeez commented Jan 7, 2024 • edited Loading

vade commented Jan 7, 2024

MikeyBeez commented Jan 7, 2024

vade commented Jan 7, 2024

MikeyBeez commented Jan 7, 2024

RahulBhalley commented Jan 30, 2024

fakerybakery commented Feb 21, 2024 • edited Loading

vade commented Feb 21, 2024

awni commented Dec 8, 2023 •

edited

Loading

MikeyBeez commented Jan 7, 2024 •

edited

Loading

MikeyBeez commented Jan 7, 2024 •

edited

Loading

fakerybakery commented Feb 21, 2024 •

edited

Loading