
Releases: michaelfeil/infinity

0.0.7

12 Nov 10:41
6cad81c

What's Changed

  • Docker: Cuda11.8, make dependencies optional by @michaelfeil in #33
  • Breaking change: new install groups; install via pip install infinity-emb[server,logging,onnx-gpu-runtime]
  • add onnx-gpu by @michaelfeil in #26

Full Changelog: 0.0.6...0.0.7

0.0.6

11 Nov 13:29
2b930f1

What's Changed

Full Changelog: 0.0.5...0.0.6

0.0.5

06 Nov 07:58
a1389ce

What's Changed

  • Docker image multi by @michaelfeil in #24
  • patch a missing event, cutting inference latency from ~200 ms to ~7 ms at batch size 1
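A missing-event bug like the one above typically means the batch loop slept on a fixed timer instead of being woken when a request arrived. A minimal sketch of the event-driven pattern (hypothetical names, not Infinity's actual code):

```python
import asyncio


class BatchQueue:
    """Toy batch loop: an asyncio.Event wakes the worker as soon as a
    request arrives, instead of polling on a fixed ~200 ms sleep."""

    def __init__(self) -> None:
        self._items: list[str] = []
        self._new_item = asyncio.Event()

    async def submit(self, item: str) -> None:
        self._items.append(item)
        self._new_item.set()  # wake the batch loop immediately

    async def pop_batch(self) -> list[str]:
        await self._new_item.wait()  # returns as soon as submit() fires
        self._new_item.clear()
        batch, self._items = self._items, []
        return batch


async def demo() -> list[str]:
    q = BatchQueue()
    await q.submit("hello")
    return await q.pop_batch()


print(asyncio.run(demo()))  # prints ['hello']
```

With a fixed sleep, a lone request waits out the full timer before being batched; with the event, the worker wakes as soon as the request lands, which matches the order-of-magnitude latency drop described above.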

Full Changelog: 0.0.4...0.0.5

0.0.4

04 Nov 12:33
be64ff2

What's Changed

Issues:

  • Closes #5: ONNX support via https://github.com/qdrant/fastembed/
  • Closes #22: make pytorch an optional dependency

tl;dr

  • fastembed as backend besides ct2 or torch
  • v1/models returns "backend"
  • makes torch an optional dependency
  • calculates the minimum sleep time dynamically on startup -> slightly faster
  • default model is now "BAAI/bge-small-en-v1.5"

Full Changelog: 0.0.3...0.0.4

0.0.3

30 Oct 13:58
8116680

What's Changed

  • add Flash-Attention+ optimum-BetterTransformers by @michaelfeil in #20
  • Improve real-time / sleep strategy, async await for queues and result futures - reducing latency a bit by @michaelfeil in #12
  • add a better FIFO queueing strategy - your requests now have an upper bound on how long they queue by @michaelfeil in #19
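The upper bound on queueing time can be sketched with asyncio.wait_for around a bounded queue; this is an illustrative pattern, not Infinity's implementation:

```python
import asyncio


async def enqueue_with_deadline(queue: asyncio.Queue, item: str,
                                max_wait_s: float) -> bool:
    """Try to enqueue FIFO-style, but give up after max_wait_s so a
    request never waits unboundedly behind a full queue."""
    try:
        await asyncio.wait_for(queue.put(item), timeout=max_wait_s)
        return True
    except asyncio.TimeoutError:
        return False  # caller can reject the request, e.g. with HTTP 503


async def demo() -> tuple[bool, bool]:
    q: asyncio.Queue[str] = asyncio.Queue(maxsize=1)
    ok_first = await enqueue_with_deadline(q, "a", max_wait_s=0.05)
    # The queue is now full, so the second put times out instead of
    # blocking forever.
    ok_second = await enqueue_with_deadline(q, "b", max_wait_s=0.05)
    return ok_first, ok_second


print(asyncio.run(demo()))  # prints (True, False)
```

The point of the bound is back-pressure: under overload a client gets a fast rejection it can retry, rather than a request that sits in the queue indefinitely.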

Full Changelog: 0.0.2rc0...0.0.3

0.0.2

22 Oct 10:51

What's Changed

Full Changelog: 0.0.1...0.0.2rc0

0.0.1

12 Oct 16:41

Initial release of Infinity

0.0.1-dev3

12 Oct 16:07
Pre-release

What's Changed

Full Changelog: 0.0.1-dev2...0.0.1-dev3

0.0.1-dev2 - Speedups

12 Oct 01:46
3ed24bb
Pre-release

  • adds a new dependency (orjson) for faster response serialization - ~300% faster
  • uses torch.inference_mode() and delayed moving to CPU - ~10% faster
  • adds uvicorn[standard] - slightly faster (2-5%?)
  • updates the README
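The orjson change amounts to swapping the JSON serializer used for response bodies. A minimal sketch with a stdlib fallback (illustrative, not Infinity's actual code):

```python
import json

try:
    # orjson is a fast Rust-backed JSON serializer; dumps() returns bytes.
    import orjson

    def dumps(obj: object) -> bytes:
        return orjson.dumps(obj)
except ImportError:  # fall back to the stdlib if orjson is absent
    def dumps(obj: object) -> bytes:
        return json.dumps(obj).encode()

# Hypothetical embedding-style payload, just to exercise the serializer.
payload = {"embedding": [0.1, 0.2, 0.3], "model": "BAAI/bge-small-en-v1.5"}
print(dumps(payload))
```

For embedding responses, which are dominated by large float lists, the serializer can be a measurable share of request latency, which is why swapping it yields the speedup quoted above.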

#2

0.0.1-dev1

11 Oct 18:08
Pre-release

This is a release for testing the CI of Infinity.