docs: facial recognition and general clean-up (immich-app#11106)
* add facial recognition docs, clean up existing info

* Update smart-search.md

Co-authored-by: Alex <[email protected]>

---------

Co-authored-by: Alex <[email protected]>
mertalev and alextran1502 authored Jul 15, 2024
1 parent 8193416 commit cc1235d
Showing 5 changed files with 135 additions and 51 deletions.
37 changes: 16 additions & 21 deletions docs/docs/FAQ.mdx
@@ -167,7 +167,7 @@ Immich uses CLIP models. For more information about CLIP and its capabilities, r

### How does facial recognition work?

See [How Facial Recognition Works](/docs/features/facial-recognition#How-Facial-Recognition-Works) for details.

### How can I disable machine learning?

@@ -181,19 +181,15 @@ However, disabling all jobs will not disable the machine learning service itself

### I'm getting errors about models being corrupt or failing to download. What do I do?

You can delete the model cache volume, where models are downloaded. This will give the service a clean environment to download the model again. If models are failing to download entirely, you can manually download them from [Hugging Face][huggingface] and place them in the cache folder.

### Can I use a custom CLIP model?

No, this is not supported. Only models listed on the [Hugging Face][huggingface] page are compatible. Feel free to make a feature request if there's a model not listed there that you think should be added.

### I want to be able to search in other languages besides English. How can I do that?

You can change to a multilingual CLIP model. See [here](/docs/features/smart-search#CLIP-model) for instructions.

### Does Immich support Facial Recognition for videos?

@@ -234,7 +230,7 @@ ls clip/ facial-recognition/

### Why is Immich slow on low-memory systems like the Raspberry Pi?

Immich optionally uses transcoding and machine learning for several features. However, these can be too heavy to run on a Raspberry Pi. You can [mitigate](/docs/FAQ#can-i-lower-cpu-and-ram-usage) this, host Immich's machine-learning container on a [more powerful system](/docs/guides/remote-machine-learning), or [disable](/docs/FAQ#how-can-i-disable-machine-learning) machine learning entirely.

### Can I lower CPU and RAM usage?

@@ -243,10 +239,12 @@ The initial backup is the most intensive due to the number of jobs running. The
- Lower the job concurrency for these jobs to 1.
- Under Settings > Transcoding Settings > Threads, set the number of threads to a low number like 1 or 2.
- Under Settings > Machine Learning Settings > Facial Recognition > Model Name, you can change the facial recognition model to `buffalo_s` instead of `buffalo_l`. The former is a smaller and faster model, albeit not as good.
  - For facial recognition on new images to work properly, you must re-run the Face Detection job for all images after this.
- At the container level, you can [set resource constraints](/docs/FAQ#can-i-limit-cpu-and-ram-usage) to lower usage further.
  - It's recommended to only apply these constraints _after_ taking some of the measures here for best performance.
- If these changes are not enough, see [below](/docs/FAQ#how-can-i-disable-machine-learning) for instructions on how to disable machine learning.

### Can I limit CPU and RAM usage?

By default, a container has no resource constraints and can use as much of a given resource as the host's kernel scheduler allows. To limit this, you can add the following to the `docker-compose.yml` block of any containers that you want to have limited resources.

@@ -266,6 +264,8 @@ deploy:
</details>
For more details, you can look at the [original docker docs](https://docs.docker.com/config/containers/resource_constraints/) or use this [guide](https://www.baeldung.com/ops/docker-memory-limit).
Note that memory constraints work by terminating the container, so this can introduce instability if set too low.
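
For example, here is a minimal sketch of such a block for the machine-learning container (the service name and values are illustrative, not recommendations):

```yaml
services:
  immich-machine-learning:
    # ...existing configuration...
    deploy:
      resources:
        limits:
          cpus: '2.0' # cap the container at the equivalent of two CPU cores
          memory: 2G # the container is terminated if it exceeds this limit
```

Keeping the memory limit comfortably above the container's typical peak usage avoids the termination-related instability mentioned above.
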
### How can I boost machine learning speed?
:::note
@@ -275,21 +275,16 @@ This advice improves throughput, not latency. This is to say that it will make S
:::
You can increase throughput by increasing the job concurrency for machine learning jobs (Smart Search, Face Detection). With higher concurrency, the host will work on more assets in parallel. You can do this by navigating to Administration > Settings > Job Settings and increasing concurrency as needed.
:::danger
On a normal machine, 2 or 3 concurrent jobs can probably max the CPU. Storage speed and latency can quickly become the limiting factor beyond this, particularly when using HDDs.

The concurrency can be increased more comfortably with a GPU, but should still not be above 16 in most cases.

Do not raise the job concurrency too high, or you will probably overload the server.
:::
### Why is Immich using so much of my CPU?
When a large number of assets are uploaded to Immich, it makes sense that the CPU and RAM will be heavily used for machine learning work and creating image thumbnails.
Once this process is complete, CPU usage will drop to around 3-5%.
### My server shows Server Status Offline | Version Unknown. What can I do?

You need to enable WebSockets on your reverse proxy.
---
74 changes: 68 additions & 6 deletions docs/docs/features/facial-recognition.md
@@ -2,7 +2,7 @@

## Overview

Immich recognizes faces in your photos and videos and groups them together into people. You can then assign names to these people and search for them.

The list of people is shown on the Explore page.

@@ -18,13 +18,75 @@ The asset detail view will also show the faces that are recognized in the asset.

## Actions

Additional actions you can take include:

- Changing the feature photo of the person
- Setting a person's date of birth
- Merging two or more detected faces into one person
- Hiding the faces of a person from the Explore page and detail view
- Assigning an unrecognized face to a person

These actions can be found in the app bar when you access the detail view of a person.

<img src={require('./img/facial-recognition-4.png').default} title='Facial Recognition 4' width="70%"/>

## How Face Detection Works

Face detection sends the generated preview image to the machine learning service for processing. The service checks whether it has the relevant model downloaded and downloads it if not. The image is decoded, pre-processed, and passed to the face detection model (with hardware acceleration if configured). The bounding boxes and scores output by this model are used to crop and preprocess the image once again before it is passed to a facial recognition model (also accelerated if configured). The embeddings from the recognition model, together with the bounding boxes and scores from the face detection model, are then sent back to the server and added to the database. The embeddings in particular are indexed so they can be searched quickly during facial recognition clustering.

## How Facial Recognition Works

The facial recognition algorithm we use is derived from DBSCAN, a popular clustering algorithm. It essentially treats each detected face as a point and aims to group points that are close to each other.

:::note
An important concept is whether something is a _core point_. A core point has a minimum number of points around it within a certain distance. A non-core point can only be assigned to a cluster if it can reach a core point; a non-core point can't be used to extend a cluster even if it's part of one. In Immich, the _Minimum Recognized Faces_ setting controls the threshold to be considered a core point.
:::

For each face, the algorithm searches for other faces within a certain distance. Faces within this distance are considered similar, so it then checks whether any of these faces are associated with a person.

If one is, the face being processed is assigned the person of the most similar face.

If none are, the algorithm falls back on a concept from DBSCAN: whether the face is a _core point_. If there are at least a certain number of similar faces (by default 3, including the face being considered), this face is a core point; a new person is created and the face is assigned to it. As other faces are processed, if they're similar to this face, they'll see that it has an associated person and can be assigned to that person.

However, if there aren't enough similar faces, no new person is created. Instead, the face waits until all the other faces have been processed, then checks whether any of its matches that previously lacked an associated person have since been assigned one. If so, the face is assigned to that person; if not, it is treated as an outlier, such as a stranger in the background of an image.
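
As a rough illustration, here is a toy version of the assignment logic described above. This is a simplified sketch, not Immich's actual implementation; the parameter names only loosely mirror the settings discussed under Configuration below.

```python
import numpy as np

def recognize(embeddings: np.ndarray, max_distance: float = 0.5, min_faces: int = 3):
    """Toy, in-memory sketch of Immich's incremental, DBSCAN-like clustering.

    `embeddings` holds one L2-normalized row per detected face. Returns a
    person id for each face, or -1 for unassigned outliers. The real service
    works against the database and a vector index, not a full matrix.
    """
    n = len(embeddings)
    person = np.full(n, -1)
    next_person = 0
    deferred = []

    # Pairwise cosine distance, assuming normalized embeddings.
    dist = 1.0 - embeddings @ embeddings.T

    for i in range(n):
        similar = np.where(dist[i] <= max_distance)[0]  # includes i itself
        assigned = similar[person[similar] >= 0]
        if assigned.size:
            # A similar face already has a person: take the most similar one's.
            person[i] = person[assigned[np.argmin(dist[i][assigned])]]
        elif similar.size >= min_faces:
            # Core point: enough similar faces, so create a new person.
            person[i] = next_person
            next_person += 1
        else:
            deferred.append(i)  # wait until all other faces are processed

    # Second pass: a deferred face may now reach a face that gained a person.
    for i in deferred:
        similar = np.where(dist[i] <= max_distance)[0]
        assigned = similar[person[similar] >= 0]
        if assigned.size:
            person[i] = person[assigned[np.argmin(dist[i][assigned])]]
        # Otherwise it stays unassigned, e.g. a stranger in the background.

    return person
```

Note how `min_faces=1` would make any unmatched face a core point immediately, which is why setting _Minimum Recognized Faces_ to 1 effectively disables the concept of core points.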

The algorithm has some subtle differences compared to DBSCAN:

- DBSCAN doesn't have a concept of incremental clustering: it clusters all points at once. In contrast, facial recognition has to evolve as more assets are added without re-clustering everything each time.
  - The algorithm described above works within a set of queued assets. Once these faces are processed and a new round of faces is detected, the behavior will not be the same as traditional DBSCAN since it preserves the clusters (people) generated from the previous round.
  - For this reason, facial recognition tries to wait for face detection and thumbnail generation to complete before starting: the larger the set of faces in the queue, the better the results will be.
  - Re-running facial recognition on all assets afterwards does behave like DBSCAN, however.
- DBSCAN is designed for range-based searches (i.e. points within a distance), but high-dimensional vector indices are generally optimized for getting the closest K results. The recognition algorithm doesn't try to get _all_ similar faces within a distance for performance reasons. Instead, it searches for a small number of matches for each face. The end result should be very similar if not identical, but with possibly different performance characteristics.
  - Because of this, part of the recognition process is handled during a nightly job to ensure that unassigned faces with potential matches can be recognized.

:::tip
If you didn't import your assets at once or if the server was able to process jobs faster than you could upload them, it's possible that the clustering was suboptimal. If you haven't put effort into the current results, it may be worth re-running facial recognition on all assets for the best starting point. If it's too late for that, you can also manually assign a selection of unassigned faces and queue _Missing_ for Facial Recognition to help it learn and assign more faces automatically.
:::

## Configuration

Navigating to Administration > Settings > Machine Learning Settings > Facial Recognition will show the options available.

:::tip
It's better to make small adjustments to the parameters here than to set them to something very different, unless you're ready to test a variety of options. If you do need to set a parameter to a strict value, relaxing other settings can be a good way to compensate, and vice versa.
:::

### Facial recognition model

There are a few different models available; the default is typically considered the best. On more constrained systems where the default is too intensive, you can choose a smaller model instead.

### Minimum detection score

This setting determines whether a result from the face detection model is filtered out as a false positive. It may seem tempting to set this low to detect more faces, but doing so can lead to false positives that are difficult to deal with and can harm facial recognition. It is strongly recommended not to go below 0.5 for this setting. Setting it to a very high number like 0.9 is also not recommended: the default is already biased toward precision, so a threshold that high leads to many undetected faces.

After changing this setting, it will only apply to new face detection jobs. To apply the new setting to all assets, you need to re-run face detection for all assets.

### Maximum recognition distance

This is the distance threshold described in [How Facial Recognition Works](#how-facial-recognition-works). The default works well for most people, but it may be worth lowering it if the library has twins or otherwise very similar-looking people. A threshold that's too low just means needing to merge duplicate people after facial recognition, whereas one that's too high can produce unsalvageable results. It is strongly recommended not to go below 0.3 or above 0.7.
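
To make the distance concrete, here is a tiny example, assuming cosine distance over normalized embeddings as in the sketch above (the vectors are made up):

```python
import numpy as np

known = np.array([0.6, 0.8])  # toy embedding of an existing person's face
new = np.array([0.8, 0.6])    # toy embedding of a newly detected face

distance = 1.0 - np.dot(known, new)  # both vectors are unit length
print(round(distance, 2))  # 0.04 -> a match under any threshold in the 0.3-0.7 range
```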

### Minimum recognized faces

This is the core point threshold described in [How Facial Recognition Works](#how-facial-recognition-works). This setting has a few implications. First, it takes effect immediately: people with fewer faces than this are hidden from view. Second, it makes clustering more robust, as requiring a certain level of density prevents loosely related faces from being linked to each other.

Increasing this setting is a good idea if you increase the recognition distance or reduce the minimum detection score. Setting it to 1 effectively disables the concept of core points, but can be an option if you prefer a more hands-on approach.
1 change: 1 addition & 0 deletions docs/docs/features/hardware-transcoding.md
@@ -123,6 +123,7 @@ Once this is done, you can continue to step 3 of "Basic Setup".

- You may want to choose a slower preset than for software transcoding to maintain quality and efficiency
- While you can use VAAPI with NVIDIA and Intel devices, prefer the more specific APIs since they're more optimized for their respective devices
- You can confirm the device is being recognized and used by checking its utilization when transcoding (via `nvtop` for NVIDIA, `intel_gpu_top` for Intel, etc.), as shown below. A lack of error logs when transcoding also indicates that it's being used.
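
For example, assuming the corresponding tool is installed on the host, you can watch utilization while a transcoding job runs:

```bash
nvtop               # NVIDIA: shows encoder/decoder (ENC/DEC) activity
sudo intel_gpu_top  # Intel: shows the Video engine used for transcoding
```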

[hw-file]: https://github.com/immich-app/immich/releases/latest/download/hwaccel.transcoding.yml
[nvct]: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
63 changes: 44 additions & 19 deletions docs/docs/features/smart-search.md
@@ -7,29 +7,30 @@ Immich uses Postgres as its search database for both metadata and smart search.

Smart search is powered by the [pgvecto.rs](https://github.com/tensorchord/pgvecto.rs) extension, utilizing machine learning models like [CLIP](https://openai.com/research/clip) to provide relevant search results. This allows for freeform searches without requiring specific keywords in the image or video metadata.
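
As a rough sketch of the idea (illustrative only, not Immich's actual code): CLIP maps the text query and every image into the same vector space, and results are ranked by similarity.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

# Toy stand-ins for CLIP embeddings; real ones have hundreds of dimensions.
image_embeddings = {
    "beach.jpg": np.array([0.9, 0.1, 0.4]),
    "forest.jpg": np.array([0.1, 0.9, 0.2]),
}
query = np.array([0.8, 0.2, 0.5])  # e.g. the text "sunny day at the sea"

scores = {
    name: float(normalize(vec) @ normalize(query))
    for name, vec in image_embeddings.items()
}
print(max(scores, key=scores.get))  # beach.jpg ranks highest
```

In Immich, this similarity search runs inside Postgres via the pgvecto.rs index rather than in application code.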

Archived photos are not included in search results by default. To include them, mark the checkbox in [advanced search filters](/docs/features/smart-search#advanced-search-filters).

:::tip Alternative CLIP Models
More powerful models can be used for more accurate search results. For more information, see the related [FAQ](/docs/FAQ#can-i-use-a-custom-clip-model).
:::

:::info
Smart Search is currently limited to 5,000 results for a single search on the web.
:::

## Advanced Search Filters

In addition, Immich offers advanced search functionality, allowing you to find specific content using customizable search filters. These filters include location, one or more faces, specific albums, and more. You can try out the search filters on the [Demo site](https://demo.immich.app).

Smart search allows you to filter by:

- People
- Location
  - Country
  - State
  - City
- Camera
  - Make
  - Model
- Date range
- File name or extension
- Media type
  - Image (including live/motion photos)
  - Video
  - All
- Condition
  - Not in any album
  - Archived
  - Favorited

<Tabs>
<TabItem value="Computer" label="Computer" default>
@@ -47,3 +48,27 @@ Some search examples:

</TabItem>
</Tabs>

## Configuration

Navigating to `Administration > Settings > Machine Learning Settings > Smart Search` will show the options available.

### CLIP model

More powerful models can be used for more accurate search results, but are slower and can require more server resources. Check out the models [here][huggingface-clip] for more options!

[Multilingual models][huggingface-multilingual-clip] are also available so users can search in their native language. These models support over 100 languages; the `nllb` models in particular support 200.

:::note
Multilingual models are much slower and larger and perform slightly worse for English than English-only models. For this reason, only use them if you actually intend to search in a language besides English.

As a special case, the `ViT-H-14-quickgelu__dfn5b` and `ViT-H-14-378-quickgelu__dfn5b` models are excellent at many European languages despite not specifically being multilingual. They're very intensive regardless, however, especially the latter.
:::

Once you've chosen a model, change this setting to the name of the model you chose. Be sure to re-run Smart Search on all assets after this change.

:::note
Feel free to make a feature request if there's a model you want to use that we don't currently support.
:::

[huggingface-clip]: https://huggingface.co/collections/immich-app/clip-654eaefb077425890874cd07
[huggingface-multilingual-clip]: https://huggingface.co/collections/immich-app/multilingual-clip-654eb08c2382f591eeb8c2a7