Utility Function for Vector Similarity Search #18

devansh-shah-11 · 2024-03-13T05:36:04Z

Description
We need a new utility function in API/database.py that performs a vector similarity search. This function should take an embedding vector as input and return the most similar vectors from the MongoDB Atlas database, using Euclidean distance as the similarity measure.

This utility function will be used by the recognise_face() endpoint to find the most similar face in the database.
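
As a rough illustration of the similarity measure described above, the following sketch compares a query embedding against an in-memory set of stored embeddings and picks the closest one (smaller Euclidean distance means a more similar face). The `stored` mapping of employee codes to vectors and the function names are hypothetical stand-ins for the MongoDB collection and the real utility function.

```python
import math
from typing import Dict, List, Tuple


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar_face(query: List[float], stored: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return (employee_code, distance) for the stored embedding closest to query.

    `stored` is a hypothetical in-memory stand-in for the database collection.
    """
    if not stored:
        raise ValueError("no stored embeddings to compare against")
    # min() over (code, distance) pairs picks the smallest distance.
    return min(
        ((code, euclidean_distance(query, vec)) for code, vec in stored.items()),
        key=lambda pair: pair[1],
    )


stored = {"E001": [0.0, 0.0, 1.0], "E002": [1.0, 1.0, 0.0]}
print(most_similar_face([0.1, 0.0, 0.9], stored))
```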

Expected Behavior

The endpoint should take n as input from the user and return the top n most similar vectors from the MongoDB database.
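
For the top-n behaviour, the ranking step can be sketched in plain Python with `heapq.nsmallest`, which returns the n entries with the smallest distance without fully sorting every stored vector. The `(employee_code, vector)` records and the `top_n_similar` name are hypothetical; in the actual feature this ranking is done inside MongoDB.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def top_n_similar(
    query: Sequence[float],
    records: Sequence[Tuple[str, Sequence[float]]],
    n: int,
) -> List[Tuple[str, float]]:
    """Return up to n (employee_code, distance) pairs, closest first.

    `records` is a hypothetical list of (employee_code, embedding) pairs.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # math.dist computes the Euclidean distance between two points.
    scored = ((code, math.dist(query, vec)) for code, vec in records)
    # nsmallest keeps only n items in a heap instead of sorting everything.
    return heapq.nsmallest(n, scored, key=lambda pair: pair[1])


records = [("E001", [0.0, 0.0]), ("E002", [3.0, 4.0]), ("E003", [1.0, 0.0])]
print(top_n_similar([0.0, 0.0], records, 2))  # → [('E001', 0.0), ('E003', 1.0)]
```

If n exceeds the number of stored vectors, all of them are returned in ascending order of distance.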

Benefits
This feature automates finding the top n vectors most similar to a given face embedding, which helps identify the employee.

Tasks
Explore the MongoDB vector search tutorial
Write a function to return the most similar vectors
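
For the tutorial task: MongoDB Atlas Vector Search exposes a `$vectorSearch` aggregation stage, and the similarity function (`euclidean`, `cosine` or `dotProduct`) is chosen in the vector index definition rather than in the query. The sketch below only builds such a pipeline as plain dictionaries. The index name `vector_index`, the field path `vector`, and the `numCandidates` heuristic are assumptions for illustration and would need to match an index actually created on the cluster.

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    n: int,
    index_name: str = "vector_index",  # hypothetical Atlas Vector Search index name
    path: str = "vector",  # hypothetical field holding the stored embeddings
) -> List[Dict[str, Any]]:
    """Build an Atlas Vector Search pipeline returning the n nearest documents."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                # Candidates examined by the approximate search; must be >= limit.
                "numCandidates": max(100, n * 10),
                "limit": n,
            }
        },
        # Expose the similarity score Atlas computed for each match.
        {"$project": {"_id": 0, path: 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

With PyMongo the result would be passed to `collection.aggregate(...)`. The matching index definition would declare the field with `"type": "vector"`, its `numDimensions`, and `"similarity": "euclidean"`.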

Checklist

Modify API/database.py ✓ d6366eb
Running GitHub Actions for API/database.py ✓
Modify API/route.py ✓ 7b8ca4e
Running GitHub Actions for API/route.py ✓


sweep-ai · 2024-03-13T05:36:10Z

🚀 Here's the PR! #20

See Sweep's progress at the progress dashboard!

⚡ Sweep Basic Tier: I'm using GPT-4. You have 5 GPT-4 tickets left for the month and 3 for the day. (tracking ID: 0b2bf4e2dc)

GitHub Actions ✓

Here are the GitHub Actions logs prior to making any changes:

Sandbox logs for 91e83d1

Checking API/database.py for syntax errors... ✅ API/database.py has no syntax errors! 1/1 ✓

Sandbox passed on the latest main, so sandbox checks will be enabled for this issue.

Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant, in decreasing order of relevance. If some file is missing from here, you can mention the path in the ticket description.

https://github.com/devansh-shah-11/FaceRec/blob/91e83d1e0629dfb50ad9baecd37d3e4982a29f76/API/database.py#L1-L23

https://github.com/devansh-shah-11/FaceRec/blob/91e83d1e0629dfb50ad9baecd37d3e4982a29f76/API/route.py#L152-L220

Step 2: ⌨️ Coding

Modify API/database.py ✓ d6366eb

Modify API/database.py with contents:
• Add a new method named `find_similar_vectors` in the `Database` class. This method should accept two parameters: `embedding_vector`, which is the vector for which we want to find similar vectors, and `n`, which is the number of top similar vectors to return.
• Inside this method, use MongoDB's aggregation framework to perform the vector similarity search, computing the Euclidean distance manually. Store the embedding vectors in a collection whose documents include the vector and a unique identifier. Then use an aggregation pipeline to calculate the Euclidean distance between the input vector and each stored vector, sort the results by that distance in ascending order, and limit the output to the top n entries.
• The method should return the top n most similar vectors from the MongoDB database.
• Note: This plan does not use a vector search index. MongoDB Atlas offers built-in vector search through the `$vectorSearch` aggregation stage, which supports Euclidean similarity when the vector index is defined with `"similarity": "euclidean"`; on an Atlas cluster the implementation could leverage that instead.
--- 
+++ 
@@ -22,3 +22,48 @@
 
     def update_one(self, collection, query, update):
         return self.db[collection].update_one(query, update)
+
+    def find_similar_vectors(self, collection, embedding_vector, n):
+        """
+        Find the top n most similar vectors in the database to the given embedding_vector.
+        This method uses Euclidean distance as the similarity measure (smaller is more similar).
+
+        :param collection: The MongoDB collection to search within.
+        :param embedding_vector: The embedding vector to find similar vectors for.
+        :param n: The number of top similar vectors to return.
+        :return: The top n most similar documents, each with an added "distance" field.
+        """
+        pipeline = [
+            # Skip documents without a stored vector; they would get a null distance and sort first.
+            {"$match": {"vector": {"$exists": True}}},
+            {
+                "$addFields": {
+                    "distance": {
+                        "$sqrt": {
+                            "$reduce": {
+                                # Pair each stored component with the matching query component.
+                                "input": {"$zip": {"inputs": ["$vector", embedding_vector]}},
+                                "initialValue": 0,
+                                # Sum the squared differences. Dot paths like "$$this.0" do not
+                                # index arrays in aggregation expressions, so use $arrayElemAt.
+                                "in": {
+                                    "$add": [
+                                        "$$value",
+                                        {"$pow": [
+                                            {"$subtract": [
+                                                {"$arrayElemAt": ["$$this", 0]},
+                                                {"$arrayElemAt": ["$$this", 1]},
+                                            ]},
+                                            2,
+                                        ]},
+                                    ]
+                                },
+                            }
+                        }
+                    }
+                }
+            },
+            {"$sort": {"distance": 1}},
+            {"$limit": n},
+        ]
+        return list(self.db[collection].aggregate(pipeline))
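
To sanity-check the aggregation pipeline above without a database, its stages can be mirrored in plain Python. This is a hypothetical test helper, not part of the PR, and it assumes every document has a `vector` field. Like `$zip` with its default `useLongestLength: false`, Python's `zip()` stops at the shorter of the two vectors.

```python
import math
from typing import Any, Dict, List, Sequence


def simulate_find_similar_vectors(
    docs: List[Dict[str, Any]], embedding_vector: Sequence[float], n: int
) -> List[Dict[str, Any]]:
    """Python mirror of the find_similar_vectors pipeline (hypothetical test helper).

    Assumes every document in `docs` has a "vector" field.
    """
    results = []
    for doc in docs:
        # $zip pairs components and stops at the shorter array, like zip().
        pairs = zip(doc["vector"], embedding_vector)
        # $reduce sums the squared differences, then $sqrt takes the root.
        distance = math.sqrt(sum((a - b) ** 2 for a, b in pairs))
        results.append({**doc, "distance": distance})  # $addFields (copy, not mutate)
    results.sort(key=lambda d: d["distance"])  # $sort: {"distance": 1}
    return results[:n]  # $limit: n
```

Comparing this helper's output with the real pipeline on a handful of known vectors is a quick way to catch expression mistakes such as mis-indexed `$zip` pairs.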

Running GitHub Actions for API/database.py ✓

Check API/database.py with contents:
Ran GitHub Actions for d6366ebfcc133c30f5e069c0508a89b52686ba57:

Modify API/route.py ✓ 7b8ca4e

Modify API/route.py with contents:
• Add a new endpoint in the `route.py` file for the `recognise_face` functionality. This endpoint should accept an embedding vector and a parameter n from the user, and use the `find_similar_vectors` method from the `Database` class to find and return the top n most similar vectors.
• The endpoint should extract the embedding vector and the value of n from the request, call the `find_similar_vectors` method with these parameters, and return the result to the client.
• Ensure proper error handling is in place for cases where the input data is invalid or the database operation fails.
--- 
+++ 
@@ -267,3 +267,31 @@
     client.find_one_and_delete(collection, {"EmployeeCode": EmployeeCode})
 
     return {"Message": "Successfully Deleted"}
+
+
+@router.post("/recognise_face")
+async def recognise_face(embedding: List[float], n: int):
+    """
+    Recognise a face by finding the most similar face embeddings in the database.
+
+    Args:
+        embedding (List[float]): The embedding vector of the face to be recognised.
+        n (int): The number of top similar vectors to return.
+
+    Returns:
+        dict: A dictionary containing the top n most similar face embeddings.
+
+    """
+    logging.info("Recognising face")
+    # Reject invalid input before querying: $limit requires a positive integer.
+    if not embedding or n <= 0:
+        raise HTTPException(status_code=400, detail="embedding must be non-empty and n must be positive")
+    try:
+        similar_faces = client.find_similar_vectors(collection, embedding, n)
+    except Exception as e:
+        logging.error(f"Error recognising face: {str(e)}")
+        raise HTTPException(status_code=500, detail="Internal server error") from e
+    # ObjectId is not JSON-serialisable, so convert _id to a string before returning.
+    for face in similar_faces:
+        face["_id"] = str(face["_id"])
+    return {"similar_faces": similar_faces}
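
The plan asks for error handling when the input data is invalid. One way to centralise that is a small validation helper the endpoint calls before touching the database, raising `HTTPException(status_code=400, detail=error)` when it returns a message. The helper name and the `expected_dim` parameter are hypothetical; a dimension check matters here because `$zip` silently truncates mismatched vectors, so a wrong-size embedding would still produce (meaningless) distances.

```python
import math
from typing import Optional, Sequence


def validate_recognise_request(
    embedding: Sequence[float], n: int, expected_dim: Optional[int] = None
) -> Optional[str]:
    """Return an error message for invalid input, or None if the request is valid.

    `expected_dim` is a hypothetical parameter: the length of the stored embeddings.
    """
    if not embedding:
        return "embedding must not be empty"
    # Reject NaN/inf, which would poison every computed distance.
    if not all(isinstance(x, (int, float)) and math.isfinite(x) for x in embedding):
        return "embedding must contain only finite numbers"
    if expected_dim is not None and len(embedding) != expected_dim:
        return f"embedding must have {expected_dim} dimensions"
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(n, bool) or not isinstance(n, int) or n <= 0:
        return "n must be a positive integer"
    return None
```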

Running GitHub Actions for API/route.py ✓

Check API/route.py with contents:
Ran GitHub Actions for 7b8ca4e13c930240c7aef7d25b09dd19d42e82df:

Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/utility_function_for_vector_similarity_s_0cb05.

💡 To recreate the pull request, edit the issue title or description. To tweak the pull request, leave a comment on the pull request.

This is an automated message generated by Sweep AI.

devansh-shah-11 added the enhancement (New feature or request) and sweep labels Mar 13, 2024

devansh-shah-11 assigned devansh-shah-11 and Devasy23 Mar 13, 2024

This was referenced Mar 13, 2024

Sweep: Utility Function for Vector Similarity Search (✓ Sandbox Passed) #19

Closed

Sweep: Utility Function for Vector Similarity Search (✓ Sandbox Passed) #20

Closed

Devasy23 linked a pull request Mar 17, 2024 that will close this issue

Add vector_search function for pipeline aggregation #30

Merged

Devasy23 closed this as completed in #30 Jun 1, 2024
