[FEATURE] Improve response upsert bulk endpoint performance #4449

frascuchon · 2023-12-21T11:55:32Z

The new bulk endpoint for response creation takes a long time to process the request (around 6 seconds for 100 responses). This may affect the global server performance. The bottleneck must be identified and fixed as soon as possible.

cc @jfcalvo

dosubot · 2023-12-21T11:55:49Z

🤖

Hey there @frascuchon! I'm Dosu, your friendly neighborhood bot, created to assist with bugs, answer questions, and guide you in your journey as a contributor. Just think of me as the roadie before the main act - the human maintainers. I'm here to help keep the show running smoothly while they're getting set to take the stage. Let's rock this code together! 🎸

Based on the current implementation of the response upsert bulk endpoint in Argilla, there are several strategies that could be employed to improve its performance:

Batch Processing: Instead of processing each response one by one, you can process them in batches. This can significantly reduce the number of database operations and thus improve performance.
Parallel Processing: If the processing of each response is independent of others, you can use parallel processing to speed up the operation. This can be achieved by using Python's concurrent.futures or asyncio libraries.
Database Optimization: You can optimize your database queries to improve performance. This can include using indexes for faster lookup, denormalizing your data, or using stored procedures.
Caching: If you have some data that is used frequently and does not change often, you can cache it to reduce the number of database hits.

However, the actual implementation of these strategies would depend on the specific requirements and constraints of your application.

The current implementation of the response upsert bulk endpoint is the create_current_user_responses_bulk function. This function is a POST request to the "/me/responses/bulk" endpoint. It takes in a ResponsesBulkCreate object in the request body, which contains multiple response items to be created or updated.

For each item in the request body, it retrieves the corresponding record using the _get_record function. It then authorizes the current user to create a response for the record. After authorization, it calls the upsert_response function from the datasets module to create or update the response. If any error occurs during this process, it appends a ResponseBulk object with the error detail to the responses_bulk_items list. If no error occurs, it appends a ResponseBulk object with the created or updated response to the responses_bulk_items list.

Finally, it returns a ResponsesBulk object containing all the ResponseBulk items in the responses_bulk_items list.

You can find the code for the create_current_user_responses_bulk function in the responses.py file.

As for the current performance of the response upsert bulk endpoint, I wasn't able to find specific metrics within the repository. However, implementing the strategies mentioned above should help improve its performance.

If you need further clarification or have additional questions, feel free to ask.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

# Description This PR applies some performance improvements related to the bulk upsert responses endpoint since some environments are experimenting with a high execution time when running this operation with 100 items in the request. The performance improvements are: 1. Read all requested records at once instead of once at a time. 2. Stop checking the existence of the search index since is taking longer. These improvements are applied based on the profiling results using the test ... Also, since the current API controller is doing many things, a new design proposal is applied in the implementation by defining a simple use case implementation for this functionality. The use case wraps all the needed steps to cover the functionality and make the controller implementation quite simple, focusing only on exception handling and request/response serialization. Closes #4449 **Type of change** (Please delete options that are not relevant. Remember to title the PR according to the type of change) - [ ] New feature (non-breaking change which adds functionality) - [X] Refactor (change restructuring the codebase without changing functionality) - [X] Improvement (change adding some improvement to an existing functionality) **How Has This Been Tested** (Please describe the tests that you ran to verify your changes. And ideally, reference `tests`) Test locally **Checklist** - [ ] I added relevant documentation - [X] I followed the style guidelines of this project - [X] I did a self-review of my code - [ ] I made corresponding changes to the documentation - [X] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above) - [ ] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/) --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

frascuchon · 2024-01-09T09:21:36Z

Fixed by #4451

frascuchon mentioned this issue Dec 21, 2023

Bulk annotation in Feedback datasets #4050

Closed

frascuchon assigned jfcalvo Dec 21, 2023

frascuchon added this to the v1.22.0 milestone Dec 21, 2023

frascuchon mentioned this issue Dec 22, 2023

perf: improve upsert responses bulk #4451

Merged

11 tasks

frascuchon linked a pull request Jan 9, 2024 that will close this issue

perf: improve upsert responses bulk #4451

Merged

11 tasks

frascuchon closed this as completed Jan 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] Improve response upsert bulk endpoint performance #4449

[FEATURE] Improve response upsert bulk endpoint performance #4449

frascuchon commented Dec 21, 2023 •

edited

Loading

dosubot bot commented Dec 21, 2023 •

edited

Loading

frascuchon commented Jan 9, 2024

[FEATURE] Improve response upsert bulk endpoint performance #4449

[FEATURE] Improve response upsert bulk endpoint performance #4449

Comments

frascuchon commented Dec 21, 2023 • edited Loading

dosubot bot commented Dec 21, 2023 • edited Loading

Sources

frascuchon commented Jan 9, 2024

frascuchon commented Dec 21, 2023 •

edited

Loading

dosubot bot commented Dec 21, 2023 •

edited

Loading