Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Improve response upsert bulk endpoint performance #4449

Closed
Tracked by #4050
frascuchon opened this issue Dec 21, 2023 · 2 comments · Fixed by #4451
Closed
Tracked by #4050

[FEATURE] Improve response upsert bulk endpoint performance #4449

frascuchon opened this issue Dec 21, 2023 · 2 comments · Fixed by #4451
Assignees
Labels
area: api Indicates that an issue or pull request is related to the Fast API server or REST endpoints status: wontfix Indicates that work won't continue on an issue or pull request team: backend Indicates that the issue or pull request is owned by the backend team type: improvement Indicates updates or improvements on existing features
Milestone

Comments

@frascuchon
Copy link
Member

frascuchon commented Dec 21, 2023

The new bulk endpoint for response creation takes a long time to process the request (around 6 seconds for 100 responses). This may affect the global server performance. The bottleneck must be identified and fixed as soon as possible.

cc @jfcalvo

Copy link

dosubot bot commented Dec 21, 2023

🤖

Hey there @frascuchon! I'm Dosu, your friendly neighborhood bot, created to assist with bugs, answer questions, and guide you in your journey as a contributor. Just think of me as the roadie before the main act - the human maintainers. I'm here to help keep the show running smoothly while they're getting set to take the stage. Let's rock this code together! 🎸

Based on the current implementation of the response upsert bulk endpoint in Argilla, there are several strategies that could be employed to improve its performance:

  1. Batch Processing: Instead of processing each response one by one, you can process them in batches. This can significantly reduce the number of database operations and thus improve performance.

  2. Parallel Processing: If the processing of each response is independent of others, you can use parallel processing to speed up the operation. This can be achieved by using Python's concurrent.futures or asyncio libraries.

  3. Database Optimization: You can optimize your database queries to improve performance. This can include using indexes for faster lookup, denormalizing your data, or using stored procedures.

  4. Caching: If you have some data that is used frequently and does not change often, you can cache it to reduce the number of database hits.

However, the actual implementation of these strategies would depend on the specific requirements and constraints of your application.

The current implementation of the response upsert bulk endpoint is the create_current_user_responses_bulk function. This function is a POST request to the "/me/responses/bulk" endpoint. It takes in a ResponsesBulkCreate object in the request body, which contains multiple response items to be created or updated.

For each item in the request body, it retrieves the corresponding record using the _get_record function. It then authorizes the current user to create a response for the record. After authorization, it calls the upsert_response function from the datasets module to create or update the response. If any error occurs during this process, it appends a ResponseBulk object with the error detail to the responses_bulk_items list. If no error occurs, it appends a ResponseBulk object with the created or updated response to the responses_bulk_items list.

Finally, it returns a ResponsesBulk object containing all the ResponseBulk items in the responses_bulk_items list.

You can find the code for the create_current_user_responses_bulk function in the responses.py file.

As for the current performance of the response upsert bulk endpoint, I wasn't able to find specific metrics within the repository. However, implementing the strategies mentioned above should help improve its performance.

If you need further clarification or have additional questions, feel free to ask.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added area: api Indicates that an issue or pull request is related to the Fast API server or REST endpoints team: backend Indicates that the issue or pull request is owned by the backend team type: improvement Indicates updates or improvements on existing features status: wontfix Indicates that work won't continue on an issue or pull request labels Dec 21, 2023
frascuchon added a commit that referenced this issue Jan 8, 2024
<!-- Thanks for your contribution! As part of our Community Growers
initiative 🌱, we're donating Justdiggit bunds in your name to reforest
sub-Saharan Africa. To claim your Community Growers certificate, please
contact David Berenstein in our Slack community or fill in this form
https://tally.so/r/n9XrxK once your PR has been merged. -->

# Description

This PR applies some performance improvements related to the bulk upsert
responses endpoint since some environments are experimenting with a high
execution time when running this operation with 100 items in the
request.

The performance improvements are:

1. Read all requested records at once instead of once at a time.
2. Stop checking the existence of the search index since is taking
longer.

These improvements are applied based on the profiling results using the
test ...

Also, since the current API controller is doing many things, a new
design proposal is applied in the implementation by defining a simple
use case implementation for this functionality. The use case wraps all
the needed steps to cover the functionality and make the controller
implementation quite simple, focusing only on exception handling and
request/response serialization.

Closes #4449 

**Type of change**

(Please delete options that are not relevant. Remember to title the PR
according to the type of change)

- [ ] New feature (non-breaking change which adds functionality)
- [X] Refactor (change restructuring the codebase without changing
functionality)
- [X] Improvement (change adding some improvement to an existing
functionality)

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes. And
ideally, reference `tests`)

Test locally

**Checklist**

- [ ] I added relevant documentation
- [X] I followed the style guidelines of this project
- [X] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [X] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See
https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@frascuchon frascuchon linked a pull request Jan 9, 2024 that will close this issue
11 tasks
@frascuchon
Copy link
Member Author

Fixed by #4451

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: api Indicates that an issue or pull request is related to the Fast API server or REST endpoints status: wontfix Indicates that work won't continue on an issue or pull request team: backend Indicates that the issue or pull request is owned by the backend team type: improvement Indicates updates or improvements on existing features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants