Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cursor based pagination in PiccoloCRUD #15

Open
dantownsend opened this issue Jan 21, 2021 · 8 comments
Open

Implement cursor based pagination in PiccoloCRUD #15

dantownsend opened this issue Jan 21, 2021 · 8 comments
Labels
enhancement New feature or request

Comments

@dantownsend
Copy link
Member

Currently, only offset based pagination is supported in PiccoloCRUD (using the __page and __page_size query params). There are issues with this, as outlined in this article.

Modify PiccoloCRUD so it accepts a __cursor GET parameter. The value will be the ID of a row for now, but encoded as string. Encoding it as a string gives more flexibility (for example, we may want to return a UUID instead of an integer in the future).

The response should then be something like:

{
    "rows": [...],
    "next_cursor: "1234"
}

If both the __page and __cursor GET params are passed to an endpoint, an error should be returned.

@dantownsend dantownsend added the enhancement New feature or request label Jan 21, 2021
@dantownsend
Copy link
Member Author

Related - piccolo-orm/piccolo#47

@sinisaos
Copy link
Member

sinisaos commented Feb 4, 2021

Hi Daniel
I’ve never encountered cursor pagination before, but it looks interesting to me. I played around with it a bit, and it seems to work if I didn’t miss the point. I changed a bit of code in PiccoloCRUD endpoints.py.

  1. adding cursor model to serializers.py
# cursor model
class Cursor(pydantic.BaseModel):
    next_cursor: str
  1. adding cursor to Param class cursor: str = ""
  2. adding cursor to _split_params method
if key == "__cursor":
    try:
        cursor = str(value)
    except ValueError:
        logger.info(f"Unrecognised __cursor argument - {value}")
    else:
        response.cursor = cursor
    continue
  1. adding cursor pagination and base64 encoding and decoding for next_cursor value
async def _get_all(self, params: t.Optional[t.Dict[str, t.Any]] = None) -> Response:

    # rest of unchanged code

    # Pagination
    cursor = split_params.cursor
    page_size = split_params.page_size or self.page_size
    page = split_params.page

    # raise error if both __page and __cursor in query parametars
    if "__cursor" in params and "__page" in params:
        return JSONResponse(
            {
                "error": "Can't have both __page and __cursor as query parametars",
            },
            status_code=403,
        )

    # Cursor pagination

    # query where limit is equal to page_size plus one
    query = query.limit(page_size + 1)
    rows = await query.run()
    # initial cursor
    next_cursor = self.encode_cursor(str(rows[-1]["id"]))
    if cursor:
        # query parametar cursor
        next_cursor = self.decode_cursor(cursor)
        # querying by query parametar order
        if order_by and order_by.ascending:
            query = query.where(self.table.id >= int(next_cursor)).limit(
                page_size + 1
            )
        else:
            query = query.where(self.table.id <= int(next_cursor)).limit(
                page_size + 1
            )
        rows = await query.run()
        # reset cursor to new value from latest value from rows
        # if no more further results provide empty string as next_cursor
        next_cursor = (
            ""
            if len(rows) <= page_size
            else self.encode_cursor(str(rows[-1]["id"]))
        )

    # LimitOffset pagination

    # if page_size greater than max_page_size return error
    if page_size > self.max_page_size:
        return JSONResponse(
            {
                "error": "The page size limit has been exceeded",
            },
            status_code=403,
        )
    query = query.limit(page_size)
    if page > 1:
        offset = page_size * (page - 1)
        query = query.offset(offset).limit(page_size)
    rows = await query.limit(page_size).run()
    # We need to serialise it ourselves, in case there are datetime
    # fields.
    cursor_model = Cursor(next_cursor=next_cursor).json()
    json = self.pydantic_model_plural(include_readable=include_readable)(
        rows=rows
    ).json()
    return (
        CustomJSONResponse(f"{json[:-1]}, {cursor_model[1:]}")
        if "__cursor" in params
        else CustomJSONResponse(json)
    )

###########################################################################

def encode_cursor(self, cursor):
    cursor_bytes = cursor.encode("ascii")
    base64_bytes = base64.b64encode(cursor_bytes)
    return base64_bytes.decode("ascii")

def decode_cursor(self, cursor):
    base64_bytes = cursor.encode("ascii")
    cursor_bytes = base64.b64decode(base64_bytes)
    return cursor_bytes.decode("ascii")

Sorry for the long comment, but what do you think about this? If it's OK I can do PR, if it's not, just forget it :). Cheers.

@dantownsend
Copy link
Member Author

@sinisaos Great job!

I haven't used cursor based pagination before, but I do know it's the recommended way as limit + offset isn't very performant in Postgres, and there are weird edge cases where you can miss rows, or get duplicate rows.

If you make a pull request, I'll have a play around. I'm interested to try it out! Thanks.

@sinisaos
Copy link
Member

sinisaos commented Feb 4, 2021

@dantownsend Thank you. I'll do it later tonight.

@satishdash
Copy link

Any updates on the implementation here.:)

@sinisaos
Copy link
Member

sinisaos commented Aug 5, 2021

@satishdash All progress is at #34, but there has been no activity lately. If you have any code or ideas on how to improve it, feel free to contribute and make pull request.

@dantownsend
Copy link
Member Author

@satishdash Yeah, it has stalled a bit.

Cursor based pagination is really complex - @sinisaos did a great job of getting it to work, but I still struggle to wrap my head around it entirely! Here is the implementation:

#34

The problem is it adds a lot of logic to PiccoloCRUD. What I'd like to do is somehow move this logic into a separate paginator class, like Django REST Framework does:

https://github.com/encode/django-rest-framework/blob/98e56e0327596db352b35fa3b3dc8355dc9bd030/rest_framework/pagination.py#L577

That way we can add the new functionality, whilst keeping PiccoloCRUD maintainable. But I haven't been able to make much progress on it recently.

@sinisaos
Copy link
Member

@dantownsend This is done in separate package. You can close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants