[data] Add LanceDB Datasource #44853

Commits on Apr 18, 2024

  1. add: New LanceDB datasource for Ray Data

    This PR adds a new datasource for Ray Data that reads from LanceDB.
    This datasource is a thin wrapper around the LanceDB Python client that allows users to read data from LanceDB into Ray Data.
    
    On branch anyscalebrent/lancedb_datasource
    Changes to be committed:
    	modified:   python/ray/data/__init__.py
    	modified:   python/ray/data/datasource/__init__.py
    	new file:   python/ray/data/datasource/lancedb_datasource.py
    	modified:   python/ray/data/read_api.py
    brent-anyscale committed Apr 18, 2024
    be6051e
  2. upd: datasource __init__

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/__init__.py
    	modified:   python/ray/data/datasource/lancedb_datasource.py
    brent-anyscale committed Apr 18, 2024
    2db26aa
  3. upd: read_api.py - fix linting errors with line length

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/read_api.py
    brent-anyscale committed Apr 18, 2024
    50f2bef

Commits on Apr 23, 2024

  1. upd: rename lancedb resources to lance resources

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/__init__.py
    	modified:   python/ray/data/datasource/__init__.py
    	renamed:    python/ray/data/datasource/lancedb_datasource.py -> python/ray/data/datasource/lance_datasource.py
    	modified:   python/ray/data/read_api.py
    brent-anyscale committed Apr 23, 2024
    43f3dd2
  2. upd: Additional updates to remove DB from Lance resources

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/__init__.py
    	modified:   python/ray/data/datasource/lance_datasource.py
    brent-anyscale committed Apr 23, 2024
    4577925
  3. upd: Additional updates to remove DB from lance name

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/__init__.py
    	modified:   python/ray/data/datasource/__init__.py
    brent-anyscale committed Apr 23, 2024
    e2d7419
  4. 94ce5f0
  5. upd: Lance ReadAPI comment for AZ support

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/read_api.py
    brent-anyscale committed Apr 23, 2024
    5f4b253
  6. upd: Include link to LanceDB docs in read_api.py

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/read_api.py
    brent-anyscale committed Apr 23, 2024
    486b71e
  7. upd: lance_datasource - remove header comment

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    brent-anyscale committed Apr 23, 2024
    55ce8e6
  8. upd: Change init params to Optional instead of Unions

    Signed-off-by: Brent Bain <[email protected]>
    
    The __init__ method of the LanceDatasource class now uses Optional instead of Union for the parameters.
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    brent-anyscale committed Apr 23, 2024
    2afef9a
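
The last commit in this group switches the `__init__` parameters from `Union` to `Optional`. As a minimal sketch of what that change means for type annotations: `Optional[X]` is shorthand for `Union[X, None]`, so behavior is unchanged and only readability improves. The class `ExampleSource` and its parameter names below are hypothetical stand-ins for illustration, not the PR's actual `LanceDatasource` signature.

```python
from typing import List, Optional, Union

# Optional[X] is shorthand for Union[X, None]; the two annotations compare equal.
assert Optional[List[str]] == Union[List[str], None]


class ExampleSource:
    """Hypothetical stand-in for illustration; not the PR's LanceDatasource."""

    def __init__(
        self,
        uri: str,
        columns: Optional[List[str]] = None,
        filter: Optional[str] = None,
    ):
        self.uri = uri
        # In this sketch, None means "no column projection" / "no row filter".
        self.columns = columns
        self.filter = filter


src = ExampleSource("/tmp/example.lance")
print(src.columns, src.filter)  # None None
```

Callers that omit `columns` or `filter` get `None`, which the sketch treats as "read everything".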

Commits on Apr 24, 2024

  1. upd: lance_datasource - change to use to_batches

    This change updates lance_datasource to a simpler implementation
    of to_batches.
    
    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    brent-anyscale committed Apr 24, 2024
    757113a
  2. upd: lance_datasource - set parallelism based on number of fragments

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    brent-anyscale committed Apr 24, 2024
    931c6fe
  3. upd: lance_datasource - change from yield to return

    Yield isn't working as expected. Changing back to return.
    
    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    brent-anyscale committed Apr 24, 2024
    5de8caa
  4. upd: lance_dataset comment - changing for consistent naming

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    brent-anyscale committed Apr 24, 2024
    db4c528
  5. upd: lance_datasource - changed how fragment reading is performed

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    brent-anyscale committed Apr 24, 2024
    f39b18e
  6. upd: lance datasource - comments updated

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    brent-anyscale committed Apr 24, 2024
    9b07881
  7. upd: lance_datasource Add storage options to pass to Lance

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    brent-anyscale committed Apr 24, 2024
    cf3e700
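
Several commits in this group tie the read parallelism to the number of fragments in the dataset. The commit messages do not show how the PR does this, so the following is only a sketch of one plausible approach: cap the number of read tasks at the fragment count and spread fragments evenly across tasks. The helper `split_fragments` is hypothetical and uses plain integers in place of real fragment objects.

```python
from typing import List, Sequence


def split_fragments(fragment_ids: Sequence[int], parallelism: int) -> List[List[int]]:
    """Group fragment IDs into at most `parallelism` read tasks (hypothetical helper)."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    # Never create more tasks than there are fragments to read.
    num_tasks = min(parallelism, len(fragment_ids))
    if num_tasks == 0:
        return []
    # Round-robin assignment keeps group sizes within one of each other.
    return [list(fragment_ids[i::num_tasks]) for i in range(num_tasks)]


groups = split_fragments(range(5), parallelism=2)
print(groups)  # [[0, 2, 4], [1, 3]]
```

Each inner list would correspond to one read task. Requesting more parallelism than there are fragments simply yields one task per fragment.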

Commits on Apr 26, 2024

  1. 02b4835
  2. upd: lance datasource

    Changes to lance_datasource parallelism handling.
    Added initial test for lance_datasource.
    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    	modified:   python/ray/data/read_api.py
    	new file:   python/ray/data/tests/test_lance.py
    brent-anyscale committed Apr 26, 2024
    072ca1c
  3. upd: lance tests linting updates

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/datasource/lance_datasource.py
    	modified:   python/ray/data/tests/test_lance.py
    brent-anyscale committed Apr 26, 2024
    Configuration menu
    Copy the full SHA
    eb07726 View commit details
    Browse the repository at this point in the history
  4. upd: data-test-requirements - add lancedb

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/requirements/ml/data-test-requirements.txt
    brent-anyscale committed Apr 26, 2024
    Configuration menu
    Copy the full SHA
    a26625a View commit details
    Browse the repository at this point in the history
  5. upd: data BUILD - add Lance test

    Signed-off-by: Brent Bain <[email protected]>
    
    Changes to be committed:
    	modified:   python/ray/data/BUILD
    brent-anyscale committed Apr 26, 2024
    9b0d4d1