Feat/add basic database support #2564

Merged
merged 106 commits into from
Mar 25, 2023
2fccff8
Add SQLAlchemy and alembic as dependencies
jfcalvo Feb 13, 2023
f9fd7bc
Alembic init
jfcalvo Feb 13, 2023
ce4af6b
Set SQLite as default database with alembic configuration
jfcalvo Feb 13, 2023
fb273d1
Add create users table migration
jfcalvo Feb 13, 2023
5f76c45
Add create organizations table migration
Feb 13, 2023
b2b463a
Add create_workspaces_table migrations
Feb 13, 2023
434b0da
Increase black line length configuration to 120
Feb 13, 2023
d2e12eb
Add create_users_organizations migration
jfcalvo Feb 14, 2023
0af1957
Add create_users_workspaces migration
jfcalvo Feb 14, 2023
b13bb6c
Add create_workspaces_organizations migration
jfcalvo Feb 14, 2023
f50aa89
Add database.py setup
frascuchon Feb 14, 2023
b4ca280
wip: Add user orm model
frascuchon Feb 14, 2023
002abb2
wip: Add authentication with orm user model
jfcalvo Feb 14, 2023
3e2eca9
Allow fetch user info by api key
frascuchon Feb 14, 2023
e3e0c03
Add organization orm model
jfcalvo Feb 15, 2023
e9bea4d
Remove unused alembic README
jfcalvo Feb 15, 2023
e7b8ba8
Clarify SQLite db at .gitignore
jfcalvo Feb 15, 2023
11d1215
Set nullable to False for workspaces name column
jfcalvo Feb 15, 2023
473858a
Add Workspace orm model and relationships with User model
jfcalvo Feb 15, 2023
5086739
- Move database models from folder to one single file.
jfcalvo Feb 15, 2023
4291233
Focus on User and Workspace models, removing Organization by now
jfcalvo Feb 15, 2023
fac06bd
Remove organization_id column from workspaces table migration
jfcalvo Feb 15, 2023
06ab131
Define users_workspaces table as Association Object
jfcalvo Feb 15, 2023
c03bd0e
Make password_reset_token optional
frascuchon Feb 15, 2023
02a7303
Add missing constraints and indexes for migrations
jfcalvo Feb 15, 2023
f319bf1
chore: Merge branch 'feat/add-basic-database-support' of github.com:r…
frascuchon Feb 15, 2023
f31f0da
Merge branch 'develop' into feat/add-basic-database-support
jfcalvo Feb 16, 2023
1482e53
Tests/make tests working with new database system (#2351)
frascuchon Feb 16, 2023
01d92a0
Refactor User model (#2355)
jfcalvo Feb 16, 2023
d5cefb2
Add new users creation endpoint (#2359)
jfcalvo Feb 17, 2023
a82e536
Improve database sessions management (#2360)
jfcalvo Feb 17, 2023
57791fa
Add new endpoint to list users (#2366)
jfcalvo Feb 20, 2023
863a647
Add new endpoint to delete users (#2363)
jfcalvo Feb 20, 2023
e886536
Add first iteration of workspace endpoints (#2375)
jfcalvo Feb 21, 2023
d72b2fa
Add timestamps columns and fields to database models (#2378)
jfcalvo Feb 21, 2023
030b40d
Change tests configuration to allow testing with new endpoints using …
jfcalvo Feb 23, 2023
20bafa8
Add explicit ascendent order by inserted_at column to collection endp…
jfcalvo Feb 23, 2023
ff1b92e
Add unique constraint to workspaces name column (#2396)
jfcalvo Feb 23, 2023
05f4395
Fix wrong method name when user is not found on authenticate_user fun…
jfcalvo Feb 23, 2023
3e701ac
Adding id and timestamps to user (#2413)
frascuchon Feb 24, 2023
1201d46
Add factories and additional tests for new endpoints (#2415)
jfcalvo Feb 28, 2023
17998d8
Add __repr__ implementation for database models (#2441)
jfcalvo Feb 28, 2023
e20de8c
Rename users_workspaces to workspaces_users (#2436)
jfcalvo Feb 28, 2023
f73159f
Improve field constraints for some Pydantic schemas (#2432)
jfcalvo Feb 28, 2023
310ab28
Merge from develop branch (#2445)
frascuchon Mar 1, 2023
889fafa
Tests will link users to workspaces when needed (#2446)
frascuchon Mar 1, 2023
82623cd
Use ARGILLA_DATABASE_URL to set sqlalchemy and alembic database setti…
jfcalvo Mar 1, 2023
f6ce9df
Merge branch 'develop' into feat/add-basic-database-support
frascuchon Mar 2, 2023
09bcee6
Add new function to get workspace by name (#2464)
jfcalvo Mar 2, 2023
ce9e0e5
Clean user pydantic model (#2462)
frascuchon Mar 2, 2023
7e03a64
chore: Merge branch 'develop' into feat/add-basic-database-support
frascuchon Mar 2, 2023
e3e0884
Add basic role support to users (#2467)
jfcalvo Mar 3, 2023
f36fea4
Remove extra fields in User (#2470)
frascuchon Mar 3, 2023
ce49014
Add new ARGILLA_HOME_PATH environment variable (#2468)
jfcalvo Mar 3, 2023
c3e8c08
Merge branch 'develop' into feat/add-basic-database-support
frascuchon Mar 3, 2023
a72e6fb
Add migration to create users on database from YAML config file (#2459)
jfcalvo Mar 3, 2023
68295ce
Add default user and show warning message on startup when no users fo…
jfcalvo Mar 6, 2023
24ec2ab
Add new task to execute alembic database migrations (#2489)
jfcalvo Mar 6, 2023
678de36
chore: Merge branch 'develop' into feat/add-basic-database-support
frascuchon Mar 7, 2023
7bb8466
Remove creation of default user from server startup (#2498)
jfcalvo Mar 7, 2023
89e28d5
Create task CLI for Argilla users creation (#2488)
frascuchon Mar 8, 2023
6b6fc31
Add basic authorization policies (#2491)
jfcalvo Mar 8, 2023
8b6d7b3
Add task to create default user on database (#2502)
jfcalvo Mar 8, 2023
7866e1e
Move and rename some files to users tasks package (#2507)
jfcalvo Mar 8, 2023
63fcdd5
First integration of new resources with current API endpoints (#2505)
frascuchon Mar 8, 2023
b7eb0c6
Add changes so alembic migrations works fine with argilla as a Python…
jfcalvo Mar 9, 2023
8d36ef1
Config alembic when running tests on GitHub actions
jfcalvo Mar 9, 2023
8c05058
Set environment variable ALEMBIC_CONFIG on GitHub workflow
jfcalvo Mar 9, 2023
3a790c1
Merge branch 'develop' into feat/add-basic-database-support
frascuchon Mar 9, 2023
463f244
Alembic migrations working with argilla as Python package (#2511)
frascuchon Mar 9, 2023
3ba05b8
Merge branch 'develop' into feat/add-basic-database-support
jfcalvo Mar 10, 2023
6e30453
feat: repair stub child component
keithCuniah Mar 10, 2023
302dada
Merge branch 'feat/add-basic-database-support' of https://github.com/…
keithCuniah Mar 10, 2023
83a11e9
Integrate datasets crud endpoints (#2510)
frascuchon Mar 13, 2023
ebcd381
Clean pydantic user class (#2518)
frascuchon Mar 13, 2023
084de8f
Support api-key and password as parameter when using users.create_def…
jfcalvo Mar 13, 2023
dcb5390
Return all workspaces for admin users (#2523)
frascuchon Mar 14, 2023
6718b01
Make workspace mandatory for dataset requests (#2529)
frascuchon Mar 14, 2023
066be20
Change Dockerfiles to support database changes (#2524)
jfcalvo Mar 14, 2023
8dcad98
Add admin password and admin api-key to release Dockerfile (#2538)
jfcalvo Mar 14, 2023
a374585
Merge branch 'develop' into feat/add-basic-database-support
frascuchon Mar 14, 2023
1c796dc
Add support to create workspaces for argilla users.create task (#2544)
jfcalvo Mar 15, 2023
ecb1568
Explicit setup to add alembic.ini in package distribution (#2542)
frascuchon Mar 15, 2023
e2b9865
Change users.migrate task to be a basic click command (#2548)
jfcalvo Mar 15, 2023
5922b49
users.create task will skip without error when user already exists in…
frascuchon Mar 16, 2023
fe8cc71
Add --api-key as parameter to argilla.tasks.users.create (#2555)
jfcalvo Mar 16, 2023
6175f5a
Change policies for dataset update (#2549)
frascuchon Mar 16, 2023
0f7660d
Review annotator policies for dataset settings (#2547)
frascuchon Mar 16, 2023
539c9ae
Fix Pydantic User schema full_name attribute to don't use last_name w…
jfcalvo Mar 16, 2023
325ecfd
Raise EntityAlreadyExistsError for new endpoint handlers (#2556)
jfcalvo Mar 16, 2023
0dea3af
Add first name and last name to user schema (#2546)
jfcalvo Mar 16, 2023
e120bc5
Merge branch 'develop' into feat/add-basic-database-support
frascuchon Mar 16, 2023
bebdc32
Adapt the quickstart image (#2559)
frascuchon Mar 16, 2023
5c14200
Merge branch 'develop' into feat/add-basic-database-support
frascuchon Mar 16, 2023
03d8c5b
Merge branch 'feat/add-basic-database-support' of github.com:recognai…
frascuchon Mar 16, 2023
f213694
Add argilla_data volume to docker compose (#2560)
jfcalvo Mar 16, 2023
59f9c18
Add missing environment variables to server configuration docs (#2563)
jfcalvo Mar 17, 2023
71eed13
Remove old tests
frascuchon Mar 17, 2023
f249cd3
Replace team.apikey with admin.apikey on the entire project (#2568)
jfcalvo Mar 21, 2023
b77d92b
Merge branch 'develop' into feat/add-basic-database-support
jfcalvo Mar 21, 2023
657c4e7
New User Management documentation (#2541)
frascuchon Mar 22, 2023
da5b4a2
Use admin workspace by default on Quickstart Docker image (#2579)
jfcalvo Mar 22, 2023
f042c2f
Remove unused superuser attribute (#2600)
frascuchon Mar 24, 2023
a3e991c
Add new ADMIN_ENABLED environment variable to release.Dockerfile (#2601)
jfcalvo Mar 24, 2023
0132ab4
Merge branch 'develop' into feat/add-basic-database-support
jfcalvo Mar 24, 2023
3b22819
Upgrade CHANGELOG.md with changes associated to database support (#2578)
jfcalvo Mar 24, 2023
22 changes: 10 additions & 12 deletions .github/workflows/package.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,11 @@ on:
- "feature/**"
- "feat/**"

env:
# Increase this value to reset cache if etc/example-environment.yml has not changed
CACHE_NUMBER: 5
ALEMBIC_CONFIG: src/argilla/alembic.ini

jobs:
test-elastic:
name: Tests ElasticSearch
Expand Down Expand Up @@ -71,9 +76,6 @@ jobs:
with:
path: ${{ env.CONDA }}/envs
key: conda-${{ runner.os }}--${{ runner.arch }}--${{ steps.get-date.outputs.today }}-${{ hashFiles('environment_dev.yml') }}-${{ env.CACHE_NUMBER }}
env:
# Increase this value to reset cache if etc/example-environment.yml has not changed
CACHE_NUMBER: 2

- name: Update environment
if: steps.filter.outputs.python_code == 'true' && steps.cache.outputs.cache-hit != 'true'
Expand All @@ -82,9 +84,6 @@ jobs:
- name: Cache pip 👜
uses: actions/cache@v2
if: steps.filter.outputs.python_code == 'true'
env:
# Increase this value to reset cache if pyproject.toml has not changed
CACHE_NUMBER: 0
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ env.CACHE_NUMBER }}-${{ hashFiles('pyproject.toml') }}
Expand All @@ -98,6 +97,8 @@ jobs:
- name: Run tests 📈
if: steps.filter.outputs.python_code == 'true'
run: |
pip install -e ".[server,listeners]"
alembic upgrade head
pytest --cov=argilla --cov-report=xml
pip install "spacy<3.0" && python -m spacy download en_core_web_sm
pytest tests/monitoring/test_spacy_monitoring.py
Expand All @@ -115,6 +116,7 @@ jobs:
test-opensearch:
name: Test OpenSearch
runs-on: ubuntu-latest

strategy:
matrix:
version: [ 1.3, 2.3 ]
Expand Down Expand Up @@ -161,9 +163,6 @@ jobs:
with:
path: ${{ env.CONDA }}/envs
key: conda-${{ runner.os }}--${{ runner.arch }}--${{ steps.get-date.outputs.today }}-${{ hashFiles('environment_dev.yml') }}-${{ env.CACHE_NUMBER }}
env:
# Increase this value to reset cache if etc/example-environment.yml has not changed
CACHE_NUMBER: 2

- name: Update environment
if: steps.filter.outputs.python_code == 'true' && steps.cache.outputs.cache-hit != 'true'
Expand All @@ -172,9 +171,6 @@ jobs:
- name: Cache pip 👜
uses: actions/cache@v2
if: steps.filter.outputs.python_code == 'true'
env:
# Increase this value to reset cache if pyproject.toml has not changed
CACHE_NUMBER: 0
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ env.CACHE_NUMBER }}-${{ hashFiles('pyproject.toml') }}
Expand All @@ -188,6 +184,8 @@ jobs:
- name: Run tests 📈
if: steps.filter.outputs.python_code == 'true'
run: |
pip install -e ".[server,listeners]"
alembic upgrade head
pytest --cov=argilla --cov-report=xml
pip install "spacy<3.0" && python -m spacy download en_core_web_sm
pytest tests/monitoring/test_spacy_monitoring.py
Expand Down
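The workflow change above sets `ALEMBIC_CONFIG` once at the workflow level and runs `alembic upgrade head` before `pytest`, so the tests start against a migrated database. A minimal Python sketch of the same ordering, for reproducing the step locally, might look like the following. The helper names `resolve_alembic_config` and `build_test_commands` are hypothetical and not part of this PR; the `alembic.ini` fallback reflects the Alembic CLI default for `-c` when `ALEMBIC_CONFIG` is unset.

```python
import os
from typing import List, Mapping, Optional

# Alembic's CLI falls back to "alembic.ini" when ALEMBIC_CONFIG is not set.
DEFAULT_ALEMBIC_CONFIG = "alembic.ini"


def resolve_alembic_config(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Alembic config path taken from ALEMBIC_CONFIG (hypothetical helper)."""
    env = os.environ if env is None else env
    return env.get("ALEMBIC_CONFIG", DEFAULT_ALEMBIC_CONFIG)


def build_test_commands(env: Optional[Mapping[str, str]] = None) -> List[List[str]]:
    """Build the commands of the 'Run tests' step in order: install, migrate, test."""
    config = resolve_alembic_config(env)
    return [
        ["pip", "install", "-e", ".[server,listeners]"],
        # Passing -c explicitly is equivalent to exporting ALEMBIC_CONFIG.
        ["alembic", "-c", config, "upgrade", "head"],
        ["pytest", "--cov=argilla", "--cov-report=xml"],
    ]
```

Each inner list can be handed to `subprocess.run(cmd, check=True)` so that a failed migration stops the run before any test executes.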
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -130,9 +130,11 @@ sw.*
# Vim swap files
*.swp


yarn.lock
package-lock.json

# App generated files
src/**/server/static/

# Old users db file
.users.yml
34 changes: 34 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,40 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- `ARGILLA_HOME_PATH` new environment variable ([#2564]).
- `ARGILLA_DATABASE_URL` new environment variable ([#2564]).
- Basic support for user roles with `admin` and `annotator` ([#2564]).
- `id`, `first_name`, `last_name`, `role`, `inserted_at` and `updated_at` new user fields ([#2564]).
- `/api/users` new endpoint to list and create users ([#2564]).
- `/api/users/{user_id}` new endpoint to delete users ([#2564]).
- `/api/workspaces` new endpoint to list and create workspaces ([#2564]).
- `/api/workspaces/{workspace_id}/users` new endpoint to list workspace users ([#2564]).
- `/api/workspaces/{workspace_id}/users/{user_id}` new endpoint to create and delete workspace users ([#2564]).
- `argilla.tasks.users.migrate` new task to migrate users from old YAML file to database ([#2564]).
- `argilla.tasks.users.create` new task to create a user ([#2564]).
- `argilla.tasks.users.create_default` new task to create a user with default credentials ([#2564]).
- `argilla.tasks.database.migrate` new task to execute database migrations ([#2564]).
- `release.Dockerfile` and `quickstart.Dockerfile` now create a default `argilladata` volume to persist data ([#2564]).

### Changed

- `ARGILLA_USERS_DB_FILE` environment variable is now only used to migrate users from YAML file to database ([#2564]).
- `full_name` user field is now deprecated and `first_name` and `last_name` should be used instead ([#2564]).
- `password` user field now requires a minimum of `8` and a maximum of `100` characters in size ([#2564]).
- `quickstart.Dockerfile` image default users changed from `team` and `argilla` to `admin` and `annotator`, including new passwords and API keys ([#2564]).
- Datasets to be managed only by users with `admin` role ([#2564]).

### Removed

- `email` user field ([#2564]).
- `disabled` user field ([#2564]).
- Support for private workspaces ([#2564]).
- `ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY` and `ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD` environment variables. Use `python -m argilla.tasks.users.create_default` instead ([#2564]).

[#2564]: https://github.com/argilla-io/argilla/issues/2564

## [1.5.0](https://github.com/recognai/rubrix/compare/v1.4.0...v1.5.0) - 2023-03-21

### Added
Expand Down
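The changelog entries above introduce `/api/users` for creating users, the `admin` and `annotator` roles, and a password length between 8 and 100 characters. The sketch below builds (but does not send) such a request with the Python standard library. The JSON body shape, the `X-Argilla-Api-Key` header name, and the helper names are assumptions made for illustration; they are not confirmed by this diff.

```python
import json
from typing import Optional
from urllib.request import Request

# Bounds and roles taken from the changelog entries above.
PASSWORD_MIN_LENGTH = 8
PASSWORD_MAX_LENGTH = 100
ROLES = ("admin", "annotator")


def validate_password(password: str) -> str:
    """Reject passwords outside the documented 8..100 character range."""
    if not PASSWORD_MIN_LENGTH <= len(password) <= PASSWORD_MAX_LENGTH:
        raise ValueError(
            f"password must have between {PASSWORD_MIN_LENGTH} "
            f"and {PASSWORD_MAX_LENGTH} characters"
        )
    return password


def build_create_user_request(
    base_url: str,
    api_key: str,
    username: str,
    password: str,
    first_name: str,
    role: str = "annotator",
    last_name: Optional[str] = None,
) -> Request:
    """Build a POST /api/users request (hypothetical payload and header names)."""
    if role not in ROLES:
        raise ValueError(f"role must be one of {ROLES}")
    payload = {
        "username": username,
        "password": validate_password(password),
        "first_name": first_name,
        "last_name": last_name,
        "role": role,
    }
    return Request(
        url=f"{base_url.rstrip('/')}/api/users",
        data=json.dumps(payload).encode("utf-8"),
        # Assumed header name for API-key authentication.
        headers={"Content-Type": "application/json", "X-Argilla-Api-Key": api_key},
        method="POST",
    )
```

Validating on the client side only gives earlier feedback; the server remains the authority on what it accepts.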
3 changes: 2 additions & 1 deletion MANIFEST.in
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
include src/argilla/alembic.ini
graft src/argilla/server/static
prune docs
prune docs
15 changes: 12 additions & 3 deletions docker-compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,17 +8,24 @@ services:
ports:
- "6900:6900"
environment:
ARGILLA_HOME_PATH: /var/lib/argilla
ARGILLA_ELASTICSEARCH: http://elasticsearch:9200
# Opt-out for telemetry https://docs.argilla.io/en/latest/reference/telemetry.html
# ARGILLA_ENABLE_TELEMETRY: 0
# ARGILLA_ENABLE_TELEMETRY: 0 # Opt-out for telemetry https://docs.argilla.io/en/latest/reference/telemetry.html

# Set user configuration https://docs.argilla.io/en/latest/getting_started/installation/user_management.html
# ARGILLA_LOCAL_AUTH_USERS_DB_FILE: /config/.users.yaml
# volumes:
#- ${PWD}/.users.yaml:/config/.users.yaml

# DEFAULT_USER_ENABLED: false # Uncomment this line to disable the creation of the default user
# DEFAULT_USER_PASSWORD: custom-password # Uncomment this line to set a custom password for the default user
# DEFAULT_USER_API_KEY: custom-api-key # Uncomment this line to set a custom api-key for the default user
networks:
- argilla

volumes:
# ARGILLA_HOME_PATH is used to define where Argilla will save its application data.
# If you change ARGILLA_HOME_PATH value please copy that same value to argilladata volume too.
- argilladata:/var/lib/argilla
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.5.3
environment:
Expand Down Expand Up @@ -48,8 +55,10 @@ services:
ELASTICSEARCH_HOSTS: '["http://elasticsearch:9200"]'
networks:
- argilla

networks:
argilla:
driver: bridge
volumes:
argilladata:
elasticdata:
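The comment in the compose file asks that `ARGILLA_HOME_PATH` and the mount target of the `argilladata` volume stay in sync; otherwise the application data is written outside the volume and is lost with the container. A small Python check for that invariant could look like this. The function names are hypothetical, and the parser only handles the short `name:/container/path[:mode]` volume syntax with POSIX paths.

```python
from pathlib import PurePosixPath
from typing import Iterable


def volume_target(volume_spec: str) -> PurePosixPath:
    """Return the container path of a short-syntax volume spec."""
    parts = volume_spec.split(":")
    if len(parts) < 2:
        raise ValueError(f"not a 'source:target' volume spec: {volume_spec!r}")
    return PurePosixPath(parts[1])


def home_path_is_persisted(home_path: str, volume_specs: Iterable[str]) -> bool:
    """True when some volume is mounted exactly at ARGILLA_HOME_PATH."""
    home = PurePosixPath(home_path)
    return any(volume_target(spec) == home for spec in volume_specs)
```

With the values from the compose file, `home_path_is_persisted("/var/lib/argilla", ["argilladata:/var/lib/argilla"])` holds; changing only one of the two values makes the check fail.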
Original file line number Diff line number Diff line change
Expand Up @@ -49,9 +49,9 @@ Within Elastic, it is possible to create snapshots of a running cluster. We high

### Mount back-up volume

When deploying Elastic, we need to define a `path.repo` via setting this as an environment variable in your `docker-compose.yml` or by setting this in your `elasticsearch.yml`, and passing this as config. Additionally, we need to pass the same `path.repo` to a mounted volume. By default, we set this `elasticdata:/usr/share/elasticsearch/backups` because the `elasticsearch` user needs to have full permissions to act on the repo. Hence, setting the volume to something different might require some additional permission configurations. Note that the `minimum_master_nodes` needs to be explicitly set when bound on a public IP.
When deploying Elastic, we need to define a `path.repo` via setting this as an environment variable in your `docker-compose.yaml` or by setting this in your `elasticsearch.yml`, and passing this as config. Additionally, we need to pass the same `path.repo` to a mounted volume. By default, we set this `elasticdata:/usr/share/elasticsearch/backups` because the `elasticsearch` user needs to have full permissions to act on the repo. Hence, setting the volume to something different might require some additional permission configurations. Note that the `minimum_master_nodes` needs to be explicitly set when bound on a public IP.

#### `docker-compose.yml`
#### `docker-compose.yaml`

```yaml
elasticsearch:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,10 @@ You can set following environment variables to further configure your server and

### Server

- `ARGILLA_HOME_PATH`: The directory where Argilla will store all the files needed to run. If the path doesn't exist, it will be automatically created (Default: `~/.argilla`).

- `ARGILLA_DATABASE_URL`: A URL string that contains the necessary information to connect to a database. Argilla uses SQLite by default; PostgreSQL is also officially supported (Default: `sqlite:///$ARGILLA_HOME_PATH/argilla.db?check_same_thread=False`).

- `ARGILLA_ELASTICSEARCH`: URL of the connection endpoint of the Elasticsearch instance (Default: `http://localhost:9200`).

- `ARGILLA_ELASTICSEARCH_SSL_VERIFY`: If "False", disables SSL certificate verification when connecting to the Elasticsearch backend.
Expand Down
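The two new settings documented above are linked: when `ARGILLA_DATABASE_URL` is unset, its default is derived from `ARGILLA_HOME_PATH`. The following Python sketch mirrors only the documented defaults. The function names are hypothetical and this is not Argilla's actual settings code.

```python
import os
from typing import Mapping, Optional


def resolve_home_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Return ARGILLA_HOME_PATH, falling back to the documented ~/.argilla."""
    env = os.environ if env is None else env
    return os.path.expanduser(env.get("ARGILLA_HOME_PATH", "~/.argilla"))


def resolve_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return ARGILLA_DATABASE_URL, or the documented SQLite default."""
    env = os.environ if env is None else env
    explicit = env.get("ARGILLA_DATABASE_URL")
    if explicit:
        return explicit
    # An absolute home path yields four slashes: sqlite:////abs/path/argilla.db
    return f"sqlite:///{resolve_home_path(env)}/argilla.db?check_same_thread=False"
```

For example, with `ARGILLA_HOME_PATH=/var/lib/argilla` and no explicit URL, the sketch resolves to `sqlite:////var/lib/argilla/argilla.db?check_same_thread=False`, while an explicit PostgreSQL URL is returned unchanged.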