[DOCS] add migration notebook to docs format #5002

burtenshaw · 2024-06-12T11:44:19Z

This PR transfer the notebook guide by @frascuchon and @nataliaElv over to the documentation format, which makes it easier to follow.

You can find the page here: http://localhost:8000/argilla-python/guides/how_to_guides/migrate_from_legacy_datasets/

n.b. the page is not available via the menu. @davidberenstein1957 we should figure out navigation presentation for this page.

The guide looks like this:

davidberenstein1957 · 2024-06-12T12:02:44Z

Hi @burtenshaw, looks good. Adding it to guides makes sense to me too.

I have not taken the time to look at the migration guides yet but I think we might be able to make the migration a lot more user-friendly by providing code that maps everything to the Datasets using the 2.0 SDK automatically.

TextClassification -> TextField ("name=text") | MultiLabel|LabelQuestion ("name=label") + metadata configs
TokenClassification -> TextField ("name=text") | SpanQuestion ("name=span") + metadata configs
Text2Text -> TextField ("name=text") | TextQuestion ("name=text") + metadata configs

It takes us some more time to figure out the logic and mappings but will save a lot of community members a lot of time and effort too. @frascuchon @burtenshaw WDYT?

nataliaElv · 2024-06-12T12:06:09Z

The document says that Feedback datasets do not need to be migrated, but if you have a server with legacy+feedback datasets I assume that one would like to move all of them over to the new server, right? @burtenshaw

We may also want to discuss what to do with the migration of predictions to suggestions in multi-label text classification datasets. If I'm not mistaken, with the current code all labels would be selected as suggestions (as in legacy datasets you can have a prediction for each label). Maybe we can add a threshold that the user can modify so only labels where score > threshold are added to the suggestions?

davidberenstein1957 · 2024-06-12T12:09:00Z

The document says that Feedback datasets do not need to be migrated, but if you have a server with legacy+feedback datasets I assume that one would like to move all of them over to the new server, right? @burtenshaw

@nataliaElv I don't think the FeedbackDataset need to be moved. The underlying server logic is the same but the way of interacting with the server through the new SDK as API is different.

davidberenstein1957 · 2024-06-12T12:10:34Z

We may also want to discuss what to do with the migration of predictions to suggestions in multi-label text classification datasets. If I'm not mistaken, with the current code all labels would be selected as suggestions (as in legacy datasets you can have a prediction for each label). Maybe we can add a threshold that the user can modify so only labels where score > threshold are added to the suggestions?

@nataliaElv Good catch. I think we need to provide a warning and on let the user choose to add the score processing mechanism, which might be as simple as choosing the highest or lowest one.

burtenshaw · 2024-06-12T12:52:46Z

TextClassification -> TextField ("name=text") | MultiLabel|LabelQuestion ("name=label") + metadata configs TokenClassification -> TextField ("name=text") | SpanQuestion ("name=span") + metadata configs Text2Text -> TextField ("name=text") | TextQuestion ("name=text") + metadata configs

This is a nice suggestion but I think it depends on use, and the metrics are declining on legacy datasets.

Personally, I would implement this based on community engagement with this guide.

burtenshaw · 2024-06-12T12:54:48Z

The document says that Feedback datasets do not need to be migrated, but if you have a server with legacy+feedback datasets I assume that one would like to move all of them over to the new server, right? @burtenshaw

@nataliaElv I don't think the FeedbackDataset need to be moved. The underlying server logic is the same but the way of interacting with the server through the new SDK as API is different.

@nataliaElv Is on to something here. The guide proposes that the v2 server is 'their new server'. So from a user persepctive, they would want their datasets migrated. I will add the a snippet that shows them how to client.datasets on all other datasets.

burtenshaw · 2024-06-12T12:56:22Z

We may also want to discuss what to do with the migration of predictions to suggestions in multi-label text classification datasets. If I'm not mistaken, with the current code all labels would be selected as suggestions (as in legacy datasets you can have a prediction for each label). Maybe we can add a threshold that the user can modify so only labels where score > threshold are added to the suggestions?

@nataliaElv Good catch. I think we need to provide a warning and on let the user choose to add the score processing mechanism, which might be as simple as choosing the highest or lowest one.

I would add this to the tab with multilabel text classification as a note box.

argilla/docs/guides/how_to_guides/migrate_from_legacy_datasets.md

Helper functions for record mapping have been reviewed and tested with several datasets.

argilla/docs/how_to_guides/migrate_from_legacy_datasets.md

codecov · 2024-06-20T08:49:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.93%. Comparing base (8e9f42b) to head (9c0768e).
Report is 39 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff              @@
##           develop    #5002       +/-   ##
============================================
+ Coverage    61.13%   91.93%   +30.80%     
============================================
  Files          328      135      -193     
  Lines        17670     5818    -11852     
============================================
- Hits         10802     5349     -5453     
+ Misses        6868      469     -6399

Flag	Coverage Δ
argilla	`?`
argilla-server	`91.93% <ø> (-0.09%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

argilla/docs/how_to_guides/migrate_from_legacy_datasets.md

Co-authored-by: Paco Aranda <[email protected]>

frascuchon · 2024-06-20T10:05:17Z

argilla/docs/how_to_guides/migrate_from_legacy_datasets.md

+## Migrating Feedback Datasets on your Argilla 1.* server
+
+As mentioned above, `FeedbackDataset`'s are compatible with Argilla V2 and do not need to be reformatted. However, you may want to migrate your feedback datasets to the new server so that you can deprecate your Argilla 1.* server. Here is a guide on how to migrate your feedback datasets:
+
+```python
+import argilla as rg
+
+# Initialize the API with an Argilla server less than 2.0
+old_client = rg.Argilla(old_server_api_url, old_server_api_key)
+new_client = rg.Argilla(new_server_api_url, new_server_api_key) 
+
+dataset_name = "feedback-dataset"
+old_dataset = old_client.datasets(name=dataset_name)
+new_dataset = new_client.datasets.add(old_dataset)
+
+# Load the records from the old server
+new_dataset.records.log(
+    old_dataset.records(
+        with_responses=True, # (1)
+        with_suggestions=True,
+        with_vectors=True,
+    )
+)
+```
+
+1. The `with_responses`, `with_suggestions`, `with_vectors`, and `with_metadata` flags are used to load the records with the responses, suggestions, vectors, and metadata respectively.


This scenario has several problems.

We cannot share resources between clients, or even create new resources from other resources. Those actions will modify the internal state of the original resource, so old_dataset settings will change when the new_dataset is created

Copy responses from one server to another requires a revision of response users. User IDs won't exist in the target server so we should provide a user mapping in the same way we do for legacy datasets.

Not sure how common would be this scenario. But I propose to skip this for now and think a bit more about how the SDK should work for this. cc @burtenshaw

Also, suggestions are linked to questions by a question ID. When we do:

new_dataset.records.log( list(old_dataset.records(with_responses=False, with_suggestions=True, with_vectors=True)), )

an error will be raised since the error contains wrong question info for the provided dataset.

burtenshaw changed the title ~~docs: add migration notebook to docs format~~ [DOCS] add migration notebook to docs format Jun 12, 2024

burtenshaw requested review from frascuchon and nataliaElv June 12, 2024 11:47

burtenshaw marked this pull request as ready for review June 12, 2024 11:47

frascuchon reviewed Jun 12, 2024

View reviewed changes

argilla/docs/guides/how_to_guides/migrate_from_legacy_datasets.md Outdated Show resolved Hide resolved

frascuchon reviewed Jun 12, 2024

View reviewed changes

argilla/docs/guides/how_to_guides/migrate_from_legacy_datasets.md Outdated Show resolved Hide resolved

frascuchon reviewed Jun 12, 2024

View reviewed changes

argilla/docs/guides/how_to_guides/migrate_from_legacy_datasets.md Outdated Show resolved Hide resolved

frascuchon reviewed Jun 13, 2024

View reviewed changes

argilla/docs/guides/how_to_guides/migrate_from_legacy_datasets.md Outdated Show resolved Hide resolved

burtenshaw added 2 commits June 13, 2024 12:44

docs: add migration notebook to docs format

9fb55b0

docs: add information for feedback datasets on existing servers

fd979dd

frascuchon force-pushed the docs/migration-guide-in-docs-format branch from 4452759 to fd979dd Compare June 13, 2024 10:45

frascuchon changed the base branch from docs/migration-notebook to feat/v2.0.0 June 13, 2024 10:45

frascuchon mentioned this pull request Jun 13, 2024

Docs: Migration notebook suggestions #4996

Closed

16 tasks

frascuchon and others added 3 commits June 13, 2024 13:06

Merge branch 'feat/v2.0.0' into docs/migration-guide-in-docs-format

781c530

Merge branch 'feat/v2.0.0' into docs/migration-guide-in-docs-format

dcced28

[DOCS] Update and define all record mapping functions (#5009)

f5794e3

Helper functions for record mapping have been reviewed and tested with several datasets.

frascuchon approved these changes Jun 18, 2024

View reviewed changes

nataliaElv reviewed Jun 18, 2024

View reviewed changes

argilla/docs/how_to_guides/migrate_from_legacy_datasets.md Show resolved Hide resolved

nataliaElv reviewed Jun 18, 2024

View reviewed changes

argilla/docs/how_to_guides/migrate_from_legacy_datasets.md Outdated Show resolved Hide resolved

Base automatically changed from feat/v2.0.0 to develop June 19, 2024 13:46

frascuchon reviewed Jun 20, 2024

View reviewed changes

argilla/docs/how_to_guides/migrate_from_legacy_datasets.md Show resolved Hide resolved

fix: fix errors in code snippets

7aebd54

frascuchon reviewed Jun 20, 2024

View reviewed changes

argilla/docs/how_to_guides/migrate_from_legacy_datasets.md Outdated Show resolved Hide resolved

frascuchon reviewed Jun 20, 2024

View reviewed changes

argilla/docs/how_to_guides/migrate_from_legacy_datasets.md Outdated Show resolved Hide resolved

burtenshaw and others added 2 commits June 20, 2024 11:03

Update argilla/docs/how_to_guides/migrate_from_legacy_datasets.md

5022318

Co-authored-by: Paco Aranda <[email protected]>

Update argilla/docs/how_to_guides/migrate_from_legacy_datasets.md

41ac96a

Co-authored-by: Paco Aranda <[email protected]>

frascuchon reviewed Jun 20, 2024

View reviewed changes

docs: drop migration of feedback datasets between servers

9c0768e

frascuchon merged commit 39779c1 into develop Jun 20, 2024
5 of 7 checks passed

frascuchon deleted the docs/migration-guide-in-docs-format branch June 20, 2024 14:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DOCS] add migration notebook to docs format #5002

[DOCS] add migration notebook to docs format #5002

burtenshaw commented Jun 12, 2024 •

edited

Loading

davidberenstein1957 commented Jun 12, 2024

nataliaElv commented Jun 12, 2024

davidberenstein1957 commented Jun 12, 2024

davidberenstein1957 commented Jun 12, 2024

burtenshaw commented Jun 12, 2024

burtenshaw commented Jun 12, 2024

burtenshaw commented Jun 12, 2024

codecov bot commented Jun 20, 2024 •

edited

Loading

frascuchon Jun 20, 2024

frascuchon Jun 20, 2024

[DOCS] add migration notebook to docs format #5002

[DOCS] add migration notebook to docs format #5002

Conversation

burtenshaw commented Jun 12, 2024 • edited Loading

davidberenstein1957 commented Jun 12, 2024

nataliaElv commented Jun 12, 2024

davidberenstein1957 commented Jun 12, 2024

davidberenstein1957 commented Jun 12, 2024

burtenshaw commented Jun 12, 2024

burtenshaw commented Jun 12, 2024

burtenshaw commented Jun 12, 2024

codecov bot commented Jun 20, 2024 • edited Loading

Codecov Report

frascuchon Jun 20, 2024

Choose a reason for hiding this comment

frascuchon Jun 20, 2024

Choose a reason for hiding this comment

burtenshaw commented Jun 12, 2024 •

edited

Loading

codecov bot commented Jun 20, 2024 •

edited

Loading