Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* sandbox rebuild v1
* switch
* fix hpo 3sigma
* after pre-commit
* sandbox readme zh
* finish doc
* other_configs -> extra_configs
* other_configs -> extra_configs
* res_name -> meta_name
* hooker -> hook
* analyze -> analyse
* after pre-commit
* analyse -> analyze
* analyser.py -> analyzer.py
* analyser.py -> analyzer.py
* analyser.py -> analyzer.py
* regist -> register, DICT -> MAPPING
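The `fix hpo 3sigma` entry above and the `RefineRecipeViaKSigmaHook` job in the config diff below appear to refer to a k-sigma rule. The general technique takes a per-sample statistic from probed data, computes bounds of mean ± k·std, and keeps only samples that fall inside those bounds. The following minimal sketch illustrates that idea. It is not the project's actual hook. The function names, the `min`/`max` keys, and the use of the population standard deviation are all assumptions.

```python
import statistics
from typing import Dict, List, Tuple


def k_sigma_bounds(values: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (mean - k * std, mean + k * std) for one per-sample statistic."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed choice)
    return mean - k * std, mean + k * std


def refine_thresholds(
    stats: Dict[str, List[float]], k: float = 3.0
) -> Dict[str, Dict[str, float]]:
    """Turn each statistic into hypothetical min/max filter arguments."""
    refined = {}
    for name, values in stats.items():
        low, high = k_sigma_bounds(values, k)
        refined[name] = {"min": low, "max": high}
    return refined


def keep_sample(
    sample_stats: Dict[str, float], thresholds: Dict[str, Dict[str, float]]
) -> bool:
    """A sample survives only if every statistic lies inside its bounds."""
    return all(
        bounds["min"] <= sample_stats[name] <= bounds["max"]
        for name, bounds in thresholds.items()
    )


if __name__ == "__main__":
    # Made-up per-sample text lengths; the last value is an outlier.
    probed = {"text_len": [100, 110, 95, 105, 98, 102, 5000]}
    thresholds = refine_thresholds(probed, k=2.0)
    print(thresholds)
    print(keep_sample({"text_len": 5000}, thresholds))
```

With `k=2.0`, the extreme length in this made-up sample lands above the upper bound and is dropped. In small samples, a lone outlier also inflates the standard deviation. With `k=3.0`, the same value would stay inside the bounds, so the choice of k matters.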
Showing 36 changed files with 797 additions and 489 deletions.
```diff
@@ -1 +1,2 @@
 type: dj_text_quality_classifier
+dataset_path: './outputs/demo-process/demo-processed.jsonl'
```
```diff
@@ -1,27 +1,68 @@
-# Sandbox config example for dataset
+# Sandbox config example

 # global parameters
 project_name: 'demo-sandbox'
-dataset_path: './demos/data/demo-dataset.jsonl' # path to your dataset directory or file
-np: 4 # number of subprocess to process your dataset
+experiment_name: 'demo-sandbox-run0' # for wandb tracer name
+hpo_config: null # path to a configuration file when using auto-HPO tool.

-export_path: './outputs/demo-sandbox/demo-sandbox.jsonl'
+# configs for each job, the jobs will be executed according to the order in the list
+probe_job_configs:
+  - hook: 'ProbeViaAnalyzerHook'
+    meta_name: 'analysis_ori_data'
+    dj_configs: 'configs/demo/process.yaml'
+    extra_configs:
+  # - hook: 'ProbeViaModelInferHook'
+  #   meta_name: 'analysis_ori_model'
+  #   dj_configs:
+  #     dataset_path: './demos/data/demo-dataset.jsonl'
+  #     export_path: './outputs/demo-sandbox/demo-sandbox.jsonl'
+  #     data_probe_algo: 'uniform'
+  #     data_probe_ratio: 0.5
+  #   extra_configs:
+  #     (...model configs)

-# sandbox configs
-# for refining recipe using k-sigma rules
-path_k_sigma_recipe: './outputs/demo-sandbox/k_sigma_new_recipe.yaml'
+refine_recipe_job_configs:
+  - hook: 'RefineRecipeViaKSigmaHook'
+    meta_name: 'analysis_ori_data'
+    dj_configs: 'configs/demo/process.yaml'
+    extra_configs:
+      path_k_sigma_recipe: './outputs/demo-process/k_sigma_new_recipe.yaml'
+  # - hook: 'RefineRecipeViaModelFeedbackHook'
+  #   meta_name:
+  #   dj_configs:
+  #   extra_configs:
+  #     (...model configs)

-# for gpt3 quality classifier as data evaluator
-data_eval_config: 'configs/demo/sandbox/gpt3_data_quality_eval_config.yaml'
-#data_eval_config:
-#  type: dj_text_quality_classifier
+execution_job_configs:
+  - hook: 'ProcessDataHook'
+    meta_name:
+    dj_configs: './outputs/demo-process/k_sigma_new_recipe.yaml'
+    extra_configs:
+  - hook: 'TrainModelHook'
+    meta_name:
+    dj_configs:
+    extra_configs: 'configs/demo/sandbox/gpt3_extra_train_config.json'

-# for gpt3 model training
-model_train_config: 'configs/demo/sandbox/gpt3_extra_train_config.json'
-
-# process schedule
-# a list of several process operators with their arguments
-process:
-  - language_id_score_filter:
-      lang: 'zh'
-      min_score: 0.5
+evaluation_job_configs:
+  - hook: 'ProbeViaAnalyzerHook'
+    meta_name: 'analysis_processed_data'
+    dj_configs: 'configs/demo/process.yaml'
+    extra_configs:
+  # - hook: 'ProbeViaModelInferHook'
+  #   meta_name: 'analysis_trained_model'
+  #   dj_configs:
+  #     dataset_path: './demos/data/demo-dataset.jsonl'
+  #     export_path: './outputs/demo-sandbox/demo-sandbox.jsonl'
+  #     data_probe_algo: 'uniform'
+  #     data_probe_ratio: 0.5
+  #   extra_configs:
+  #     (...model configs)
+  - hook: 'EvaluateDataHook'
+    meta_name: 'eval_data'
+    dj_configs:
+    extra_configs: 'configs/demo/sandbox/gpt3_data_quality_eval_config.yaml'
+  # - hook: 'EvaluateModelHook'
+  #   meta_name: 'eval_model'
+  #   dj_configs:
+  #   extra_configs:
+  #     (...model configs)
```
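The new config splits the sandbox into four job lists. Each entry names a `hook`, and the commit message renames the hook registry to a `MAPPING` with a `register` step. The following minimal sketch shows how a name-to-hook registry and an in-order job runner could fit together. It is hypothetical throughout. `HOOK_MAPPING`, `register_hook`, `run_sandbox`, the assumed stage order (taken from the order of the lists in the config), and the placeholder hook bodies are illustrative only and are not the project's API. A plain dict stands in for the parsed YAML to keep the sketch dependency-free.

```python
from typing import Callable, Dict, List, Tuple

HookFn = Callable[[dict], dict]

# Hypothetical registry mapping hook names (as written in the YAML) to callables.
HOOK_MAPPING: Dict[str, HookFn] = {}


def register_hook(name: str) -> Callable[[HookFn], HookFn]:
    """Decorator that stores a hook function in HOOK_MAPPING under `name`."""
    def wrap(fn: HookFn) -> HookFn:
        HOOK_MAPPING[name] = fn
        return fn
    return wrap


@register_hook("ProbeViaAnalyzerHook")
def probe_via_analyzer(job: dict) -> dict:
    # Placeholder: a real hook would analyze the data described by dj_configs.
    return {"analyzed_with": job.get("dj_configs")}


@register_hook("EvaluateDataHook")
def evaluate_data(job: dict) -> dict:
    # Placeholder: a real hook would score data using the evaluator config.
    return {"evaluated_with": job.get("extra_configs")}


# Assumed stage order, mirroring the order of the job lists in the config.
STAGES = [
    "probe_job_configs",
    "refine_recipe_job_configs",
    "execution_job_configs",
    "evaluation_job_configs",
]


def run_sandbox(config: dict) -> List[Tuple[str, str, dict]]:
    """Run each stage's jobs in list order and collect (stage, hook, result)."""
    results = []
    for stage in STAGES:
        # A stage may be missing or null in YAML; treat both as "no jobs".
        for job in config.get(stage) or []:
            hook = HOOK_MAPPING[job["hook"]]  # KeyError for unregistered hooks
            results.append((stage, job["hook"], hook(job)))
    return results


if __name__ == "__main__":
    demo_config = {
        "probe_job_configs": [
            {"hook": "ProbeViaAnalyzerHook", "meta_name": "analysis_ori_data",
             "dj_configs": "configs/demo/process.yaml", "extra_configs": None},
        ],
        "evaluation_job_configs": [
            {"hook": "EvaluateDataHook", "meta_name": "eval_data", "dj_configs": None,
             "extra_configs": "configs/demo/sandbox/gpt3_data_quality_eval_config.yaml"},
        ],
    }
    for stage, hook_name, result in run_sandbox(demo_config):
        print(stage, hook_name, result)
```

Dispatching through the mapping means that a new hook only needs to be registered under the name used in the YAML. A misspelled or unregistered name fails immediately with a `KeyError` instead of being silently skipped.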