Try 40K dataset (4x data) #73

Closed
wants to merge 1 commit into from
9 changes: 5 additions & 4 deletions data/data.xml.dvc
@@ -1,12 +1,13 @@
-md5: f66db4fac66a93d4feaa939b4506c3ab
+md5: 25c4a84510a41557840c61692dd14c11
 frozen: true
 deps:
 - path: get-started/data.xml
   repo:
     url: https://github.com/iterative/dataset-registry
-    rev_lock: 705bc71a0a13c47b9e5147a3524fafc41f8ac7fa
+    rev_lock: cf6481baf56f156aa0876709cc231aaf3f3a3c29
+    rev: get-started-40K
 outs:
-- md5: 22a1a2931c8370d3aeedd7183606fd7f
-  size: 14445097
+- md5: 4bd325a30d5f1d5ea1a451d98767ddde
+  size: 59918667
   hash: md5
   path: data.xml
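A reviewer who wants to confirm that a local copy of `data/data.xml` matches the new `outs` entry could use a check like the one below. This is a minimal sketch, not DVC's own code: it assumes that a `hash: md5` entry is the MD5 of the raw file bytes, and the helper names `file_md5_and_size` and `matches_entry` are hypothetical. It does not cover `.dir` hashes, which describe a whole directory rather than a single file.

```python
import hashlib


def file_md5_and_size(path, chunk_size=1 << 20):
    """Return (hex MD5, size in bytes) of the raw contents of `path`."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def matches_entry(path, expected_md5, expected_size):
    """Compare a file against the md5/size pair recorded for it.

    Assumption: the recorded md5 is a plain MD5 of the file bytes.
    """
    md5, size = file_md5_and_size(path)
    return md5 == expected_md5 and size == expected_size
```

With the values from this diff, the call would be `matches_entry("data/data.xml", "4bd325a30d5f1d5ea1a451d98767ddde", 59918667)`.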
38 changes: 19 additions & 19 deletions dvc.lock
@@ -5,8 +5,8 @@ stages:
     deps:
     - path: data/data.xml
       hash: md5
-      md5: 22a1a2931c8370d3aeedd7183606fd7f
-      size: 14445097
+      md5: 4bd325a30d5f1d5ea1a451d98767ddde
+      size: 59918667
     - path: src/prepare.py
       hash: md5
       md5: f54d670ac8a4f63206781fc31d1f2651
@@ -18,38 +18,38 @@ stages:
     outs:
     - path: data/prepared
       hash: md5
-      md5: 153aad06d376b6595932470e459ef42a.dir
-      size: 8437363
+      md5: f8934609be51496ee500f80eea539c6f.dir
+      size: 35339221
       nfiles: 2
   featurize:
     cmd: python src/featurization.py data/prepared data/features
     deps:
     - path: data/prepared
       hash: md5
-      md5: 153aad06d376b6595932470e459ef42a.dir
-      size: 8437363
+      md5: f8934609be51496ee500f80eea539c6f.dir
+      size: 35339221
       nfiles: 2
     - path: src/featurization.py
       hash: md5
       md5: e22789fc9581cad11ef7a6fa3aa3f17b
       size: 4158
     params:
       params.yaml:
-        featurize.max_features: 200
+        featurize.max_features: 500
         featurize.ngrams: 2
     outs:
     - path: data/features
       hash: md5
-      md5: 4281fdd8e973e3bbe5abc0ae10adebc7.dir
-      size: 2232588
+      md5: c9308f114f6a8f06fb5ba2b40ea81678.dir
+      size: 12597137
       nfiles: 2
   train:
     cmd: python src/train.py data/features model.pkl
     deps:
     - path: data/features
       hash: md5
-      md5: 4281fdd8e973e3bbe5abc0ae10adebc7.dir
-      size: 2232588
+      md5: c9308f114f6a8f06fb5ba2b40ea81678.dir
+      size: 12597137
       nfiles: 2
     - path: src/train.py
       hash: md5
@@ -63,27 +63,27 @@ stages:
     outs:
     - path: model.pkl
       hash: md5
-      md5: 46f38c08e3d5174e5e3fb8753994d38b
-      size: 1957931
+      md5: b568c889ca6a5719632188daa0bfd513
+      size: 3365545
   evaluate:
     cmd: python src/evaluate.py model.pkl data/features
     deps:
     - path: data/features
       hash: md5
-      md5: 4281fdd8e973e3bbe5abc0ae10adebc7.dir
-      size: 2232588
+      md5: c9308f114f6a8f06fb5ba2b40ea81678.dir
+      size: 12597137
       nfiles: 2
     - path: model.pkl
       hash: md5
-      md5: 46f38c08e3d5174e5e3fb8753994d38b
-      size: 1957931
+      md5: b568c889ca6a5719632188daa0bfd513
+      size: 3365545
     - path: src/evaluate.py
       hash: md5
       md5: a1a59f55636170fb56e0c6afd3e28fa4
       size: 3315
     outs:
     - path: eval
       hash: md5
-      md5: 0deddc7fb86151e1cb4684e93be58f70.dir
-      size: 1292365
+      md5: 6fe98138454e84433ffaf097fc5cfd51.dir
+      size: 4964239
       nfiles: 8
2 changes: 1 addition & 1 deletion params.yaml
@@ -3,7 +3,7 @@ prepare:
   seed: 20170428
 
 featurize:
-  max_features: 200
+  max_features: 500
   ngrams: 2
 
 train:
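
The `size` fields recorded in `data/data.xml.dvc` and `dvc.lock` give a quick way to check the "4x data" claim in the title and to see how each stage output scales. The sketch below only does arithmetic on the byte counts from this PR; the names `sizes` and `growth_ratio` are made up for illustration.

```python
# Sizes in bytes copied from the diffs in this PR: (before, after).
sizes = {
    "data/data.xml": (14445097, 59918667),
    "data/prepared": (8437363, 35339221),
    "data/features": (2232588, 12597137),
    "model.pkl": (1957931, 3365545),
    "eval": (1292365, 4964239),
}


def growth_ratio(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


ratios = {path: growth_ratio(*pair) for path, pair in sizes.items()}

for path, ratio in ratios.items():
    print(f"{path}: {ratio:.2f}x")
```

The raw dataset and the prepared split both grow by a little over 4x. `data/features` grows by more than that, which is consistent with `featurize.max_features` also rising from 200 to 500 in the same commit, so the size change there cannot be attributed to the larger dataset alone.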