Releases: determined-ai/determined

0.13.3

08 Sep 22:47

Changelog

df113a3 chore: bump version: 0.13.3rc1 -> 0.13.3
1a76cad docs: More release notes for 0.13.3.
edafbd8 chore: bump version: 0.13.3rc0 -> 0.13.3rc1
fc046f1 docs: More release notes for 0.13.3.
ee14ebf fix: filter out non numeric metric values in WebUI [DET-4078] (#1258)
3838584 fix: match column key for experiment list name column for taglist renderer (#1263)
7b1a357 docs: Release notes for 0.13.3.
a636057 chore: bump version: 0.13.3.dev0 -> 0.13.3rc0
afaf7c9 chore: bump version: 0.13.2 -> 0.13.3.dev0
1cffc44 ci: add mypy and ci coverage to det-deploy local [DET-4089] (#1251)
6c7483a fix: add missing cluster_name for det-deploy local (#1249)

Docker images

  • docker pull determinedai/determined-master:0.13.3
  • docker pull determinedai/determined-master:df113a37
  • docker pull determinedai/determined-master:df113a3709f4ec3fc563a82f2a5c18692e7b107f
  • docker pull determinedai/determined-dev:determined-master-df113a37
  • docker pull determinedai/determined-dev:determined-master-df113a3709f4ec3fc563a82f2a5c18692e7b107f

0.13.2

03 Sep 17:57

Changelog

c6fb44e chore: bump version: 0.13.2rc4 -> 0.13.2
946fb37 docs: Release notes for 0.13.1 and 0.13.2. (#1241)
5c729b2 chore: bump version: 0.13.2rc3 -> 0.13.2rc4
26b3f97 fix: use correctly set visible devices for keras/estimators (#1235) [DET-4076]
4253736 chore: bump version: 0.13.2rc2 -> 0.13.2rc3
bb12b15 chore: migrate experiment list api [DET-3696, DET-3697, DET-3698] (#1228)
cf26ba9 chore: bump version: 0.13.2rc1 -> 0.13.2rc2
b36bcdb fix: set nonroot UID with k8s (#1233)
6170178 chore: bump version: 0.13.2rc0 -> 0.13.2rc1
945a73e chore: rename slotsPerNode to maxSlotsPerPod [DET-4066] (#1229)
4edbb8f chore: bump version: 0.13.2.dev0 -> 0.13.2rc0
3b27bf9 chore: bump version: 0.13.1.dev0 -> 0.13.2.dev0
c99bd55 docs: edits for k8s docs (#1227)
8a4702b test: remove flaky k8s unit tests (#1231)
4b4449f chore: lock static query map [DET-3633] (#1216)
fbc97ab fix: support loading old exp confs from GET /api/v1/experiment (#1209)
bc9adc9 docs: fix typo in kubernetes topic guide (#1219)
e790dbb docs: add k8s release notes (#1226)
7b391c1 fix: fix a deadlock when gracefully terminating a trial [DET-3879] (#1180)
125bc08 fix: address corner case in k8s resourceQueue [DET-4053] (#1218)
b6edaba fix: set default TF session in EstimatorTrial (#1213)
4041e5b feat: support config file for tensorboard [DET-3900] (#1191)
b551615 style: cleanup style issues [DET-3994, DET-4007] (#1169)
ac75355 docs: minor tweaks for cluster config / k8s docs (#1221)
4c45601 chore: update notebooks, tensorboards, api and docs to dev proxy address [DET-3995] (#1192)
af4a7cd feat: has validation filter [DET-3962] (#1171)
49a59bc docs: fixes for k8s docs (#1217)
8533bbc feat: modify k8 events to be more understandable [DET-3172] (#1215)
34a744f docs: Kubernetes docs [DET-3901, DET-3902, DET-3903, DET-3904, DET-3905] (#1151)
13deed4 fix: warn out when catching SystemExit [DET-2956] (#1116)
c8b7b44 docs: fix download link (#1214)
ea8e2e0 feat: user preferences [DET-3719] (#1170)
7e55daf docs: fix broken external links (#1212)
c0078f8 fix: update filterable_view to unfilter shared agent jobs (#1211)
cbb8943 ci: dump master logs after webui tests (#1205)
c11eda5 fix: select metric based on config search metric then fallback to first available metric (#1162)
c9531f0 fix: fix loading tensorboard from prior versions [DET-4008] (#1201)
0b33f19 feat: support multi-GPU training < slotsPerNode for k8 RP (#1206)
cf5ec95 docs: update docs for multiple dtrain single node [DET-3822] (#1173)
05897ef ci: reduce e2e-cpu parallelism (#1204)
f373845 chore: remove a direct ruamel import (#1203)
25d17d3 docs: minor tweaks to native API tutorial for copy/paste friendliness (#1196)
7432f61 feat: add set experiment labels and description endpoints [DET-3891 DET-3890] (#1117)
2d48fbf fix: fix prior_batches_processed backfill migration [DET-4006] (#1200)
1d86e39 chore: convert owner to user in react code [DET-3519] (#1139)
a494cf5 chore: update trialWorkloadSequencer.checkpointModelCompleted to make sure all state changes are snapshotted (#1199)
e255979 fix: don't pass all environment variables through sshd (#1186)
ab671d1 test: fix flake with experiment_config_test (#1197)
c5f4d0a chore: dump docker logs before cluster-up times out (#1198)
9cddc6a ci: reduce e2e-cpu parallelization (#1194)
acd0788 feat: add GET /api/v1/experiment/id to the new api [DET-3753] (#1193)
9c2ad03 feat: support multi-dtrain single-node in host mode [DET-3821] (#1188)
20d4409 docs: configure rstfmt for formatting docs files [DET-3784] (#1024)
5730050 style: add more React focused linting (#1178)
eab3549 feat: dynamically update range based on provided data [DET-3923] (#1147)
6af6217 fix: don't double count failed trial completions [DET-3986] (#1176)
fc89a4c feat: support MaxSlots in k8 RP (#1189)
939b57e fix: apply filter limit to dashboard [DET-3973] (#1172)
73d9414 style: remove state filter groups [DET-3911] (#1168)
cb98745 chore: perform authentication check in react router [DET-3929] (#1181)
c5b1e77 docs: Improve docs for Checkpoint-related APIs (#1179)
3b10cad docs: add full experiment config example to remove_steps migration guide [DET-3993] (#1190)
b86142e chore: update WebUI screenshots [DET-3909] (#1183)
a5118b4 chore: update webui attributions list (#1182)
51a0750 fix: support alternate ruamel packaging. (#1187)
e43a508 ci: double e2e-cpu parallelism (#1185)
adb4d76 chore: remove unused constants in harness (#1184)
c89aba0 docs: add scheduling unit to migration table (#1177)
075e463 build: add default make target to react module (#1163)
735c552 feat: support default gpu and cpu pod specs in master.yaml [DET-3942] (#1165)
e4be85f docs: fixes for agent config and agent labels [DET-3958] (#1159)
c28a0df chore: upgrade black to version 20 (#1174)
81db2a5 chore: bump version: 0.13.0.dev0 -> 0.13.1.dev0
818c22c docs: Release notes for 0.13.0. (#1146)
80a4dc2 feat: override trial network mode to host only when multi-agent (#1167)
a9b99b4 feat: change scheduler to allow multiple dtrain jobs per node [DET-3819] (#1158)
08f5ca2 fix: enable pagination size picker when entries are at least 10 count [DET-3931] (#1140)
e7e19f9 feat: custom multi select [DET-3957] (#1161)
0764e94 feat: consolidate main nav [DET-2766, DET-2984, DET-3908] (#1034)
e74b410 fix: validate records_per_epoch exists when epochs are used (#1160)
3c8c047 fix: fix task openability check and a trial detail decoder mismatch [DET-3924 DET-3937] (#1148)
c0cef95 style: enlarge task list table overflow trigger target (#1141)
7d0ef9d fix: ensure properly wrapping in pytorch. [DET-3745] (#1114)
250e4f4 fix: return the original tf.keras model when not training. [DET-3718] (#1112)
abbb6ed docs: improve release note instruction (#1157)
f427e6d docs: add robots.txt (#1155)
2284753 docs: update Sphinx theme, enable Kickfire (#1154)
85fdb18 feat: log registry metadata and description changes [DET-3749] (#1153)
c21bf05 chore: generate sitemap for docs site. (#1125)
99530d0 feat: checkpoint metadata post request [DET-3916] (#1152)
dc6d855 docs: change checkpoint policy for FasterRCNN example (#1150)
fc45f4f feat: improve support for large number of trial on k8s [DET-3772, DET-3811, DET-3790, DET-3773, DET-3829] (#1086)
f56c431 fix: check that searcher validation metric values are scalars [DET-2134] (#1149)
377ebf3 feat: allow Python libraries in commands to use TLS for the master [DET-3899] (#1121)
277fe65 fix: remove id column from trial info table on trial detail page (#1145)
8f082ec feat: added docs for adaptive_asha [DET-3301] (#1106)
cdd2b72 fix: patch request body parsing for model metadata [DET-3939] (#1144)
3dd4704 docs: minor tweaks to remove steps migration guide (#1142)
ea736db docs: fixes for PyTorch + TensorBoard docs. (#1143)
535e0a1 docs: update examples doc (#1123)
fc87f69 fix: validate hyperparameter configs (#1138)
03c6727 docs: improve docs for tensorboard_timeout. (#1124)
d34f0be chore: remove mnist_tp adaptive config (#1137)
31ec0f2 fix: refrain from loading experiment configs where they aren't needed (#1136)
b13461e fix: Improve old experiment config upgrade logic [DET-3918 DET-3933 DET-3932] (#1135)
6ad5d76 fix: apply user authentication to stream endpoints [DET-3935] (#1134)
d41ae19 docs: add remove steps migration guide (#1132)
b25d997 fix: support shell with k8s (#1133)
7bfcf13 fix: show progress bar in experiment info box (#1131)
3ec871a chore: rename websocket actors to be more descriptive (#1122)
26a712d chore: add --verbose to twine uploads (#1130)
52679af fix: asha searcher progress for brackets with one rung (#1129)
b2655ff fix: better handle tasks with mixed TLS/non-TLS masters (#1128)
665c7ba fix: make master cert file world-readable in containers (#1127)
58ba985 fix: recursively search log dir for tfevents [DET-3915] (#1126)
83e090f chore: Add README for new proposed release note process. (#1113)

Docker images

  • docker pull determinedai/determined-master:0.13.2
  • docker pull determinedai/determined-master:c6fb44e0
  • docker pull determinedai/determined-master:c6fb44e0e4787eecae0674db5e09e8bd83f05f7d
  • docker pull determinedai/determined-dev:determined-master-c6fb44e0
  • docker pull determinedai/determined-dev:determined-master-c6fb44e0e4787eecae0674db5e09e8bd83f05f7d

0.13.1

31 Aug 23:34

Changelog

44461a7 chore: bump version: 0.13.1rc1 -> 0.13.1
8c97f68 docs: release notes for 0.13.1 (#1210)
dfb6686 chore: bump version: 0.13.1rc0 -> 0.13.1rc1
ff1eb49 chore: upgrade and apply black version
bf2c24d chore: bump version: 0.13.1.dev0 -> 0.13.1rc0
40eeea3 chore: bump version: 0.13.0 -> 0.13.1.dev0
c7b3e45 fix: fix loading tensorboard from prior versions [DET-4008] (#1201)
d30f3ff fix: fix prior_batches_processed backfill migration [DET-4006] (#1200)
1483598 fix: fix task openability check and a trial detail decoder mismatch [DET-3924 DET-3937] (#1148)

Docker images

  • docker pull determinedai/determined-master:0.13.1
  • docker pull determinedai/determined-master:44461a75
  • docker pull determinedai/determined-master:44461a75b0a5b2b4d66c009271dcb2da90a20481
  • docker pull determinedai/determined-dev:determined-master-44461a75
  • docker pull determinedai/determined-dev:determined-master-44461a75b0a5b2b4d66c009271dcb2da90a20481

0.13.0

21 Aug 03:15

Changelog

5ea6d3b chore: bump version: 0.13.0rc2 -> 0.13.0
09cc0d7 fix: remove id column from trial info table on trial detail page (#1145)
35b99ee docs: Release notes for 0.13.0. (#1146)
48ae2de chore: bump version: 0.13.0rc1 -> 0.13.0rc2
cb904d1 fix: patch request body parsing for model metadata [DET-3939] (#1144)
391e629 docs: minor tweaks to remove steps migration guide (#1142)
d3109f5 fix: validate hyperparameter configs (#1138)
34d1c75 fix: refrain from loading experiment configs where they aren't needed (#1136)
233db8e fix: Improve old experiment config upgrade logic [DET-3918 DET-3933 DET-3932] (#1135)
e7535a8 fix: apply user authentication to stream endpoints [DET-3935] (#1134)
f260022 docs: add remove steps migration guide (#1132)
5dc593a fix: support shell with k8s (#1133)
d8ed866 fix: show progress bar in experiment info box (#1131)
51786c4 chore: add --verbose to twine uploads (#1130)
b16fe72 chore: bump version: 0.13.0rc0 -> 0.13.0rc1
671b886 fix: better handle tasks with mixed TLS/non-TLS masters (#1128)
34804f8 fix: make master cert file world-readable in containers (#1127)
c4335d6 fix: recursively search log dir for tfevents [DET-3915] (#1126)
dda78fd chore: bump version: 0.13.0.dev0 -> 0.13.0rc0
9e73151 fix: don't panic when an unknown service is encountered (#1118)
a686a37 docs: various fixes for tf.keras docs. (#1119)
08338f6 fix: drop null values on trial metrics (#1120)
44fdf15 docs: add docs for behavior of keras's on_epoch_end callback (#1105)
4be2c4a chore: publish trial details page [DET-3843] (#1103)
d1114f3 fix: add explicit searcher sync before close trial [DET-3889] (#1101)
b341993 refactor: cleanup trial detail [DET-3683] (#1107)
ae271c4 chore: add a shell-entrypoint.sh (#1102)
c163364 fix: address scrolling issue when scrolling up after hitting the back to top button (#1108)
6314b62 style: fix table cell styles when there is only one cell (#1115)
b9cb988 feat: make sharding configurable in wrap_dataset (#1041)
7e129f4 feat: kill idle tensorboards [DET-3808] (#1104)
79fdaea fix: handle failed stack deletion (#1109)
f2a50ff fix: actually kill trials (#1111)
7a30cdb feat: add logging to agent to track usage [DET-3878] (#1110)
2364a06 fix: let shell proxying work with master TLS [DET-3882] (#1100)
bde1641 feat: improve shell cli for optional args (#1098)
2318ea0 fix: fix cluster resource pie chart color mapping [DET-3861] (#1094)
cc9a7c1 feat: trial table [DET-3019] (#1078)
c0b3fe5 chore: add kill trial endpoint [DET-3739] (#1071)
48c6230 feat: allow the harness to connect to the master over TLS [DET-3775] (#1076)
3fd156f terminating state now unaffected if transitioning to itself (#1084)
0ab58b5 fix: handle when config is undefined for trial length (#1096)
b3c7b8f fix: support shell with host networking mode (#1099)
69515e7 fix: run at least one batch per searcher op (#1095)
979d352 fix: proxy shell through master via HTTP CONNECT (#1067)
1e12e12 chore: update archive/unarchive endpoint to the new api (#1093)
065c1b0 chore: migrate continue trial form and remove steps [DET-3780 DET-3854] (#1050)
5093a2c feat: add total_batches_processed to /trials/{id}/details endpoint (#1089)
0b3b7e8 fix: record progress in partial units (#1091)
1b6c884 chore: add proxy urls to tensorboards and notebooks [DET-3736] (#1070)
ef223bd fix: support tensorboard in us-east-1 (#1092) [DET-3842]
9db9936 fix: hide table pagination when unnecessary (#1088)
f978a32 feat: repr methods for python classes [DET-3750] (#1068)
76f2299 fix: update num_batches to total_batches_processed (#1087)
7b9976d chore: bump version: 0.12.13.dev0 -> 0.13.0.dev0 (#1033)
d0b384a fix: handle failed checkpoints correctly in trial_workload_sequencer [DET-3853] (#1083)
b30a8ad feat: support user specified pod template specs [DET-3716, DET-3731] (#1005)
e6ee088 feat: add trial detail chart [DET-3013] (#1063)
26730b1 chore: unpin overly specific versions (#1085)
68604fb docs: minor fixes for data access tutorial (#1082)
3aac066 fix: fix typo in error message (#1081)
5351570 chore: delete useless test (#1080)
5982751 fix: update logout check to check via endpoint call (#1062)
701095e chore: extract training validation metrics (#1073)
933f7f7 feat: hide experiment max slots if it's unset (#1075)
45109ab fix: preserve rng state for all cloud access (#1079)
3485d82 feat: add ability to perform initial validation [DET-3584] (#1061)
f939b1a test: only send slack messages for master branch CI (#1077)
1721ea8 docs: add two new FAQs. (#1072)
ea939db feat: add gaea nas [DET-3553] (#824)
39726d6 fix: handle failed validations correctly [DET-3838] (#1069)
a5f325f feat: enable experiment detail [DET-3788] (#1065)
5ec42fe test: make nonroot tests stricter (#1060)
038e241 fix: trigger the correct click handler (#1064)
740c61e chore: infobox style update and refactor [DET-3796] (#1058)
6751572 fix: route experiment list links to react (#1059)
f6be408 feat: get checkpoints for trial endpoint [DET-3735] (#1052)
f25d1e1 feat: add archive and unarchive endpoints [DET-3582] (#1037)
1fd05a6 build: adjust resource deletion order [DET-3793] (#1056)
379f04c fix: update regex to handle newlines in the log messages (#1053)
64393f7 feat: get checkpoints for experiment endpoint (#1046)
8713952 docs: apply our own best practices in tutorials (#1054)
e1520c6 docs: tweak index page text (#1055)
2b0302e feat: make agent's TLS verification level configurable [DET-3774] (#1049)
b47cf66 chore: refactor API error handler to return the error. (#1031)
59339f8 chore: minor fixes for PyTorch GAN example. (#1047)
57f27df test: handle concurrent cloudwatch CI uploads (#1048)
87dd08e chore: make metric_maker test quieter (#1045)
9080ff9 fix: don't ignore SHA operations from promotion of exited trials [DET-3546] (#1022)
90d3b80 feat: add tensorboard sources to task list [DET-3760] (#1014)
5d9b4f5 fix: fix string formatting in k8 rp (#1042)
45034bd feat: enable list pages [DET-3705] (#992)
1f92f9f chore: change search_metric to val_acc for tf_keras_native_dtrain 4bb2ab6 [DET-3794] (#1044)
ab17de6 feat: get checkpoint endpoint [DET-3734] (#1040)
ca18eb5 chore: add nil check for validation metrics (#1043)
cfe24db docs: various fixes. (#1038)
c91614f refactor: convert io.null and io.undefined to ioNullOrUndefined [DET-3747] (#1036)
2ea28f2 chore: migrate trial Infobox [DET-3014 DET-3016] (#981)
9f1afa9 fix: dismiss trial log download warning modal upon confirmation (#1025)
bdee966 fix: filter out disabled slots for reporting (#1030)
eae0747 fix: show archive / unarchive options under task actions properly (#1020)
fd6410a docs: release notes for 0.12.13 (#1027) (#1032)
0b58bba chore: fix tensorpack nightly tests (#1029)
81338d1 chore: silence protobuf warnings. (#1028)
398cf66 feat: add experiment cancel and kill endpoints [DET-3581] (#1026)
7767524 feat: add master logs endpoint [DET-3680] (#1007)
920025c feat: launch notebook [DET-3741] (#991)
e5c975f feat: remove steps from the UX (#968)
2b88fa7 style: refine tables (#993)
049a3c1 style: polish experiment detail [DET-3517, DET-3621] (#974)
b68d83d docs: tweak intro to model registry tutorial (#1023)
6f3c765 fix: imagenet_gaea merge and rebase conflicts [DET-3769] (#1021)
4f2d35a chore: refactor experiment creation modal and add trial continue workflow (#971)
6978480 test: disable imagenet_nas_arch_pytorch for release [DET-3770] (#1018)
c71cfbc docs: promote PyTorch flexible primitives migration guide [DET-3771] (#1019)

Docker images

  • docker pull determinedai/determined-master:0.13.0
  • docker pull determinedai/determined-master:5ea6d3b9
  • docker pull determinedai/determined-master:5ea6d3b9fa3d982df30f84582ed0c4dc3d36ddb0
  • docker pull determinedai/determined-dev:determined-master-5ea6d3b9
  • docker pull determinedai/determined-dev:determined-master-5ea6d3b9fa3d982df30f84582ed0c4dc3d36ddb0

0.12.13

06 Aug 20:01

Changelog

cb8f5f0 chore: bump version: 0.12.13rc3 -> 0.12.13
3dc74cb docs: release notes for 0.12.13 (#1027)
d502eb2 chore: bump version: 0.12.13rc2 -> 0.12.13rc3
bb7e628 fix: don't ignore SHA operations from promotion of exited trials
ec6ea98 chore: bump version: 0.12.13rc1 -> 0.12.13rc2
8f5e8e7 docs: tweak intro to model registry tutorial (#1023)
73fea0d chore: bump version: 0.12.13rc0 -> 0.12.13rc1
a6c83a0 docs: promote PyTorch flexible primitives migration guide [DET-3771] (#1019)
f67552e test: disable imagenet_nas_arch_pytorch for release [DET-3770] (#1018)
4e156c2 chore: bump version: 0.12.13.dev0 -> 0.12.13rc0
f9e5b2c chore: rename wrap_lrscheduler and return original scheduler (#1017)
89e1183 fix: naming for model registry CLI commands and entities [DET-3767] (#961)
abcbe17 docs: fix object detection example (#1016)
f5d0f45 fixing random number divergence on gcs checkpoint upload (#1015)
2574f90 feat: add training tricks to imagenet gaea example (#617)
58f4aa0 feat: Enable save/restore of RNG and AMP state [DET-3066][DET-3742] (#1004)
e433331 docs: revisions for model registry tutorial (#1011)
c8e63dd fix: fix bugs in "det model describe" (#1012)
d8e6493 fix: fix PyTorchTrial warning messages (#1013)
dc6a773 chore: deprecate pytorch.reset_parameters [DET-3578] (#1003)
21b5de0 fix: unets example environment config [DET-3553] (#1010)
bb9abbb chore: bump version: 0.12.12.dev0 -> 0.12.13.dev0 (#998)
103b49b feat: support setting CUDA_VISIBLE_DEVICES for estimator_trial (#1001)
5e58e78 feat: clean and add pagination to templates API [DET-3754] (#1006)
72b8926 feat: add modal confirmation for trial logs download [DET-3740] (#999)
6a42ca2 fix: val_accuracy curve for unets_tf_keras example (#844)
01ccd12 feat: support private master images in helm (#1008)
f781320 fix: fix HOME in nonroot notebooks [DET-3751] (#1002)
bf46455 docs: add documentation for asynchronous adaptive searcher (#799)
ea95102 chore: migrate command task wait page over to react [DET-3704] (#990)
a6835d2 feat: add activate and pause experiment endpoints [DET-3580] (#864)
7cb3b25 test: add CI for k8s [DET-3652, DET-3653, DET-3654, DET-3655] (#975)
453e802 docs: reformat experiment config documentation. (#995)
d6fd77b fix: use random port for notebooks [DET-3648] (#996)
6cb1f33 feat: add task logs [DET-3661] (#964)
baa0148 docs: model registry tutorial (#916)
eb9b2e3 chore: Use more concise syntax for const hyperparameters in examples. (#994)
416c1ee fix: handle case where exit code is not set (#997)
70c4a09 fix: correct the decoder for /agents endpoint (#1000)
04643aa docs: make Pytorch primitive public [DET-3204] (#816)
a9a818d fix: fix missing override from CRA for storybooks (#978)
902a3e6 feat: add archived status column to Experiment list [DET-3540] (#967)
2d7d0c3 fix: include trial_id in get_checkpoints queries (#987)
b9fbbda docs: improvements to notebooks, environment docs (#983)
913fdc3 feat: add command infra endpoints (#986)
b23fa52 chore: update cloud images and default task envs. (#973)
2a8ca59 fix: support lists and dictionaries in make_metric [DET-3710] (#970)
ec57db3 fix: navbar cluster link (#980)
a473704 fix: undo a mistaken edit from 'fix make check' (#979)
b5a2d33 fix: update resource chart to use the correct library [DET-3721] (#976)
767fb01 fix: fix make check (#977)
36f9fc3 feat: support proxying for k8 [DET-3422] (#877)
d0178e0 feat: experiment chart [DET-3004] (#954)
8ccbfe6 chore: add missing get-deps and clean targets for api-ts-sdk (#933)
6b0da08 feat: support task and agent summaries for k8 RP [DET-3424, DET-3425] (#891)
7b6e42d feat: handle missing PodFailed messages from k8 (#946)
01f9562 chore: fetch upon task batch kill [DET-3665] (#956)
dc0072d feat: add tag list component and add read/write to experiment labels [DET-3544] (#960)
a1bc1d8 feat: add config modal to experiment detail [DET-3005] (#965)
4393c1d chore: install setuptools_scm before pytest [DET-3709] (#969)
0ab167a feat: Add support for G4 instances on AWS [DET-1771] (#953)
ced3f4c Revert "feat: add config modal to experiment detail"
1379bf1 feat: add config modal to experiment detail
4b58124 fix: fix decoder for generic commands (#963)
ddf4e32 fix: when trials are killed, do not rollback steps (#959)
0cc7821 fix: fix scroll to latest entry [DET-3673] (#936)
c5ee5b9 feat: experiment trials table [DET-3007] (#920)
a7ef3d7 chore: add function parameter linting rule (#958)
6b8f3e0 fix: correct io-ts types (#962)
46076c9 feat: add pending state to log viewer's download action [DET-3682] (#957)
5838371 feat: add trial details actions logs and Tensorboard (#931)
509d61e fix: Update version of Sphinx theme dependency. (#955)
a7e2310 chore: migrate checkpoint modal [DET-3006] (#897)
807b42c docs: minor tweaks for AWS installation docs (#950)
34d8c71 feat: add fork action and Monaco editor [DET-3568 DET-3127 DET-3660] (#901)
2717259 feat: show total resource count in cluster charts (#919)
274c628 chore: only copy current harness version to trial [DET-3681] (#952)
45c0c6b feat: forward k8 events to task logs [DET-3549] (#900)
1c210f9 docs: various fixes for model definition and API docs (#951)
591fb14 docs: tweak dev prerequisites (#947)
8d488ea feat: streaming trial logs [DET-3536] (#876)
9908907 feat: naive client-side trial logs download [DET-3521] (#867)
685213d docs: add discussion of AWS and GCP quotas. (#924)
4e9cbba chore: downgrade warning log to info and improve message (#944)
df3c1e7 fix: reorder react routes to match more specific routes first (#943)
1f82faf feat: cleanup existing kubernetes resources on start-up [DET-3421] (#873)
49887ab chore: open command tasks in a new tab by default. (#921)
00fd6c6 feat: support pod lifecycles and logs in k8 RP [DET-3418, DET-3420, DET-3417] (#865)
3ad4369 fix: clean up master/agent dependencies (#935)
aedd2ed feat: add support for kubernetes trial log entries (#925)
796449e docs: release notes for 0.12.12 (#932)
cb1675e feat: add enable_cors option to master config. bypass react dev proxy [DET-3638] (#913)
44d7578 docs: remove string optimizers from examples (#928)
9d6ec5e fix: check for sub analytics functions before making calls [DET-3636] (#892)
1562ab2 fix: restore previous checkpoint export functionality (#930)
44fc7e1 test: add trial logs navigation test (#910)
10aac23 feat: add basics for trial details page [DET-3011] (#922)
7c55512 chore: suppress eslint personal config (#918)
f11ef7f feat: automatically set container proxy env variables [DET-3414] (#838)
b75442a feat: add custom reducers to estimators [DET-3098] (#923)

Docker images

  • docker pull determinedai/determined-master:0.12.13
  • docker pull determinedai/determined-master:cb8f5f0
  • docker pull determinedai/determined-master:cb8f5f038e9defb85d4d6bc066374f1828511206
  • docker pull determinedai/determined-dev:determined-master-cb8f5f0
  • docker pull determinedai/determined-dev:determined-master-cb8f5f038e9defb85d4d6bc066374f1828511206

0.12.12

23 Jul 19:49

Changelog

231769f More Go checksum fixups for dependencies.
96b5572 fix: clean up master/agent dependencies (#935)
5ba4c3a chore: bump version: 0.12.12rc3 -> 0.12.12
ecbd9ac chore: bump version: 0.12.12rc2 -> 0.12.12rc3
c0c319e fix: restore previous checkpoint export functionality (#930)
73684d8 chore: bump version: 0.12.12rc1 -> 0.12.12rc2
c61998d docs: remove string optimizers from examples (#928)
e1fcc11 docs: release notes for 0.12.12 (#932)
4dc4f64 chore: bump version: 0.12.12rc0 -> 0.12.12rc1
858cbe3 feat: add custom reducers to estimators [DET-3098] (#923)
3aef9f4 chore: bump version: 0.12.12.dev0 -> 0.12.12rc0
27be82a fix: fix nonroot dtrain and nonroot shell [DET-3111] (#871)
014ccff docs: minor fixes for adaptive HP topic guide (#912)
0688853 chore: update log viewer to trigger scrolled-to-top event when scroll top is close enough (#879)
6636a6f chore: loosen pyzmq requirements (#917)
0d6ca73 chore: bump CLI GitPython dependency from 2.1.11 to 3.1.3 [DET-3499] (#915)
fe180bf docs: Minor fixes for AWS topic guide. (#904)
321709d test: add additional tests for async adaptive (#818)
97744cd Revert "feat: support custom reducers for estimators (#837)" (#914)
56df7d2 feat: update helm chart for k8 RP [DET-3542] (#882)
fad06e9 feat: support custom reducers for estimators (#837)
7af6533 fix: upgrade lodash to fix vulnerability (#903)
8fc97c3 fix: fix a parsing problem with tasks start time [DET-3657] (#890)
ef81e34 fix: fix log viewer timestamp copy paste [DET-3631, DET-3632, DET-3634, DET-3641] (#889)
b97a331 docs: remove duplicate entry from API reference (#909)
eafbe2f fix: fix a react build problem (#911)
75322c1 docs: remove incorrect statement. (#905)
ff355f1 docs: reference documentation for the model registry (#907)
218d473 feat: experiment list batch [DET-3001] (#866)
2c6eb8c docs: fixes for examples, remove tf-cifar tutorial (#902)
fa236e8 chore: upgrade react dependencies [DET-3649] (#894)
10fe7e4 feat: add experiment detail actions [DET-3083] (#858)
cbc6423 chore: bump task container versions [DET-3576, DET-3556] (#899)
724a53c feat: add basic trial details endpoint consumption [DET-3640] (#884)
a059e57 test: fix pytorch parallel (#896)
3f6ed8e feat: update Pytorch checkpoint exporting API [DET-3465] (#842)
a2743ef test: skip master logs test for now, unable to diagnose flake (#878)
7975cc5 ci: work around there existing no distributed tests (#888)
2818426 fix: use local log line ids for trial logs (#893)
27bfc08 chore: validate segment key (#880)
635d96d docs: add docs on data access for dtrain [DET-3506] (#872)
fa5fae6 fix: improve CLI's custom certificate handling [DET-3630] (#883)
2339878 feat: add experiment info box [DET-3554 DET-3012] (#841)
c832154 adds register_version cli command (#881)
84a914e ci: enable multi-node testing [DET-3444] (#852)
a4b784a feat: add --head option for printing trial logs [DET-3527] (#875)
1038186 chore: remove figure options from plotly (#874)
3f8de80 refactor: add get-or-else support (#811)
b8e7987 chore: upgrade agent VM image to newer kernel version
9e1664c feat: support addTask and startTask for k8 RP [DET-3416, DET-3419] (#798)
f38b84c test: upload cloud watch CI logs to S3 [DET-3515] (#855)
37d4973 ci: move react api copy command over to build step from get-deps [DET-3565] (#856)
d989428 feat: add simple Tensorboard launch action to UI [DET-3231] (#836)
8e58136 fix: minor spelling fix for a filename (#860)
2ed285b chore: fix a low severity lodash security vulnerability (#851)
394e317 feat: model versions sdk and CLI [DET-3477] [DET-3480] (#861)
36c8bae ci: fix docs publish (#849)
5a2acc6 fix: don't accept string optimizers for multi-GPU tf keras [DET-3567] (#859)
1e707df fix: use TF Tensorboard writer by default [DET-3353] (#857)
260ffe8 docs: fix get models documentation (#846)
c1f02c2 chore: bump version: 0.12.11.dev0 -> 0.12.12.dev0 (#853)
37bd64d chore: set up browser NDJSON stream consumption [DET-3451] (#815)
e514f7e test: update unit tests for Pytorch flexible primitives [DET-3200] (#829)
922105d docs: release notes for 0.12.11 (#850)
634ad5b chore: add response headers to bust cache for elm and react index.html (#847)
9d41b9e fix: update examples link (#845)
bbdf964 feat: remove steps from pytorch callbacks [DET-3252] (#831)
ecbdde7 feat: don't silence api errors in dev (#840)
ed64384 feat: directly consume Swagger generated TS client [DET-3535 DET-3552] (#819)
f275f8e chore: link react trial logs for improved rendering performance [DET-3530] (#834)
d2b5e00 fix: metrics for unets tf_keras example [DET-3553] (#843)
053dfa3 feat: react trial logs [DET-3128] (#830)

Docker images

  • docker pull determinedai/determined-master:0.12.12
  • docker pull determinedai/determined-master:231769f
  • docker pull determinedai/determined-master:231769f96ab30c710231cc26552cad264c899a35
  • docker pull determinedai/determined-dev:determined-master-231769f
  • docker pull determinedai/determined-dev:determined-master-231769f96ab30c710231cc26552cad264c899a35

0.12.11

08 Jul 23:04

Changelog

5993e2e chore: bump version: 0.12.11rc2 -> 0.12.11
ba62016 chore: bump version: 0.12.11rc1 -> 0.12.11rc2
b03e21c fix: update examples link (#845)
cd2e4dd chore: add response headers to bust cache for elm and react index.html (#847)
b309be9 chore: bump version: 0.12.11rc0 -> 0.12.11rc1
1746c44 chore: link react trial logs for improved rendering performance [DET-3530] (#834)
a7b4c25 feat: react trial logs [DET-3128] (#830)
e1171b6 chore: bump version: 0.12.11.dev0 -> 0.12.11rc0
dad64cc ci: update webui e2e tests to kill experiment instead of cancel (#835)
6920dbd feat: add allgather_metrics to EstimatorContext (#826)
8f45512 test: add nightly test for pytorch flexible primitive example [DET-3534] (#827)
e501ea0 feat: experiment list filter [DET-2999, DET-3000] (#796)
a1b494e feat: model versions endpoints [DET-3478] (#822)
21fb956 fix: fix an issue with cluster resource computation [DET-3509] (#832)
1c5d151 feat: added cli logging to native (#833)
6843c8f feat: add unets tf.keras example [DET-3397] (#825)
a9d7007 feat: clean up swagger spec (#823)
afc6e3f revert: added cli logging to native. (#821)
3d42608 docs: add example for Pytorch flexible primitives [DET-3202] (#778)
0c99a0c feat: added cli logging to native [DET-3316] (#788)
9918ae1 refactor: dissolve experiment table and task table (#791)
f40b75e docs: improve docs for graceful trial termination (#809)
e9721d0 fix: correct the active task counter on dashboard [DET-3510] (#804)
3f599df docs: add warning for max_slots [DET-3145] (#814)
d9ed73f feat: add preview search to new API (#813)
523b6b8 style: update master logs [DET-3471] (#793)
be5e99f feat: add experiments details page and endpoint [DET-3003] (#795)
6c59f30 Revert "feat: add preview search to new API (#777)" (#812)
d7d9176 fix: update the comment reference. (#802)
f44b527 feat: add preview search to new API (#777)
5fbb4a3 feat: support follow flag in trial logs (#810)
0c1dcd3 chore: bump version: 0.12.10.dev0 -> 0.12.11.dev0
d44261e fix: don't set Segment key to quotes (#803)
ee08c46 docs: update docs for estimator callbacks [DET-3461] (#800)
1836016 feat: support Pytorch multiple optimizers and LR schedulers [DET-3194, DET-3195, DET-3196, DET-3197, DET-3198] (#807)
ef1406c ci: ensure all release jobs have the proper filters (#805)
fad2ffd revert: support Pytorch multiple optimizers and LR schedulers (#806)
b860646 feat: support Pytorch multiple optimizers and LR schedulers [DET-3194, DET-3195, DET-3196, DET-3197, DET-3198] (#707)
2ae78e1 docs: release notes for 0.12.10 (#786)
7c51a47 docs: improve shared fs checkpoint exporting documentation. [DET-3392] (#797)
ad61ccd fix: retry if upload fails with requests.exceptions.ConnectionError [DET-3358] (#792)
7aea74c chore: log failed trial's trial logs when experiment succeeds [DET-3501]
6131fc8 fix: check for analytics library (#794)
a6f4114 feat: model registry create CLI (#787)
2e10c64 feat: task list batch [DET-3224] (#780)
d3f27c3 chore: refactor master to send batches in RUN_STEP [DET-3253] (#704)
5e48ae8 ci: remove cypress logs (#763)
498044c chore: point cluster and master logs routes to react (#757)
14396a1 fix: fix broken docs examples link [DET-3462] (#785)
e18fc6a feat: model registry describe and list CLI (#781)
89e8fb0 feat: task list search [DET-3222] (#768)
fcbeec4 fix: add missing sort-fix eslint plugin (#775)
ce31e56 style: update task table styles (#773)
d965527 build: swap wget for curl and add it as a dependency (#784)
e629043 build: add a missing dependency step (#783)
06d0850 feat: generate and use swagger typescript client [DET-3249 DET-3324 DET-3355] (#691)

Docker images

  • docker pull determinedai/determined-master:0.12.11
  • docker pull determinedai/determined-master:5993e2e
  • docker pull determinedai/determined-master:5993e2e0b866d8b4123bc8361d29fd5baa212756
  • docker pull determinedai/determined-dev:determined-master-5993e2e
  • docker pull determinedai/determined-dev:determined-master-5993e2e0b866d8b4123bc8361d29fd5baa212756

0.12.10

27 Jun 00:35

Changelog

ba5f7fb chore: bump version: 0.12.10rc3 -> 0.12.10
67e55da chore: bump version: 0.12.10rc2 -> 0.12.10rc3
6200680 fix: fix broken docs examples link [DET-3462] (#785)
7c7a0ca docs: release notes for 0.12.10 (#786)
9fb197c chore: bump version: 0.12.10rc1 -> 0.12.10rc2
aef27f5 chore: bump version: 0.12.10rc0 -> 0.12.10rc1
f0c8f6e chore: bump version: 0.12.10.dev0 -> 0.12.10rc0
ddeeddd feat: paginated CLI trial logs [DET-3442] (#779)
c481350 feat: add asha searcher (#735)
b751ebe fix: send Terminate response after on_trial_close callback [DET-3433] (#772)
d91c51f fix: fix relative asset paths for swagger-ui [DET-3437] (#764)
3477ac8 fix: avoid unnecessary re-rendering on each agent poll [DET-3427] (#760)
a93c794 fix: don't terminate container gang immediately when one container exits [DET-3435] (#774)
9e80a86 feat: add filter on task list page [DET-3223] (#756)
2263178 fix: ignore stale termination timeouts in trial (#769)
06538e6 feat: model python class (#767)
4d5d185 feat: add experiments page and table [DET-2998 DET-3015] (#742)
2d660c9 fix: hide master logs on elm (#770)
92aff8d chore: bump version: 0.12.9.dev0 -> 0.12.10.dev0
d829644 chore: bump version: 0.12.8.dev0 -> 0.12.9.dev0
f321a2b docs: update the path to example configs (#765)
2242d7b feat: add trial logs to the new API [DET-3308] (#766)
ea4f4f1 refactor: abstract task filters to be reusable (#748)
4c78789 feat: list models endpoint [DET-3278] (#762)
af948b5 feat: registry patch (#759)
f5e4e49 fix: dev sidebar (#754)
588aead chore: clean up GET agents endpoint (#758)
8e5f03a fix: checkpoint workload fails if upload fails (#752)
4a80514 style: antd style adjustments (#732)
5e93770 refactor: separate reusable task table columns (#741)
20af9b1 fix: import typo (#753)
e5ef920 fix: set a fallback array for computing available resources [DET-3411] (#751)
0362b9d feat: model registry get and post (#743)
5d6cf83 chore: don't filter metrics with "/" (#749)
153a9d4 chore: raise Eslint check level for sort and unused variable rules. (#750)
4aa9229 fix: shared_fs checkpoint validation (#746)
4bce303 feat: add GET experiments endpoints to new api (#717)
699e685 feat: show dev pages in development (#744)
c51e3eb feat: add no action state to task action dropdown [DET-3381] [DET-3393] (#725)
d9f443b style: fix lint issues (#740)
fee5bbf feat: add GBT TF Estimator example (#727)
ee4ec9f fix: add coalesce for checkpoints with null metadata [DET-3400] (#747)
54b6f6c feat: add top level resource provider [DET-3179, DET-3180] (#684)
cc7c68a refactor: abstract reading and writing to clipboard [DET-3396] (#736)
b0a13fa style: add linting rule to require await for async functions (#739)
7a6bc0a feat: make master & db deployable via helm [DET-3294] (#728)
d2a4c2b refactor: extract icon filter buttons from dashboard to be a reusable component (#734)
edab561 chore: clean up Pytorch LR Scheduler helper [DET-3270] (#715)
402915f refactor: separate task types [DET-3395] (#737)
2eeb4c4 style: add linting to prevent multi-spaces (#738)
bdf0036 style: fix task card, menu and dropdown styles to be uniform [DET-3286] (#723)
112ed39 feat: logs component and master logs [DET-2997 DET-3041] (#626)
e85c8bc refactor: rename asha to sha (#733)
c75e693 feat: model registry database migration [DET-3277] (#724)
e59a5a1 feat: increase GLOO timeout [DET-3309] (#729)
0acab3e fix: "det-deploy local agent-up" works for remote master [DET-3386] (#730)
657ef14 feat: link to swagger-ui from WebUI (#726)
b562a35 fix: correctly set steps for eval in EstimatorTrial (#731)
cb4090e chore: simplify logic in patchUsername (#702)
85a8501 feat: add tasks table component and task list page [DET-3221] (#652)
1c22922 feat: show experiments in increments [DET-3320] (#703)
9b2b8a9 docs: various improvements for checkpoint documentation (#718)
05b5eda feat: retry ConnectionError and ProtocolError types for GCS upload (#722)
ae6e5cd chore: remove is_chief calculation for non-horovod distributed training [DET-3338] (#705)
6d8c07c fix: use custom TLS cert only for Determined API requests [DET-3360] (#716)
f01c17c feat: add webui version mismatch notification on elm (#697)
dadde23 fix: change cache busting mechanism on react to query string (#696)
67db1b8 docs: adjust Keras documentation to indicate support for model.stop_training (#714)
e346c95 docs: add info to topic guide for graceful trial termination [DET-3361] (#713)
4bb60d5 feat: add user endpoints to new api (#689)
60ced66 fix: learning rate scheduler fix for bert squad example [DET-2897] (#711)
ee4ba43 feat: add a timeout to trial termination [DET-3246] (#690)
091bd09 chore: update webui test dependencies (#706)
4f71eb4 fix: handle auth check cancelation (#710)
26d10d5 feat: add context decorators and fix task cards [DET-2982] (#682)
63803a9 chore: better logging for websocket failures (#709)
e41fa59 fix: upgrade scheme when using websockets (#708)
559b504 fix: add missing directory in Swagger config path [DET-3312] (#680)
8f7b68e fix: correctly use mixed precision with multi-GPU in PyTorchTrial [DET-3285] (#699)
5aa0eb3 chore: bump version: 0.12.7.dev0 -> 0.12.8.dev0
b3d40e3 chore: bump version: 0.12.6.dev0 -> 0.12.7.dev0
ffb5de0 ci: ensure npm ci does not dirty package-lock.json via npm-force-resolution (#694)
4a349a3 docs: minor fixes for TensorBoard docs (#700)
28ef4b4 feat: add cluster page with donut charts [DET-2985] (#618)
aa48768 feat: store test cluster logs and improve test readme [DET-3269] (#657)
5b68448 docs: release notes for 0.12.7 (#701)
4b26ce7 fix: fix nightly tests file locations (#698)
39d7f18 docs: checkpoint metadata [DET-3211] (#671)
3b12c87 feat: add det user change-username to CLI [DET-3322] (#692)
a331567 fix: data caching by rank for distributed setting [DET-2897] (#693)
c33df86 chore: bump task environments version (#695)
983546d fix: fix broken examples tests [DET-3321] (#688)
b43ffe1 docs: add explanation of det-nobody user (#686)
4cd1d6e refactor: restructure examples [DET-3126] (#673)
eca9e21 fix: Fix typo in terraform files for max_agent_starting_period (#685)
2f8f2c6 docs: document on_trial_close estimator hook (#683)
5ada60a chore: add User-Facing API Change label reminder (#676)
5a7e2e7 fix: apply same model compilation args to trial and native mode in TfKerasTrial [DET-3314] (#681)

Docker images

  • docker pull determinedai/determined-master:0.12.10
  • docker pull determinedai/determined-master:ba5f7fb
  • docker pull determinedai/determined-master:ba5f7fb0b580a300bb888e10c52d4b098a111e7f
  • docker pull determinedai/determined-dev:determined-master-ba5f7fb
  • docker pull determinedai/determined-dev:determined-master-ba5f7fb0b580a300bb888e10c52d4b098a111e7f

0.12.8

13 Jun 00:32

Changelog

c8497c6 chore: bump version: 0.12.8rc0 -> 0.12.8
cd5a66e chore: bump version: 0.12.8.dev0 -> 0.12.8rc0
60cc187 chore: bump version: 0.12.7 -> 0.12.8.dev0
5909230 chore: bump task environments version (#695)
01e56a5 docs: add explanation of det-nobody user (#686)
c7533a0 fix: Fix typo in terraform files for max_agent_starting_period (#685)
97a26a2 docs: document on_trial_close estimator hook (#683)

Docker images

  • docker pull determinedai/determined-master:0.12.8
  • docker pull determinedai/determined-master:c8497c6
  • docker pull determinedai/determined-master:c8497c6bde3bdc7121d3a2071e88814153a61555
  • docker pull determinedai/determined-dev:determined-master-c8497c6
  • docker pull determinedai/determined-dev:determined-master-c8497c6bde3bdc7121d3a2071e88814153a61555

0.12.7

11 Jun 20:43

Changelog

d770579 chore: bump version: 0.12.7rc0 -> 0.12.7
19bf22e chore: bump version: 0.12.7.dev0 -> 0.12.7rc0
1ca3a87 docs: release notes for 0.12.7 (#701)
55a81f4 chore: bump version: 0.12.6.dev0 -> 0.12.7.dev0
31c0edc docs: add RPM package install documentation (#674)
e3757af feat: support IndexedSlices for multi-GPU TF2 training [DET-3186] (#608)
052da17 build: build storybooks as part of CI [DET-3248] (#622)
7d9a115 feat: enable sign in button when last username is recalled (#679)
44103f9 fix: update /info to not require auth (#677)
642b851 feat: checkpoint export from database fields (#664)
9b78b54 chore: remove unused variables (#675)
1d03c75 fix: update state labels to be more user-friendly (#672)
481d72a fix: eagerly update experiments on successful write actions [DET-3263] (#642)
3b07678 feat: add task list page route and placeholder [DET-3220] (#636)
4c2d0a6 feat: remember last logged in username [DET-3274] (#660)
18c8125 refactor: set up experiments context [DET-3255] (#640)
5e5b188 chore: add license to pip metadata (#669)
05aa3d2 feat: support TF Keras EarlyStopping callbacks [DET-3240] (#666)
4056146 docs: add to FAQ how to port a TF core graph model (#650)
c8bb942 feat: support Estimator early stopping hooks [DET-3239] (#661)
3ab90a6 test: temporarily disable AMP test since it causes NaNs (#670)
629f106 feat: treat NaN metrics as an error (#667)
db76932 fix: set auth cookie path to apply site wide (#668)
6588f77 feat: decouple agent information from workloads starting tasks [DET-3178] (#631)
f604a28 feat: read cookies in the new API auth module (#665)
9da1063 fix: space out WebUI plot x-axis ticks a bit more (#658)
b9d9324 feat: support early stopping callbacks on a validation step (#662)
cfb3f51 feat: add user auth to new api (#649)
414bfdf fix: set authentication failure reason synchronously. (#659)
ed94d86 feat: decouple agents from transmitting container status changes [DET-3174] (#646)
f27146a fix: address minor login issues (#611)
d014500 revert: "revert: "feat: support stopping training in trial code [DET-3238] (#648)" (#654)" (#656)
44a398a feat: ensure WebUI version is up to date with platform version (#632)
5baea6a revert: "feat: support stopping training in trial code [DET-3238] (#648)" (#654)
ee1314f feat: support stopping training in trial code [DET-3238] (#648)
fa09a74 ci: download protoc install to /tmp (#653)
9759ce7 docs: release notes for 0.12.5 (#595) (#651)
5f476df chore: remove yarn mentions from tests (#635)
8662fda fix: correct filename in Elm Makefile (#647)
0e7ca0a feat: add checkpoint metadata to cli describe commands (#645)
84e875a test: fix nightly nas and iris tf keras tests [DET-3264] (#644)
4ff9fa0 feat: checkpoint metadata api (#619)
cbbe117 chore: move proto files to determined namespace (#639)
fafd686 feat: add template endpoints to new api (#638)
4bad652 feat: support USER_CANCELLED exited reason (#637)
d1146d3 refactor: update link to support secure blank targets (#612)
f71d64e feat: add page component [DET-3232] (#614)
25e725e feat: support gradient clipping in PyTorchTrial via callbacks (#615)
80e39d0 feat: add antd breadcrumb stories [DET-3002] (#582)
5c9afa2 feat: add activate, pause, and cancel actions to task cards [DET-2934] (#585)
a3e121a feat: add end of training callback to EstimatorTrial (#621)
8056055 feat: make agent starting period configurable [DET-3219] (#624)
8fdc371 chore: upgrade proto libraries (#630)
bdfd980 fix: correct logic for checking if a validation is the best one seen (#601)
f590fc3 chore: remove container recovery (#629)
a8c1bb2 feat: add master endpoint to new api (#627)
678d53d chore: ignore pkg dir in proto sub project (#628)
65b5c17 chore: bump version: 0.12.5.dev0 -> 0.12.6.dev0 (#625)
13c0db2 chore: move proto to separate top level package (#620)
897f2f6 revert: make agent starting period configurable [DET-3219] (#623)
7f83e97 feat: make agent starting period configurable [DET-3219] (#610)
b01b560 fix: read docker config file from HOME directory (#587)
e0d0447 feat: make GCP operation tracker timeout configuration [DET-3182] (#598)
0011218 feat: add agent endpoints for new api (#613)
b08657e test: set seed for fashion mnist nightly convergence test (#616)
92ecfc0 fix: pass checkpoint gc metadata as a file (#606)
52b006e feat: decouple container logs from agents (#604)
3379eca test: remove WebUI e2e-tests dependency on det-deploy [DET-3072 DET-2652] (#575)
c5c5eaf fix: simplify login and logout (#553)
976617e refactor: remove additional determined routing [DET-3216] (#609)
6dcfde7 refactor: separate API configs (#584)
4050020 feat: initial grpc support (#552)
fd34fec chore: resolve new node security vulnerabilities (#607)
3bb6b83 feat: support early trial termination (#586)
88c3fbb test: create a test suite for examples (#597)
5474ac4 ci: fix upload-try-now-template (#599)
2c457aa ci: fix changelog generation (#603)
eb057e2 docs: various fixes for Native API docs
9e17699 fix: synchronize before gradient clipping in PyTorch (#602)
fa97c3b fix: properly stringify optional public message (#590)
159c41f docs: update sphinx theme version
924f1d5 feat: BERT on SQuAD Dataset (#574)
1e7025b docs: fix checkpoint load default path
31bcf57 fix: avoid saving pytorch model architecture (#594)
445f5cd docs: clarify documentation for agent startup script
21b1832 docs: fixes for PyTorch API docs.
beba5f8 feat: use str instead of pathlib.Path in checkpoint callbacks
173e48f fix: enable logging with --local --test mode (#589)
d415758 fix: use "Agent ID" instead of "Agent Name" in CLI.

Docker images

  • docker pull determinedai/determined-master:0.12.7
  • docker pull determinedai/determined-master:d770579
  • docker pull determinedai/determined-master:d770579b5ab09c662fa5325b535a8d4e202d7564
  • docker pull determinedai/determined-dev:determined-master-d770579
  • docker pull determinedai/determined-dev:determined-master-d770579b5ab09c662fa5325b535a8d4e202d7564