guide: concerns with "Persisting" experiments #2960

Closed
2 tasks
Tracked by #2911
jorgeorpinel opened this issue Oct 24, 2021 · 25 comments
Labels
C: guide Content of /doc/user-guide type: discussion Requires active participation to reach a conclusion. type: enhancement Something is not clear, small updates, improvement suggestions

Comments

@jorgeorpinel
Contributor

Follow-up to #2845

Term/Concept of "persisting"

Per #2845 (comment)

First, "persisting" usually means to continue doing something despite it being difficult. We want it to mean making something (that was ephemeral) persistent. So I'm not sure it's a good title to begin with.

One option is to be explicit, avoid the term "persist", and say "commit experiments to Git". This is probably a good first step regardless.

Then we should consider all the reasons why people want to commit experiments and generalize them into a term. Currently we only mention sharing experiments, but then this whole content should be inside https://dvc.org/doc/user-guide/experiment-management/sharing-experiments instead (which I doubt is right, but maybe?).

Other

@jorgeorpinel jorgeorpinel added type: enhancement Something is not clear, small updates, improvement suggestions C: guide Content of /doc/user-guide labels Oct 24, 2021
@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 24, 2021

re #2845 (comment)

What do you think about "retrieving experiments"? dvc exp apply and dvc exp branch are both about getting experiment results into a user's normal workflow

@dberenbaum retrieving them from where? I think I get the idea, but it's also not super intuitive. I think either we find a self-explanatory term that covers all motives, or we keep it explicit, i.e. "commit the exp".

@iesahin
Contributor

iesahin commented Nov 2, 2021

IMHO "commit" is too generic and overused.

Users don't only commit to share; they bring experiments to the standard Git workflow.

@dberenbaum's "retrieving" is meaningful in this context: we retrieve them from DVC to Git.

How about "consolidating experiments"? 😄

@jorgeorpinel
Contributor Author

I wouldn't call "commit" generic. It relates specifically to version control. And it's a pretty clear way to express what we're trying to say: we sell DVC Experiments as something that doesn't pollute your Git history (no commits), so it makes sense to explain how to actually commit some if needed.

they bring experiments to the standard Git workflow

Currently we only mention sharing experiments; we should probably emphasize the Git workflow in general more, then.

How about "consolidating experiments"? 😄

Vague. My vote sticks with the very explicit and correct "commit" unless there's a much better alternative.

@iesahin
Contributor

iesahin commented Nov 17, 2021

I wouldn't call "commit" generic.

It's generic in the Git context, and refs/exps/12ABC...-like hashes can also be considered commits. (Stashes are also commits in this sense.) "Commit" doesn't only mean something produced with git commit; it can also mean a kind of object in Git. DVC experiments create those objects, so they technically create commit objects, but they don't register them in the usual Git references (.git/refs/tags, .git/refs/heads, etc.).

So using "commit" here would require making this distinction: "experiments are commit objects but not yet committed to Git." By contrast, "carrying experiments to Git" or "moving experiments to Git" doesn't carry this loaded meaning of "commit."

@jorgeorpinel
Contributor Author

jorgeorpinel commented Nov 23, 2021

This discussion has become overcomplicated and too Git-focused. We're off topic.

To recap: "to persist exps" is a strange term and we want something better. @dberenbaum proposed "to retrieve exps", but it's unclear why "retrieve" (where are they being pulled from?). I proposed "commit experiments", but you think it's a generic term, @iesahin.

It's not generic. It's a specific and simple Git action: git commit. This does not refer to any other objects such as custom refs or stashes (btw those are not commits but again, it's an irrelevant discussion I think). And even if you're a Git expert AND happen to care and know that dvc exps are custom refs which could technically be considered commits, the phrase "commit the experiment" already disambiguates -- it's clear what we mean.

In summary, I see no problem in saying "commit the experiment to Git" as well as "bring experiment to your regular Git workflow" (@iesahin's suggestion) as needed. Both are valid and better depending on the context. Can we go with those?

@dberenbaum
Collaborator

"retrieving" is meaningful in this context, we retrieve them from DVC to Git.

Yup, that's what I meant, retrieving them from DVC.

Are we discussing the title of https://dvc.org/doc/user-guide/experiment-management/persisting-experiments? That was my main focus initially. I don't mind using any of the suggested phrases in the doc, but I'm not sure they make sense in the title. Does "committing experiments" make sense when one of the sections is about bringing experiments to your workspace?

@jorgeorpinel
Copy link
Contributor Author

Yes, the title and replacing the current usage of term "persisting" in general.

Does "committing experiments" make sense when one of the sections is about bringing experiments to your workspace?

For the title I'd go with the concept of "bringing experiments to the regular Git workflow", just not sure what the shortest way of that is.

@iesahin
Contributor

iesahin commented Nov 29, 2021

How about "Transferring Experiments (to Git)" as a title?

We can convey the meaning with various phrases; "bringing" is one of them. I think our first goal is for the title to sound natural and correct.

@jorgeorpinel
Contributor Author

jorgeorpinel commented Dec 2, 2021

How about "Transferring Experiments

I don't think so: 1) the experiments are already in Git (just not in the regular workflow), and 2) "transferring" sounds like data management.

Is something wrong with "bringing experiments to the regular Git workflow" now? You suggested that, @iesahin (I think).

@iesahin
Contributor

iesahin commented Dec 7, 2021

There is nothing wrong with "bringing experiments to the regular Git workflow", but using it as a title is a bit of a mouthful :)

I'm looking for better ways; this doesn't mean something is wrong with the current one. I also don't believe "persisting" is wrong, though there may be better phrases to convey the meaning.

BTW, how about "convey the experiments"? :))

@jorgeorpinel
Contributor Author

Got it. Well, if there's nothing wrong with it and we can't come up with a better alternative, let's go with that? We can't just brainstorm forever 🙂

"Convey" sounds very foreign IMO. It's typically associated with communication.

@iesahin
Contributor

iesahin commented Dec 8, 2021

Well if there's nothing wrong with it and we can't come up with a better alternative let's go with that? We can't just brainstorm forever

That's certainly an option. We can use "persisting" until we unanimously see a better alternative. It's not a problem actually.

@jorgeorpinel
Contributor Author

Doesn't make sense to me... We unanimously agreed "persisting" is not good and we want to change it. Let's go with "bringing experiments to the regular Git workflow" for now please.

@shcheklein
Member

It feels like we're going in circles here, and I'm not even sure everyone is on the same page anymore.

The way I see it: Jorge and Dave don't like "persisting" and have some alternatives.

Emre, it feels like you don't have a strong opinion?


On a side note: could we use... I think I'm fine with something like "committing experiments" or "saving experiments" to make it shorter.

@iesahin
Contributor

iesahin commented Dec 9, 2021

We unanimously agreed "persisting" is not good and we want to change it. Let's go with "bringing experiments to the regular Git workflow" for now please.

I never had strong opinions against "persisting", but even if I had, that wouldn't mean we decided unanimously on something. Every one of us may be against "persisting", but that doesn't mean the alternatives are always better. I've offered maybe 20+ phrases along the way, some worse than others. "Bringing" is OK with me, I'm not against it, but it feels weird to put this long phrase in the title. "Bringing Experiments" also makes one ask "from where?", so I'm not inclined toward it.

@shcheklein Yes, I don't have strong opinions, I'm just trying to come up with ideas to see if they sound good.

@jorgeorpinel
Contributor Author

jorgeorpinel commented Dec 22, 2021

I think we all agreed to these phrases: "bringing experiments to your regular Git workflow" and "committing experiments".

Going in circles indeed 😅 Hopefully one of us will just take the initiative when possible and make a PR on this.

@iesahin
Contributor

iesahin commented Dec 22, 2021

"Committing experiments" for the title, "bringing exps to the regular workflow" for the content. That's the deal, it looks like :)

@jorgeorpinel
Contributor Author

jorgeorpinel commented Mar 10, 2022

change ... from Persisting experiments to Committing experiments... I actually find that problematic 😢 IMHO "committing" is more of an implementation detail here. If I were to read/skim these docs and compare to other experiment tracking solutions I would look for (search / grep) the terms "persist" and "track" (=what) and not "commit" (=how)

Originally posted by @omesser in #3319 (comment)

@jorgeorpinel jorgeorpinel added the type: discussion Requires active participation to reach a conclusion. label Mar 10, 2022
@jorgeorpinel
Contributor Author

jorgeorpinel commented Mar 10, 2022

compare to other experiment tracking solutions I would look for (search / grep) the terms "persist" and "track" (=what)

Interesting @omesser, do you have examples of other tools or docs/articles in the space where "persisting" is used in this way?

In my mind "committing" is not an implementation detail but actually the user's goal (=what): you have a bunch of DVC Experiments (ephemeral by nature) and want to get some of them into the regular Git workflow i.e. you want to (Git) commit them (as discussed above).

@omesser
Contributor

omesser commented Mar 10, 2022

@jorgeorpinel Good point calling me out on mentioning "persist" + "track" as equals here. I see your point doubting my usage of "persist" for experiments. It's indeed not widely used; it's too narrow.

When I wrote:

If I were to read/skim these docs and compare to other experiment tracking solutions I would look for (search / grep) the terms "persist" and "track" (=what) and not "commit" (=how) - all the more important when it comes to titles!

..."Persisting" was probably my backend instincts kicking in for understanding where something is saved (persisted, if you will) if I didn't find the term "track". But you're right, the correct thing for me to have done would be to search for "track[ing]".
So while I definitely see "persist" used in the way I mentioned (to save to persistency; I stand behind this being familiar 😉), it's indeed not used in the context of ML experiments. The conventional term (it's in the name) is experiment tracking.

Let me modify/focus my above claim: what: track, how: by commit to git

"In my mind "committing" is not an implementation detail but actually the user's goal (=what)"

Can this be a little iterative/DVC "biased", though? Meaning: your "what" here already assumes a Git-centric approach, which is "unique" to us (the user knows they want to commit to Git).

What I'm referring to is familiarity/discoverability for people who don't know how this works. Most (all?) other tools for "tracking" experiments don't revolve around Git as a persistency/tracking mechanism. So I would say committing to Git is our "how" (how we track experiments). The "layman" DS/ML engineer wouldn't necessarily know about, or look for, a way to "commit" their experiment; they'd look for a way to track their experiments.
Obviously, tracking is wider than just persisting - it can include discoverability / comparability / monitoring / insights...
Committing to Git satisfies some of those to some extent, for sure, maybe not all. And that's why I (currently) perceive it as "how" experiments are tracked.

@jorgeorpinel
Contributor Author

jorgeorpinel commented Mar 11, 2022

I had no intention of calling you out, @omesser 🙂, just moving the terminology discussion here so the PR could move forward. And I asked for examples because we looked before and couldn't find many.

Object persistency is definitely a thing in databases indeed! Good example, thanks. But I also wouldn't expect our audience (ML Engineers) to be familiar with that (esp. as a verb). Maybe I'm underestimating reader ability to define from context.

In any case, my argument isn't that "to persist" cannot work, but rather that it's sub-optimal. If what we're talking about is "to commit" the experiment, why not say that and avoid the risk of confusing some readers?

The conventional term is experiment tracking.

To clarify, we're basically talking about renaming https://dvc.org/doc/user-guide/experiment-management/persisting-experiments . I think the topic is more narrow than experiment tracking as a whole (as you point out as well).
That said we don't really use that key term much, maybe we should... ⌛ (separate concern)

what: track, how: by commit to git

I'd counter with: what = commit DVC experiment to Git, how = dvc exp branch/apply
Keep in mind that all DVC experiments are automatically tracked, but they're not regular Git commits and you can't see or manage them with porcelain git commands or Git platforms (GitHub).

your "what" here already assumes git-centric approach

I do assume that, since (as you say) GitOps is an intended differentiator of DVC.

familiarity/discoverability of people that don't know how this works

Good question! But probably "to persist" and "to commit" {something} are equally alien to them haha. If there's another better term though, we should consider it!

@jorgeorpinel

This comment was marked as off-topic.

@iesahin
Contributor

iesahin commented Mar 12, 2022

I think "persist" (as @dberenbaum said in an earlier discussion) is a bit broader than what we mean by committing experiments to Git. "Persist" may also mean sending these to a common repository (dvc exp push), for example. What we're trying to figure out is how to convey the meaning of the dvc exp apply operation, basically moving the Git ref in .git/refs/exps/ABCD123... as a commit from the HEAD.

DVC experiments require Git to run, so the term should reflect that connection too. So, although I understand the concern behind "commit", among the candidates I tried to come up with above, it looked like the best.

@jorgeorpinel
Contributor Author

Persist may also mean to send these to a common repository

Good point but for now we have a separate page for that called Sharing Experiments.

@dberenbaum
Collaborator

Let's close this one. Hopefully with coming changes to push for sharing experiments, we can focus less on how to "persist" them locally.
