
Implement masking to control how embedded points are updated #620

Open

matthieuheitz wants to merge 1 commit into lmcinnes:master from matthieuheitz:fixedpoints
Conversation

matthieuheitz (Contributor)

No description provided.

@coveralls

Coverage Status

Coverage decreased (-1.7%) to 87.245% when pulling c35db5f on matthieuheitz:fixedpoints into f86c922 on lmcinnes:master.

@lmcinnes (Owner) left a comment

Looks good so far. I would like to consider different names than pin_mask. It is a small thing, and naming is hard (I don't have a particularly good alternative suggestion), but good names help users a lot. My best suggestion at the moment is inertia or sample_inertia (similar to the sample_weight used in other sklearn models). Of course we would then need to pass through 1.0 - inertia, but that seems tractable. I would welcome your thoughts, however.
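A minimal sketch of how that renaming could map onto the existing mask semantics, assuming pin_mask acts as a per-point gradient multiplier (0 = fully pinned, 1 = free to move), which is what the `1.0 - inertia` remark implies; the names `inertia` and `_pin_mask_from_inertia` are hypothetical:

```python
import numpy as np

def _pin_mask_from_inertia(inertia, n_samples):
    """Convert a user-facing inertia array into internal gradient weights."""
    if inertia is None:
        return np.ones(n_samples)  # every point free to move
    inertia = np.asarray(inertia, dtype=np.float64)
    if inertia.shape != (n_samples,):
        raise ValueError("inertia must have shape (n_samples,)")
    # inertia = 1.0 means fully pinned, 0.0 means free to move,
    # so the gradient multiplier is the complement
    return 1.0 - inertia
```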

```diff
@@ -2671,7 +2727,7 @@ def fit_transform(self, X, y=None):
     r_emb: array, shape (n_samples)
         Local radii of data points in the embedding (log-transformed).
     """
-    self.fit(X, y)
+    self.fit(X, y, pin_mask)
```
@lmcinnes (Owner)

Probably best to make this a keyword arg in case we get more fit parameters later (such as an init fit param).

Suggested change:

```diff
-    self.fit(X, y, pin_mask)
+    self.fit(X, y, pin_mask=pin_mask)
```

@lmcinnes (Owner)

Note that the remaining test failures are due to how Coveralls interacts with Azure, so you can safely ignore them.

Another note: I may start a new 0.6dev branch and retarget this PR to it, so we can merge it in sooner and work on it more easily without disturbing the main branch. Let me know if you think that would be a good idea.

@jondo (Contributor) commented Mar 30, 2021

I have now tested this, and found that I additionally needed to block this rescaling so that my points (whose coordinates lie slightly outside [0, 10]) stay fixed.
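For reference, a hypothetical usage sketch of this scenario, assuming the fit_transform(X, pin_mask=...) signature from this PR and assuming a mask value of 0 marks a pinned point (both are assumptions about the in-flight API, not released behavior):

```python
import numpy as np
import umap

X = np.random.rand(100, 50)

# Starting coordinates, some of which fall slightly outside [0, 10]
init = np.random.rand(100, 2) * 10.5

pin_mask = np.ones(100)
pin_mask[:10] = 0.0  # assumed convention: 0 = pinned, 1 = free to move

reducer = umap.UMAP(init=init)
embedding = reducer.fit_transform(X, pin_mask=pin_mask)

# Unless the internal rescaling is blocked, the pinned points get
# shifted back toward the [0, 10] range instead of staying fixed.
```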

@jondo (Contributor) commented Mar 30, 2021

Also, I would like to base my pinned initial embedding on the spectral embedding, and I suggest this change to get it.

Update: The suggested change was merged into master 👍

@lmcinnes (Owner)

Matthieu raised the rescaling issue with me elsewhere. It is a little tricky: the init genuinely needs to land in a reasonable spot, or the resulting embedding can go very badly. Leaving the rescaling in ensured that we had a sensible starting point. Otherwise there is the question of whether we leave it to the user; it is easy to accidentally provide a bad initialization that produces unexpected results and is hard to diagnose. I was hoping to avoid that if possible. Perhaps a reasonable option would be to come up with some semi-reasonable checks and warn if the provided initialization looks troublesome?

@jondo (Contributor) commented Mar 31, 2021

I also think that warning instead of rescaling is the way to go. Perhaps "one of the coordinate ranges is outside [8, 12] (i.e. more than 20% off)" is a semi-reasonable condition? This assumes that the layout optimization is independent of the absolute embedding location.
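A sketch of such a check, assuming the usual [0, 10] embedding scale and the 20% tolerance proposed above (the helper name and thresholds are illustrative, not part of the PR):

```python
import warnings
import numpy as np

def check_init_scale(init, expected_span=10.0, tolerance=0.2):
    """Warn if any coordinate range of the initial embedding strays more
    than `tolerance` from the expected span (i.e. outside [8, 12] with
    the defaults), rather than silently rescaling it."""
    spans = init.max(axis=0) - init.min(axis=0)  # per-dimension ranges
    lo = expected_span * (1.0 - tolerance)       # 8.0 by default
    hi = expected_span * (1.0 + tolerance)       # 12.0 by default
    if np.any(spans < lo) or np.any(spans > hi):
        warnings.warn(
            "Initial embedding coordinate ranges %s are far from the "
            "expected span of %.1f; optimization may behave poorly."
            % (np.round(spans, 2).tolist(), expected_span)
        )
```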

@kruus commented May 4, 2021

A colleague working on an interactive visualization tool had an interesting scenario: a user could, say, inspect and then drag certain points left or right, then "pin" the x-axis of the dragged points while leaving their other axes adaptable.

Supporting a 2-D (n_samples × dim) pin_mask as well would cover the capability he was looking for. In that case, the gradient weights other_mask and current_mask would change from scalars into per-dimension vector gradient multipliers, as in the sketch below.
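A minimal sketch of that generalization, assuming pin_mask broadcasts elementwise against the gradient (the update rule shown is illustrative, not the actual optimizer code):

```python
import numpy as np

n_samples, dim = 100, 2

# Per-point, per-axis mask: 0 freezes that coordinate, 1 leaves it free.
pin_mask = np.ones((n_samples, dim))
pin_mask[:10, 0] = 0.0  # freeze x of the dragged points; y stays adaptable

def apply_update(embedding, grad, lr, pin_mask):
    # A 1-D per-point mask is the special case where every column of
    # pin_mask is identical; here the multiplier varies per axis.
    return embedding - lr * pin_mask * grad
```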
