-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add semi supervised classification #40
Conversation
@EdenWuyifan Could you please test this dataset using f1 as metric? The target column is I think it will raise this error: |
Sure. I will try it. |
30620ec
to
f526a65
Compare
f526a65
to
f95ec58
Compare
eb2546e
to
c196545
Compare
ace9e1d
to
b812886
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, the code looks good. Just adding a few suggestions for improvement here.
time_bound_run=5, score_sorting='auto', metric_kwargs={'average': 'micro'}, split_strategy_kwargs=None, | ||
start_mode='auto', verbose=False): | ||
""" | ||
Create/instantiate an AutoMLSemiSupervisedClassifier object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Briefly explain what is semi-supervised classification (e.g., what are assumptions on the training data/labels). Maybe point out to sklearn documentation if there is good documentation on this there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is done by Roque on this commit 9f99863.
@roquelopez Do we want to move this script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of having a demo_utils
folder was to group all the code needed to run Alpha-AutoML in the NYU HSRN server. For instance, in values.yaml file, there are some configs specifically for NYU HSRN server. This will not work in other Kubernetes servers. That is why I prefer to have a demo
to a kubernetes
folder. However, if we can add documentation to set up these configs, I think it's ok to have the kubenetes
folder.
Also, why do we need jupyterhub, jupyterlab, and jupyterlab-server in the Dockerfile? If it is a 'generic' Dockerfile only the notebook
dependency is enough. By generic I mean that users can run it in any kind of environment, not exclusively in Kubernetes. I think that jupyterhub, jupyterlab, and jupyterlab-server dependencies were added there because they are needed by the Jupyterhub instance in NYU HSRN. If that is the case, that Dockerfile should not be in the root directory, it should be in a folder specifically for the demo. The Dockerfile in the root should contain only the necessary dependencies to run Alpha-AutoML. Also, I think PipelineProfiler doesn't work in jupyterlab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current Dockerfile in the root folder is not specific to the HSRN cluster. It works on any installation of JupyterHub or in a single machine using docker (it starts the traditional Jupyter UI). This is why there are two 'flavors' of the image being built in the CI scripts (ghcr.io/vida-nyu/alpha-automl
and ghcr.io/vida-nyu/alpha-automl-jupyterhub
).
That said, I don't know why we are installing the dependencies like that with pinned versions. We actually should probably be just extending pre-built images with the latest versions of JupyterHub. In any case, I think all of this should be discussed and changed in a separate PR. I would just revert it for now and we can discuss it the next meeting.
BTW, many people only use JupyterLab, so PipelineProfilter needs to be fixed too. It should be a high priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not aware of ghcr.io/vida-nyu/alpha-automl
and ghcr.io/vida-nyu/alpha-automl-jupyterhub
. I agree, let's discuss this in another PR.
Also, why is this moving the |
Not blocking merging this PR, but found another detail that should be addressed. I run the example notebook and noted that the pipeline summary shown in |
66798e3
to
15974a3
Compare
@EdenWuyifan feel free to merge it. |
Fix #22.