MachineTeaching

This project explores interactive multi-class machine teaching of images through triplets. We first collected data from the crowd: we showed users triplets of images and asked them to identify which images were similar. For each dataset, we collected ~6000 triplets using Amazon Mechanical Turk (MTurk). We then used the triplets to create a t-distributed Stochastic Triplet Embedding (tSTE) for the images.

The next step after obtaining the embedding was to actually teach through triplets. We noted that the tSTE gives us a probabilistic model: the probability of each triplet under the embedding (p_ijk). We further realized that instead of optimizing over the kernel, we could directly optimize the tSTE objective function in an online manner through stochastic gradient descent. This led to four different selection strategies for determining the next triplet to show to users:
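As a rough illustration of the model described above, the sketch below computes p_ijk with the standard Student-t form of tSTE and takes one gradient-ascent step on log p_ijk. The names `triplet_prob` and `sgd_step`, the learning rate, and `alpha=1.0` are assumptions made for this example; the repository's own implementation (see `cython_tste`) may differ.

```python
import numpy as np

def triplet_prob(X, i, j, k, alpha=1.0):
    """tSTE probability that item i is closer to item j than to item k."""
    d_ij = np.sum((X[i] - X[j]) ** 2)
    d_ik = np.sum((X[i] - X[k]) ** 2)
    # Student-t similarity with alpha degrees of freedom.
    s_ij = (1.0 + d_ij / alpha) ** (-(alpha + 1.0) / 2.0)
    s_ik = (1.0 + d_ik / alpha) ** (-(alpha + 1.0) / 2.0)
    return s_ij / (s_ij + s_ik)

def sgd_step(X, i, j, k, lr=0.1, alpha=1.0):
    """Return a copy of X after one ascent step on log p_ijk (i, j, k distinct)."""
    p = triplet_prob(X, i, j, k, alpha)
    a = 1.0 + np.sum((X[i] - X[j]) ** 2) / alpha
    b = 1.0 + np.sum((X[i] - X[k]) ** 2) / alpha
    c = (1.0 - p) * (alpha + 1.0) / alpha
    g_j = c * (X[i] - X[j]) / a      # pulls j toward i
    g_k = -c * (X[i] - X[k]) / b     # pushes k away from i
    g_i = -(g_j + g_k)               # gradient with respect to x_i
    X_new = X.copy()
    X_new[i] += lr * g_i
    X_new[j] += lr * g_j
    X_new[k] += lr * g_k
    return X_new

# Hypothetical 2-D embedding of four items.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
p_before = triplet_prob(X, 0, 1, 2)
p_after = triplet_prob(sgd_step(X, 0, 1, 2), 0, 1, 2)
```

One way to simulate an incorrect answer under this sketch is to step on the flipped triplet, `sgd_step(X, i, k, j)`; that reading is an interpretation, not something the text specifies.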

1. Random: Simply show a random triplet to the user.
2. Most Uncertainty: Show the triplet whose current probability is closest to 0.5.
3. Best Gradient Increase: Take a gradient step assuming the user identifies the triplet correctly, and another assuming the user identifies it incorrectly. Let p be the probability of the triplet before the gradient steps, and let p1 and p2 be its probabilities after the gradient step for a correct and an incorrect answer, respectively. Select the triplet that maximizes p * p1 + (1-p) * p2 - p.
4. Best Gradient Increase Random Sample: Similar to strategy 3, except that it accounts for the fact that a gradient step affects not only the current triplet but also O(n^2) other triplets. For each candidate triplet, randomly sample from the other triplets affected by the gradient step and average across them, selecting the triplet that maximizes p * p1_avg + (1-p) * p2_avg - p_avg.
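The scoring rules in the list above can be written as small pure functions of the probabilities involved. This is only a sketch: the function names and the candidate values are made up for illustration, and the probabilities are assumed to have been computed elsewhere from the embedding.

```python
def uncertainty_score(p):
    """Most Uncertainty: larger when p is closer to 0.5."""
    return -abs(p - 0.5)

def expected_gain(p, p1, p2):
    """Best Gradient Increase: p * p1 + (1 - p) * p2 - p."""
    return p * p1 + (1.0 - p) * p2 - p

def expected_gain_sampled(p, p1_avg, p2_avg, p_avg):
    """Random Sample variant: p * p1_avg + (1 - p) * p2_avg - p_avg."""
    return p * p1_avg + (1.0 - p) * p2_avg - p_avg

# Hypothetical candidates: triplet -> (p, p1, p2).
candidates = {
    (0, 1, 2): (0.90, 0.92, 0.80),
    (3, 4, 5): (0.55, 0.70, 0.45),
    (6, 7, 8): (0.20, 0.60, 0.15),
}

most_uncertain = max(candidates, key=lambda t: uncertainty_score(candidates[t][0]))
best_gain = max(candidates, key=lambda t: expected_gain(*candidates[t]))
```

Keeping the scores separate from the embedding code makes it easy to swap strategies per user, since each one only needs a different `key` function.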

For strategies 2-4, it is too computationally intensive to evaluate all possible triplets in the online scenario. So, for each strategy, we randomly sampled and scored as many triplets as possible within the online time limit (<1.5s).
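A time-budgeted sampling loop of this kind might look like the sketch below. The function name `pick_triplet_within_budget`, the `score` callback, and the toy scoring function are all assumptions for illustration; only the idea of scoring random triplets until a deadline comes from the description above.

```python
import random
import time

def pick_triplet_within_budget(n_items, score, budget_s=1.5, rng=None):
    """Score random triplets until the time budget runs out; return the best one."""
    rng = rng or random.Random()
    deadline = time.monotonic() + budget_s
    best, best_score, n_scored = None, float("-inf"), 0
    while True:
        # Three distinct item indices form one candidate triplet (i, j, k).
        triplet = tuple(rng.sample(range(n_items), 3))
        s = score(triplet)
        n_scored += 1
        if s > best_score:
            best, best_score = triplet, s
        # Check the clock after scoring so at least one candidate is evaluated.
        if time.monotonic() >= deadline:
            break
    return best, best_score, n_scored

# Toy score standing in for one of strategies 2-4.
def toy_score(triplet):
    return -abs(triplet[0] - triplet[1])

best, best_score, n_scored = pick_triplet_within_budget(
    10, toy_score, budget_s=0.05, rng=random.Random(0)
)
```

Because the loop stops on a wall-clock deadline, the number of candidates scored depends on the cost of `score` and on server load, so a cheaper scoring function directly improves the quality of the selected triplet.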

We launched a session on Amazon Mechanical Turk to evaluate our algorithms. The HIT consisted of a teaching phase and a testing phase. In the teaching phase, we randomly assigned each user to a selection strategy and presented a series of triplets chosen by that strategy. We displayed both the triplet and the class labels, and told users whether they had identified the triplet correctly or incorrectly. In the testing phase, we showed users single images only and asked them to select which class each image belongs to. We ran the experiment on a total of ~120 users per dataset.

The test results of the different selection strategies for the China dataset can be found in china_dataset_results.png. We will follow up with results on the Seabed dataset as well.

To create the teaching interface, we used Flask and SQLAlchemy for the backend and hosted it on PythonAnywhere. We developed the front end using HTML/CSS and JavaScript. We took advantage of "sessions" to personalize the teaching for each user. Since we designed the ML algorithms to take ~1.5s per click, we needed to heavily optimize parts of the code to keep the application responsive for real-time users. Furthermore, we had to scale our solution with multiple processors and randomized load-balancing to handle the load from concurrent users while also maintaining user sessions. These steps were necessary because we ultimately launched to hundreds of users from Amazon MTurk, many of whom participated in the HIT concurrently.
