add wordrank in dockerfile #1460

Merged on Jul 19, 2017 (8 commits)
Changes from 4 commits
26 changes: 22 additions & 4 deletions docker/Dockerfile
@@ -2,8 +2,8 @@ FROM ubuntu:16.04

MAINTAINER Parul Sethi <[email protected]>

ENV GENSIM_REPOSITORY https://github.com/RaRe-Technologies/gensim.git
ENV GENSIM_VERSION bd6db9a41baf219ecc4a1770cc21b01c8ff122e5
ENV GENSIM_REPOSITORY https://github.com/parulsethi/gensim.git
ENV GENSIM_VERSION add_wordrank_in_docker

# Installs python, pip and setup tools (with fixed versions)
RUN apt-get update \
@@ -47,6 +47,7 @@ RUN pip2 install \
matplotlib==2.0.0 \
nltk==3.2.2 \
pandas==0.19.2 \
spacy==1.8.1 \
git+https://github.com/mila-udem/blocks.git@7beb788f1fcfc78d56c59a5edf9b4e8d98f8d7d9 \
-r https://raw.githubusercontent.com/mila-udem/blocks/stable/requirements.txt

@@ -56,13 +57,18 @@ RUN pip3 install \
matplotlib==2.0.0 \
nltk==3.2.2 \
pandas==0.19.2 \
spacy==1.8.1 \
git+https://github.com/mila-udem/blocks.git@7beb788f1fcfc78d56c59a5edf9b4e8d98f8d7d9 \
-r https://raw.githubusercontent.com/mila-udem/blocks/stable/requirements.txt

# avoid using old numpy version installed by blocks requirements
RUN pip2 install -U numpy
RUN pip3 install -U numpy

# Download english model of Spacy
RUN python2 -m spacy download en
RUN python3 -m spacy download en

# Download gensim from Github
RUN git clone $GENSIM_REPOSITORY \
&& cd /gensim \
@@ -76,12 +82,14 @@ RUN git clone $GENSIM_REPOSITORY \
RUN mkdir /gensim/gensim_dependencies

# Set ENV variables for wrappers
ENV WR_HOME /gensim/gensim_dependencies/wordrank
ENV FT_HOME /gensim/gensim_dependencies/fastText
ENV MALLET_HOME /gensim/gensim_dependencies/mallet
ENV DTM_PATH /gensim/gensim_dependencies/dtm/dtm/main
ENV VOWPAL_WABBIT_PATH /gensim/gensim_dependencies/vowpal_wabbit/vowpalwabbit/vw

# For fixed version downloads of gensim wrappers dependencies
# For fixed version downloads of gensim wrappers dependencies
ENV WORDRANK_VERSION 44f3f7786f76c79c083dfad9d64e20bacfb4a0b0
ENV FASTTEXT_VERSION f24a781021862f0e475a5fb9c55b7c1cec3b6e2e
ENV MORPHOLOGICALPRIORSFORWORDEMBEDDINGS_VERSION ec2e37a3bcb8bd7b56b75b043c47076bc5decf22
ENV DTM_VERSION 67139e6f526b2bc33aef56dc36176a1b8b210056
@@ -90,7 +98,17 @@ ENV VOWPAL_WABBIT_VERSION 69ecc2847fa0c876c6e0557af409f386f0ced59a

# Install custom dependencies

# TODO: Install wordrank (need to install mpich/openmpi with multithreading enabled)
# Install mpich (a wordrank dependency) and remove openmpi to avoid mpirun conflict
RUN apt-get purge -y openmpi-common openmpi-bin libopenmpi1.10
RUN apt-get install -y mpich
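
One way to confirm that the purge leaves a single MPI launcher on the path is to classify the banner printed by `mpirun --version`. The sketch below is an illustration and not part of the PR: `mpi_flavor` is a hypothetical helper, and it assumes that Open MPI's launcher prints a banner containing "Open MPI" while MPICH's Hydra launcher prints one containing "HYDRA" or "MPICH".

```shell
# Hypothetical helper (not part of the PR): classify an MPI launcher
# from its version banner, read on stdin.
# Assumption: Open MPI banners contain "Open MPI"; MPICH (Hydra)
# banners contain "HYDRA" or "MPICH".
mpi_flavor() {
    version_text=$(cat)
    case "$version_text" in
        *"Open MPI"*)    echo "openmpi" ;;
        *HYDRA*|*MPICH*) echo "mpich" ;;
        *)               echo "unknown" ;;
    esac
}
```

Inside the container it could be used as `mpirun --version 2>&1 | mpi_flavor`, which should report `mpich` if the purge and install behaved as intended.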

# Install wordrank
RUN cd /gensim/gensim_dependencies \
&& git clone https://bitbucket.org/shihaoji/wordrank \
&& cp /gensim/docker/wordrank_install.sh /gensim/gensim_dependencies/wordrank/install.sh \
&& cd /gensim/gensim_dependencies/wordrank \
&& git checkout $WORDRANK_VERSION \
&& sh ./install.sh
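
The checkout above is reproducible only while `WORDRANK_VERSION` holds a full commit hash; a branch name, such as the `add_wordrank_in_docker` value this diff assigns to `GENSIM_VERSION`, can move over time. A minimal sketch of a guard follows, assuming a hypothetical helper `is_full_sha` that is not part of the PR.

```shell
# Hypothetical helper (not part of the PR): succeed only when the
# argument is a full 40-character lowercase hexadecimal commit hash.
is_full_sha() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}
```

For example, `is_full_sha "$WORDRANK_VERSION" || echo "not pinned to a commit"` would stay silent for the hash set in this Dockerfile and print the warning for a branch name.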

# Install fastText
RUN cd /gensim/gensim_dependencies \
20 changes: 20 additions & 0 deletions docker/wordrank_install.sh
@@ -0,0 +1,20 @@
#!/bin/bash
Review comment (Contributor): Please remove this script from the PR; it is already downloaded with the wordrank repo. After cloning, use awk or sed to replace "#export CC=icc CXX=icc" with "export CC=gcc CXX=g++".
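
The reviewer's suggestion can be sketched with `sed`. This is only an illustration under stated assumptions: `use_gcc` is a hypothetical helper, and it assumes the upstream `install.sh` ships with the Intel line active (`export CC=icc CXX=icpc`) and the gcc line commented out. The exact text of the upstream lines should be checked before relying on these patterns.

```shell
# Hypothetical helper (not part of the PR): switch a wordrank-style
# install script from the Intel compilers to gcc/g++ in place.
# Assumption: the script has an active "export CC=icc CXX=icpc" line
# and a commented "#export CC=gcc CXX=g++" line.
# Requires GNU sed for -i without a backup suffix.
use_gcc() {
    sed -i \
        -e 's/^export CC=icc CXX=icpc/#&/' \
        -e 's/^#export CC=gcc CXX=g++/export CC=gcc CXX=g++/' \
        "$1"
}
```

In the Dockerfile, the same two `sed` expressions could run between `git checkout $WORDRANK_VERSION` and `sh ./install.sh`, which would make the `cp` of a vendored `wordrank_install.sh` unnecessary. A script that already uses gcc is left unchanged, so the step is safe to repeat.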


printf "1. clean up workspace\n"
./clean.sh

printf "\n2. install glove to construct cooccurrence matrix\n"
wget http://nlp.stanford.edu/software/GloVe-1.0.tar.gz # if failed, check http://nlp.stanford.edu/projects/glove/ for the original version
tar -xvzf GloVe-1.0.tar.gz; rm GloVe-1.0.tar.gz
patch -p0 -i glove.patch
cd glove; make clean all; cd ..

printf "\n3. install hyperwords for evaluation\n"
hg clone -r 56 https://bitbucket.org/omerlevy/hyperwords
patch -p0 -i hyperwords.patch

printf "\n4. build wordrank\n"
#export CC=icc CXX=icpc
export CC=gcc CXX=g++ # uncomment this line if you don't have an Intel compiler, but with gcc all #pragma simd are ignored as of now
cmake .
make clean all