Convert to absolute paths in wordrank #1503

parulsethi · 2017-07-24T17:08:49Z

Converted relative paths to absolute for every wordrank command.

menshikh-iv · 2017-07-24T18:22:46Z

gensim/models/wrappers/wordrank.py

@@ -118,14 +117,14 @@ def train(cls, wr_path, corpus_file, out_name, size=100, window=15, symmetric=1,
                    utils.check_output(w, args=command, stdin=r)

        logger.info("Deleting frequencies from vocab file")
-        with smart_open(vocab_file, 'wb') as w:
+        with smart_open(join(meta_dir, vocab_file), 'wb') as w:


Please move the join to the definition of vocab_file (line 91), and make the same change for all smart_open arguments.
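
A minimal sketch of the suggested shape, not the actual wrapper code: the helper `build_paths` and the file names `vocab.txt` and `cooccurrence` are hypothetical, and the built-in `open` stands in for `smart_open` so the snippet runs on its own. Each path is joined once where it is defined, so call sites receive a ready-made path.

```python
import os
import tempfile


def build_paths(model_dir):
    """Join every file name with its directory once, at definition time."""
    meta_dir = os.path.join(model_dir, 'meta')
    # Hypothetical file names, chosen only for illustration.
    return {
        'meta_dir': meta_dir,
        'vocab_file': os.path.join(meta_dir, 'vocab.txt'),
        'cooccurrence_file': os.path.join(meta_dir, 'cooccurrence'),
    }


model_dir = tempfile.mkdtemp()  # stand-in for join(wr_path, out_name)
paths = build_paths(model_dir)
os.makedirs(paths['meta_dir'])

# The call site uses the variable directly; no join() is repeated here.
with open(paths['vocab_file'], 'wb') as w:
    w.write(b'word\n')
```

With this layout, a later change to the directory structure touches only the definitions, not every place a file is opened.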

piskvorky

Minor code style suggestions.

piskvorky · 2017-07-25T02:35:11Z

gensim/models/wrappers/wordrank.py

        # prepare training data (cooccurrence matrix and vocab)
-        model_dir = os.path.join(wr_path, out_name)
-        meta_dir = os.path.join(model_dir, 'meta')
+        model_dir = join(wr_path, out_name)


Using the fully qualified os.path.join is preferable.

There are many joins in Python and its various libraries, and the explicit namespace gives the context that makes the code immediately easier for other readers to understand.
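
To illustrate the point with two joins from the standard library (the list `parts` is made up for the example): a bare `join` could plausibly be either of these, while the qualified names leave no doubt.

```python
import os.path

parts = ['models', 'wordrank', 'meta']

# str.join: concatenates strings with an arbitrary separator.
dotted = '.'.join(parts)

# os.path.join: builds a filesystem path with the platform's separator.
path = os.path.join(*parts)

print(dotted)
print(path)
```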

piskvorky · 2017-07-25T02:37:08Z

gensim/models/wrappers/wordrank.py

        cmd_del_vocab_freq = ['cut', '-d', " ", '-f', '1', temp_vocab_file]

        commands = [cmd_vocab_count, cmd_cooccurence_count, cmd_shuffle_cooccurences]
-        input_fnames = [corpus_file.split('/')[-1], corpus_file.split('/')[-1], cooccurrence_file]
+        input_fnames = [join(meta_dir, corpus_file.split('/')[-1]), join(meta_dir, corpus_file.split('/')[-1]), cooccurrence_file]


string.split('/') is not portable -- see os.path.split, os.path.basename etc.
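
A small sketch of the portability problem, using made-up paths. `ntpath` and `posixpath` are the standard library's Windows and POSIX implementations of `os.path`, imported explicitly here so both behaviours can be seen on any platform.

```python
import ntpath
import posixpath

windows_path = r'C:\data\corpus.txt'  # hypothetical Windows-style path
posix_path = '/data/corpus.txt'       # hypothetical POSIX-style path

# Splitting on '/' only works when '/' happens to be the separator.
naive_windows = windows_path.split('/')[-1]
naive_posix = posix_path.split('/')[-1]

# basename understands the separator rules of its platform.
win_name = ntpath.basename(windows_path)
posix_name = posixpath.basename(posix_path)

print(repr(naive_windows), repr(win_name))
print(repr(naive_posix), repr(posix_name))
```

On the Windows-style path the naive split returns the whole string unchanged, while `basename` returns just the file name.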

piskvorky · 2017-07-25T02:37:55Z

gensim/models/wrappers/wordrank.py

@@ -126,7 +125,7 @@ def train(cls, wr_path, corpus_file, out_name, size=100, window=15, symmetric=1,
        with smart_open(cooccurrence_shuf_file, 'rb') as f:
            numlines = sum(1 for line in f)
        with smart_open(meta_file, 'wb') as f:
-            meta_info = "{0} {1}\n{2} {3}\n{4} {5}".format(numwords, numwords, numlines, cooccurrence_shuf_file, numwords, vocab_file)
+            meta_info = "{0} {1}\n{2} {3}\n{4} {5}".format(numwords, numwords, numlines, cooccurrence_shuf_file.split('/')[-1], numwords, vocab_file.split('/')[-1])


Ditto on split.

Elsewhere in the file (and in gensim) the standard C-style %s %d %f string formatting is used; best to keep it consistent here as well.

@piskvorky formatting with {}.format is now preferred in Python. I think we should use the format method instead of C-style formatting.

Kept {}.format for now.
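
For reference, a sketch of the two styles under discussion applied to the same meta line. The values of `numwords` and `numlines` and both file names are invented for the example; the two forms produce identical output, so the choice is purely one of consistency.

```python
# Hypothetical values, for illustration only.
numwords, numlines = 5000, 120000
cooccurrence_shuf_file = 'wiki.toy.shuf'
vocab_file = 'vocab.txt'

fields = (numwords, numwords, numlines, cooccurrence_shuf_file, numwords, vocab_file)

# str.format style, as kept in this PR.
with_format = "{0} {1}\n{2} {3}\n{4} {5}".format(*fields)

# C-style formatting, as used elsewhere in the file.
with_percent = "%d %d\n%d %s\n%d %s" % fields

print(with_format == with_percent)
```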

piskvorky

Minor code style comments.

piskvorky · 2017-07-26T13:02:41Z

gensim/models/wrappers/wordrank.py

        cmd_del_vocab_freq = ['cut', '-d', " ", '-f', '1', temp_vocab_file]

        commands = [cmd_vocab_count, cmd_cooccurence_count, cmd_shuffle_cooccurences]
-        input_fnames = [join(meta_dir, corpus_file.split('/')[-1]), join(meta_dir, corpus_file.split('/')[-1]), cooccurrence_file]
+        input_fnames = [os.path.join(meta_dir, os.path.split(corpus_file)[-1]), os.path.join(meta_dir, os.path.split(corpus_file)[-1]), cooccurrence_file]


This line is a little hard to navigate -- any way to restructure the logic to make it more readable? Maybe factor out some of the arguments into separate lines?
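
One possible restructuring, sketched with made-up paths (`meta_dir`, `corpus_file` and `cooccurrence_file` are given placeholder values here so the snippet runs on its own; `corpus_copy` is a hypothetical name): the repeated expression is computed once and named, which shortens the list literal.

```python
import os

# Placeholder values standing in for the variables defined earlier in train().
meta_dir = os.path.join('model', 'meta')
corpus_file = os.path.join('data', 'corpus.txt')
cooccurrence_file = os.path.join(meta_dir, 'cooccurrence')

# Name the copied corpus path once instead of rebuilding it twice inline.
corpus_copy = os.path.join(meta_dir, os.path.basename(corpus_file))

input_fnames = [corpus_copy, corpus_copy, cooccurrence_file]
print(input_fnames)
```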

piskvorky · 2017-07-26T13:05:05Z

gensim/models/wrappers/wordrank.py

        os.makedirs(meta_dir)
        logger.info("Dumped data will be stored in '%s'", model_dir)
-        copyfile(corpus_file, join(meta_dir, corpus_file.split('/')[-1]))
+        copyfile(corpus_file, os.path.join(meta_dir, corpus_file.split('/')[-1]))


Isn't os.path.split()[-1] simply os.path.basename()?
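
A quick check of that equivalence on a few made-up paths. The Python documentation defines `os.path.basename` as the second element of the pair returned by `os.path.split`, so the two always agree, including the case of a trailing separator, where both give an empty string.

```python
import os.path

# Hypothetical sample paths: bare name, nested file, and a trailing separator.
samples = [
    'corpus.txt',
    os.path.join('data', 'corpus.txt'),
    os.path.join('data', 'sub') + os.sep,
]

# Pair up the two spellings for each sample path.
pairs = [(os.path.split(p)[-1], os.path.basename(p)) for p in samples]

for via_split, via_basename in pairs:
    print(repr(via_split), repr(via_basename))
```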

parulsethi added 2 commits July 24, 2017 22:32

convert to absolute paths for every command

5802bdb

use sorted in directory structure test

53158e1

menshikh-iv reviewed Jul 24, 2017

move join() to var definition

7c13de9

piskvorky requested changes Jul 25, 2017

parulsethi added 2 commits July 25, 2017 15:31

made requested changes

7345023

change gensim pin to develop in dockerfile

5bbe888

menshikh-iv merged commit 7a9e98e into piskvorky:develop Jul 25, 2017

piskvorky reviewed Jul 26, 2017
