Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Clojure Contrib Package #11205

Merged
merged 37 commits into from
Jul 1, 2018
Merged

Clojure Contrib Package #11205

merged 37 commits into from
Jul 1, 2018

Conversation

gigasquid
Copy link
Member

@gigasquid gigasquid commented Jun 8, 2018

Description

This adds a Clojure package for MXNet under a contrib package

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [ X] Changes are complete (i.e. I finished coding on this PR)
  • [ X] All changes have test coverage:
  • [ X] Code is well-documented:
  • [ X] To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

This is the initial commit of the Clojure package in as a contrib project. There will be a period of stabilization, feedback, and contribution. After that, it may graduate to out of contrib.

The initial package has been tested by the community in a separate repo prior to this PR https://github.com/gigasquid/clojure-mxnet

Here is the original issue:
#8971

Architecture & Design:
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Clojure

Current List of Package Completion and Contribution Needs
https://cwiki.apache.org/confluence/display/MXNET/Clojure+Package+Contribution+Needs

Comments

There have been areas of improvement noted from feedback on the mailing list.

  • Increased test coverage
  • Benchmarks

In addition there will be a need to integrate with the Makefile and CI.

I would like to propose that this initial PR be merged as it represents a working, community tested development milestone and then add on the improvements as separate PRs to allow smaller commits, better visibility, and community involvement.

@gigasquid
Copy link
Member Author

@nswamy This is ready for review/feedback

@lanking520
Copy link
Member

lanking520 commented Jun 8, 2018

Hi Carin, is there anyway I can test it? I am pretty new to Clojure
Found tutorial in the PR, my bad, will test it

@kurman-zz
Copy link

Do you think you can split examples from cde0ee3 into a separate commit to make the review process easier?

@gigasquid
Copy link
Member Author

@kurman sure - I will rewrite the commits and and push tomorrow (wrapping up today 😄 )

@gigasquid
Copy link
Member Author

@kurman - I rewrote the commits into 2 parts - one with the main package and tests and another with the examples. Please let me know if I can do anything else to make the review process easier.

Clean up dependencies in namespaces to make it 
compatible with the plugin
@kurman-zz
Copy link

@gigasquid thanks for splitting the commit, I am already finding that it is easer going through the PR.

@kovasb
Copy link

kovasb commented Jun 14, 2018

Thanks for this contribution!

In terms of code quality this looks very good, getting a lot done in a straightforward manner. People should feel good shipping this if/when test coverage is satisfactory.

My only code question is: Where do the corner cases tend to lie? Whats the most complicated part where bugs might be lurking?

Some higher level questions:

  • What is the debug experience like? It seems like errors can come from 3 places (mxnet core, scala, or clojure), is this a problem in practice? Can users track down what they did wrong?

  • What does maintenance look like? As MXNet & Scala API evolve, how much work does this codebase need to do to keep up? What changes are picked up automatically via introspection, vs requiring changing the codebase?

  • How do people learn to use the thing? Is it easy to go between the clojure functions and existing MXNet docs/tutorials? How is the documentation going to work long term?

For a v1 the answers to a lot of these are probably uncertain, but to the extent the code/design already has an opinion/constraint that might be useful to know.

@gigasquid
Copy link
Member Author

gigasquid commented Jun 15, 2018

@kovasb Thanks so much for taking the time to review and providing such insightful feedback and questions.

As far as corner cases go, I feel pretty confident about the interop with the Scala api. I would say the risk for complications would be lurking in the areas that I haven't fully ported over yet and explored. I ran into some trouble with the protocols for the custom operators with class loading and deferred working on it more to focus on some other areas. I'm hoping it was just some silly thing that I wasn't doing properly because in theory, if it's a Scala/Java class - it can all be interoped with, but I won't be 100% sure until I dive in again.

You have some excellent higher level questions:

  • The debug experience is an area that could use improvement. I've added it as an item on the https://cwiki.apache.org/confluence/display/MXNET/Clojure+Package+Contribution+Needs page. The regular JVM stack traces that occur in clojure point to the errors in the Scala or Clojure class and can be pinned down pretty readily and are not bad. The more frustrating ones that I have encountered is when there is a core dump, (for example binding some layers with a bad shape), in the mxnet-core. You can still find the line where the error occurred in the underlying Scala/Clojure code, but it is not easy to navigate. From what I understand, I believe that this might be the same case in the current Scala package as well, so I'm thinking that working together we could make it better for both packages if we could tackle this.

  • I think the story for maintenance is pretty good because of leveraging the Scala package. The Clojure package also has the flexibility to evolve at its own rate by deciding when to move to the next Scala version or wrap a new Scala api. I think the philosophy in MXNet is to not make breaking changes is very good and will enable stability. The mxnet core api changes for NDArray and Symbol flow through the Scala package via the c code generators. The clojure package picks these changes up from the Scala classes by running its own generator, (which is a simple lein command). Changes to the higher level Scala apis will also be picked up or course, while any new apis would need to be added and incorporated manually.

  • The Clojure apis have been developed with the goal of being intuitive from the existing apis while also being idiomatic to use from the Clojure perspective. I've been able to use existing tutorials from both Python and Scala and translate them to Clojure. Here is an example:

From the Python tutorial https://mxnet.incubator.apache.org/tutorials/basic/symbol.html you have:

net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=net, name='fc1', num_hidden=128)
net = mx.sym.Activation(data=net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(data=net, name='fc2', num_hidden=10)
net = mx.sym.SoftmaxOutput(data=net, name='out')

Translation to clojure follows pretty directly from this:

(def data (sym/variable "data"))
(def fc1 (sym/fully-connected "fc1" {:data data :num-hidden 128}))
(def act1 (sym/activation "act1" {:data fc1 :act-type "relu"}))
(def fc2 (sym/fully-connected "fc2" {:data act1 :num-hidden 64}))
(def net (sym/softmax-output "out" {:data fc2}))

But also can be written in a more idiomatic, threading way using the as-> clojure function

(as-> (sym/variable "data") data
  (sym/fully-connected "fc1" {:data data :num-hidden 128})
  (sym/activation "act1" {:data data :act-type "relu"})
  (sym/fully-connected "fc2" {:data data :num-hidden 64})
  (sym/softmax-output "out" {:data data}))

As far as documentation goes, the design choice has been made to generate the code from the symbol and ndarray apis and persist that to files rather than read into memory as macros so that it enable automated API documentation with Codox. Going forward, I envision a section of the MXNet webpage with tutorials and API documentation similar to the other language bindings to also help the user getting started.

# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add "set -e" to this script?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn’t that just below set -evx ?

### Use Prebuilt Jars
There are deployed jars on Clojars for each supported system

* `[org.apache.clojure-mxnet/clojure-mxnet-linux-gpu "0.1.1-SNAPSHOT"]`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is building these packages? How is the version number related to the mxnet version?

Can we add instructions to build the package as well? Something like when you build a scala package in the local ivy repo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. The section was taken from the README in the external git repo that I had for testing. I had built and deployed those jars manually and pushed them to clojars.

I'm not sure how the build will happen within the project when it gets merged long term. We will need some discussion on that. For the short term, I can continue to build the jars manually and push them to clojars for general users to consume and give feedback on.

I definitely can document the process that I use to do the build and deploy 👍

# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set -e

# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set -e

Copy link
Contributor

@larroy larroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great contribution!

@gigasquid
Copy link
Member Author

Thanks for taking the time to review @larroy 😸 I'll get to work on incorporating your feedback.

Copy link
Contributor

@larroy larroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Package testing LGTM except the OpenCV dependency. Can you comment on the OpenCV part?

chmod 777 lein
sudo cp lein /usr/local/bin
sudo apt install libcurl3
sudo add-apt-repository ppa:timsc/opencv-3.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the opencv version that is in ubuntu too old? Is not clear why we are adding this to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@gigasquid gigasquid Jun 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - that is the issue I'm looking into. It appears that the version of the opencv that was used to build the scala jars for nexus is different than the opencv required by caffe. I'm looking into changing the unit test from the clojure package to run off of the ones generated by make scalapkg and make scalainstall instead, (which is most likely a better way to go anyway).

Using the scala jars from nexus on ubuntu uses this version
sudo apt install -y libopencv-imgcodecs3.4
which clashes with the one required in caffe build libopencv-dev

@gigasquid gigasquid requested a review from szha as a code owner June 27, 2018 02:14
@gigasquid
Copy link
Member Author

@larroy The CI Clojure unit tests now pass 😸
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11205/22/pipeline

jenkins___incubator-mxnet___pr-11205____22


## Introduction

MXNet is a first class, modern deep learning library that AWS has officially picked as its chosen library. It supports multiple languages on a first class basis and is incubating as an Apache project.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should AWS be mentioned at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by the +1 - I'm interpreting that it's ok to leave the AWS in the text, but if I understood that wrong, please let me know :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, please remove all mentions of AWS.

# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn’t that just below set -evx ?

Copy link
Contributor

@marcoabreu marcoabreu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a rough pass and everything LGTM, thanks a lot everybody!

(callback/invoke cb i batch-num eval-metric))
(.dispose batch)
(inc batch-num)))
(println "Epoch " i " Train-" (eval-metric/get eval-metric))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it a better practice to use logging? in Scala we often use logback for logging, so you can configure logging for these classes independently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - excellent point. I put an item to improve that in https://cwiki.apache.org/confluence/display/MXNET/Clojure+Package+Contribution+Needs

Copy link
Contributor

@larroy larroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I added a comment WRT println vs logging. But this should not block the PR. Amazing Job! thanks for the contribution!

Copy link
Member

@lanking520 lanking520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gigasquid for your contribution and I can't wait to see Clojure go into MXNet 😊

`cd ~/mxnet`


`git checkout tags/1.2.0 -b release-1.2.0`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the package requires a fixed version of MXNet as 1.2 or it can be updated accordingly with mxnet?


**Why build on the Scala package?**

The motivation section addresses this, but the main reason is high leverage is using the great work that the Scala package has already done.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


The motivation section addresses this, but the main reason is high leverage is using the great work that the Scala package has already done.

**How can I tell if the gpu is being used?**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can also be shown in the message that CUDA is finding a best algorithm... As long as a Context.gpu() passed in the code as a context, GPU should be used

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I updated the README

### Deferred
* Feed Forward API
* OSX gpu support Scala - defer to adding via Scala first
* CustomOp port - defer due to class loader issues
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also working on improving this part, please let me know your issues, see if I can help


There are a few steps to get up and running with the dependencies on Ubuntu.

Please see this issue for a handy guide: [Running on Ubuntu](https://github.com/gigasquid/clojure-mxnet/issues/2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any dependencies required for Scala, refer to this one. I think Aman is happy with the solution

The dataset can be obtained here: [https://github.com/yoonkim/CNN_sentence](https://github.com/yoonkim/CNN_sentence). The two files `rt-polarity.neg`
and `rt-polarity.pos` must be put in a directory. For example, `data/mr-data/rt-polarity.neg`.

You also must download the glove word embeddings. The suggested one to use is the smaller 50 dimension one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I cannot reproduce the same accuracy that Python have. There may be changes on the Scala Training code.

- remove AWS reference
- put in system specific build instructions with jars
- clean up
@gigasquid
Copy link
Member Author

Thanks @lanking520 for the review :) - In regards to the CNN text classification, the example at the default is running on a limited dataset (to fit in my laptop memory). There is an item on the Needs Contribution Help page about taking another look at that example with Word2Vec. I added to compare it to the python example results as well.

Copy link
Member

@yzhliu yzhliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more curious about the gen/ndarray(symbol).clj. @gigasquid Did you hand-write all these functions, or you have a way to auto-generate it?


Run `make scalapkg` then `make scalainstall`

then replace the correct jar for your architecture in the project.clj, example `[ml.dmlc.mxnet/mxnet-full_2.11-osx-x86_64-cpu "1.3.0-SNAPSHOT"]`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org.apache instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch - will update





Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my understanding, you're listing all the operators below. Can Clojure auto-generate these functions? or can we have a function to allow users call by function name? e.g., NDArray.call("foo", arg1, arg2, ...,).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, Clojure generated these by the write file method. I saw @gigasquid uploads the 3000+ lines of code on Symbol and NDArray in the Gen folder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yzhliu @lanking520 That is correct. We are autogenerating the code in the gen directories using the /dev/generator.clj code. This code gets invoking using a command line lein generate-code and produces these files by inspecting the classes created by Scala with java reflection. Where the scala package creates these classes using macros directly into memory without persisting, the clojure package chose to persist them to files to help the user discover what functions are available, generate api documentation, and to be able to see what new functions were added via a git diff from release to release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we generate the source code file during compiling, instead of checking in to the repo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is definitely a possibility to investigate. There are pros and cons to doing it manually vs compile time. For example, doing it as a manual step gives you a way to control the flow of changes into the project and a git diff. Where, doing it at compile time gives the advantage of not having to worry about a manual step. I'm definitely open to changing this aspect of it, especially if the manual generate-code step becomes a burden. I added it to the https://cwiki.apache.org/confluence/display/MXNET/Clojure+Package+Contribution+Needs as feature to further investigate. Thanks for the feedback :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently on Scala side, @nswamy proposed a solution: Generate the signature of the file and documentation rather than the entire implementation. Maybe Clojure can follow the similar step to do. We can put this as a feature in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I added the detail of this to the Clojure Package Contribution Needs - thanks :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gigasquid I am also concerned that we are merging generated code, we do not want to merge massive amounts of generated code -- this is also a moving target as and when new operators are added or modified. The proposal to only have the interface/documentation also helps us to have a config driven implementation. Do you think you can prioritize this before 1.3?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nswamy Of course. I can work on that next.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks so much :), you rock!





Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as that in ndarray.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the same process as described in the ndarray.

@gigasquid
Copy link
Member Author

Thanks @yzhliu for your review. If you have any other questions, please let me know 😄

@szha
Copy link
Member

szha commented Jun 30, 2018

@gigasquid and team, congrats! It's been a long way since @gigasquid pitched the idea of having a clojure frontend for mxnet since NIPS last year, and I'm thrilled to see this becoming the reality now. This effort opens up mxnet to the clojure enthusiasts, and I hope that we will attract more people from this community and see more interesting work coming soon :)

@larroy
Copy link
Contributor

larroy commented Jul 1, 2018

Let's merge this in unless somebody has additional comments.

@szha szha merged commit 8917318 into apache:master Jul 1, 2018
[t6/from-scala "0.3.0"]

;; Jars from Nexus
;[org.apache.mxnet/mxnet-full_2.11-osx-x86_64-cpu "1.2.1"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if Clojure always depended on Maven jars, would the clojure-jars be always 1 release behind Scala? or are you consuming from the source build from the master?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI build is taking it from master when the jars are installed locally with make scalainstall. I kept the maven jars lines in there as well commented out so that it is easy for a user to switch back and forth.

XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
* Initial commit for clojure package - src code

* examples for clojure package

* Add cloverage code coverage plugin.

Clean up dependencies in namespaces to make it 
compatible with the plugin

* add bug fix from Kamil Hryniewicz

* add basic optimizer test

* add Kamil to thanks

* add Christian Weilbach

* add set -e to scripts

* added documentation about the current process of building and deploying

* unit tests for util

* add tests for generator
- Also fix parameter names to lowercase

* make the test-validate use a spec in util-test

* update to 1.2.1 scala version

* add generator test fix & enhancements from Burin

* add test for eval-metric

* add callback unit test

* switch project default to linux-cpu for CI
- add thanks for Avram

* remove reference to the clojars snapshot

* @kurman feedback newline/scripts
- add newline to gitignore
- add -vx options to scripts

* feedback from @kurman - use scala instead of $

* move gitignore back into clojure-package

* feedback from @kurman - lowercase testing.md

* feedback from @kurman use generate code alias for better clarity
Supported APIs

* feedback from @kurman - cnn text example

* Add clojure package testing to CI

* bug fix from jimdunn - skip validation in fit when eval-data is nil

* Change CI to use locally installed scala jars instead of nexus

* clean ci & bump clojure-package version to be inline with scala pacakge version
-remove makefile modification and use script instead

* update cat image link since the other one is broken

* change cat link to one in the repo

* Update README
- remove AWS reference
- put in system specific build instructions with jars
- clean up

* missing minus sign in example - contributed by jimdunn

* feedback from @lanking520

* feedback from @lanking520 - specify that branch checkout is optional

* feedback from @yzhliu - update README ml.dmlc -> org.apache
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.