Windows support for autotvm - Do not merge #4548

Closed
wants to merge 94 commits into from

Conversation

jmorrill
Contributor

This PR is not meant to give anyone a heart attack. @soiferj encouraged me to submit it so he could take a peek, so please don't review it seriously for merge. Feel free to close it if you don't want to look at it :)

Two discuss topics related:
https://discuss.tvm.ai/t/unofficial-autotvm-on-windows-guide/4711 (Google doc has notes on quirks)
https://discuss.tvm.ai/t/added-windows-support-to-c-rpc-server/5007

Currently, autotvm is not supported on Windows out of the box. Most of the challenges stem from fork() not being available on Windows; the autotvm code uses it extensively to get multi-core performance in Python. Without fork(), any data sent to a process pool must be picklable. Without fork(), Python pools or subprocesses also need to be reused for performance reasons. This was very apparent in local_executor.py, where starting a new Python subprocess with a Python entry point could take almost 1000 ms.
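To illustrate the pickling constraint, here is a minimal sketch using only the standard library. The helper name `can_cross_process_boundary` is hypothetical and is not part of this PR or of TVM; it simply tries to pickle an object the way a spawn-based process pool would have to before handing it to a worker.

```python
import math
import pickle


def can_cross_process_boundary(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise.

    Hypothetical helper for illustration only. Without fork(), a worker
    process does not inherit the parent's memory, so anything passed to
    it has to be serialized first.
    """
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A function that can be looked up by module and name pickles by reference.
print(can_cross_process_boundary(math.sqrt))

# A lambda has no importable name, so plain pickle rejects it.
print(can_cross_process_boundary(lambda x: x * x))
```

With fork(), the lambda would simply be inherited by the child process; on Windows it has to be replaced by an importable function or serialized by something more capable than pickle.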

To overcome these issues, I have opted to use the pathos library in some spots; it uses dill for serialization. dill can serialize much more than pickle, notably functions.

I've tried to keep the Linux behavior the same, but have not tested it. Most of the time I "ifdef"ed the Python code with os.name == 'nt' so the Windows-specific changes are easy to spot.
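The os.name == 'nt' branching could look roughly like the sketch below. This is not the code from the PR: `pick_start_method` and `make_context` are hypothetical helpers, and choosing a multiprocessing start method is just one plausible place where such a branch would appear.

```python
import multiprocessing
import os


def pick_start_method(os_name=os.name):
    """Choose a multiprocessing start method for the given platform.

    Hypothetical helper: 'nt' (Windows) has no fork(), so it must spawn
    fresh interpreters; other platforms keep the existing fork behavior.
    """
    if os_name == "nt":
        return "spawn"
    return "fork"


def make_context(os_name=os.name):
    """Return a multiprocessing context matching the platform branch."""
    return multiprocessing.get_context(pick_start_method(os_name))
```

Keeping the platform check inside one small function means the non-Windows path stays untouched and the Windows path is easy to find when reviewing.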

Notable problems are:

  • Need to fix IPv6 in base.py get_addr_family
  • IP_ANY (0.0.0.0) was giving me trouble, so I replaced IP_ANY with 127.0.0.1
  • local_executor.py: timeouts are not supported because a pool is used for perf reasons. Timeouts will work on the RPC server side if using the C++ RPC server.
  • I'm new to Python, so some things could probably be expressed better
  • Took some liberties with the C++ RPC server and the main CMakeLists.txt, which may not be appreciated
  • Python RPC server: I restart the Python subprocess after n trials, as some CUDA kernels cause big leaks and killing the process is the only way to fix that. I suggest the C++ RPC server as it's much faster.
  • Possibly many more.
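The "restart after n trials" idea from the Python RPC server item can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `RecyclingRunner` and `FakeWorker` are hypothetical names, and `FakeWorker` stands in for a real subprocess handle so the example runs anywhere.

```python
class RecyclingRunner:
    """Run jobs on a reusable worker, replacing it after `max_trials` jobs.

    Reusing the worker avoids paying process start-up cost per job;
    replacing it periodically bounds how much a leaky job can accumulate.
    """

    def __init__(self, make_worker, max_trials):
        if max_trials < 1:
            raise ValueError("max_trials must be at least 1")
        self._make_worker = make_worker
        self._max_trials = max_trials
        self._worker = None
        self._trials = 0
        self.restarts = 0

    def run(self, job):
        # Create the worker lazily so a restart costs nothing until needed.
        if self._worker is None:
            self._worker = self._make_worker()
        try:
            return self._worker.run(job)
        finally:
            self._trials += 1
            if self._trials >= self._max_trials:
                # Discard the worker; the next job gets a fresh one.
                self._worker.close()
                self._worker = None
                self._trials = 0
                self.restarts += 1


class FakeWorker:
    """Stand-in for a subprocess handle (assumption, for illustration)."""

    created = 0

    def __init__(self):
        FakeWorker.created += 1
        self.closed = False

    def run(self, job):
        return job * 2

    def close(self):
        self.closed = True
```

For example, `RecyclingRunner(FakeWorker, 3)` would run three jobs on one worker, close it, and create a new worker for the fourth job. In a real setup, `close()` would kill the subprocess so that leaked GPU memory is released.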

jmorrill and others added 30 commits November 9, 2019 21:50
@FrozenGene
Member

I think that's a good idea @FrozenGene . Maybe starting with the CPP server PR first, as it is more contained and less risky?

I can cherry pick to a new PR once the CI verifies my latest changes (resolved some merge conflicts with my branch).

sounds good to me.

@soiferj
Contributor

soiferj commented Feb 11, 2020

@jmorrill have you gotten a chance to work on the CPP server PR?

@jmorrill
Contributor Author

> @jmorrill have you gotten a chance to work on the CPP server PR?

So sorry @soiferj! It's the time of year when kids bring home sickness.
Anyway, I created a PR here: #4857

jmorrill and others added 20 commits February 17, 2020 14:19
@tqchen tqchen closed this Oct 11, 2020
@tqchen
Member

tqchen commented Oct 11, 2020

This PR is superseded by another PR that adds RPC server support to the mainline. Thanks @jmorrill for the very insightful investigations.

4 participants