Skip to content
This repository has been archived by the owner on Sep 26, 2021. It is now read-only.

New Driver: Vultr #958

Closed
wants to merge 1 commit into from
Closed

New Driver: Vultr #958

wants to merge 1 commit into from

Conversation

janeczku
Copy link
Contributor

@janeczku janeczku commented Apr 4, 2015

Vultr driver updated to the new driver specs. Replaces PR #538 by @JamesClonk.

  • Conforms with new driver specs
  • Added flags for ipv6, automatic backups and private networking
  • Added pre-create validations for regionID, planID and API credentials
  • Added bats integration tests
  • Added unit tests
  • Added driver documentation
  • Added client request throttling to avoid hitting API rate-limit

bats-test-result

Signed-off-by: Jan Broer [email protected]

@hairyhenderson
Copy link
Contributor

@janeczku - I'd like to try this out, can you rebase it please? :)

@janeczku
Copy link
Contributor Author

@hairyhenderson Can do - probably tomorrow.

@janeczku
Copy link
Contributor Author

Rebased the PR. Please test to break :) @hairyhenderson

@hairyhenderson
Copy link
Contributor

Thanks @janeczku!

MachineID string
MachineName string
PublicIP string
PrivateIP string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an IPAddress property too? It should be the same as PublicIP, or PrivateIP if PrivateNetworking is true. (see #1043)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood PrivateNetworking :)... So, can you either add IPAddress which is in sync with PublicIP, or can you rename PublicIP to IPAddress?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its best to just rename PublicIP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@hairyhenderson
Copy link
Contributor

Ok, so here's an issue, but I don't know if it's in this code or if has to do with some of the recent SSH changes:

$ machine create -d vultr --vultr-api-key XXX dm-vultr
Creating SSH key...
Creating Vultr virtual machine...
Error dialing TCP: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
Error dialing TCP: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
Error dialing TCP: ssh: handshake failed: ssh: unable to authenticate, attempted methods [publickey none], no supported methods remain
Error dialing TCP: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
Error dialing TCP: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
... (repeat a billion times)
To see how to connect Docker to this machine, run: machine env dm-vultr
$

It appears that the resulting VM is working fine, so it might be nothing.

/cc @nathanleclaire @ehazlett @sthulb - I vaguely recall something recently that might be related?

}

if d.PrivateIP == "" || d.PrivateIP == "0" {
d.PrivateIP = "N/A"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this is N/A instead of empty string? The convention in other drivers is that it gets set to ""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes indeed, it should be an empty string if the IP is not set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hairyhenderson i've seen this when you use an invalid username.

EDIT: sorry this was meant for the Error dialing TCP: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain error.

@hairyhenderson
Copy link
Contributor

@janeczku - looks like the start command is asynchronous. Can you change Start() (and probably Stop() too) so that it waits until the machine is reachable before returning? I think most other drivers behave this way.

$ machine -D stop dm-vultr
command=stop machine=dm-vultr
$ machine -D start dm-vultr
command=start machine=dm-vultr
$ machine ls
NAME         ACTIVE   DRIVER         STATE     URL                        SWARM
dm-vultr     *        vultr          Running   tcp://45.63.7.71:2376      
$ machine -D ssh dm-vultr
dial tcp 45.63.7.71:22: connection refused
$ machine -D ssh dm-vultr
dial tcp 45.63.7.71:22: connection refused
...
$ machine -D ssh dm-vultr
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-52-generic x86_64)
...

cli.BoolFlag{
EnvVar: "VULTR_IPV6",
Name: "vultr-ipv6",
Usage: "enable ipv6 for droplet",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Usage: "Enable IPv6 for VPS",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@hairyhenderson
Copy link
Contributor

@janeczku - overall, this is looking pretty good. Might be nice too to add a bit more info-level logging, especially when long operations (during create for example) are happening.

@janeczku
Copy link
Contributor Author

@hairyhenderson thanks for your valuable observations!

looks like the start command is asynchronous. Can you change Start() (and probably Stop() too) so that it waits until the machine is reachable before returning? I think most other drivers behave this way.

They should :) But at least the DO driver respectively its API package has asynchronous start/stop commands as well. I will implement this in the Vultr driver and since they share most code we can then easily add this change to the DO driver as well.
The drivers start/stop commands are in fact synchronous. The API request doesn't return until the machine is powered-on. Of course SSH is not available until the machine is fully booted (couple of secs). I suspect this is a limitation for the other drivers as well (They will report success as soon as the machine is powered-on and not when its fully booted).

Ok, so here's an issue, but I don't know if it's in this code or if has to do with some of the recent SSH changes: Error dialing TCP: ssh: handshake failed ....

This error occurs when the SSHKey has not yet been provisioned to the new instance (but the SSHd is already running). So this is expected behavior. The provisioning does succeed after all.

Might be nice too to add a bit more info-level logging, especially when long operations (during create for example) are happening.

Can do. See also PR #1216 which would make this unnecessary.

@janeczku
Copy link
Contributor Author

Merged reviewers suggestions, rebased again and squashed.

@hairyhenderson
Copy link
Contributor

@janeczku looks like you'll need to rebase again

@janeczku
Copy link
Contributor Author

janeczku commented Jun 6, 2015

Hey @hairyhenderson, would you take a final look at the patch and (hopefully) give your OK? I rebased again and added a tiny bit of code making RancherOS the default OS for new instances.

/cc @ibuildthecloud @imikushin
Test builds / Free trial for VULTR

}
return nil
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hairyhenderson FYI, these two functions have been added after you last took a look (to allow booting RancherOS).

@hairyhenderson
Copy link
Contributor

Hey @hairyhenderson, would you take a final look at the patch and (hopefully) give your OK? I rebased again and added a tiny bit of code making RancherOS the default OS for new instances.

Yep! I'll have a look. Might need to wait until Monday or Tuesday, but I'll get to this :)

Test builds / Free trial for VULTR

Thanks for the build! And, too bad the free credit is only for new users ;)

@janeczku
Copy link
Contributor Author

janeczku commented Jun 6, 2015

@hairyhenderson Thank you!

// RancherOS - Create iPXE boot script
func (d *Driver) createBootScript() error {
content := `#!ipxe
set base-url https://releases.rancher.com/os/latest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would https: work w/ iPXE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imikushin Not in every environment. HTTPS support has to be specifically enabled when compiling the iPXE firmware. So for Vultr HTTPS works, for other iPXE environments one might need to use plain HTTP.

@janeczku
Copy link
Contributor Author

ping @hairyhenderson 😏

@hairyhenderson
Copy link
Contributor

@janeczku - hey, sorry, I've been super-busy with a new job and a new baby (born yesterday!) so it might be a little while before I can take a good look at this... In the meantime, looks like there's some merge conflicts again? I'll try to take a look in the next few days :)

@janeczku
Copy link
Contributor Author

hey, sorry, I've been super-busy with a new job and a new baby (born yesterday!)

@hairyhenderson, thanks for the feedback and congrats!

EnvVar: "VULTR_OS",
Name: "vultr-os-id",
Usage: "Vultr operating system ID. Default: 159 (RancherOS)",
Value: 159,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... While RancherOS is definitely cool, and is considered "supported" from docker-machine's point of view, the default for remote drivers (like Vultr) is Ubuntu (see https://github.com/docker/machine/blob/master/docs/index.md#specify-a-base-operating-systems). So IMO you should change the default to Ubuntu (probably 14.04 - i.e. 160), just to keep in line with what everyone expects and assumes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I disagree, but...
RancherOS can be configured to look like Ubuntu :)
It's just a matter of what console container to use. Currently, RancherOS
does use ubuntu-console by default in cloud deployments.

Just my 2 cents.

On Tue, Jun 23, 2015, 23:41 Dave Henderson [email protected] wrote:

In drivers/vultr/vultr.go
#958 (comment):

  • })
    +}

+// GetCreateFlags registers the flags this driver adds to "docker hosts create"
+func GetCreateFlags() []cli.Flag {

  • return []cli.Flag{
  •   cli.StringFlag{
    
  •       EnvVar: "VULTR_API_KEY",
    
  •       Name:   "vultr-api-key",
    
  •       Usage:  "Vultr API key",
    
  •   },
    
  •   cli.IntFlag{
    
  •       EnvVar: "VULTR_OS",
    
  •       Name:   "vultr-os-id",
    
  •       Usage:  "Vultr operating system ID. Default: 159 (RancherOS)",
    
  •       Value:  159,
    

So... While RancherOS is definitely cool, and is considered "supported"
from docker-machine's point of view, the default for remote drivers (like
Vultr) is Ubuntu (see
https://github.com/docker/machine/blob/master/docs/index.md#specify-a-base-operating-systems).
So IMO you should change the default to Ubuntu (probably 14.04 - i.e. 160),
just to keep in line with what everyone expects and assumes.


Reply to this email directly or view it on GitHub
https://github.com/docker/machine/pull/958/files#r33076225.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, sure. But even though it might look like Ubuntu, it's not Ubuntu ;) I really think all of the remote drivers need to have the same default image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hairyhenderson I have changed the default OS back to Ubuntu. Since the iPXE deployment of RancherOS is the better choice for Vultr (deploying Ubuntu takes up to 5 minutes) i have added instructions to the docs concerning the possibility to use RancherOS as base OS.

/cc @imikushin

@hairyhenderson
Copy link
Contributor

@janeczku - so far this looks pretty good! Just the one thing about the default for --vultr-os-id, then IMO this is ready to merge :)

/cc @ehazlett @nathanleclaire

@janeczku
Copy link
Contributor Author

janeczku commented Jul 5, 2015

@ehazlett @nathanleclaire you wanna chime in? i'd be happy to finally see this merged :)

@djmaze
Copy link

djmaze commented Jul 18, 2015

Thanks for your work @janeczku! I just tried out this fork with my Vultr account. It does not matter if I choose RancherOS or Ubuntu, docker-machine create always fails like this:

Validating Vultr VPS parameters...
Creating Vultr VPS...
Waiting for IP address to become available...
Error creating machine: json: cannot unmarshal string into Go value of type float64
You will want to check the provider to make sure the machine and associated resources were properly removed.

Looking at my Vultr console, the servers have been created successfully. Also, I am able to login using the SSH keys created by docker-machine (in ~/.docker/machine/...).

Any idea what's going wrong here? How to debug this?

@janeczku
Copy link
Contributor Author

@djmaze thanks for your feedback. This is caused by recent changes to Vultr's API protocol. It is already fixed in the Vultr API client and i will push it to the machine driver later today.

@janeczku
Copy link
Contributor Author

@djmaze this is now fixed. You can try the dev builds available here: https://github.com/janeczku/machine/releases/tag/v0.4.0-dev
Please report back if you find any other issues!

@djmaze
Copy link

djmaze commented Jul 19, 2015

@janeczku Great, it works now. Thanks for the update!

@djmaze
Copy link

djmaze commented Jul 19, 2015

After creating multiple machines on Vulr, docker-machine ls gives the following:

error getting state for host docker3: Rate limit reached - please try your request again later.  Current rate limit: 1 request/sec

It seems rate limiting does not work as needed.

@janeczku
Copy link
Contributor Author

@djmaze Thanks for your feedback. The rate limit error is due to the ls command querying the state of the hosts asynchronously (dispatching a goroutine for every host). The driver has no way to throttle these concurrent queries. This has to be addressed upstream in the libmachine code: https://github.com/docker/machine/blob/master/libmachine/host.go#L381

@djmaze
Copy link

djmaze commented Jul 20, 2015

Excuse my limited Go knowledge. The tokenbucket library uses a mutex. Shouldn't this work across goroutines already?

Ah I see, there is one driver instance per machine, so one bucket per host. That's unfortunate ;/

@janeczku
Copy link
Contributor Author

@djmaze You got that right. Vultr really needs to change their 1/sec rate limit policy to something more reasonable like x/minute or x/hour.

@djmaze
Copy link

djmaze commented Jul 20, 2015

So forgive me if this is stupid, but I just made the token bucket global:

diff --git a/drivers/vultr/vultr.go b/drivers/vultr/vultr.go
index 11249b7..98a3b04 100644
--- a/drivers/vultr/vultr.go
+++ b/drivers/vultr/vultr.go
@@ -96,10 +96,12 @@ func GetCreateFlags() []cli.Flag {
        }
 }

+var vultrTokenBucket = tokenbucket.NewBucket(1*time.Second, 1)
+
 func NewDriver(machineName string, storePath string, caCert string, privateKey string) (drivers.Driver, error) {
        d := &Driver{MachineName: machineName, storePath: storePath, CaCertPath: caCert, PrivateKeyPath: privateKey, bucket: nil}
        // throttle API queries according to 1/s rate-limit
-       d.bucket = tokenbucket.NewBucket(1*time.Second, 1)
+       d.bucket = vultrTokenBucket
        return d, nil
 }

This makes the ls command work for me. The tockenbucket code itself should be safe from parallel use in goroutines because it uses mutex and channels, right?

Of course we can do better and initialize the bucket on first use instead (using a mutex). Apart from that, is there anything wrong with this change?

@janeczku
Copy link
Contributor Author

Thats interesting! I would not have expected a single tokenbucket to work across multiple goroutines. It was my believe that when you have multiple receivers (<-d.bucket.SpendToken(1)) on the same channel, they would all unblock once there is a single transmit on that channel and therefore breaking the throttling mechanism. It turns out thats not the case (at least with unbuffered channels). There is only one active receiver at a time, the other ones are "waiting in line").
Again, great find 👍

@janeczku
Copy link
Contributor Author

Confirmed working 1req/s API query throttling.

Janek$ ./docker-machine_darwin-amd64 -D ls
GetState() timestamp: 2015-07-21 00:45:38.064903604 +0200 CEST
GetState() timestamp: 2015-07-21 00:45:39.06484465 +0200 CEST
GetState() timestamp: 2015-07-21 00:45:40.064840547 +0200 CEST
GetState() timestamp: 2015-07-21 00:45:41.06476442 +0200 CEST

Co-Authored-By: Fabio Berchtold <[email protected]>
Signed-off-by: Jan Broer <[email protected]>
@djmaze
Copy link

djmaze commented Jul 21, 2015

Unfortunately, now I am not able to create machines based on RancherOS with the Vultr driver. This is the output when using --vulr-os-id=159:

Validating Vultr VPS parameters...
Creating Vultr VPS...
Deploying RancherOS
Error creating machine: json: cannot unmarshal number into Go value of type string

With the default Ubuntu OS the installation works.

It seems the API has changed again and Vultr is now returning SCRIPTID as a number instead of a string.

@djmaze
Copy link

djmaze commented Jul 21, 2015

Got it working again by making SCRIPTID an int and adjusting the rest of the code accordingly. So again, this has to be fixed upstream first.

@janeczku
Copy link
Contributor Author

@djmaze Pushed fix to the Vultr API client repo. It was not enough to change the SCRIPTID field to integer since the API continues to return this field as a string in other calls. See here: https://github.com/JamesClonk/vultr/pull/7/files

@ehazlett
Copy link
Contributor

Hi , thanks for your efforts and persistence in submitting this driver. We are extremely excited that there is so much interest in Docker Machine and we really appreciate your interest. However, at this time it is proving to be extremely difficult for us to keep up with reviewing and testing each of these drivers for inclusion in the Machine core. We really want to switch to a more pluggable model, as well as polish up a few things about the driver model which need to be changed to ensure a smooth and sustainable future.

Therefore, we will be moving to a plugin model for 0.5 and would love to have you involved in the design and development process. We are closing the outstanding driver PRs at this time, but please keep the code. We will stick closely to the current driver interface and you should be able to re-use a lot (if not all) of the existing driver along with the new plugin model. We will be moving all of the drivers which are merged directly into Machine today to the plugin model when it is available, so there will be no special treatment of those, and there will be documentation outlining the process of developing and using a Docker Machine driver plugin.

With all of that being said, we want to apologize for the lack of feedback on your pull request. As contributors ourselves, we understand that being left in limbo is no fun. We would have liked to address this sooner, and in the future we will be more responsive around these kinds of issues.

Once again, we thank you for the contribution and the tremendous support. Keep hacking strong!

If you want to contribute to the design of the plugin model, we'd love to get your input on this issue where we will be planning it:

#1626

@ehazlett & @nathanleclaire

@ehazlett ehazlett closed this Jul 31, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants