In http://blog.defunct.ca/2011/07/22/moving-nova-compute-to-a-separate-instance/, I was able to successfully move nova-compute to a separate instance. The only problem here is that the nova-compute instance used nova-network running on the controller, which introduced a single point of failure in our environment. If the controller dropped offline, the gateway for virtual machines running on the compute node would be inaccessible, meaning instances would not be able to access the outside world until the controller came back online.
Fortunately, some improvements have been made to Nova as outlined in http://unchainyourbrain.com/openstack/13-networking-in-nova. Essentially, we can now run a nova-network on each compute node, which forces the compute node to be the gateway for instances running on it. This means there’s no longer that dependency between the controller (or whatever runs nova-network) and virtual machines running on the compute node.
To move to this configuration, I had to run the following on the compute nodes:
# apt-get install nova-network
I then had to add the following configurations to the /etc/nova/nova.conf file on the compute nodes:
--ec2_dmz_host=192.168.0.1
--multi_host
Specifying --ec2_dmz_host=192.168.0.1 causes this iptables rule to get added:
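As a quick sanity check, something like the following could confirm that both flags made it into the flag file on each compute node. This is only a sketch: has_flag and check_multi_host_flags are hypothetical helper names, and it assumes nova.conf lists one flag per line.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if FLAGFILE contains the given flag
# (with or without a value) on a line of its own.
has_flag() {
    local flagfile="$1" flag="$2"
    grep -Eq -- "^--${flag}(=.*)?[[:space:]]*$" "$flagfile"
}

# Prints the flags from this post that are missing from FLAGFILE and
# returns non-zero if any are missing. Defaults to /etc/nova/nova.conf.
check_multi_host_flags() {
    local flagfile="${1:-/etc/nova/nova.conf}" missing=0 flag
    for flag in ec2_dmz_host multi_host; do
        if ! has_flag "$flagfile" "$flag"; then
            echo "missing: --${flag}"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_multi_host_flags with no arguments on a compute node should print nothing and return 0 once both entries are in place.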
Chain nova-network-PREROUTING (1 references)
pkts bytes target prot opt in out source destination
0 0 DNAT tcp -- any any anywhere 169.254.169.254 tcp dpt:www to:192.168.0.1:8773
… and this allows cloud-init on the Ubuntu instances to grab whatever it is they’re grabbing from the EC2 API running on the controller. When the Ubuntu instances boot but can’t hit the EC2 API (I have 192.168.0.1 assigned to my controller, which runs the EC2 API), cloud-init seems to spin forever and the instances never really seem to boot. If you uninstall cloud-init, the instances will boot, but configuration does not appear to be complete (i.e. missing ssh keys in /etc/ssh/). I tried using my controller’s public IP or the controller’s 10.176.65.54 address, but neither seemed to work. The latter is understandable as the instance will not be able to hit 10.176.65.54 since it’s not attached to that network, but it was my understanding that it should be able to hit the external IP.
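To confirm the rule points where you expect, a small filter over the iptables listing can pull out the DNAT target for the 169.254.169.254 address. This is a rough sketch rather than anything Nova ships: dnat_target is a hypothetical helper, and it assumes the verbose (-v) column layout shown above, where the destination is the ninth column.

```shell
#!/bin/bash
# Hypothetical helper: reads `iptables -t nat -L <chain> -v` style output on
# stdin and prints the "to:" target of the first DNAT rule whose destination
# column matches the given address.
dnat_target() {
    local dest="$1"
    awk -v dest="$dest" '
        $3 == "DNAT" && $9 == dest {
            for (i = 10; i <= NF; i++)
                if ($i ~ /^to:/) { print substr($i, 4); exit }
        }'
}

# Usage on a compute node (as root):
#   iptables -t nat -L nova-network-PREROUTING -v | dnat_target 169.254.169.254
```

With the rule above in place, this should print the address and port given by --ec2_dmz_host, and print nothing if the rule is missing.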
Anyway, I also removed this from /etc/nova/nova.conf on the compute nodes as we no longer have to route through the controller:
--routing_source_ip=x.x.x.x
For good measure:
# /etc/init.d/nova-compute restart
# /etc/init.d/nova-network restart
Finally, I deleted my 192.168.0.0/24 on the controller and re-created it:
nova-manage network create --fixed_range_v4=192.168.0.0/24 --num_networks=1 --network_size=256 --multi_host=T --label=test
The key above is specifying --multi_host=T.
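As an aside, the --network_size=256 value lines up with the /24 range: a /24 covers 2^(32-24) = 256 addresses. A tiny helper (addresses_in_prefix is a made-up name, purely for illustration) makes the arithmetic explicit if you pick a different prefix:

```shell
#!/bin/bash
# Number of IPv4 addresses covered by a network with the given prefix length.
addresses_in_prefix() {
    local prefix="$1"
    echo $(( 1 << (32 - prefix) ))
}

# Example: addresses_in_prefix 24
```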
This was more or less it. Now when an instance is first started on a compute node, the compute node itself gets an IP assigned from the network above and that IP gets assigned to the bridge br100. The instances on the host are then configured to use that IP as their gateway and traffic no longer gets routed through the controller.
One thing I noticed while working on this configuration was that my previous VPN connection didn’t permit multiple clients. As such, I had to move my VPN server/clients to use tls-server and tls-client, which required a bit more work (see this for more info).
My openvpn.server file:
mode server
tls-server
dev tap
ifconfig 192.168.0.1 255.255.0.0
cert /etc/openvpn/controller.crt
key /etc/openvpn/controller.key
dh /usr/share/doc/openvpn/examples/easy-rsa/2.0/keys/dh1024.pem
ca /usr/share/doc/openvpn/examples/easy-rsa/2.0/keys/ca.crt
daemon
… and openvpn.client for compute1:
tls-client
remote 10.176.65.54
dev tap
cert /etc/openvpn/compute1.crt
key /etc/openvpn/compute1.key
ca /etc/openvpn/ca.crt
daemon
keepalive 10 60
up /etc/openvpn/openvpn.up
up-restart
script-security 2
The /etc/openvpn/openvpn.up file contains:
#!/bin/bash
/sbin/ifconfig tap0 0.0.0.0 up
/usr/sbin/brctl addif br100 tap0
echo 0
Unlike our original configuration, br100 is assigned its IP automatically by nova-network, so we no longer need to set an IP when openvpn starts on the clients. However, if the controller node (which also runs the openvpn server) restarts, our clients cannot ping the 192.168.0.1 address even after the server comes back online. By adding the keepalive and up/up-restart entries to the openvpn.client file, we can force openvpn to restart and re-run the up script if the connection drops (or the server reboots).
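After a restart, one way to check that the up script actually re-attached tap0 to br100 is to look at the bridge's port list in sysfs. The sketch below assumes the usual /sys/class/net/<bridge>/brif/<port> layout; bridge_has_port is a hypothetical helper, and its optional third argument exists only so the check can be tried against a scratch directory.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if PORT is attached to BRIDGE, judged by the
# presence of <sysroot>/<bridge>/brif/<port>. The sysfs root defaults to
# /sys/class/net and can be overridden with a third argument for dry runs.
bridge_has_port() {
    local bridge="$1" port="$2" sysroot="${3:-/sys/class/net}"
    [ -e "${sysroot}/${bridge}/brif/${port}" ]
}

# Usage on a compute node:
#   bridge_has_port br100 tap0 && echo "tap0 is on br100"
```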
There’s still a bit of magic happening here, but hopefully I’ve captured enough of this configuration to reconstruct this setup if necessary.