pip uninstall broken on Debian Squeeze

May 9th, 2012

I noticed that I was unable to uninstall python packages via pip (0.7.2-1) on a Debian Squeeze instance:

root@diamondbuilder:~# pip freeze | grep swift
root@diamondbuilder:~# pip install swift
Downloading/unpacking swift
  Downloading swift-1.4.8.tar.gz (421Kb): 421Kb downloaded
  Running setup.py egg_info for package swift
Installing collected packages: swift
  Running setup.py install for swift
    changing mode of build/scripts-2.6/swift from 644 to 755
    changing mode of build/scripts-2.6/swift-account-audit from 644 to 755
    changing mode of build/scripts-2.6/swift-account-auditor from 644 to 755
    changing mode of build/scripts-2.6/swift-account-reaper from 644 to 755
    changing mode of build/scripts-2.6/swift-account-replicator from 644 to 755
    changing mode of build/scripts-2.6/swift-account-server from 644 to 755
    changing mode of build/scripts-2.6/swift-bench from 644 to 755
    changing mode of build/scripts-2.6/swift-container-auditor from 644 to 755
    changing mode of build/scripts-2.6/swift-container-replicator from 644 to 755
    changing mode of build/scripts-2.6/swift-container-server from 644 to 755
    changing mode of build/scripts-2.6/swift-container-sync from 644 to 755
    changing mode of build/scripts-2.6/swift-container-updater from 644 to 755
    changing mode of build/scripts-2.6/swift-dispersion-populate from 644 to 755
    changing mode of build/scripts-2.6/swift-dispersion-report from 644 to 755
    changing mode of build/scripts-2.6/swift-drive-audit from 644 to 755
    changing mode of build/scripts-2.6/swift-form-signature from 644 to 755
    changing mode of build/scripts-2.6/swift-get-nodes from 644 to 755
    changing mode of build/scripts-2.6/swift-init from 644 to 755
    changing mode of build/scripts-2.6/swift-object-auditor from 644 to 755
    changing mode of build/scripts-2.6/swift-object-expirer from 644 to 755
    changing mode of build/scripts-2.6/swift-object-info from 644 to 755
    changing mode of build/scripts-2.6/swift-object-replicator from 644 to 755
    changing mode of build/scripts-2.6/swift-object-server from 644 to 755
    changing mode of build/scripts-2.6/swift-object-updater from 644 to 755
    changing mode of build/scripts-2.6/swift-oldies from 644 to 755
    changing mode of build/scripts-2.6/swift-orphans from 644 to 755
    changing mode of build/scripts-2.6/swift-proxy-server from 644 to 755
    changing mode of build/scripts-2.6/swift-recon from 644 to 755
    changing mode of build/scripts-2.6/swift-recon-cron from 644 to 755
    changing mode of build/scripts-2.6/swift-ring-builder from 644 to 755
    changing mode of build/scripts-2.6/swift-temp-url from 644 to 755
    changing mode of /usr/local/bin/swift-account-audit to 755
    changing mode of /usr/local/bin/swift-object-expirer to 755
    changing mode of /usr/local/bin/swift-proxy-server to 755
    changing mode of /usr/local/bin/swift-container-replicator to 755
    changing mode of /usr/local/bin/swift-container-sync to 755
    changing mode of /usr/local/bin/swift-orphans to 755
    changing mode of /usr/local/bin/swift-get-nodes to 755
    changing mode of /usr/local/bin/swift-drive-audit to 755
    changing mode of /usr/local/bin/swift-dispersion-populate to 755
    changing mode of /usr/local/bin/swift-account-reaper to 755
    changing mode of /usr/local/bin/swift-object-replicator to 755
    changing mode of /usr/local/bin/swift-init to 755
    changing mode of /usr/local/bin/swift-dispersion-report to 755
    changing mode of /usr/local/bin/swift-oldies to 755
    changing mode of /usr/local/bin/swift-ring-builder to 755
    changing mode of /usr/local/bin/swift-form-signature to 755
    changing mode of /usr/local/bin/swift-container-server to 755
    changing mode of /usr/local/bin/swift-container-updater to 755
    changing mode of /usr/local/bin/swift-bench to 755
    changing mode of /usr/local/bin/swift-object-info to 755
    changing mode of /usr/local/bin/swift-recon to 755
    changing mode of /usr/local/bin/swift to 755
    changing mode of /usr/local/bin/swift-object-server to 755
    changing mode of /usr/local/bin/swift-object-updater to 755
    changing mode of /usr/local/bin/swift-container-auditor to 755
    changing mode of /usr/local/bin/swift-account-auditor to 755
    changing mode of /usr/local/bin/swift-account-server to 755
    changing mode of /usr/local/bin/swift-recon-cron to 755
    changing mode of /usr/local/bin/swift-object-auditor to 755
    changing mode of /usr/local/bin/swift-account-replicator to 755
    changing mode of /usr/local/bin/swift-temp-url to 755
Successfully installed swift
Cleaning up...
root@diamondbuilder:~# pip freeze | grep swift
swift==1.4.8
root@diamondbuilder:~# pip uninstall swift
Uninstalling swift:
Proceed (y/n)? y
  Successfully uninstalled swift
root@diamondbuilder:~# pip freeze | grep swift
swift==1.4.8
root@diamondbuilder:~#

Fortunately, I found this and this, which indicate the issue lies w/ Debian’s python-setuptools (0.6.14-4). Using that info, I renamed the egg-info directory to include the Python version and tried again:

root@diamondbuilder:~# cd /usr/local/lib/python2.6/dist-packages
root@diamondbuilder:/usr/local/lib/python2.6/dist-packages# mv swift-1.4.8.egg-info/ swift-1.4.8-py2.6.egg-info/
root@diamondbuilder:/usr/local/lib/python2.6/dist-packages# cd -
/root
root@diamondbuilder:~# pip uninstall swift
Uninstalling swift:
  /usr/local/bin/swift
  /usr/local/bin/swift-account-audit
  /usr/local/bin/swift-account-auditor
  /usr/local/bin/swift-account-reaper
  /usr/local/bin/swift-account-replicator
  /usr/local/bin/swift-account-server
  /usr/local/bin/swift-bench
  /usr/local/bin/swift-container-auditor
  /usr/local/bin/swift-container-replicator
  /usr/local/bin/swift-container-server
  /usr/local/bin/swift-container-sync
  /usr/local/bin/swift-container-updater
  /usr/local/bin/swift-dispersion-populate
  /usr/local/bin/swift-dispersion-report
  /usr/local/bin/swift-drive-audit
  /usr/local/bin/swift-form-signature
  /usr/local/bin/swift-get-nodes
  /usr/local/bin/swift-init
  /usr/local/bin/swift-object-auditor
  /usr/local/bin/swift-object-expirer
  /usr/local/bin/swift-object-info
  /usr/local/bin/swift-object-replicator
  /usr/local/bin/swift-object-server
  /usr/local/bin/swift-object-updater
  /usr/local/bin/swift-oldies
  /usr/local/bin/swift-orphans
  /usr/local/bin/swift-proxy-server
  /usr/local/bin/swift-recon
  /usr/local/bin/swift-recon-cron
  /usr/local/bin/swift-ring-builder
  /usr/local/bin/swift-temp-url
  /usr/local/lib/python2.6/dist-packages/swift
  /usr/local/lib/python2.6/dist-packages/swift-1.4.8-py2.6.egg-info
  /usr/local/lib/python2.6/dist-packages/test
Proceed (y/n)? y
  Successfully uninstalled swift
root@diamondbuilder:~# pip freeze | grep swift
root@diamondbuilder:~#

I suppose I could have just cycled through /usr/local/lib/python2.6/dist-packages/swift-1.4.8.egg-info/installed-files.txt, removing files based off that, but I’m not familiar enough w/ python / pip to know if anything further would have been required.
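For what it’s worth, here is a rough Python sketch of that manual approach. It assumes each line of installed-files.txt is a path relative to the egg-info directory (that is my understanding of how setuptools records them, not something I verified on this box). The function name is made up, and it only lists what would be removed rather than deleting anything.

```python
import os


def planned_removals(egg_info_dir):
    """Return the absolute paths listed in installed-files.txt (dry run only)."""
    record = os.path.join(egg_info_dir, "installed-files.txt")
    paths = []
    with open(record) as handle:
        for line in handle:
            entry = line.strip()
            if not entry:
                continue
            # Assumption: entries are relative to the egg-info directory.
            paths.append(os.path.normpath(os.path.join(egg_info_dir, entry)))
    return paths
```

Calling planned_removals() on the swift-1.4.8.egg-info directory and reviewing the list before removing anything would be the cautious way to use it.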

Nagios servicedependency

May 9th, 2012

I’ve got a series of nodes which have 3 SNMP-based checks configured on them. I wanted 2 of these checks to depend on 1, so that we only get a single alert if snmpd goes down or if the node drops offline. To do this, I created a servicedependency like so:

define servicedependency{
        host_name                       nodeX
        service_description             disk_usage
        dependent_service_description   load_avg_5m, swap_usage
        execution_failure_criteria      n
        notification_failure_criteria   u
}

As you can see, load_avg_5m and swap_usage depend on disk_usage.

With this configuration in place, I noticed that I was sometimes getting a notification for load_avg_5m, swap_usage and then disk_usage, since this was the order that these services were being checked. Unfortunately, I couldn’t find a way to configure the order in which the services were checked, but fortunately did find the following here:

“One important thing to note is that by default, Nagios will use the most current hard state of the service(s) that is/are being depended upon when it does the dependency checks. If you want Nagios to use the most current state of the services (regardless of whether its a soft or hard state), enable the soft_state_dependencies option.”

So, in my case, the current HARD state of disk_usage wasn’t UNKNOWN when the other checks failed, and therefore these services failed first and sent notifications. I’ve since set soft_state_dependencies=1 in /etc/nagios3/nagios.cfg, and hope that this helps in reducing the number of unnecessary notifications I get when there’s a node or snmpd outage.
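To make the difference concrete, here is a toy Python model of the decision as I understand it from the quote above. It is not Nagios code; the function name and the single-letter state codes (o, w, u, c) are just my own shorthand.

```python
def dependency_blocks_notification(master_hard_state, master_current_state,
                                   failure_criteria, soft_state_dependencies):
    """Toy model: True means the dependent service's notification is suppressed."""
    # With soft_state_dependencies enabled, the latest state (soft or hard)
    # of the master service is used; otherwise only its last hard state.
    if soft_state_dependencies:
        state = master_current_state
    else:
        state = master_hard_state
    return state in failure_criteria
```

With disk_usage still in a hard OK state but already soft UNKNOWN, the model only suppresses the load_avg_5m and swap_usage notifications when soft_state_dependencies is on, which matches what I was seeing.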

Fast and dirty install of Graphite on Debian Squeeze

January 23rd, 2012

Please note that this is not a production-ready installation document! As the title suggests, this is a fast and dirty installation of Graphite for testing, and assumes you’re installing on a new virtual machine dedicated to Graphite. This will break stuff if you run on an existing server!

These installation instructions are basically the steps from http://graphite.wikidot.com/installation, with a few minor adjustments to work on Debian Squeeze.

First things first:

# cd /root
# apt-get update
# apt-get install bzr

Once bzr’s installed, we can:

# bzr branch lp:graphite

Install Whisper:

# cd graphite/whisper
# python setup.py install

At the time of writing, Whisper can be installed from apt-get on Squeeze, but the version doesn’t match what we’ve pulled from Launchpad.

Install Carbon:

# cd ../carbon
# python setup.py install

Now we copy some sample configurations into place:

# cd /opt/graphite/conf
# cp carbon.conf.example carbon.conf
# cp storage-schemas.conf.example storage-schemas.conf

Graphite depends on a number of other packages, and I’ve made every attempt to grab stuff from stock apt repos rather than building (unnecessarily) from source. To see what’s missing, run:

# cd /root/graphite
# python check-dependencies.py

This should return something like this:

# python check-dependencies.py
[FATAL] Unable to import the 'cairo' module, do you have pycairo installed for python 2.6.6?
[FATAL] Unable to import the 'django' module, do you have Django installed for python 2.6.6?
[FATAL] Unable to import the 'tagging' module, do you have django-tagging installed for python 2.6.6?
[WARNING] Unable to import Interface from zope.interface.
Without it, you will be unable to run carbon on this server.
[WARNING] Unable to import the 'mod_python' module, do you have mod_python installed for python 2.6.6?
mod_python is one of the most common ways to run graphite-web under apache.
Without mod_python you will still be able to use the built in development server; which is not
recommended for production use.
wsgi or other approaches for production scale use are also possible without mod_python
[WARNING]
Unable to import the 'memcache' module, do you have python-memcached installed for python 2.6.6?
This feature is not required but greatly improves performance.
 
[WARNING]
Unable to import the 'ldap' module, do you have python-ldap installed for python 2.6.6?
Without python-ldap, you will not be able to use LDAP authentication in the graphite webapp.
 
[WARNING]
Unable to import the 'twisted' package, do you have Twisted installed for python 2.6.6?
Without Twisted, you cannot run carbon on this server.
[WARNING]
Unable to import the 'txamqp' module, this is required if you want to use AMQP.
Note that txamqp requires python 2.5 or greater.
3 necessary dependencies not met. Graphite will not function until these dependencies are fulfilled.
6 optional dependencies not met. Please consider the warning messages before proceeding.

Now, to get this stuff installed:

# apt-get install python-cairo
# apt-get install python-django-tagging
# apt-get install python-twisted
# apt-get install python-memcache
# apt-get install libapache2-mod-wsgi

I don’t know enough about mod_python (which we’re supposed to install), but the sample vhost configuration below refers to mod_wsgi, so I installed that instead.
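If you want a quick sanity check after installing the packages, something like the following Python sketch reports which modules still fail to import. It is only an illustration of the idea, not how check-dependencies.py itself is written, and the function name is hypothetical.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

For example, missing_modules(["cairo", "django", "tagging", "twisted", "memcache"]) should come back empty once the packages above are in place.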

Now we configure Apache and modify the sample vhost configuration file provided:

# rm /etc/apache2/sites-enabled/000-default
# cp -a examples/example-graphite-vhost.conf /etc/apache2/sites-enabled/graphite
# cp -a conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi
# sed -i s%"@DJANGO_ROOT@/contrib/admin/media/"%"/usr/share/pyshared/django/contrib/admin/media/"% /etc/apache2/sites-enabled/graphite
# sed -i 's%WSGISocketPrefix /etc/httpd/wsgi/%WSGISocketPrefix /var/run/apache2/wsgi%' /etc/apache2/sites-enabled/graphite

Once done, go ahead and reload Apache:

# /etc/init.d/apache2 reload

Finally:

# cd /opt/graphite/webapp/graphite
# python manage.py syncdb
# chown -R www-data:www-data /opt/graphite/storage/
# cd /opt/graphite/
# ./bin/carbon-cache.py start

If that’s all worked, you should be able to pump data into Graphite:

# echo "carbon.installation.test $RANDOM `date +%s`" | nc -w 1 localhost 2003

On the above, if I don’t pass -w 1 to nc, nc just sits there doing nothing.
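The same test can be sketched in Python. Carbon's plaintext listener on port 2003 takes one "path value timestamp" line per metric; the helper names below are my own. Closing the socket explicitly after sending sidesteps the hanging I saw with nc.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: path, value and timestamp, newline-terminated."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def send_metric(path, value, host="localhost", port=2003, timeout=1.0):
    """Send a single metric to carbon-cache and close the connection."""
    line = format_metric(path, value)
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()
```

Calling send_metric("carbon.installation.test", 42) on the Graphite box should have the same effect as the echo/nc one-liner.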

Now, browsing your server’s http://x.x.x.x/ should load up the Graphite app and hopefully you can see a graph for the data you’ve sent in.

Using nova-network’s multi_host to remove SPOF

September 19th, 2011

In http://blog.defunct.ca/2011/07/22/moving-nova-compute-to-a-separate-instance/, I was able to successfully move nova-compute to a separate instance. The only problem here is that the nova-compute instance used nova-network running on the controller, which introduced a single point of failure in our environment. If the controller dropped offline, the gateway for virtual machines running on the compute node would be inaccessible, meaning instances would not be able to access the outside world until the controller came back online.

Fortunately, some improvements have been made to Nova as outlined in http://unchainyourbrain.com/openstack/13-networking-in-nova. Essentially, we can now run a nova-network on each compute node, which forces the compute node to be the gateway for instances running on it. This means there’s no longer that dependency between the controller (or whatever runs nova-network) and virtual machines running on the compute node.

To move to this configuration, I had to run the following on the compute nodes:

# apt-get install nova-network

I then had to add the following configurations to the /etc/nova/nova.conf file on the compute nodes:

--ec2_dmz_host=192.168.0.1
--multi_host

Specifying --ec2_dmz_host=192.168.0.1 causes this iptables rule to get added:

Chain nova-network-PREROUTING (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DNAT       tcp  --  any    any     anywhere             169.254.169.254     tcp dpt:www to:192.168.0.1:8773

… and this allows cloud-init on the Ubuntu instances to grab whatever it is they’re grabbing from the EC2 API running on the controller. When the Ubuntu instances boot but can’t hit the EC2 API (I have 192.168.0.1 assigned to my controller, which runs the EC2 API), cloud-init seems to spin forever and the instances never really seem to boot. If you uninstall cloud-init, the instances will boot, but configuration does not appear to be complete (i.e. missing ssh keys in /etc/ssh/). I tried using my controller’s public IP or the controller’s 10.176.65.54 address, but neither seemed to work. The latter is understandable as the instance will not be able to hit 10.176.65.54 since it’s not attached to that network, but it was my understanding that it should be able to hit the external IP.

Anyway, I also removed this from /etc/nova/nova.conf on the compute nodes as we no longer have to route through the controller:

--routing_source_ip=x.x.x.x

For good measure:

# /etc/init.d/nova-compute restart
# /etc/init.d/nova-network restart

Finally, I deleted my 192.168.0.0/24 on the controller and re-created it:

nova-manage network create --fixed_range_v4=192.168.0.0/24 --num_networks=1 --network_size=256 --multi_host=T --label=test

The key above is specifying --multi_host=T.

This was more or less it. Now when an instance is first started on a compute node, the compute node itself gets an IP assigned from the network above and that IP gets assigned to the bridge br100. The instances on the host are then configured to use that IP as their gateway and traffic no longer gets routed through the controller.
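As a side note, the --network_size=256 passed to nova-manage lines up with the /24 range. A small Python check with the standard ipaddress module shows the arithmetic (the function name is just for illustration):

```python
import ipaddress


def network_size(fixed_range):
    """Return the number of addresses in a CIDR range such as 192.168.0.0/24."""
    return ipaddress.ip_network(fixed_range).num_addresses
```

network_size("192.168.0.0/24") gives 256, so a single network of that size covers the whole fixed range.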

One thing I noticed while working on this configuration was that my previous VPN connection didn’t permit multiple clients. As such, I had to move my VPN server/clients to use tls-server and tls-client, which required a bit more work (see this for more info).

My openvpn.server file:

mode server
tls-server
dev tap
ifconfig 192.168.0.1 255.255.0.0
cert /etc/openvpn/controller.crt
key /etc/openvpn/controller.key
dh /usr/share/doc/openvpn/examples/easy-rsa/2.0/keys/dh1024.pem
ca /usr/share/doc/openvpn/examples/easy-rsa/2.0/keys/ca.crt
daemon

… and openvpn.client for compute1:

tls-client
remote 10.176.65.54
dev tap
cert /etc/openvpn/compute1.crt
key /etc/openvpn/compute1.key
ca /etc/openvpn/ca.crt
daemon
keepalive 10 60
up /etc/openvpn/openvpn.up
up-restart
script-security 2

The /etc/openvpn/openvpn.up file contains:

#!/bin/bash
 
/sbin/ifconfig tap0 0.0.0.0 up
/usr/sbin/brctl addif br100 tap0
echo 0

Unlike our original configuration, br100 is IPd automatically by nova-network, so we no longer need to set an IP when openvpn starts on the clients. However, if the controller node (which also runs the openvpn server) restarts, our clients cannot ping the 192.168.0.1 address even after the server comes back online. By adding the keepalive and up/up-restart entries to the openvpn.client file, we can force openvpn to get HUPd if the connection drops (or the server reboots).

There’s still a bit of magic happening here, but hopefully I’ve captured enough of this configuration to reconstruct this setup if necessary.

Unable to console into Ubuntu 10.04 QEMU image

September 18th, 2011

… turns out the image was missing /etc/init/ttyS0.conf:

 
# ttyS0 - getty
#
# This service maintains a getty on ttyS0 from the point the system is
# started until it is shut down again.
 
start on stopped rc or RUNLEVEL=[2345]
stop on runlevel [!2345]
 
respawn
exec /sbin/getty -8 38400 ttyS0 vt102

Once I dropped that in there, I was able to console into the image.

Programmatically interfacing with novaclient

September 16th, 2011

Guessing most people are aware of this, but documenting for my own knowledge. :P

root@nova-cc:~# python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from novaclient.v1_0 import client
>>> client = client.Client(USERNAME, API_KEY,PROJECT_ID [, AUTH_URL])
>>> client.servers.list()
[<Server: Server 40>, <Server: Server 41>, <Server: Server 42>, <Server: Server 43>, <Server: Server 44>, <Server: Server 45>, <Server: Server 46>]
>>> for server in client.servers.list():
...     server.delete()
... 
>>> client.servers.list()
[]
>>>
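If you'd rather keep this in a script than in the interpreter, the loop can be wrapped in a small function. This is only a sketch: it takes an already-constructed client (built as in the session above) and relies on nothing beyond servers.list(), server.delete() and server.id; the function name is mine.

```python
def delete_all_servers(client):
    """Delete every server the client can list; return the ids that were deleted."""
    deleted = []
    for server in client.servers.list():
        server.delete()
        deleted.append(server.id)
    return deleted
```

Needless to say, this deletes everything the account can see, so point it at a test project only.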

Running openstack-dashboard

August 2nd, 2011

The latest version of openstack-dashboard requires Keystone, and as I understand it this isn’t supported by the version of Nova I’m running (2011.3~d2-0ubuntu0~ppa1~natty1). Fortunately, I found this, which outlines how to use an older version of openstack-dashboard that does work without Keystone.

# apt-get update
# apt-get install bzr
# cd /root
# bzr branch lp:openstack-dashboard
# cd openstack-dashboard/
# bzr revert -r 46
# cd local
# cp -a local_settings.py.example local_settings.py

You now need to configure local_settings.py with correct values for NOVA_DEFAULT_ENDPOINT, NOVA_DEFAULT_REGION, NOVA_ACCESS_KEY, NOVA_SECRET_KEY, NOVA_ADMIN_USER, NOVA_PROJECT. Most of these values can be ripped out of novarc on your cloud controller.
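To avoid copy/paste mistakes, the values can be pulled out of novarc with a few lines of Python. This is a minimal sketch that assumes novarc consists of simple export NAME="value" lines; it does no shell variable expansion, and which novarc variable maps to which NOVA_* setting is left for you to decide.

```python
import shlex


def parse_novarc(text):
    """Return a dict of the variables set by `export NAME=value` lines."""
    values = {}
    for line in text.splitlines():
        # shlex handles the quoting; comments=True skips "# ..." lines.
        tokens = shlex.split(line, comments=True)
        if len(tokens) < 2 or tokens[0] != "export":
            continue
        for token in tokens[1:]:
            name, sep, value = token.partition("=")
            if sep:
                values[name] = value
    return values
```

Feeding it the contents of novarc returns a plain dict that you can read the keys and URLs from.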

Continue on with the installation:

# apt-get install -y python-setuptools
# easy_install virtualenv
# python tools/install_venv.py
# tools/with_venv.sh dashboard/manage.py syncdb

When you run “dashboard/manage.py syncdb”, it’ll prompt you asking if you want to create a Django superuser (since none exist at this point). I answered yes, entering a username that matched the value of NOVA_ADMIN_USER. I initially tried creating a Django user with a different username, and upon logging into openstack-dashboard I failed to see my Nova project.

Lastly, go ahead and start up the server:

# tools/with_venv.sh dashboard/manage.py runserver 0.0.0.0:8000

At this point, you should be able to access your dashboard on http://x.x.x.x:8000, replacing x.x.x.x with your openstack-dashboard server's IP.

If you run into any issues, refer to this, which contains valid information for this particular version of the dashboard.

Again, running with this old version of the dashboard isn't ideal, and you certainly don't want to run it as root, but hopefully this will point you in the right direction if you struggle to get the latest version to work with Nova. My next task will be to get a version of the dashboard and Nova which work together installed and operational.

Moving nova-compute to a separate instance

July 22nd, 2011

I want to quickly document how I accomplished this. Again, I used virtual machines (running Ubuntu Natty), but used public cloud server instances rather than private virtual machines.

First things first. Here’s the eth1 (private network) addresses assigned to my cloud servers:

nova-cc (our Nova cloud controller node):

eth1: 10.176.65.54

nova-compute (our Nova compute node, which will run our instances, QEMU or UML):

eth1: 10.176.95.220

Similar to previous posts, I went ahead and used 192.168.0.0/16 for Nova network as I didn’t have public IPs, nor did I want to interfere with the 10.176.64.0/18 network which is already used by this cloud provider.

On nova-cc, we need to install mysqld. Nova defaults to using SQLite, which works great when everything is running off a single instance. However, now that we’ve got another instance that needs to talk to nova-cc, we need a SQL server that it can connect to.

# apt-get update
# apt-get install mysql-server

Once installed, edit /etc/mysql/my.cnf and change this from:

bind-address            = 127.0.0.1

to:

bind-address            = 10.176.65.54

Finally, restart mysqld:

# service mysql restart

Now hop into the mysql shell and create a database and user/password to connect with:

# mysql -u root
mysql> CREATE DATABASE nova;
mysql> GRANT ALL PRIVILEGES ON nova.* TO nova@10.176.65.54 IDENTIFIED BY 'somepasshere';
mysql> GRANT ALL PRIVILEGES ON nova.* TO nova@10.176.95.220 IDENTIFIED BY 'somepasshere';

On nova-compute, the only nova-related package you really need is nova-compute:

# apt-get -y install python-software-properties
# add-apt-repository ppa:nova-core/milestone
# apt-get update
# apt-get install nova-compute

On both nova-cc and nova-compute:

# cat >> /etc/nova/nova.conf << "EOF"
--sql_connection=mysql://nova:somepasshere@10.176.65.54/nova
--image_service=nova.image.glance.GlanceImageService
--glance_api_servers=10.176.65.54:9292
--rabbit_host=10.176.65.54
EOF
# for SERVICE in `ls -1 /etc/init.d/nova*`; do service $SERVICE restart; done
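A note on --sql_connection: as far as I know it is a SQLAlchemy-style URL of the form mysql://user:password@host/database, so it needs both the MySQL user and the password from the GRANT statements. A small Python helper (hypothetical name) builds it and percent-encodes the credentials in case the password contains awkward characters:

```python
from urllib.parse import quote


def sql_connection_url(user, password, host, database):
    """Build a mysql://user:password@host/database connection URL."""
    # Percent-encode the credentials so characters like '@' or '/' in a
    # password cannot be mistaken for URL delimiters.
    return "mysql://%s:%s@%s/%s" % (
        quote(user, safe=""), quote(password, safe=""), host, database)
```

For the values used in this post, sql_connection_url("nova", "somepasshere", "10.176.65.54", "nova") produces the string to put after --sql_connection=.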

Now what we need to do is create our 192.168.0.0/16 network. We’ll use OpenVPN for this, running it over the eth1 private network (10.176.64.0/18). Again, the idea here is to have a completely separate network which won’t interfere with what’s already out there.

On both:

# apt-get install openvpn

On nova-cc:

# cd /etc/openvpn
# openvpn --genkey --secret openvpn.key
# scp openvpn.key root@10.176.95.220:/etc/openvpn
# cat > /etc/openvpn/openvpn.server << "EOF"
dev tap
ifconfig 192.168.0.1 255.255.255.0
secret /etc/openvpn/openvpn.key
daemon
EOF
# cat > /etc/network/if-pre-up.d/00openvpn << "EOF"
#!/bin/bash
 
/usr/sbin/openvpn --config /etc/openvpn/openvpn.server
 
exit 0
EOF
# chmod 755 /etc/network/if-pre-up.d/00openvpn

On nova-compute:

# cat > /etc/openvpn/openvpn.client << "EOF"
remote 10.176.65.54
dev tap
ifconfig 192.168.0.3 255.255.255.0
secret /etc/openvpn/openvpn.key
daemon
EOF
# cat > /etc/network/if-pre-up.d/00openvpn << "EOF"
#!/bin/bash
 
/usr/sbin/openvpn --config /etc/openvpn/openvpn.client
 
/usr/sbin/brctl addbr br100
/usr/sbin/brctl addif br100 tap0
 
/sbin/ifconfig tap0 0.0.0.0
/sbin/ifconfig br100 192.168.0.3
 
exit 0
EOF
# chmod 755 /etc/network/if-pre-up.d/00openvpn

Finally, on both nodes:

# echo "--flat_interface=tap0" >> /etc/nova/nova.conf

This causes nova-network to bridge into tap0.

Let’s recap what we’ve done. On nova-cc, we’re configuring OpenVPN to act as a server. We’re bringing tap0 up with IP 192.168.0.1/24 and the /etc/network/if-pre-up.d/00openvpn script ensures that the VPN server is started on boot (specifically, before the other network devices are brought up). On nova-compute, we configure OpenVPN as a client, and the /etc/network/if-pre-up.d/00openvpn script creates a bridge (br100), adds the tap0 interface to it, and then brings 192.168.0.3 up on br100. If I recall correctly, the tap0 device doesn’t appear to be “up” until we ifconfig it, which is why we just set it to 0.0.0.0. Don’t quote me on this though, as I can’t quite remember. :P

I know very little about bridging, but essentially a bridge “connects two or more different physical ethernets together to form one large (logical) ethernet” (taken from /usr/share/doc/bridge-utils/HOWTO), and this is precisely what we have done here. We bridge the virtual interfaces for running instances (e.g. vnet0) with tap0 (our VPN connection), which means that nova-cc can speak to instances running on nova-compute, and vice-versa. This is also essential as dnsmasq (our DHCP server) runs on nova-cc (spawned by nova-network), and without this bridging in place our instances would not be able to have their networking configured automatically on boot by the DHCP server.

Also, the reason why we don’t have to explicitly configure br100 on nova-cc is because that runs nova-network, which handles the bridging automatically. The only thing we did need to do on the nova-cc side is instruct nova-network on which device to bridge into (--flat_interface=tap0). The last thing I’ll say here is that OpenVPN uses device tun0 by default, but we have to use tap0 (a virtual Ethernet adapter) as brctl creates Ethernet bridges, and a tun device is a “virtual point-to-point” link (see this for a tad more info).

Go ahead and reboot each instance, one at a time, to ensure that everything comes up as expected.

Once back up, on nova-cc:

# mysql -u root nova
mysql> SELECT * FROM fixed_ips WHERE id=4;
+---------------------+---------------------+------------+---------+----+-------------+------------+-------------+-----------+--------+----------+
| created_at          | updated_at          | deleted_at | deleted | id | address     | network_id | instance_id | allocated | leased | reserved |
+---------------------+---------------------+------------+---------+----+-------------+------------+-------------+-----------+--------+----------+
| 2011-07-22 16:21:35 | 2011-07-22 20:48:26 | NULL       |       0 |  4 | 192.168.0.3 |          1 |        NULL |         0 |      0 |        0 |
+---------------------+---------------------+------------+---------+----+-------------+------------+-------------+-----------+--------+----------+
1 row in set (0.00 sec)
 
mysql> UPDATE fixed_ips SET reserved=1 WHERE id=4 LIMIT 1;

What we’re doing here is “reserving” 192.168.0.3 for the other end of the VPN link on nova-compute. 192.168.0.2 is already reserved, but I’m not sure if nova uses this or will use it for something at some point. As such, just play it safe and reserve another available IP.
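If you end up scripting this, it is probably safer to look the row up by address rather than by id, since the id will differ between installs. Here is a sketch using Python's built-in sqlite3 module as a stand-in for a MySQL connection (the MySQL drivers I know of use %s rather than ? as the placeholder); the function name is made up.

```python
import sqlite3


def reserve_fixed_ip(conn, address):
    """Mark one fixed IP as reserved; return the number of rows changed."""
    # "?" is sqlite3's parameter placeholder; this is an assumption-laden
    # stand-in for the UPDATE run in the mysql shell above.
    cursor = conn.execute(
        "UPDATE fixed_ips SET reserved = 1 WHERE address = ? AND reserved = 0",
        (address,))
    conn.commit()
    return cursor.rowcount
```

A return value of 0 means the address was already reserved or doesn't exist in fixed_ips.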

In theory, if you now launch an instance on nova-cc, it should build on nova-compute and the IP assigned should be accessible via nova-cc. The instance on nova-compute will have a gateway of 192.168.0.1 (which is physically on nova-cc), which means that all traffic in and out of the instance will travel through nova-cc. This also means that if nova-cc goes down, instances will not be able to communicate with the outside world (or potentially each other, though I’ve not tested myself).

That should be about it. I’ve probably missed a few things, but the general gist should be here. Also, I’m aware that there are no security best-practices implemented here, but the idea is to just get everything up and running as a proof of concept, and fine-tune later.

Can’t ssh to UML instances when creating w/ valid keypair

July 14th, 2011

While creating UML instances on nova, I noticed I wasn’t able to ssh into my instances using the keypair I previously created. Looking at the logs on the nova-compute node, I saw:

2011-07-13 21:15:22,256 INFO nova.virt.libvirt_conn [-] instance instance-0000003d: injecting key into image 3
2011-07-13 21:15:22,256 DEBUG nova.utils [-] Running cmd (subprocess): sudo losetup --find --show /var/lib/nova/instances/instance-0000003d/disk from (pid=838) execute /usr/lib/pymodules/python2.7/nova/utils.py:143
2011-07-13 21:15:22,424 DEBUG nova.utils [-] Running cmd (subprocess): sudo kpartx -a /dev/loop0 from (pid=838) execute /usr/lib/pymodules/python2.7/nova/utils.py:143
2011-07-13 21:15:22,509 DEBUG nova.utils [-] Running cmd (subprocess): sudo kpartx -d /dev/loop0 from (pid=838) execute /usr/lib/pymodules/python2.7/nova/utils.py:143
2011-07-13 21:15:22,563 DEBUG nova.utils [-] Running cmd (subprocess): sudo losetup --detach /dev/loop0 from (pid=838) execute /usr/lib/pymodules/python2.7/nova/utils.py:143
2011-07-13 21:15:22,604 WARNING nova.virt.libvirt_conn [-] instance instance-0000003d: ignoring error injecting data into image 3 (Mapped device was not found (we can only inject raw disk images): /dev/mapper/loop0p1)

I tried running the kpartx commands above, but they didn’t return anything. This was because the image I was using had no partition table.
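A quick way to spot this ahead of time is to look for the MBR boot signature (bytes 0x55 0xAA at offset 510) in the first sector of the image. The Python sketch below does just that. Treat it as a heuristic only: a missing signature means there is no MBR partition table, but a present one doesn't prove the table is valid. The function name is my own.

```python
def has_mbr_signature(image_path):
    """Return True if the first 512-byte sector ends with the bytes 0x55 0xAA."""
    with open(image_path, "rb") as image:
        sector = image.read(512)
    return len(sector) == 512 and sector[510:512] == b"\x55\xaa"
```

I would expect the original CentOS5.6-AMD64-root_fs image to return False here, since it is a bare filesystem.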

To fix, I effectively created a new image and copied data from the original one.

To begin, create a sparse file (see this for more info):

# cd /root
# dd if=/dev/zero of=CentOS5.6-AMD64-new-root_fs bs=1 count=0 seek=1024M
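The dd invocation writes no data at all (count=0) and simply seeks 1024M into the file, which sets its apparent size. For comparison, here is the same idea sketched in Python; the function name is hypothetical, and whether the file actually ends up sparse on disk depends on the filesystem.

```python
import os


def create_sparse_file(path, size_bytes):
    """Create (or truncate) a file with the given apparent size, writing no data."""
    with open(path, "wb") as handle:
        handle.truncate(size_bytes)
    return os.stat(path).st_size
```

create_sparse_file("CentOS5.6-AMD64-new-root_fs", 1024 ** 3) would give a 1 GiB file like the one above.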

Now, create a partition to span the entire disk (replace /dev/loop0 with whatever losetup returns):

# losetup --show --find CentOS5.6-AMD64-new-root_fs
/dev/loop0
# fdisk /dev/loop0

Now, use kpartx to make the partition visible to the host, and create a filesystem on that partition:

# parted
# kpartx -a /dev/loop0
# mke2fs -j /dev/mapper/loop0p1

Mount the original image and copy data over:

# losetup --show --find CentOS5.6-AMD64-root_fs 
/dev/loop1
# mkdir /mnt/loop{0,1}
# mount /dev/mapper/loop0p1 /mnt/loop0
# mount /dev/loop1 /mnt/loop1
# cd /mnt/loop1
# rsync -a . /mnt/loop0

Update the fstab on the new image (this is necessary as the partition layout has now changed):

# cd /mnt/loop0/etc
# sed -i 's/ubda/ubda1/g' fstab
# cd /
# umount /mnt/loop{0,1}
# kpartx -d /dev/loop0
# losetup -d /dev/loop{0,1}
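One caveat with the sed above: running it a second time would turn ubda1 into ubda11. If you are scripting this, a stricter substitution that only touches a bare /dev/ubda is safer. A minimal Python sketch (my own function name, and narrower than the sed in that it only matches the full /dev/ubda path):

```python
import re


def point_fstab_at_partition(fstab_text):
    """Rewrite /dev/ubda to /dev/ubda1, leaving existing /dev/ubda1 entries alone."""
    # \b requires a word boundary after "ubda", so "/dev/ubda1" does not match.
    return re.sub(r"/dev/ubda\b", "/dev/ubda1", fstab_text)
```

Applying it twice gives the same result as applying it once.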

Modify /etc/nova/libvirt.xml.template, changing this line from:

<root>/dev/ubda</root>

to:

<root>/dev/ubda1</root>

That should be about it.

Using UML instances on OpenStack Nova

July 2nd, 2011

As mentioned in http://blog.defunct.ca/?p=411, I’m running OpenStack on a XenServer virtual machine and need to be able to use something like UML to run instances from within the VM. I had to hack a number of things in order to get this to work.

First things first. Let’s install user-mode-linux:

# apt-get update
# apt-get install user-mode-linux

Now, grab the CentOS 5.6 x86_64 image from http://fs.devloop.org.uk/ (we’re using a 64-bit XenServer VM):

# cd /root
# wget http://fs.devloop.org.uk/filesystems/CentOS-5.6/CentOS5.6-AMD64-root_fs.bz2
# bunzip2 CentOS5.6-AMD64-root_fs.bz2

There is an image on http://wiki.openstack.org/Nova/UML, however I couldn’t get this image to boot properly. As such, I opted for the CentOS 5.6 image above.

A few things within the image needed adjusting, so I:

# mkdir /mnt/image
# losetup --find --show CentOS5.6-AMD64-root_fs
/dev/loop0
# mount /dev/loop0 /mnt/image
# cp -a /usr/lib/uml/modules/2.6.35.1/ /mnt/image/lib/modules/
# chroot /mnt/image
# sed -i 's@LABEL=ROOT@/dev/ubda@g' /etc/fstab
# chkconfig network on
# exit
# umount /mnt/image
# losetup -d /dev/loop0

Note that /usr/lib/uml/modules/2.6.35.1/ is provided by the user-mode-linux package on Ubuntu 10.10, so adjust accordingly to what you’re running. Also, UML seems to use device /dev/ubda, so we modify /etc/fstab w/ that.

Now we can bundle up the image:

# cd /root
# euca-bundle-image -i CentOS5.6-AMD64-root_fs
# euca-upload-bundle -b uml-image-bucket -m /tmp/CentOS5.6-AMD64-root_fs.manifest.xml
# euca-register uml-image-bucket/CentOS5.6-AMD64-root_fs.manifest.xml

Once that’s done, we remove this from /etc/nova/nova.conf:

--libvirt_type=qemu

… and add this:

--libvirt_type=uml
--use_cow_images=false
--libvirt_xml_template=/etc/nova/libvirt.xml.template

Since I’ve specified --libvirt_xml_template in /etc/nova/nova.conf, we now need to create that file:

# cp -a /usr/share/pyshared/nova/virt/libvirt.xml.template /etc/nova/libvirt.xml.template

Now open up /etc/nova/libvirt.xml.template and remove the following:

#if $getVar('vncserver_host', False)
        <graphics type='vnc' port='-1' autoport='yes' keymap='en-us' listen='${vncserver_host}'/>
#end if

The reason for doing this is that we appear to run into an issue similar to the one reported here. Update: I’ve since created a bug report for this UML issue here.

Now we can restart the nova-compute service and create an instance:

# service nova-compute restart
# euca-run-instances ami-778c501e -k mykey -t m1.tiny

(replace ami-778c501e with your image name, which can be found by running euca-describe-images)

If your instance doesn’t go into a running state, have a look at the libvirt.xml file under /var/lib/nova/instances/####/ (replacing #### w/ your instance’s name, found by running euca-describe-instances), ensuring that there is no reference to the vnc stuff in there. If there is, then the template hasn’t been updated or isn’t being used correctly. Otherwise, your instance should be ssh-able, using the IP returned by euca-describe-instances.
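Rather than eyeballing libvirt.xml, you can check it with a couple of lines of Python. This sketch (hypothetical function name) parses the XML and looks for any graphics element with type='vnc':

```python
import xml.etree.ElementTree as ET


def has_vnc_graphics(libvirt_xml):
    """Return True if the domain XML contains a <graphics type='vnc'> element."""
    root = ET.fromstring(libvirt_xml)
    return any(element.get("type") == "vnc" for element in root.iter("graphics"))
```

Pass it the contents of the instance's libvirt.xml; True means the template change hasn't taken effect.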