Overview

Update: Two years later, I switched to Online.net after a lot of disk problems at Hetzner (almost every 6 months). See the bottom of the page for the Online.net migration.

Hetzner is a poor (cheap?) entrepreneur's dream. This page is about setting up a company's IT infrastructure. It's more system administration than software development, but in micro ISVs the developers (and managers) do both, right? The predecessor of this document is development infrastructure, which consisted of a single machine on Linode. As we grew, however, we needed more than a tiny 512MB VPS. We discovered Hetzner, where we could build a company network on VMware ESXi. Here we document how we did it. Basically we have a bunch of VMs: one reverse proxy, one development machine (similar to the predecessor), one (or more) deployment machines, and finally one business machine. VMware ESXi itself is straightforward. The tricky part is backup (on the free version), so this document is mostly about creating the machines and backing them up.

1. Initial ESXi Setup

Hetzner did it for us, for 25€. Since ESXi cannot route, it needs a separate public IP for the management console, which you reach via the vSphere Client. Our current plan comes with software RAID and, surprise, ESXi does not support it, so we have two separate 3TB datastores. Add the second one from Configuration -> Storage. The other installation options follow http://wiki.hetzner.de/index.php/VMware_ESXi/en

2. Virtual Switch and Gateway VM Setup

Since we cannot afford a new subnet, we use a single machine as a reverse proxy and give the other VMs local IPs. Even if we could afford a subnet, the Hetzner guide says the subnet must be routed to a router VM on ESXi (they don't want to do the routing).

To add a virtual switch, go to Configuration -> Network (in vSphere) and click Add Networking. Select Virtual Machine, create a standard switch, and give it the label "Company Local Network". Create a new VM named "Company Gateway": 64-bit Ubuntu Server (512MB), 2 NICs, one on Company Local Network and one on VM Network. Before finishing, set the MAC of the VM Network NIC to the MAC given by Hetzner (00:50:56:00:XX:YY).

Install Ubuntu Server. Install the SSH server. Follow the Initial User Management section of the development infrastructure page (restrict ssh access, remove root). Give the local NIC the IP 10.0.0.1. Uncomment net.ipv4.ip_forward=1 in /etc/sysctl.conf and, as root, run echo 1 > /proc/sys/net/ipv4/ip_forward to enable forwarding without a reboot. See: https://nowhere.dk/articles/tip_nat_with_ubuntus_ufw_firewall, http://codeghar.wordpress.com/2012/05/02/ubuntu-12-04-ipv4-nat-gateway-and-dhcp-server/, and https://help.ubuntu.com/12.04/serverguide/firewall.html#ip-masquerading. Then set up the other VMs on the network with static IPs in the 10.0.0.x range.
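The exact NAT rules used on the gateway are not recorded here, so the following is only a sketch of the plain iptables masquerading variant described in the Ubuntu server guide linked above. The interface name eth0 for the public NIC and the 10.0.0.0/24 subnet are assumptions, and nat_commands is a hypothetical helper that only prints the commands so they can be reviewed before being run as root.

```shell
# Assumed values: public NIC and local subnet of the gateway VM.
WAN_IF="${WAN_IF:-eth0}"
LAN_NET="${LAN_NET:-10.0.0.0/24}"

# Hypothetical helper: print (do not execute) the commands that enable
# forwarding and masquerade traffic from the local network.
nat_commands() {
  cat <<EOF
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -s ${LAN_NET} -o ${WAN_IF} -j MASQUERADE
iptables -A FORWARD -s ${LAN_NET} -o ${WAN_IF} -j ACCEPT
iptables -A FORWARD -d ${LAN_NET} -m state --state ESTABLISHED,RELATED -i ${WAN_IF} -j ACCEPT
EOF
}

nat_commands
```

Once the output looks right, it can be piped to a root shell (nat_commands | sudo sh). Rules added this way are lost on reboot unless they are also written to a startup script or the ufw configuration.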

3. ESXi Backup with GhettoVCB

Since the dedicated machine with ESXi does not have software RAID, we need to back up regularly in case of hard drive failure. Note that this backup is separate from the data (db & files) backup, which is machine specific and goes to Dropbox (off-site backup). We use ghettoVCB (http://communities.vmware.com/docs/DOC-8760).

To set up backups, ssh to the management interface, create a folder named ghettoVCB, upload the ghettoVCB files there, update the conf file and run ./ghettoVCB.sh -a -g ghettoVCB.conf, which backs up all machines to the location specified in the conf file. It is this straightforward. To recover, just copy a backup back to the first datastore or add it to the inventory directly.

Additionally, add a crontab entry to back up daily. Edit /var/spool/cron/crontabs/root and add:

0 0 * * * /vmfs/volumes/datastore2/config/ghettoVCB-master/backup.sh > /vmfs/volumes/datastore2/config/ghettoVCB-master/ghettoVCB-backup-$(date +\%s).log

Note that this is not persisted across reboots. Do not reboot unless it is really necessary. A lot of configuration is reset on reboot and has to be checked extensively afterwards. (NOTE: we could set up software RAID or iSCSI for important data files, to protect against sudden hard disk failure.)
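Because the crontab entry is lost on reboot, it has to be put back by hand or by a boot script. Here is a minimal sketch of an idempotent helper for that. ensure_cron_line is a hypothetical function, not part of ghettoVCB, and the paths are the ones from the crontab line in this section. As far as I know, ESXi 5.1 and later run /etc/rc.local.d/local.sh at boot, which would be one place to call it from, but treat that as an assumption and check it on your version.

```shell
# Hypothetical helper: append a cron entry only if the exact line is missing.
ensure_cron_line() {
  crontab_file="$1"
  line="$2"
  # -x matches the whole line, -F treats the entry as a fixed string
  # (it contains '*' characters that must not be read as a regex).
  if ! grep -qxF "$line" "$crontab_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$crontab_file"
  fi
}

GHETTO_DIR=/vmfs/volumes/datastore2/config/ghettoVCB-master
# \$ and \\ keep $(date ...) and \% literal, exactly as cron expects them.
CRON_LINE="0 0 * * * $GHETTO_DIR/backup.sh > $GHETTO_DIR/ghettoVCB-backup-\$(date +\\%s).log"

# On the ESXi host this would be called as:
#   ensure_cron_line /var/spool/cron/crontabs/root "$CRON_LINE"
```

Calling it twice leaves a single entry, so it is safe to run on every boot. The running crond most likely has to be restarted before it notices the change.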

4. ESXi Security

I followed this link to secure ESXi http://serverfault.com/questions/309942/how-to-secure-an-vmware-esxi-host-on-a-root-server

Changing the ssh port was a real PITA. I could no longer connect via SSH, since the default firewall assumes ssh is on port 22 but I had changed it to [some other port]. I had to get to the console with LARA, change the port back to 22 (/etc/services) and then add a firewall rule as in http://communities.vmware.com/message/1853590. The ESXi equivalent of netstat is "esxcli network ip connection list". In the vSphere Client firewall settings, I disabled all incoming ports except 902, 68, 22 (no ssh), 161, 546, 53, [ssh port] and 443. An nmap scan then showed only 443, 902 and [ssh port] open. Add ssh keys and disable password login as in: http://www.howtoforge.com/ssh_key_based_logins_putty. Then I restricted access to 902/443 to 127.0.0.1/32 in the vSphere Client firewall, and tunnel the client connection over SSH. Follow: http://www.virtuallifestyle.nl/2010/03/tunneling-a-vsphere-client-connection-over-ssh/. Now the only open port is [ssh port].
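With 443 and 902 reachable only from 127.0.0.1, the vSphere Client has to go through an SSH tunnel. The linked article does this with PuTTY. The OpenSSH equivalent would look roughly like the sketch below. vsphere_tunnel_cmd is a hypothetical helper that only prints the command, and esxi.example.com and 2222 are placeholders for the real host and [ssh port].

```shell
# Hypothetical helper: print the ssh command that forwards the two
# vSphere Client ports (443 and 902) through the single open SSH port.
vsphere_tunnel_cmd() {
  esxi_host="$1"   # public IP or hostname of the ESXi management interface
  ssh_port="$2"    # the non-default SSH port chosen above
  printf 'ssh -N -p %s -L 443:127.0.0.1:443 -L 902:127.0.0.1:902 root@%s\n' \
    "$ssh_port" "$esxi_host"
}

vsphere_tunnel_cmd esxi.example.com 2222
```

With the tunnel up, the client connects to 127.0.0.1 instead of the public IP. On Linux or OS X, binding local ports below 1024 needs root, so the printed command would have to run with sudo there.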

5. Gateway VM Other Services Setup

NAT configuration for the Gateway VM is explained in step 2. Here the other services on the Gateway VM, such as VPN, mail and DNS, are configured. These services used to live on the Development machine, since it was the only machine (at Linode), but here we install all infrastructure services on the gateway VM so that all the other machines are just dumb VMs on the local network.

First enable key only login as in http://www.howtoforge.com/ssh_key_based_logins_putty.

Then, install OpenVPN according to https://help.ubuntu.com/12.04/serverguide/openvpn.html

Install nginx (future: consider cherokee). Note that the nginx configuration files live in /etc/nginx/sites-available and are enabled by a symbolic link in sites-enabled. The configuration files are backed up to Dropbox/machines/gw/config/nginx. Create the symbolic link by running "sudo ln -s /etc/nginx/sites-available/trac.yoursite.com trac.yoursite.com" from inside the sites-enabled folder. Restart nginx with "sudo /etc/init.d/nginx restart".
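The site files themselves are not reproduced on this page, so here is a minimal sketch of what a reverse-proxy site for one internal VM could look like, together with the symlink step. make_proxy_site is a hypothetical helper, and the backend address 10.0.0.2:80 in the usage note is an assumption (any VM on the Company Local Network). The third argument lets you try it against a scratch directory before touching /etc/nginx.

```shell
# Hypothetical helper: write a reverse-proxy site file and enable it.
make_proxy_site() {
  name="$1"                     # server name, e.g. trac.yoursite.com
  backend="$2"                  # internal host:port (assumed), e.g. 10.0.0.2:80
  nginx_dir="${3:-/etc/nginx}"  # override to test in a scratch directory
  mkdir -p "$nginx_dir/sites-available" "$nginx_dir/sites-enabled"
  # \$ keeps nginx variables literal; $name and $backend are expanded by the shell.
  cat > "$nginx_dir/sites-available/$name" <<EOF
server {
    listen 80;
    server_name $name;

    location / {
        proxy_pass http://$backend;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    }
}
EOF
  # Same effect as the manual "ln -s" step, but safe to repeat.
  ln -sf "$nginx_dir/sites-available/$name" "$nginx_dir/sites-enabled/$name"
}
```

For example, make_proxy_site trac.yoursite.com 10.0.0.2:80 /tmp/nginx-test writes the site under /tmp/nginx-test. Against the real /etc/nginx it needs root, and "sudo nginx -t" is worth running before the restart.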

Install bind: https://help.ubuntu.com/12.04/serverguide/dns-installation.html. I imported the zones from the Linode DNS manager. Zone config files are backed up to Dropbox/machines/gw/config/bind.

Install mail with the command "sudo apt-get install postfix". Select "Internet Site" and set the mail name to mail.yoursite.com. Forward all mail that arrives here to the sysadmin as in http://www.cyberciti.biz/faq/linux-unix-bsd-postfix-forward-email-to-another-account/.

Install tiger.

Install fail2ban and configure it:

sudo apt-get install fail2ban
cd /etc/fail2ban
sudo cp jail.conf jail.local
sudo vi jail.local
# change destemail to [your sysadmin email]
# change action = %(action_)s to action = %(action_mw)s
# change ssh maxretry to 3
# save
sudo vi /etc/rsyslog.conf
# set $RepeatedMsgReduction to off, so every failed login gets its own log line for fail2ban to count
# save
sudo /etc/init.d/rsyslog restart
sudo /etc/init.d/fail2ban reload
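The three manual jail.local edits can also be scripted, which helps when rebuilding the gateway. This is a sketch only. patch_jail is a hypothetical helper, and it assumes the stock Ubuntu 12.04 layout, where destemail and action sit in [DEFAULT] and the SSH jail is a section named [ssh]. It writes to a temporary file and moves it into place rather than using sed -i, so it behaves the same with GNU and BSD sed.

```shell
# Hypothetical helper: apply the jail.local changes described above.
patch_jail() {
  jail="$1"    # path to jail.local
  email="$2"   # address that should receive ban notifications
  tmp="$jail.tmp"
  # 1) set destemail, 2) switch the action to the mail+whois variant,
  # 3) set maxretry to 3, but only between [ssh] and the next section header.
  sed \
    -e "s/^destemail *=.*/destemail = $email/" \
    -e 's/^action *= *%(action_)s/action = %(action_mw)s/' \
    -e '/^\[ssh\]/,/^\[/ s/^maxretry *=.*/maxretry = 3/' \
    "$jail" > "$tmp" && mv "$tmp" "$jail"
}
```

On the gateway it would be called as root, for example patch_jail /etc/fail2ban/jail.local followed by the sysadmin address, and then fail2ban reloaded as above. The /etc/rsyslog.conf change still has to be made separately.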

6. To Be Done

In order to get back to work as soon as possible, some features were omitted. Here are the features still to be done:

  • Create a software RAID inside the VM, with virtual hard disks on datastores 1 and 2. Mount this new drive as a separate partition on the /apps/data folder. Then, in case of catastrophe (hd failure), we recover the VM backup from datastore 2 and the data folder from the RAID without any data loss.
  • For complete high availability on catastrophe, duplicate the ESXi infrastructure on another dedicated server, arrange the backups to go to the other server's datastore 2 (if possible), and set up the data partition NOT as software RAID but as iSCSI, so that distributed VMs can share the data folder. You can postpone this until you have a lot of paying customers...
    • Aside from the shared data folder, a complete high availability setup requires: failover IP, nginx round robin (or haproxy), jetty session sharing, pgpool, and each passive ESXi checking the active one and calling the Hetzner API to claim the failover IP for itself. The active IP distributes the load to the jetty servers on the passive ESXi hosts. This would be the ideal case, but some downtime is not that bad compared to this overhead.
    • Another option could be to start an Amazon EC2 (or Linode) instance in case of hardware failure until it is fixed. This may be the option with the least extra overhead until we need a new ESXi host purely for load balancing.

***Update:*** Online.net Migration

After a lot of disk problems at Hetzner (lack of HW RAID, and problems even with software RAID), I finally decided to look for alternatives. First I asked Hetzner if it was possible to upgrade to a new server without a setup fee, and they said no. OVH soyoustart was good, with SSD HW RAID, but didn't have stock. Then I decided to get this one: https://www.online.net/en/dedicated-server/dedibox-md2k15 OMG it's a real server! Not desktop hardware like Hetzner. They say they detect hardware problems with their hardwarewatch technology and change the malfunctioning parts automatically. I was just hoping that they hot swap the dead SSDs (HW RAID). Hetzner probably shuts down the machine. (Update2: online.net changed the dediboxmd specs and removed hardware RAID, so what I say here applies to the hardware RAID version.)

Initial ESXi Setup

The installation was so seamless, just selected Virtualization, then ESXi 5.5, voila! ESXi even shows the CPU temperature! It was like an oasis in the desert after Hetzner's crappy user experience. Hetzner even asks for a fee for ESXi installation, just a fucking image! For the gateway VM, you need an additional IP, order from online.net console, and create a MAC address.

Virtual Switch and Gateway VM Setup

To add a virtual switch, go to Configuration -> Network (in vSphere) and click Add Networking. Select Virtual Machine, create a standard switch, and give it the label "Company Local Network". Then create a new gateway VM in ESXi with 2 NICs, one on Company Local Network and one on VM Network. Before finishing, set the MAC of the VM Network NIC to the MAC given by Online.net (XX:XX:XX:XX:XX:XX). Install Ubuntu Server. Install the SSH server. Restrict ssh access and change the ssh port. Give the local NIC the IP 10.0.0.1. The IP of the other NIC must also be configured statically (no DHCP). This was troublesome. The /etc/network/interfaces that worked:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address [your public ip]
netmask 255.255.255.255
post-up route add [online.net gateway] eth0
post-up route add default gw [online.net gateway]

auto eth1
iface eth1 inet static
address 10.0.0.1
netmask 255.255.255.0

Uncomment net.ipv4.ip_forward=1 in /etc/sysctl.conf and, as root, run echo 1 > /proc/sys/net/ipv4/ip_forward. See: https://nowhere.dk/articles/tip_nat_with_ubuntus_ufw_firewall, http://codeghar.wordpress.com/2012/05/02/ubuntu-12-04-ipv4-nat-gateway-and-dhcp-server/, and https://help.ubuntu.com/12.04/serverguide/firewall.html#ip-masquerading. Then set up the other VMs on the network with static IPs in the 10.0.0.x range.
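For the other VMs, the static configuration is much simpler than the gateway's. As a sketch, the hypothetical helper below prints an /etc/network/interfaces for a VM on the local network. The interface name eth0 and the /24 netmask are assumptions that mirror the gateway's eth1 settings above, and DNS servers are left out because this page does not say which resolver the VMs use.

```shell
# Hypothetical helper: print /etc/network/interfaces for a VM behind the gateway.
client_interfaces() {
  ip="$1"   # the VM's static address, e.g. 10.0.0.2
  cat <<EOF
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address $ip
netmask 255.255.255.0
gateway 10.0.0.1
EOF
}

client_interfaces 10.0.0.2
```

Redirect the output to /etc/network/interfaces on the VM (as root) and restart networking. After that, "ping 10.0.0.1" should reach the gateway's local NIC.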