Ever since we started TestingBot (almost 2 years ago!), we’ve been running it on Amazon AWS (EC2, S3 and other services).
Over the last few months, however, we’ve been moving everything from Amazon to our own private cloud.
Originally, Amazon AWS seemed like a good fit for us: easy to set up, manage and maintain.
We’d scale up and down, depending on the number of tests our customers were running.
As it turned out, AWS has its disadvantages:
- Noisy neighbors: sometimes instances would run much slower than usual because other customers on the same hypervisor were using up its resources.
- Expensive: as soon as we start an instance, we’re billed for the entire hour, even if we only need it to run a 2-minute test.
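To put the billing point in numbers, here is a small sketch of how usage gets rounded up under per-hour billing. The function names are made up for illustration, and no actual AWS prices are assumed; it only compares minutes used with minutes billed.

```python
import math


def billed_minutes(used_minutes, billing_unit_minutes=60):
    """Round usage up to whole billing units (per-hour billing by default).

    Starting an instance always costs at least one full unit.
    """
    units = max(1, math.ceil(used_minutes / billing_unit_minutes))
    return units * billing_unit_minutes


def utilization(used_minutes, billing_unit_minutes=60):
    """Fraction of the billed time that was actually used."""
    return used_minutes / billed_minutes(used_minutes, billing_unit_minutes)


# A 2-minute test on a freshly started instance:
print(billed_minutes(2))        # 60
print(f"{utilization(2):.1%}")  # 3.3%
```

In other words, a 2-minute test on its own instance pays for 60 minutes and uses about 3% of them, unless the instance can be kept busy with other tests for the rest of the hour.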
In July we started looking into running our own private cloud on a bunch of dedicated servers. Originally we planned to use VMware’s vSphere and vCenter, but after testing VMware for 2 weeks we concluded it would not satisfy our needs:
- Expensive: complicated and expensive licensing, plus expensive support
- Black box: whenever something went wrong, it was hard to troubleshoot since we couldn’t look at the code. VMware does have good documentation though.
- Complicated API: we needed an API that would help us automate launching and destroying VMs. VMware’s APIs are complicated to test and use.
After we ditched VMware, we decided to look into an open-source solution: KVM + QEMU.
This turned out to be an instant success: easy to install, set up and use.
Together with libvirt, we quickly had a proof-of-concept system where we could easily launch and destroy virtual machines.
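For readers who haven’t used libvirt, launching and destroying a KVM guest can look roughly like the sketch below. This is not our production code: the VM name, memory size and disk path are hypothetical, and the domain XML is a bare minimum that a real setup would extend (network, graphics and so on). It uses the libvirt Python bindings’ `createXML()` to start a transient domain and `destroy()` to stop it.

```python
import xml.etree.ElementTree as ET


def build_domain_xml(name, memory_mib, vcpus, disk_path):
    """Build a minimal libvirt domain definition for a KVM guest."""
    domain = ET.Element("domain", type="kvm")
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(domain, "vcpu").text = str(vcpus)
    os_el = ET.SubElement(domain, "os")
    ET.SubElement(os_el, "type", arch="x86_64").text = "hvm"
    devices = ET.SubElement(domain, "devices")
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="qcow2")
    ET.SubElement(disk, "source", file=disk_path)
    ET.SubElement(disk, "target", dev="vda", bus="virtio")
    return ET.tostring(domain, encoding="unicode")


def launch_vm(conn, domain_xml):
    """Start a transient domain: it exists only while it is running."""
    return conn.createXML(domain_xml, 0)


def destroy_vm(domain):
    """Hard-stop the domain; a transient domain is gone afterwards."""
    domain.destroy()


def main():
    # Requires the libvirt Python bindings and a local libvirtd.
    import libvirt

    conn = libvirt.open("qemu:///system")
    # Hypothetical name and image path, for illustration only.
    xml_desc = build_domain_xml("test-vm", 2048, 2,
                                "/var/lib/libvirt/images/test-vm.qcow2")
    domain = launch_vm(conn, xml_desc)
    try:
        print("running:", domain.name())
    finally:
        destroy_vm(domain)
        conn.close()
```

Calling `main()` on a KVM host with libvirt installed would start the guest and tear it down again. A transient domain fits the launch-and-destroy pattern because nothing is left defined on the host once the VM is gone.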
Everything was looking good, but we had one more wish: to eliminate booting the VMs.
Since booting a VM takes time and CPU resources, eliminating it would mean faster VM turnaround and less I/O on our VM host servers.
We eventually stumbled upon GridCentric. They’re working on VMS, which eliminates boot storms (for example, a big spike in VM boots early in the morning when people start their workday). After a proof of concept, we quickly had a system where we could launch a VM with its RAM already loaded, ready to run the test immediately.
Now we’re running our own cloud; as soon as a customer wants to run a test we spin up a VM in less than 10 seconds, run the test and destroy the VM after the test has finished. This way we guarantee pristine VMs, fast tests and a secure environment.
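The one-VM-per-test lifecycle can be sketched as a context manager that always destroys the VM, even when the test fails. The `launch`, `destroy` and `test` callables here are placeholders for whatever the virtualization layer provides; this illustrates the pattern and is not our actual scheduler.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_vm(launch, destroy):
    """Yield a fresh VM and destroy it afterwards, no matter what happens."""
    vm = launch()
    try:
        yield vm
    finally:
        # Runs on success and on failure, so no VM is ever reused.
        destroy(vm)


def run_test_on_fresh_vm(launch, destroy, test):
    """Run a single test on its own VM and return the test's result."""
    with ephemeral_vm(launch, destroy) as vm:
        return test(vm)
```

Because the cleanup sits in a `finally` block, a crashing test cannot leave a used VM behind for the next customer, which is what makes the "pristine VM" guarantee hold.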
Alongside these changes, we’ve updated a few more things on TestingBot:
- Updated our OS X VMs to OS X Mavericks
- Added IE11 to our grid
- Created a “prerun” capability to download and run any program you specify before running your test so you can customize the VM to your liking.
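As a rough illustration of the “prerun” capability, the sketch below builds a capabilities dictionary that could be passed to a remote WebDriver session. Only the capability name `prerun` comes from this post. The value format (a plain URL string), the helper name, the platform string and the example URL are all assumptions, so check the TestingBot documentation for the exact shape.

```python
def build_capabilities(browser_name, version, platform, prerun_url=None):
    """Build a desired-capabilities dict, optionally with a prerun program.

    Assumption: "prerun" accepts the URL of a program to download and run
    on the VM before the test starts.
    """
    caps = {
        "browserName": browser_name,
        "version": version,
        "platform": platform,
    }
    if prerun_url is not None:
        caps["prerun"] = prerun_url
    return caps


# Hypothetical values, for illustration only.
caps = build_capabilities(
    "internet explorer", "11", "WINDOWS",
    prerun_url="https://example.com/setup-vm.bat",
)
```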