Distributed Hosting

Hi guys,

Recently I’ve been looking into cloud database products. I was kind of hoping that by “cloud” the vendors meant that the data wasn’t fixed to any particular server, so we could get replication across multiple data-centres etc., mostly for increased robustness. Turns out that very few vendors do it. Amazon will replicate data across data-centres in the same region, but we couldn’t replicate between a US and an EU based data-centre, for example.

That got me thinking about a project that I’m working on. I was planning on starting it out on one of these VPS or cloud based products where you can increase your allocated resources as necessary. I know there’s a practical limit to how far you can scale such things vertically before you need to move horizontally, but I figured I’d have some time on such a product before I needed to look into additional hosting.

However, the more I look at it, the more it appears that most cloud based hosting is simply either a virtualised server instance, or a resource restricted user account whose limits can be adjusted as necessary. If I out-grow the server (which I hope I will as my customer base increases) then I still need to get an additional server. What I’m actually looking for is a platform where I can keep saying “gimme more CPU resource” or “I want more RAM” and scale beyond the power of a single server, if that makes sense?

So, if there is actually a distributed hosting package available, where I can seriously out-grow the capacity of a single server, what is the one to look at? I do my development mostly in PHP and MySQL. Are products such as AWS or Google App Engine suitable? What experience do you guys have? Which would you recommend? If they also come with a CDN that’s an extra bonus :slight_smile:

Thanks

Probably worth a look at Rackspace Cloud Sites (not Cloud Servers, which is their EC2/VPS equivalent). I tend to agree with 37signals’ ‘Getting Real’ advice though: don’t overthink or over engineer for the future unnecessarily. Keep an eye on structural decisions in your application that may affect future scalability more than on what infrastructure you will deploy on; there are few scenarios that can’t be handled, and scaled relatively easily, by a fairly traditional set up of a load balancer/caching proxy in front of multiple web and database nodes.
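To make the "load balancer in front of multiple web nodes" idea concrete, here is a minimal Python sketch of round-robin selection with basic health tracking. It is only an illustration, not how any particular load balancer is implemented; the class name, its methods, and the node names are all made up for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backend nodes in turn, skipping any marked as down.

    Hypothetical helper for illustration only.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        # A real balancer would do this after failed health checks.
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        # Look at each node at most once per call so we never loop forever.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy backend nodes")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web1", "web2", "web3"])  # assumed node names
    for _ in range(4):
        print(lb.next_node())
```

Adding capacity in this model means adding a node to the list rather than resizing a single server, which is the horizontal scaling step discussed in the thread.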

Thanks. I have a few concepts in development that should scale well on their own, but we’ve been tripped up in the past by various theoretical solutions not being so good in practice. NFS etc letting us down and being out of sync. MySQL replication failing miserably. I’m sure you know the kind of thing I mean. The ideal solution isn’t always to throw more power at it, admittedly, but I feel fairly well clued up for other solutions so I’m curious about how this one works and if anyone uses it successfully.

I’ve always looked at the virtualisation / cloud solution and thought that there has to be a limitation with it somewhere, such as not being able to scale beyond the abilities of one host server.

Now, you can take work and break it down into ‘units’ to be processed and re-assembled afterwards, sort of how the Google infrastructure works for search, but that doesn’t work well with PHP / MySQL based thinking / sites.
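As a rough illustration of the "break it into units and re-assemble" pattern, here is a small Python sketch that counts words across lines of text. The function names are invented for the example, and the thread pool only stands in for separate machines or instances; it shows the shape of the approach, not a real distributed system.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_units(items, unit_size):
    """Cut the full workload into independent chunks."""
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]


def process_unit(lines):
    """Work done on one unit: count the words in its lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_job(lines, unit_size=2, workers=4):
    """Process all units concurrently, then merge the partial results."""
    units = split_into_units(lines, unit_size)
    # Threads stand in for separate worker instances in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_unit, units))
    total = Counter()
    for partial in partials:
        total.update(partial)  # the re-assembly step
    return total


if __name__ == "__main__":
    print(run_job(["a b", "b c", "c c"]))
```

The pattern works when each unit can be processed without knowing about the others. A typical request that reads and writes shared rows in one database does not split up this way, which is the difficulty mentioned above.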

The next solution, as you’ve noted is just to add another instance to gain more capacity.

As has already been said, start out simple and grow it. If you need more capacity, add it then, rather than trying to over engineer from the start “just in case”.

Oh, I’m not trying to over engineer. I was hoping for the opposite: simplicity. Though the more I thought about it, and the more I looked into it, the less that looked possible with current technology. I do like the idea of cloud servers (or virtualised platforms in general) in that you can pick the instance up and move it to another physical server with minimal fuss as one fairly large lump, but fundamentally there is no distributed, unlimited power kind of solution available at server side that I can see.

I do however feel that you need to code to the platform to a degree. Experience has taught me that a generic code base is rarely ideal. The most obvious example from my experience is MySQL storage engines. You can’t just say “I’m going to switch from InnoDB to NDB Cluster” and expect it to work. To get the best out of any platform you need to consider the actual platform somewhat and engineer your solution accordingly. Avoiding over-engineering can be a fine line.

Yeah, cloud / virtualisation is very good for moving stuff about. If you try to cram everything into one instance, there will always be bottlenecks though, even if it’s just the task scheduling. It is more efficient to have lots of smaller instances doing the work and bringing the results back together once complete.

Certainly, yeah, I agree about thinking about your platform and working with it before you start :slight_smile:

we couldn’t replicate between a US and EU based data-centre,

You can replicate data from any source to any destination over a WAN, and that can be done with standard tools such as rsync. I would recommend considering Cloud Servers or Load Balancing Servers.

Oh, I know that you can replicate to anywhere, but what I mean is that Amazon (and others like them) do not support replication from the US to the EU and vice versa. What we were looking at were the multiple availability zone (multi-AZ) deployments and read-replicas (read-only copies), which we wanted to use so that the databases would be highly available, with load balancing sending each query to the nearest available database server with capacity. But Amazon don’t support it across regions. Turns out that they use a product like DRBD that does disc-level replication, and failover is slow.
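For what it’s worth, the routing we had in mind looks roughly like the Python sketch below: writes go to the primary, reads go to the nearest replica that still has capacity. This is only an illustration of the idea, not anything Amazon provides; the `DbNode` fields, the node names, and the `choose_node` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DbNode:
    """Hypothetical description of one database server."""
    name: str
    latency_ms: float      # assumed to be measured from the calling web node
    connections: int
    max_connections: int

    def has_capacity(self):
        return self.connections < self.max_connections


def choose_node(sql, primary, replicas):
    """Send reads to the closest replica with spare capacity, all else to the primary."""
    # Crude read detection, good enough for a sketch.
    is_read = sql.lstrip().lower().startswith(("select", "show"))
    if not is_read:
        return primary
    candidates = [r for r in replicas if r.has_capacity()]
    if not candidates:
        return primary  # fall back rather than fail the request
    return min(candidates, key=lambda r: r.latency_ms)


if __name__ == "__main__":
    primary = DbNode("db-us-primary", 90.0, 10, 100)
    replicas = [
        DbNode("db-eu-replica", 5.0, 20, 100),
        DbNode("db-us-replica", 85.0, 5, 100),
    ]
    print(choose_node("SELECT * FROM users", primary, replicas).name)
    print(choose_node("UPDATE users SET name = 'x'", primary, replicas).name)
```

One caveat with any scheme like this: replicas can lag behind the primary, so a read issued immediately after a write may return stale data.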

I wouldn’t use rsync for replicating a database, but yes, it’s great for synchronising your web files. I use it for taking offline copies of sites and for migrations, with a standard mysqldump for the data, which is always a full data-set.
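A minimal sketch of that kind of offline copy, written as a small Python wrapper around the two tools. The paths, host, and database name in the usage example are placeholders, the function names are made up, and it assumes MySQL credentials come from an option file such as `~/.my.cnf` rather than the command line.

```python
import subprocess


def rsync_command(src_dir, dest):
    """Build an rsync call that mirrors src_dir's contents to dest."""
    # -a keeps permissions and timestamps, -z compresses in transit,
    # --delete removes destination files that no longer exist at the source.
    # The trailing slash copies the directory's contents, not the directory itself.
    return ["rsync", "-az", "--delete", src_dir.rstrip("/") + "/", dest]


def mysqldump_command(database):
    """Build a mysqldump call for a full dump of one database."""
    # --single-transaction takes a consistent snapshot of InnoDB tables
    # without locking them for the duration of the dump.
    return ["mysqldump", "--single-transaction", database]


def backup(src_dir, dest, database, dump_path):
    """Copy the web files, then write a full SQL dump to dump_path."""
    subprocess.run(rsync_command(src_dir, dest), check=True)
    with open(dump_path, "wb") as out:
        subprocess.run(mysqldump_command(database), stdout=out, check=True)


if __name__ == "__main__":
    # Placeholder values; nothing is executed here, the commands are only printed.
    print(rsync_command("/var/www/site", "backup-host:/srv/copies/site"))
    print(mysqldump_command("site_db"))
```

Because the dump is always the full data-set, this suits backups and migrations rather than keeping two live databases in step.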

Antnee, did you ever try PostgreSQL with replication? I’ve found it less troublesome than MySQL (which has been better since 5.5 added semi-synchronous replication). There are limitations on which replication topologies you can use, and of course your application might not be suited to it, but it’s worth a look if replication is a high priority.