
Google AppEngine doesn’t fit the needs of startups on the runway


I tweeted yesterday that I have found AppEngine a poor fit for my startup. The topic deserves follow-up, since I am a big fan of AppEngine in general.

The primary need in a startup is to find a sufficiently large, sufficiently paying audience to enable continued survival. Since by definition the startup is boldly going where no startup has gone before, this implies experimentation. Startups need to mitigate risk by validating a large portfolio of ideas. Spinning the “idea -> experiment -> validation” cycle at hyperspeed reduces risk and maximizes value.

JUnit Max logs all of its errors to an AppEngine-hosted server. It also stores a summary of each test run: number of tests run, number passed, time, and (optionally) the user id of the programmer. I use the errors to prioritize the time I spend fixing defects. I hope to use the log of test runs to power novel services that help programmers find meaning in their work. I picked AppEngine because it was free for small-scale projects and because it handled lots of system administration tasks for me.

But…

I now have a month’s worth of data from the Max user community on the server. AppEngine has two attributes that have frustrated my efforts to spin my validation loop for JUnit Max using this data: owned data and time limits. First, what’s stored in AppEngine stays in AppEngine. There is no general way to download my data set. In extremis I have resorted to copying HTML tables from the online user interface and pasting them into Excel. Locked up in the data on the server are answers to a million questions like, “How many people are setting their user id?” If I had the data locally, answering this question would take seconds. Because the data is in AppEngine, I can’t run an ad hoc query.
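To make “answering this question would take seconds” concrete, here is a minimal sketch of that ad hoc query run against data held locally in memory. It assumes a hypothetical `TestRun` record whose fields mirror the summary described earlier (tests run, tests passed, elapsed time, optional user id); the class, method names, and sample values are illustrative only and are not Max’s actual schema or data.

```java
import java.util.List;

public class UserIdQuery {
    // Hypothetical shape of one stored test-run summary; userId is optional (may be null or empty).
    record TestRun(int testsRun, int testsPassed, long elapsedMillis, String userId) {}

    // How many individual runs carry a user id?
    static long runsWithUserId(List<TestRun> runs) {
        return runs.stream()
                .filter(r -> r.userId() != null && !r.userId().isEmpty())
                .count();
    }

    // How many distinct programmers have set a user id?
    static long distinctUserIds(List<TestRun> runs) {
        return runs.stream()
                .map(TestRun::userId)
                .filter(id -> id != null && !id.isEmpty())
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration.
        List<TestRun> runs = List.of(
                new TestRun(120, 120, 3400, "alice"),
                new TestRun(120, 118, 3500, "alice"),
                new TestRun(45, 45, 900, null),
                new TestRun(300, 299, 8100, "bob"));
        System.out.println(runsWithUserId(runs) + " of " + runs.size() + " runs set a user id"); // 3 of 4 runs set a user id
        System.out.println(distinctUserIds(runs) + " distinct user ids"); // 2 distinct user ids
    }
}
```

With the whole data set in memory, each new question is just another stream pipeline, with no request deadline to work around.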

Which leads to the second attribute of AppEngine that has proven to be a problem: time limits. All processing in AppEngine is triggered by requesting a URL, and requests can’t take longer than 5 seconds to process. Ad hoc queries, of the kind required to quickly answer “How many people are setting their user ids?”, can’t run in 5 seconds, at least not the way I write them (I’m using JDO as an interface). Conceptually simple operations, like setting a default value for a new data column in existing rows, turn into a game: write a servlet that queries for rows that don’t have the column set, set as many rows as possible before timing out (this has turned out to be 30-40 rows for simple operations), and then schedule a cron job to run the servlet once a minute until done. An operation that could have been done manually in a second turns into an hour of messing about.
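The backfill game described above can be sketched as a time-budgeted batch job that a scheduler calls repeatedly until nothing is left to update. This is a plain-Java simulation, not AppEngine or JDO code: `Row`, `backfillOnce`, and the `maxRows` cap are hypothetical names, the in-memory list stands in for the datastore query, and the 35-row cap and 5-second budget merely echo the figures in the paragraph.

```java
import java.util.ArrayList;
import java.util.List;

public class BackfillSketch {
    // Hypothetical stored entity with a newly added column; null means "not yet backfilled".
    static final class Row {
        Integer newColumn;
    }

    // One invocation of the hypothetical backfill servlet: walk rows lacking the column,
    // set as many as the row cap and time budget allow, then stop. Returns rows updated.
    static int backfillOnce(List<Row> rows, int defaultValue, int maxRows, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        int updated = 0;
        for (Row row : rows) {
            if (updated >= maxRows || System.nanoTime() >= deadline) {
                break; // leave the remaining rows for the next scheduled run
            }
            if (row.newColumn == null) {
                row.newColumn = defaultValue;
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            rows.add(new Row());
        }
        // Simulates the once-a-minute cron job: keep invoking until a run finds nothing to do.
        int productiveRuns = 0;
        while (backfillOnce(rows, 0, 35, 5_000) > 0) {
            productiveRuns++;
        }
        System.out.println("Backfill finished after " + productiveRuns + " productive runs"); // 3
    }
}
```

Locally, the same job would be a single loop over every row; the cap, the deadline check, and the outer scheduler exist only to fit inside the request limit.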

Since Max is still young, the user base is small and the data involved is also small. This negates one of the great advantages advertised for AppEngine: scale. While I’m sure that transparent scaling would be great, I don’t have that problem. If I can’t find a way to experiment faster, I fear I will never have that problem.

The frustrating part is that I’m sure that if I had the data in, say, Smalltalk I could do all this experimentation at lightning speed. The data fits in memory. If I wanted to process it to understand user behavior I could do so in a second. If I wanted to produce an experiment like this one, which shows the time lapse between green test runs (thanks to David Saff for being an example), I could code it in minutes. Even if a product idea required processing the entire data set on every hit, I could do it. As it is, I have to scale back my ideas to what AppEngine can accomplish by fetching a handful of records.
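As an illustration of how small the “time lapse between green test runs” computation becomes once the data is in memory, here is a minimal sketch. It assumes a hypothetical `Run` record holding a finish timestamp and a flag for whether every test passed, and it assumes runs arrive sorted by finish time; the names and sample timestamps are illustrative, not taken from Max.

```java
import java.util.ArrayList;
import java.util.List;

public class GreenIntervals {
    // Hypothetical minimal run record: when it finished, and whether all tests passed.
    record Run(long finishedAtMillis, boolean green) {}

    // Given runs sorted by finish time, return the elapsed time between consecutive green runs.
    static List<Long> gapsBetweenGreenRuns(List<Run> runs) {
        List<Long> gaps = new ArrayList<>();
        Long lastGreen = null;
        for (Run run : runs) {
            if (!run.green()) {
                continue; // red runs don't start or end an interval
            }
            if (lastGreen != null) {
                gaps.add(run.finishedAtMillis() - lastGreen);
            }
            lastGreen = run.finishedAtMillis();
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(
                new Run(0, true),
                new Run(40_000, false),
                new Run(95_000, true),
                new Run(200_000, true));
        System.out.println(gapsBetweenGreenRuns(runs)); // [95000, 105000]
    }
}
```

A single pass over one programmer’s runs is enough; under a per-request deadline, the same question would need to be broken into pages and stitched back together.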

Where next?

I am emphatically not anti-AppEngine in general. Once I work out which features people will pay for and how I can incrementally process data to deliver those features, I’m sure it will be a cheap way to scale quickly. That is, AppEngine supports climb out and level flight just fine. On the runway, though, its limitations are potentially fatal. I’m now looking for an environment for my data that encourages experimentation and is still cheap and simple to operate. I’m looking forward to the day when moving back to AppEngine is a way to solve my scaling problems.

Now if I can just figure out how to liberate my data 5 seconds at a time…
