Google Application Engine

Back in March I speculated that Google was soon to announce a third party service platform. Well, on the evening of April 7th, Google Application Engine was announced. It’s been heavily covered over the last couple of weeks and I’ve been waiting to get a beta account so I can write some code against it. I’ve not yet got an account but Sriram Krishnan has been playing with it and sent me the following excellent review.

· Guest book development video: Developing and deploying an application on Google App Engine (9:29)

· Techcrunch: Google Jumps Head First Into Web Services With Google App Engine.

· Google App Engine Limitations: evan_tech.

· What’s coming up: We’re up and Running!

· High Scalability: Google App Engine – A second Look

Sriram’s review of Application Engine.

Overall

It’s well designed from end to end, builds on a good ecosystem of tools, most scenarios for a typical web 2.0 app are covered. If I were to ever get into the Facebook-app writing business, AppEngine would be my first choice. However, any startup which requires code to execute outside the web request-reply cycle is out of luck and would need to use EC2.

The mailing list is overflowing so there is obviously huge community interest and lots of real coders building stuff.

The datastore is a bit wonky for my taste. It neither fits into SQL/RDBMS nor the clean spreadsheet model of Amazon SimpleDb – it’s a ORM with some querying thrown-in and that leads to some abstraction leakages . The limitations on queries are going to take a bit of getting used to since they’re not intuitive at all(they only support queries where they can scan the index sequentially for results, the choice of datatype is not straightforward). The datastore was the area where I found myself consulting the docs most frequently.

Python-only is probably a big con at the moment. I’m a big Python fan but its pretty apparent that a lot of people want PHP and Ruby. However, when you poke around the framework, it is pretty apparent that the framework is built to be language agnostic and that the creators had support for other languages in mind from the beginning.

Lack of SSL support, unique IPs per app instance are other problems. The latter really kicks in when you’re calling other Web 2.0 APIs. A lot of them do quota calculations based on IP address and this wont work when you’re sharing your IP with a bunch of other apps. Lack of SSL support is not a blocker (since you can use Google’s inbuilt authentication system) but will block any non-serious app.

The beta limits are too conservative and they are too aggressive in enforcing them – they kept nuking my benchmarking apps for relatively short bursts of activity (more on that later). This really makes me hesitate to put anything non-trivial on AppEngine. If I were them, I would loosen up these limits or get customers to pay a bit extra for more CPU/network slices

The Web Framework

I’m familiar with Python and Django so I’m probably not the best person to judge the learning curve. It’s very clean and usable (I like it much better than ASP.NET) and I found myself being reasonable productive within a few minutes.

There are also put hooks in so that you can use almost any Python framework of your choice with a bit of work – you’re not stuck to the one provided. On the mailing list, there’s a lot of activity around porting other frameworks (pylons, web.py, cherrypy, etc) to AppEngine. If it were up to me, I would be using Aaron Swartz’s web.py but that is more a stylistic personal preference.

Python was not originally designed to be sandboxed so Google had to make some major cuts to make it ‘safe’ – they don’t allow opening sockets for example. This has caused a lot of open source Python code to stop working – essential libraries like urllib (the equivalent of .net’s HttpWebRequest) need some porting work.

The tools support is a bit sparse – debugging is mostly through printf/exception stack traces However, what it lacks in tooling is made up for in the speed of its edit cycle – just edit a .py file and then refresh the page.

Some people are going to have trouble getting used to the lack of sessions but I think the pain will be temporary (some people have started working on using the datastore as a Django session store to session state). From my limited testing, I didn’t see much machine affinity – Google seems happy to spin up processes on different machines and kill them the moment they finish serving the request.

The Datastore

You specify your data models in Python and there’s some ORM magic that takes place behind the scenes. They have a few inbuilt data types and you can use expando (dynamic) properties to assign properties at runtime which haven’t been defined in your model. Data schema versioning is a big question-mark at the moment – if I were Google, I would look into supporting something like RoR’s migrations

Querying is done through a SQL-subset called GQL on specifically defined indexes. For a query to succeed, the query must be supported by an index and the scan needs to find sequential results and this puts some restrictions on the kinds of queries you can execute (you can’t have inequality operators on more than one attribute, for example). Several indexes are auto-generated and you can request others to be created.

They appear to auto-generate several indexes.

Entities can be grouped together through ReferenceProperties into groups. Each group is stored together. Queries within one group can be bunched together into a transaction (everything is optimistic concurrency by default). Bunching together lots of entities into one group is bad since Google seems to do some sort of locking on the entity group – the docs say some updates might fail.

No join support. Like SimpleDb, they suggest de-normalization.

The datastore tools are sparse at the moment. I had to write code to delete stale data from my datastore since the website would only show me 20 items at a time.

All the APIs (the datastore, user auth, mail) are offered through Google’s internal RPC mechanism. Google calls the individual RPC messages protocol buffers and all the AppEngine APIs are implemented using the afore-mentioned stub generators (this is what you get with the local SDK as well).

Benchmarks

This section is woefully short – it is very hard to run benchmarks since Google will keep killing apps with high activity. Here’s what I got

Gets/puts/deletes are all really fast. I benchmarked a tight loop running a fixed number of iterations, each query operating on a single object or retrieving a single object (which I kept tuning to avoid hitting the Google limits). Each averaged 0.001 s(next to nothing – almost noise).

Turning up the number of results to retrieve meant a linear increase in numbers. I inserted multiple entities with just a single byte in each to have the least possible serialization/de-serialization overhead. For 50 results, the query execution time was around 0.15s, for 100, around 0.30s and so on. I saw a linear increase all the way until I hit Google’s limits on CPU usage.

I can’t measure this correctly but a ballpark guesstimate is that Google nukes your app if you use up close to 100% CPU (by running in a tight loop like I did) for over 2 seconds for any given request. For every app, they tell you the number of CPU cycles used (a typical benchmark app cost me around 50 megacycles) and I think they do some quota calculations based on megacycles used per second.

Overall, perf seems excellent but I would worry about hitting quota limits due to a Digg/Slashdot effect. I plan on trying out some more complex queries and I’ll let you know if I see something weird.

The Tools

The dashboard is excellent. Gives you nice views on error logs, what’s in the datastore, usage patterns for all your important counters (requests, CPU, bandwidth, etc)

Good end-to-end flow for the common tasks – registering a domain and assigning it to your application, managing multiple versions of your app, looking at logs,etc.

James Hamilton, Windows Live Platform Services
Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.