The SmugMug Tale

Don MacAskill gave one of his usual excellent talks at MySQL Conf 09. My rough notes follow.

Speaker: Don MacAskill

Video at:

· SmugMug:

o Bootstrapped in ’02 and still operating without external funding

o Profitable and without debt

o Top 400 website

o Doubling yearly

· SmugMug Challenge:

o Users get unlimited storage & bandwidth

o Photos up to 48Mpix (more than 500m)

o Video up to 1920x1080p

· 300+ four core hosts (mostly diskless)

o Mostly AMD but really excited by Intel Nehalem [JRH: so am I]

· 5 datacenters (3 in Silicon Valley, 1 in Seattle, and 1 in Virginia) [JRH: corrected from 4 to 5 — thanks Modesto Alexandre]

· Only 2 ops guys

· Lots of AWS use (Simple Storage Service, Elastic Compute Cloud, etc.)

· Service deployment model: servers automatically load their config from a central role database. On reboot, the configured role is loaded. Role change is a DB update followed by a reboot. [JRH: very nice]
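A minimal sketch of the role-database idea, not SmugMug's actual tooling: the table name `host_roles`, the `spare` default role, and the use of SQLite as a stand-in for the central database are all assumptions made for illustration.

```python
import sqlite3


def open_role_db(path=":memory:"):
    """Open the (hypothetical) central role database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS host_roles ("
        "hostname TEXT PRIMARY KEY, role TEXT NOT NULL)"
    )
    return conn


def load_role(conn, hostname, default="spare"):
    """Called at boot: return the role configured for this host, or a default."""
    row = conn.execute(
        "SELECT role FROM host_roles WHERE hostname = ?", (hostname,)
    ).fetchone()
    return row[0] if row else default


def change_role(conn, hostname, role):
    """A role change is only a DB update; it takes effect on the next reboot."""
    conn.execute(
        "INSERT OR REPLACE INTO host_roles (hostname, role) VALUES (?, ?)",
        (hostname, role),
    )
    conn.commit()
```

Because the hosts are mostly diskless, keeping the role in one place means a machine carries no local state that has to be migrated when it is repurposed.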

· Binary data all stored in Amazon S3 (PB of data at this point)

· Akamai for content distribution network

· Structured data

o MySQL (InnoDB mostly)

o Scaled up and out using cheap multi-core CPUs with lots of memory

o 4+ cores, 64GB memory, >2TB storage

· Heavy use of MemcacheD (over 1TB of memory)

o Over 96% hit rate and fall back to MySQL for cold data access

o Been using it since first released 4 to 5 years back
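The hit-then-fall-back pattern described above can be sketched as a look-aside cache. This is an illustration only: a plain dict stands in for a memcached client, and `load_from_db` is a hypothetical callable standing in for the MySQL lookup.

```python
class LookAsideCache:
    """Check the cache first; on a miss, read from the database and populate the cache."""

    def __init__(self, load_from_db):
        self._cache = {}                  # stand-in for a memcached client
        self._load_from_db = load_from_db  # hypothetical MySQL lookup function
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cold data: fall back to the database and remember the answer.
        self.misses += 1
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

At a 96% hit rate, only about 4 in every 100 reads reach MySQL, which is why the cache matters so much for keeping the database cool.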

· Compute:

o Amazon EC2 for photo and video processing and encoding

o Depend upon EC2 for scaling up at high-traffic times and, more importantly, for scaling down at low-traffic times such as the middle of the night (SmugMug is predominantly a North American service at this point). Tens of cores during scale-down periods; hundreds if not thousands of cores during scale-up periods

§ Totally autonomous scaling up and down using SkyNet (written by SmugMug)
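The talk did not go into how SkyNet decides when to scale, so the following is only a sketch of one plausible approach: size the fleet from the depth of the processing queue. Every name and parameter here (`desired_cores`, `jobs_per_core_per_min`, `target_drain_min`, the 10 and 1000 core bounds) is an assumption chosen to match the "tens to thousands of cores" range mentioned above.

```python
import math


def desired_cores(queued_jobs, jobs_per_core_per_min, target_drain_min,
                  min_cores=10, max_cores=1000):
    """Cores needed to drain the queue within the target time, clamped to bounds."""
    if jobs_per_core_per_min <= 0 or target_drain_min <= 0:
        raise ValueError("rates and drain targets must be positive")
    needed = math.ceil(queued_jobs / (jobs_per_core_per_min * target_drain_min))
    return max(min_cores, min(max_cores, needed))


def scaling_action(current_cores, target_cores):
    """Translate the gap between current and target capacity into an action."""
    delta = target_cores - current_cores
    if delta > 0:
        return ("launch", delta)
    if delta < 0:
        return ("terminate", -delta)
    return ("hold", 0)
```

The clamp at the low end is what delivers the savings: when the queue empties overnight the target falls back to the floor and the surplus instances are terminated.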

· Web Servers:

o Diskless with PXE boot

· MySQL:

o Most important technology in use at SmugMug

o Super dependent on replication for performance, reliability, and high availability

o No data loss in over 7 years

o No joins or other 4.x+ features

§ Like the Drizzle project since it refocuses MySQL on the core they actually use: lean and mean.

o Vertically partitioned. They have looked at sharding several times but have always managed to find a way to avoid it so far

· InnoDB

o Running 1.0.3+ patches (Percona XtraDB) in production (great for concurrency bound issues)

§ Great relationship with Percona (“Crazy concentration of talent under 1 roof”), which provides their MySQL support

· MySQL Details:

o Data integrity is number 1 issue

o Next most important is write latency since scaling reads is relatively easy.

o Replication kept at less than 1sec behind

o Big RAM (64GB+) to keep indexes in memory

o Previously had many concurrency issues (better now).
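One way to act on the "less than 1 sec behind" target is to check replica lag before routing reads to a replica. This is a sketch, not SmugMug's method: it assumes `status_row` is a dict built from one row of MySQL's `SHOW SLAVE STATUS` output, whose `Seconds_Behind_Master` column is NULL when replication is not running. The function name and threshold parameter are hypothetical.

```python
def replica_is_fresh(status_row, max_lag_seconds=1):
    """Return True only if the replica is running and lagging less than the limit."""
    lag = status_row.get("Seconds_Behind_Master")
    if lag is None:
        # NULL means replication is stopped or broken: treat the replica as stale.
        return False
    return int(lag) < max_lag_seconds
```

Reads that must see the latest write would go to the master whenever this returns False.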

· MySQL Usage:

o Not very relational. Mostly a key-value store

o Very denormalized

o No joins or complex selects

o 96% MemcacheD hit rate to cool MySQL
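To make "mostly a key-value store" concrete, here is a small sketch of the access pattern: one table, primary-key lookups only, and a denormalized record that copies in whatever a join would otherwise fetch. SQLite stands in for MySQL, and the table name `photo_kv` and the record fields are invented for illustration.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory stand-in for a key-value style MySQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photo_kv (photo_id INTEGER PRIMARY KEY, doc TEXT NOT NULL)"
    )
    return conn


def put_photo(conn, photo_id, doc):
    """Write the whole denormalized record under its key."""
    conn.execute(
        "INSERT OR REPLACE INTO photo_kv (photo_id, doc) VALUES (?, ?)",
        (photo_id, json.dumps(doc)),
    )
    conn.commit()


def get_photo(conn, photo_id):
    """Single primary-key lookup: no joins, no complex selects."""
    row = conn.execute(
        "SELECT doc FROM photo_kv WHERE photo_id = ?", (photo_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

The trade-off is that copied fields such as an album title must be rewritten in every record that carries them, which is acceptable when reads vastly outnumber writes.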

· MySQL Issues:

o Need a better filesystem:

§ They use the CentOS Linux distro

§ MySQL is storage intensive (IOPS & capacity)

§ Ext3 is broken and sucks. Fsck sucks as well

§ Ext4 is also old and busted

§ Want good volume management

§ Ext3 serialized writes to a given file

§ Love ZFS

· Transactional, copy-on-write, end-to-end data integrity, on the fly corruption detection and repair, integrated volume management, snapshots and clones supported, and open source software

· Unfortunately ZFS doesn’t run on Linux and SmugMug is a Linux shop

o Replication:

§ Unknown state on crash

§ Did *.info get written at commit or 2 months out of date (in one instance)?

· Transactional replication to the rescue

§ Bringing up TB+ slaves is slow

§ Backups using LVM/ZFS a pain

§ Single thread for replication can fall behind

§ Transactional replication patches from Google are GREAT and solve these issues

· InnoDB only

· Taking these patches to production next week.

· Sun Sushi Toro aka S7410

o NAS box with a few twists:

§ 2x quad-core Opterons with 64GB RAM

§ 100GB Readzilla SSD

§ 2x 18GB Writezilla SSD (20k write IOPS)

§ 22x 1TB 7200 RPM HDD

§ Clustered for HA

§ SSD performance with HDD economy

§ Toro supports ZFS on Linux

§ Can access using: NFS, iSCSI, CIFS, HTTP, FTP, etc.

§ Supports compression (1.5 compression ratio on their workload)

§ Cost: $80k ($142k clustered) – nobody pays list price though

§ SmugMug has 5 of these devices

§ 5 different MySQL workloads hosted on a single shared cluster

§ Backups are a breeze (great snapshot support with roll back)

· Rollback can selectively skip operations

· Investigating 10GigE and actively testing

o Intel NICs with Arista switches at less than $500/port

o Using copper twinax SFP+

· Expect 100% SSD in the future (not for bulk data)

· Excited about Drizzle (scaled down MySQL)

· Request from Oracle:

o MySQL is a crown jewel – take care of it

o GPL ZFS (lots of applause)

James Hamilton, Amazon Web Services

1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859

2 comments on “The SmugMug Tale”
  1. Have you guys seen the photos of the S7410 system in use at SmugMug?
    And here’s a David Lutz editorial that should interest everyone:

    MySQL Performance on Sun Storage 7000

    If you saw Don MacAskill’s keynote (The Smugmug Tale) at the recent MySQL Conference and Expo, you know that he had lots of positive things to say about his experience running MySQL on the Sun Storage 7410 at Smugmug. The 7410 marks the current high end of the Sun Storage 7000 Unified Storage line of network attached storage devices. For the past few months, I have been investigating MySQL database performance on the entry level and mid-range models, to see whether they might provide similar benefits to smaller MySQL sites. I have to admit that I was more than a little surprised at just how well things performed, even on the entry level 7110. For the whole story, read on… (Full article here: )

  2. Greg Linden says:

    That was a great talk by Don. Very impressive how he has scaled SmugMug using a variety of technologies, including heavy use of Amazon S3 and EC2.

    By the way, it turns out the slides from the talk are also available here

    but the talk really is much better with the additional context Don gives in the video.

    James, have you heard much about the Sun Toro device Don is raving about? A blend of SSD and terabyte disk? Don seems to love it and seemed to claim SSD performance with hard disk capacities using it.
