James Hamilton's Blog
 Saturday, November 03, 2007

Last week Hillary Clinton presented at Microsoft to a sold-out crowd of roughly 2,000 people.  Jennifer Hamilton attended and sent her notes my way.

 

                                                                --jrh

 

o    About 2000 people

o    Speech similar to one given on Monday night with a bit more of a technology focus

·         US has always been the "Innovation Nation"--a hallmark of how country was founded and has grown

·         Can't assume it will stay that way--have to ask the hard questions and build

·         Don't think we're doing a good job--want to seize the mantle of innovation

·         Important not just for our industry but for the country

·         Innovation has fueled the opportunities of those born here and those who came here

o    4 big goals:

1.       Restore American leadership in the world

2.       Rebuild a strong and prosperous middle class

3.       Reform government to competence and to be results-oriented

4.       Reclaim the future for our children and our dreams

o    For each of the four goals, she has set specific goals for what she would do as president

o    Spoke of Sputnik being a defining moment in her childhood

·         At that time America was the leader in everything

·         Then Sputnik called that into question

·         Had a Republican president who didn't blame the Democrats but went after the problem

·         Wants to do that same sort of thing

1.       Restore American leadership in the world

·         Partly it's Iraq, but this is not the only international problem the next president will inherit

·         Our strategic/economic/innovation position is eroding -- Clinton will restore the bi-partisan balance and end an era of "cowboy diplomacy"

·         Can't be a leader if no-one is following

o   All the problems we have (global warming, global terrorism, global economics) we can't solve on our own

2.       Rebuild a strong and prosperous middle class

·         Economy has worked well for some of us, but hasn’t for many.

·         People struggling to maintain middle-class lifestyle.

·         Feel invisible to their government.

·         Feel they're standing on a trap-door--one misstep from disaster

·         Environmental a big part: we import more foreign oil post-9/11 than before

·         Take away tax-subsidy from oil companies to put towards alternative energies

·         Health-care (joked it’s an issue she has a "little experience in")--need a system of shared responsibility and choices

o    Insurance companies will have to change--she's offering them a new business model--they've made a lot of money not insuring people

·         50B spent in underwriting to avoid coverage plus more unproductive costs arguing on coverage

·         Big push towards electronic records for medical records

·         One of big problems in Katrina is how many records were lost

·         Wants to create a framework to give us private, confidential, secure electronic records

o    Also need to pay for prevention--insurance companies won’t

o    And manage chronic conditions

o    All added up will reduce costs and cover everyone

·         Improve education--it hasn't advanced either

o    Need to make college affordable and offer cheaper loans

o    Harder to go to university than 30 yrs ago

o    75% of students are from top 25% of income

o    Only 3% from bottom 25%

3.       Reform government to competence and to be results-oriented

·         We have been building a two-tier system

·         Tax system tilted towards top income

·         US was #1 for internet access 6 yrs ago--now 14th-25th depending on survey

·         Got to end Bush's muzzling of science

o    As president first thing will do is issue executive order to not interfere with science and lift the ban on ethical stem-cell research

·         End cronyism and appoint qualified people--re Katrina

4.       Reclaim the future for our children and our dreams

·         Don't want to be part of 1st gen of Americans who leave their country worse than when they found it.

·         Thrilled at idea of being first woman president, but not running because she is female. She is running because she feels she is the best-qualified

·         Not interested in all the personal attacks--is an expert on them--has been on the receiving end for over 15 yrs--they won't educate a child

·         Wants people to think that our best years are still ahead of us

 

James Hamilton, Windows Live Platform Services
Bldg RedW-C/1279, One Microsoft Way, Redmond, Washington, 98052
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
JamesRH@microsoft.com

H:mvdirona.com | W:research.microsoft.com/~jamesrh  | Msft internal blog: msblogs/JamesRH

 

Saturday, November 03, 2007 12:37:26 PM (Pacific Standard Time, UTC-08:00)
Ramblings
 Thursday, November 01, 2007

Shankar Pal of SQL Server went to VLDB this year and passed his notes my way.  Find them here: http://www.mvdirona.com/jrh/perspectives/content/binary/ShakarPal_VLDB2007.docx.

 

Key points from my perspective:

·         Werner Vogels

o   Amazon is able to lose a data center without missing SLA (note that this would also allow them to bring down a data center for service and implies they don’t need backup power or other datacenter-level redundancy – this can potentially save 20% of the total cost of a data center. I don’t know if they are exploiting this capability)

o   SLAs are two-way: a commitment to deliver a certain quality of service one way and a commitment the other way to deliver no more than a specified load

o   Amazon has implemented their services as a cluster of services. Services can scale up a single node at a time (elastic computing).  All data access is through the services. 

o   Repeats Stonebraker’s “One size doesn’t fit all” in databases.

·         Eric Brewer:

o   Founder of Inktomi and Berkeley DB researcher

o   Discussed work he is doing in the third world.

·         Surajit Chaudhuri and Vivek Narasayya presented a retrospective on self-tuning database management systems

·         Michael Stonebraker

o   Presented “the end of an era: It’s time for a rewrite” and essentially argued that the current set of “elephants” in DB2, SQL Server, and Oracle are optimized for OLTP in a small memory world.  Outside of OLTP, these products are a poor fit and, even in OLTP, large memory systems make their disk I/O optimizations much less relevant.

o   He argues to get rid of redo log by keeping many different copies (I’m not ready to get rid of the redo log but I totally agree on the base point)

o   Mike still doesn’t buy that eventual consistency is the right model for high scale distributed systems
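
The two-way SLA point from the Werner Vogels notes above can be sketched in a few lines: the provider's latency promise only binds while the caller stays within the agreed load. This is a minimal illustration under my own assumptions, not Amazon's implementation; the class name `TwoWaySla`, its fields, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoWaySla:
    """Hypothetical two-way SLA: provider promises latency, caller promises load."""
    max_latency_ms: float      # provider's commitment
    max_requests_per_s: float  # caller's commitment

    def provider_in_violation(self, observed_latency_ms: float,
                              observed_requests_per_s: float) -> bool:
        # The latency promise only binds while the caller keeps its side of the deal.
        caller_within_load = observed_requests_per_s <= self.max_requests_per_s
        return caller_within_load and observed_latency_ms > self.max_latency_ms


sla = TwoWaySla(max_latency_ms=300.0, max_requests_per_s=500.0)
print(sla.provider_in_violation(450.0, 400.0))  # slow while under the agreed load
print(sla.provider_in_violation(450.0, 900.0))  # slow, but the caller exceeded its load
```

The second call is not counted against the provider because the caller broke its half of the agreement first.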

 

The conference proceedings are at:

http://sqlserver/projects/clouddb/Conferences/Forms/AllItems.aspx?RootFolder=%2fprojects%2fclouddb%2fConferences%2fVLDB%20%28Int%27l%20Conf%20on%20Very%20Large%20Data%20Bases%29%202007%2fVLDB%202007%20Proceedings%2fVLDB%202007%20Proceedings&FolderCTID=&View=%7b3CB64B8A%2d1B85%2d45AE%2d91B6%2d4063E886D023%7d

 

                                                --jrh

 


 

ShakarPal_VLDB2007.docx (29.92 KB)
Thursday, November 01, 2007 4:52:39 AM (Pacific Standard Time, UTC-08:00)
Software
 Tuesday, October 30, 2007

Jacek Becla of the Stanford Linear Accelerator Center (SLAC http://www.slac.stanford.edu/) team held a one-day workshop on October 25th focused on Extremely Large Databases (http://www-conf.slac.stanford.edu/xldb07/). The goal was to look at “practical issues related to extremely large databases that push beyond the current commercial state of the art”.  SLAC has built some enormous DBs in the past including BaBar (currently over 2 petabytes), and they are working on the Large Synoptic Survey Telescope (http://www.lsst.org/lsst_home.shtml) that is expected to produce an O(100) petabyte DB.

 

The 57 attendees came from industry (Google, Yahoo!, Microsoft, IBM, MySQL, Oracle, Teradata, Vertica, Objectivity, Greenplum), academia (Stonebraker and DeWitt), and national labs (PNNL, LLNL, and Oak Ridge), and also included many astronomers and high-energy physicists. The attendee list is here: http://www-conf.slac.stanford.edu/xldb07/listAllParticipants.asp.

 

What I found most notable is that, 5 years ago, high-energy physics and astronomy had the largest databases in the world by several orders of magnitude, whereas today industry appears to be catching up. For example, the AT&T call detail and email IP storage is over 1.2 petabytes.

 

The second thing I found interesting was the Google presentation, where they talked about the MySQL cluster behind the advertising system.  I found this interesting for two reasons: 1) they are using MySQL rather than Bigtable, and 2) they have two engineers full time on InnoDB maintenance.  The Google speaker didn’t give firm numbers but said there were “100s to 1000s of servers” in the MySQL cluster and the cluster is both geo-distributed and geo-redundant.  5 full-time DBAs manage the cluster.

 

My rough notes from the workshop follow.

 

Examples of Future of Large Scale Scientific Databases:

·         Large Hadron Collider (LHC) -- Dirk Duellmann, CERN IT

·         15 PB each year (after discarding through filtering)

o   Metadata is roughly 1/3 of a petabyte

·         100s of thousands of CPUs for analysis

·         200 computer centers with 12 large centers

·         Most of the data filesystem based

o   1995 Object DBs

o   2001 forward: OODBs dying so move to RDBMS + files

o   Using Oracle and MySQL for metadata with all data in filesystem

o   <5% metadata

o   Note that the mixed model is much more administratively expensive.

o   Using compression

·         Focus today is on data management rather than analysis but the collider is not yet live so this may change.

·         Dirk really wants more data under database management

·         Analysis phase never overwrites (read-only; produces new data)

·         OS RHEL 5

·         CERN and Tier 1 storage in Oracle RAC 10g (4-way), Tier 2 is MySQL & SQLite

o   100 MB/s of I/O per cluster at present (expect to be able to grow 5x as it goes live)

o   0.5 TB RAM

o   Moving to quad core and 64bit

o   5 DBA for all the storage

o   3 9s achieved

·         Deployment issues

o   Power and UPS

o   Increasing CPU power per box and more disk per server

o   JBOD and Oracle ASM today

o   Many hardware problems in commodity systems

o   Oracle patching issues (some security patches don’t support rolling upgrade – getting better)

o   Global system monitoring is difficult

o   Software licensing and not all sites upgrading at the same time.

o   Note: DB is exposed directly to internet

·         During analysis:

o   DBs get in the way and B-trees don’t help much

o   Typical queries “select … where v1>4 and v2>5 and … and v99>3”

§  Bit map indexes would help but very space intensive

o   Large input data sets (>10^9)
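
The bitmap-index remark above can be made concrete with a small sketch. It assumes one bitmap per (column, threshold) predicate, stored as a Python integer with bit i set when row i satisfies the predicate. The column names, thresholds, and rows are invented for illustration and are not from the CERN talk.

```python
from typing import Dict, List

# Hypothetical rows with numeric attributes v1, v2, v3.
rows: List[Dict[str, int]] = [
    {"v1": 5, "v2": 6, "v3": 1},
    {"v1": 3, "v2": 9, "v3": 7},
    {"v1": 8, "v2": 7, "v3": 4},
    {"v1": 9, "v2": 2, "v3": 9},
]


def build_bitmap(column: str, threshold: int) -> int:
    """Return an int whose bit i is 1 when rows[i][column] > threshold."""
    bitmap = 0
    for i, row in enumerate(rows):
        if row[column] > threshold:
            bitmap |= 1 << i
    return bitmap


def matching_rows(bitmaps: List[int]) -> List[int]:
    """AND the predicate bitmaps together and list the surviving row ids."""
    combined = (1 << len(rows)) - 1  # start with every row selected
    for bitmap in bitmaps:
        combined &= bitmap
    return [i for i in range(len(rows)) if (combined >> i) & 1]


# Equivalent of: where v1 > 4 and v2 > 5 and v3 > 3
hits = matching_rows([
    build_bitmap("v1", 4),
    build_bitmap("v2", 5),
    build_bitmap("v3", 3),
])
print(hits)
```

Each extra conjunct costs only one more bitwise AND, which is why bitmaps suit these many-column predicates. The space concern comes from keeping a bitmap per predicate per column across a very large number of rows.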

 

Extremely Large Database in Astronomy – LSST Kian-Tat Lim

·         Large Synoptic Survey Telescope (will be placed in Chile)

o   Will cover the entire night sky twice a week

·         Assets:

o   8.4m mirror

o   3.2 gigapixel camera (wow!)

·         Looking for dark matter and energy

·         Store images in FS and metadata in DBs

·         Most of the data is append only

·         How big when completed:

o   49B objects

o   2.8 trillion sources

o   Expected to hit 14PB by 2024 (5.5PB data/rest indices)

o   2669 columns/object (growing) [object is astronomical object]

o   56 columns/source  [source is a particular observation of an astronomical object]

o   Believe that the system is comparable to commercial systems in complexity and size (assuming commercial systems continue to grow)

·         Raw data is never modified (derived databases are updated); it is constantly reprocessed with detailed provenance tracking

·         Plan to release a new data version (raw data and current processing) once a year.

·         Note ½ the science will come out from real time alerts of changes (10 to 60 second latency)

·         Expected to upgrade systems constantly so want portable code and preferably open source

·         Three replicated data access centers and one archive center (geo-distributed copies)

·         Lots of select * access so not clearly a win to go column oriented

·         Execution plan:

o   Map/reduce over DBs and FSs

·         Want:

o   procedural primitives (stored procs) in the DB

o   Relax consistency requirements

o   Wants fault tolerant software rather than expensive big iron
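
A minimal sketch of the "raw data is never modified, only reprocessed with provenance" pattern described above. The names `RawStore`, `Derived`, and `pipeline_version` are hypothetical and are not taken from the LSST design; the measurements are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Derived:
    """A derived value plus the provenance needed to reproduce it."""
    value: float
    raw_ids: Tuple[int, ...]   # which raw records fed the computation
    pipeline_version: str      # which processing code produced it


class RawStore:
    """Append-only store: raw records can be added and read, never changed."""

    def __init__(self) -> None:
        self._records: List[float] = []

    def append(self, measurement: float) -> int:
        self._records.append(measurement)
        return len(self._records) - 1  # id of the new raw record

    def read(self, raw_id: int) -> float:
        return self._records[raw_id]


def reprocess(store: RawStore, raw_ids: Tuple[int, ...],
              pipeline: Callable[[List[float]], float], version: str) -> Derived:
    # Reprocessing reads raw data and emits a new derived record; nothing is overwritten.
    value = pipeline([store.read(i) for i in raw_ids])
    return Derived(value=value, raw_ids=raw_ids, pipeline_version=version)


store = RawStore()
ids = tuple(store.append(m) for m in (10.0, 12.0, 14.0))

v1 = reprocess(store, ids, lambda xs: sum(xs) / len(xs), "v1-mean")
v2 = reprocess(store, ids, lambda xs: sorted(xs)[len(xs) // 2], "v2-median")
print(v1)
print(v2)
```

Because every derived record names its inputs and pipeline version, a yearly data release can be regenerated from the untouched raw records.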

 

Academic Panel Notes:

·         55PB image data in FS & 20 PB metadata in DB

·         Computation does not fit into DB support today

o   Model: select data, do pixel-based calcs (as much as 10^11 calcs per data point)

·         Most data is write-once, read-many (metadata like averages does get updated)

·         Need support for:

o   Spatial types (native rather than extender support)

o   Vector types

o   Array types

o   Often approx queries would be helpful to test hypothesis

·         Data access distribution:

o   Statistical astronomy: want to dig into large portion of all data

o   10^10 objects scanned to find data region of interest
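
The request above for approximate queries to test a hypothesis can be illustrated with a uniform random sample standing in for a full scan. This is a sketch on synthetic data; `approx_mean` and the `brightness` column are hypothetical names, and the distribution parameters are arbitrary.

```python
import random
from typing import List, Sequence


def approx_mean(values: Sequence[float], sample_size: int, seed: int = 0) -> float:
    """Estimate the mean from a uniform random sample instead of a full scan."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(values, sample_size)
    return sum(sample) / len(sample)


# Synthetic stand-in for a large catalog column (values are made up).
data_rng = random.Random(42)
brightness: List[float] = [data_rng.gauss(20.0, 2.0) for _ in range(100_000)]

exact = sum(brightness) / len(brightness)
estimate = approx_mean(brightness, sample_size=1_000)
print(f"exact={exact:.3f} estimate={estimate:.3f}")
```

Reading 1% of the rows is usually enough to sanity-check an aggregate. It does not help with rare events, which is the point the industry panel makes about needle-in-haystack searches.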

 

Industry panel notes:

Google

·         MySQL DB used in advertising (traditional database application)

·         Shard and replicate

·         100s to 1000s of systems in clusters

·         QPS is incredibly large

·         Commodity hardware

·         Constrain the query model to allow scale-out

·         They have a couple of engineers on InnoDB engineering at Google

·         95% of load from querying (not transactional load) – very replicatable load

·         Geo-replicated and within the data center replicated for scaling

·         5 DBAs on this project (RDBMSs are not loved at Google)

·         Said that they need a SQL DB for OLTP apps … Bigtable is not appropriate for this. Need real-time replication into Bigtable

·         Bigtable is used for analysis.
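
The "shard and replicate" and "constrain the query model" points above can be sketched as a toy key-value store: each key hashes to one shard, writes go to every replica of that shard, and reads rotate across replicas. This is an illustration under my own assumptions, not a description of Google's system; `ShardedStore` and the key format are hypothetical.

```python
import hashlib
from typing import Dict, List


class ShardedStore:
    """Toy shard-and-replicate store: writes go to every replica of one shard,
    reads are spread across that shard's replicas."""

    def __init__(self, num_shards: int, replicas_per_shard: int) -> None:
        self.shards: List[List[Dict[str, str]]] = [
            [dict() for _ in range(replicas_per_shard)] for _ in range(num_shards)
        ]
        self._next_replica = 0

    def _shard_for(self, key: str) -> int:
        # Stable hash so a key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: str) -> None:
        for replica in self.shards[self._shard_for(key)]:
            replica[key] = value  # synchronous replication, for simplicity

    def get(self, key: str) -> str:
        replicas = self.shards[self._shard_for(key)]
        # Round-robin over replicas: read-heavy load scales with replica count.
        self._next_replica = (self._next_replica + 1) % len(replicas)
        return replicas[self._next_replica][key]


store = ShardedStore(num_shards=4, replicas_per_shard=3)
store.put("ad:123", "campaign-a")
print(store.get("ad:123"), store.get("ad:123"))
```

Single-key lookups touch exactly one shard, which is what lets this layout scale out. A query that joins across shards would not fit this model, and that is the sense in which the query model is constrained.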

 

AOL:

·         DB behind message board

·         TB scale

·         Need to be always up

·         Using Oracle, Sybase, PostgreSQL (200TB project), and MySQL (small install)

·         Geo-replicated and geo-hosted close to user

 

AT&T Research:

·         Call detail and email IP storage

·         Data stored raw and in DB form

·         About 1.2 PB data

·         Used for billing support, law enforcement, marketing, analysis

·         Used a proprietary DB called Daytona

·         Need to load and query simultaneously

·         Write once, read-many DB (never delete)

 

Ebay

·         2 large scale instances of over a PB in analytical DBs

·         24 hour a day query workload with concurrent load

·         8M queries/day

·         Index ratio: only 2% overhead. Mostly full scans

·         Expect system data storage size to triple over next 6 months

·         Storage is not the problem – it’s IO throughput

·         20TB between two instances 1000s of kilometers apart

·         6 to 8B records/day loaded (100B/day available to load but can’t do it)

·         Mixed workloads: loads, transforms, and queries in parallel

·         Weekly full system image backups with concurrent updates and queries

·         9M SQL requests per day (Teradata)

 

Yahoo

·         Operational data stores:

o   No adhoc query

o   Partitioned

o   Very low latency

·         Warehouse:

o   Production load

o   Multi-year analysis

o   Proprietary, custom, column-based data store

·         Business unit data

o   Often Oracle hosted

·         Map/reduce workload

o   Hadoop based with 2,000 nodes

o   Fairly new system

o   They are collaborating on HBASE (DB over Hadoop)

·         Stream Processing:

o   “Don’t let the data touch the disk”

o   25B events per day / 25TB per day

·         Note: commodity systems but using NetApp file stores
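
The stream-processing figures above imply some useful averages. The arithmetic below assumes decimal terabytes (10^12 bytes) and a load spread evenly over the day; both are my assumptions, and real traffic would have peaks well above these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 25e9   # 25B events/day, from the notes
bytes_per_day = 25e12   # 25 TB/day, assuming decimal terabytes (10**12 bytes)

bytes_per_event = bytes_per_day / events_per_day
events_per_second = events_per_day / SECONDS_PER_DAY
megabytes_per_second = bytes_per_day / SECONDS_PER_DAY / 1e6

print(f"{bytes_per_event:.0f} bytes/event")
print(f"{events_per_second:,.0f} events/s")
print(f"{megabytes_per_second:.0f} MB/s")
```

That works out to about 1 KB per event and roughly 290 thousand events (about 290 MB) per second on average.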

 

General Notes from Industry panel:

·         eBay and AOL using SANs, Yahoo using NetApp, all others using DAS

·         Yahoo planning to move off NetApp

·         eBay using Teradata with dirty read as most commonly used consistency model

o   Lots of table scans and very few indexes

o   Piggy backed scans are very important to them

·         All speakers want real time access to new data coming in. Batch load warehouses don’t work in general

o   To achieve this, few indexes can be used

·         Everyone on commodity hardware, everyone on commodity disk or trying to get there (ebay on enterprise disk)

·         All using compression and most I/O bound

·         Can’t use sampling when looking for low probability events – need full scans for needle in haystack

o   Note: Some Bigtable data sets are based upon samples

·         “Designing for the unknown query”

o   Fast scans and low effectiveness of indexes driven by unpredictable query load and the need for real time loading

·         Monitor database usage and optimize for evolving usage patterns

·         Google comment: “We want parallel data management but not parallel SQL (too restrictive). Big vendors need to adapt or Hadoop will”

·         Stonebraker observation: Industry is spending MUCH more on data management than Academic research

·         Industry push: free software licenses (build their own or use open source; licensed software is not practical)
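
The piggy-backed scans mentioned in the eBay notes above can be sketched as several queries sharing a single pass over a table, so the I/O cost of the scan is paid once. This is a simplified illustration and not how Teradata implements it; `shared_scan`, the table, and the predicates are all made up.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]


def shared_scan(table: Iterable[Row],
                queries: Dict[str, Callable[[Row], bool]]) -> Dict[str, List[Row]]:
    """Answer several filter queries with one pass over the table."""
    results: Dict[str, List[Row]] = {name: [] for name in queries}
    for row in table:  # each row is read exactly once
        for name, predicate in queries.items():
            if predicate(row):
                results[name].append(row)
    return results


# Made-up table and queries, purely for illustration.
table = [
    {"item": 1, "price": 5, "bids": 0},
    {"item": 2, "price": 50, "bids": 12},
    {"item": 3, "price": 500, "bids": 3},
]
answers = shared_scan(table, {
    "expensive": lambda r: r["price"] > 40,
    "popular": lambda r: r["bids"] > 2,
    "no_bids": lambda r: r["bids"] == 0,
})
print({name: [r["item"] for r in rows] for name, rows in answers.items()})
```

When a workload is I/O bound and mostly full scans, as described above, sharing one pass among concurrent queries trades a little CPU per row for a large reduction in disk reads.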