Bio-IT World Keynote

Chris Dagdigian of BioTeam presented the keynote at this year’s Bio-IT World Conference. I found the presentation interesting for at least two reasons: 1) it’s a very broad and well-reasoned look at many of the issues in computational science, and 2) it presents an innovative example of cloud computing in which BioTeam and Pfizer implement protein docking using Amazon AWS.

The presentation is posted at: http://blog.bioteam.net/wp-content/uploads/2009/04/bioitworld-2009-keynote-cdagdigian.pdf and I summarize some of what caught my interest below:

· Argues that virtualization is “still the lowest hanging fruit in most shops” yielding big gains for operators, users, the environment, and budgets

· Storage:

o Storage still cheap and getting cheaper but operational costs largely unchanged

o Data Triage needed: volume of data production is outpacing declining fully burdened cost of storage (including operational costs)

o Lessons learned from a data loss event (10+TB lost)

§ Double disk failure on RAID5 volume holding SAN FS metadata with significant operational errors

§ Need more redundancy than RAID5

§ Need SNMP and email error reporting

§ Need storage subsystems to actively scrub, verify, and correct errors

o Concludes the storage discussion by pointing out that cloud services offer excellent fully burdened storage costs

· Utility Computing

o It is expensive to design for peak demand in-house

o Pay-as-you-go can be compelling for some workloads

o Explained why he “drank the Amazon EC2 Kool-Aid”: saw it, used it, solved actual customer problems with it. As an example, Chris looked at a protein docking project done by Pfizer & BioTeam.

· Protein Docking project architecture:

o Borrows heavily from Rightscale Grid Edition

o Inbound and outbound queues in Amazon SQS

o Job specification in JSON

o Data stored in Amazon S3

o Job provenance and metadata stored in SimpleDB

o Worker instances dynamically spawned in EC2 where structures are scored

o All results stored in S3 (EC2 <-> S3 bandwidth free)

o Download the top ranked docked complexes

o Launch post-processing EC2 instances to score, rank, filter, and cluster results into S3 (bring the computation to the data)
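To make the flow in the list above concrete, here is a minimal sketch of the queue-driven worker pattern. It is not the BioTeam/Pfizer implementation: the in-memory queues and dictionaries are stand-ins I am assuming for SQS, S3, and SimpleDB, and the function names, JSON fields, and toy scoring function are all hypothetical.

```python
import json
import queue

# In-memory stand-ins (assumptions) for the AWS services named in the list.
inbound = queue.Queue()    # stands in for the inbound SQS queue
outbound = queue.Queue()   # stands in for the outbound SQS queue
object_store = {}          # stands in for S3: key -> data
metadata_store = {}        # stands in for SimpleDB: job id -> attributes


def submit_job(job_id, ligand_key, receptor_key):
    """Enqueue a JSON job spec that points at inputs held in the object store."""
    spec = {"job_id": job_id, "ligand": ligand_key, "receptor": receptor_key}
    inbound.put(json.dumps(spec))
    metadata_store[job_id] = {"status": "queued"}


def toy_score(ligand, receptor):
    """Hypothetical placeholder; a real docking code would do the scoring here."""
    return float(len(set(ligand) & set(receptor)))


def worker():
    """Drain the inbound queue, score each job, store the result, report completion."""
    while True:
        try:
            message = inbound.get_nowait()
        except queue.Empty:
            return
        spec = json.loads(message)
        score = toy_score(object_store[spec["ligand"]], object_store[spec["receptor"]])
        result_key = "results/" + spec["job_id"]
        object_store[result_key] = {"job_id": spec["job_id"], "score": score}
        metadata_store[spec["job_id"]] = {"status": "done", "result": result_key}
        outbound.put(json.dumps({"job_id": spec["job_id"], "result": result_key}))


def top_ranked(n):
    """Post-processing next to the data: rank stored results and keep the best n."""
    results = [v for k, v in object_store.items() if k.startswith("results/")]
    return sorted(results, key=lambda r: r["score"], reverse=True)[:n]
```

The point of the design is that only small JSON messages travel through the queues. The bulk data stays in the object store, workers can be added or removed without changing the submitter, and only the top-ranked results need to be downloaded.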

· Don’t want to belittle the security concerns but there is a whiff of hypocrisy in the air

o Is your staff really concerned or just protecting their turf?

o It is funny to see people demanding security measures they don’t practice internally across their own infrastructure

· Next-Gen & utility storage

o Primary analysis onsite; data moved to remote utility storage service after passing QC tests

o Data would rarely (if ever) move back

o Need to reprocess or rerun?

§ Spin up cloud servers to re-analyze in situ

§ Terabyte data transit not required
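A quick back-of-the-envelope calculation shows why avoiding terabyte transit matters. The sketch below is my own illustration, not from the slides: the function name and the `efficiency` parameter are hypothetical, and the link speed is whatever sustained rate you assume for your site.

```python
def transfer_hours(size_bytes, link_bits_per_second, efficiency=1.0):
    """Hours needed to move size_bytes over a link at the given sustained rate.

    efficiency is an assumed fraction (0 < efficiency <= 1) of the nominal
    link rate that is actually achieved end to end.
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600
```

For example, `transfer_hours(1e12, 100e6)` says that moving 1 TB over a fully utilized 100 Mbit/s link takes a little over 22 hours, so re-analyzing in place on cloud servers avoids roughly a day of transfer per terabyte at that rate.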

James Hamilton, Amazon Web Services

1200, 12th Ave. S., Seattle, WA, 98144
W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 |
james@amazon.com

H:mvdirona.com | W:mvdirona.com/jrh/work | blog:http://perspectives.mvdirona.com
