A few months back I was in a debate about the value of shared code segments between virtual machines. In my view there is no question that shared code across VMs has some value, but code is small compared to data, so the impact will be visible but not fundamental. What follows is an inventory of a typical client-side system.
This experiment was done on an IBM T43 laptop with 1GB of memory running Vista RTM, desktop search, Foldershare (it rocks), and Outlook. Outlook was in use prior to and during the measurement. The system has been running for three days since the last boot. The summary stats are:
| Classification | Pages | MB | % |
|---|---|---|---|
| Kernel: | 65824 | 257.125 | 25% |
| User: | 195913 | 765.2852 | 75% |
| Total: | 261737 | 1022.41 | |
| | | | |
| Kernel Pages | | | |
| Kernel Image: | 7395 | 28.88672 | 11% |
| Kernel Pure Data: | 58429 | 228.2383 | 89% |
| Kernel Total: | 65824 | 257.125 | |
| | | | |
| User Pages | | | |
| User Code: | 32348 | 126.3594 | 17% |
| User Data: | 163565 | 638.9258 | 83% |
| User Total: | 195913 | 765.2852 | |
Immediately after boot, 22% of the memory was code, which makes sense: as the O/S and apps come up, all constructors and initializers run. After being memory resident for a few days, only those pages currently in use stay loaded and the user code percentage fell to 17%. Ironically, code load time is an issue at start-up but the actual percentage of code resident in memory over longer runs is fairly small. Vista SuperFetch helps with the code load times but, from looking at this data, it’s clear that flash memory could make a huge difference to O/S boot and application load times.
The percentage of memory holding code pages is not that high so when going after memory bloat, look first to the data.
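The megabyte and percentage columns in the table follow directly from the page counts. Here is a small sketch of that arithmetic. The 4 KB page size is an assumption (it is not stated above), but it is consistent with 65,824 pages coming to 257.125 MB.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KB pages, consistent with the table


def pages_to_mb(pages: int) -> float:
    """Convert a page count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return pages * PAGE_SIZE_BYTES / (1024 * 1024)


def share(part: int, whole: int) -> float:
    """Percentage of `whole` that `part` represents."""
    return 100.0 * part / whole


# Page counts copied from the table.
kernel_image, kernel_data = 7395, 58429
user_code, user_data = 32348, 163565

kernel_total = kernel_image + kernel_data
user_total = user_code + user_data
total = kernel_total + user_total
code_pages = kernel_image + user_code  # all code, kernel and user

print(f"kernel total: {pages_to_mb(kernel_total):.3f} MB")
print(f"user code share of user pages: {share(user_code, user_total):.1f}%")
print(f"all code share of all pages: {share(code_pages, total):.1f}%")
```

Adding the kernel image to the user code pages puts code at roughly 15% of all resident pages on this machine, which is the basis for looking at data first.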
--jrh
James Hamilton, Windows Live Platform Services Bldg RedW-D/2072, One Microsoft Way, Redmond, Washington, 98052 W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | JamesRH@microsoft.com
H:mvdirona.com | W:research.microsoft.com/~jamesrh
Yesterday, Intel and Micron announced a generational step forward in NAND Flash Write I/O performance. From the Intel Press release:
The new high speed NAND can reach speeds up to 200 megabytes per second (MB/s) for reading data and 100 MB/s for writing data, achieved by leveraging the new ONFI 2.0 specification and a four-plane architecture with higher clock speeds. In comparison, conventional single level cell NAND is limited to 40 MB/s for reading data and less than 20 MB/s for writing data.
They don’t actually say it’s an SLC device but they compare it to SLC and it has the typical wear characteristics of SLC (100,000 cycles). More data from the Micron web site:
| Features | | Benefits |
|---|---|---|
| Density | 8Gb–16Gb | Industry-standard densities |
| Performance | 200 MB/s Sustained READ; 100 MB/s Sustained WRITE; 1.5ms (TYP) Erase Performance | Delivers the fastest read and write throughputs ever for a NAND Flash device |
| Endurance (cycles) | 100,000 | High-endurance enables applications that require intensive program and erase operation while prolonging memory life |
| Interface | Async/Sync ONFI 1.0/2.0 | Standard interface enables a high degree of interoperability |
| Temperature Range | −25˚C to +85˚C | Wide temperature range is ideal for rugged environments |
| Configuration | 1.8V, x8 | Industry-standard configuration enables easy system design |
| Package | 100-ball BGA | Industry-standard packaging enables easier density migration |
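To put the quoted rates in perspective, here is a back-of-the-envelope sketch of how long it takes to stream a fixed amount of data at each rate. The 1,000 MB payload is an arbitrary illustrative figure. The conventional SLC write rate uses the 20 MB/s upper bound from the press release, so the real write gap is at least this large.

```python
# Rates in MB/s, taken from the press release quoted above.
NEW_READ, NEW_WRITE = 200.0, 100.0
OLD_READ, OLD_WRITE = 40.0, 20.0  # 20 MB/s is an upper bound ("less than 20")


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Seconds needed to move `size_mb` at a sustained `rate_mb_per_s`."""
    return size_mb / rate_mb_per_s


payload_mb = 1000.0  # hypothetical payload, for illustration only
for label, old, new in (("read", OLD_READ, NEW_READ),
                        ("write", OLD_WRITE, NEW_WRITE)):
    print(f"{label}: {transfer_seconds(payload_mb, old):.0f}s -> "
          f"{transfer_seconds(payload_mb, new):.0f}s ({new / old:.0f}x)")
```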
Expect shipments in the latter half of 2008. We should start seeing interesting applications of this technology in SSDs and other devices this year.
Intel Press Release: http://www.intel.com/pressroom/archive/releases/20080201corp.htm
More data from Micron: http://www.micron.com/products/nand/high_speed/index
--jrh
I saw a video earlier today titled “Great Ideas are a Dime a Dozen” and I just loved it. Unfortunately it’s a Microsoft internal-only video so I can’t post it here, but I can point to some related talks and videos. The speaker was Bill Buxton of Microsoft Research.
I fell in love with this talk for a variety of reasons: 1) I love and agree with the principle that ideas are cheap but it’s the communicating of the ideas and making them real that is truly hard and where the greatest talent is required. 2) He argues that you need to get a user experience running quickly and you need to keep it evolving quickly. You need a lightweight experimentation platform to take the user experience from good to great. I’ve long believed that the difference between the iPhone and some other designs is not being satisfied when it’s “done” and, rather than triaging to ship, just continuing to polish. Get it running, then get it better. Then throw it out and try again. Change it some more. Get it 100% functionally correct and as good as you can possibly get it. Then keep polishing. Polish and refine further. 3) He points out that we never have time to properly invest in design at the beginning when the team is small. Yet, we DO have time to be months or even years late, partly as a consequence of not doing the design up front. Late projects are when the team is fully staffed and at its biggest and most expensive. Neither he nor I are arguing for waterfall design. What Bill is arguing for is human centric design up front. Ray Ozzie calls this experience-first design. Invest in really getting the experience fully understood with super lightweight development methods. If you REALLY understand the user experience and it’s really right, developing the product may be the easiest and perhaps most predictable part of the process. I’ve seen large software teams working on ill-defined and only barely designed products more than once. As an industry, we need to take some of Bill’s advice.
Bill’s talks and videos are posted at: http://www.billbuxton.com. The closest external example of the video I’m describing above is perhaps: What if Leopold Didn't Have a Piano. Recommended whether you are a designer or a developer.
--jrh
Earlier today, Microsoft held an internal tribute to Jim Gray, celebrating his contributions to the industry and to each of us personally. It’s been just over a year since Jim last sailed out of San Francisco Harbor in Tenacious. He’s been missing since.
Speakers at the tribute to Jim included Rick Rashid, Butler Lampson, Peter Spiro, Tony Hey, David Vaskevitch, Bill Gates, Gordon Bell and myself. Some touched upon Jim’s broad technical contributions across many sub-fields, while others recounted how Jim has influenced or guided them personally through the years. I focused on the latter, and described Jim’s contribution as a mentor, a connector of people, and a bridger of fields (included below).
--jrh
What Would Jim Do?
Jim has many skills, but where he has most influenced and impressed me is as a mentor, a connector of people, and a bridger of fields. He became a mentor to me, whether he knew it or not, more than 13 years ago, before either of us had joined Microsoft. Jim is a phenomenal mentor. He invests deeply in understanding the problem you are trying to solve and always has time for deep discussion and debate. Later, I discovered that Jim was an uncanny connector. He knows everyone, and they all want to show him the details of what they are doing. He sees a vast amount of work and forwards the best broadly. He is a nexus for interesting papers, for valuable results, and for new discoveries across many fields. Over time I learned that one of his unique abilities is as a bridger of fields. He can take great work in one field and show how it can be applied in others. He knows that many of the world’s most useful discoveries have been made in the gap between fields, and that some of the most important work has been the application of the technology from one field to the problems of another.
Back in 1994, Pat Selinger decided that I needed to meet Jim Gray, and we went to visit him in San Francisco. Pat and I spent an afternoon chatting with Jim about database performance benchmarks, what we were doing with DB2, compiler design, RISC System 6000 and hardware architecture in general. The discussion was typical for Jim. He’s deeply interested in every technical field from aircraft engine design through heart transplants. His breadth is amazing, and the conversation ranged far and wide. It seemed he just about always knew someone working in any field that came up.
A few months later, Bruce Lindsay and I went to visit Jim while he was teaching at Berkeley. Jim and I didn’t get much of a chance to chat during the course of the day—things were pretty hectic around his office at Berkeley—but he and I drove back into San Francisco together. As we drove into the sunset over the city, approaching the Bay Bridge, Jim talked about his experience at Digital Equipment Corporation. He believed a hardware company could sell software, but would never be able to really make software the complete focus it needed to be. He talked of DEC’s demise and said, “They were bound and determined to fail as a hardware company rather than excel as a software company.”
The sunset, the city, and the Bay Bridge were stretched across the windscreen. It was startlingly beautiful. Instead of making conversation with Jim, I was mostly just listening, reflecting and contemplating. At the time, I was the lead architect on IBM DB2. And yes, I too worked for a hardware company. Everything Jim was relating of his DEC experience sounded eerily familiar to me. It was as though Jim was summarizing my own experiences rather than his. I hadn’t really thought this deeply about it before, but the more I did, the more I knew he was right. This was the beginning of my thinking that I should probably be working at a software company.
He didn’t say it at the time, and, knowing Jim much better now, I’m not sure he would have even thought it, but the discussion left me thinking that I needed to aim higher. I needed to know more about all aspects of the database world, more about technology in general, and to think more about how it all fit together. Having some time to chat deeply with Jim changed how I looked at my job and where my priorities were. I left the conversation pondering responsibility and the industry, and believing I needed to do more, or at least to broaden the scope of my thinking.
I met Jim again later that year at the High Performance Transaction Systems workshop. During the conference, Jim came over, sat down beside me, and said “How are you doing James Hamilton?” This is signature Jim. I’ll bet nearly everyone he knows has had one of those visits during the course of a conference. He drops by, sits down, matches eyes, and you have 110% of his attention for the next 15 to 20 minutes. Jim’s style is not to correct or redirect. Yet, after each conversation, I’ve typically decided to do something differently. It just somehow becomes clear and obviously the right thing to do by the end of the discussion.
In 2006 I got a note from Jim with the subject “Mentor—I need to say I’m helping someone so…” While it was an honor to officially be Jim’s mentee, I didn’t really expect this to change our relationship much. And, of course, I was wrong. Jim approaches formal mentorship with his typical thoroughness and, in this role, he believes he has signed up to review and assist with absolutely everything you are involved with, even if not work-related. For example, last year I had two articles published in boating magazines and Jim insisted on reviewing them both. His comments included the usual detailed insights we are all used to getting from him, and the articles were much better for it. How does he find the time?
For years, I’ve read every paper Jim sent my way. Jim has become my quality filter in that, as much as I try, I can’t cast my net nearly as wide nor get through close to as much as I should. Like him, I’m interested in just about all aspects of technology but, unlike him, I actually do need to sleep. I can’t possibly keep up. There are hundreds of engineers who receive papers and results from him on a regular basis. Great research is more broadly seen as a result of his culling and forwarding. Many of us read more than we would have otherwise, and are exposed to ideas we wouldn’t normally have seen so early or, in some cases, wouldn’t have seen at all.
Jim’s magic as a mentor, connector and bridger is his scaling. The stories above can be repeated by hundreds of people, each of whom feels as though they had Jim’s complete and undivided attention. To contribute deeply to others at this level is time-consuming, to do it while still getting work done personally is even harder, and to do it for all callers is simply unexplainable. Anyone can talk to Jim, and an astonishing number frequently do. And because his review comments are so good, and he’s so widely respected, a mammoth amount is sent his way. He receives early papers and important new results across a breadth of fields from computer architecture, operating system design, networking, database, transaction processing, astronomy, and particle physics. The most interesting work he comes across is forwarded widely. He ignores company bounds, international bounds, bounds of seniority, and simply routes people and useful data together. Jim effectively is a routing nexus where new ideas and really interesting results are distributed more broadly.
Over the past year I’ve received no papers from Jim. There has been no advice. I’ve not had anything reviewed by him. And, I’ve not been able to talk to him about the projects I’m working on. When I attend conferences, there have been no surprise visits from Jim. Instead, I’ve been operating personally on the basis of “What would Jim do?”
Each of us has opportunities to be mentors, connectors and bridgers. These are our chances to help Jim scale even further. Each of these opportunities is a chance to pass on some of the gift Jim has given us over the years. When you are asked for a review or to help with a project, just answer “GREAT!!!” as he has so many times, and the magic will keep spreading.
This year, when I face tough questions, interesting issues, or great opportunities, I just ask, “What would Jim do?” And then I dive in with gusto.
James Hamilton, 2006-01-16.
Founders at Work (http://www.amazon.com/Founders-Work-Stories-Startups-Early/dp/1590597141) is a series of 32 interviews with founders of well-known startups. Some have become very successful as independent companies, such as Apple, where Steve Wozniak was interviewed, Adobe Systems, where Charles Geschke was interviewed, and Research in Motion, where Mike Lazaridis was interviewed. Others were major successes through acquisition, including Mitch Kapor (Lotus Development), Max Levchin (PayPal), Steve Perlman (WebTV), and Ray Ozzie (Iris Associates & Groove Networks). Some are still startups, and some failed long ago. The book itself is not amazingly well-written, but I found the interviewees captivating and the book was great by that measure.
The book gives a detailed window into how startups are made, how some have succeeded, and how some have failed. In portions of the book small windows into the VC community are opened. The story of how Draper Fisher Jurvetson (DFJ) worked with Sabeer Bhatia (Hotmail) was revealing.
Some common themes emerged for me as I read through the book. One theme was that success often came from great people coming together without much funding but with considerable motivation. They just kept trying things, evolving, failing, trying again, and then changing again. Often success comes not from brilliant, well-funded ideas but from intense drive, trying things quickly, and failing fast. Often the VC funded idea X and the money was used to develop a completely unrelated idea. Often success was found as the last dollar was spent. I’m quite certain that the ones we didn’t read about were the ones where the last dollar was spent just before success was found. The lesson for us is to spend small when investigating a new idea. Move fast, spend little, keep the team small and keep evolving. Admit when it’s not working and keep trying related ideas. It was an enjoyable read.
--jrh
Exactly one year ago, Jim Gray guided his sailboat Tenacious out of San Francisco’s Gashouse Cove Marina into the Bay. He sailed under the Golden Gate Bridge and continued towards the Farallon Islands, some 27 miles off the California coastline. Until that morning, I chatted with Jim via email or phone several days a week. He has reviewed everything I’ve written of substance for many years. When I consider job changes, I’ll always bounce them off him first. When I come across something particularly interesting, I’ll always send it Jim’s way. Every month or so, he’ll send me an interesting pre-published paper. If a conference deadline like CIDR or HPTS is approaching, he’ll start pushing me to write something up and keep pushing until it happens. Every four to six months, he’ll decide “I just have to meet” someone with overlapping interests, someone whose work is particularly interesting, or perhaps someone who is just a super-clear thinker and worth getting to know.
What’s truly remarkable is that tens and perhaps hundreds of people can say exactly the same thing. He has time for everyone and everyone has similar stories of mentorship, advice, detailed explanation, patience, and insightful reviews. Jim’s magic is that he does this for a huge cross-section of our industry. He knows no bounds and always manages to find the time to help without regard for who’s asking.
Jim is still missing. Over the past year I’ve received no papers from Jim. There’s been no advice. I’ve not had anything reviewed by him. And, I’ve not been able to talk to him about projects I’ve been working on. When I attend a conference, I don’t get the usual surprise visits from Jim. It’s been exactly one year and we know no more today than we did a year ago. Jim remains missing. We all miss him deeply.
--jrh
In Designing and Deploying Internet Scale Services (http://research.microsoft.com/~jamesrh/TalksAndPapers/JamesRH_Lisa.pdf) I’ve argued that all services should expect to be overloaded and all services should expect to have to manage mass failures.
Degraded operations mode is a means of dealing with excess load that will happen at some point in the life of your service. Sooner or later, you’ll get an unpredicted number of new customers, more concurrent users, or you’ll have part of the server fleet down and get hit with unexpected load. Sooner or later you’ll have more customer requests than you have resources to satisfy. When this occurs, many services just run slower and slower and eventually start failing with timeouts. Basically every user in the system gets a very bad experience. A more serious example is a login storm. For most services, steady state service is much less resource intensive than user login. So, in the event of global or broad service failure, millions of users will arrive back at once attempting to log in. The service will fail again under the load and the cycle repeats. It’s not a good place to be. A more drastic approach to avoid this problem is admission control. Only allow users into the service when you have resources left to serve them. Essentially, give a few customers a bad experience by not letting them onto the service in order to avoid giving all customers a bad experience.
There is much that can be done between the first option, service failure under high load, and the other end of the spectrum, admission control. I call this middle ground degraded operations mode. In the limit, all services need to have admission control to avoid complete and repeating service failure under extreme loads, but you hope that admission control is never used. Degraded operations mode allows a service to continue to take on new load after it reaches capacity by shedding unnecessary tasks. Most services have batch jobs that run tasks that need to be done but where there isn’t actually a customer waiting on them. Examples include reporting, backup, index creation, system maintenance, and copying data to warehouse servers. In most services a substantial amount of this work can be deferred without negatively impacting the service. Clearly these operations need to be run eventually, and how long each can be delayed is task and service specific. Temporarily shedding these batch jobs allows more customers to be served. The next level of degraded operations mode is to restrict the quality of service in some way. If some operations are far more expensive, you may only allow users to access a subset of the full service functionality. For example, you may allow transactions but not reporting if that makes sense for your service. Finding these degraded modes of operation is difficult and very application specific, but they are always there and it’s always worth finding them. There WILL be a time when you have more users than resources.
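The escalation described above (shed batch work first, then restrict expensive features, and only then refuse new users) can be sketched as a small policy function. This is a minimal illustration under stated assumptions, not a production design: the utilization thresholds, the `Mode` names, and the `"reporting"` feature set are all hypothetical, and the right values are service specific.

```python
from enum import IntEnum


class Mode(IntEnum):
    NORMAL = 0             # everything runs
    SHED_BATCH = 1         # defer batch jobs that no customer is waiting on
    RESTRICT_FEATURES = 2  # also turn off the most expensive operations
    ADMISSION_CONTROL = 3  # last resort: refuse new sessions


# Hypothetical thresholds on utilization (fraction of capacity in use),
# checked from most to least severe.
THRESHOLDS = [
    (0.95, Mode.ADMISSION_CONTROL),
    (0.85, Mode.RESTRICT_FEATURES),
    (0.70, Mode.SHED_BATCH),
]

EXPENSIVE_FEATURES = {"reporting"}  # hypothetical; service specific


def select_mode(utilization: float) -> Mode:
    """Pick the mildest mode that is appropriate for the current load."""
    for threshold, mode in THRESHOLDS:
        if utilization >= threshold:
            return mode
    return Mode.NORMAL


def allow(request_kind: str, is_new_session: bool, mode: Mode) -> bool:
    """Decide whether a unit of work may run in the given mode."""
    if request_kind == "batch":
        return mode < Mode.SHED_BATCH
    if request_kind in EXPENSIVE_FEATURES:
        return mode < Mode.RESTRICT_FEATURES
    if is_new_session:
        return mode < Mode.ADMISSION_CONTROL
    return True  # existing sessions keep core functionality in every mode
```

Keeping the decision in one place makes it easier to exercise each mode deliberately in test or limited production, rather than discovering the behavior during an outage.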
Fifteen years ago I worked on an Ada language compiler, and one of the target hardware platforms for this compiler was a Navy fire control system. This embedded system had a large red switch tagged as “Battle Ready Mode”. This switch would disable all automatic shutdowns and put the server into a mode where it would continue to run when the room was on fire or water was beginning to rise up the base of the computer. In this mode, it runs until it dies. In the services world, this isn’t exactly what we’re after but it’s closely related. We want all systems to be able to drop back to a degraded operation mode that will allow them to continue to provide at least a subset of service even when under extreme load or suffering from cascading sub-system failures. We need to design and, most important, we need to test these degraded modes of operation in at least limited production or they won’t work when we really need them. Unfortunately, all services but the very least successful will need these degraded operations modes at least once.
Degraded operation modes are service specific and, for many services, the initial developer gut reaction is that everything is mission critical and there exist no meaningful degraded modes for their specific service. But they are always there if you take it seriously and look hard. The first level is to stop all batch processing and periodic jobs. Almost all services have some batch jobs that are not time critical. Run them later. That one is fairly easy but most are hard to come up with. It’s hard to produce a lower quality customer experience that is still useful, but I’ve yet to find an example where none were available. As an example, consider Exchange Hosted Services (an email anti-malware and archiving service). In that service, the mail must get delivered. What is the degraded operation mode? Degraded modes can be found there as well. Here are some examples: turn up the aggressiveness of email edge blocks, defer processing of mail classified as spam until later, process mail from users of the service ahead of unknown users, and prioritize platinum customers ahead of others. There actually are quite a few options. The important point is to think through what they are and ensure they are developed and tested prior to the operations team needing them in the middle of the night.
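Several of the mail examples above are ordering decisions, so one way to picture them is as a priority queue. The sketch below is purely illustrative and is not how Exchange Hosted Services is implemented. The `Mail` fields, the priority numbers, and the `DegradedMailQueue` class are hypothetical names invented for this example.

```python
import heapq
import itertools
from dataclasses import dataclass

SPAM_PRIORITY = 3  # lowest priority class in this sketch


@dataclass(frozen=True)
class Mail:
    msg_id: str
    platinum: bool = False      # hypothetical customer tier flag
    known_sender: bool = False  # sender is a user of the service
    likely_spam: bool = False   # verdict from the spam classifier


def priority(mail: Mail) -> int:
    """Lower number means processed sooner."""
    if mail.likely_spam:
        return SPAM_PRIORITY
    if mail.platinum:
        return 0
    if mail.known_sender:
        return 1
    return 2


class DegradedMailQueue:
    def __init__(self) -> None:
        self._heap = []
        # Sequence number breaks ties so mail stays FIFO within a class.
        self._seq = itertools.count()

    def push(self, mail: Mail) -> None:
        heapq.heappush(self._heap, (priority(mail), next(self._seq), mail))

    def pop(self, defer_spam: bool = True):
        """Return the next message, or None if nothing should run right now."""
        if not self._heap:
            return None
        if defer_spam and self._heap[0][0] == SPAM_PRIORITY:
            return None  # only deferred spam remains; leave it queued
        return heapq.heappop(self._heap)[2]
```

Under normal load the same queue can be drained with `defer_spam=False`, so the deferred mail is delayed rather than lost.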
A few months back Skype had a problem where the entire service went down or mostly down for more than a day. What they report happened was that Windows Update forced many reboots and it led to a flood of Skype login requests “that when combined with lack of peer to peer resources had a critical impact” (http://heartbeat.skype.com/2007/08/what_happened_on_august_16.html). There are at least two interesting factors here, one generic to all services and one Skype specific. Generically, it’s very common for login operations to be MUCH more expensive than steady state operation, so all services need to engineer for login storms after service interruption. The WinLive Messenger team has given this considerable thought and has considerable experience with this issue. They know there needs to be an easy way to throttle login requests such that you can control the rate at which they are accepted (a fine grained admission control for login). All services need this or something like it, but it’s surprising how few have actually implemented this protection and tested it to ensure it works in production. The Skype specific situation is not widely documented but is hinted at by the “lack of peer to peer resources” note in the quote above. In Skype’s implementation, the lack of an available supernode will cause the client to report login failure (http://www1.cs.columbia.edu/~salman/publications/skype1_4.pdf sent to me by Sharma Kunapalli). This means that nodes can’t log in unless they can find a supernode. This has a nasty side effect in that the fewer clients that can successfully log in, the more likely it is that other clients won’t successfully find a supernode. If they can’t find a supernode, they won’t be able to log in either. Basically, the entire network can become unstable due to the dependence on finding a supernode to successfully log a client into the network.
For Skype, a great “degraded operation” mode would be to allow login even when a supernode can’t be found. Let the client get on and perhaps establish peer connectivity later.
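The login throttle mentioned above, a fine grained admission control for login, is often built as a token bucket. The sketch below is a generic illustration of that technique under stated assumptions. It is not the Messenger team’s or Skype’s implementation, and `LoginThrottle`, `set_rate`, and `try_admit` are hypothetical names.

```python
import time


class LoginThrottle:
    """Token bucket: admits about `rate` logins per second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock           # injectable so the throttle can be tested
        self._tokens = float(burst)   # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last call, capped at burst."""
        now = self._clock()
        self._tokens = min(self.burst,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now

    def set_rate(self, rate: float) -> None:
        """Operations knob: raise or lower the admitted login rate at runtime."""
        self._refill()  # settle tokens earned at the old rate first
        self.rate = rate

    def try_admit(self) -> bool:
        """True if this login may proceed; False means ask the client to retry."""
        self._refill()
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

After a broad outage, operations could start with a low rate and raise it with `set_rate` as the service stabilizes. Rejected clients should retry with randomized backoff so they do not all return at the same instant.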
Why wait for failure and the next post-mortem to design in and production test degraded operations for your services? Make it part of your next release.
--jrh
A couple of weeks back I attended the Berkeley RAD Lab Retreat. At this retreat, the RAD Lab grad students present their projects and, as is typical of Berkeley retreats, the talks were quite good. It was held up at Lake Tahoe, which was great for the skiers but also made for an interesting drive up there. Chains were required for the drive from Reno to Lake Tahoe and I was in a rental car with less than great summer tires and, of course, no chains.
It snowed hard for much of the retreat. When leaving I took a picture of a pickup truck completely buried in the parking lot:
[Photo: pickup truck completely buried in snow in the parking lot]
The talks included: Scalable Consistent Document Store, Prototype of the Instrumentation Backplane, Response time modeling for power-aware resource allocation, Using Machine Learning to Predict Performance of Parallel DB Systems, Diagnosing Performance Problems from Trace data using probabilistic models, Xtrace to find Flaws in UC Berkeley Wireless LAN, Exposing Network Service Failures with Datapath Traces, Owning Your Own Inbox: Attacks on Spam Filters, Declarative Distributed Debugging (D3), Policy Aware Switching Layer, Tracing Hadoop, Machine-Learning-Enabled Router to Deal with Local-Area Congestion, A Declarative API for Secure Network Applications, Deterministic Replay on multi-processor systems, and RubyOnRails.berkeley.edu.
Basically the list of talks presented came pretty close to what I would list as the most interesting challenges in services and service design. Great stuff. In addition to the talks, there is always an interesting group of folks from industry and this year was no exception. I had a good conversation over dinner with Luiz Barroso (http://research.google.com/pubs/author77.html) and a brief chat with Jhttp://www.electric-cloud.com/).
The flight back was more than a bit interesting as well. We left Reno heading towards Seattle in a small prop plane. Thirty minutes into the trip, I was starting to wonder what was wrong, in that I could see the aircraft landing gear doors opening and closing repeatedly from my wing side seat. Shortly thereafter the pilot announced that we had a gear problem and we needed to return to Reno. We returned and did a low pass over the Reno airport so that the tower could check the landing gear position via binoculars. Then we circled back and landed with a fire truck chasing us down the runway. We stayed out on the active taxiways, with the airport closed to incoming or outgoing traffic, while crew came out to the aircraft and pinned the gear in the down position before moving the plane to the terminal.
--jrh
Dave DeWitt and Michael Stonebraker posted an article worth reading yesterday titled: MapReduce: A Major Step Backwards (Thanks to Kevin Merrit and Sriram Krishnan for sending this one my way). Their general argument is that MapReduce isn’t better than current generation RDBMS, which is certainly true in many dimensions, and that it isn’t a new invention, which is also true. I’m not in agreement with the conclusion that MapReduce is a major step backwards, but I’m fully in agreement with many of the points building towards that conclusion. Let’s look at some of the major points made by the article:
1. MapReduce is a step backwards in database access
In this section, the authors argue that schema is good, separation of schema and application is good, and high level language access is good. On the first two points, I agree: schema is good, and application/schema separation has long since proven to be a good thing. The thing to keep in mind is that MapReduce is only an execution framework. The data store is GFS or sometimes Bigtable in the case of Google, or HDFS or HBase in the case of Hadoop. So it’s not 100% correct to argue that MapReduce doesn’t support schema. That’s a store issue, and it is true that most stores that MapReduce is run over don’t implement these features today.
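To make the framework-versus-store distinction concrete, here is a toy, single-process sketch. It is not Google’s or Hadoop’s API. `map_reduce`, `schemaless_store`, and `schema_store` are hypothetical names, and the two “stores” are just in-memory generators. The point is only that the execution function is identical in both cases, and what changes is how much the store tells the application about record structure.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(records: Iterable[Any],
               mapper: Callable[[Any], Iterable[Tuple[Any, Any]]],
               reducer: Callable[[Any, List[Any]], Any]) -> Dict[Any, Any]:
    """Toy execution framework: map each record, group by key, then reduce."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def schemaless_store() -> Iterator[str]:
    """Hypothetical store of raw lines; structure lives in the application."""
    yield from ["alice,login", "bob,login", "alice,search"]


def schema_store() -> Iterator[dict]:
    """Hypothetical store that exposes named fields."""
    yield from [{"user": "alice", "action": "login"},
                {"user": "bob", "action": "login"},
                {"user": "alice", "action": "search"}]


def count(key: Any, values: List[int]) -> int:
    return sum(values)


# Without a schema, the mapper must know and parse the record layout itself.
raw_counts = map_reduce(schemaless_store(),
                        lambda line: [(line.split(",")[0], 1)], count)

# With a schema-aware store, the mapper just names the field it wants.
typed_counts = map_reduce(schema_store(),
                          lambda row: [(row["user"], 1)], count)

print(raw_counts, typed_counts)
```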
I argue that a separation of execution framework from store and indexing technology is a good thing in th |