Sunday, 26 June 2011

SIGMOD 2011 in Athens


Earlier this week, I was in Athens, Greece, attending the annual conference of the
ACM Special Interest Group on Management of Data (SIGMOD).
SIGMOD is one of the top two database events held each year, attracting academic researchers
and leading practitioners from industry.

I kicked off the conference with the plenary keynote. In this talk,
I started with a short retrospection on the industry over the last 20 years. In my
early days as a database developer, things were moving incredibly quickly. Customers
were loving our products, the industry was growing fast and yet the products really
weren’t all that good. You know you are working on important technology when customers
are buying like crazy and the products aren’t anywhere close to where they should
be.

In my first release as lead architect on DB2
20 years ago, we completely rewrote the DB2 database engine process model, moving from
a process-per-connected-user model to a single process where each connection consumes
only a single thread, which supports many more concurrent connections. It was a fairly
fundamental architectural change completed in a single release. And in that same release,
we improved TPC-A performance
by a booming factor of 10 and then did 4x more in the next release. It was a fun time
and things were moving quickly.


From the mid-90s through to around
2005, the database world went through what I refer to as the dark ages. DBMS code
bases had grown to the point where the smallest was more than 4 million lines of code,
the commercial system engineering teams would no longer fit in a single building,
and the number of database companies shrank throughout the entire period down to only
3 major players. The pace of innovation was glacial and much of the research during
the period was, in the words of Bruce Lindsay, “polishing the round ball”. The problem
was that the products were actually passably good, customers didn’t have a lot of
alternatives, and nothing slows innovation like large teams with huge code bases.

In the last 5 years, the database
world has become exciting again. I’m seeing more opportunity in the database world
now than any other time in the last 20 years. It’s now easy to get venture funding
to build database products, and the number and diversity of viable products are exploding.
My talk focused on what changed, why it happened, and some of the technical backdrop
influencing these changes.


A background thesis of the talk is that cloud
computing solves two of the primary reasons why customers used to be stuck standardizing
on a single database engine even though some of their workloads may have run poorly.
The first is cost. Cloud computing reduces costs dramatically (some of the cloud economics
argument: http://perspectives.mvdirona.com/2009/04/21/McKinseySpeculatesThatCloudComputingMayBeMoreExpensiveThanInternalIT.aspx)
and charges by usage rather than via an annual enterprise license. One of the favorite
lock-ins of the enterprise software world is the enterprise license. Once you’ve signed
one, you are completely owned and it’s hard to afford to run another product. My
fundamental rule of enterprise software is that any company that can afford to give
you 50% to 80% reduction from “list price” is pretty clearly not a low margin operator.
That is the way much of the enterprise computing world continues to work: start with
a crazy price, negotiate down to a ½ crazy price, and then feel like a hero while
you contribute to incredibly high profit margins.

Cloud computing charges by use in small
increments, and any of the major database or open source offerings can be used at low
cost. That is certainly a relevant reason, but the really significant factor is the
offloading of administrative complexity to the cloud provider. One
of the primary reasons to standardize on a single database is that each is so complex
to administer that it’s hard to have sufficient skill on staff to manage more than
one. Cloud offerings like AWS Relational Database Service transfer
much of the administrative work to the cloud provider, making it easy to choose the
database that best fits the application and to have many specialized engines in use
across a given company.

As costs fall, more workloads
become practical and existing workloads get larger. For
example, if analyzing three months of customer usage data has value to the business
and it becomes affordable to analyze two years instead, customers correctly want to
do it. The plunging cost of computing is fueling database size growth at a super-Moore
pace, requiring either partitioned (sharded) or parallel DB engines.
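
To make the partitioned (sharded) option a little more concrete, here is a minimal sketch of hash-based partitioning. It is an illustration only, not how any particular engine does it: the function names `shard_for` and `partition`, the `customer_id` field, and the choice of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(rows, num_shards, key_field="customer_id"):
    """Split rows into num_shards lists so each key always lands on one shard."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


# Illustrative usage data: two years of it would simply be more rows.
usage = [
    {"customer_id": "c1", "month": "2011-04", "requests": 120},
    {"customer_id": "c2", "month": "2011-04", "requests": 75},
    {"customer_id": "c1", "month": "2011-05", "requests": 140},
    {"customer_id": "c3", "month": "2011-05", "requests": 10},
]
shards = partition(usage, num_shards=4)
```

Because the hash is stable, all rows for a given customer land on the same shard, so per-customer analysis can run on one node while the full data set is spread across many. Simple modulo placement like this forces most keys to move when the shard count changes, which is one reason real systems often use more elaborate placement schemes.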

Customers now have larger and
more complex data problems, they need the products always online, and they are now
willing to use a wide variety of specialized solutions if needed. Data intensive workloads
are growing quickly and never have there been so many opportunities and so many unsolved
or incompletely solved problems. It’s a great time to be working on database systems.





· Proceedings extended abstract: http://www.sigmod2011.org/keynote_1.shtml

