Thursday, 17 May 2012

Article: If all these new DBMS technologies are so scalable, why are Oracle and DB2 still on top of TPC-C? A roadmap to end their dominance.


http://dbmsmusings.blogspot.com/2012/05/if-all-these-new-dbms-technologies-are.html

(This post is coauthored by Alexander Thomson and Daniel Abadi)
In the last decade, database technology has arguably progressed furthest along the scalability dimension. There have been hundreds of research papers, dozens of open-source projects, and numerous startups attempting to improve the scalability of database technology. Many of these new technologies have been extremely influential---some papers have earned thousands of citations, and some new systems have been deployed by thousands of enterprises.

So let's ask a simple question: If all these new technologies are so scalable, why on earth are Oracle and DB2 still on top of the TPC-C standings? Go to the TPC-C Website and look at the top 10 results in raw transactions per second. As of today (May 16th, 2012), Oracle 11g is used for 3 of the results (including the top result), 10g is used for 2 of the results, and the rest of the top 10 is filled with various versions of DB2. How is technology designed decades ago still dominating TPC-C? What happened to all these new technologies with all these scalability claims?

The surprising truth is that these new DBMS technologies are not listed in the TPC-C top ten results not because they do not care enough to enter, but rather because they would not win if they did.

To understand why this is the case, one must understand that scalability does not come for free. Something must be sacrificed to achieve high scalability. Today, there are three major categories of tradeoff that can be exploited to make a system scale. The new technologies basically fall into two of these categories; Oracle and DB2 fall into a third. And the later parts of this blog post describe research from our group at Yale that introduces a fourth category of tradeoff that provides a roadmap to end the dominance of Oracle and DB2.

These categories are:

(1) Sacrifice ACID for scalability. Our previous post on this topic discussed this in detail. Basically, we argue that a major class of new scalable technologies falls under the category of "NoSQL," which achieves scalability by dropping ACID guarantees, thereby allowing these systems to eschew two-phase locking, two-phase commit, and other impediments to concurrency and processor independence that hurt scalability. All of these systems that relax ACID are immediately ineligible to enter the TPC-C competition since ACID guarantees are one of TPC-C's requirements. That's why you don't see NoSQL databases in the TPC-C top 10---they are immediately disqualified.

(2) Reduce transaction flexibility for scalability. There are many so-called "NewSQL" databases that claim to be both ACID-compliant and scalable. And these claims are true---to a degree. However, the fine print is that they are only linearly scalable when transactions can be completely isolated to a single "partition" or "shard" of data. While these NewSQL databases often hide the complexity of sharding from the application developer, they still rely on the shards being fairly independent. As soon as a transaction needs to span multiple shards (e.g., update two different user records on two different shards in the same atomic transaction), these NewSQL systems all run into problems. Some simply reject such transactions. Others allow them, but need to perform two-phase commit or another agreement protocol in order to ensure ACID compliance (since each shard may fail independently). Unfortunately, agreement protocols such as two-phase commit come at a great scalability cost (see our 2010 paper that explains why). Therefore, NewSQL databases only scale well if multi-shard transactions (also called "distributed transactions" or "multi-partition transactions") are very rare. Unfortunately for these databases, TPC-C models a fairly reasonable retail application where customers buy products and the inventory needs to be updated in the same atomic transaction. 10% of TPC-C New Order transactions involve customers buying products from a "remote" warehouse, which is generally stored in a separate shard. Therefore, even for basic applications like TPC-C, NewSQL databases lose their scalability advantages. That's why the NewSQL databases do not enter TPC-C results---even a 10% rate of multi-shard transactions causes their performance to degrade rapidly.
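
To see why the fine print matters, here is a toy sketch (not any particular NewSQL engine's code; the Shard class and the in-memory dictionaries are invented for illustration) of what a cross-shard update drags in: every participating shard must vote in a prepare round before anything can commit, and locks are held for the whole exchange.

    # Illustrative two-phase commit over two hypothetical shards.
    # All names (Shard, two_phase_commit, the keys) are made up for this sketch.

    class Shard:
        def __init__(self, data):
            self.data = data          # committed state
            self.staged = None        # writes held during the "prepare" phase

        def prepare(self, updates):
            """Phase 1: validate and stage the writes; vote yes/no."""
            if any(k not in self.data for k in updates):
                return False          # e.g. missing row -> vote "no"
            self.staged = updates
            return True

        def commit(self):
            """Phase 2: make the staged writes visible."""
            self.data.update(self.staged)
            self.staged = None

        def abort(self):
            self.staged = None

    def two_phase_commit(participants):
        # Phase 1: every participant must vote "yes".
        if all(shard.prepare(upd) for shard, upd in participants):
            for shard, _ in participants:
                shard.commit()        # Phase 2: commit everywhere
            return "committed"
        for shard, _ in participants:
            shard.abort()
        return "aborted"

    # A TPC-C-style "remote warehouse" order touches two shards at once:
    customers = Shard({"cust:42": {"balance": 100}})
    inventory = Shard({"item:7": {"stock": 9}})
    print(two_phase_commit([(customers, {"cust:42": {"balance": 90}}),
                            (inventory, {"item:7": {"stock": 8}})]))

The prepare round is a synchronous network exchange per multi-shard transaction, performed while locks are held, which is the kind of overhead that grows with the fraction of multi-shard transactions.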

(3) Trade cost for scalability. If you use high-end hardware, it is possible to get stunningly high transactional throughput using old database technologies that lack shared-nothing horizontal scalability. Oracle tops TPC-C with an incredibly high throughput of 500,000 transactions per second. There exists no application in the modern world that produces more than 500,000 transactions per second (as long as humans are initiating the transactions---machine-generated transactions are a different story). Therefore, Oracle basically has all the scalability that is needed for human-scale applications. The only downside is cost---the Oracle system that is able to achieve 500,000 transactions per second costs a prohibitive $30,000,000!

Since the first two types of tradeoffs are immediate disqualifiers for TPC-C, the only remaining thing to give up is cost-for-scale, and that's why the old database technologies are still dominating TPC-C. None of these new technologies can handle both ACID and 10% remote transactions.

A fourth approach...

TPC-C is a very reasonable application. New technologies should be able to handle it. Therefore, at Yale we set out to find a new dimension in this tradeoff space that could allow a system to handle TPC-C at scale without costing $30,000,000. Indeed, we are presenting a paper next week at SIGMOD (see the full paper) that describes a system that can achieve 500,000 ACID-compliant TPC-C New Order transactions per second using commodity hardware in the cloud. The cost to us to run these experiments was less than $300 (of course, this is renting hardware rather than buying, so it's hard to compare prices---but still---a factor of 100,000 less than $30,000,000 is quite large).

Calvin, our prototype system designed and built by a large team of researchers at Yale that includes Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Anton Petrov, Michael Giuffrida, and Aaron Segal (in addition to the authors of this blog post), explores a tradeoff very different from the three described above. Calvin requires all transactions to be executed fully server-side and sacrifices the freedom to non-deterministically abort or reorder transactions on-the-fly during execution. In return, Calvin gets scalability, ACID-compliance, and extremely low-overhead multi-shard transactions over a shared-nothing architecture. In other words, Calvin is designed to handle high-volume OLTP throughput on sharded databases on cheap, commodity hardware stored locally or in the cloud. Calvin significantly improves on the scalability of our previous approach to achieving determinism in database systems.

Scaling ACID

The key to Calvin's strong performance is that it reorganizes the transaction execution pipeline normally used in DBMSs according to the principle: do all the "hard" work before acquiring locks and beginning execution. In particular, Calvin moves the following stages to the front of the pipeline (a simplified sketch of the resulting execution model follows the list):

  • Replication. In traditional systems, replicas agree on each modification to database state only after some transaction has made the change at some "master" replica. In Calvin, all replicas agree in advance on the sequence of transactions that they will (deterministically) attempt to execute.
  • Agreement between participants in distributed transactions. Database systems traditionally use two-phase commit (2PC) to handle distributed transactions. In Calvin, every node sees the same global sequence of transaction requests, and is able to use this already-agreed-upon information in place of a commit protocol.
  • Disk accesses. In our VLDB 2010 paper, we observed that deterministic systems performed terribly in disk-based environments due to holding locks for the 10ms+ duration of reading the needed data from disk, since they cannot reorder conflicting transactions on the fly. Calvin gets around this setback by prefetching into memory all records that a transaction will need during the replication phase---before locks are even acquired.
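
To make the "agree on the sequence first" idea concrete, here is a minimal sketch of deterministic, pre-ordered execution. This is our simplification, not Calvin's actual implementation; the per-key lock queues and the pre-declared read/write sets are the assumed ingredients.

    # Every node is handed the same globally agreed transaction sequence, and
    # each transaction's keys are known up front, so lock requests can be
    # queued in sequence order and no commit protocol is needed afterwards.
    from collections import defaultdict, deque

    def deterministic_execute(sequence, store):
        # Per-key FIFO lock queues, filled strictly in the agreed order.
        queues = defaultdict(deque)
        for txn_id, (keys, _) in enumerate(sequence):
            for k in keys:
                queues[k].append(txn_id)

        done = set()
        while len(done) < len(sequence):
            for txn_id, (keys, logic) in enumerate(sequence):
                if txn_id in done:
                    continue
                # A transaction runs once it heads every queue it joined.
                if all(queues[k][0] == txn_id for k in keys):
                    logic(store)                 # user logic, executed locally
                    for k in keys:
                        queues[k].popleft()      # release locks in order
                    done.add(txn_id)

    store = {"a": 10, "b": 20}
    sequence = [
        (["a", "b"], lambda db: db.update(a=db["a"] - 1, b=db["b"] + 1)),
        (["b"],      lambda db: db.update(b=db["b"] * 2)),
    ]
    deterministic_execute(sequence, store)
    print(store)   # {'a': 9, 'b': 42} on every node that runs the same sequence

Because conflicting transactions are ordered identically everywhere, every node serializes them the same way without any run-time negotiation.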

As a result, each transaction's user-specified logic can be executed at each shard with an absolute minimum of runtime synchronization between shards or replicas to slow it down, even if the transaction's logic requires it to access records at multiple shards. By minimizing the time that locks are held, concurrency can be greatly increased, thereby leading to near-linear scalability on a commodity cluster of machines.

Strongly consistent global replication

Calvin's deterministic execution semantics provide an additional benefit: replicating transactional input is sufficient to achieve strongly consistent replication. Since replicating batches of transaction requests is extremely inexpensive and happens before the transactions acquire locks and begin executing, Calvin's transactional throughput capacity does not depend at all on its replication configuration.

In other words, not only can Calvin run 500,000 transactions per second on 100 EC2 instances in Amazon's US East (Virginia) data center, it can maintain strongly-consistent, up-to-date 100-node replicas in Amazon's Europe (Ireland) and US West (California) data centers---at no cost to throughput.

Calvin accomplishes this by having replicas perform the actual processing of transactions completely independently of one another, maintaining strong consistency without having to constantly synchronize transaction results between replicas. (Calvin's end-to-end transaction latency does depend on message delays between replicas, of course---there is no getting around the speed of light.)
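
A toy illustration of why replicating input is enough (function and variable names are ours, not Calvin's): each replica applies the same agreed batch deterministically and arrives at the same state without ever shipping results.

    # Replicate the *input* (the agreed transaction batch), not the effects.
    def apply_batch(db, batch):
        for txn in batch:                   # same order on every replica
            txn(db)
        return db

    batch = [
        lambda db: db.__setitem__("stock:7", db.get("stock:7", 10) - 1),
        lambda db: db.__setitem__("balance:42", db.get("balance:42", 100) - 30),
    ]

    virginia = apply_batch({}, batch)       # "US East" replica
    ireland  = apply_batch({}, batch)       # "Europe" replica
    assert virginia == ireland              # identical state, no result shipping
    print(virginia)                         # {'stock:7': 9, 'balance:42': 70}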

Flexible data model

So where does Calvin fall in the OldSQL/NewSQL/NoSQL trichotomy?

Actually, nowhere. Calvin is not a database system itself, but rather a transaction scheduling and replication coordination service. We designed the system to integrate with any data storage layer, relational or otherwise. Calvin allows user transaction code to access the data layer freely, using any data access language or interface supported by the underlying storage engine (so long as Calvin can observe which records user transactions access). The experiments presented in the paper use a custom key-value store. More recently, we've hooked Calvin up to Google's LevelDB and added support for SQL-based data access within transactions, building relational tables on top of LevelDB's efficient sorted-string storage.
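
The storage interface such a layer needs is deliberately narrow. A rough sketch of the idea, with invented class and function names rather than Calvin's real API: anything that exposes get/put and lets the scheduler see which keys a transaction touches could sit underneath.

    class DictStore:
        """In-memory stand-in; a LevelDB- or SQL-backed class could expose
        the same two methods."""
        def __init__(self):
            self._data = {}
        def get(self, key):
            return self._data.get(key)
        def put(self, key, value):
            self._data[key] = value

    def run_transaction(store, read_set, write_fn):
        # The layer observes the accessed keys (read_set) before execution,
        # then lets user logic compute the writes from the reads.
        reads = {k: store.get(k) for k in read_set}
        for key, value in write_fn(reads).items():
            store.put(key, value)

    store = DictStore()
    store.put("counter", 0)
    run_transaction(store, ["counter"],
                    lambda r: {"counter": (r["counter"] or 0) + 1})
    print(store.get("counter"))   # 1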

From an application developer's point of view, Calvin's primary limitation compared to other systems is that transactions must be executed entirely server-side. Calvin has to know in advance what code will be executed for a given transaction. Users may pre-define transactions directly in C++, or submit arbitrary Python code snippets on-the-fly to be parsed and executed as transactions.

For some applications, this requirement of completely server-side transactions might be a difficult limitation. However, many applications prefer to execute transaction code on the database server anyway (in the form of stored procedures), in order to avoid multiple round-trip messages between the database server and application server in the middle of a transaction.
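
A hypothetical sketch of what that stored-procedure style programming model looks like in practice (the registry, decorator, and transaction names below are ours, not Calvin's API): logic is registered server-side ahead of time, and clients submit only a transaction name plus arguments, so there are no mid-transaction round trips.

    TRANSACTIONS = {}

    def register(name):
        def wrap(fn):
            TRANSACTIONS[name] = fn      # server knows the code in advance
            return fn
        return wrap

    @register("new_order")
    def new_order(db, customer, item):
        db[item] = db.get(item, 10) - 1          # decrement stock
        db[customer] = db.get(customer, 0) + 1   # record the order
        return {"ok": True}

    def submit(db, name, **args):
        # The client only names the transaction and supplies its arguments.
        return TRANSACTIONS[name](db, **args)

    db = {}
    print(submit(db, "new_order", customer="cust:42", item="item:7"))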

If this limitation is acceptable, Calvin presents a nice alternative in the tradeoff space, achieving high scalability without sacrificing ACID or multi-shard transactions. Hence, we believe that our SIGMOD paper may present a roadmap for overcoming the scalability dominance of the decades-old database solutions on traditional OLTP workloads. We look forward to debating the merits of this approach in the weeks ahead (and Alex will be presenting the paper at SIGMOD next week).

Wednesday, 11 January 2012

In-Memory Database Systems Questions and Answers


In-memory database systems (IMDS) are a growing subset of database management system (DBMS) software. In-memory databases emerged in response to new application goals, system requirements, and operating environments. Below, we answer common IMDS questions.

What is an in-memory database system?

An in-memory database system is a database management system that stores data entirely in main memory. This contrasts with traditional (on-disk) database systems, which are designed for data storage on persistent media. Because working with data in memory is much faster than writing to and reading from a file system, IMDSs can perform applications’ data management functions an order of magnitude faster. Because their design is typically simpler than that of on-disk databases, IMDSs can also impose significantly lower memory and CPU requirements.

If avoiding disk I/O is the goal, why not achieve that through database caching?


Caching is the process whereby on-disk databases keep frequently-accessed records in memory, for faster access. However, caching only speeds up retrieval of information, or “database reads.” Any database write – that is, an update to a record or creation of a new record – must still be written through the cache, to disk. So, the performance benefit only applies to a subset of database tasks. In addition, managing the cache is itself a process that requires substantial memory and CPU resources, so even a “cache hit” underperforms an in-memory database.
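
A toy sketch of that asymmetry (not any vendor's cache code): reads can be served from memory after the first access, but every write still has to reach the disk.

    class WriteThroughCache:
        def __init__(self, disk):
            self.disk = disk            # stand-in for the file system
            self.cache = {}
            self.disk_writes = 0

        def read(self, key):
            if key in self.cache:       # cache hit: no disk I/O
                return self.cache[key]
            value = self.disk.get(key)  # cache miss: read from "disk"
            self.cache[key] = value
            return value

        def write(self, key, value):
            self.cache[key] = value
            self.disk[key] = value      # every write still reaches the disk
            self.disk_writes += 1

    db = WriteThroughCache({"x": 1})
    db.read("x"); db.read("x")          # second read is served from memory
    db.write("x", 2)                    # but the update must hit the disk
    print(db.disk_writes)               # 1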

If an in-memory database system boosts performance by holding all records in memory, can’t I get the same result by creating a RAM disk and deploying a traditional database there?


As a makeshift solution, placing the entire on-disk database on a RAM disk will speed up both database reads and writes. However, the database is still hard-wired for disk storage, and the processes it uses to facilitate disk storage, such as caching and file I/O, continue to operate even though they are now redundant.

In addition, data in an on-disk database system must be transferred to numerous locations as it is used. Figure 1 shows the handoffs required for an application to read a piece of data from an on-disk database, modify it and write that record back to the database. These steps require time and CPU cycles, and cannot be avoided in a traditional database, even when it runs on a RAM disk. Still more copies and transfers are required if transaction logging is active.
Figure 1. Data transfer in an on-disk database system
In contrast, an in-memory database system entails a single data transfer. Elimination of multiple data transfers streamlines processing. Removing multiple copies of data reduces memory consumption, and the simplified processing makes for greater reliability and minimizes CPU demands.

Can you quantify the performance difference between the three approaches described above – using on-disk, on-disk deployed on a RAM-disk, and in-memory database systems?


In a published benchmark, McObject compared the same application’s performance using an embedded on-disk database system, using an embedded in-memory database, and using the embedded on-disk database deployed on a RAM-disk. Moving the on-disk database to a RAM drive resulted in read accesses that were almost 4x faster, and database updates that were more than 3x faster.

Moving this same benchmark test to a true in-memory database system, however, provided much more dramatic performance gains: the in-memory database outperformed the RAM-disk database by 4x for database reads and turned in a startling 420x improvement for database writes. Click here to read an article on iApplianceWeb reporting the benchmark test. Click here to download McObject’s benchmark report.

What else distinguishes an in-memory database from a “traditional” (on-disk) database management system (DBMS)?


The optimization objectives of an on-disk database system are diametrically opposed to those of an in-memory database system. With an on-disk database system, the primary burden on performance is file I/O. Therefore an on-disk database system seeks to reduce that I/O, and it will trade off memory consumption and CPU cycles to do so. This includes using extra memory for a cache, and CPU cycles to maintain the cache.

On-disk DBMSs also keep a lot of redundant data around. For example, duplicate data is kept in index structures, to enable the on-disk database system to fetch records from the index, rather than “spending” an I/O navigating from the index to the data itself. Disk space is cheap, so designers of on-disk database systems proceed with the assumption that storage space is virtually limitless.

In stark contrast, an in-memory database system carries no file I/O burden. From the start, its design can be more streamlined, with the optimization goals of reducing memory consumption and CPU cycles. Though memory has declined in price, developers rightly treat it as more precious—and because memory equals storage space for an in-memory database system, IMDSs should be (and McObject’s eXtremeDB in-memory embedded database is) designed to get the most out of memory. An in-memory database is chosen explicitly for its performance advantage, so a secondary design goal is always to eliminate unnecessary CPU cycles.

Isn’t the database just lost if there’s a system crash?


It needn’t be. Most in-memory database systems offer features for adding persistence, or the ability to survive disruption of their hardware or software environment.

One important tool is transaction logging, in which periodic snapshots of the in-memory database (called “savepoints”) are written to non-volatile media, along with a log of the transactions applied since the last savepoint. If the system fails and must be restarted, the database either “rolls back” to the last completed transaction, or “rolls forward” to complete any transaction that was in progress when the system went down (depending on the particular IMDS’s implementation of transaction logging).
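
A minimal sketch of the recovery idea, with invented file names and formats: write periodic savepoints plus an append-only log, and on restart reload the last savepoint and roll forward through the log.

    import json, os, tempfile

    dirpath = tempfile.mkdtemp()
    SNAP = os.path.join(dirpath, "savepoint.json")
    LOG = os.path.join(dirpath, "txn.log")

    def savepoint(db):
        with open(SNAP, "w") as f:
            json.dump(db, f)
        open(LOG, "w").close()                 # log restarts after a savepoint

    def apply_and_log(db, key, value):
        db[key] = value
        with open(LOG, "a") as f:              # record the change durably
            f.write(json.dumps([key, value]) + "\n")

    def recover():
        with open(SNAP) as f:
            db = json.load(f)
        with open(LOG) as f:
            for line in f:                     # roll forward logged changes
                key, value = json.loads(line)
                db[key] = value
        return db

    db = {"sensor": 1}
    savepoint(db)
    apply_and_log(db, "sensor", 2)
    print(recover())                           # {'sensor': 2} after a "crash"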

In-memory database systems can also gain durability by maintaining one or more copies of the database. In this solution – called database replication – fail-over procedures allow the system to continue using a standby database. The “master” and replica databases can be maintained by multiple processes or threads within the same hardware instance. They can also reside on two or more boards in a chassis with a high-speed bus for communication, run on separate computers on a LAN, or exist in other configurations.

Non-volatile RAM or NVRAM provides another means of in-memory database persistence. One type of NVRAM, called battery-RAM, is backed up by a battery so that even if a device is turned off or loses its power source, the memory content—including the database—remains. Newer types of NVRAM, including ferroelectric RAM (FeRAM), magnetoresistive RAM (MRAM) and phase change RAM (PRAM) are designed to maintain information when power is turned off, and offer similar persistence options.

Finally, new hybrid database system technology adds the ability to apply disk-based storage selectively, within the broader context of an in-memory database. For example, with McObject’s hybrid eXtremeDB Fusion, a notation in the database design or "schema" causes certain record types to be written to disk, while all others are managed entirely in memory. On-disk functions such as cache management are applied only to those records stored on disk, minimizing these activities’ performance impact and CPU demands.
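
A rough sketch of the hybrid routing idea (generic illustration only, not eXtremeDB Fusion's actual schema notation): record types flagged as disk-backed go through a persistent store, while everything else stays purely in memory, so only the disk-backed types pay for file I/O.

    import shelve, tempfile, os

    ON_DISK_TYPES = {"audit_log"}              # the per-"schema" notation

    class HybridStore:
        def __init__(self, path):
            self.memory = {}
            self.disk = shelve.open(path)      # stand-in for a disk-backed store

        def put(self, record_type, key, value):
            target = self.disk if record_type in ON_DISK_TYPES else self.memory
            target[f"{record_type}:{key}"] = value

        def get(self, record_type, key):
            target = self.disk if record_type in ON_DISK_TYPES else self.memory
            return target.get(f"{record_type}:{key}")

    store = HybridStore(os.path.join(tempfile.mkdtemp(), "hybrid.db"))
    store.put("sensor_reading", 1, {"temp": 21.5})   # stays in memory
    store.put("audit_log", 1, {"event": "login"})    # written to disk
    print(store.get("sensor_reading", 1), store.get("audit_log", 1))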

What kinds of applications typically employ an in-memory database?


In-memory databases are most commonly used in applications that demand very fast data access, storage and manipulation, and in systems that don’t typically have a disk but nevertheless must manage appreciable quantities of data.

An important use for in-memory database systems is in real-time embedded systems. IMDSs running on real-time operating systems (RTOSs) provide the responsiveness needed in applications including IP network routing, telecom switching, and industrial control. IMDSs manage music databases in MP3 players and handle programming data in set-top boxes. In-memory databases’ typically small memory and CPU footprint make them ideal because most embedded systems are highly resource-constrained.

Non-embedded applications requiring exceptional performance are an important growth area for in-memory database systems. For example, algorithmic trading and other applications for financial markets use IMDSs to provide instant manipulation of data, in order to identify and leverage market opportunities. Some multi-user Web applications – such as e-commerce and social networking sites – use in-memory databases to cache portions of their back-end on-disk database systems. These enterprise-scale applications sometimes require very large in-memory data stores, and this need is met by 64-bit IMDS editions.

Is an in-memory database the same as an “embedded database”?


“Embedded database” refers to a database system that is built into the software program by the application developer, is invisible to the application’s end-user and requires little or no ongoing maintenance. Many in-memory databases fit that description, but not all do. In contrast to embedded databases, a “client/server database” refers to a database system that utilizes a separate dedicated software program, called the database server, accessed by client applications via inter-process communication (IPC) or remote procedure call (RPC) interfaces. Some in-memory database systems employ the client/server model.

How scalable is an in-memory database system? My application manages terabytes of data – is it practical to hold this much in an in-memory database?


IMDS technology scales well beyond the terabyte size range. McObject’s benchmark report, In-Memory Database Systems (IMDSs) Beyond the Terabyte Size Boundary, detailed this scalability with a 64-bit in-memory database system deployed on a 160-core SGI Altix 4700 server running SUSE Linux Enterprise Server version 9 from Novell. The database grew to 1.17 terabytes and 15.54 billion rows, with no apparent limits on its scaling further.

Performance remained consistent as the database size grew into the hundreds of gigabytes and exceeded a terabyte, suggesting nearly linear scalability. For a simple SELECT against the fully populated database, the IMDS (McObject’s eXtremeDB-64) processed 87.78 million query transactions per second using its native application programming interface (API) and 28.14 million transactions per second using a SQL ODBC API. To put these results in perspective, consider that the lingua franca for discussing query performance is transactions per minute.

Doesn’t it take a long time to populate an in-memory database?


“A long time” is relative. For example, a 19 megabyte in-memory database loads in under 6.6 seconds (under 4 seconds if reloading from a previously saved database image). The 1.17 terabyte database described earlier loaded in just over 33 hours.

What is true is that populating a very large in-memory database system can be much faster than populating an on-disk DBMS. During such “data ingest,” on-disk database systems use caching to enhance performance. But eventually, memory buffers fill up, and the system writes the data to the file system (logical I/O). Eventually, the file system buffers also fill up, and data must be written to the hard disk (physical I/O). Physical I/O is usually measured in milliseconds, and its performance burden is much greater than that of logical I/O (which is usually measured in microseconds). Physical I/O may be required by an on-disk DBMS for other reasons, for example, to guarantee transactional integrity.

Consider what happens when populating an on-disk database, as the total amount of stored data increases:

First, as the database grows, the tree indexes used to organize data grow deeper, and the average number of steps into the tree, to reach the storage location, expands. Each step imposes a logical disk I/O. Second, assuming that the cache size stays the same, the percent of the database that is cached is smaller. Therefore, it is more likely that any logical disk I/O is the more-burdensome physical I/O.

Third, as the database grows, it consumes more physical space on the disk platter, and the average time to move the head from position to position is greater. When the head travels further, physical I/O takes longer, further degrading performance.

In contrast, in-memory database ingest performance is roughly linear as database size increases.
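
A back-of-the-envelope model of the first two effects, using made-up round numbers for fanout, cache size, and latencies: index depth grows with the logarithm of the record count, and with a fixed-size cache an ever larger share of lookups falls through to physical I/O.

    import math

    FANOUT = 200                        # keys per index node (assumed)
    CACHE_PAGES = 100_000               # fixed cache size in pages (assumed)
    LOGICAL_US, PHYSICAL_US = 5, 5000   # illustrative latencies in microseconds

    def lookup_cost_us(n_records):
        depth = math.ceil(math.log(n_records, FANOUT))   # steps into the tree
        pages = n_records // FANOUT + 1                  # rough page count
        miss_rate = max(0.0, 1 - CACHE_PAGES / pages)    # cache covers less of it
        per_step = LOGICAL_US + miss_rate * PHYSICAL_US
        return depth * per_step

    for n in (10**6, 10**8, 10**10):
        print(f"{n:>14,} records -> ~{lookup_cost_us(n):,.0f} µs per indexed lookup")

With these assumed numbers, the cost per lookup climbs from a handful of microseconds to tens of milliseconds as the data outgrows the cache, which is the shape of the slowdown described above.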

Isn’t an in-memory database only really usable on a single computer system, whereas an on-disk database can be shared by any number of computers on a network?


An in-memory database system can be either an “embedded database” or a “client/server” database system. Client/server database systems are inherently multi-user, but embedded in-memory databases can also be shared by multiple threads/processes/users. First, the database can be created in shared memory, with the database system providing a mechanism to control concurrent access. Also, embedded databases can (and eXtremeDB does) provide a set of interfaces that allow processes executing on network nodes remote from the database node to read from and write to the database. Finally, database replication can be exploited to copy the in-memory database to the node(s) where processes are located, so that those processes can query a local database and eliminate network traffic and latency.
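
A toy sketch of the shared-store point, using Python's multiprocessing.Manager as a stand-in for a true shared-memory database: several processes update one in-memory store under a shared lock, so an embedded IMDS can still be multi-user.

    from multiprocessing import Manager, Process

    def worker(store, lock, n):
        for _ in range(n):
            with lock:                    # concurrency control around the store
                store["counter"] = store.get("counter", 0) + 1

    if __name__ == "__main__":
        with Manager() as mgr:
            store, lock = mgr.dict(), mgr.Lock()
            procs = [Process(target=worker, args=(store, lock, 1000)) for _ in range(4)]
            for p in procs: p.start()
            for p in procs: p.join()
            print(store["counter"])       # 4000: all writers saw one shared store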

What’s different/better about an in-memory database versus STL or Boost collections, or even just creating my own memory-mapped file(s)?


The question is the same as asking why these alternatives are not viable replacements for Oracle, MS SQL Server, DB2, and other on-disk databases. Any database system goes far beyond giving you a set of interfaces to manage collections, lists, etc. This typically includes support for ACID (atomic, consistent, isolated and durable) transactions, multi-user access, a high level data definition language, one or more programming interfaces (including industry-standard SQL), triggers/event notifications, and more.

Won’t an in-memory database require huge amounts of memory because database systems are large?


Equating “database management system” with “big” is justified, generally speaking. Even some embedded DBMSs are megabytes in code size. This is true largely because traditional on-disk databases – including some that have now been adapted for use in memory, and are pitched as IMDSs—were not written with the goal of minimizing code size (or CPU cycles). As described above, as on-disk database systems, their overriding design goal was amelioration of disk I/O.

In contrast, a database system designed from first principles for in-memory use can be much smaller, requiring less than 100K of memory, compared to many 100s of kilobytes up to many megabytes for other database architectures. This reduction in code size results from:
  • Elimination of on-disk database capabilities that become redundant for in-memory use, such as all processes surrounding caching and file I/O
  • Elimination of many features that are unnecessary in the types of application that use in-memory databases. An IP router does not need separate client and server software modules to manage routing data. And a persistent Web cache doesn’t need user access rights or stored procedures
  • Hundreds of other development decisions that are guided by the design philosophy that memory equals storage space, so efficient use of that memory is paramount

Sunday, 26 June 2011

SIGMOD 2011 in Athens


Earlier this week, I was in Athens, Greece attending the annual conference of the ACM Special Interest Group on Management of Data. SIGMOD is one of the top two database events held each year, attracting academic researchers and leading practitioners from industry.

I kicked off the conference with the plenary keynote. In this talk I started with a short retrospection on the industry over the last 20 years. In my early days as a database developer, things were moving incredibly quickly. Customers were loving our products, the industry was growing fast and yet the products really weren’t all that good. You know you are working on important technology when customers are buying like crazy and the products aren’t anywhere close to where they should be.

In my first release as lead architect on DB2 20 years ago, we completely rewrote the DB2 database engine process model, moving from a process-per-connected-user model to a single process where each connection only consumes a single thread, supporting many more concurrent connections. It was a fairly fundamental architectural change completed in a single release. And in that same release, we improved TPC-A performance by a booming factor of 10 and then did 4x more in the next release. It was a fun time and things were moving quickly.


From the mid-90s through to around 2005, the database world went through what I refer to as the dark ages. DBMS code bases had grown to the point where the smallest was more than 4 million lines of code, the commercial system engineering teams would no longer fit in a single building, and the number of database companies shrunk throughout the entire period down to only 3 major players. The pace of innovation was glacial and much of the research during the period was, in the words of Bruce Lindsay, “polishing the round ball”. The problem was that the products were actually passably good, customers didn’t have a lot of alternatives, and nothing slows innovation like large teams with huge code bases.

In the last 5 years, the database world has become exciting again. I’m seeing more opportunity in the database world now than at any other time in the last 20 years. It’s now easy to get venture funding to do database products, and the number and diversity of viable products is exploding. My talk focused on what changed, why it happened, and some of the technical backdrop influencing these changes.


A background thesis of the talk is that cloud computing solves two of the primary reasons why customers used to be stuck standardizing on a single database engine even though some of their workloads may have run poorly. The first is cost. Cloud computing reduces costs dramatically (some of the cloud economics argument: http://perspectives.mvdirona.com/2009/04/21/McKinseySpeculatesThatCloudComputingMayBeMoreExpensiveThanInternalIT.aspx) and charges by usage rather than via annual enterprise license. One of the favorite lock-ins of the enterprise software world is the enterprise license. Once you’ve signed one, you are completely owned and it’s hard to afford to run another product. My fundamental rule of enterprise software is that any company that can afford to give you a 50% to 80% reduction from “list price” is pretty clearly not a low margin operator. That is the way much of the enterprise computing world continues to work: start with a crazy price, negotiate down to a ½ crazy price, and then feel like a hero while you contribute to incredibly high profit margins.

Cloud computing charges by the use in small increments, and any of the major database or open source offerings can be used at low cost. That is certainly a relevant reason, but the really significant factor is the offloading of administrative complexity to the cloud provider. One of the primary reasons to standardize on a single database is that each is so complex to administer that it’s hard to have sufficient skill on staff to manage more than one. Cloud offerings like AWS Relational Database Service transfer much of the administrative work to the cloud provider, making it easy to choose the database that best fits the application and to have many specialized engines in use across a given company.

As costs fall, more workloads become practical and existing workloads get larger. For example, if analyzing three months of customer usage data has value to the business and it becomes affordable to analyze two years instead, customers correctly want to do it. The plunging cost of computing is fueling database size growth at a super-Moore pace, requiring either partitioned (sharded) or parallel DB engines.

Customers now have larger and more complex data problems, they need the products always online, and they are now willing to use a wide variety of specialized solutions if needed. Data intensive workloads are growing quickly and never have there been so many opportunities and so many unsolved or incompletely solved problems. It’s a great time to be working on database systems.





· Proceedings extended abstract: http://www.sigmod2011.org/keynote_1.shtml