Thursday, 17 May 2012

Article: If all these new DBMS technologies are so scalable, why are Oracle and DB2 still on top of TPC-C? A roadmap to end their dominance.


http://dbmsmusings.blogspot.com/2012/05/if-all-these-new-dbms-technologies-are.html

(This post is coauthored by Alexander Thomson and Daniel Abadi)
In the last decade, database technology has arguably progressed furthest along the scalability dimension. There have been hundreds of research papers, dozens of open-source projects, and numerous startups attempting to improve the scalability of database technology. Many of these new technologies have been extremely influential---some papers have earned thousands of citations, and some new systems have been deployed by thousands of enterprises.

So let's ask a simple question: If all these new technologies are so scalable, why on earth are Oracle and DB2 still on top of the TPC-C standings? Go to the TPC-C Website and look at the top 10 results in raw transactions per second. As of today (May 16th, 2012), Oracle 11g is used for 3 of the results (including the top result), 10g is used for 2 of the results, and the rest of the top 10 is filled with various versions of DB2. How is technology designed decades ago still dominating TPC-C? What happened to all these new technologies with all these scalability claims?

The surprising truth is that these new DBMS technologies are not listed in the TPC-C top ten results not because they do not care enough to enter, but rather because they would not win if they did.

To understand why this is the case, one must understand that scalability does not come for free. Something must be sacrificed to achieve high scalability. Today, there are three major categories of tradeoff that can be exploited to make a system scale. The new technologies basically fall into two of these categories; Oracle and DB2 fall into a third. And the later parts of this blog post describe research from our group at Yale that introduces a fourth category of tradeoff that provides a roadmap to end the dominance of Oracle and DB2.

These categories are:

(1) Sacrifice ACID for scalability. Our previous post on this topic discussed this in detail. Basically, we argue that a major class of new scalable technologies falls under the category of "NoSQL," which achieves scalability by dropping ACID guarantees, thereby allowing these systems to eschew two-phase locking, two-phase commit, and other impediments to concurrency and processor independence that hurt scalability. All of these systems that relax ACID are immediately ineligible to enter the TPC-C competition since ACID guarantees are one of TPC-C's requirements. That's why you don't see NoSQL databases in the TPC-C top 10---they are immediately disqualified.

(2) Reduce transaction flexibility for scalability. There are many so-called "NewSQL" databases that claim to be both ACID-compliant and scalable. And these claims are true---to a degree. However, the fine print is that they are only linearly scalable when transactions can be completely isolated to a single "partition" or "shard" of data. While these NewSQL databases often hide the complexity of sharding from the application developer, they still rely on the shards being fairly independent. As soon as a transaction needs to span multiple shards (e.g., update two different user records on two different shards in the same atomic transaction), these NewSQL systems all run into problems. Some simply reject such transactions. Others allow them, but need to perform two-phase commit or another agreement protocol in order to ensure ACID compliance (since each shard may fail independently). Unfortunately, agreement protocols such as two-phase commit come at a great scalability cost (see our 2010 paper that explains why). Therefore, NewSQL databases only scale well if multi-shard transactions (also called "distributed transactions" or "multi-partition transactions") are very rare. Unfortunately for these databases, TPC-C models a fairly reasonable retail application where customers buy products and the inventory needs to be updated in the same atomic transaction. 10% of TPC-C New Order transactions involve customers buying products from a "remote" warehouse, which is generally stored in a separate shard. Therefore, even for basic applications like TPC-C, NewSQL databases lose their scalability advantages. That's why the NewSQL databases do not enter TPC-C results---even a 10% rate of multi-shard transactions causes their performance to degrade rapidly.
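
To see why the fine print matters, here is a toy sketch (not any particular NewSQL engine's code; the Shard class and the in-memory dictionaries are invented for illustration) of what a cross-shard update drags in: every participating shard must vote in a prepare round before anything can commit, and locks are held for the whole exchange.

    # Illustrative two-phase commit over two hypothetical shards.
    # All names (Shard, two_phase_commit, the keys) are made up for this sketch.

    class Shard:
        def __init__(self, data):
            self.data = data          # committed state
            self.staged = None        # writes held during the "prepare" phase

        def prepare(self, updates):
            """Phase 1: validate and stage the writes; vote yes/no."""
            if any(k not in self.data for k in updates):
                return False          # e.g. missing row -> vote "no"
            self.staged = updates
            return True

        def commit(self):
            """Phase 2: make the staged writes visible."""
            self.data.update(self.staged)
            self.staged = None

        def abort(self):
            self.staged = None

    def two_phase_commit(participants):
        # Phase 1: every participant must vote "yes".
        if all(shard.prepare(upd) for shard, upd in participants):
            for shard, _ in participants:
                shard.commit()        # Phase 2: commit everywhere
            return "committed"
        for shard, _ in participants:
            shard.abort()
        return "aborted"

    # A TPC-C-style "remote warehouse" order touches two shards at once:
    customers = Shard({"cust:42": {"balance": 100}})
    inventory = Shard({"item:7": {"stock": 9}})
    print(two_phase_commit([(customers, {"cust:42": {"balance": 90}}),
                            (inventory, {"item:7": {"stock": 8}})]))

The prepare round is a synchronous network exchange per multi-shard transaction, performed while locks are held, which is the kind of overhead that grows with the fraction of multi-shard transactions.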

(3) Trade cost for scalability. If you use high-end hardware, it is possible to get stunningly high transactional throughput using old database technologies that lack shared-nothing horizontal scalability. Oracle tops TPC-C with an incredibly high throughput of 500,000 transactions per second. There exists no application in the modern world that produces more than 500,000 transactions per second (as long as humans are initiating the transactions---machine-generated transactions are a different story). Therefore, Oracle basically has all the scalability that is needed for human-scale applications. The only downside is cost---the Oracle system that is able to achieve 500,000 transactions per second costs a prohibitive $30,000,000!

Since the first two types of tradeoffs are immediate disqualifiers for TPC-C, the only remaining thing to give up is cost-for-scale, and that's why the old database technologies are still dominating TPC-C. None of these new technologies can handle both ACID and 10% remote transactions.

A fourth approach...

TPC-C is a very reasonable application. New technologies should be able to handle it. Therefore, at Yale we set out to find a new dimension in this tradeoff space that could allow a system to handle TPC-C at scale without costing $30,000,000. Indeed, we are presenting a paper next week at SIGMOD (see the full paper) that describes a system that can achieve 500,000 ACID-compliant TPC-C New Order transactions per second using commodity hardware in the cloud. The cost to us to run these experiments was less than $300 (of course, this is renting hardware rather than buying, so it's hard to compare prices---but still---a factor of 100,000 less than $30,000,000 is quite large).

Calvin, our prototype system designed and built by a large team of researchers at Yale that includes Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Anton Petrov, Michael Giuffrida, and Aaron Segal (in addition to the authors of this blog post), explores a tradeoff very different from the three described above. Calvin requires all transactions to be executed fully server-side and sacrifices the freedom to non-deterministically abort or reorder transactions on-the-fly during execution. In return, Calvin gets scalability, ACID-compliance, and extremely low-overhead multi-shard transactions over a shared-nothing architecture. In other words, Calvin is designed to handle high-volume OLTP throughput on sharded databases on cheap, commodity hardware stored locally or in the cloud. Calvin significantly improves on the scalability of our previous approach to achieving determinism in database systems.

Scaling ACID

The key to Calvin's strong performance is that it reorganizes the transaction execution pipeline normally used in DBMSs according to the principle: do all the "hard" work before acquiring locks and beginning execution. In particular, Calvin moves the following stages to the front of the pipeline (a simplified sketch of the resulting execution model follows the list):

  • Replication. In traditional systems, replicas agree on each modification to database state only after some transaction has made the change at some "master" replica. In Calvin, all replicas agree in advance on the sequence of transactions that they will (deterministically) attempt to execute.
  • Agreement between participants in distributed transactions. Database systems traditionally use two-phase commit (2PC) to handle distributed transactions. In Calvin, every node sees the same global sequence of transaction requests, and is able to use this already-agreed-upon information in place of a commit protocol.
  • Disk accesses. In our VLDB 2010 paper, we observed that deterministic systems performed terribly in disk-based environments due to holding locks for the 10ms+ duration of reading the needed data from disk, since they cannot reorder conflicting transactions on the fly. Calvin gets around this setback by prefetching into memory all records that a transaction will need during the replication phase---before locks are even acquired.
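
To make the "agree on the sequence first" idea concrete, here is a minimal sketch of deterministic, pre-ordered execution. This is our simplification, not Calvin's actual implementation; the per-key lock queues and the pre-declared read/write sets are the assumed ingredients.

    # Every node is handed the same globally agreed transaction sequence, and
    # each transaction's keys are known up front, so lock requests can be
    # queued in sequence order and no commit protocol is needed afterwards.
    from collections import defaultdict, deque

    def deterministic_execute(sequence, store):
        # Per-key FIFO lock queues, filled strictly in the agreed order.
        queues = defaultdict(deque)
        for txn_id, (keys, _) in enumerate(sequence):
            for k in keys:
                queues[k].append(txn_id)

        done = set()
        while len(done) < len(sequence):
            for txn_id, (keys, logic) in enumerate(sequence):
                if txn_id in done:
                    continue
                # A transaction runs once it heads every queue it joined.
                if all(queues[k][0] == txn_id for k in keys):
                    logic(store)                 # user logic, executed locally
                    for k in keys:
                        queues[k].popleft()      # release locks in order
                    done.add(txn_id)

    store = {"a": 10, "b": 20}
    sequence = [
        (["a", "b"], lambda db: db.update(a=db["a"] - 1, b=db["b"] + 1)),
        (["b"],      lambda db: db.update(b=db["b"] * 2)),
    ]
    deterministic_execute(sequence, store)
    print(store)   # {'a': 9, 'b': 42} on every node that runs the same sequence

Because conflicting transactions are ordered identically everywhere, every node serializes them the same way without any run-time negotiation.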

As a result, each transaction's user-specified logic can be executed at each shard with an absolute minimum of runtime synchronization between shards or replicas to slow it down, even if the transaction's logic requires it to access records at multiple shards. By minimizing the time that locks are held, concurrency can be greatly increased, thereby leading to near-linear scalability on a commodity cluster of machines.

Strongly consistent global replication

Calvin's deterministic execution semantics provide an additional benefit: replicating transactional input is sufficient to achieve strongly consistent replication. Since replicating batches of transaction requests is extremely inexpensive and happens before the transactions acquire locks and begin executing, Calvin's transactional throughput capacity does not depend at all on its replication configuration.

In other words, not only can Calvin run 500,000 transactions per second on 100 EC2 instances in Amazon's US East (Virginia) data center, it can maintain strongly-consistent, up-to-date 100-node replicas in Amazon's Europe (Ireland) and US West (California) data centers---at no cost to throughput.

Calvin accomplishes this by having replicas perform the actual processing of transactions completely independently of one another, maintaining strong consistency without having to constantly synchronize transaction results between replicas. (Calvin's end-to-end transaction latency does depend on message delays between replicas, of course---there is no getting around the speed of light.)
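
A toy illustration of why replicating input is enough (function and variable names are ours, not Calvin's): each replica applies the same agreed batch deterministically and arrives at the same state without ever shipping results.

    # Replicate the *input* (the agreed transaction batch), not the effects.
    def apply_batch(db, batch):
        for txn in batch:                   # same order on every replica
            txn(db)
        return db

    batch = [
        lambda db: db.__setitem__("stock:7", db.get("stock:7", 10) - 1),
        lambda db: db.__setitem__("balance:42", db.get("balance:42", 100) - 30),
    ]

    virginia = apply_batch({}, batch)       # "US East" replica
    ireland  = apply_batch({}, batch)       # "Europe" replica
    assert virginia == ireland              # identical state, no result shipping
    print(virginia)                         # {'stock:7': 9, 'balance:42': 70}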

Flexible data model

So where does Calvin fall in the OldSQL/NewSQL/NoSQL trichotomy?

Actually, nowhere. Calvin is not a database system itself, but rather a transaction scheduling and replication coordination service. We designed the system to integrate with any data storage layer, relational or otherwise. Calvin allows user transaction code to access the data layer freely, using any data access language or interface supported by the underlying storage engine (so long as Calvin can observe which records user transactions access). The experiments presented in the paper use a custom key-value store. More recently, we've hooked Calvin up to Google's LevelDB and added support for SQL-based data access within transactions, building relational tables on top of LevelDB's efficient sorted-string storage.
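
The storage interface such a layer needs is deliberately narrow. A rough sketch of the idea, with invented class and function names rather than Calvin's real API: anything that exposes get/put and lets the scheduler see which keys a transaction touches could sit underneath.

    class DictStore:
        """In-memory stand-in; a LevelDB- or SQL-backed class could expose
        the same two methods."""
        def __init__(self):
            self._data = {}
        def get(self, key):
            return self._data.get(key)
        def put(self, key, value):
            self._data[key] = value

    def run_transaction(store, read_set, write_fn):
        # The layer observes the accessed keys (read_set) before execution,
        # then lets user logic compute the writes from the reads.
        reads = {k: store.get(k) for k in read_set}
        for key, value in write_fn(reads).items():
            store.put(key, value)

    store = DictStore()
    store.put("counter", 0)
    run_transaction(store, ["counter"],
                    lambda r: {"counter": (r["counter"] or 0) + 1})
    print(store.get("counter"))   # 1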

From an application developer's point of view, Calvin's primary limitation compared to other systems is that transactions must be executed entirely server-side. Calvin has to know in advance what code will be executed for a given transaction. Users may pre-define transactions directly in C++, or submit arbitrary Python code snippets on-the-fly to be parsed and executed as transactions.

For some applications, this requirement of completely server-side transactions might be a difficult limitation. However, many applications prefer to execute transaction code on the database server anyway (in the form of stored procedures), in order to avoid multiple round-trip messages between the database server and application server in the middle of a transaction.
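
A hypothetical sketch of what that stored-procedure style programming model looks like in practice (the registry, decorator, and transaction names below are ours, not Calvin's API): logic is registered server-side ahead of time, and clients submit only a transaction name plus arguments, so there are no mid-transaction round trips.

    TRANSACTIONS = {}

    def register(name):
        def wrap(fn):
            TRANSACTIONS[name] = fn      # server knows the code in advance
            return fn
        return wrap

    @register("new_order")
    def new_order(db, customer, item):
        db[item] = db.get(item, 10) - 1          # decrement stock
        db[customer] = db.get(customer, 0) + 1   # record the order
        return {"ok": True}

    def submit(db, name, **args):
        # The client only names the transaction and supplies its arguments.
        return TRANSACTIONS[name](db, **args)

    db = {}
    print(submit(db, "new_order", customer="cust:42", item="item:7"))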

If this limitation is acceptable, Calvin presents a nice alternative in the tradeoff space, achieving high scalability without sacrificing ACID or multi-shard transactions. Hence, we believe that our SIGMOD paper may present a roadmap for overcoming the scalability dominance of the decades-old database solutions on traditional OLTP workloads. We look forward to debating the merits of this approach in the weeks ahead (and Alex will be presenting the paper at SIGMOD next week).

Wednesday, 11 January 2012

In-Memory Database Systems Questions and Answers


In-memory database systems (IMDS) are a growing subset of database management system (DBMS) software. In-memory databases emerged in response to new application goals, system requirements, and operating environments. Below, we answer common IMDS questions.

What is an in-memory database system?

An in-memory database system is a database management system that stores data entirely in main memory. This contrasts with traditional (on-disk) database systems, which are designed for data storage on persistent media. Because working with data in memory is much faster than writing to and reading from a file system, IMDSs can perform applications’ data management functions an order of magnitude faster. Because their design is typically simpler than that of on-disk databases, IMDSs can also impose significantly lower memory and CPU requirements.

If avoiding disk I/O is the goal, why not achieve that through database caching?


Caching is the process whereby on-disk databases keep frequently-accessed records in memory, for faster access. However, caching only speeds up retrieval of information, or “database reads.” Any database write – that is, an update to a record or creation of a new record – must still be written through the cache, to disk. So, the performance benefit only applies to a subset of database tasks. In addition, managing the cache is itself a process that requires substantial memory and CPU resources, so even a “cache hit” underperforms an in-memory database.
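
A toy sketch of that asymmetry (not any vendor's cache code): reads can be served from memory after the first access, but every write still has to reach the disk.

    class WriteThroughCache:
        def __init__(self, disk):
            self.disk = disk            # stand-in for the file system
            self.cache = {}
            self.disk_writes = 0

        def read(self, key):
            if key in self.cache:       # cache hit: no disk I/O
                return self.cache[key]
            value = self.disk.get(key)  # cache miss: read from "disk"
            self.cache[key] = value
            return value

        def write(self, key, value):
            self.cache[key] = value
            self.disk[key] = value      # every write still reaches the disk
            self.disk_writes += 1

    db = WriteThroughCache({"x": 1})
    db.read("x"); db.read("x")          # second read is served from memory
    db.write("x", 2)                    # but the update must hit the disk
    print(db.disk_writes)               # 1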

If an in-memory database system boosts performance by holding all records in memory, can’t I get the same result by creating a RAM disk and deploying a traditional database there?


As a makeshift solution, placing the entire on-disk database on a RAM disk will speed up both database reads and writes. However, the database is still hard-wired for disk storage, and the processes it uses to facilitate disk storage, such as caching and file I/O, continue to operate even though they are now redundant.

In addition, data in an on-disk database system must be transferred to numerous locations as it is used. Figure 1 shows the handoffs required for an application to read a piece of data from an on-disk database, modify it and write that record back to the database. These steps require time and CPU cycles, and cannot be avoided in a traditional database, even when it runs on a RAM disk. Still more copies and transfers are required if transaction logging is active.
Figure 1. Data transfer in an on-disk database system
In contrast, an in-memory database system entails a single data transfer. Elimination of multiple data transfers streamlines processing. Removing multiple copies of data reduces memory consumption, and the simplified processing makes for greater reliability and minimizes CPU demands.

Can you quantify the performance difference between the three approaches described above – using on-disk, on-disk deployed on a RAM-disk, and in-memory database systems?


In a published benchmark, McObject compared the same application’s performance using an embedded on-disk database system, using an embedded in-memory database, and using the embedded on-disk database deployed on a RAM-disk. Moving the on-disk database to a RAM drive resulted in read accesses that were almost 4x faster, and database updates that were more than 3x faster.

Moving this same benchmark test to a true in-memory database system, however, provided much more dramatic performance gains: the in-memory database outperformed the RAM-disk database by 4x for database reads and turned in a startling 420x improvement for database writes. Click here to read an article on iApplianceWeb reporting the benchmark test. Click here to download McObject’s benchmark report.

What else distinguishes an in-memory database from a “traditional” (on-disk) database management system (DBMS)?


The optimization objectives of an on-disk database system are diametrically opposed to those of an in-memory database system. With an on-disk database system, the primary burden on performance is file I/O. Therefore an on-disk database system seeks to reduce that I/O, and it will trade off memory consumption and CPU cycles to do so. This includes using extra memory for a cache, and CPU cycles to maintain the cache.

On-disk DBMSs also keep a lot of redundant data around. For example, duplicate data is kept in index structures, to enable the on-disk database system to fetch records from the index, rather than “spending” an I/O navigating from the index to the data itself. Disk space is cheap, so designers of on-disk database systems proceed with the assumption that storage space is virtually limitless.

In stark contrast, an in-memory database system carries no file I/O burden. From the start, its design can be more streamlined, with the optimization goals of reducing memory consumption and CPU cycles. Though memory has declined in price, developers rightly treat it as more precious—and because memory equals storage space for an in-memory database system, IMDSs should be (and McObject’s eXtremeDB in-memory embedded database is) designed to get the most out of memory. An in-memory database is chosen explicitly for its performance advantage, so a secondary design goal is always to eliminate unnecessary CPU cycles.

Isn’t the database just lost if there’s a system crash?


It needn’t be. Most in-memory database systems offer features for adding persistence, or the ability to survive disruption of their hardware or software environment.

One important tool is transaction logging, in which periodic snapshots of the in-memory database (called “savepoints”) are written to non-volatile media, along with a log of the transactions applied since the last savepoint. If the system fails and must be restarted, the database either “rolls back” to the last completed transaction, or “rolls forward” to complete any transaction that was in progress when the system went down (depending on the particular IMDS’s implementation of transaction logging).
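
A minimal sketch of the recovery idea, with invented file names and formats: write periodic savepoints plus an append-only log, and on restart reload the last savepoint and roll forward through the log.

    import json, os, tempfile

    dirpath = tempfile.mkdtemp()
    SNAP = os.path.join(dirpath, "savepoint.json")
    LOG = os.path.join(dirpath, "txn.log")

    def savepoint(db):
        with open(SNAP, "w") as f:
            json.dump(db, f)
        open(LOG, "w").close()                 # log restarts after a savepoint

    def apply_and_log(db, key, value):
        db[key] = value
        with open(LOG, "a") as f:              # record the change durably
            f.write(json.dumps([key, value]) + "\n")

    def recover():
        with open(SNAP) as f:
            db = json.load(f)
        with open(LOG) as f:
            for line in f:                     # roll forward logged changes
                key, value = json.loads(line)
                db[key] = value
        return db

    db = {"sensor": 1}
    savepoint(db)
    apply_and_log(db, "sensor", 2)
    print(recover())                           # {'sensor': 2} after a "crash"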

In-memory database systems can also gain durability by maintaining one or more copies of the database. In this solution – called database replication – fail-over procedures allow the system to continue using a standby database. The “master” and replica databases can be maintained by multiple processes or threads within the same hardware instance. They can also reside on two or more boards in a chassis with a high-speed bus for communication, run on separate computers on a LAN, or exist in other configurations.

Non-volatile RAM or NVRAM provides another means of in-memory database persistence. One type of NVRAM, called battery-RAM, is backed up by a battery so that even if a device is turned off or loses its power source, the memory content—including the database—remains. Newer types of NVRAM, including ferroelectric RAM (FeRAM), magnetoresistive RAM (MRAM) and phase change RAM (PRAM) are designed to maintain information when power is turned off, and offer similar persistence options.

Finally, new hybrid database system technology adds the ability to apply disk-based storage selectively, within the broader context of an in-memory database. For example, with McObject’s hybrid eXtremeDB Fusion, a notation in the database design or "schema" causes certain record types to be written to disk, while all others are managed entirely in memory. On-disk functions such as cache management are applied only to those records stored on disk, minimizing these activities’ performance impact and CPU demands.
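
A rough sketch of the hybrid routing idea (generic illustration only, not eXtremeDB Fusion's actual schema notation): record types flagged as disk-backed go through a persistent store, while everything else stays purely in memory, so only the disk-backed types pay for file I/O.

    import shelve, tempfile, os

    ON_DISK_TYPES = {"audit_log"}              # the per-"schema" notation

    class HybridStore:
        def __init__(self, path):
            self.memory = {}
            self.disk = shelve.open(path)      # stand-in for a disk-backed store

        def put(self, record_type, key, value):
            target = self.disk if record_type in ON_DISK_TYPES else self.memory
            target[f"{record_type}:{key}"] = value

        def get(self, record_type, key):
            target = self.disk if record_type in ON_DISK_TYPES else self.memory
            return target.get(f"{record_type}:{key}")

    store = HybridStore(os.path.join(tempfile.mkdtemp(), "hybrid.db"))
    store.put("sensor_reading", 1, {"temp": 21.5})   # stays in memory
    store.put("audit_log", 1, {"event": "login"})    # written to disk
    print(store.get("sensor_reading", 1), store.get("audit_log", 1))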

What kinds of applications typically employ an in-memory database?


In-memory databases are most commonly used in applications that demand very fast data access, storage and manipulation, and in systems that don’t typically have a disk but nevertheless must manage appreciable quantities of data.

An important use for in-memory database systems is in real-time embedded systems. IMDSs running on real-time operating systems (RTOSs) provide the responsiveness needed in applications including IP network routing, telecom switching, and industrial control. IMDSs manage music databases in MP3 players and handle programming data in set-top boxes. In-memory databases’ typically small memory and CPU footprint make them ideal because most embedded systems are highly resource-constrained.

Non-embedded applications requiring exceptional performance are an important growth area for in-memory database systems. For example, algorithmic trading and other applications for financial markets use IMDSs to provide instant manipulation of data, in order to identify and leverage market opportunities. Some multi-user Web applications – such as e-commerce and social networking sites – use in-memory databases to cache portions of their back-end on-disk database systems. These enterprise-scale applications sometimes require very large in-memory data stores, and this need is met by 64-bit IMDS editions.

Is an in-memory database the same as an “embedded database”?


“Embedded database” refers to a database system that is built into the software program by the application developer, is invisible to the application’s end-user and requires little or no ongoing maintenance. Many in-memory databases fit that description, but not all do. In contrast to embedded databases, a “client/server database” refers to a database system that utilizes a separate dedicated software program, called the database server, accessed by client applications via inter-process communication (IPC) or remote procedure call (RPC) interfaces. Some in-memory database systems employ the client/server model.

How scalable is an in-memory database system? My application manages terabytes of data – is it practical to hold this much in an in-memory database?


IMDS technology scales well beyond the terabyte size range. McObject’s benchmark report, In-Memory Database Systems (IMDSs) Beyond the Terabyte Size Boundary, detailed this scalability with a 64-bit in-memory database system deployed on a 160-core SGI Altix 4700 server running SUSE Linux Enterprise Server version 9 from Novell. The database grew to 1.17 terabytes and 15.54 billion rows, with no apparent limits on its scaling further.

Performance remained consistent as the database size grew into the hundreds of gigabytes and exceeded a terabyte, suggesting nearly linear scalability. For a simple SELECT against the fully populated database, the IMDS (McObject’s eXtremeDB-64) processed 87.78 million query transactions per second using its native application programming interface (API) and 28.14 million transactions per second using a SQL ODBC API. To put these results in perspective, consider that the lingua franca for discussing query performance is transactions per minute.

Doesn’t it take a long time to populate an in-memory database?


“A long time” is relative. For example, a 19 megabyte in-memory database loads in under 6.6 seconds (under 4 seconds if reloading from a previously saved database image). The 1.17 terabyte database described earlier loaded in just over 33 hours.

What is true is that populating a very large in-memory database system can be much faster than populating an on-disk DBMS. During such “data ingest,” on-disk database systems use caching to enhance performance. But eventually, memory buffers fill up, and the system writes the data to the file system (logical I/O). Eventually, the file system buffers also fill up, and data must be written to the hard disk (physical I/O). Physical I/O is usually measured in milliseconds, and its performance burden is much greater than that of logical I/O (which is usually measured in microseconds). Physical I/O may be required by an on-disk DBMS for other reasons, for example, to guarantee transactional integrity.

Consider what happens when populating an on-disk database, as the total amount of stored data increases:

First, as the database grows, the tree indexes used to organize data grow deeper, and the average number of steps into the tree, to reach the storage location, expands. Each step imposes a logical disk I/O. Second, assuming that the cache size stays the same, the percent of the database that is cached is smaller. Therefore, it is more likely that any logical disk I/O is the more-burdensome physical I/O.

Third, as the database grows, it consumes more physical space on the disk platter, and the average time to move the head from position to position is greater. When the head travels further, physical I/O takes longer, further degrading performance.

In contrast, in-memory database ingest performance is roughly linear as database size increases.
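
A back-of-the-envelope model of the first two effects, using made-up round numbers for fanout, cache size, and latencies: index depth grows with the logarithm of the record count, and with a fixed-size cache an ever larger share of lookups falls through to physical I/O.

    import math

    FANOUT = 200                        # keys per index node (assumed)
    CACHE_PAGES = 100_000               # fixed cache size in pages (assumed)
    LOGICAL_US, PHYSICAL_US = 5, 5000   # illustrative latencies in microseconds

    def lookup_cost_us(n_records):
        depth = math.ceil(math.log(n_records, FANOUT))   # steps into the tree
        pages = n_records // FANOUT + 1                  # rough page count
        miss_rate = max(0.0, 1 - CACHE_PAGES / pages)    # cache covers less of it
        per_step = LOGICAL_US + miss_rate * PHYSICAL_US
        return depth * per_step

    for n in (10**6, 10**8, 10**10):
        print(f"{n:>14,} records -> ~{lookup_cost_us(n):,.0f} µs per indexed lookup")

With these assumed numbers, the cost per lookup climbs from a handful of microseconds to tens of milliseconds as the data outgrows the cache, which is the shape of the slowdown described above.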

Isn’t an in-memory database only really usable on a single computer system, whereas an on-disk database can be shared by any number of computers on a network?


An in-memory database system can be either an “embedded database” or a “client/server” database system. Client/server database systems are inherently multi-user, but embedded in-memory databases can also be shared by multiple threads/processes/users. First, the database can be created in shared memory, with the database system providing a mechanism to control concurrent access. Also, embedded databases can (and eXtremeDB does) provide a set of interfaces that allow processes executing on network nodes remote from the database node to read from and write to the database. Finally, database replication can be exploited to copy the in-memory database to the node(s) where processes are located, so that those processes can query a local database and eliminate network traffic and latency.
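
A toy sketch of the shared-store point, using Python's multiprocessing.Manager as a stand-in for a true shared-memory database: several processes update one in-memory store under a shared lock, so an embedded IMDS can still be multi-user.

    from multiprocessing import Manager, Process

    def worker(store, lock, n):
        for _ in range(n):
            with lock:                    # concurrency control around the store
                store["counter"] = store.get("counter", 0) + 1

    if __name__ == "__main__":
        with Manager() as mgr:
            store, lock = mgr.dict(), mgr.Lock()
            procs = [Process(target=worker, args=(store, lock, 1000)) for _ in range(4)]
            for p in procs: p.start()
            for p in procs: p.join()
            print(store["counter"])       # 4000: all writers saw one shared store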

What’s different/better about an in-memory database versus STL or Boost collections, or even just creating my own memory-mapped file(s)?


The question is the same as asking why these alternatives are not viable replacements for Oracle, MS SQL Server, DB2, and other on-disk databases. Any database system goes far beyond giving you a set of interfaces to manage collections, lists, etc. This typically includes support for ACID (atomic, consistent, isolated and durable) transactions, multi-user access, a high level data definition language, one or more programming interfaces (including industry-standard SQL), triggers/event notifications, and more.

Won’t an in-memory database require huge amounts of memory because database systems are large?


Equating “database management system” with “big” is justified, generally speaking. Even some embedded DBMSs are megabytes in code size. This is true largely because traditional on-disk databases – including some that have now been adapted for use in memory, and are pitched as IMDSs—were not written with the goal of minimizing code size (or CPU cycles). As described above, as on-disk database systems, their overriding design goal was amelioration of disk I/O.

In contrast, a database system designed from first principles for in-memory use can be much smaller, requiring less than 100K of memory, compared to many 100s of kilobytes up to many megabytes for other database architectures. This reduction in code size results from:
  • Elimination of on-disk database capabilities that become redundant for in-memory use, such as all processes surrounding caching and file I/O
  • Elimination of many features that are unnecessary in the types of application that use in-memory databases. An IP router does not need separate client and server software modules to manage routing data. And a persistent Web cache doesn’t need user access rights or stored procedures
  • Hundreds of other development decisions that are guided by the design philosophy that memory equals storage space, so efficient use of that memory is paramount

Sunday, 26 June 2011

SIGMOD 2011 in Athens


Earlier this week, I was in Athens, Greece attending the annual conference of the ACM Special Interest Group on Management of Data. SIGMOD is one of the top two database events held each year, attracting academic researchers and leading practitioners from industry.

I kicked off the conference with the plenary keynote. In this talk I started with a short retrospection on the industry over the last 20 years. In my early days as a database developer, things were moving incredibly quickly. Customers were loving our products, the industry was growing fast and yet the products really weren’t all that good. You know you are working on important technology when customers are buying like crazy and the products aren’t anywhere close to where they should be.

In my first release as lead architect on DB2 20 years ago, we completely rewrote the DB2 database engine process model, moving from a process-per-connected-user model to a single process where each connection only consumes a single thread, supporting many more concurrent connections. It was a fairly fundamental architectural change completed in a single release. And in that same release, we improved TPC-A performance by a booming factor of 10 and then did 4x more in the next release. It was a fun time and things were moving quickly.


From the mid-90s through to around 2005, the database world went through what I refer to as the dark ages. DBMS code bases had grown to the point where the smallest was more than 4 million lines of code, the commercial system engineering teams would no longer fit in a single building, and the number of database companies shrunk throughout the entire period down to only 3 major players. The pace of innovation was glacial and much of the research during the period was, in the words of Bruce Lindsay, “polishing the round ball”. The problem was that the products were actually passably good, customers didn’t have a lot of alternatives, and nothing slows innovation like large teams with huge code bases.

In the last 5 years, the database world has become exciting again. I’m seeing more opportunity in the database world now than at any other time in the last 20 years. It’s now easy to get venture funding to do database products, and the number and diversity of viable products is exploding. My talk focused on what changed, why it happened, and some of the technical backdrop influencing these changes.


A background thesis of the talk is that cloud computing solves two of the primary reasons why customers used to be stuck standardizing on a single database engine even though some of their workloads may have run poorly. The first is cost. Cloud computing reduces costs dramatically (some of the cloud economics argument: http://perspectives.mvdirona.com/2009/04/21/McKinseySpeculatesThatCloudComputingMayBeMoreExpensiveThanInternalIT.aspx) and charges by usage rather than via annual enterprise license. One of the favorite lock-ins of the enterprise software world is the enterprise license. Once you’ve signed one, you are completely owned and it’s hard to afford to run another product. My fundamental rule of enterprise software is that any company that can afford to give you a 50% to 80% reduction from “list price” is pretty clearly not a low margin operator. That is the way much of the enterprise computing world continues to work: start with a crazy price, negotiate down to a ½ crazy price, and then feel like a hero while you contribute to incredibly high profit margins.

Cloud computing charges by the use in small increments, and any of the major database or open source offerings can be used at low cost. That is certainly a relevant reason, but the really significant factor is the offloading of administrative complexity to the cloud provider. One of the primary reasons to standardize on a single database is that each is so complex to administer that it’s hard to have sufficient skill on staff to manage more than one. Cloud offerings like AWS Relational Database Service transfer much of the administrative work to the cloud provider, making it easy to choose the database that best fits the application and to have many specialized engines in use across a given company.

As costs fall, more workloads become practical and existing workloads get larger. For example, if analyzing three months of customer usage data has value to the business and it becomes affordable to analyze two years instead, customers correctly want to do it. The plunging cost of computing is fueling database size growth at a super-Moore pace, requiring either partitioned (sharded) or parallel DB engines.

Customers now have larger and more complex data problems, they need the products always online, and they are now willing to use a wide variety of specialized solutions if needed. Data intensive workloads are growing quickly and never have there been so many opportunities and so many unsolved or incompletely solved problems. It’s a great time to be working on database systems.





· Proceedings extended abstract: http://www.sigmod2011.org/keynote_1.shtml