Unlimited-Data. moved to lab.itbee.vn

Monday, 29 November 2010

Design — Sheepdog Project

The architecture of Sheepdog is fully symmetric; there is no central node such as a metadata server. This design enables the following features.

Linear scalability in performance and capacity
When more performance or capacity is needed, Sheepdog can be grown linearly by simply adding new machines to the cluster.

No single point of failure

Even if a machine fails, the data is still accessible through other machines.

Easy administration

There is no configuration file that assigns roles within the cluster. When an administrator launches the Sheepdog programs on a newly added machine, Sheepdog automatically detects the machine and begins to configure it as a member of the cluster.

Architecture

Sheepdog is a storage system that provides a simple key-value interface to the Sheepdog client (the QEMU block driver). Sheepdog consists of multiple nodes.

Sheepdog consists of only one server program (which we call collie) and a patched QEMU/KVM.

Virtual Disk Image (VDI)

A Sheepdog client divides a VM image into fixed-size objects (4 MB by default) and stores them on the distributed storage system. Each object is identified by a globally unique 64-bit ID and is replicated to multiple nodes.
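As a rough illustration of the fixed-size split described above, the following Python sketch maps a byte offset in a VM image to an object index. Only the 4 MB default comes from the text; the function names are hypothetical and this is not Sheepdog's actual code.

```python
# Minimal sketch, assuming objects are numbered 0, 1, 2, ... from the start
# of the image. OBJECT_SIZE mirrors the 4 MB default mentioned in the text;
# locate() and objects_for_range() are hypothetical helper names.
OBJECT_SIZE = 4 * 1024 * 1024


def locate(offset, object_size=OBJECT_SIZE):
    """Return (object index, offset inside that object) for a byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_for_range(offset, length, object_size=OBJECT_SIZE):
    """Return the indexes of every object touched by an I/O request."""
    if length <= 0:
        return []
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return list(range(first, last + 1))
```

A request that straddles an object boundary touches two objects, which is why a client needs the range helper as well as the single-offset lookup.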

Object

Sheepdog objects are grouped into two types.

VDI Object: A VDI object contains metadata for a VM image such as image name, disk size, creation time, etc.

Data Object: A VM image is divided into data objects. Sheepdog clients generally access these objects.

Sheepdog uses consistent hashing to decide where objects are stored. Consistent hashing is a scheme that provides hash table functionality in which the addition or removal of nodes does not significantly change the mapping of objects. I/O load is balanced across the nodes as a natural property of the hash table. A mechanism for distributing the data intelligently rather than randomly is future work.

Each node is placed on the consistent hashing ring based on its own ID. To determine where to store an object, the Sheepdog client takes the object ID, finds the corresponding point on the ring, and walks clockwise to determine the target nodes.
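The ring lookup described above can be sketched in Python as follows. The hash function (SHA-1 truncated to 64 bits), the one-point-per-node layout, the default of three copies, and the names `ring_point`, `Ring`, and `target_nodes` are all assumptions made for illustration; the text does not say which Sheepdog actually uses.

```python
import bisect
import hashlib


def ring_point(key):
    """Map a string to a point on a 64-bit ring (hash choice is an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Consistent hashing ring with one point per node (hypothetical sketch)."""

    def __init__(self, node_ids):
        # Sort (point, node id) pairs so a clockwise walk is a forward scan.
        self._points = sorted((ring_point(n), n) for n in node_ids)

    def target_nodes(self, object_id, copies=3):
        """Walk clockwise from the object's point and collect target nodes."""
        if not self._points:
            return []
        copies = min(copies, len(self._points))
        point = ring_point("%016x" % object_id)
        # First node at or after the object's point; wraps around at the end.
        start = bisect.bisect_left(self._points, (point, ""))
        size = len(self._points)
        return [self._points[(start + i) % size][1] for i in range(copies)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
targets = ring.target_nodes(42)
```

With this layout, adding a node only takes over objects on the arc just before its point, so every other object keeps its first target node.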

VDI Operation

In most cases, Sheepdog clients can access their images independently because we do not allow clients to access the same image at the same time. However, some VDI operations (e.g. cloning a VDI, locking a VDI) must be done exclusively because they update global information. To implement this in a highly available system, we use a group communication system (GCS). Group communication systems provide specific guarantees such as total ordering of messages. We use corosync, one of the best-known GCSs.
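To see why total ordering is enough for exclusive operations, consider the toy simulation below. It does not use corosync or any real GCS API: the ordered delivery is stood in for by a plain Python list, and `LockTable` and the message tuples are hypothetical. The point is only that nodes applying the same messages in the same order reach the same decision without a central lock server.

```python
class LockTable:
    """One node's local view of which client holds each VDI lock."""

    def __init__(self):
        self.owner = {}

    def deliver(self, message):
        """Apply one totally ordered message; return True if it succeeded."""
        op, vdi, client = message
        if op == "lock":
            if vdi in self.owner:
                return False  # someone earlier in the order already holds it
            self.owner[vdi] = client
            return True
        if op == "unlock":
            if self.owner.get(vdi) == client:
                del self.owner[vdi]
                return True
            return False  # only the current owner may unlock
        raise ValueError("unknown operation: %r" % (op,))


# Assumption: the GCS delivers these messages to every node in this order.
ordered_messages = [
    ("lock", "vm1", "client-a"),
    ("lock", "vm1", "client-b"),    # rejected on every node
    ("unlock", "vm1", "client-b"),  # rejected: client-b is not the owner
    ("lock", "vm2", "client-b"),
]

nodes = [LockTable() for _ in range(3)]
results = [[node.deliver(m) for m in ordered_messages] for node in nodes]
```

Every node ends with the same lock table and the same accept or reject decision for each request, which is the property the exclusive VDI operations rely on.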

Saturday, 27 November 2010

20 Things I Learned About Browsers and the Web

Netflix in the Cloud

View more presentations from Adrian Cockcroft.

Friday, 26 November 2010

Scalability | Harvard Computer Science Lecture

Watch it on Academic Earth

LECTURE DESCRIPTION

Professor David J. Malan discusses scalability as it pertains to building dynamic websites.

COURSE DESCRIPTION

Today's websites are increasingly dynamic. Pages are no longer static HTML files but instead generated by scripts and database calls. User interfaces are more seamless, with technologies like Ajax replacing traditional page reloads. This course teaches students how to build dynamic websites with Ajax and with Linux, Apache, MySQL, and PHP (LAMP), one of today's most popular frameworks. Students learn how to set up domain names with DNS, how to structure pages with XHTML and CSS, how to program in JavaScript and PHP, how to configure Apache and MySQL, how to design and query databases with SQL, how to use Ajax with both XML and JSON, and how to build mashups. The course explores issues of security, scalability, and cross-browser support and also discusses enterprise-level deployments of websites, including third-party hosting, virtualization, colocation in data centers, firewalling, and load-balancing.

COURSE INDEX

HTTP

JavaScript (continued)

Tuesday, 23 November 2010

ZooKeeper Promoted to Apache Top Level Project

ZooKeeper, the centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services, which started under the Hadoop umbrella, has been promoted to an Apache Top Level Project, according to the ☞ report sent out by Doug Cutting.

In case you are wondering what this means: simply put, it is proof of the project's maturity and of its community's ability to ensure the project's future. On the other hand, if the Hadoop, HBase, and ZooKeeper communities do not coordinate their efforts, it might mean more work for users to match and test versions when using ZooKeeper together with Hadoop and HBase.

Original title and link: ZooKeeper Promoted to Apache Top Level Project (NoSQL databases © myNoSQL)

Another NoSQL Comparison: Evaluation Guide

The requirements were clear:

Fast data insertion.

Extremely fast random reads on large datasets.

Consistent read/write speed across the whole data set.

Efficient data storage.

Scale well.

Easy to maintain.

Have a network interface.

Stable, of course.

The list of NoSQL databases to be compared (Tokyo Cabinet, Berkeley DB, MemcacheDB, Project Voldemort, Redis, and MongoDB): not so clear.

The evaluation methodology and the results: definitely not clear at all.

NoSQL Comparison Guide / A review of Tokyo Cabinet, Tokyo Tyrant, Berkeley DB, MemcacheDB, Voldemort, Redis, MongoDB

And the conclusion is quite wrong:

Although MongoDB is the solution for most NoSQL use cases, it’s not the only solution for all NoSQL needs.

Original title and link: Another NoSQL Comparison: Evaluation Guide (NoSQL databases © myNoSQL)

Tuesday, 16 November 2010

Videos from Hadoop World

There was one NoSQL conference that I missed and I was really pissed off about it: Hadoop World. Even though I followed and curated the Twitter feed, resulting in Hadoop World in tweets, the feeling of not being there made me really sad. But now, thanks to Cloudera, I'll be able to watch most of the presentations. Many of them have already been published and the complete list can be found ☞ here.

Based on the Twitter activity on that day, I've selected below the ones that seemed to have generated the most buzz. The list contains names like Facebook, Twitter, eBay, Yahoo!, StumbleUpon, comScore, Mozilla, AOL. And there are quite a few more …

HBase in production at Facebook

Presented by Jonathan Gray (Facebook)

The Hadoop Ecosystem at Twitter

Presented by Kevin Weil (Twitter)

Twitter - Kevin Weil - Hadoop World 2010

Hadoop at eBay

Presented by Anil Madan (eBay)

A Fireside Chat: Using Hadoop to Tackle Big Data at comScore

Presented by Martin Hall (Karmasphere) and Will Duckworth (comScore)

comScore - Will Duckworth - Hadoop World 2010

ScaleIn Collecting and Querying Log Data in Near Real-time

Presented by Anurag Phadke (Firefox)

Mozilla - Anurag Phadke - Hadoop World 2010

AOL’s Data Layer

Presented by Ian Holsman (AOL)

AOL - Ian Holsman - Hadoop World 2010

Hadoop Based Intelligent Text Information Processing System

Presented by Vaijanath Rao (AOL) and Rohini Uppuluri (AOL)

AOL - Rao & Uppuluri - Hadoop World 2010

Mixing Real-Time Needs and Batch Processing: How StumbleUpon Built an Advertising Platform using HBase and Hadoop

Presented by Jean-Daniel Cryans (StumbleUpon)

StumbleUpon - Jean-Daniel Cryans - Hadoop World 2010

Hadoop at Yahoo! Ready for Business

Presented by Arun C. Murthy (Yahoo!)

Yahoo! - Arun Murthy - Hadoop World 2010

Apache ZooKeeper at Yahoo!

Presented by Mahadev Konar (Yahoo!)

Yahoo! - Mahadev Konar - Hadoop World 2010

And with names like Bank of America, Orbitz, CME, Infochimps, and sematext in mind, I bet you can find ☞ many more. So, I guess now we have videos for at least a few days.

Thanks Cloudera!

Original title and link: Videos from Hadoop World (NoSQL databases © myNoSQL)