Monday, 29 November 2010

Design — Sheepdog Project

The architecture of Sheepdog is fully symmetric; there is no central node such as a metadata server. This design enables the following features.
  • Linear scalability in performance and capacity
    When more performance or capacity is needed, Sheepdog can be grown linearly by simply adding new machines to the cluster.
  • No single point of failure
    Even if a machine fails, the data is still accessible through other machines.
  • Easy administration
    There is no configuration file describing each machine's role in the cluster. When administrators launch the Sheepdog programs on a newly added machine, Sheepdog automatically detects the machine and begins to configure it as a member of the cluster.

Architecture

Sheepdog is a storage system that provides a simple key-value interface to the Sheepdog client (the QEMU block driver). Sheepdog consists of multiple nodes.
Compare Sheepdog architecture and a regular cluster file system architecture
Sheepdog consists of only one server program (which we call collie) and a patched QEMU/KVM.
Sheepdog components

Virtual Disk Image (VDI)

A Sheepdog client divides a VM image into fixed-size objects (4 MB by default) and stores them on the distributed storage system. Each object is identified by a globally unique 64-bit ID and is replicated to multiple nodes.
Virtual disk image
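
As a rough illustration of the fixed-size split described above, the following Python sketch maps a byte offset in a VM image to an object index and an offset inside that object. Only the 4 MB default comes from the text; the function names and the way a request is split are assumptions made for illustration, and this is not Sheepdog's actual code or its real 64-bit ID layout.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB default object size, as stated in the text


def locate(offset, object_size=OBJECT_SIZE):
    """Return (object index, offset inside that object) for a byte offset.

    Hypothetical helper: the index is only a position within one image,
    not the globally unique 64-bit object ID.
    """
    return divmod(offset, object_size)


def split_request(offset, length, object_size=OBJECT_SIZE):
    """Split a byte range into per-object pieces (index, start, length)."""
    pieces = []
    end = offset + length
    while offset < end:
        index, start = divmod(offset, object_size)
        # A piece never crosses an object boundary.
        chunk = min(object_size - start, end - offset)
        pieces.append((index, start, chunk))
        offset += chunk
    return pieces
```

A read or write that straddles an object boundary therefore becomes one piece per object touched, and each piece can be sent to whichever nodes hold that object.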

Object

Sheepdog objects are grouped into two types.
  • VDI Object: A VDI object contains metadata for a VM image such as image name, disk size, creation time, etc.
  • Data Object: A VM image is divided into data objects. Sheepdog clients generally access these objects.
Sheepdog uses consistent hashing to decide where objects are stored. Consistent hashing is a scheme that provides hash table functionality in which the addition or removal of nodes does not significantly change the mapping of objects. I/O load is balanced across the nodes as a property of the hash table. A mechanism for distributing the data intelligently rather than randomly is future work.
Each node is placed on the consistent hashing ring based on its own ID. To determine where to store an object, the Sheepdog client takes the object ID, finds the corresponding point on the ring, and walks clockwise to determine the target nodes.
Consistent hashing
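
The ring lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not Sheepdog's implementation: the SHA-1 based `ring_point` function, the `Ring` class, the node names, and the default of three copies are all assumptions chosen for the example.

```python
import bisect
import hashlib


def ring_point(key):
    """Map a string key to a point on the ring (assumed hash: SHA-1, first 8 bytes)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Hypothetical consistent hashing ring; nodes are placed by hashing their IDs."""

    def __init__(self, node_ids):
        # Sorted list of (point, node id) pairs, i.e. the ring in clockwise order.
        self._points = sorted((ring_point(n), n) for n in node_ids)

    def target_nodes(self, object_id, copies=3):
        """Find the object's point on the ring, then walk clockwise."""
        if not self._points:
            return []
        copies = min(copies, len(self._points))
        point = ring_point(format(object_id, "016x"))
        # First node at or after the object's point; wraps around via the modulo.
        start = bisect.bisect_left(self._points, (point, ""))
        size = len(self._points)
        return [self._points[(start + i) % size][1] for i in range(copies)]
```

Because every client computes the same ring from the same node list, clients agree on the target nodes without asking a central server, and removing a node only affects objects whose clockwise walk included that node.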

VDI Operation

In most cases, Sheepdog clients can access their images independently because we do not allow clients to access the same image at the same time. However, some VDI operations (e.g. cloning a VDI, locking a VDI) must be done exclusively because they update global information. To implement this in a highly available system, we use a group communication system (GCS). Group communication systems provide specific guarantees such as total ordering of messages. We use corosync, one of the best-known GCSs.
Cluster communication
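
To see why total ordering helps with exclusive operations such as locking, consider the toy Python sketch below. It does not use the corosync API; the message tuples, the `VdiLockTable` class, and the `deliver` function are hypothetical and simply simulate every node receiving the same messages in the same order.

```python
class VdiLockTable:
    """Per-node copy of the global lock state (hypothetical structure)."""

    def __init__(self):
        self.owners = {}  # VDI name -> client currently holding the lock

    def apply(self, message):
        """Apply one delivered message and report whether it succeeded."""
        kind, vdi, client = message
        if kind == "lock":
            if vdi in self.owners:
                return False  # already locked by some client
            self.owners[vdi] = client
            return True
        if kind == "unlock":
            if self.owners.get(vdi) == client:
                del self.owners[vdi]
                return True
            return False
        raise ValueError("unknown message kind: %r" % (kind,))


def deliver(ordered_messages, replicas):
    """Simulate total ordering: every replica applies the same sequence."""
    return [[replica.apply(m) for replica in replicas] for m in ordered_messages]
```

Since each node applies identical messages in an identical order, all nodes reach the same decision about who holds a lock without a central lock server, which is the property the text relies on.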

Friday, 26 November 2010

Scalability | Harvard Computer Science Lecture

Watch it on Academic Earth

LECTURE DESCRIPTION

Professor David J. Malan discusses scalability as it pertains to building dynamic websites.

COURSE DESCRIPTION

Today's websites are increasingly dynamic. Pages are no longer static HTML files but instead generated by scripts and database calls. User interfaces are more seamless, with technologies like Ajax replacing traditional page reloads. This course teaches students how to build dynamic websites with Ajax and with Linux, Apache, MySQL, and PHP (LAMP), one of today's most popular frameworks. Students learn how to set up domain names with DNS, how to structure pages with XHTML and CSS, how to program in JavaScript and PHP, how to configure Apache and MySQL, how to design and query databases with SQL, how to use Ajax with both XML and JSON, and how to build mashups. The course explores issues of security, scalability, and cross-browser support and also discusses enterprise-level deployments of websites, including third-party hosting, virtualization, colocation in data centers, firewalling, and load-balancing.

Tuesday, 23 November 2010

ZooKeeper Promoted to Apache Top Level Project

ZooKeeper, the centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services, which started under the Hadoop umbrella, has been promoted to an Apache Top Level Project, according to the ☞ report sent out by Doug Cutting.

In case you are wondering what that means: simply put, it is proof of the project's maturity and of its community's ability to ensure the project's future. On the other hand, if the Hadoop, HBase, and ZooKeeper communities do not coordinate their efforts, it might mean more work for users, who will have to match and test versions when using ZooKeeper together with Hadoop and HBase.

Original title and link: ZooKeeper Promoted to Apache Top Level Project (NoSQL databases © myNoSQL)

Another NoSQL Comparison: Evaluation Guide

The requirements were clear:

  • Fast data insertion.
  • Extremely fast random reads on large datasets.
  • Consistent read/write speed across the whole data set.
  • Efficient data storage.
  • Scale well.
  • Easy to maintain.
  • Have a network interface.
  • Stable, of course.

The list of NoSQL databases to be compared (Tokyo Cabinet, Berkeley DB, MemcacheDB, Project Voldemort, Redis, and MongoDB): not so clear.

The evaluation methodology and the results: definitely not clear at all.

NoSQL Comparison Guide / A review of Tokyo Cabinet, Tokyo Tyrant, Berkeley DB, MemcacheDB, Voldemort, Redis, MongoDB

And the conclusion is quite wrong:

Although MongoDB is the solution for most NoSQL use cases, it’s not the only solution for all NoSQL needs.

Original title and link: Another NoSQL Comparison: Evaluation Guide (NoSQL databases © myNoSQL)

Tuesday, 16 November 2010

Videos from Hadoop World

There was one NoSQL conference that I missed, and I was really pissed off about it: Hadoop World. Even though I followed and curated the Twitter feed, resulting in Hadoop World in tweets, the feeling of not being there made me really sad. But now, thanks to Cloudera, I’ll be able to watch most of the presentations. Many of them have already been published, and the complete list can be found ☞ here.

Based on the Twitter activity that day, I’ve selected below the ones that seemed to generate the most buzz. The list contains names like Facebook, Twitter, eBay, Yahoo!, StumbleUpon, comScore, Mozilla, and AOL. And there are quite a few more …

HBase in production at Facebook
Presented by Jonathan Gray (Facebook)

The Hadoop Ecosystem at Twitter
Presented by Kevin Weil (Twitter)

Hadoop at eBay
Presented by Anil Madan (eBay)

A Fireside Chat: Using Hadoop to Tackle Big Data at comScore
Presented by Martin Hall (Karmasphere) and Will Duckworth (comScore)

ScaleIn Collecting and Querying Log Data in Near Real-time
Presented by Anurag Phadke (Firefox)

AOL’s Data Layer
Presented by Ian Holsman (AOL)

Hadoop Based Intelligent Text Information Processing System
Presented by Vaijanath Rao (AOL) and Rohini Uppuluri (AOL)

Mixing Real-Time Needs and Batch Processing: How StumbleUpon Built an Advertising Platform using HBase and Hadoop
Presented by Jean-Daniel Cryans (StumbleUpon)

Hadoop at Yahoo! Ready for Business
Presented by Arun C. Murthy (Yahoo!)

Apache ZooKeeper at Yahoo!
Presented by Mahadev Konar (Yahoo!)

And with names like Bank of America, Orbitz, CME, Infochimps, and sematext in mind, I bet you can find ☞ many more. So I guess we now have videos for at least a few days.

Thanks Cloudera!

Original title and link: Videos from Hadoop World (NoSQL databases © myNoSQL)