WTF is Elastic Data Grid? (By Example)
Forrester released their new wave report, The Forrester Wave™: Elastic Caching Platforms, Q2 2010, which listed GigaSpaces, IBM, Oracle, and Terracotta as leading vendors in the field. In this post I'd like to take some time to explain what some of these terms mean, and why they're important to you. I'll start with a definition of Elastic Data Grid (Elastic Caching), explain how it differs from other caching and NoSQL alternatives, and, more importantly, illustrate how it works through some real code examples.
Let’s start…
Elastic Data Grid (Caching) Platform Defined
Software infrastructure that provides application developers with data caching and code execution services that are distributed across two or more server nodes that 1) consistently perform as volumes grow; 2) can be scaled without downtime; and 3) provide a range of fault-tolerance levels.
Source: Forrester

I personally feel more comfortable with the term Data Grid than the term Cache, because Data Grid reflects the fact that this is more than a passive data service; it was designed to execute code as well. Note that I use the term Elastic Data Grid (Cache) throughout this post instead of just Elastic Cache.
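To make the "code execution" half of that definition concrete, here is a minimal sketch of a task that runs inside the grid rather than pulling data to the client. It assumes the OpenSpaces Task executor API (org.openspaces.core.executor.Task); treat the exact signatures as illustrative, not authoritative.

import java.io.Serializable;
import org.openspaces.core.executor.Task;

// A trivial task: the grid ships this object to a partition and runs
// execute() there, next to the data, instead of moving data to the client.
public class PingTask implements Task<String>, Serializable {
    public String execute() throws Exception {
        return "executed inside the grid";
    }
}

// Usage (assuming a GigaSpace reference, as obtained later in this post):
// AsyncFuture<String> result = gigaSpace.execute(new PingTask());
// System.out.println(result.get());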
Local Cache, Distributed Cache, and Elastic Data Grid (Cache)
Mike Gualtieri, Forrester Senior Analyst, provides good coverage of this topic in two posts. In the first one, titled Fast Access To Data Is The Primary Purpose Of Caching, he explains the difference between a local cache (a cache that lives within your application process), a distributed cache (remote data that runs on multiple servers, from 2 up to thousands), and an elastic cache (a distributed cache that can scale on demand).
It is important to note that most Data Grids support multiple cache topologies that include local caches in conjunction with distributed caches. The use of a local cache as a front end to a distributed cache enables you to reduce the number of remote calls in scenarios where you access the same data frequently. In GigaSpaces these components are referred to as LocalCache or LocalView.
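For illustration, here is a rough sketch of how a client might front the remote data grid with a local cache using the OpenSpaces configurer APIs. The class names (UrlSpaceConfigurer, LocalCacheSpaceConfigurer, GigaSpaceConfigurer) are from the 7.x API as I recall it, so verify them against your version:

// Connect to the remote "mygrid" space deployed in the cluster.
IJSpace remote = new UrlSpaceConfigurer("jini://*/*/mygrid").space();

// Wrap the remote space with a client-side local cache; repeated reads
// of the same entries are then served from local memory instead of
// making a remote call.
IJSpace cached = new LocalCacheSpaceConfigurer(remote).localCache();

GigaSpace gigaSpace = new GigaSpaceConfigurer(cached).gigaSpace();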
Elastic Data Grid and NoSQL
Mike also provides a good description of the difference between the Elastic Data Grid (Cache) and NoSQL alternatives in his second post on this topic, The NoSQL Movement Is Gaining Momentum, But What The Heck Is It?
Ultimately the real difference between NoSQL and elastic caching now may be in-memory versus persistent storage on disk.
You can also find more on this in my recent post on the subject: The Common Principles Behind the NOSQL Alternatives
The main difference between the two approaches comes down mostly to cost/GB of data and read/write performance.
…
An analysis done recently at Stanford University, called "The Case for RAMCloud", provides an interesting comparison between the disk-based and memory-based approaches in terms of cost and performance. In general, it shows that cost is also a function of performance. For low performance requirements, the cost of disk is significantly lower than the RAM-based approach; at higher performance requirements, RAM becomes 100 to 1,000 times cheaper than regular disk.
The Elastic Data Grid as a Component of Elastic Middleware
In my opinion, an Elastic Data Grid is only one component in a much broader category that applies to all middleware services, for example Amazon SimpleDB, SQS, and MapReduce (data, messaging, and processing services, respectively). The main difference between this sort of middleware and traditional middleware is that the entire deployment and production complexity is taken care of by the service provider. Next-generation Elastic Middleware should follow the same line of thought. For this reason, I think that the Forrester definition of the Elastic Data Grid (Caching) category should also include attributes such as On-Demand and Multi-tenancy, as I describe below:
- On-demand -- With an Elastic Data Grid you don't need to go through a manual process of installing and setting up the service cluster before you use it. Instead, the service is created by an API call. The actual physical resources are allocated based on an SLA provided by the user, such as a capacity range and scaling thresholds. It is the responsibility of the middleware service to manage the provisioning and allocation of those resources.
- Multi-tenancy -- The Elastic Data Grid provides built-in support for multi-tenancy. Different users can each hold their own data-grid reference while the data grids share the underlying resources amongst themselves. Users that want to share the same data-grid instance without sharing the data can do so through specialized content-based security authorization.
- Auto-scaling -- Users of the Elastic Data Grid need to be able to grow the capacity of the data grid on demand; at the same time, the Elastic Data Grid should automatically scale down when the resources are no longer needed.
- Always-on -- In the event of a failure, the Elastic Data Grid should handle the failure implicitly and provision new alternate resources if needed, while the application keeps running.
Elastic Data Grid by Example (Using GigaSpaces)
To illustrate how these characteristics work in practice, I'll use the new GigaSpaces Elastic Middleware introduced in our latest product release.
On-Demand
As with the cloud: when we want to use a service like SimpleDB or any other service, we don't set up an entire cluster ourselves. We expect the service to be there when we call it. Let's see how that works.
Step 1 -- Create your own cloud
This is a one-time setup for all your users. To create your own cloud with GigaSpaces, all you need to do is make sure that a GigaSpaces agent runs on each machine that belongs to the cloud. The agent is a lightweight process that enables interaction with the machine and executes provisioning commands when needed. The following snippet shows how to start the Elastic Middleware manager and agent. In our particular case, we start a single manager for the entire cluster and an agent per machine. If you happen to use virtual machines, you can bundle the agent with each VM exactly as you would with regular machines. The snippet below shows what the agent command looks like:
>gs-agent gsa.global.esm 1 gsa.gsc 0

This command starts an agent with one Elastic Manager. Note that the use of the word global means that the number of Elastic Manager (ESM) instances refers to the entire cluster and not to a particular machine. The agents use an internal election mechanism to make sure that at any point in time there is a single manager instance available. If the manager fails, a new one is launched automatically on one of the other agent machines.
Step 2 -- Start the Elastic Data Grid On-Demand
In this step we write a simple Java client that calls the elastic middleware and requests a data-grid service with a minimum of 1G and up to 2G. (I used these numbers just as an illustration that works on a single machine; in a real-life scenario we would expect 10-100G per data grid.)
ProcessingUnit pu = esm.deploy(
    new ElasticDataGridDeployment("mygrid")
        .capacity("1G", "2G")
        .highlyAvailable(false)
        .publicDeploymentIsolation()
        .maximumJavaHeapSize("250m")
        .initialJavaHeapSize("250m"));
Value Descriptions
- Capacity -- The minimum and maximum memory to be provisioned for the entire cluster. When highlyAvailable is set to true, the capacity value should account for the doubled capacity required for the backups; in our case, setting a 1G capacity with high availability translates to 512m for data and 512m for backup. Note also that the capacity number defines the capacity for the entire data-grid cluster and not just a single instance. The difference between the minimum and maximum determines the level of elasticity we expect for our deployment.
- High Availability -- When set to true, the Elastic Manager provisions a backup instance per partition and ensures that primary and backup instances won't run on the same machine. (Note that if you're running on a single machine you should set this argument to false; otherwise provisioning will wait indefinitely for a backup machine to become available.)
- Public Deployment/Isolation -- This is one of three isolation/multi-tenancy modes you can choose for your deployment. The default is dedicatedDeploymentIsolation, which means that only one deployment can run on a given machine. In this case we chose public isolation, which means that different users share the same machine for deploying multiple data-grid instances. (See the sketch after this list.)
- Maximum/Initial Heap Size -- This argument defines the capacity per deployment container. The value is translated into the -Xms and -Xmx JVM arguments for each container process.
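To make the three isolation modes concrete, here is how the deployment from Step 2 would look under each one. The dedicated and public variants use the methods named above; the shared variant's signature is my assumption and should be checked against the API docs:

// Dedicated (the default): this deployment gets its machines to itself.
esm.deploy(new ElasticDataGridDeployment("dedicated-grid")
    .capacity("1G", "2G")
    .dedicatedDeploymentIsolation());

// Public: deployments from any user may share the same machines.
esm.deploy(new ElasticDataGridDeployment("public-grid")
    .capacity("1G", "2G")
    .publicDeploymentIsolation());

// Shared (assumed signature): only deployments carrying the same
// tenant key share machines.
esm.deploy(new ElasticDataGridDeployment("shared-grid")
    .capacity("1G", "2G")
    .sharedDeploymentIsolation("my-tenant"));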
Using the data grid
Once we have a reference to the data-grid processing unit, we can start using it.
The next snippet shows how we can write 1000 instances of a simple Data POJO.
The first step is to get the remote reference of the GigaSpace from our deployment.
System.out.println("Waiting for data grid deployment completion ..");
Space space = pu.waitForSpace();
space.waitFor(space.getTotalNumberOfInstances());
GigaSpace gigaSpace = space.getGigaSpace();

Note the use of the waitFor(..) method. The first call waits until the first instance of the space has been deployed. The second call, space.waitFor(space.getTotalNumberOfInstances()), waits until the entire cluster has been provisioned successfully.
// Write some data...
System.out.println("Writing 1000 objects");
for (long i = 0; i < 1000; i++) {
    gigaSpace.write(new Data(i, "message" + i));
}
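To verify the write, we can read the data back. Here is a small sketch using the standard template-matching operations (count, readById); the empty Data template relies on the no-arg constructor of the Data class shown next:

// Count all Data entries across the cluster, then fetch one by id.
System.out.println("Objects in grid: " + gigaSpace.count(new Data()));
Data first = gigaSpace.readById(Data.class, 0L);
System.out.println("First entry: " + first.getData());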
And here is the Data object for our simple test:
@SpaceClass
public class Data implements Serializable {
    private Long id;
    private String data;

    // A public no-arg constructor is required for POJO space classes
    // and for template matching.
    public Data() {
    }

    public Data(long id, String data) {
        this.id = id;
        this.data = data;
    }

    @SpaceRouting
    @SpaceId
    public Long getId() {
        return id;
    }

    public void setId(Long id) { this.id = id; }

    public String getData() { return data; }

    public void setData(String data) { this.data = data; }
}
Note the use of the @SpaceRouting annotation. Space routing sets the routing index; in other words, it marks the key that determines which target partition an instance belongs to. @SpaceId determines the unique identifier for the instance.
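Conceptually, the routing value is hashed to pick a target partition. The following is a simplified illustration of the idea, not the actual GigaSpaces implementation:

// Entries with the same routing value always land in the same partition,
// which keeps related data (and any collocated code) together.
static int targetPartition(Object routingValue, int numberOfPartitions) {
    return Math.abs(routingValue.hashCode()) % numberOfPartitions;
}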
Step 3 -- Give It a Go
Now that we have our cloud running (Step 1) and the client application ready, all we need to do is run the client and verify that it works.
If everything works, we should get the following output:
Deploying Data Grid..
Log file: D:\GigaSpaces\gigaspaces-xap-premium-7.1.1-rc…
Waiting for data grid deployment completion ..
Writing 1000 objects
Done!
Now, if we run the same application again, we'll see that it runs much faster. The reason is simply that the first run also provisioned our data grid, while in the second run the elastic middleware skipped the deployment step because it found the data-grid service instance already running.
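If you want to see this for yourself, you can look the running service up instead of deploying it. This sketch uses the Admin API (org.openspaces.admin); I believe the calls below existed in the 7.x API, but treat them as an assumption to verify:

Admin admin = new AdminFactory().createAdmin();
// Wait up to 10 seconds for an already-deployed "mygrid" to show up.
ProcessingUnit existing =
    admin.getProcessingUnits().waitFor("mygrid", 10, TimeUnit.SECONDS);
System.out.println(existing != null
    ? "Found running data grid: " + existing.getName()
    : "No running data grid; deploying would provision one");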
Monitoring and Troubleshooting
To understand what happened behind the scenes we need to familiarize ourselves with the GigaSpaces monitoring tool.
You can use the gs-ui utility, located in the GigaSpaces bin directory, to monitor the deployment at runtime.
You can look at the ESM (Elastic Service Manager) view to see the provisioning and scaling events. You can also view the Agent (GSA) view to see the actual Java command that is passed to the agent to launch new containers. There are plenty of other views and options that enable you to view the actual data as well as the provisioning steps.
Understanding the Cluster Sizing
In a cloud environment we normally wouldn't care too much about how our service gets provisioned. In this case, however, we need to understand how the provisioning algorithm works, since we are the owner of both the application and our own little cloud.
The initial provisioning will fork a number of machines/processes based on the following inputs:
- Minimum capacity requested by the user
- JVM capacity per container
- Actual amount of memory available per machine
Number of partitions: With High Availability set to false, the number of partitions is Max Capacity / Max Heap Size, which in our case is 2G/250m = 8 partitions. With High Availability set to true, that would be 4 partitions and 4 backups (1 per partition).
Number of JVM containers: In our example the minimum capacity is set to 1G and the capacity per deployment container is set to 250m, so we need a minimum of 4 JVM containers. If we set High Availability to true we would need the same number of containers, but would be able to host only 512m of data, because the other 512m would serve as backup.
Number of machines: The initial number of machines is a factor of the minimum capacity requirements and the actual memory available on each machine. In our example we deliberately chose numbers that fit on a single small machine. If the heap size per JVM container is smaller than the machine's available memory, the Elastic Manager will first try to host more containers to maximize utilization per machine before it calls for a new machine.
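The following snippet reproduces the sizing arithmetic above for our example (treating 1G as 1000m, as the round numbers in this post do); it is an illustration of the described inputs, not the ESM's actual algorithm:

long minCapacityMb = 1000;      // capacity("1G", ...)
long maxCapacityMb = 2000;      // capacity(..., "2G")
long heapPerContainerMb = 250;  // maximumJavaHeapSize("250m")

// Partitions: max capacity over per-container heap -> 2000/250 = 8
long partitions = maxCapacityMb / heapPerContainerMb;

// Initial containers: min capacity over per-container heap -> 1000/250 = 4
long containers = minCapacityMb / heapPerContainerMb;

System.out.println(partitions + " partitions, " + containers + " initial containers");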
Indeed, if we look at the deployment view of the gs-ui, we see that in this case there are currently 8 partitions being provisioned.
If we examine the host view, we see that there are indeed 4 JVM containers being provisioned, as expected for the initial deployment.
Multi-Tenancy, Auto-Scaling, Always-On
Elastic middleware supports multi-tenancy and auto-scaling, and it will also keep our data-grid service always on in case of failure. I've already covered in greater detail how multi-tenancy works here, as well as how high availability works, not just in a local environment but across a disaster recovery site, as described here. That said, trying out these features in the context of this example shouldn't be that hard. You are welcome to play with the isolation argument (Private, Public, Shared), deploy multiple data grids in the same environment and see what happens. You can also try to break the machines and containers and see how the system heals itself while keeping your application running. You can also launch new containers explicitly (right-click the agent in the management console) or just start a new machine and see how the system scales out on demand.

Scaling Down
One thing that is worth emphasizing is how the system scales down (not just auto-scales). The simplest way to experience this feature is to un-deploy our entire data grid, which you can do by simply calling the undeploy() method:

pu.undeploy();

Another option is from the Administration console: select the deployment tab and un-deploy the data-grid service from the console.
Now watch what happens to our data-grid instances -- but not only to the data-grid instances but to our containers (processes) as well. Let me explain:
When we un-deploy the data grid, the first thing that happens is that the partitions (the actual data content) get un-deployed, freeing the memory within the hosting container. Another thing that happens in the background is that the Elastic Manager notices that we no longer need to run those container processes, and begins freeing up our cloud environment from those processes automatically. You can watch the ESM logs as this happens.
What’s Next?
Mike includes a roadmap for Elastic Data Grid (Caching), as illustrated in the diagram below. Since the report was compiled (it was based on a cutoff date of last year), we've already developed a fairly enhanced version of the Elastic Data Grid (Caching) platform. Towards our 8.0 release we'll have full support for complete application deployment in that same model, through the support of a processing unit. In this model you basically deploy an application partition in exactly the same way that you would deploy a data-grid instance. This will enable us to reuse the same multi-tenancy, auto-scaling and self-healing abstraction that we use for the data grid for our entire application, in the exact same way.
Note that current and earlier versions of GigaSpaces XAP already support full elastic application and data-grid deployment through the ServiceGrid. The Elastic Middleware provides a much higher level of abstraction that makes elastic deployment much simpler to use and much closer to the next-generation data center/cloud vision.