Sunday, 29 September 2013

Cajo, the easiest way to accomplish distributed computing in Java | Java Code Geeks

From Evernote:

Clipped from: http://www.javacodegeeks.com/2011/01/cajo-easiest-way-to-accomplish.html

Cajo, the easiest way to accomplish distributed computing in Java

Posted on January 27th, 2011 | Filed in: Enterprise Java
Derived from the introductory section of Jonas Boner's article "Distributed Computing Made Easy" posted on TheServerSide.com on May 1st 2006 :
"Distributed computing is becoming increasingly important in the world of enterprise application development. Today, developers continuously need to address questions like: How do you enhance scalability by scaling the application beyond a single node? How can you guarantee high-availability, eliminate single points of failure, and make sure that you meet your customer SLAs?
For many developers, the most natural way of tackling the problem would be to divide up the architecture into groups of components or services that are distributed among different servers. While this is not surprising, considering the heritage of CORBA, EJB, COM and RMI that most developers carry around, if you decide to go down this path then you are in for a lot of trouble. Most of the time it is not worth the effort and will give you more problems than it solves."
On the other hand, distributed computing and Java go together naturally. As the first language designed from the bottom up with networking in mind, Java makes it very easy for computers to cooperate. Even the simplest applet running in a browser is a distributed application, if you think about it. The client running the browser downloads and executes code that is delivered by some other system. But even this simple applet wouldn't be possible without Java's guarantees of portability and security: the applet can run on any platform, and can't sabotage its host.
The cajo project is a small library that enables powerful, dynamic multi-machine cooperation. It is surprisingly easy to use, and its author claims it is unmatched in performance. It is a uniquely 'drop-in' distributed computing framework, meaning it imposes no structural requirements on your applications and requires no source changes. It allows multiple remote JVMs to work together seamlessly, as one.
The project owner John Catherino claims "King Of the Mountain! ;-)" and challenges everyone who is willing to prove that there exists a distributed computing framework in Java that is equally flexible and as fast as cajo.
To tell you the truth, I am personally convinced by John's claim, and I strongly believe that you will be too if you let me walk you through this client-server example. You will be amazed at how easy and flexible the cajo framework is :
The Server.java
import gnu.cajo.Cajo; // The cajo implementation of the Grail

public class Server {

   public static class Test { // remotely callable classes must be public
      // though not necessarily declared in the same class
      private final String greeting;
      // no silly requirement to have no-arg constructors
      public Test(String greeting) { this.greeting = greeting; }
      // all public methods, instance or static, will be remotely callable
      public String foo(Object bar, int count) {
         System.out.println("foo called w/ " + bar + ' ' + count + " count");
         return greeting;
      }
      public Boolean bar(int count) {
         System.out.println("bar called w/ " + count + " count");
         return Boolean.TRUE;
      }
      public boolean baz() {
         System.out.println("baz called");
         return true;
      }
      public String other() { // functionality not needed by the test client
         return "This is extra stuff";
      }
   } // arguments and return objects can be custom or common to server and client

   public static void main(String args[]) throws Exception { // unit test
      Cajo cajo = new Cajo(0);
      System.out.println("Server running");
      cajo.export(new Test("Thanks"));
   }
}
Compile via (the ; classpath separator shown here is the Windows one; on Linux or macOS use : instead):
javac -cp cajo.jar;. Server.java
Execute via:
java -cp cajo.jar;. Server
As you can see, with just two statements :
Cajo cajo = new Cajo(0);
cajo.export(new Test("Thanks"));
we can expose any POJO (Plain Old Java Object) as a distributed service!
And now the Client.java
import gnu.cajo.Cajo;

import java.rmi.RemoteException; // caused by network related errors

interface SuperSet {  // client method sets need not be public
   void baz() throws RemoteException;
} // declaring RemoteException is optional, but a nice reminder

interface ClientSet extends SuperSet {
   boolean bar(Integer quantum) throws RemoteException;
   Object foo(String barbaz, int foobar) throws RemoteException;
} // the order of the client method set does not matter

public class Client {
   public static void main(String args[]) throws Exception { // unit test
      Cajo cajo = new Cajo(0);
      if (args.length > 0) { // either approach must work...
         int port = args.length > 1 ? Integer.parseInt(args[1]) : 1198;
         cajo.register(args[0], port);
         // find server by registry address & port, or...
      } else Thread.sleep(100); // allow some discovery time

      Object refs[] = cajo.lookup(ClientSet.class);
      if (refs.length > 0) { // compatible server objects found
         System.out.println("Found " + refs.length);
         ClientSet cs = (ClientSet)cajo.proxy(refs[0], ClientSet.class);
         cs.baz();
         System.out.println(cs.bar(Integer.valueOf(77)));
         System.out.println(cs.foo(null, 99));
      } else System.out.println("No server objects found");
      System.exit(0); // nothing else left to do, so we can shut down
   }
}
Compile via:
javac -cp cajo.jar;. Client.java
Execute via:
java -cp cajo.jar;. Client
The client can find server objects either by providing the server address and port (if available) or by using multicast. To locate the appropriate server object "Dynamic Client Subtyping" is used. For all of you who do not know what "Dynamic Client Subtyping" stands for, John Catherino explains in his relevant blog post :
"Oftentimes service objects implement a large, rich interface. Other times service objects implement several interfaces, grouping their functionality into distinct logical concerns. Quite often, a client needs only to use a small portion of an interface; or perhaps some methods from a few of the logical grouping interfaces, to satisfy its own needs.
The ability of a client to define its own interface, from ones defined by the service object, is known as subtyping in Java. (in contrast to subclassing) However, unlike conventional Java subtyping; Dynamic Client Subtyping means creating an entirely different interface. What makes this subtyping dynamic, is that it works with the original, unmodified service object.
This can be a very potent technique, for client-side complexity management."
Isn't that really cool??? We just have to define the interface our client "needs" to use and locate the appropriate server object that complies with the client specification. The following statement from our example accomplishes just that :
Object refs[] = cajo.lookup(ClientSet.class);
Last but not least, we can create a client side "proxy" of the server object and remotely invoke its methods just like an ordinary local object reference, by issuing the following statement :
ClientSet cs = (ClientSet)cajo.proxy(refs[0], ClientSet.class);
That's it. These allow for complete interoperability between distributed JVMs. It just can't get any easier than this.
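To get a feel for how a client-defined interface can sit on top of an unmodified object, here is a small local sketch built only on the JDK's java.lang.reflect.Proxy. It is not cajo's implementation and it involves no networking; the names SubtypingSketch, Service, ClientView and view are hypothetical, and methods are matched simply by name and argument count, which is an assumption made for brevity.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical, JDK-only illustration of the idea; not cajo's code.
final class SubtypingSketch {

    // A "service" object that implements no client interface at all.
    public static class Service {
        public boolean baz() { return true; }
        public String foo(Object bar, int count) { return "Thanks"; }
        public String other() { return "This is extra stuff"; }
    }

    // The client declares only the methods it actually needs.
    public interface ClientView {
        boolean baz();
        Object foo(String bar, int count);
    }

    // Wraps target in a proxy that forwards each interface call to the
    // public method on target with the same name and argument count.
    public static <T> T view(Object target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            int argc = args == null ? 0 : args.length;
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(method.getName())
                        && m.getParameterCount() == argc) {
                    return m.invoke(target, args);
                }
            }
            throw new UnsupportedOperationException(method.getName());
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler));
    }

    public static void main(String[] args) {
        ClientView cv = view(new Service(), ClientView.class);
        System.out.println(cv.baz());         // forwarded to Service.baz()
        System.out.println(cv.foo(null, 99)); // forwarded to Service.foo(Object, int)
    }
}
```

Service never mentions ClientView, yet the client can call it through that interface, which is the essence of the technique described in the quote above. A real framework would need stricter matching on parameter types and proper handling of exceptions thrown by the target.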
As far as performance is concerned, I have conducted some preliminary tests on the provided example and achieved an average score of 12000 TPS on the following system :
Sony Vaio with the following characteristics :
  • System : openSUSE 11.1 (x86_64)
  • Processor (CPU) : Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz
  • Processor Speed : 1,200.00 MHz
  • Total memory (RAM) : 2.8 GB
  • Java : OpenJDK 1.6.0_0 64-Bit
For your convenience I provide the code snippet that I used to perform the stress test :
int repeats = 1000000;
long start = System.currentTimeMillis();
for(int i = 0; i < repeats;i ++)
  cs.baz();
System.out.println("TPS : " + repeats/((System.currentTimeMillis() - start)/1000d));
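If you want to repeat this kind of measurement on your own calls, the loop above can be wrapped in a small reusable helper. The following is only a sketch under my own assumptions: the names ThroughputSketch and callsPerSecond are made up, a warm-up phase is added so the JIT compiler has a chance to settle, and System.nanoTime() replaces System.currentTimeMillis() for finer resolution. It measures a local Runnable; to time a remote call you would pass something like the cs.baz() invocation wrapped in a lambda.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of cajo or of the original test.
final class ThroughputSketch {

    // Runs call `warmup` times untimed, then `repeats` times timed,
    // and returns the measured calls per second.
    public static double callsPerSecond(Runnable call, int warmup, int repeats) {
        for (int i = 0; i < warmup; i++) {
            call.run(); // untimed warm-up iterations
        }
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            call.run();
        }
        // Guard against a zero elapsed time on very fast runs.
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return repeats * (double) TimeUnit.SECONDS.toNanos(1) / elapsedNanos;
    }

    public static void main(String[] args) {
        int[] counter = {0};
        double tps = callsPerSecond(() -> counter[0]++, 10_000, 1_000_000);
        System.out.println("TPS : " + tps);
    }
}
```

The number printed depends entirely on the machine and on what the Runnable does, so treat it as a rough indication rather than a benchmark.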
Happy Coding! and Don't forget to share!
Justin

Thursday, 19 September 2013

about | Datamob: Public data put to good use

From Evernote:

Clipped from: http://datamob.org/about

Public data put to good use.

Datamob aims to show, in a very simple way, how public data sources can be used.

We believe good things happen when governments and public institutions make data available in developer-friendly formats. Things that can help save us from bad government and bad decisions.

We're out to find the good things, and get developers excited about the data.

You can help. Contribute high-quality public data sources and apps. Post a link to the data behind an app. Build an app on top of a data source and post a comment about it.

Follow along on Twitter. RSS feeds are available for data, apps, resources, everything. JSON, too. Questions? Get in touch.

Datamob was built with Rails and Heroku in 2008 in coffee shops around New York City by Sean Flannagan and Lauren Sperber.

Content on Datamob is licensed under Creative Commons Attribution-Share Alike 3.0.

Datamob currently lists 227 data sources, 165 apps and 66 resources, which are categorized by 67 tags. It's sort of like this:

 

All Our N-gram are Belong to You


Posted by Alex Franz and Thorsten Brants, Google Machine Translation Team

Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google's datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there's no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more - resulting in a training corpus of one trillion words from public Web pages.

We believe that the entire research community can benefit from access to such massive amounts of data. It will advance the state of the art, it will focus research in the promising direction of large-scale, data-driven approaches, and it will allow all research groups, no matter how large or small their computing resources, to play together. That's why we decided to share this enormous dataset with everyone. We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times.
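To make the idea of "counts for all five-word sequences that appear at least 40 times" concrete, here is a tiny in-memory sketch of n-gram counting with a frequency cutoff. It says nothing about how the real corpus was processed; the names NgramCountSketch and count are hypothetical, tokenization is a plain whitespace split, and n-grams are not allowed to cross sentence boundaries, all of which are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy example; nothing like a trillion-word pipeline.
final class NgramCountSketch {

    // Counts every n-word sequence in the given sentences and keeps only
    // those seen at least minCount times, sorted by n-gram text.
    public static Map<String, Long> count(List<String> sentences, int n, long minCount) {
        Map<String, Long> counts = new TreeMap<>();
        for (String sentence : sentences) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] words = trimmed.split("\\s+");
            for (int i = 0; i + n <= words.length; i++) {
                String gram = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                counts.merge(gram, 1L, Long::sum);
            }
        }
        counts.values().removeIf(c -> c < minCount); // apply the cutoff
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "serve as the input",
                "serve as the input",
                "serve as the index");
        // 4-grams seen at least twice
        System.out.println(count(sentences, 4, 2)); // {serve as the input=2}
    }
}
```

The cutoff is what keeps the published tables manageable: rare sequences are by far the most numerous, so dropping them shrinks the output dramatically.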

Watch for an announcement at the Linguistic Data Consortium (LDC), who will be distributing it soon, and then order your set of 6 DVDs. And let us hear from you - we're excited to hear what you will do with the data, and we're always interested in feedback about this dataset, or other potential datasets that might be useful for the research community.

Update (22 Sept. 2006): The LDC now has the data available in their catalog. The counts are as follows:
File sizes: approx. 24 GB compressed (gzip'ed) text files

Number of tokens:    1,024,908,267,229
Number of sentences:    95,119,665,584
Number of unigrams:         13,588,391
Number of bigrams:         314,843,401
Number of trigrams:        977,069,902
Number of fourgrams:     1,313,818,354
Number of fivegrams:     1,176,470,663

The following is an example of the 3-gram data contained in this corpus:
ceramics collectables collectibles 55
ceramics collectables fine 130
ceramics collected by 52
ceramics collectible pottery 50
ceramics collectibles cooking 45
ceramics collection , 144
ceramics collection . 247
ceramics collection 120
ceramics collection and 43
ceramics collection at 52
ceramics collection is 68
ceramics collection of 76
ceramics collection | 59
ceramics collections , 66
ceramics collections . 60
ceramics combined with 46
ceramics come from 69
ceramics comes from 660
ceramics community , 109
ceramics community . 212
ceramics community for 61
ceramics companies . 53
ceramics companies consultants 173
ceramics company ! 4432
ceramics company , 133
ceramics company . 92
ceramics company 41
ceramics company facing 145
ceramics company in 181
ceramics company started 137
ceramics company that 87
ceramics component ( 76
ceramics composed of 85
ceramics composites ferrites 56
ceramics composition as 41
ceramics computer graphics 51
ceramics computer imaging 52
ceramics consist of 92
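Lines like the ones above are easy to read back into a program. The sketch below assumes the layout exactly as displayed in this post, that is, words separated by spaces with the count as the last field; the files actually distributed may use a different delimiter, so adjust accordingly. The class name NgramLineSketch and its members are hypothetical.

```java
// Hypothetical parser for one line in the "w1 w2 ... wn count" layout shown above.
final class NgramLineSketch {
    final String[] words;
    final long count;

    private NgramLineSketch(String[] words, long count) {
        this.words = words;
        this.count = count;
    }

    // Splits off the last space-separated field as the count and
    // treats everything before it as the words of the n-gram.
    public static NgramLineSketch parse(String line) {
        String trimmed = line.trim();
        int cut = trimmed.lastIndexOf(' ');
        if (cut < 0) {
            throw new IllegalArgumentException("no count in: " + line);
        }
        long count = Long.parseLong(trimmed.substring(cut + 1));
        String[] words = trimmed.substring(0, cut).trim().split("\\s+");
        return new NgramLineSketch(words, count);
    }

    public static void main(String[] args) {
        NgramLineSketch g = parse("ceramics comes from 660");
        System.out.println(g.words.length + "-gram seen " + g.count + " times");
        // prints: 3-gram seen 660 times
    }
}
```

Punctuation tokens such as "," or "!" come through as ordinary words, which matches how they appear in the sample.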

The following is an example of the 4-gram data in this corpus:
serve as the incoming 92
serve as the incubator 99
serve as the independent 794
serve as the index 223
serve as the indication 72
serve as the indicator 120
serve as the indicators 45
serve as the indispensable 111
serve as the indispensible 40
serve as the individual 234
serve as the industrial 52
serve as the industry 607
serve as the info 42
serve as the informal 102
serve as the information 838
serve as the informational 41
serve as the infrastructure 500
serve as the initial 5331
serve as the initiating 125
serve as the initiation 63
serve as the initiator 81
serve as the injector 56
serve as the inlet 41
serve as the inner 87
serve as the input 1323
serve as the inputs 189
serve as the insertion 49
serve as the insourced 67
serve as the inspection 43
serve as the inspector 66
serve as the inspiration 1390
serve as the installation 136
serve as the institute 187
serve as the institution 279
serve as the institutional 461
serve as the instructional 173
serve as the instructor 286
serve as the instructors 161
serve as the instrument 614
serve as the instruments 193
serve as the insurance 52
serve as the insurer 82
serve as the intake 70
serve as the integral 68