Thursday, January 26, 2012

My Experience of JudCon 2012: Bangalore

So, I had finally made up my mind to attend JudCon 2012: Bangalore. After going through the list of sessions and speakers, I was very excited to be there. After a delayed flight from New Delhi, I reached the hotel in Bangalore at 2:00 AM.

After a rush in the morning, we reached the venue a little late, missing the keynote by Bruno.

There were 3 parallel tracks, dedicated to AS7, cloud, and jBPM/Drools/SOA/others.

The first session I attended was "Infinispan: the path ahead" by Manik Surtani (lead of the JBoss Infinispan project). The session was excellent and Manik explained Infinispan wonderfully. This was new to me, and I was excited to learn about the new data grid by JBoss.

The next session I attended was by Greg on JPA and Hibernate. He explained some best practices to follow while using an ORM; I really liked his take on "skinny-objects".

The session was so interesting that people were happy to delay the lunch.

After a decent lunch and a yummy dessert, I decided to attend some sessions on cloud-related technologies.

At 1 PM, the 3rd session of the day was on JBoss Data Grid and on implementing LSI. To be honest, I did not like the session as it got too boring; I eventually lost track and opened my laptop :P

The next session in the OpenShift/cloud category was on deploying apps on AS7. I could not understand why they had put this session in this category; perhaps to fill an empty slot.

Suddenly a huge crowd poured into the assembly room, and it was jam-packed.

Personally, this session was again not very useful; the speaker just demonstrated the quickstarts available on the JBoss AS7 site.

Disappointed I left the session early and headed for a nice cup of coffee.

After two back-to-back "not much useful" sessions, I attended the 5th session of the day by Galder, again on Infinispan.

This session was great. It was a perfect fit after Manik's overview session in the morning. Infinispan was looking worth a hands-on try.

The last session of the day was by the famous (he had already gathered attention with his red hat :P ) Mark Atwood. The session lasted only 30 mins and was again a letdown.

The best part of the day was the Q&A session held from 5:30 to 7 PM. The big guys from Red Hat answered questions related to Red Hat, JBoss, Java and the community for 90 minutes.
This was followed by dinner and drinks hosted by Red Hat, but we decided to skip it due to the long queues.

..to be cont.. Day 2 experience

Saturday, July 23, 2011

JVM Tuning settings, the ultimate list

Let us begin with the very basic questions: why do we need our JVM tuned, and what exactly is the JVM?

The JVM is an instance of the Java Runtime Environment, which comes into action whenever you run your application.

So, in order to make sure the JVM runs fine, we need to ensure there is enough space for it to run our programs. Hence we need to keep a timely check on the performance factors and the space allotted to the JVM.

The crux of the matter: Heap and Garbage Collection

The Java Virtual Machine heap is the area of memory used by the JVM for dynamic memory allocation.

The heap is divided into generations:

1. Young Generation (Eden) - newly created, short-lived objects; most die young and are garbage collected quickly.

2. Old Generation (Tenured) - objects that survive long enough to be promoted out of the young generation.

3. Permanent Generation (PermGen) - class definitions and associated metadata.

To configure the heap sizes, there are parameters that can be set when starting up the application server, or in its configuration file.

These parameters are :

-Xms : the initial Java heap size

-Xmx : the maximum Java heap size

-Xmn : the size of the heap for the young generation
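To see what these flags actually give you at runtime, here is a minimal sketch (the class name and the example flag values are my own) that reads the heap sizes back via the standard `Runtime` API:

```java
// Sketch: print the heap sizes the JVM actually picked up.
// Launch with hypothetical flags, e.g.: java -Xms256m -Xmx1g -Xmn128m HeapInfo
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("Current heap (totalMemory): " + rt.totalMemory() / mb + " MB");
        System.out.println("Maximum heap (maxMemory):   " + rt.maxMemory() / mb + " MB");
        System.out.println("Free within current heap:   " + rt.freeMemory() / mb + " MB");
    }
}
```

Note that `totalMemory()` reports the heap committed so far (it starts near -Xms and grows), while `maxMemory()` reflects -Xmx.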

How do the heap size settings and garbage collection affect performance?

Garbage collection runs primarily as 2 kinds of collections -

1. Minor (lightweight) collections - performed on the young generation of the heap

2. Full GC - traverses the entire heap when there is not enough memory left to allocate space for objects which get promoted from the young to the old generation

At initialization, a maximum address space is virtually reserved but not allocated to physical memory unless it is needed. The complete address space reserved for object memory can be divided into the young and tenured generations.

Objects are initially allocated in Eden and stay in the young generation until they are old enough to be tenured, i.e. copied to the tenured generation.

If there is a memory leak, or too little heap is allocated, the old generation will eventually start to run out of room, causing the full GC to run more frequently; the resulting pauses can become a performance issue.

So, how do we solve this problem?

The answer would be to customize the generation sizes.

Initial heap size: One major problem in large server applications is slow startup. This arises from a small initial heap size, so we should give the virtual machine a generous initial heap.

Size of generations: The amount allocated for the young generation is the value specified with -Xmn. The amount allocated for the old generation is the value of -Xmx minus -Xmn. Generally, the young generation should not be too big, or it will take the GC too long to look through it for space that can be reclaimed.

So, to decide the ratio of maximum to initial size, check the verbose garbage collector output, and then explore the sensitivity of your individual performance metric to the garbage collector parameters.
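As a sketch, a server app might be launched with verbose GC output enabled like this (the jar name is hypothetical; the -XX flags are the classic HotSpot ones from JDK 6/7, contemporary with this post):

```shell
# Hypothetical launch with heap sizing and per-collection GC logging
java -Xms512m -Xmx1024m -Xmn256m \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -jar myserver.jar
```

Each line of the resulting log shows how much of each generation was reclaimed and how long the pause took, which is exactly the data needed to tune the ratios above.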

So, the heart of a Java application is the JVM, and if the JVM and its constituents are fit and fine, we can enjoy a healthy application.

Key to Optimization: Algorithm and Analysis

Every problem has solutions, but the most important thing is to find the solution that best fits the scenario. So as we dig into algorithms and techniques, we will investigate how to analyse the complexity and arrive at the best optimized solution.
Let us begin with a simple example and try to find the time complexity.

Example: We need the sum of the natural numbers starting from 'a' and ending at 'b', inclusive.

Solution 1:

public class StupidSum {
    public long sum(long a, long b) {
        long totalSum = 0;
        for (long i = a; i <= b; i++)
            totalSum += i;
        return totalSum;
    }
}
As we see, the complexity of this solution is O(n). For small inputs this algorithm is fine, but as the numbers grow very large it can surely become a bottleneck in any application.

Now let’s look at the second one:
Solution 2:
public class SmartSum {
    public long sum(long a, long b) {
        long fullRange = b * (b + 1) / 2;
        a--; // so the result is inclusive of a
        long initialRange = a * (a + 1) / 2;
        return fullRange - initialRange;
    }
}

This solution takes into account that the sum of any series of integers between zero and n, inclusive, can be calculated with the formula (n(n+1))/2.
As we see, the complexity of this approach is O(1); it does not depend on the input size at all, so however big the range is, it takes the same time.

So, next let's compare the two algorithms with a simple program:

public static void main(String[] args) {
    System.out.println("Sum integers between 46 and 891000");
    StupidSum s1 = new StupidSum();
    long time1 = System.currentTimeMillis();
    long stupidsum = s1.sum(46, 891000);
    long time2 = System.currentTimeMillis();
    System.out.println("StupidSum:" + stupidsum + " Time:" + (time2 - time1) + " ms.");
    SmartSum s2 = new SmartSum();
    time1 = System.currentTimeMillis();
    long smartsum = s2.sum(46, 891000);
    time2 = System.currentTimeMillis();
    System.out.println("SmartSum:" + smartsum + " Time:" + (time2 - time1) + " ms.");
}

Output :
Sum integers between 46 and 891000
StupidSum:396940944465 Time:5 ms.
SmartSum:396940944465 Time:0 ms.

Conclusion :
The output of the above code clearly shows why we must focus on an optimized and smart algorithm.

Sorting and Searching techniques
One of the major killers of any application is naively implemented search or sort functionality. Some of the most common operations are searching, insertion and deletion.
Example :
The most obvious method of searching is linear: compare each element with the key. But that has complexity O(n), which can be optimised.
So how do we make our search faster? One efficient method is binary search, which requires the data to be sorted and then uses divide and conquer to find the key.
It has worst-case performance O(log n),
best-case performance O(1),
and average-case performance O(log n).
So in most cases it is advisable to use binary search where the data is held in sorted arrays or lists.
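As a minimal sketch of the idea (the class and method names are my own), an iterative binary search over a sorted int array looks like this:

```java
// Minimal sketch of iterative binary search over a sorted array.
// Returns the index of key, or -1 if it is not present.
public class BinarySearchDemo {
    public static int binarySearch(int[] sorted, int key) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids overflow of (lo + hi)
            if (sorted[mid] == key)
                return mid;
            else if (sorted[mid] < key)
                lo = mid + 1; // key can only be in the upper half
            else
                hi = mid - 1; // key can only be in the lower half
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        int[] data = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};
        System.out.println(binarySearch(data, 23)); // prints 5
        System.out.println(binarySearch(data, 4));  // prints -1
    }
}
```

Each iteration halves the remaining range, which is where the O(log n) bound comes from.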

Sorting:
When it comes to sorting there are many techniques, but a few good ones with better performance are quick sort, merge sort and heap sort, depending of course upon the data structure used and the amount of data. All 3 algorithms have an average-case complexity of O(n log n) (though quick sort degrades to O(n^2) in its worst case).
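In practice the JDK already ships tuned implementations of these ideas, so a quick sketch of simply using them (for primitives, `Arrays.sort` uses a quicksort variant; for objects it uses a stable merge-sort-based algorithm):

```java
import java.util.Arrays;

// Sketch: using the JDK's built-in sorts rather than hand-rolling one.
public class SortDemo {
    public static void main(String[] args) {
        int[] nums = {38, 5, 91, 2, 23};
        Arrays.sort(nums); // primitives: quicksort variant
        System.out.println(Arrays.toString(nums)); // [2, 5, 23, 38, 91]

        String[] words = {"pear", "apple", "mango"};
        Arrays.sort(words); // objects: stable merge-sort-based sort
        System.out.println(Arrays.toString(words)); // [apple, mango, pear]
    }
}
```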


Hashes:
One of the most important aspects of code optimisation is hashing.
So how do we store data in hash tables?

Dictionary example:
Suppose we want to store words and their meanings in a dictionary, and we treat the word as the key to find the meaning.
One method could be to use an array to store the meanings, with the word's index computed from the word itself. To insert a definition into the dictionary, we define a function hashCode() that maps each word (key) to a unique integer. But a problem arises: even though English has fewer than one million words, giving each word a unique integer would require an array that long, most of it empty.
Clearly we need a better solution.

Suppose n is the number of keys (words) whose definitions we want to store, and suppose we use a table of N buckets, where N is perhaps a bit larger than n, but much smaller than the number of possible keys. A hash table maps a huge set of possible keys into N buckets by applying a compression function to each hash code.

So, we take the word’s hashcode() and apply a compression function or hash function on it to find the bucket where we would store it.
h(hashCode) = hashCode mod N.

With this compression function, no matter how long and varied the keys are, we can map them into a table whose size is not much greater than the actual number of entries we want to store. However, we've created a new problem: several keys are hashed to the same bucket in the table if h(hashCode1) = h(hashCode2). This circumstance is called a collision. To handle it, instead of having each bucket in the table reference one entry, we have it reference a linked list of entries, called a chain. If several keys are mapped to the same bucket, their definitions all reside in that bucket's linked list.

Benefits of Hashing:
Insertion, deletion and search become easier and take less time.
So how do we search for a key here? We apply the compression function to the key's hash code to find the bucket where the value is stored, which has a complexity of O(1); then we walk the chain linearly to find the desired element. Similarly, for insertion and deletion, once we have found the bucket it is one more step to perform the operation.

This is beneficial as we clearly see while handling huge amount of data we save a substantial amount of time in search, insertion and deletion.
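The dictionary example above can be sketched as a small chained hash table (all class and method names here are my own, for illustration):

```java
import java.util.LinkedList;

// Minimal sketch of a chained hash table mapping word -> definition.
public class ChainedDictionary {
    private static class Entry {
        final String word;
        String definition;
        Entry(String word, String definition) {
            this.word = word;
            this.definition = definition;
        }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedDictionary(int n) {
        buckets = new LinkedList[n];
        for (int i = 0; i < n; i++) buckets[i] = new LinkedList<Entry>();
    }

    // Compression function: h(hashCode) = hashCode mod N
    // (adjusted so negative hash codes still land in a valid bucket)
    private int bucketFor(String word) {
        int h = word.hashCode() % buckets.length;
        return h < 0 ? h + buckets.length : h;
    }

    public void insert(String word, String definition) {
        LinkedList<Entry> chain = buckets[bucketFor(word)];
        for (Entry e : chain) {
            if (e.word.equals(word)) { e.definition = definition; return; } // update in place
        }
        chain.add(new Entry(word, definition)); // collisions just extend the chain
    }

    public String search(String word) {
        for (Entry e : buckets[bucketFor(word)]) // walk the chain linearly
            if (e.word.equals(word)) return e.definition;
        return null; // not found
    }
}
```

Finding the bucket is O(1); only the walk along one (usually short) chain is linear, which is why operations stay fast as long as N keeps pace with the number of entries.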

Difference between Factory and Abstract Factory

The Abstract Factory pattern can use Factory Methods to actually implement the creation of different types of objects, but that is not necessary.
I would say the Abstract Factory pattern is more about designing product creation and how different variants of a product are created; it supplies a "kit" for creating a product and its different variants. It may use Prototype/Factory Method, or even the Builder pattern inside, when actually instantiating the objects.
So in the Abstract Factory pattern, we provide an interface for creating families of products without specifying their concrete implementations.

example :

Factory Method :

public Product createProduct(int productType) {
    if (productType == 1)
        return new Type1Product();
    else if (productType == 2)
        return new Type2Product();
    return null;
}


For Abstract Factory Pattern :


//client
abstract class ProductCreator {
    public abstract Product createProduct(ProductFactory factory);
}

class ProductCreatorImpl extends ProductCreator {
    public Product createProduct(ProductFactory factory) {
        return factory.createProduct();
    }
}


abstract class ProductFactory {
    public abstract Product createProduct();
}

class Type1Product extends Product {
    ....
}

// This class has to implement createProduct(), and the creation here uses a Factory Method
class Type1ProductFactory extends ProductFactory {
    public Product createProduct() {
        return new Type1Product();
    }
}
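To see the pattern end to end, here is a self-contained, runnable sketch of the same idea (simplified from the fragment above; all names are illustrative): the client works only against ProductFactory and never names a concrete product class.

```java
// Self-contained Abstract Factory sketch.
interface Product {
    String name();
}

class Type1Product implements Product {
    public String name() { return "Type1Product"; }
}

class Type2Product implements Product {
    public String name() { return "Type2Product"; }
}

interface ProductFactory {
    Product createProduct(); // the factory method each variant implements
}

class Type1ProductFactory implements ProductFactory {
    public Product createProduct() { return new Type1Product(); }
}

class Type2ProductFactory implements ProductFactory {
    public Product createProduct() { return new Type2Product(); }
}

public class FactoryDemo {
    // Client code: which variant gets built depends only on the factory passed in.
    static Product build(ProductFactory factory) {
        return factory.createProduct();
    }

    public static void main(String[] args) {
        System.out.println(build(new Type1ProductFactory()).name()); // Type1Product
        System.out.println(build(new Type2ProductFactory()).name()); // Type2Product
    }
}
```

Swapping the product family now means swapping one factory object, with no changes to client code.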

This answer I had posted on coderanch as well.