Java does not explicitly specify a memory and remove it in the program code. Some people sets the relevant object to null or use System.gc() method to remove the memory explicitly. the garbage collector finds the unnecessary (garbage) objects and removes them.
These hypotheses are called the weak generational hypothesis. So in order to preserve the strengths of this hypothesis, it is physically divided into two – young generation and old generation – in HotSpot VM.
Young generation: Most of the newly created objects are located here. Since most objects soon become unreachable, many objects are created in the young generation, then disappear. When objects disappear from this area, we say a “minor GC” has occurred.
Old generation: The objects that did not become unreachable and survived from the young generation are copied here. It is generally larger than the young generation. As it is bigger in size, the GC occurs less frequently than in the young generation. When objects disappear from the old generation, we say a “major GC” (or a “full GC“) has occurred.
The permanent generation from the chart above is also called the “method area,” and it stores classes or interned character strings. So, this area is definitely not for objects that survived from the old generation to stay permanently. A GC may occur in this area. The GC that took place here is still counted as a major GC.
Composition of the Young Generation
In order to understand GC, let’s learn about the young generation, where the objects are created for the first time. The young generation is divided into 3 spaces.
- One Eden space
- Two Survivor spaces
There are 3 spaces in total, two of which are Survivor spaces. The order of execution process of each space is as below:
- The majority of newly created objects are located in the Eden space.
- After one GC in the Eden space, the surviving objects are moved to one of the Survivor spaces.
- After a GC in the Eden space, the objects are piled up into the Survivor space, where other surviving objects already exist.
- Once a Survivor space is full, surviving objects are moved to the other Survivor space. Then, the Survivor space that is full will be changed to a state where there is no data at all.
- The objects that survived these steps that have been repeated a number of times are moved to the old generation.
TYPES OF GC
The old generation basically performs a GC when the data is full. The execution procedure varies by the GC type, so it would be easier to understand if you know different types of GC.
According to JDK 7, there are 5 GC types.
- Serial GC
- Parallel GC
- Parallel Old GC (Parallel Compacting GC)
- Concurrent Mark & Sweep GC (or “CMS”)
- Garbage First (G1) GC
Among these, the serial GC must not be used on an operating server. This GC type was created when there was only one CPU core on desktop computers. Using this serial GC will drop the application performance significantly.
Serial GC (-XX:+UseSerialGC)
The GC in the young generation uses the type we explained in the previous paragraph. The GC in the old generation uses an algorithm called “mark-sweep-compact.”
- The first step of this algorithm is to mark the surviving objects in the old generation.
- Then, it checks the heap from the front and leaves only the surviving ones behind (sweep).
- In the last step, it fills up the heap from the front with the objects so that the objects are piled up consecutively, and divides the heap into two parts: one with objects and one without objects (compact).
The serial GC is suitable for a small memory and a small number of CPU cores.
From the picture, you can easily see the difference between the serial GC and parallel GC. While the serial GC uses only one thread to process a GC, the parallel GC uses several threads to process a GC, and therefore, faster. This GC is useful when there is enough memory and a large number of cores. It is also called the “throughput GC.”
Parallel Old GC(-XX:+UseParallelOldGC)
Parallel Old GC was supported since JDK 5 update. Compared to the parallel GC, the only difference is the GC algorithm for the old generation. It goes through three steps: mark – summary – compaction. The summary step identifies the surviving objects separately for the areas that the GC have previously performed, and thus different from the sweep step of the mark-sweep-compact algorithm. It goes through a little more complicated steps.
CMS GC (-XX:+UseConcMarkSweepGC)
As you can see from the picture, the Concurrent Mark-Sweep GC is much more complicated than any other GC types that I have explained so far. The early initial mark step is simple. The surviving objects among the objects the closest to the classloader are searched. So, the pausing time is very short. In the concurrent mark step, the objects referenced by the surviving objects that have just been confirmed are tracked and checked. The difference of this step is that it proceeds while other threads are processed at the same time. In the remarkstep, the objects that were newly added or stopped being referenced in the concurrent mark step are checked. Lastly, in the concurrent sweep step, the garbage collection procedure takes place. The garbage collection is carried out while other threads are still being processed. Since this GC type is performed in this manner, the pausing time for GC is very short. The CMS GC is also called the low latency GC, and is used when the response time from all applications is crucial.
While this GC type has the advantage of short stop-the-world time, it also has the following disadvantages.
- It uses more memory and CPU than other GC types.
- The compaction step is not provided by default.
You need to carefully review before using this type. Also, if the compaction task needs to be carried out because of the many memory fragments, the stop-the-world time can be longer than any other GC types. You need to check how often and how long the compaction task is carried out.
G1 GC
If you want to understand G1 GC, forget everything you know about the young generation and the old generation. As you can see in the picture, one object is allocated to each grid, and then a GC is executed. Then, once one area is full, the objects are allocated to another area, and then a GC is executed. The steps where the data moves from the three spaces of the young generation to the old generation cannot be found in this GC type. This type was created to replace the CMS GC, which has causes a lot of issues and complaints in the long term.
The biggest advantage of the G1 GC is its performance. It is faster than any other GC types that we have discussed so far.