Compression and Decompression of String Data in Java

April 28, 2022

Strings are among the most commonly used data types today (in Java, String is a class rather than a primitive type). Data files consist largely of strings, so string compression can be highly beneficial: it can significantly reduce file sizes and speed up file transfers, saving time and resources. Java, known for its practicality, ships with built-in features for string compression.

How are string compression and decompression performed in Java?

The algorithm used to compress strings in Java is much the same as the general method for compressing string data. String compression is always lossless, because all of the data must be recoverable at decompression time. It works by scanning the data and identifying repeating patterns in the strings. Once the repeating patterns and their frequencies are identified, a short unique code is assigned to each pattern, and the occurrences of each pattern in the data are replaced with that code. At decompression time, the same method is applied in reverse: each code is replaced with its corresponding string, restoring all the data.

String compression in Java can be performed with the ZLIB compression library, which is exposed through the java.util.zip package. The compression ratio varies with factors such as the compression level requested, the length of the data and the amount of repetition in it.
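To see how the compression level affects the result, the sketch below compresses the same repetitive input at two of the levels defined as constants on Deflater. The class name CompressionLevelDemo and the helper methods sampleInput and compressedSize are hypothetical names chosen for illustration; the exact byte counts depend on the input and are not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionLevelDemo {
    // Builds a repetitive test input: "ABCDEF" repeated the given number of times
    static byte[] sampleInput(int repetitions) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repetitions; i++) {
            sb.append("ABCDEF");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Returns the total number of compressed bytes produced at the given level
    static int compressedSize(byte[] input, int level) {
        Deflater def = new Deflater(level);
        def.setInput(input);
        def.finish();
        byte[] buffer = new byte[256];
        int total = 0;
        // Keep draining the deflater until the whole stream has been written
        while (!def.finished()) {
            total += def.deflate(buffer);
        }
        def.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] input = sampleInput(500);
        System.out.println("Original bytes: " + input.length);
        System.out.println("BEST_SPEED bytes: "
                + compressedSize(input, Deflater.BEST_SPEED));
        System.out.println("BEST_COMPRESSION bytes: "
                + compressedSize(input, Deflater.BEST_COMPRESSION));
    }
}
```

Because the buffer is drained in a loop, the count returned is the full compressed size rather than whatever happened to fit in a single call.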

The following are the two most prominent classes used for string compression in Java,

1. Deflater Class:

Deflater is one of the most commonly used classes for string compression in Java. It uses the popular ZLIB compression library and provides a method called deflate() to compress data. The method takes the input previously supplied with setInput(), compresses it, and fills the given buffer with the compressed data. It returns the number of compressed bytes written.

The method signature is shown below,

public int deflate(byte[] b)

The method is also overloaded to accept additional parameters.

public int deflate(byte[] b, int offset, int length, int flush)

public int deflate(byte[] b, int offset, int length)

The parameters accepted by these overloads are described as follows,

byte[] b parameter:

This is the output buffer that receives the compressed data as bytes.

The string to be compressed must first be converted into an array of bytes and handed to the deflater with setInput(). To do that conversion, the String method getBytes() is used.

The code snippet below demonstrates the use of getBytes(),

import java.util.Arrays;

public class GetBytesExample {
    public static void main(String[] args) {
        String str = "ABCDEF";
        byte[] byteArr = str.getBytes();
        // Now print the byte[] elements
        System.out.println("String to byte array: " + Arrays.toString(byteArr));
    }
}

OUTPUT:

String to byte array: [65, 66, 67, 68, 69, 70]

int offset parameter:

This is the index in the output buffer b at which the compressed bytes start being written. It comes in handy when the compressed data has to be placed after something else in the same buffer, such as a header. To compress only a portion of the input, use the setInput(byte[] b, int off, int len) overload instead.

int length parameter:

This is the maximum number of compressed bytes that may be written to the buffer, starting at offset. It is always passed together with the offset parameter.

int flush parameter:

This is the flush mode, one of Deflater.NO_FLUSH, Deflater.SYNC_FLUSH or Deflater.FULL_FLUSH. It controls whether pending compressed output is forced out into the buffer.

Return Type of Function:

The method returns an int: the number of compressed bytes actually written to the buffer.
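To illustrate what the flush parameter is for, the sketch below compresses a chunk with Deflater.SYNC_FLUSH without ever calling finish(), then inflates it straight away. The class name FlushModeDemo and the method roundTripChunk are hypothetical, and the fixed 1024-byte buffers assume a short input.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlushModeDemo {
    // Compresses one short chunk with SYNC_FLUSH and inflates it immediately
    static String roundTripChunk(String chunk) {
        Deflater def = new Deflater();
        def.setInput(chunk.getBytes(StandardCharsets.UTF_8));
        byte[] compressed = new byte[1024];
        // SYNC_FLUSH forces out all pending output, so the bytes written so far
        // can be decoded even though the stream has not been finished
        int compSize = def.deflate(compressed, 0, compressed.length,
                Deflater.SYNC_FLUSH);
        def.end();

        Inflater inf = new Inflater();
        // Pass only the bytes that were actually written
        inf.setInput(compressed, 0, compSize);
        byte[] restored = new byte[1024];
        try {
            int restoredSize = inf.inflate(restored);
            return new String(restored, 0, restoredSize, StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalStateException("Unexpected invalid data", e);
        } finally {
            inf.end();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripChunk("ABCDEFABCDEFABCDEF"));
    }
}
```

This kind of flushing is mainly useful when data is sent in pieces and the receiver needs to decode each piece as it arrives; for a one-shot compression, calling finish() is enough.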

The code below demonstrates the use of the deflate method with all the parameters mentioned above,

deflate(byte[] b, int offset, int length, int flush)

import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionUsingDeflater {
    public static void main(String[] args) {
        // Deflater object is created
        Deflater def = new Deflater();
        // Build the string to be compressed by repeating str 3 times,
        // which generates a repeating pattern
        String str = "ABCDEF", finalStr = "";
        for (int i = 0; i < 3; i++)
            finalStr += str;
        // Set the input for the deflater by converting the string into bytes
        def.setInput(finalStr.getBytes(StandardCharsets.UTF_8));
        // finish() tells the deflater that no more input will follow;
        // finished() later returns true once the end of the compressed stream is written
        def.finish();
        // Output buffer for the compressed bytes
        byte[] compString = new byte[1024];
        // Write at most 13 compressed bytes into compString, starting at index 3
        int compSize = def.deflate(compString, 3, 13, Deflater.FULL_FLUSH);
        // The compressed data is binary, so it prints as unreadable characters
        System.out.println("Compressed String :" + new String(compString, 3, compSize) + "\n Size :" + compSize);

        // Original string is printed for reference
        System.out.println("Original String :" + finalStr + "\n Size :" + finalStr.length());
        // Release the deflater's resources
        def.end();
    }
}

Output:

Compressed String :x�strvqusD",
 Size :13
Original String :ABCDEFABCDEFABCDEF
 Size :18

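In this result, the original data is 18 bytes long and the call wrote 13 bytes of compressed data. Note that 13 is also the length limit passed to deflate(), so the reported size can never exceed it; to measure the true compressed size, use a large enough limit and keep calling deflate() until finished() returns true. The difference does not seem like much here, but with a large amount of string data the compression ratio can improve significantly.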

2. Inflater Class:

A good compression function needs a matching decompression function. Alongside Deflater, Java provides a class for decompression called “Inflater”. The Inflater class offers similar methods to the Deflater class, but they work in reverse.

The inflate() method in the Inflater class decompresses input data that was previously compressed and fills the given buffer with the uncompressed data. Just like deflate(), it returns the number of uncompressed bytes written.

The method signature, return type and overloads are the same, except that inflate() takes no flush parameter.

The main difference is in exception handling. There has to be a way to tell whether the input is compressed data or arbitrary bytes. The method throws an exception called “DataFormatException” when the data passed in is not in a valid compressed format.
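The exception handling described above can be sketched as a small validity check. The class ZlibFormatCheck and its methods isValidCompressedData and compress are hypothetical helpers written for this illustration; only the Inflater, Deflater and DataFormatException types come from java.util.zip.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class ZlibFormatCheck {
    // Returns true only if the bytes inflate cleanly to the end of the stream
    static boolean isValidCompressedData(byte[] data) {
        Inflater inf = new Inflater();
        inf.setInput(data);
        byte[] buffer = new byte[256];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buffer);
                // No progress and nothing left to read: the stream is incomplete
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    return false;
                }
            }
            return true;
        } catch (DataFormatException e) {
            // Thrown when the bytes are not in a valid compressed format
            return false;
        } finally {
            inf.end();
        }
    }

    // Compresses a string fully, so the check has a valid input to test
    static byte[] compress(String text) {
        Deflater def = new Deflater();
        def.setInput(text.getBytes(StandardCharsets.UTF_8));
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!def.finished()) {
            out.write(buffer, 0, def.deflate(buffer));
        }
        def.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "this is plain text".getBytes(StandardCharsets.UTF_8);
        System.out.println("Plain text valid: " + isValidCompressedData(plain));
        System.out.println("Compressed valid: "
                + isValidCompressedData(compress("ABCDEFABCDEFABCDEF")));
    }
}
```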

The following code demonstrates the use of the Inflater class and its inflate() method. It extends the Deflater example shown earlier: the same repeated string is compressed first, the compressed bytes are then used as the input, and the expected output is the original string.

import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DecompressionUsingInflater {
    public static void main(String[] args) throws DataFormatException {
        // Compress the repeated string first
        Deflater def = new Deflater();
        String str = "ABCDEF", finalStr = "";
        for (int i = 0; i < 3; i++)
            finalStr += str;
        def.setInput(finalStr.getBytes(StandardCharsets.UTF_8));
        def.finish();

        byte[] compString = new byte[1024];

        // compSize is the number of compressed bytes written to compString
        int compSize = def.deflate(compString);
        def.end();

        // This is the end of compression. Now Inflater is used to get back the original string data.

        // Inflater class object is created
        Inflater inf = new Inflater();

        // Only the compSize bytes that were actually written are set as input
        inf.setInput(compString, 0, compSize);

        // Byte array for the decompressed string
        byte[] orgString = new byte[1024];

        // Decompress the string data; orgSize is the number of bytes restored
        int orgSize = inf.inflate(orgString, 0, 18);

        // Showing the output of the deflater and the inflater
        System.out.println("Compressed string data: "
                           + new String(compString, 0, compSize));
        System.out.println("Decompressed string data: "
                           + new String(orgString, 0, orgSize, StandardCharsets.UTF_8));

        inf.end();
    }
}

OUTPUT:

Compressed string data: x�strvqusD",��
Decompressed string data: ABCDEFABCDEFABCDEF
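The example above relies on knowing in advance that the original data is 18 bytes long. When the size is not known, one option is to inflate in a loop until the end of the stream is reached. The sketch below shows this under the assumption that the whole compressed input fits in memory; ZlibRoundTrip, compress and decompress are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class ZlibRoundTrip {
    // Compresses the UTF-8 bytes of the text, draining the deflater in a loop
    static byte[] compress(String text) {
        Deflater def = new Deflater();
        def.setInput(text.getBytes(StandardCharsets.UTF_8));
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!def.finished()) {
            out.write(buffer, 0, def.deflate(buffer));
        }
        def.end();
        return out.toByteArray();
    }

    // Decompresses without knowing the original length in advance
    static String decompress(byte[] compressed) {
        Inflater inf = new Inflater();
        inf.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buffer);
                // No output and no way to continue: stop instead of looping forever
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IllegalArgumentException("Incomplete compressed data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Input is not valid compressed data", e);
        } finally {
            inf.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "ABCDEFABCDEFABCDEF";
        String restored = decompress(compress(original));
        System.out.println("Round trip matches: " + original.equals(restored));
    }
}
```

Keeping the compressed data as a byte array, rather than converting it to a String, avoids corrupting bytes that are not valid text in the platform charset.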

Factors to be considered:

Both of these classes work well for string compression in Java, but there are certain things to consider. String compression is always lossless, since dropping any data would make the whole result useless; that is why these classes work solely by removing redundancy. They remove repeated patterns so that no data is lost. The problem arises when the string data has few or no repetitions; in that case, these classes are not very efficient. In fact, the compressed output can be longer than the original data: with no repetitions to exploit, every character still has to be encoded, and the ZLIB format adds a fixed overhead on top (a 2-byte header and a 4-byte checksum).

Consider the Deflater example earlier in the article, where the patterns were created by repeating the string “ABCDEF” three times in a for loop. If the six-character string is compressed as it is, the output is as follows,

Compressed String :x�strvqu~
 Size 13
Original String :ABCDEF
 Size 6

Here, the output is more than twice the size of the original string, as there are no repeating patterns at all. To avoid this, only use this compression method when you have a lot of data, as that significantly increases the likelihood of repeated patterns.
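One way to act on this advice is to compare sizes before committing to compression. The following is a minimal sketch of that idea, not a prescribed approach; the class CompressionGuard and the methods deflateFully and worthCompressing are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionGuard {
    // Compresses the input completely and returns the compressed bytes
    static byte[] deflateFully(byte[] input) {
        Deflater def = new Deflater();
        def.setInput(input);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!def.finished()) {
            out.write(buffer, 0, def.deflate(buffer));
        }
        def.end();
        return out.toByteArray();
    }

    // True only when the compressed form is strictly smaller than the original
    static boolean worthCompressing(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        return deflateFully(raw).length < raw.length;
    }

    public static void main(String[] args) {
        StringBuilder repeated = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            repeated.append("ABCDEF");
        }
        System.out.println("Short string worth compressing: "
                + worthCompressing("ABCDEF"));
        System.out.println("Repeated string worth compressing: "
                + worthCompressing(repeated.toString()));
    }
}
```

A caller that falls back to storing the raw bytes would also need to record which form was stored, for example with a flag byte, so the reader knows whether to inflate.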

RLE (Run Length Encoding):

RLE is another lossless compression algorithm that can be used for string compression in Java. It replaces a run of identical consecutive characters with a count followed by the character. It is primarily used for image compression, where runs of identical pixels are common. Java does not offer a dedicated class or method for RLE, but it can be implemented easily with the StringBuilder class and a few variables that keep track of the part of the string already processed.

For example, the code below has two separate methods, one for compression and one for decompression.

public class RLEInJava {

    public String compression(String comStr) {
        // Return an empty string if the input is null or empty.
        if (comStr == null || comStr.isEmpty()) return "";
        StringBuilder strBuilder = new StringBuilder();
        char[] chars = comStr.toCharArray();
        char current = chars[0];
        int counter = 1;

        for (int i = 1; i < chars.length; i++) {
            if (current == chars[i]) {
                counter++;
            } else {
                // The run has ended: write its length (if above 1) and its character
                if (counter > 1) strBuilder.append(counter);
                strBuilder.append(current);
                current = chars[i];
                counter = 1;
            }
        }
        // Write the final run
        if (counter > 1) strBuilder.append(counter);
        strBuilder.append(current);
        return strBuilder.toString();
    }

    public String decompression(String decomStr) {
        if (decomStr == null || decomStr.isEmpty()) return "";

        StringBuilder strBuilder = new StringBuilder();
        char[] chars = decomStr.toCharArray();
        boolean preIsDigit = false;
        String digitsString = "";
        for (char current : chars) {
            if (!Character.isDigit(current)) {
                if (preIsDigit) {
                    // Repeat the character as many times as the preceding count says
                    String multipleString = new String(new char[Integer.parseInt(digitsString)]).replace("\0", current + "");
                    strBuilder.append(multipleString);
                    preIsDigit = false;
                    digitsString = "";
                } else {
                    strBuilder.append(current);
                }
            } else {
                // Collect the digits of a (possibly multi-digit) run length
                digitsString += current;
                preIsDigit = true;
            }
        }
        return strBuilder.toString();
    }
}

The same rule about the frequency of repetitions applies to RLE: fewer repeating runs result in less efficient compression. Note also that this scheme assumes the input contains no digit characters, since digits in the compressed form are read as run counts. The code above is just one example of implementing RLE in Java; more efficient code can be developed from the same algorithm for compressing larger amounts of string data.
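To make the behaviour of this scheme concrete, here is a compact, self-contained variant with a small usage example. It follows the same count-then-character format, but decodes with a regular expression instead of a flag variable. The class RleSketch and the methods encode and decode are hypothetical names, and the sketch assumes the input contains no digit characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RleSketch {
    // Encodes runs as <count><char>; the count is omitted for runs of length 1
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int run = 1;
            // Measure how far the current character repeats
            while (i + run < s.length() && s.charAt(i + run) == c) {
                run++;
            }
            if (run > 1) out.append(run);
            out.append(c);
            i += run;
        }
        return out.toString();
    }

    // Decodes by matching an optional count followed by one non-digit character
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        Matcher m = Pattern.compile("(\\d*)(\\D)").matcher(s);
        while (m.find()) {
            int count = m.group(1).isEmpty() ? 1 : Integer.parseInt(m.group(1));
            for (int k = 0; k < count; k++) {
                out.append(m.group(2));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("AAABBBCCD");
        System.out.println(encoded);          // prints 3A3B2CD
        System.out.println(decode(encoded));  // prints AAABBBCCD
        System.out.println(encode("ABCDEF")); // prints ABCDEF (no runs to shorten)
    }
}
```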

Wrapping up

Compression is now common for all kinds of data, as it significantly reduces storage requirements and the time needed for data transmission, which can cut costs and increase productivity. With the Deflater and Inflater classes, string compression in Java is straightforward. The main disadvantage shows up when the data is short or non-repetitive; in that scenario, developers should avoid the Deflater class so as not to increase the size of the string data.
