Overview

Suppose you have been using an email account for a few years and have accumulated a huge number of emails. You need to find an old email, and the only information you have is a name that might be Dave, David, or Devon. Or was it Damon? You can’t recall, but you have no option other than to search for it.

The simplest but extremely time-consuming method would be to open all of those hundreds of thousands of emails in a word processor and search for each keyword. Pattern matching in Java makes this far simpler through its regular expression support. A regular expression lets you describe the text you are looking for as a pattern, such as,

shaharyar|sharyar|shahryar

It matches any one of the strings separated by “|”. A shorter, looser pattern would be,

Sh[^e]

The “S” and the “h” find text that begins with “Sh”, while the cryptic [^e] requires the “Sh” to be followed by a character other than “e” (^ means “not” in this context), which eliminates the very common English word “She” from the results. Note that this pattern is case-sensitive and matches any “Sh” followed by a non-“e” character, so it accepts more than the three spellings listed above.
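As a rough illustration of the two patterns above, the sketch below runs both against a short sample text. The class name NameSearchSketch, the helper findAll(), and the sample text are made up for this example, and the (?i) prefix is an added assumption that makes the alternation ignore case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameSearchSketch {

    // Hypothetical helper: returns every substring of text that the regex matches, in order.
    public static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample text invented for this illustration.
        String text = "From: Shahryar. She said Sharyar and Shaharyar would reply.";

        // Alternation: any one of the three spellings; (?i) ignores case.
        System.out.println(findAll("(?i)shaharyar|sharyar|shahryar", text));

        // "Sh" followed by any one character other than 'e', so "She" is skipped.
        System.out.println(findAll("Sh[^e]", text));
    }
}
```

The second pattern only returns the three-character prefix of each name, which shows why it is a filter for candidate positions rather than a replacement for the full alternation.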

Pattern matching in Java

Regular expressions, also called regexes, provide a concise and precise way to specify patterns for finding particular strings in a text. Using regular expressions is also referred to as pattern matching in Java, and a regular expression is also referred to as a pattern. Thus, the term pattern matching in Java means matching a regular expression (pattern) against a text using Java.

The Java Pattern class (java.util.regex.Pattern) is the main entry point to the Java regular expression API. It is primarily used in two ways:

  1. You can use the static matches() method to check whether a string matches a given regular expression.
  2. You can compile a Pattern instance using compile(), which can then be reused to match the regular expression against multiple texts.

Following are the details for both of these methods, along with some other methods that are also used for pattern matching in Java.

1. Pattern.matches()

The easiest way to do pattern matching in Java is to use the static Pattern.matches() method. The code snippet below demonstrates a pattern matching example using Pattern.matches():

import java.util.regex.Pattern;

public class PatternMatchesExample01 {

    public static void main(String[] args) {

        String str    =

            "This string is the text to be searched " +
            "number of occurrences of the pattern.";

        String pattern = ".*the.*";

        boolean match = Pattern.matches(pattern, str);

        System.out.println("match = " + match);
    }
}

 

The code above checks whether the string referenced by the str variable contains the word “the”. Because Pattern.matches() must match the entire input, the pattern wraps “the” in .* on both sides, which allows zero or more characters before and after it.

The Pattern.matches() method is a good option if you need to match a pattern against a text just once and the default settings of the Pattern class are appropriate. To find multiple occurrences, to reuse the pattern, or to use non-default settings of the Pattern class, you need to compile a Pattern instance using the Pattern.compile() method.
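The whole-input behaviour of Pattern.matches() is easy to overlook, so here is a small sketch that contrasts it with a search anywhere in the text. The class name FullMatchSketch and its two helper methods are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

public class FullMatchSketch {

    // True only if the regex matches the entire input.
    public static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex matches anywhere inside the input.
    public static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String str = "This string is the text to be searched.";

        // "the" alone does not cover the whole string, so this prints false.
        System.out.println(matchesWhole("the", str));

        // Wrapping the word in .* lets the pattern span the whole string: true.
        System.out.println(matchesWhole(".*the.*", str));

        // find() looks for the word anywhere, so no .* is needed: true.
        System.out.println(matchesAnywhere("the", str));
    }
}
```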

2. Pattern.compile()

See this code where Pattern.compile() is implemented for pattern matching in Java:

import java.util.regex.Pattern;

public class PatternCompileExample01 {

    public static void main(String[] args) {

        String str    =

                "This string is the text to be searched " +
                "number of occurrences of the pattern.";

        String pattern = ".*the.*";

        Pattern match = Pattern.compile(pattern);
    }
}

Besides reusing a pattern, you can also use the Pattern.compile() method to compile a Pattern with special flags. Here is a Java Pattern.compile() example using a flag:

Pattern match = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

The Java Pattern class contains several flags that make the matching behave in specific ways. The flag used in the above code snippet makes the matching ignore the case of the text.
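To show why compiling pays off, the following sketch compiles a case-insensitive pattern once and reuses it against several strings. The class name ReusePatternSketch, the constant THE, the helper countMatching(), and the sample lines are all assumptions made for this example.

```java
import java.util.regex.Pattern;

public class ReusePatternSketch {

    // Compiled once; Pattern instances are immutable and can be reused freely.
    private static final Pattern THE =
            Pattern.compile(".*the.*", Pattern.CASE_INSENSITIVE);

    // Counts how many of the given lines contain "the" in any letter case.
    public static int countMatching(String[] lines) {
        int count = 0;
        for (String line : lines) {
            // Each call creates a fresh Matcher from the shared Pattern.
            if (THE.matcher(line).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample lines invented for this illustration.
        String[] lines = {"The quick brown fox", "jumps over", "THE LAZY DOG"};
        System.out.println("matching lines = " + countMatching(lines));
    }
}
```

Without the CASE_INSENSITIVE flag, neither “The” nor “THE” would match the lowercase pattern.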

3. Pattern.matcher()

After obtaining a Pattern instance, you can use it to obtain a Matcher instance. The Matcher instance is used to find matches of the pattern in a text. Here is how you can create a Matcher instance from a Pattern instance:

Matcher matcher = pattern.matcher(str);

The matches() method in the Matcher class checks whether the pattern matches the entire text. Below is a pattern matching example of how to use the Matcher:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMatcherExample01 {

    public static void main(String[] args) {

        String str =
            "This string is the text to be searched " +
            "number of occurrences of the pattern.";

        // .* on both sides lets "the" appear anywhere in the text
        String patternString = ".*the.*";

        // Compile once, ignoring case
        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);

        Matcher matcher = pattern.matcher(str);

        // true only if the whole text matches the pattern
        boolean matches = matcher.matches();

        System.out.println("matches = " + matches);
    }
}

The Matcher class offers much more than matches(). It lets you access the matched parts of the text in many different ways.
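As one example of accessing the matched parts, the sketch below uses the Matcher methods find(), group(), start(), and end() to list every occurrence of a pattern along with its position, and then reads capture groups from a full match. The class name MatcherFindSketch, the helper locate(), and the sample address are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherFindSketch {

    // Lists each occurrence of regex in text as "match@start-end" (end is exclusive).
    public static List<String> locate(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "This string is the text to be searched for the pattern.";
        System.out.println(locate("the", str));

        // Capture groups: group(1) is the part before '@', group(2) the domain name.
        Matcher m = Pattern.compile("(\\w+)@(\\w+)\\.com").matcher("dave@example.com");
        if (m.matches()) {
            System.out.println("user = " + m.group(1) + ", domain = " + m.group(2));
        }
    }
}
```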

4. Pattern.split()

The split() method is used to split a text into an array of Strings, using the regular expression as a delimiter. Here is a Java Pattern.split() example:

import java.util.regex.Pattern;

public class PatternSplitExample01 {

    public static void main(String[] args) {

        String txt = "A sep piece sep of sep Text sep with sep Many sep Separators";

        String str = "sep";
        Pattern pattern = Pattern.compile(str);

        String[] split = pattern.split(txt);

        System.out.println("array length = " + split.length);  

    }
}

This program splits the text in the txt variable into 7 individual strings. Each of these strings is included in the String array returned by the split() method. The parts of the text that matched as delimiters (“sep” in this case) are not included in the returned string array.
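Since the delimiter is itself a regular expression, it can absorb the whitespace around each separator. The sketch below is one way to do that; the class name SplitPiecesSketch, the helper splitOn(), and the shortened sample text are assumptions for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitPiecesSketch {

    // Splits text around every match of the delimiter regex.
    public static String[] splitOn(String delimiterRegex, String text) {
        return Pattern.compile(delimiterRegex).split(text);
    }

    public static void main(String[] args) {
        String txt = "A sep piece sep of sep Text";

        // Plain "sep" leaves the surrounding spaces attached to each piece.
        System.out.println(Arrays.toString(splitOn("sep", txt)));

        // \s* on both sides makes the spaces part of the delimiter,
        // so the pieces come back trimmed.
        System.out.println(Arrays.toString(splitOn("\\s*sep\\s*", txt)));
    }
}
```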

5. Pattern.pattern()

The pattern() method returns the regular expression that the Pattern instance was compiled from. Here is an example:

import java.util.regex.Pattern;

public class PatternPatternExample01 {

    public static void main(String[] args) {

        String str = "sep";
        Pattern pattern = Pattern.compile(str);

        String pattern2 = pattern.pattern();
    }
}

In this example, the pattern2 variable will contain the value “sep”, which was the value the Pattern instance was compiled from.

Regular Expression Syntax

Once you understand how these methods work, you need to know the syntax of Java regular expressions to apply pattern matching in Java effectively. When building patterns, you can use any combination of ordinary text and the metacharacters (special characters) listed below. For instance, “a+” matches one or more consecutive occurrences of the letter “a”, with no upper limit. The pattern \d+ matches one or more numeric digits, whereas \d{2,3} matches a run of exactly two or three digits. Note that inside a Java string literal each backslash must be doubled, so \d+ is written as "\\d+".
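The quantifier examples in the paragraph above can be checked with a few whole-string matches. This is a minimal sketch; the class name QuantifierSketch and the helper matches() are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

public class QuantifierSketch {

    // True only if the whole input matches the regex.
    public static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("a+", "aaaa"));        // one or more 'a': true
        System.out.println(matches("a+", ""));            // needs at least one 'a': false
        System.out.println(matches("\\d+", "2024"));      // one or more digits: true
        System.out.println(matches("\\d{2,3}", "42"));    // two or three digits: true
        System.out.println(matches("\\d{2,3}", "4242"));  // four digits is too many: false
    }
}
```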

The following list shows the most common regular expression metacharacters and what they match:

Subexpression      Matches

General expressions
^                  Start of a string
$                  End of a string
\b                 Word boundary
\B                 Not a word boundary
\A                 Beginning of entire string
\z                 End of the complete string
\Z                 End of entire string except for allowable final line terminator
.                  Any one character, except the line terminator
[^…]               Any one character, not from those listed

Alternation and grouping expressions
(…)                Grouping (used to capture groups)
|                  Alternation (OR)
(?:re)             Non-capturing parenthesis
\G                 End of the previous match
\1, \2, …          Back-reference to capture group number 1, 2, …

Greedy quantifiers
+                  Quantifier for 1 or more repetitions
?                  Quantifier for 0 or 1 repetitions (present exactly once, or not at all)

Reluctant (non-greedy) quantifiers
{m,n}?             Reluctant quantifier for “from m to n repetitions”
{m,}?              Reluctant quantifier for “m or more repetitions”
{0,n}?             Reluctant quantifier for 0 up to n repetitions
*?                 Reluctant quantifier: 0 or more
+?                 Reluctant quantifier: 1 or more
??                 Reluctant quantifier: 0 or 1 times

Possessive (very greedy) quantifiers
{m,n}+             Possessive quantifier for “from m to n repetitions”
{m,}+              Possessive quantifier for “m or more repetitions”
{0,n}+             Possessive quantifier for 0 up to n repetitions
*+                 Possessive quantifier: 0 or more
++                 Possessive quantifier: 1 or more
?+                 Possessive quantifier: 0 or 1 times

Escapes and shorthands
\                  Escape (quote) character: turns most metacharacters off; turns subsequent alphabetic into metacharacters
\Q                 Escape (quote) all characters up to \E
\E                 Ends quoting begun with \Q
\t                 Tab character
\r                 Return (carriage return) character
\n                 Newline character
\f                 Form feed
\w                 Character in a word
\W                 A non-word character
\d                 Numeric digit
\D                 A non-digit character
\s                 Whitespace
\S                 A non-whitespace character
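The difference between greedy, reluctant, and possessive quantifiers is easiest to see on the same input. The sketch below also tries a back-reference and \Q…\E quoting from the list above. The class name QuantifierModesSketch, the helper firstMatch(), and the sample strings are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierModesSketch {

    // Returns the first substring matched by regex in text, or null if there is none.
    public static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";

        // Greedy: .+ grabs as much as possible, then backtracks to the last '>'.
        System.out.println(firstMatch("<.+>", html));

        // Reluctant: .+? grabs as little as possible, stopping at the first '>'.
        System.out.println(firstMatch("<.+?>", html));

        // Possessive: .++ consumes the final '>' too and never gives it back,
        // so the trailing '>' in the pattern cannot match and the result is null.
        System.out.println(firstMatch("<.++>", html));

        // Back-reference: \1 must repeat whatever group 1 captured.
        System.out.println(firstMatch("\\b(\\w+) \\1\\b", "it is is fine"));

        // \Q...\E quotes the '+' so it is treated as a literal plus sign.
        System.out.println(firstMatch("\\Q1+1\\E", "1+1=2"));
    }
}
```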

Conclusion

Pattern matching is a rich feature in Java. As users store more and more data, pattern matching in Java offers a practical way to find data in text. It not only saves time but also makes tasks such as enforcing strong password constraints and validating user input easier for Java developers.


Author

Shaharyar Lalani is a developer with a strong interest in business analysis, project management, and UX design. He writes and teaches extensively on themes current in the world of web and app development, especially in Java technology.
