Overview

Email validation is an integral part of almost every application that requires users to register, because the email address is often used to identify and authenticate the user. Java offers several ways to validate an entered email address. In this article, we discuss some methods for validating an email address in Java using regular expressions (regex).

Java email address validation using regex

An email address consists of three parts: the local part (the user name), an @ symbol, and a domain such as ‘gmail.com’. Because an email address is a string, it can be validated with string manipulation techniques, but that takes considerable effort, since you have to count and check every character type and length yourself. Regular expressions make email validation in Java much easier. As a Java developer, you are probably already familiar with them: a regular expression is a sequence of characters that defines a search pattern.
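
To see why manual checking gets tedious, here is a minimal sketch of the string-manipulation approach. The class EmailParts and its split method are hypothetical names used only for illustration. The sketch splits at the last @ and checks nothing except that both parts are non-empty, so every further rule (allowed characters, dots, lengths) would need more hand-written code.

```java
import java.util.Optional;

/** Hypothetical illustration: splitting an address by hand, without a regex. */
final class EmailParts {
    final String localPart;
    final String domain;

    private EmailParts(String localPart, String domain) {
        this.localPart = localPart;
        this.domain = domain;
    }

    /** Splits at the last '@'; returns empty if there is no '@' or either side is empty. */
    static Optional<EmailParts> split(String email) {
        int at = email.lastIndexOf('@');
        if (at <= 0 || at == email.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(new EmailParts(email.substring(0, at), email.substring(at + 1)));
    }
}
```

For example, EmailParts.split("abc123@gmail.com") yields the local part abc123 and the domain gmail.com, while "abc123" and "abc123@" yield an empty result.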

The following sections describe different regular expressions that can be used for email validation in Java.

Simple regular expression validation

The most basic regular expression to validate an email address in Java is:

^(.+)@(\S+)$

It only checks the basic format: one or more characters, an ‘@’ symbol, and one or more non-whitespace characters after it. If the input matches, the validation returns true; otherwise, it returns false. The limitation is that this regular expression validates neither the local part nor the domain in any detail. For example, strings such as abc@xyz (no top-level domain) and a b@c (a space in the local part) pass this validation.

The tests in this article use the following helper method to match an email address against a regex pattern:

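import java.util.regex.Pattern;

public class EmailValidation {
    // Returns true only if the entire emailId matches regexPattern.
    public static boolean patternMatches(String emailId, String regexPattern) {
        return Pattern.compile(regexPattern)
          .matcher(emailId)
          .matches();
    }
}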

The following test validates an email address using this regular expression:

@Test
public void simpleRegexTesting() {
    String emailID = "abc123@xyz.com";
    String regexPattern = "^(.+)@(\\S+)$";
    assertTrue(EmailValidation.patternMatches(emailID, regexPattern));
}
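
The limitation described above is easy to demonstrate. The following sketch uses a hypothetical class, SimpleRegexDemo, that applies the same pattern through String.matches, which, like Matcher.matches(), requires the entire input to match.

```java
/** Hypothetical demo: what the simple regex accepts and rejects. */
final class SimpleRegexDemo {
    // Same pattern as in the test above (Java string escaping doubles the backslash).
    static final String SIMPLE = "^(.+)@(\\S+)$";

    static boolean matches(String email) {
        return email.matches(SIMPLE);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "abc123@xyz.com", // accepted
            "abc@xyz",        // accepted, although there is no top-level domain
            "a b@c",          // accepted, although the local part contains a space
            "abc123",         // rejected: no @ symbol
            "abc@ xyz.com"    // rejected: whitespace after the @
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + matches(input));
        }
    }
}
```

Only the last two inputs are rejected, which is why this regex is suitable only for a quick sanity check.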

Strict regular expression validation

The following regular expression performs a stricter validation than the previous one. It validates both the local part and the domain part of the email address:

^(?=.{1,64}@)[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)*@[^-][A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})$

This regex imposes the following restrictions on the local part and the domain part of an email address:

  • Dots “.” are allowed in the local part but not at the beginning or end.
  • Consecutive dots are not allowed in either the local part or the domain part.
  • The local part is limited to 64 characters.
  • The domain part cannot start with a hyphen “-”, and it must end with a top-level domain of at least two letters, so it cannot end with a hyphen or a dot.

The following test checks an email address against this regular expression:

@Test
public void strictRegexTesting() {
    String emailID = "abc123@xyz.com";
    String regexPattern = "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
        + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$";
    assertTrue(EmailValidation.patternMatches(emailID, regexPattern));
}
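
The restrictions listed above show up most clearly in negative cases. The following sketch (StrictRegexDemo is a hypothetical name) compiles the strict regex once and checks several addresses that break those rules.

```java
import java.util.regex.Pattern;

/** Hypothetical demo exercising the strict regex against rule-breaking input. */
final class StrictRegexDemo {
    // Compiled once and reused; Pattern objects are immutable and thread-safe.
    static final Pattern STRICT = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
        + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    static boolean isValid(String email) {
        return STRICT.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "abc123@xyz.com",   // accepted
            ".abc@xyz.com",     // rejected: leading dot in the local part
            "abc.@xyz.com",     // rejected: trailing dot in the local part
            "abc..def@xyz.com", // rejected: consecutive dots
            "abc@-xyz.com",     // rejected: domain starts with a hyphen
            "abc@xyz.c"         // rejected: top-level domain shorter than two letters
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + isValid(input));
        }
    }
}
```

Testing this way also exposes two quirks of the pattern. Because [^-] matches any single character except a hyphen, abc@.xyz.com is accepted. And because the domain must start with [^-] followed by at least one more character, a one-letter domain label such as abc@x.com is rejected.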

Regular expression for validation of Unicode characters

The regex in the previous section works well for email addresses written with Latin (English) characters, but it rejects addresses that contain non-Latin characters, such as those written in Chinese or other scripts. The following regular expression accepts Unicode letters as well as English characters:

^(?=.{1,64}@)[\p{L}0-9_-]+(\.[\p{L}0-9_-]+)*@[^-][\p{L}0-9-]+(\.[\p{L}0-9-]+)*(\.[\p{L}]{2,})$

This regex validates Unicode (non-Latin) email addresses, so that addresses written in other languages are accepted. It is identical to the previous regex except that each “A-Za-z” range is replaced with “\p{L}”, which matches any Unicode letter.

See this test code below:

@Test
public void unicodeRegexTesting() {
    String emailID = "用户名@领域.电脑";
    String regexPattern = "^(?=.{1,64}@)[\\p{L}0-9_-]+(\\.[\\p{L}0-9_-]+)*@"
        + "[^-][\\p{L}0-9-]+(\\.[\\p{L}0-9-]+)*(\\.[\\p{L}]{2,})$";
    assertTrue(EmailValidation.patternMatches(emailID, regexPattern));
}
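
To see the difference between the two patterns, the following sketch (UnicodeRegexDemo is a hypothetical name) runs the same inputs through the Latin-only strict regex and the \p{L} variant. The non-ASCII addresses are written with Java Unicode escapes so that the source file stays ASCII-only: jos\u00e9 spells josé, and the last entry is the Chinese address from the test above.

```java
import java.util.regex.Pattern;

/** Hypothetical demo comparing the Latin-only strict regex with its \p{L} variant. */
final class UnicodeRegexDemo {
    static final Pattern LATIN = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
        + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    // Same structure, with every A-Za-z range replaced by \p{L} (any Unicode letter).
    static final Pattern UNICODE = Pattern.compile(
        "^(?=.{1,64}@)[\\p{L}0-9_-]+(\\.[\\p{L}0-9_-]+)*@"
        + "[^-][\\p{L}0-9-]+(\\.[\\p{L}0-9-]+)*(\\.[\\p{L}]{2,})$");

    public static void main(String[] args) {
        String[] inputs = {
            "abc123@xyz.com",                               // both accept
            "jos\u00e9@exemplo.com",                        // only UNICODE accepts
            "\u7528\u6237\u540d@\u9886\u57df.\u7535\u8111"  // only UNICODE accepts
        };
        for (String input : inputs) {
            System.out.println(input
                + " latin=" + LATIN.matcher(input).matches()
                + " unicode=" + UNICODE.matcher(input).matches());
        }
    }
}
```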

Regular expression based on RFC 5322 for email validation

Instead of writing a custom regex, you can also base your validation on the RFC 5322 standard, which defines the syntax of email addresses.

RFC 5322 does not itself provide a regular expression, but the following regex is built from the characters that the standard permits in the local part of an address:

^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$

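It is a very simple and permissive regex. Its local part accepts every special character in the character class, including the pipe character (|) and the single quote ('), both of which RFC 5322 allows. Because these two characters can be abused in SQL injection attacks when the email address and other details are passed from the client side to the server, an address that passes this regex must still be handled safely on the server, for example, through parameterized queries.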

You can validate an email address with this regex as follows:

@Test
public void testingRFC5322Regex() {
    String emailID = "abc123@xyz.com";
    String regexPattern = "^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$";
    assertTrue(EmailValidation.patternMatches(emailID, regexPattern));
}
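
A quick check shows what this regex actually accepts. In the following sketch (Rfc5322RegexDemo is a hypothetical name), note that | and ' both appear inside the character class, so they are accepted, and that dots are allowed anywhere in either part.

```java
import java.util.regex.Pattern;

/** Hypothetical demo showing how permissive the RFC 5322-based regex is. */
final class Rfc5322RegexDemo {
    static final Pattern RFC_5322 =
        Pattern.compile("^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$");

    static boolean isValid(String email) {
        return RFC_5322.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "o'reilly@xyz.com",   // accepted: single quote
            "abc|def@xyz.com",    // accepted: pipe
            ".abc..def@xyz..com", // accepted: leading and consecutive dots
            "abc def@xyz.com"     // rejected: space
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + isValid(input));
        }
    }
}
```

Because of this permissiveness, the regex works best as a character-set check combined with other rules, rather than as a complete validator.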

Regular expression for characters in the top-level domain

The previous expressions verify the local and domain parts of an email address, but we can also write a regex that checks the top-level domain. The following regular expression validates the top-level domain of the email address:

^[\w!#$%&'*+/=?`{|}~^-]+(?:\.[\w!#$%&'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}$

This regex allows dots in the local part and multiple labels in the domain (for example, mail.xyz.com), as long as no two dots are adjacent, and it requires the top-level domain to consist of two to six letters.

The following test verifies an email address using this regex:

@Test
public void topLevelDomainTest() {
    String emailID = "abc123@xyz.com";
    String regexPattern = "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*"
        + "@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$";
    assertTrue(EmailValidation.patternMatches(emailID, regexPattern));
}
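
The following sketch (TldRegexDemo is a hypothetical name) shows the effect of the {2,6} quantifier on the top-level domain and confirms that multi-label domains are accepted.

```java
import java.util.regex.Pattern;

/** Hypothetical demo of the top-level-domain regex. */
final class TldRegexDemo {
    static final Pattern TLD = Pattern.compile(
        "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*"
        + "@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$");

    static boolean isValid(String email) {
        return TLD.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "abc.def@mail.xyz.com", // accepted: several dots, three-letter TLD
            "abc@xyz.museum",       // accepted: six-letter TLD
            "abc@xyz.c",            // rejected: TLD shorter than two letters
            "abc@xyz.company",      // rejected: TLD longer than six letters
            "abc@localhost"         // rejected: no dot in the domain
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + isValid(input));
        }
    }
}
```

Keep the upper bound in mind: top-level domains longer than six letters, such as .technology, do exist today and are rejected by this pattern.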

Regular expression for restricting consecutive, leading and trailing dots in an email address

The following regex restricts where dots can appear in an email address:

^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\.[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$

An email address can contain more than one dot, but dots cannot appear consecutively or at the beginning or end of either the local part or the domain part. This regex is designed specifically to reject consecutive, leading, and trailing dots.

Take a look at this code:

@Test
public void dotsRestriction() {
    String emailID = "abc123@xyz.com";
    String regexPattern = "^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\\.[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@"
        + "[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$";
    assertTrue(EmailValidation.patternMatches(emailID, regexPattern));
}
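
The following sketch (DotsRegexDemo is a hypothetical name) runs the dot rules against a few misplaced-dot inputs in both parts of the address.

```java
import java.util.regex.Pattern;

/** Hypothetical demo of the dot-restricting regex. */
final class DotsRegexDemo {
    static final Pattern DOTS = Pattern.compile(
        "^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\\.[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@"
        + "[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$");

    static boolean isValid(String email) {
        return DOTS.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "abc.def@mail.xyz.com", // accepted: single dots between characters
            ".abc@xyz.com",         // rejected: leading dot
            "abc.@xyz.com",         // rejected: trailing dot in the local part
            "abc..def@xyz.com",     // rejected: consecutive dots in the local part
            "abc@xyz..com",         // rejected: consecutive dots in the domain
            "abc@xyz.com."          // rejected: trailing dot in the domain
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + isValid(input));
        }
    }
}
```

Because the domain part ends with (?:\.[a-zA-Z0-9-]+)*, which may repeat zero times, a domain without any dot, such as abc@xyz, is still accepted.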

OWASP validation regular expression

The following regular expression comes from the OWASP Validation Regex Repository and can be used to validate email addresses in Java:

^[a-zA-Z0-9_+&*-]+(?:\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,7}$

This regex covers most of the validations for the standard structure of an email address. It is a good option if you do not want to write your own regex and have no special validation requirements.

The following test uses the OWASP regex:

@Test
public void owaspValidationTesting() {
    String emailID = "abc123@xyz.com";
    String regexPattern = "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$";
    assertTrue(EmailValidation.patternMatches(emailID, regexPattern));
}
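
For comparison with the earlier patterns, the following sketch (OwaspRegexDemo is a hypothetical name) shows that the OWASP regex accepts a + in the local part, allows top-level domains of up to seven letters, and rejects characters outside its small local-part class, such as the single quote.

```java
import java.util.regex.Pattern;

/** Hypothetical demo of the OWASP email regex. */
final class OwaspRegexDemo {
    static final Pattern OWASP = Pattern.compile(
        "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$");

    static boolean isValid(String email) {
        return OWASP.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "abc+tag@xyz.com",  // accepted: + is in the local-part class
            "abc@xyz.company",  // accepted: seven-letter TLD
            "abc@xyz.software", // rejected: TLD longer than seven letters
            "o'reilly@xyz.com", // rejected: ' is not in the local-part class
            "abc@xyz"           // rejected: no dot in the domain
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + isValid(input));
        }
    }
}
```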

Gmail special case for emails

This special case applies to Gmail addresses. Gmail allows a plus sign (+) in the local part of an address: everything between the + and the @ is ignored when mail is delivered. For example, abc+123@gmail.com and abc@gmail.com are delivered to the same mailbox.

To accept such addresses in Java, the strict regex can be extended by adding the + character to its character classes:

^(?=.{1,64}@)[A-Za-z0-9+_-]+(\.[A-Za-z0-9+_-]+)*@[^-][A-Za-z0-9+-]+(\.[A-Za-z0-9+-]+)*(\.[A-Za-z]{2,})$

Here is the code:

@Test
public void gmailSpecialCaseTesting() {
    String emailID = "abc+123@gmail.com";
    String regexPattern = "^(?=.{1,64}@)[A-Za-z0-9\\+_-]+(\\.[A-Za-z0-9\\+_-]+)*@"
        + "[^-][A-Za-z0-9\\+-]+(\\.[A-Za-z0-9\\+-]+)*(\\.[A-Za-z]{2,})$";
    assertTrue(EmailValidation.patternMatches(emailID, regexPattern));
}
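
To illustrate why abc+123@gmail.com and abc@gmail.com reach the same mailbox, the following sketch strips the +tag from a Gmail address. GmailAddresses.stripPlusTag is a hypothetical helper, not part of any library. It assumes that only the gmail.com domain should be treated this way, and it performs no validation itself.

```java
import java.util.Locale;

/** Hypothetical helper illustrating Gmail plus addressing; not a library API. */
final class GmailAddresses {
    /**
     * Removes a "+tag" from the local part of a gmail.com address, so that
     * abc+123@gmail.com and abc@gmail.com map to the same string.
     * Other domains, and inputs without '@', are returned unchanged.
     */
    static String stripPlusTag(String email) {
        int at = email.lastIndexOf('@');
        if (at < 0) {
            return email;
        }
        String local = email.substring(0, at);
        String domain = email.substring(at + 1);
        if (!domain.toLowerCase(Locale.ROOT).equals("gmail.com")) {
            return email;
        }
        int plus = local.indexOf('+');
        // plus <= 0: no tag, or a '+' at the very start (stripping would leave an empty local part)
        return plus <= 0 ? email : local.substring(0, plus) + "@" + domain;
    }

    public static void main(String[] args) {
        System.out.println(stripPlusTag("abc+123@gmail.com")); // abc@gmail.com
        System.out.println(stripPlusTag("abc+123@xyz.com"));   // abc+123@xyz.com (other domain)
    }
}
```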

Apache Commons Validator for email validation in Java

Apache Commons Validator is a validation library that provides routines for the standard validation rules. Instead of writing your own regex, you can add this library to your project and use its email validation.

The library combines custom code with regular expressions to validate an email address based on the RFC 822 standard. It handles special characters and also supports the Unicode characters discussed earlier. The class to use is EmailValidator from the org.apache.commons.validator.routines package.

The following snippet shows how to add the commons-validator dependency to your Maven project:

<dependency>
    <groupId>commons-validator</groupId>
    <artifactId>commons-validator</artifactId>
    <version>${validator.version}</version>
</dependency>

You can then validate email addresses with the following test:

@Test
public void emailValidatorTesting() {
    // EmailValidator is org.apache.commons.validator.routines.EmailValidator
    String emailID = "abc123@xyz.com";
    assertTrue(EmailValidator.getInstance()
      .isValid(emailID));
}

Which regex should you be using?

In this article, we have looked at a variety of methods for email validation in Java using regex. Choose the regex based on how strict your validation needs to be.

For instance, if you only need to check for the presence of an @ symbol in an email address, the simple regex is enough. For more thorough validation, opt for a stricter regex, such as the strict regex or the OWASP regex, or use Apache Commons Validator. Keep in mind that no regex can confirm that an address actually exists.

Author

Full Stack Java Developer | Writer | Recruiter, bridging the gap between exceptional talent and opportunities, for some of the biggest Fortune 500 companies.