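The Java Regex API (package java.util.regex) contains the following elements: the Pattern class, the Matcher class, and PatternSyntaxException, an unchecked exception (a subclass of RuntimeException).
Pattern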
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        String text = "The Lord of the Rings is an epic[1] high-fantasy novel " +
                "written by English author and scholar J. R. R. Tolkien. " +
                "The story began as a sequel to Tolkien's 1937 fantasy " +
                "novel The Hobbit, but eventually developed into a much larger work. " +
                "Written in stages between 1937 and 1949, The Lord of the Rings " +
                "is one of the best-selling novels ever written, with over 150 " +
                "million copies sold.[2]";
        // Regex that matches 4-digit numbers
        String regex = "\\d{4}";
        // Creates a pattern by compiling the regex string
        Pattern pattern = Pattern.compile(regex);
        // Creates a matcher from the pattern for the text string
        Matcher matcher = pattern.matcher(text);
        // Checks if the text contains an occurrence of the pattern
        System.out.println("Found year? " + matcher.find());
    }
}
Found year? true
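import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Styles {
    public static void main(String[] args) {
        // 1st way: compile the pattern, create a matcher, then match
        Pattern pattern = Pattern.compile("s.*");
        Matcher matcher = pattern.matcher("as");
        boolean hasMatched1 = matcher.matches();
        // 2nd way: the same steps chained in a single expression
        boolean hasMatched2 = Pattern.compile("as")
                .matcher("as")
                .matches();
        // 3rd way: static convenience method (compiles the regex on every call)
        boolean hasMatched3 = Pattern.matches("a.*", "as");
        System.out.println(hasMatched1 + " " + hasMatched2 + " " + hasMatched3);
    }
}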
false true true
Matcher
The Matcher class provides three types of methods: study methods, index methods, and replacement methods.
Matcher: Study methods
Study methods review the input string and return a boolean indicating whether the pattern was found.
On success, more information can be obtained via the start, end, and group methods.
public boolean matches() attempts to match the entire region against the pattern.
public boolean lookingAt() attempts to match the input sequence against the pattern, starting at the beginning of the region.
public boolean find() attempts to find the next subsequence of the input sequence that matches the pattern.
public boolean find(int start) resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
matches()
Checks if the entire string matches the pattern. Its behavior is similar to String.matches(), but when it succeeds, we can obtain more information on the match.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Matches {
    public static void main(String[] args) {
        String[] fellowship = {"Frodo", "Gandalf", "Sam", "Aragorn",
                "Legolas", "Gimli", "Pippin", "Merry", "Boromir"};
        // Name starts with uppercase G
        String regex = "G\\w*";
        Pattern pattern = Pattern.compile(regex);
        // Using Matcher.matches() with a pattern compiled once
        for (String member : fellowship) {
            Matcher matcher = pattern.matcher(member);
            boolean matched = matcher.matches();
            System.out.printf("Does '%s' match the regex? %B%n", member, matched);
        }
        System.out.println();
        // Using String.matches(), which gives the same boolean result
        for (String member : fellowship) {
            boolean matched = member.matches(regex);
            System.out.printf("Does '%s' match the regex? %B%n", member, matched);
        }
    }
}
Does 'Frodo' match the regex? FALSE
Does 'Gandalf' match the regex? TRUE
Does 'Sam' match the regex? FALSE
Does 'Aragorn' match the regex? FALSE
Does 'Legolas' match the regex? FALSE
Does 'Gimli' match the regex? TRUE
Does 'Pippin' match the regex? FALSE
Does 'Merry' match the regex? FALSE
Does 'Boromir' match the regex? FALSE
Does 'Frodo' match the regex? FALSE
Does 'Gandalf' match the regex? TRUE
Does 'Sam' match the regex? FALSE
Does 'Aragorn' match the regex? FALSE
Does 'Legolas' match the regex? FALSE
Does 'Gimli' match the regex? TRUE
Does 'Pippin' match the regex? FALSE
Does 'Merry' match the regex? FALSE
Does 'Boromir' match the regex? FALSE
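To illustrate the extra information that Matcher.matches() makes available compared with String.matches(), here is a minimal sketch. The date pattern, the class name MatchesWithGroups, and the helper yearOf are hypothetical names chosen for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesWithGroups {
    // Hypothetical pattern for this sketch: a date written as yyyy-MM-dd
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Returns the year captured by group 1, or -1 if the whole input is not a date
    static int yearOf(String input) {
        Matcher matcher = DATE.matcher(input);
        if (matcher.matches()) {
            // After a successful match, the capturing groups can be read
            return Integer.parseInt(matcher.group(1));
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(yearOf("1954-07-29"));    // prints 1954
        System.out.println(yearOf("on 1954-07-29")); // prints -1: matches() requires the whole string to match
    }
}
```

String.matches() would only report true or false for the same inputs; reading the year requires a Matcher.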
Write a program that checks if the elements of an array are international phone numbers.
Assume the following rules.
Here is a valid example: +39 0474013600
Here is an invalid example: +39 0474 013600 (Space characters are not allowed within the national number group.)
lookingAt()
Checks if the string starts with a subsequence that matches against the pattern.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookingAt {
    public static void main(String[] args) {
        String text = "+39 0471011000 (unibz)";
        // Checks if the text starts with an international phone number:
        String regex = "\\+(\\d{1,3})\\s(\\d{6,14})";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        System.out.printf("The text \"%s\" starts with a phone number?%n%B", text, matcher.lookingAt());
    }
}
The text "+39 0471011000 (unibz)" starts with a phone number?
TRUE
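The regex in the LookingAt example already contains two capturing groups. As a small sketch of how they could be used after a successful lookingAt(), the following reads both groups and the end offset of the match. The class name LookingAtGroups and the helper describe are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookingAtGroups {
    private static final Pattern PHONE = Pattern.compile("\\+(\\d{1,3})\\s(\\d{6,14})");

    // Describes the phone number found at the start of the text, if any
    static String describe(String text) {
        Matcher matcher = PHONE.matcher(text);
        if (!matcher.lookingAt()) {
            return "no phone number at start";
        }
        // group(1) is the country code, group(2) the national number,
        // end() is the offset right after the matched prefix
        return "country=" + matcher.group(1)
                + ", national=" + matcher.group(2)
                + ", end=" + matcher.end();
    }

    public static void main(String[] args) {
        System.out.println(describe("+39 0471011000 (unibz)")); // country=39, national=0471011000, end=14
        System.out.println(describe("Tel +39 0471011000"));     // no phone number at start
    }
}
```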
find()
Checks if the string contains a subsequence that matches against the pattern.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Find {
    public static void main(String[] args) {
        String text = "Free University of Bozen-Bolzano\n" +
                "Universitätsplatz 1 - piazza Università, 1\n" +
                "Italy - 39100, Bozen-Bolzano\n" +
                "Tel +39 0471011000";
        String regex = "\\+\\d{1,3}\\s\\d{6,14}";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        // Checks if the text contains an international phone number:
        String found = matcher.find() ? "Yes" : "No";
        System.out.println(text);
        System.out.println("\nDoes the text above contain a phone number? " + found);
    }
}
Free University of Bozen-Bolzano
Universitätsplatz 1 - piazza Università, 1
Italy - 39100, Bozen-Bolzano
Tel +39 0471011000
Does the text above contain a phone number? Yes
We can call find() repeatedly to retrieve all matches in an input string.
After a successful find(), we can invoke group() on the matcher to retrieve the matched string segment.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAll {
    public static void main(String[] args) {
        String text = "The Lord of the Rings is an epic[1] high-fantasy novel " +
                "written by English author and scholar J. R. R. Tolkien. " +
                "The story began as a sequel to Tolkien's 1937 fantasy " +
                "novel The Hobbit, but eventually developed into a much larger work. " +
                "Written in stages between 1937 and 1949, The Lord of the Rings " +
                "is one of the best-selling novels ever written, with over 150 " +
                "million copies sold.[2]";
        // Regex that matches 4-digit numbers
        String regex = "\\d{4}";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        // While a new occurrence is found
        while (matcher.find()) {
            // Retrieve and print the found substring
            System.out.println(matcher.group());
        }
    }
}
1937
1937
1949
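The find(int start) variant listed among the study methods is not exercised by the examples above. A minimal sketch, assuming a hypothetical helper firstYearFrom, shows how it resets the matcher and searches from a given index:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindFrom {
    private static final Pattern YEAR = Pattern.compile("\\d{4}");

    // Returns the first 4-digit number found at or after index 'from', or null if there is none.
    // Note: find(int) throws IndexOutOfBoundsException if 'from' is negative or beyond the text length.
    static String firstYearFrom(String text, int from) {
        Matcher matcher = YEAR.matcher(text);
        return matcher.find(from) ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "Written in stages between 1937 and 1949";
        System.out.println(firstYearFrom(text, 0));  // prints 1937
        System.out.println(firstYearFrom(text, 30)); // prints 1949
    }
}
```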
find() vs matches() vs lookingAt()
Differences:
matches() succeeds only if the complete string matches the pattern.
lookingAt() succeeds if the string starts with a match; the match must begin at the start of the string but does not have to cover all of it.
find() looks for the next matching substring anywhere in the string. It is often called multiple times to retrieve all string segments that match the provided regex.
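The differences can be checked side by side with a short sketch that runs all three methods on the same inputs. The class name ThreeWays and the helper compare are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

public class ThreeWays {
    // Returns the results of matches(), lookingAt() and find(), in that order.
    // A fresh matcher is created for each call so the calls do not influence each other.
    static String compare(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        boolean matches = pattern.matcher(input).matches();
        boolean lookingAt = pattern.matcher(input).lookingAt();
        boolean find = pattern.matcher(input).find();
        return matches + " " + lookingAt + " " + find;
    }

    public static void main(String[] args) {
        String regex = "\\d{4}";
        System.out.println(compare(regex, "1937"));         // true true true
        System.out.println(compare(regex, "1937 fantasy")); // false true true
        System.out.println(compare(regex, "in 1937"));      // false false true
        System.out.println(compare(regex, "Hobbit"));       // false false false
    }
}
```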
An alternative way to retrieve all matches in an input string is to call results() (available since Java 9), which returns a Stream of matches (MatchResult objects).
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchStream {
    public static void main(String[] args) {
        String text = "The Lord of the Rings is an epic[1] high-fantasy novel " +
                "written by English author and scholar J. R. R. Tolkien. " +
                "The story began as a sequel to Tolkien's 1937 fantasy " +
                "novel The Hobbit, but eventually developed into a much larger work. " +
                "Written in stages between 1937 and 1949, The Lord of the Rings " +
                "is one of the best-selling novels ever written, with over 150 " +
                "million copies sold.[2]";
        // Regex that matches words of exactly five word characters
        String regex = "\\b(\\w{5})\\b";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        // Prints every match in the stream
        matcher.results()
                .forEach(match -> System.out.println(match.group()));
    }
}
Rings
novel
story
began
novel
Rings
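Because results() returns a Stream, the usual stream operations apply. The sketch below collects the distinct 4-digit numbers of a text into a list; the class name ResultsToList and the helper years are assumptions made for this example, and it requires Java 9 or later.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ResultsToList {
    // Collects the distinct 4-digit numbers of the text, in order of first appearance
    static List<String> years(String text) {
        return Pattern.compile("\\d{4}")
                .matcher(text)
                .results()               // Stream<MatchResult>
                .map(MatchResult::group) // matched substring of each result
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(years("began in 1937; written between 1937 and 1949")); // [1937, 1949]
    }
}
```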
Matcher: Index methods
Index methods return index values that show precisely where the last match was found in the input string.
public int start() returns the start index of the previous match.
public int start(int group) returns the start index of the subsequence captured by the given group during the previous match operation.
public int end() returns the offset after the last character matched.
public int end(int group) returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
start() and end()
Finding the indexes of every match of the word "Rings" in the text:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexMethods {
    public static void main(String[] args) {
        String text = "The Lord of the Rings is an epic[1] high-fantasy novel " +
                "written by English author and scholar J. R. R. Tolkien. " +
                "The story began as a sequel to Tolkien's 1937 fantasy " +
                "novel The Hobbit, but eventually developed into a much larger work. " +
                "Written in stages between 1937 and 1949, The Lord of the Rings " +
                "is one of the best-selling novels ever written, with over 150 " +
                "million copies sold.[2]";
        // Matches the whole word "Rings"
        String regex = "\\bRings\\b";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
            // start() is inclusive, end() is exclusive
            int startIndex = matcher.start();
            int endIndex = matcher.end();
            System.out.printf("%d: start=%d, end=%d%n", count, startIndex, endIndex);
        }
    }
}
1: start=16, end=21
2: start=290, end=295
Finding the indexes of specific matched groups:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexMethodsGroups {
    public static void main(String[] args) {
        String text = "The Lord of the Rings is an epic[1] high-fantasy novel " +
                "written by English author and scholar J. R. R. Tolkien. " +
                "The story began as a sequel to Tolkien's 1937 fantasy " +
                "novel The Hobbit, but eventually developed into a much larger work. " +
                "Written in stages between 1937 and 1949, The Lord of the Rings " +
                "is one of the best-selling novels ever written, with over 150 " +
                "million copies sold.[2]";
        String regex = "(Lord) (of) (the) (Rings)";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            // Group 0 is the whole match; groups 1..groupCount() are the capturing groups
            for (int group = 0; group <= matcher.groupCount(); group++) {
                System.out.printf("Group %d: string=\"%s\", start=%d, end=%d%n",
                        group, matcher.group(group), matcher.start(group), matcher.end(group));
            }
            // Blank line between matches
            System.out.println();
        }
    }
}
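Group 0: string="Lord of the Rings", start=4, end=21
Group 1: string="Lord", start=4, end=8
Group 2: string="of", start=9, end=11
Group 3: string="the", start=12, end=15
Group 4: string="Rings", start=16, end=21

Group 0: string="Lord of the Rings", start=278, end=295
Group 1: string="Lord", start=278, end=282
Group 2: string="of", start=283, end=285
Group 3: string="the", start=286, end=289
Group 4: string="Rings", start=290, end=295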
Consider the following text.
String text = "Never gonna give you up\n" +
        "Never gonna let you down\n" +
        "Never gonna run around and desert you\n" +
        "Never gonna make you cry\n" +
        "Never gonna say goodbye\n" +
        "Never gonna tell a lie and hurt you";
Print out the start and end indexes of any sequence of three words that start with "Never".
Matcher: Replacement methods
Replacement methods replace text in an input string.
public String replaceFirst(String replacement) replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
public String replaceAll(String replacement) replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
public Matcher appendReplacement(StringBuffer sb, String replacement) implements a non-terminal append-and-replace step.
public StringBuffer appendTail(StringBuffer sb) implements a terminal append-and-replace step.
replaceFirst() and replaceAll()
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceFirstAll {
    public static void main(String[] args) {
        String text = "\"Someone else always has to carry on the story.\"\n" +
                "― J.R.R. Tolkien, The Lord of the Rings";
        System.out.println(text + "\n");
        // Matches the string "J.R.R. Tolkien"
        Pattern pattern = Pattern.compile("J\\.R\\.R\\. Tolkien");
        Matcher matcher = pattern.matcher(text);
        // Returns a copy of the string replacing the first occurrence of the pattern
        String modifiedText = matcher.replaceFirst("J.K. Rowling");
        System.out.println(modifiedText + "\n");
        // Matches the letter "e"
        pattern = Pattern.compile("e");
        matcher = pattern.matcher(modifiedText);
        // Returns a copy of the string replacing all occurrences of the pattern
        modifiedText = matcher.replaceAll("x");
        System.out.println(modifiedText + "\n");
    }
}
"Someone else always has to carry on the story."
― J.R.R. Tolkien, The Lord of the Rings
"Someone else always has to carry on the story."
― J.K. Rowling, The Lord of the Rings
"Somxonx xlsx always has to carry on thx story."
― J.K. Rowling, Thx Lord of thx Rings
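One detail worth keeping in mind is that the replacement string is not always taken literally: a dollar sign followed by a number refers to a capturing group, and Matcher.quoteReplacement() can be used when the replacement should be inserted as-is. The following is a minimal sketch; the class name ReplacementReferences and the helpers swap and fill are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementReferences {
    // Rewrites "first last" as "last, first" by referring to groups 1 and 2 in the replacement
    static String swap(String name) {
        return Pattern.compile("(\\w+) (\\w+)")
                .matcher(name)
                .replaceAll("$2, $1");
    }

    // Replaces the placeholder PRICE with a literal value that may contain '$' or '\'
    static String fill(String template, String literal) {
        return Pattern.compile("PRICE")
                .matcher(template)
                .replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(swap("Frodo Baggins"));     // Baggins, Frodo
        System.out.println(fill("Cost: PRICE", "$5")); // Cost: $5
    }
}
```

Without quoteReplacement(), the replacement "$5" would be read as a reference to group 5, which does not exist in that pattern, so the call would fail with an exception.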
replaceFirst() and replaceAll() with lambda expressions
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceFirstAllLambdas {
    public static void main(String[] args) {
        String text = "\"Someone else always has to carry on the story.\"\n" +
                "― J.R.R. Tolkien, The Lord of the Rings";
        System.out.println(text + "\n");
        // Matches the string "J.R.R. Tolkien"
        Pattern pattern = Pattern.compile("J\\.R\\.R\\. Tolkien");
        Matcher matcher = pattern.matcher(text);
        // Returns a copy of the string replacing the first occurrence of the pattern
        // with the result of the function (this overload is available since Java 9)
        String modifiedText = matcher.replaceFirst(
                match -> match.group().toUpperCase() + " and J.K. Rowling");
        System.out.println(modifiedText + "\n");
        // Matches the letter "e" (note: in the original text, not in modifiedText)
        pattern = Pattern.compile("e");
        matcher = pattern.matcher(text);
        // Returns a copy of the string replacing all occurrences of the pattern
        modifiedText = matcher.replaceAll(match -> match.group() + match.group());
        System.out.println(modifiedText + "\n");
    }
}
"Someone else always has to carry on the story."
― J.R.R. Tolkien, The Lord of the Rings
"Someone else always has to carry on the story."
― J.R.R. TOLKIEN and J.K. Rowling, The Lord of the Rings
"Someeonee eelsee always has to carry on thee story."
― J.R.R. Tolkieen, Thee Lord of thee Rings
appendReplacement() and appendTail()
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAppendMethods {
    public static void main(String[] args) {
        String text = "\"Someone else always has to carry on the story.\"\n" +
                "― J.R.R. Tolkien, The Lord of the Rings";
        System.out.println(text + "\n");
        Pattern pattern = Pattern.compile("a");
        Matcher matcher = pattern.matcher(text);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            // Appends the text between the previous match and this one, then the replacement
            matcher.appendReplacement(buffer, "ä");
            System.out.printf("start=%d\tgroup=%s\t\t%s%n", matcher.start(), matcher.group(), buffer);
        }
        // Appends the rest of the text after the last match
        matcher.appendTail(buffer);
        System.out.println("\n" + buffer);
    }
}
"Someone else always has to carry on the story."
― J.R.R. Tolkien, The Lord of the Rings
start=14 group=a "Someone else ä
start=17 group=a "Someone else älwä
start=22 group=a "Someone else älwäys hä
start=29 group=a "Someone else älwäys häs to cä
"Someone else älwäys häs to cärry on the story."
― J.R.R. Tolkien, The Lord of the Rings
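The append-and-replace loop is mostly useful when the replacement has to be computed from each match. As a sketch under that assumption, the hypothetical helper doubleNumbers below doubles every number in a text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubleNumbers {
    // Replaces every run of digits with twice its numeric value
    static String doubleNumbers(String text) {
        Matcher matcher = Pattern.compile("\\d+").matcher(text);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            int doubled = Integer.parseInt(matcher.group()) * 2;
            // Non-terminal step: copies the text before the match, then the computed replacement
            matcher.appendReplacement(buffer, Integer.toString(doubled));
        }
        // Terminal step: copies whatever follows the last match
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("with over 150 million copies sold")); // with over 300 million copies sold
    }
}
```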
PatternSyntaxException
A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern.
The PatternSyntaxException class provides the following methods to help you determine what went wrong.
String getDescription() retrieves the description of the error.
int getIndex() retrieves the error index.
String getPattern() retrieves the erroneous regular expression pattern.
String getMessage() returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ExceptionDemo {
    public static void main(String[] args) {
        String text = "The Lord of the Rings is an epic[1] high-fantasy novel " +
                "written by English author and scholar J. R. R. Tolkien.";
        try {
            // Invalid regex: the character class is never closed
            Pattern pattern = Pattern.compile("[][aeiou]");
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        } catch (PatternSyntaxException patternException) {
            System.out.println(patternException.getPattern() + " is invalid!");
            patternException.printStackTrace();
        }
    }
}
[][aeiou] is invalid!
java.util.regex.PatternSyntaxException: Unclosed character class near index 8
[][aeiou]
        ^
    at java.base/java.util.regex.Pattern.error(Pattern.java:2027)
    at java.base/java.util.regex.Pattern.clazz(Pattern.java:2696)
    at java.base/java.util.regex.Pattern.sequence(Pattern.java:2138)
    at java.base/java.util.regex.Pattern.expr(Pattern.java:2068)
    at java.base/java.util.regex.Pattern.compile(Pattern.java:1782)
    at java.base/java.util.regex.Pattern.<init>(Pattern.java:1429)
    at java.base/java.util.regex.Pattern.compile(Pattern.java:1069)
    at main.java.regex_api.ExceptionDemo.main(ExceptionDemo.java:14)
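Instead of printing the stack trace, the individual accessor methods can be combined into a short diagnostic. This is a minimal sketch; the class name SyntaxCheck and the helper check are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxCheck {
    // Returns "valid" if the regex compiles, otherwise a one-line summary of the syntax error
    static String check(String regex) {
        try {
            Pattern.compile(regex);
            return "valid";
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " at index " + e.getIndex() + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("\\d{4}"));
        System.out.println(check("[][aeiou]"));
    }
}
```

For the second call, the summary should carry the same description and index that appear in the first line of the stack trace above.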
Write a method that...
"word" but not "words" or "aword".
"a dog" should become "A dog".
"My name is John Doe. I'm 33 years old." should return [M,J,D]
"My name is John" should return "My name is John"
"My my name name is John John" should return "My name is John"
172.16.254.1, 127.0.0.1
"<h1>Hello!</h1>" should become "Hello!"
Part of the material has been taken from the following sources. The usage of the referenced copyrighted work is in line with fair use since it is for nonprofit educational purposes.