
Scott Introduction to Regular Expressions in JavaScript

If you’re a programmer who has ever had to develop an application that deals with text (which is most of them), whether it comes from files, databases, or user input, you might be interested to know that regular expressions can make your job a lot easier.

Unlike most dry regular expressions tutorials out there (Mike’s smart post aside), I intend to provide more than just the “what”; I’ll walk you through the “how” and “why” too. After all, why would you care to learn regular expressions if you don’t get to find out about some of the cool and powerful things you can do with them?

As a heads-up, the examples I’ll be using will include general regex (regular expressions, see also regexp) material worked into some JavaScript code as that’s one of my specialties. Most of the information will be common to regular expressions regardless of language. Regular expression support is built into many other modern languages, including fantastic support in Perl (that is where I picked it up in the first place) and Ruby, among others.

Regular Expressions as Pattern Matching

At the basic level, a regex describes a pattern you want to match against a string. The pattern is compared to the string, and the operation returns a result indicating whether the pattern found a match in the string (most languages also return additional information, such as the specific match or the position of the match within the string).

var text = 'JavaScript';
var pattern = /Java/;
alert(text.match(pattern)); //-> Java

This shows a simple regex used to determine whether the string ‘JavaScript’ contains the pattern /Java/. Note that the slashes around the pattern are not matched; they simply denote the beginning and end of the pattern.
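
To make the "additional information" concrete, here is a minimal sketch of what JavaScript hands back from a match. The variable names (found, matched, position, missing) are my own for illustration, and I use console.log rather than alert so the snippet also runs outside a browser.

```javascript
var text = 'JavaScript';
var pattern = /Java/;

// test() answers the yes/no question: does the pattern occur anywhere?
var found = pattern.test(text);        // true

// match() without the g option returns an array whose first element
// is the matched text and whose index property is the position of the
// match within the string.
var result = text.match(pattern);
var matched = result[0];               // 'Java'
var position = result.index;           // 0

// When nothing matches, match() returns null rather than an empty array.
var missing = text.match(/Python/);    // null

console.log(found, matched, position, missing);
```

The null result is worth remembering: check for it before reading result[0] whenever the pattern might not match.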

That’s not bad, but kind of boring. Matching alone can be good for validating input and quickly comparing strings, but things get more interesting as soon as we start matching and replacing.

var text = 'Ex-Nine Inch Nails drummer Chris Vrenna has been making music under the moniker Tweaker';
var pattern = /Nine Inch Nails/;
var replacement = 'NIN';
alert(text.replace(pattern, replacement)); //-> Ex-NIN drummer Chris Vrenna has been making music under the moniker Tweaker

So far so good. However, pattern matching can be made more forgiving by adding an option (i) to make the pattern case-insensitive. Just place the option character after the ending slash of the regex.

var text = 'Saturday Oct. 6th';
var pattern = /saturday/i;
var replacement = 'Satyr\'s Day';
alert(text.replace(pattern, replacement)); //-> Satyr's Day Oct. 6th

Notice the lower-case s in the pattern matched the text string’s capital S. This is completely separate from the capitalization of the replacement string, which is inserted exactly as written. Note also the escaped apostrophe in the replacement string. The backslash is the common escape character: it forces the following character to be read as a normal character, without any special meaning. This is common to most programming languages, and it is something we’ll return to later with respect to regular expressions.
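
As a quick check of both points, here is a small sketch comparing the pattern with and without the i option. The variable names and the all-lower-case replacement are my own additions for illustration.

```javascript
var text = 'Saturday Oct. 6th';

// Without the i option the lower-case pattern finds nothing,
// so replace() returns the string unchanged.
var sensitive = text.replace(/saturday/, 'Satyr\'s Day');

// With i the match succeeds, and the replacement is inserted exactly
// as written; its capitalization is not adjusted to fit the match.
var insensitive = text.replace(/saturday/i, 'satyr\'s day');

// The escaped apostrophe inside single quotes produces the same string
// as a plain apostrophe inside double quotes.
var sameString = 'Satyr\'s Day' === "Satyr's Day";

console.log(sensitive);   //-> Saturday Oct. 6th
console.log(insensitive); //-> satyr's day Oct. 6th
console.log(sameString);  //-> true
```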

Instead of a single match and replace, we can extend it to “replace all” by adding a global option (g) to the pattern. This can be used alone or in conjunction with other options.

var text = 'One day everyone wakes up with eyes wide open and mind reeling';
var pattern = /one/gi;
var replacement = '1';
alert(text.replace(pattern, replacement)); //-> 1 day every1 wakes up with eyes wide open and mind reeling
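
To see what each option contributes, here is a sketch running the same replacement with no options, with i alone, and with both g and i. The variable names are mine; the text and pattern come from the example above.

```javascript
var text = 'One day everyone wakes up with eyes wide open and mind reeling';
var replacement = '1';

// No options: case-sensitive, first match only. The capital O in 'One'
// does not match, so the first hit is the 'one' inside 'everyone'.
var plain = text.replace(/one/, replacement);

// i only: case-insensitive, but still just the first match.
var firstOnly = text.replace(/one/i, replacement);

// g and i together: every match, regardless of case.
var all = text.replace(/one/gi, replacement);

console.log(plain);     //-> One day every1 wakes up with eyes wide open and mind reeling
console.log(firstOnly); //-> 1 day everyone wakes up with eyes wide open and mind reeling
console.log(all);       //-> 1 day every1 wakes up with eyes wide open and mind reeling
```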

This kind of literal match and replace is essentially the same as the search and replace feature in most text editors. Useful, but there’s a lot more we can do. Regular expressions of this kind work fine to match literal text strings, but once we start adding special characters we can gain access to a lot more power.

A number of special characters take on different meanings in regular expressions. I won’t cover them all immediately in this post, but those that I do will be introduced one at a time to slowly reveal their power at a pace that is (hopefully) easy to follow.

Position Metacharacters

There are special parts of a string that are not really “characters” but can still be matched with special regex metacharacters. The start of the string is one example. It is matched with ^ and is probably best imagined as an invisible special character present at the beginning of every string, before the first character of the string.

var text = '';
var pattern = /^/;
var replacement = 'http://';
alert(text.replace(pattern, replacement)); //->
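
To watch the anchor at work on a concrete value, here is a minimal sketch. The host name www.example.com is a placeholder I've assumed purely for illustration.

```javascript
// Placeholder value, assumed for illustration only.
var text = 'www.example.com';
var pattern = /^/;
var replacement = 'http://';

// ^ matches the empty position before the first character, so the
// "replacement" is really an insertion at the start of the string.
var withScheme = text.replace(pattern, replacement);

console.log(withScheme); //-> http://www.example.com
```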

This is often combined with additional characters in the pattern to match text chunks that occur only at the beginning of the string.

var text = '';
var pattern = /^http:\/\//;
var replacement = '';
alert(text.replace(pattern, replacement)); //->
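
Here is a sketch of the same pattern against two sample strings, showing that the anchor restricts the match to the very start. Both strings are placeholders I've made up for illustration.

```javascript
var pattern = /^http:\/\//;

// The scheme sits at the start of the string, so it is matched and removed.
var atStart = 'http://www.example.com'.replace(pattern, '');

// Here 'http://' appears later in the string. The ^ anchor prevents a
// match, so replace() returns the string unchanged.
var inMiddle = 'see http://www.example.com'.replace(pattern, '');

console.log(atStart);  //-> www.example.com
console.log(inMiddle); //-> see http://www.example.com
```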

Note that here the pattern includes forward slashes (/). Since the forward slash is a special character (it is the one that terminates the pattern), each one is escaped with a backslash so that it represents itself, without special meaning, within the pattern.

The other common positional metacharacter is $, used to match the end of a string. Like ^, it behaves as if there were a matching special character hidden invisibly at the end of the string. This is not the same as a newline or carriage return/line feed, though they’re easy to confuse.

var text = '';
var pattern = /^/;
var replacement = 'http://';
text = text.replace(pattern, replacement);
pattern = /$/;
replacement = '/';
alert(text.replace(pattern, replacement)); //->

Note here that the result of the first replacement was assigned back to text. The replace() method in JavaScript returns a new string; it does not modify the original in place.
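
Because replace() returns a new string, the two steps can also be chained. The sketch below uses a placeholder host name (www.example.com, my assumption) and confirms that the original variable is left untouched.

```javascript
// Placeholder value, assumed for illustration only.
var original = 'www.example.com';

// Each replace() call returns a new string, so the calls can be chained:
// first insert at the start (^), then insert at the end ($).
var result = original.replace(/^/, 'http://').replace(/$/, '/');

console.log(result);   //-> http://www.example.com/
console.log(original); //-> www.example.com
```

Chaining saves the intermediate assignment, though reassigning to a variable as in the example above is easier to step through when debugging.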

Continue to part 2: Alternation and Grouping.
