Regular Expressions

Regular expressions are a set of codes, available in many programming languages, that are used to match patterns of characters. The DreamBank search engine lets you use the complete set of regular expressions (with a few exceptions) in your searches, so that you can refine your queries to find exactly what you want.

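This document includes an introduction to regular expressions, some examples of regular expression searches, and a regular expression reference table.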

An Introduction to Regular Expressions

This tutorial shows you exactly how you can use a few of the regular expression codes to perfect a search. Let's say we want to find all the dream reports that include a mention of a cat. It should be simple enough:

      Query:  cat

Unfortunately, this will also pull up dream reports that contain the words "category," "scathing," "locate," etc. So, we refine our query by looking for a word boundary on either side of the word. A word boundary is the spot where a letter or number meets a space, an apostrophe, a period, or anything else that isn't a letter or number. In the DreamBank search engine (but not in the rest of the world!) it's represented by a ^ symbol; the standard \b also works. Our new attempt is as follows:

      Query:  ^cat^
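
If you'd like to see the effect of the word boundaries outside DreamBank, here is a minimal Python sketch. It assumes Python's standard re module and writes the boundary as \b, because ^ means "start of line" there. The sample words are made up for illustration and are not taken from the dream reports.

```python
import re

# Sample words are illustrative only, not taken from DreamBank.
words = ["cat", "category", "scathing", "locate", "cats"]

plain = re.compile(r"cat")        # corresponds to the query: cat
bounded = re.compile(r"\bcat\b")  # corresponds to the query: ^cat^

plain_hits = [w for w in words if plain.search(w)]
bounded_hits = [w for w in words if bounded.search(w)]

print(plain_hits)    # every sample word contains "cat" somewhere
print(bounded_hits)  # only the standalone word "cat" survives
```

Notice that "cats" drops out of the second list, because the "s" leaves no boundary right after "cat".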

Now we have a new problem: we've eliminated "category" and "locate," but we've also excluded the word "cats," which we certainly want to find, if we're looking for cat dreams. We could remove the word boundary from the end of "cat," but then we'd still find "category"! The solution is to include an optional "s" at the end of the word. To do this, we use a question mark (?), which means "find zero or one of the thing that comes before this question mark." So the optional "s" is represented by s?, and the new query says, "find a word boundary, followed by 'cat', followed by either an 's' or no 's', followed by another word boundary."

      Query:  ^cats?^
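
The optional "s" can be tried out the same way. This is a sketch under the same assumptions as before: Python's re module, \b standing in for DreamBank's ^, and an invented sample sentence.

```python
import re

query = re.compile(r"\bcats?\b")  # corresponds to the query: ^cats?^

# Invented sentence, used only to exercise the pattern.
text = "My cat chased two cats through a category of scathing reviews."

found = query.findall(text)
print(found)  # ['cat', 'cats']
```

Both "cat" and "cats" are found, while "category" and "scathing" are skipped.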

So far, so good, but now it occurs to us that a cat might be referred to as a feline, or a kitten, or a kitty (the plural of which would be kitties). Fortunately, we can account for these quite easily: we'll simply put all of the possibilities in a single set of parentheses, separated by pipes. (The pipe character is the vertical line that is most likely found on the same key as the backslash on your keyboard.) A list separated by pipes says, "find any one of these things; i.e., this OR that OR the other thing."

      Query:  ^(cat|kitten|kitty|kittie|feline)s?^

This query will work just fine, and it will cover just about every occurrence of a cat in the dream reports that we search. But we can shorten it up a little by adding another set of parentheses inside the existing ones. Note that "kitten," "kitty," and "kittie" all start with "kitt". Therefore, all three of those can be replaced with this single item: kitt(en|y|ie).

      Query:  ^(cat|kitt(en|y|ie)|feline)s?^

So, our final query says, "Find a word boundary, followed by either 'cat', 'kitten/kitty/kittie', or 'feline', possibly followed by an 's', followed by another word boundary."
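
To convince ourselves that the shortened query finds the same words as the longer one, we can compare them in a small Python sketch. As before, \b stands in for DreamBank's ^, and the word list is invented for illustration.

```python
import re

# The long and the shortened cat queries, with ^ written as \b.
flat = re.compile(r"\b(cat|kitten|kitty|kittie|feline)s?\b")
nested = re.compile(r"\b(cat|kitt(en|y|ie)|feline)s?\b")

# Invented test words: eight that should match, two that should not.
words = ["cat", "cats", "kitten", "kittens", "kitty", "kitties",
         "feline", "felines", "category", "kitt"]

flat_hits = [w for w in words if flat.search(w)]
nested_hits = [w for w in words if nested.search(w)]

print(flat_hits == nested_hits)  # True
print(nested_hits)               # the first eight words
```

The inner parentheses only save typing; they do not change which words are found.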

Can you think of how to find all the dog dreams? It would be pretty similar:

      Query:  ^(dog(|gy|gie)|pupp(y|ie)|canine)s?^

This is essentially the same as the cat query, but with one new trick: dog(|gy|gie) says, "find 'dog', followed by either nothing or 'gy' or 'gie'," meaning it'll find "dog," "doggy," or "doggie". We included "nothing" in the list by typing a pipe immediately after the parenthesis that follows dog.
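
The empty alternative can be checked with another short Python sketch, again assuming the re module, \b in place of ^, and an invented word list. The second pattern is an equivalent spelling that uses ? on the group instead of an empty alternative; it is offered only for comparison.

```python
import re

# The dog query with ^ written as \b.
dog_query = re.compile(r"\b(dog(|gy|gie)|pupp(y|ie)|canine)s?\b")
# Same idea, written with an optional group instead of an empty alternative.
dog_query_alt = re.compile(r"\b(dog(gy|gie)?|pupp(y|ie)|canine)s?\b")

# Invented test words: the last three should not match.
words = ["dog", "dogs", "doggy", "doggies", "puppy", "puppies",
         "canine", "dogma", "hotdog", "pup"]

hits = [w for w in words if dog_query.search(w)]
alt_hits = [w for w in words if dog_query_alt.search(w)]

print(hits)              # the first seven words
print(hits == alt_hits)  # True
```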

Of course, none of these is perfect: our feline query wouldn't find "tabby" or "pussycat," and our canine query wouldn't find "hound" or "beagle." But it's a very good start, and much better than simply typing cat or dog.

Some Examples of Regular Expression Searches

See the reference table at the bottom of this document for details about the codes used here.

      Looking for:  "Cathy" or "Kathy"
      Query:        [CK]athy
      Notes:        A series of characters in square brackets will match any one of the characters.

      Looking for:  "color" or "colour"
      Query:        colou?r
      Notes:        u? will match either zero or one u's; in other words, either "u" or nothing.

      Looking for:  Numbers between 1960 and 1969
      Query:        ^196\d^
      Notes:        \d matches any digit (0 through 9). The word boundaries (^) keep the pattern from matching "19672," "91963," etc.

      Looking for:  Numbers between 1940 and 1980
      Query:        ^19([4567]\d|80)^
      Notes:        A series of patterns inside parentheses and separated by pipes will match any one of the patterns. You need to include "80" as a two-character pattern at the end, because 19[45678]\d would also match 1981 through 1989.

      Looking for:  All forms of the verb "to make" (make, makes, made, making)
      Query:        ^ma(ke|kes|king|de)^
      Notes:        Another set of patterns in parentheses, separated by pipes. The word boundaries at the beginning and end keep this pattern from matching "remake" or "maker."

      Looking for:  All forms of the verb "to watch" (watch, watches, watched, watching)
      Query:        ^watch(|es|ed|ing)^
      Notes:        The first pattern inside the parentheses is empty (the first pipe comes immediately after the opening parenthesis). This is necessary to match the word "watch" with no suffix.

      Looking for:  "running water," "rushing water," "flowing water," etc.
      Query:        \w\w+ing_water
      Notes:        The pattern \w\w+ says "match a letter followed by one or more letters," i.e., a string of at least two letters, since any verb ending in "ing" will have at least two letters before "ing". (However, this pattern would also match "spring water.")

      Looking for:  "I was driving," "I'm going," "I am running," etc.
      Query:        I('m|_am|_was)_\w\w+ing
      Notes:        Note that two of the three possibilities inside the parentheses ("I am" and "I was") include a space, but to match "I'm," you need to allow for the possibility of no space after the "I".

      Looking for:  A question mark ("?")
      Query:        \?
      Notes:        If you want to do a search for anything that has special meaning in the world of regular expressions, you must precede it with a backslash. This includes question marks, periods, asterisks, plus signs, square brackets, parentheses, etc.

      Looking for:  "A+"
      Query:        A\+

      Looking for:  "Dr. Smith"
      Query:        Dr\._Smith

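Two of the examples above can be tried in a short Python sketch. It assumes Python's re module, with DreamBank's ^ written as \b and its _ written as an ordinary space; the test phrases are invented.

```python
import re

# DreamBank's ^ is written as \b and its _ as a literal space in Python.
years = re.compile(r"\b19([4567]\d|80)\b")  # query: ^19([4567]\d|80)^
too_wide = re.compile(r"\b19[45678]\d\b")   # the tempting shortcut

candidates = [str(year) for year in range(1935, 1991)]
matched = [int(c) for c in candidates if years.search(c)]
overshoot = [int(c) for c in candidates
             if too_wide.search(c) and not years.search(c)]

print(matched[0], matched[-1])  # 1940 1980
print(overshoot)                # 1981 through 1989, wrongly added by the shortcut

doing = re.compile(r"I('m| am| was) \w\w+ing")  # query: I('m|_am|_was)_\w\w+ing
for phrase in ["I was driving", "I'm going", "I am running", "I am king"]:
    print(phrase, "->", bool(doing.search(phrase)))
```

The last phrase does not match, because "king" has only one letter before "ing".
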
Regular Expression Reference Table

The DreamBank search engine lets you use the standard set of UNIX regular expressions, with TWO exceptions: the caret character (^) matches a word boundary, and underscore (_) matches a space. (In standard regular expressions, ^ matches the beginning of a line, and _ simply matches an underscore.) Here are some examples of how to use other regular expression codes:

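      Code:            \b or ^
      Meaning:         word boundary (^ has this meaning only in the DreamBank search engine)
      Example:         ^air^
      Matches:         air, air-ball, fresh air
      Does not match:  airplane, chair, fairy

      Example:         \brain
      Matches:         rain, raining, rainbow
      Does not match:  brain, strained

      Code:            _
      Meaning:         space (only in the DreamBank search engine)
      Example:         wheel_chair
      Matches:         wheel chair
      Does not match:  wheelchair

      Example:         mom_and_dad
      Matches:         my mom and dad
      Does not match:  my mom and my dad

      Code:            .
      Meaning:         any character
      Example:         e.f
      Matches:         elf, efface, serf, d4e5f6, male/female, blue fish
      Does not match:  refer, clef

      Code:            \w
      Meaning:         any letter or number
      Example:         e\wf
      Matches:         elf, efface, serf, d4e5f6
      Does not match:  male/female, blue fish

      Code:            [a-z]
      Meaning:         any letter
      Example:         e[a-z]f
      Matches:         elf, efface, serf
      Does not match:  male/female, blue fish, d4e5f6

      Code:            [0-9] or \d
      Meaning:         any digit
      Example:         \d[0-9]a
      Matches:         12a, 57a, 1997a, x00a
      Does not match:  3a, 57b, 1997, Route66

      Code:            [a-e]
      Meaning:         a, b, c, d, or e
      Example:         p[a-e]t
      Matches:         pattern, pct, trumpets
      Does not match:  pit, pot, apt

      Code:            [6-9]
      Meaning:         6, 7, 8, or 9
      Example:         19[6-9]7
      Matches:         1977, 1997, 523198700b
      Does not match:  1927, 19-7, 0987, 19/77

      Code:            [mrz]
      Meaning:         m, r, or z
      Example:         [mrz]ap
      Matches:         map, strap, zapped
      Does not match:  nap, ape, flap

      Code:            [d-fr-t2wz]
      Meaning:         d, e, f, r, s, t, 2, w, or z
      Example:         [d-fr-t2wz]an
      Matches:         Jordan, mean, fan, sand, tangy, 2and2, swan, Tarzan
      Does not match:  can, ant, than

      Code:            [^e]
      Meaning:         not e
      Example:         p[^e]st
      Matches:         post, pasta, harpist, camp stove
      Does not match:  pest, lipstick

      Code:            [^p-t]
      Meaning:         not p, q, r, s, or t
      Example:         [^p-t]ee
      Matches:         bee, feet, moray eel, queen
      Does not match:  peer, freedom, steel, eek

      Code:            [^eou]
      Meaning:         not e, o, or u
      Example:         th[^eou]n
      Matches:         ethanol, nothing, twelfth night
      Does not match:  then, python, ethnic

      Code:            [^\w]
      Meaning:         not any letter or number
      Example:         d[^\w]o
      Matches:         and/or, red oak
      Does not match:  drop, dog, d4o15

      Code:            |
      Meaning:         OR (often used with parentheses)
      Example:         cat|dog
      Matches:         cat, scathing, doggerel
      Does not match:  kat, dawg, puppy

      Example:         tak(e|en|ing)
      Matches:         take, taken, taking, mistake
      Does not match:  tak, took

      Code:            ?
      Meaning:         ZERO OR ONE of the preceding item
      Example:         sn?ap
      Matches:         snapped, sapling
      Does not match:  sneap, snnap, slap

      Example:         cats?
      Matches:         cat, cats, catch, catsup
      Does not match:  cash, cast

      Example:         watch(es)?\b
      Matches:         watch, watches, wristwatch
      Does not match:  watched, watching

      Code:            +
      Meaning:         ONE OR MORE of the preceding item
      Example:         po+r
      Matches:         sport, poor, poooooooor
      Does not match:  pray, pour, explore

      Example:         (la_)+
      Matches:         la princesse, Tra la la la, Fa la la la la
      Does not match:  late, Ella's

      Code:            *
      Meaning:         ZERO OR MORE of the preceding item
      Example:         se*t
      Matches:         fast, upset, seethe
      Does not match:  sent, basket

      Example:         rain_*storm
      Matches:         rainstorm, rain storm, rain   storm
      Does not match:  rainy storm, rain in a storm

      Code:            {1,2}
      Meaning:         AT LEAST ONE (but NO MORE THAN TWO) of the preceding item
      Example:         am{1,2}e
      Matches:         hammer, amend
      Does not match:  aesthetic, ammme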
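
If you want to try DreamBank-style queries in an ordinary regular expression engine, the two DreamBank-specific codes have to be rewritten first. The Python sketch below shows one way this might be done. The helper dreambank_to_python is hypothetical (it is not part of DreamBank), and it assumes that ^ and _ keep their standard meanings inside square brackets, which the [^e] rows in the table suggest for ^ but which is only a guess for _.

```python
import re


def dreambank_to_python(query: str) -> str:
    """Hypothetical helper: rewrite DreamBank's ^ and _ for Python's re module."""
    out = []
    in_class = False  # True while inside a [...] character class
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            out.append(query[i:i + 2])  # keep escaped pairs such as \? or \b intact
            i += 2
            continue
        if in_class:
            # Assumption: inside brackets, ^ (negation) and _ are left alone.
            # Simplification: a literal ] placed first in a class is not handled.
            if ch == "]":
                in_class = False
            out.append(ch)
        elif ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "^":
            out.append(r"\b")  # DreamBank word boundary
        elif ch == "_":
            out.append(" ")    # DreamBank space
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(dreambank_to_python("^cats?^"))      # \bcats?\b
print(dreambank_to_python("rain_*storm"))  # rain *storm
print(dreambank_to_python("p[^e]st"))      # p[^e]st (unchanged)

# Spot-check a few rows of the reference table with the translated patterns.
air = re.compile(dreambank_to_python("^air^"))
print(bool(air.search("air-ball")), bool(air.search("airplane")))  # True False
```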