www.digitalmars.com

D Programming Language 2.0

Last update Thu May 12 16:55:16 2011

std.regex

Regular expressions are a powerful method of string pattern matching. The regular expression language used in this library is the one in common use, although some of the very advanced forms may behave slightly differently. The standard observed is the ECMA standard for regular expressions.

std.regex is designed to work only with valid UTF strings as input - UTF8 (char), UTF16 (wchar), or UTF32 (dchar). To validate untrusted input, use std.utf.validate().
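
As a minimal sketch of that validation step, the helper below checks a string before it is handed to the regex engine. The name isValidUtf is hypothetical, and catching the base Exception class is an assumption made so the sketch does not depend on the exact exception type that std.utf throws.

```d
import std.utf : validate;

// Hypothetical helper: returns true if s is well-formed UTF.
bool isValidUtf(S)(S s)
{
    try
    {
        validate(s);   // throws if s contains an invalid sequence
        return true;
    }
    catch (Exception e)
    {
        return false;
    }
}

void main()
{
    assert(isValidUtf("hello"));

    // The byte 0xFF never appears in well-formed UTF-8.
    char[] bad = [cast(char) 0xFF];
    assert(!isValidUtf(bad));
}
```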

In the following guide, pattern[] refers to a regular expression and attributes[] refers to a string controlling the interpretation of the regular expression. The attributes string consists of a sequence of one or more of the following characters:

Attribute Characters
Attribute   Action
g           global; repeat over the whole input string
i           case insensitive
m           treat as multiple lines separated by newlines
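
The effect of the attribute characters can be illustrated with a small sketch that counts matches under different attribute strings. The helper countMatches is hypothetical; it assumes the regex(pattern, flags) and match(input, regex) signatures documented on this page.

```d
import std.regex;

// Hypothetical helper: counts the matches of pattern, compiled with the
// given attribute string, in input.
size_t countMatches(string input, string pattern, string flags)
{
    size_t n;
    foreach (m; match(input, regex(pattern, flags)))
        ++n;
    return n;
}

void main()
{
    // "i": letter case is ignored.
    assert(countMatches("Hello", "hello", "g") == 0);
    assert(countMatches("Hello", "hello", "gi") == 1);

    // "m": ^ also matches at the start of every line.
    assert(countMatches("one\ntwo", r"^\w+", "g") == 1);
    assert(countMatches("one\ntwo", r"^\w+", "gm") == 2);
}
```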

The format[] string has the formatting characters:

Formatting Characters
Format   Replaced With
$$       $
$&       The matched substring.
$`       The portion of the string that precedes the matched substring.
$'       The portion of the string that follows the matched substring.
$n       The nth capture, where n is a single digit 1-9 and n is not followed by a decimal digit.
$nn      The nnth capture, where nn is a two-digit decimal number 01-99. If the nnth capture is undefined, or nn exceeds the number of parenthesized subexpressions, the empty string is used instead.

Any other $ are left as is.
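
The formatting characters listed above can be tried out with the replace function documented later on this page. The following is a sketch only; the input strings are made up for illustration.

```d
import std.regex;

void main()
{
    // $& inserts the whole match.
    assert(replace("noon", regex("^n"), "[$&]") == "[n]oon");

    // $1 and $2 insert the first and second captures.
    assert(replace("John Smith", regex(r"(\w+) (\w+)"), "$2 $1")
        == "Smith John");

    // $$ inserts a literal dollar sign.
    assert(replace("cost: 5", regex(r"\d+"), "$$$&") == "cost: $5");

    // $` and $' insert the text before and after the match.
    assert(replace("abc", regex("b"), "<$`|$'>") == "a<a|c>c");
}
```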

References:
Wikipedia

License:
Boost License 1.0.

Authors:
Walter Bright, Andrei Alexandrescu

Source:
std/regex.d

string email;
Regular expression to extract an email address.

References:
How to Find or Validate an Email Address; RFC 2822 Internet Message Format

string url;
Regular expression to extract a URL.

struct Regex(E) if (is(E == Unqual!(E)));
Regex!(Unqual!(typeof(String.init[0]))) regex(String)(String pattern, string flags = null);
A Regex stores a regular expression engine. A Regex object is constructed from a string and compiled into an internal format for performance.

The type parameter E specifies the character type recognized by the regular expression. Currently char, wchar, and dchar are supported. The encoding of the regex string and of the recognized strings must be the same.

This object will be mostly used via a call to the regex function, which automatically deduces the character type.

Example:
Declare two variables and assign a Regex object to each. The first matches UTF-8 strings; the second matches UTF-16 strings (note the w suffix on the pattern literal) and also has the global option set.

auto r = regex("pattern");
auto s = regex(r"p[1-5]\s*"w, "g");

template __ctor(String)
Construct a Regex object. Compile pattern with attributes into an internal form for fast execution.

Parameters:
pattern regular expression
attributes The attributes (g, i, and m accepted)

Throws:
Exception if there are any compilation errors.
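
Because compilation errors are reported by throwing, a pattern that comes from user input can be checked up front. The sketch below assumes only what is stated above, namely that an Exception (or a subclass of it) is thrown; the helper name compiles is hypothetical.

```d
import std.regex;

// Hypothetical helper: returns true if pattern compiles to a Regex.
bool compiles(string pattern)
{
    try
    {
        auto r = regex(pattern);
        return true;
    }
    catch (Exception e)
    {
        // e.msg describes the compilation error.
        return false;
    }
}

void main()
{
    assert(compiles("a(b)c"));
    assert(!compiles("a(b"));   // unbalanced parenthesis
}
```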

const uint captures();
Returns the number of parenthesized captures.

struct RegexMatch(Range = string);
RegexMatch is the type returned by a call to match. It stores the matching state and can be inspected and iterated.

Regex engine;
Get or set the engine of the match.

const bool empty();
void popFront();
RegexMatch!(Range) front();
typeof(this) save();
Range primitives that allow incremental matching against a string.

Example:
import std.stdio;
import std.regex;

void main()
{
    foreach(m; match("abcabcabab", regex("ab")))
    {
        writefln("%s[%s]%s", m.pre, m.hit, m.post);
    }
}
// Prints:
// [ab]cabcabab
// abc[ab]cabab
// abcabc[ab]ab
// abcabcab[ab]
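
The same iteration can be written against the range primitives directly, which is roughly what foreach does behind the scenes. This is a sketch; the helper allHits is hypothetical, and the "g" attribute is passed so that iteration covers every match.

```d
import std.regex;

// Hypothetical helper: collects every matched substring using
// empty, front, and popFront explicitly.
string[] allHits(string input, string pattern)
{
    string[] hits;
    auto m = match(input, regex(pattern, "g"));
    while (!m.empty)
    {
        hits ~= m.front.hit;
        m.popFront();
    }
    return hits;
}

void main()
{
    assert(allHits("a1b22c333", r"\d+") == ["1", "22", "333"]);
    assert(allHits("no digits", r"\d+").length == 0);
}
```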

Captures captures();
Retrieve the captured parenthesized matches, in the form of a random-access range. The first element in the range is always the full match.

Example:
foreach (m; match("abracadabra", "(.)a(.)"))
{
    foreach (c; m.captures)
        write(c, ';');
    writeln();
}
// writes:
// rac;r;c;
// dab;d;b;

Range pre();
Returns the slice of the input that precedes the matched substring.

Range hit();
The matched portion of the input.

Range post();
Returns the slice of the input that follows the matched substring.
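
As a brief illustration of how pre, hit, and post partition the input, the sketch below marks the first match in a string. The helper markFirst is hypothetical.

```d
import std.regex;

// Hypothetical helper: wraps the first match of pattern in brackets.
// Returns input unchanged if there is no match.
string markFirst(string input, string pattern)
{
    auto m = match(input, regex(pattern));
    if (m.empty)
        return input;
    // pre, hit, and post concatenate back to the original input.
    assert(m.pre ~ m.hit ~ m.post == input);
    return m.pre ~ "[" ~ m.hit ~ "]" ~ m.post;
}

void main()
{
    assert(markFirst("hello world", "o w") == "hell[o w]orld");
    assert(markFirst("hello world", "xyz") == "hello world");
}
```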

string toString();
Returns hit (converted to string if necessary).

bool chr(ref size_t si, E c);
Returns whether string s matches this.

RegexMatch!(Range) match(Range, Engine)(Range r, Engine engine);
Matches a string against a regular expression. This is the main entry to the module's functionality. A call to match(input, regex) returns a RegexMatch object that can be used for direct inspection or for iterating over all matches (if the regular expression was built with the "g" option).
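
For direct inspection, a typical pattern is to test the result with empty and then read the captures. The following sketch assumes that usage; the helper yearOf and the date format are made up for illustration.

```d
import std.regex;

// Hypothetical helper: extracts the first field of a dash-separated
// numeric date, or returns null if the input does not match.
string yearOf(string date)
{
    auto m = match(date, regex(r"^(\d+)-(\d+)-(\d+)$"));
    if (m.empty)
        return null;
    return m.captures[1];   // captures[0] is the full match
}

void main()
{
    assert(yearOf("2011-05-12") == "2011");
    assert(yearOf("not a date") is null);
}
```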

Range replace(Range, Engine, String)(Range input, Engine regex, String format);
Searches input for matches of the regular expression regex and replaces the first match with the string generated from format. If the regular expression has the "g" (global) attribute, continues and replaces all matches.

Parameters:
input Range to search.
regex Regular expression pattern.
format Replacement string format.

Returns:
The resulting string.

Example:
auto s = "ark rapacity";
assert(replace(s, regex("r"), "c") == "ack rapacity");
assert(replace(s, regex("r", "g"), "c") == "ack capacity");
The replacement format can reference the matches using the $&, $$, $', $`, $n, and $nn notation:

assert(replace("noon", regex("^n"), "[$&]") == "[n]oon");

Range replace(alias fun, Range, Regex)(Range s, Regex rx);
Searches s for matches of the regular expression rx. Passes each match to the function fun and replaces the match with the value that fun returns.

Parameters:
fun Function called with each match; its return value is the replacement.
s String to search.
rx Regular expression pattern.

Returns:
The resulting string.

Example:
Capitalize the letters 'a' and 'r':
string baz(RegexMatch!(string) m)
{
    return std.string.toupper(m.hit);
}
auto s = replace!(baz)("Strap a rocket engine on a chicken.",
        regex("[ar]", "g"));
assert(s == "StRAp A Rocket engine on A chicken.");
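
The callback can also be written as a function template, so that it does not have to name the match type. This is a sketch under that assumption; bracket and markDigits are hypothetical names, and the only member the callback relies on is hit.

```d
import std.regex;

// Hypothetical callback, templated on the match type M.
string bracket(M)(M m)
{
    return "[" ~ m.hit ~ "]";
}

// Hypothetical helper: wraps every run of digits in brackets.
string markDigits(string s)
{
    return replace!(bracket)(s, regex(r"\d+", "g"));
}

void main()
{
    assert(markDigits("a1b22") == "a[1]b[22]");
}
```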

struct Splitter(Range);
Splitter!(Range) splitter(Range, Regex)(Range r, Regex pat);
Range that splits another range using a regular expression as a separator.

Example:
auto s1 = ", abc, de,  fg, hi, ";
assert(equal(splitter(s1, regex(", *")),
    ["", "abc", "de", "fg", "hi", ""][]));
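
A common use is splitting a line into fields on runs of whitespace. The sketch below assumes the splitter signature shown above; the helper fields is hypothetical.

```d
import std.regex;

// Hypothetical helper: splits line on runs of whitespace.
string[] fields(string line)
{
    string[] result;
    foreach (part; splitter(line, regex(r"\s+")))
        result ~= part;
    return result;
}

void main()
{
    assert(fields("alpha  beta\tgamma") == ["alpha", "beta", "gamma"]);
}
```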