std.regex
Regular expressions are a powerful method of string pattern matching.
The regular expression language used in this library is the same as
that commonly used; however, some of the very advanced forms may behave
slightly differently. The standard observed is the ECMA standard for
regular expressions.
std.regex is designed to work only with valid UTF strings as input:
UTF-8 (char), UTF-16 (wchar), or UTF-32 (dchar). To validate untrusted
input, use std.utf.validate().
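As a minimal sketch of that advice, the helper below validates the input before handing it to the engine. The name matchesValidated is hypothetical and not part of std.regex, and rejecting malformed input by returning false is just one possible policy.

```d
import std.regex;
import std.utf : validate;

// Hypothetical helper (not part of std.regex): returns true only when
// `input` is well-formed UTF-8 and contains a match for `pattern`.
bool matchesValidated(const(char)[] input, string pattern)
{
    try
    {
        validate(input); // throws if the input is malformed UTF
    }
    catch (Exception e)
    {
        return false;    // reject malformed input before it reaches the engine
    }
    return !match(input, regex(pattern)).empty;
}

void main()
{
    assert(matchesValidated("hello world", "wor"));
    assert(!matchesValidated("hello world", "xyz"));

    // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
    char[] bad = [cast(char) 0xC3, cast(char) 0x28];
    assert(!matchesValidated(bad, "."));
}
```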
In the following guide, pattern[] refers to a regular expression, and
attributes[] refers to a string controlling the interpretation of the
regular expression. It consists of a sequence of one or more of the
following characters:
Attribute Characters
Attribute | Action |
g | global; repeat over the whole input string |
i | case insensitive |
m | treat as multiple lines separated by newlines |
The format[] string has the formatting characters:
Formatting Characters
Format | Replaced With |
$$ | $ |
$& | The matched substring. |
$` | The portion of string that precedes the matched substring. |
$' | The portion of string that follows the matched substring. |
$n | The nth capture, where n is a single digit 1-9 and n is not followed by a decimal digit. |
$nn | The nnth capture, where nn is a two-digit decimal number 01-99. If the nnth capture is undefined or nn exceeds the number of parenthesized subexpressions, the empty string is used instead. |
Any other $ is left as is.
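The formatting characters are easiest to see in a replacement call. The following is an illustrative sketch only: toDayFirst is a hypothetical helper, and the date pattern is an assumption chosen for the example.

```d
import std.regex;

// Hypothetical helper: rewrites the first YYYY-MM-DD date as DD/MM/YYYY.
// $1, $2 and $3 refer to the three parenthesized captures.
string toDayFirst(string s)
{
    return replace(s, regex("([0-9]+)-([0-9]+)-([0-9]+)"), "$3/$2/$1");
}

void main()
{
    // $& stands for the matched substring.
    assert(replace("noon", regex("^n"), "[$&]") == "[n]oon");

    // Captures can be reordered freely in the format string.
    assert(toDayFirst("2011-03-15") == "15/03/2011");

    // With the "g" attribute every match is rewritten, not just the first.
    assert(replace("a1b2", regex("[0-9]", "g"), "<$&>") == "a<1>b<2>");
}
```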
References:
Wikipedia
License: Boost License 1.0.
Authors: Walter Bright, Andrei Alexandrescu
Source: std/regex.d
- Regular expression to extract an email address.
References:
How to Find or Validate an Email Address; RFC 2822 Internet Message Format
- Regular expression to extract a URL.
struct Regex(E) if (is(E == Unqual!(E)));
Regex!(Unqual!(ElementEncodingType!(String))) regex(String)(String pattern, string flags = null);
- A Regex stores a regular expression engine. A Regex object
is constructed from a string and compiled into an internal format for
performance.
The type parameter E specifies the character type recognized by
the regular expression. Currently char, wchar, and dchar are supported. The encoding of the regex string and of the
recognized strings must be the same.
This object will be mostly used via a call to the regex function,
which automatically deduces the character type.
Example:
Declare two variables and assign to them a Regex
object. The first matches UTF-8 strings, the second matches UTF-16
strings (note the w suffix on the literal) and also has the global
option set.
auto r = regex("pattern");
auto s = regex(r"p[1-5]\s*"w, "g");
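To make the character-type deduction concrete, here is a small sketch that builds one Regex per encoding and checks the deduced type. The variable names are arbitrary, and the static assertions assume that regex returns exactly Regex!E for the element type E of the pattern string, as the signature above suggests.

```d
import std.regex;

void main()
{
    auto r8  = regex("ab+");    // string  literal -> char  (UTF-8)
    auto r16 = regex("ab+"w);   // wstring literal -> wchar (UTF-16)
    auto r32 = regex("ab+"d);   // dstring literal -> dchar (UTF-32)

    static assert(is(typeof(r8)  == Regex!char));
    static assert(is(typeof(r16) == Regex!wchar));
    static assert(is(typeof(r32) == Regex!dchar));

    // The input must use the same encoding as the pattern.
    assert(!match("xabbby", r8).empty);
    assert(!match("xabbby"w, r16).empty);
    assert(!match("xabbby"d, r32).empty);
}
```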
this(String)(String pattern, string attributes = null);
- Construct a Regex object. Compile pattern with attributes
into an internal form for fast execution.
Parameters:
pattern | Regular expression. |
attributes | The attributes (g, i, and m accepted). |
Throws:
Exception if there are any compilation errors.
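Because compilation errors surface as exceptions, a pattern that comes from user input can be checked up front. The sketch below assumes only that the thrown object derives from Exception, as stated above; isValidPattern is a hypothetical helper name.

```d
import std.regex;

// Hypothetical helper: reports whether `pattern` compiles.
bool isValidPattern(string pattern)
{
    try
    {
        auto r = regex(pattern); // compilation happens here
        return true;
    }
    catch (Exception e)
    {
        return false;            // e.msg describes the compilation error
    }
}

void main()
{
    assert(isValidPattern("a(b|c)*d"));
    assert(!isValidPattern("a(b"));   // unbalanced parenthesis
}
```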
- Returns the number of parenthesized captures
struct RegexMatch(Range = string);
- RegexMatch is the type returned by a call to match. It
stores the matching state and can be inspected and iterated.
- Get or set the engine of the match.
const bool empty();
void popFront();
RegexMatch!(Range) front();
typeof(this) save();
- Range primitives that allow incremental matching against a string.
Example:
import std.stdio;
import std.regex;
void main()
{
    // The "g" attribute requests iteration over all matches.
    foreach (m; match("abcabcabab", regex("ab", "g")))
    {
        writefln("%s[%s]%s", m.pre, m.hit, m.post);
    }
}
- Retrieve the captured parenthesized matches, in the form of a
random-access range. The first element in the range is always the full
match.
Example:
foreach (m; match("abracadabra", "(.)a(.)"))
{
    foreach (c; m.captures)
        write(c, ';');
    writeln();
}
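Captures can also be read by index rather than iterated. The following sketch assumes a simple key=value input format chosen purely for illustration; keyValue is a hypothetical helper.

```d
import std.regex;

// Hypothetical helper: splits "key=value" into its two parts,
// or returns null when the input does not have that shape.
string[] keyValue(string line)
{
    auto m = match(line, regex("^([a-z]+)=([0-9]+)$"));
    if (m.empty)
        return null;
    auto c = m.captures;
    // c[0] is the full match; c[1] and c[2] are the parenthesized groups.
    return [c[1], c[2]];
}

void main()
{
    assert(keyValue("width=640") == ["width", "640"]);
    assert(keyValue("no equals sign") is null);
}
```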
- Returns the slice of the input that precedes the matched substring.
- The matched portion of the input.
- Returns the slice of the input that follows the matched substring.
- Returns hit (converted to string if necessary).
bool chr(ref size_t si, E c);
- Returns whether string s matches this.
RegexMatch!(Range) match(Range, Engine)(Range r, Engine engine);
- Matches a string against a regular expression. This is the main entry
to the module's functionality. A call to match(input, regex)
returns a RegexMatch object that can be used for direct
inspection or for iterating over all matches (if the regular
expression was built with the "g" option).
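Both uses described above, direct inspection and iteration, can be sketched as follows. The helper allHits is hypothetical; it relies only on match, regex, and the pre, hit, post, and empty members documented on this page.

```d
import std.regex;

// Hypothetical helper: collects every non-overlapping match of `pattern`.
string[] allHits(string input, string pattern)
{
    string[] hits;
    // The "g" attribute asks for iteration over all matches.
    foreach (m; match(input, regex(pattern, "g")))
        hits ~= m.hit;
    return hits;
}

void main()
{
    // Direct inspection of the first match.
    auto m = match("abcabcabab", regex("ca"));
    assert(!m.empty);
    assert(m.pre == "ab" && m.hit == "ca" && m.post == "bcabab");

    // Iteration over all matches.
    assert(allHits("abcabcabab", "ab") == ["ab", "ab", "ab", "ab"]);
}
```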
Range replace(Range, Engine, String)(Range input, Engine regex, String format);
- Searches input for matches of the regular expression regex and
replaces the first match with the string generated from format. If the
regular expression has the "g" (global) attribute, continues and
replaces all matches.
Parameters:
input | Range to search. |
regex | Regular expression pattern. |
format | Replacement string format. |
Returns:
The resulting string.
Example:
auto s = "ark rapacity";
assert(replace(s, regex("r"), "c") == "ack rapacity");
assert(replace(s, regex("r", "g"), "c") == "ack capacity");
The replacement format can reference the matches using the $&, $$,
$', $`, $n, and $nn notation from the Formatting Characters table:
assert(replace("noon", regex("^n"), "[$&]") == "[n]oon");
Range replace(alias fun, Range, Regex)(Range s, Regex rx);
- Searches s for matches of the regular expression rx. Passes each
match to the function fun and replaces the match with the value that
fun returns.
Parameters:
s | String to search. |
rx | Regular expression pattern. |
fun | Function called with each match; its return value is the replacement. |
Returns:
The resulting string.
Example:
Capitalize the letters 'a' and 'r':
import std.string; // needed for std.string.toUpper
string baz(RegexMatch!(string) m)
{
    return std.string.toUpper(m.hit);
}
auto s = replace!(baz)("Strap a rocket engine on a chicken.",
        regex("[ar]", "g"));
assert(s == "StRAp A Rocket engine on A chicken.");
struct Splitter(Range);
Splitter!(Range) splitter(Range, Regex)(Range r, Regex pat);
- Range that splits another range using a regular expression as a
separator.
Example:
auto s1 = ", abc, de, fg, hi, ";
assert(equal(splitter(s1, regex(", *")),
        ["", "abc", "de", "fg", "hi", ""][]));
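When an array is more convenient than a lazy range, the result of splitter can be collected. This is a sketch under the assumption that std.array.array is available to materialize the range; the helper name words is hypothetical.

```d
import std.array : array;
import std.regex;

// Hypothetical helper: splits on runs of spaces or tabs.
string[] words(string text)
{
    return array(splitter(text, regex("[ \t]+")));
}

void main()
{
    assert(words("one  two\tthree") == ["one", "two", "three"]);
    // A separator at the very start yields an empty leading element.
    assert(words(" x") == ["", "x"]);
}
```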