Introduction
Regular expressions are a powerful tool for searching and manipulating text. Unfortunately,
the power comes at a price: the syntax is rather daunting to learn and master, and even
an experienced developer may have trouble getting it right the first time.
This guide is an introduction to regular expression design using C#. You
will learn some basic skills in composing a regular expression, or regex as it
is also called. In addition, we will show which basic .NET classes can be used for this task.
In the first section we study matching, and in the next we look at string replacement.
We jump directly to the matching section and introduce the theory as we go along.
Matching a Simple Pattern
Our first regular expression example is very simple. Its purpose is to show the
different classes that take part in a pattern match. We use the pattern
"Controls" and the source text "Custom Controls and User
Drawn Controls". To implement a pattern match in C#, remember to add a using directive for
the System.Text.RegularExpressions namespace to your code. The following
text discusses matches, groups and captures. These concepts can be a bit confusing
at first, especially when reading the .NET documentation for the APIs. However,
the logic behind this structure will become clear in the next few code examples,
and hopefully you will find it straightforward.
MatchCollection objects
using System;
using System.Text.RegularExpressions;

string source = "Custom Controls and User Drawn Controls";
string pattern = "Controls";
// Find all occurrences of the pattern in the source string
MatchCollection mc = Regex.Matches(source, pattern);
foreach (Match m in mc) {
    Console.WriteLine("Match: [{0}]", m.ToString());
    // Show the groups (and, in turn, the captures) of this match
    showGroupCollection(m.Groups);
}
The code above calls the static Regex.Matches method, which takes the pattern as an argument.
Matches is also available as an instance method on a Regex object constructed from the pattern.
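The following minimal sketch compares the two variants on the same source text. It is written as C# top-level statements, which assumes C# 9 or later (for example, a .NET 5+ console project). The variable names viaStatic, viaInstance and controlsRegex are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

string source = "Custom Controls and User Drawn Controls";

// Static variant: the pattern is passed in on every call.
MatchCollection viaStatic = Regex.Matches(source, "Controls");

// Instance variant: the pattern is given once, to the constructor, and the
// same Regex object can then be reused on any number of source strings.
Regex controlsRegex = new Regex("Controls");
MatchCollection viaInstance = controlsRegex.Matches(source);

// Both calls find the same matches.
Console.WriteLine("Static: {0} matches, instance: {1} matches",
    viaStatic.Count, viaInstance.Count);
```

Both approaches give the same result. The instance form is convenient when one pattern is applied to many strings. The static methods keep an internal cache of recently used patterns, and the size of that cache is controlled by Regex.CacheSize.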
The result of a Matches call is a MatchCollection object, which contains
all the matches found in the source string. To view the contents
of the MatchCollection, iterate over it (for example, with a foreach loop) to get each
Match object. Every Match object contains a string that matches the provided
pattern. Its Groups property returns a GroupCollection, which allows us to iterate
through all the captured groups.
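A foreach loop is not the only way to walk a MatchCollection. The sketch below, again using C# top-level statements, uses the Count property and the indexer instead. It also prints the Value, Index and Length of each Match. For Match, ToString() returns the same string as Value, so Value could replace the ToString() call used above.

```csharp
using System;
using System.Text.RegularExpressions;

string source = "Custom Controls and User Drawn Controls";
MatchCollection mc = Regex.Matches(source, "Controls");

// Use Count and the indexer instead of foreach to visit each Match.
for (int i = 0; i < mc.Count; i++) {
    Match m = mc[i];
    // Value is the matched text, Index its start position, Length its size.
    Console.WriteLine("Match {0}: [{1}] at {2}, length {3}",
        i, m.Value, m.Index, m.Length);
}
```

With the source text of this section, this should report two matches of length 8, at positions 7 and 31.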
We could have put all the code needed to display the matches and the substrings
in a single block. However, the code is easier to read and understand when it
is split into smaller units. We will continue this approach throughout the guide.
GroupCollection objects
// Prints every group of a match; group 0 always holds the entire match
public static void showGroupCollection(GroupCollection aGC) {
    foreach (Group g in aGC) {
        Console.WriteLine(" Group: [{0}]", g.ToString());
        showCaptureCollection(g.Captures);
    }
}
The input argument is a GroupCollection. The first group in the collection (index 0) always
holds the entire match, so it has the same value as the Match object. Each following group in the
GroupCollection corresponds to a parenthesized subexpression in the pattern. Its value is the last string
captured by that subexpression, as you will see in the next example. In this simple example the pattern
contains no parentheses, so each match has only one group and the loop is not strictly necessary. However,
the code was written this way to keep it generic.
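To see a second group appear, the pattern can be given a parenthesized subexpression. The pattern (\w+) Controls below is a hypothetical example, not part of the original walkthrough. It captures the word in front of "Controls" as group 1. This sketch uses C# top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string source = "Custom Controls and User Drawn Controls";
// Hypothetical pattern: the parentheses form group 1, the word before "Controls".
string pattern = @"(\w+) Controls";

MatchCollection mc = Regex.Matches(source, pattern);
foreach (Match m in mc) {
    Console.WriteLine("Match: [{0}]", m.Value);
    // Groups[0] is the whole match; Groups[1] is the parenthesized word.
    Console.WriteLine(" Group 0: [{0}]", m.Groups[0].Value);
    Console.WriteLine(" Group 1: [{0}] at pos: {1}",
        m.Groups[1].Value, m.Groups[1].Index);
}
```

This should find two matches. The first is "Custom Controls", whose group 1 is "Custom" at position 0. The second is "Drawn Controls", whose group 1 is "Drawn" at position 25. The words "and" and "User" are not followed by " Controls", so they do not match.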
CaptureCollection objects
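// Prints every capture of a group together with its position in the source text
public static void showCaptureCollection(CaptureCollection aCC) {
    foreach (Capture c in aCC) {
        string s = c.ToString();          // the captured text
        string pos = c.Index.ToString();  // zero-based start position
        Console.WriteLine(" Capture: [{0}] at pos: {1}", s, pos);
    }
}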
The input argument is a CaptureCollection. This collection may contain more than
one object, depending on the pattern. For example, a group inside a repeated expression captures once per
repetition. In our simple example there are no parenthesized subexpressions, so the only capture belongs
to group 0 and simply contains the match that was found. A useful feature of the Capture
class is its Index property, which gives the position of the captured string in the source text.
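The following sketch shows a CaptureCollection with more than one entry. It uses the hypothetical pattern (?:(\w+) )+Controls and the shorter source text "User Drawn Controls", neither of which comes from the original example. Group 1 sits inside a repeated non-capturing group, so it captures once per repetition. The group's own Value is the last of those captures. The sketch is written as C# top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example: group 1 is inside a repeated (+) non-capturing group.
Match m = Regex.Match("User Drawn Controls", @"(?:(\w+) )+Controls");
Group g = m.Groups[1];

// Every capture of group 1, with its position in the source text.
foreach (Capture c in g.Captures) {
    Console.WriteLine("Capture: [{0}] at pos: {1}", c.Value, c.Index);
}
// The group's own value is the last capture.
Console.WriteLine("Group 1 value: [{0}]", g.Value);
```

This should print the captures "User" at position 0 and "Drawn" at position 5, followed by the group value "Drawn".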
Results
Match: [Controls]
Group: [Controls]
Capture: [Controls] at pos: 7
Match: [Controls]
Group: [Controls]
Capture: [Controls] at pos: 31
The results show that we have two matches, with the positions 7 and 31, as expected.
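For convenience, the three fragments of this section can be assembled into one program that prints output in the format shown above. This is a sketch that assumes C# 9 or later. It uses top-level statements and turns the two helper methods into static local functions, so it does not need an enclosing class.

```csharp
using System;
using System.Text.RegularExpressions;

string source = "Custom Controls and User Drawn Controls";
string pattern = "Controls";

MatchCollection mc = Regex.Matches(source, pattern);
foreach (Match m in mc) {
    Console.WriteLine("Match: [{0}]", m.ToString());
    showGroupCollection(m.Groups);
}

// Prints every group of a match; group 0 always holds the entire match
static void showGroupCollection(GroupCollection aGC) {
    foreach (Group g in aGC) {
        Console.WriteLine(" Group: [{0}]", g.ToString());
        showCaptureCollection(g.Captures);
    }
}

// Prints every capture of a group with its zero-based position
static void showCaptureCollection(CaptureCollection aCC) {
    foreach (Capture c in aCC) {
        Console.WriteLine(" Capture: [{0}] at pos: {1}", c.ToString(), c.Index);
    }
}
```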
Because this was a simple example, the results do not give a good picture of regex design.
In fact, ordinary string-handling APIs such as String.IndexOf would probably be
easier and more efficient in this simple case.
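As a sketch of that alternative, the same two positions can be found with String.IndexOf and no regular expressions at all. Each search starts just after the previous hit, which mirrors the non-overlapping behavior of Regex.Matches. StringComparison.Ordinal is chosen here to get an exact character-by-character comparison, like the regex above. The names word and positions are illustrative. The code uses C# top-level statements.

```csharp
using System;
using System.Collections.Generic;

string source = "Custom Controls and User Drawn Controls";
string word = "Controls";
List<int> positions = new List<int>();

// Repeatedly search for the word, starting just after the previous hit.
int pos = source.IndexOf(word, StringComparison.Ordinal);
while (pos >= 0) {
    positions.Add(pos);
    pos = source.IndexOf(word, pos + word.Length, StringComparison.Ordinal);
}

Console.WriteLine("Found at: {0}", string.Join(", ", positions));
```

This should print "Found at: 7, 31", the same positions that the regex version reported.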