sommergyll.software(c#);
"Guide to designing basic regular expressions in C#"
 

Introduction

Regular expressions are a powerful tool for searching and manipulating text. Unfortunately, the power comes at a price: the syntax is rather daunting to learn and master, and even an experienced developer may have trouble getting a pattern right the first time.

This guide is an introduction to regular expression design using C#. You will learn some basic skills in composing a regular expression, or regex as it is also called. In addition, we show which basic C# classes can be used for this task.

In the first section we study matching and in the next we look at string replacement. We jump directly to the matching section and introduce any theory as we go along.

Matching a Simple Pattern

Our first regular expression example is very simple; its purpose is to show the different classes that participate in a pattern match. As the pattern we use "Controls", and as the source text "Custom Controls and User Drawn Controls". To implement a pattern match in C#, remember to import the System.Text.RegularExpressions namespace in your own code. In the following text we mention Matches, Groups and Captures. These concepts can be a bit confusing at first, especially when reading the .NET documentation for the APIs. However, the logic behind this structure is revealed in the next few code examples, and hopefully you will find it straightforward.

MatchCollection objects

using System;
using System.Text.RegularExpressions;

string source = "Custom Controls and User Drawn Controls";
string pattern = "Controls";

// Find every occurrence of the pattern in the source text.
MatchCollection mc = Regex.Matches(source, pattern);
foreach (Match m in mc) {
    Console.WriteLine("Match: [{0}]", m.ToString());
    showGroupCollection(m.Groups);  // defined under "GroupCollection objects"
}

The code above calls the static Matches method of the Regex class. Matches is also available as an instance method on a Regex object.
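As a rough illustration of the instance variant, the sketch below builds a Regex object once and calls Matches on it. The class name InstanceMatchDemo and the helper FindPositions are made up for this example and are not part of the guide's code; only Regex, MatchCollection and Match.Index come from the .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InstanceMatchDemo
{
    // Hypothetical helper: returns the start index of every match of
    // pattern in source, using a Regex instance instead of the static
    // Regex.Matches method.
    public static int[] FindPositions(string source, string pattern)
    {
        Regex regex = new Regex(pattern);            // the pattern is parsed here
        MatchCollection mc = regex.Matches(source);  // instance variant of Matches

        int[] positions = new int[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            positions[i] = mc[i].Index;
        }
        return positions;
    }

    public static void Main()
    {
        int[] positions = FindPositions("Custom Controls and User Drawn Controls", "Controls");
        foreach (int pos in positions)
        {
            Console.WriteLine("Match at pos: {0}", pos);
        }
    }
}
```

An instance is usually the more natural choice when the same pattern is applied to many source strings, since the Regex object can be kept and reused.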

 

The result of a Matches call is a MatchCollection object. This object contains all the matches that were found in the source string. To view the contents of the MatchCollection, iterate over it, for example in a foreach loop, to get each Match object. Every Match object contains a substring of the source that matches the provided pattern. Every Match object also has a GroupCollection object, which allows us to iterate through all the captured groups.

We could have put all the code needed to display the matches and the substrings in one single block. However, the code is easier to read and understand if it is separated into smaller units. We will continue this approach throughout the guide.

GroupCollection objects

public void showGroupCollection(GroupCollection aGC) {
    // Print each group, then the captures that belong to it.
    foreach (Group g in aGC) {
        Console.WriteLine(" Group: [{0}]", g.ToString());
        showCaptureCollection(g.Captures);
    }
}

The input argument is a GroupCollection. The first object in that collection has the same value as the Match object itself. Each following object in the GroupCollection corresponds to a capturing group in the pattern, and its value is the last string captured by that group, which you will see in the next example. Please note that in this simple example there is only one captured string for each match, so the loop is not strictly necessary. However, the code was designed this way to make it generic.
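To make the relationship between a Group and its Captures more concrete, here is a small sketch. The pattern is an assumption chosen purely for illustration and does not come from the guide: it places the capturing group (\w+) inside a repeated non-capturing group, so the group captures once per repetition. The class name GroupCaptureDemo is likewise hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCaptureDemo
{
    // Hypothetical pattern, not from the guide: group 1 (\w+) sits inside a
    // repeated non-capturing group, so it captures once per repetition.
    public const string Pattern = @"(?:(\w+) )+Controls";

    public static Match FirstMatch(string source)
    {
        return Regex.Match(source, Pattern);
    }

    public static void Main()
    {
        Match m = FirstMatch("Custom Controls and User Drawn Controls");

        // Group 0 always holds the same text as the Match itself.
        Console.WriteLine("Match:   [{0}]", m.Value);
        Console.WriteLine("Group 0: [{0}]", m.Groups[0].Value);

        // Group 1 reports only its last capture as its value ...
        Console.WriteLine("Group 1: [{0}]", m.Groups[1].Value);

        // ... while its CaptureCollection keeps every capture in order.
        foreach (Capture c in m.Groups[1].Captures)
        {
            Console.WriteLine("  Capture: [{0}] at pos: {1}", c.Value, c.Index);
        }
    }
}
```

With this source text the whole string forms a single match, group 1 has the value Drawn, and its CaptureCollection holds the five words that precede the final "Controls".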

CaptureCollection objects

public void showCaptureCollection(CaptureCollection aCC) {
    // Print each captured string together with its start index in the source.
    foreach (Capture c in aCC) {
        string s = c.ToString();
        string pos = c.Index.ToString();
        Console.WriteLine("  Capture: [{0}] at pos: {1}", s, pos);
    }
}

The input argument is a CaptureCollection. This collection may contain more than one object, depending on the complexity of the pattern. In our simple example the pattern contains no capturing group of its own, but (most likely for the robustness of the .NET API) the Capture object still contains the match that was found. A useful feature of the Capture class is that it gives the position of the captured string in the source text.
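One possible use of the position information is sketched below: the Index and Length properties of each Capture are used to build a marker line that underlines every match in the source. The class CapturePositionDemo and its Underline method are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CapturePositionDemo
{
    // Hypothetical helper: returns a line of the same length as source,
    // with '^' under every captured character and spaces elsewhere.
    public static string Underline(string source, string pattern)
    {
        char[] marks = new string(' ', source.Length).ToCharArray();
        foreach (Match m in Regex.Matches(source, pattern))
        {
            // Group 0 is the whole match; its captures carry Index and Length.
            foreach (Capture c in m.Groups[0].Captures)
            {
                for (int i = c.Index; i < c.Index + c.Length; i++)
                {
                    marks[i] = '^';
                }
            }
        }
        return new string(marks);
    }

    public static void Main()
    {
        string source = "Custom Controls and User Drawn Controls";
        Console.WriteLine(source);
        Console.WriteLine(Underline(source, "Controls"));
    }
}
```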

Results

Match: [Controls]
 Group: [Controls]
  Capture: [Controls] at pos: 7

Match: [Controls]
 Group: [Controls]
  Capture: [Controls] at pos: 31    

The results show that we have two matches, at positions 7 and 31, as expected. As this was a simple example, the results do not give a good picture of regex design. In fact, using the ordinary string handling APIs would probably be easier and more efficient in this simple case.
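For comparison, a plain string search for a fixed word could look roughly like the sketch below. It uses String.IndexOf in a loop; the class IndexOfDemo and the method FindAll are names invented for this illustration, and the choice of an ordinal (culture-independent) comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class IndexOfDemo
{
    // Hypothetical helper: returns the start index of every non-overlapping
    // occurrence of word in source, without using a regular expression.
    public static List<int> FindAll(string source, string word)
    {
        if (string.IsNullOrEmpty(word))
        {
            throw new ArgumentException("word must not be empty", "word");
        }

        List<int> positions = new List<int>();
        int pos = source.IndexOf(word, StringComparison.Ordinal);
        while (pos >= 0)
        {
            positions.Add(pos);
            // Continue searching just past the occurrence that was found.
            pos = source.IndexOf(word, pos + word.Length, StringComparison.Ordinal);
        }
        return positions;
    }

    public static void Main()
    {
        foreach (int pos in FindAll("Custom Controls and User Drawn Controls", "Controls"))
        {
            Console.WriteLine("Found at pos: {0}", pos);
        }
    }
}
```

For the source text used in this section, the loop reports the same two positions as the regex version, 7 and 31.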

 
 


© Copyright 2003-2010 Sommergyll Software. All Rights Reserved.
