Matching Digits in .NET Regex
If you've done any work with regular expressions in .NET, you've probably come across the predefined shorthand character classes:
\w matches any word character
\s matches any white-space character
\d matches any decimal digit
The \w character class matches characters that are considered letters, digits, or certain punctuation marks (connector punctuation such as the underscore). Similarly, \s matches any character considered white space, such as various spaces, tabs, or newlines. But exactly which characters does the \d shorthand match? What does "any decimal digit" mean?
The Meaning of \d
Depending on your cultural background, you might assume that \d matches any of the ten digits denoted by the ASCII characters 48 through 57 (0, 1, 2, 3, 4, 5, 6, 7, 8, and 9). Therefore, \d would just be a shorter way of writing [0123456789] or [0-9]. Is that the case? Generally, no.
The \d character class is only equivalent to [0-9] if the RegexOptions.ECMAScript flag is set, which enables ECMAScript-compliant behavior for the given regular expression (see MSDN documentation). Otherwise, \d is equivalent to \p{Nd}, the Unicode category "Number, Decimal Digit", and matches many more characters than just the ASCII digits. After all, there are many more digit characters used in various cultures, and those should be recognized as digits as well!
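The difference is easy to see in a small console program. This is a minimal sketch rather than anything from the original post: the class name DigitClassDemo is made up for illustration, and the test character is U+0D6B (MALAYALAM DIGIT FIVE), one of the digits that appears in the list of matches in this post.

```csharp
using System;
using System.Text.RegularExpressions;

class DigitClassDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // U+0D6B MALAYALAM DIGIT FIVE, a decimal digit outside the ASCII range.
        string malayalamFive = "\u0D6B";

        // Default options: \d matches any Unicode decimal digit.
        Console.WriteLine(Regex.IsMatch(malayalamFive, @"\d"));                          // True

        // ECMAScript mode: \d is restricted to the ASCII digits 0 through 9.
        Console.WriteLine(Regex.IsMatch(malayalamFive, @"\d", RegexOptions.ECMAScript)); // False

        // An explicit range only ever matches the ASCII digits.
        Console.WriteLine(Regex.IsMatch(malayalamFive, @"[0-9]"));                       // False

        // An ASCII digit still matches in ECMAScript mode.
        Console.WriteLine(Regex.IsMatch("5", @"\d", RegexOptions.ECMAScript));           // True
    }
}
```

Keep in mind that RegexOptions.ECMAScript changes more than just \d and can only be combined with a few other options, so an explicit [0-9] is usually the less surprising choice.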
Here are some matches (still not all!) that you might not have expected:
߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩
For a full list of characters matched by \d, check out this Gist.
To verify that the pattern \d matches the above characters, paste them into a regex tool like Regex Lab .NET. You'll see that all 310 characters are being matched, although the editor doesn't display them correctly.
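If you'd rather check this in code than in a regex tool, the following sketch scans every UTF-16 code unit in the Basic Multilingual Plane and prints the ones that \d matches, along with their Unicode category. The class name DigitEnumeration is hypothetical, and the total you get is an open question rather than a fixed number, since it depends on the Unicode data shipped with your .NET runtime.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class DigitEnumeration // hypothetical name, for illustration only
{
    static void Main()
    {
        var digit = new Regex(@"\d");
        int matched = 0;

        // Walk through all UTF-16 code units (U+0000 through U+FFFF).
        for (int codePoint = 0; codePoint <= 0xFFFF; codePoint++)
        {
            char c = (char)codePoint;
            if (!digit.IsMatch(c.ToString()))
            {
                continue;
            }

            matched++;

            // Print the code point and the category .NET assigns to it.
            Console.WriteLine($"U+{codePoint:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}");
        }

        // The total depends on the runtime's Unicode version.
        Console.WriteLine($"{matched} characters matched by \\d");
    }
}
```

Every line printed should report the category DecimalDigitNumber, which is the .NET name for the Unicode category Nd.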
Problems with \d
As you can imagine, the problem with \d is that it's commonly used in various places where the above digits probably aren't expected. For example, route definitions in ASP.NET applications accept a regular expression to constrain the set of valid values for any route parameter:
routes.MapHttpRoute("ProductDetails", "products/{id}",
new { controller = "Products" }, new { id = @"\d+" });
The intended purpose was to restrict the id parameter to (positive) integer values. However, "products/൫൬൭" is probably not a valid URL for the given endpoint because ൫൬൭ is very likely not an id the database (or another service) will understand.
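To see how the two constraints treat such an id, here is a small sketch that runs outside of ASP.NET. The helper IsValidId is hypothetical and only approximates a route constraint by anchoring the pattern to the whole value; the framework's own matching may differ in its details.

```csharp
using System;
using System.Text.RegularExpressions;

class RouteConstraintSketch // hypothetical name, for illustration only
{
    // Hypothetical stand-in for a route constraint check:
    // the whole value must match the given pattern.
    static bool IsValidId(string value, string constraint) =>
        Regex.IsMatch(value, "^(" + constraint + ")$");

    static void Main()
    {
        // The Malayalam digits five, six, and seven from the example URL.
        string malayalamId = "\u0D6B\u0D6C\u0D6D";

        Console.WriteLine(IsValidId("567", @"\d+"));          // True
        Console.WriteLine(IsValidId(malayalamId, @"\d+"));    // True: probably not what was intended
        Console.WriteLine(IsValidId("567", @"[0-9]+"));       // True
        Console.WriteLine(IsValidId(malayalamId, @"[0-9]+")); // False
    }
}
```

With the [0-9]+ constraint, only ids made up of ASCII digits get through to the controller.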
Conclusion
The next time you're about to use the \d shorthand character class within a regular expression pattern, think about whether all of the above characters are valid input values. If not, use [0-9] instead.
For more details on the various RegexOptions flags and their meaning, check out my blog post about practical use cases of RegexOptions. If you're interested in the topic, I also recommend you read the Character Classes in Regular Expressions article in the Microsoft Docs for more information on all shorthand character classes provided by the .NET regex engine.