This issue has nothing to do with datatypes, whether for input parameters or return values, as the code provided, while sparse on detail, does show enough to see that:
- there is no input parameter being used (the string is hard-coded).
- the error is being thrown by System.Text.RegularExpressions.Regex, so has nothing to do with T-SQL or return values / types.
Also, while the error message does mention "Quantifier {x,y}", and there is indeed a {1,10} quantifier being used in the Regular Expression, it is a false correlation (albeit a rather understandable one) that the error message is referring to that specific quantifier. If you shorten the Regular Expression down to just "شماره", you will get the same error, except it will report the Regular Expression as being just "?????". Hence, "Quantifier {x,y}" actually refers to the first "?" in the expression shown in the error message (you will get the same error even if the Regular Expression is nothing more than "ش"). I figure that "Quantifier {x,y}" is the generalized way of looking at the ?, +, and * quantifiers as they can also be expressed as {0,1}, {1,}, and {0,}, respectively (or at least they should be).
This issue has nothing to do with SQL Server, or even Regular Expressions. This is an encoding issue, and RegEx is reporting the problem because it is being given ????? instead of شماره.
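To make the ????? concrete, here is a minimal sketch in Python (used only as a neutral illustration; it is not .NET, and the helper name through_code_page is made up for this example). It pushes the Arabic word through Code Page 1252 and then hands the result to a regex engine. Python's re module is not System.Text.RegularExpressions.Regex, but it rejects a pattern that starts with a quantifier for the same underlying reason.

```python
import re

# The word from the question, written with escapes so that this
# snippet itself cannot be damaged by a lossy file encoding.
PATTERN = "\u0634\u0645\u0627\u0631\u0647"

def through_code_page(text: str, codec: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page, replacing
    every character the code page cannot represent with '?'."""
    return text.encode(codec, errors="replace").decode(codec)

mangled = through_code_page(PATTERN)
print(mangled)  # ?????

try:
    re.compile(mangled)
except re.error as exc:
    # The leading '?' is a quantifier with nothing in front of it.
    print("invalid pattern:", exc)
```

None of the five Arabic letters exists in Code Page 1252, so each one becomes a question mark, and the regex engine never sees the original characters.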
<TL;DR> Check your source code file's encoding. You might need to go to "Save As...", click on the down-arrow to the right of the word "Save" on the "Save" button, select "Save with Encoding...", and then select "Unicode (UTF-8 with signature) - Codepage 65001".
There is a problem with the project configuration and/or the compiler. I placed the following string in both a Console Application and a Database Project:
"-😈-ŏ-א---\U0001F608-\u014F-\u05D0-"
(The second half of that test string, after the ---, is merely the escape sequences for the same three characters as appear in the first half, and in the same order.)
I compiled both and inspected the compiled output (meaning: it hasn't been deployed to SQL Server yet). That string appears in the EXE file (Console App) as:
2D003DD808DE2D004F012D00D0052D002D002D003DD808DE2D004F012D00D0052D00
which is the UTF-16 LE encoding for: -😈-ŏ-א---😈-ŏ-א-
Yet, it appears in the DLL file (SQLCLR Assembly) as:
2D003F003F002D003F002D003F002D002D002D003DD808DE2D004F012D00D0052D00
which is the UTF-16 LE encoding for: -??-?-?---😈-ŏ-א-
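If you want to check hex dumps like these yourself, the following Python sketch decodes the two byte sequences shown above and also encodes the first half of the test string. The variable and function names are mine, chosen for illustration only.

```python
# Hex dumps copied from the compiled EXE and DLL, as shown above.
EXE_HEX = "2D003DD808DE2D004F012D00D0052D002D002D003DD808DE2D004F012D00D0052D00"
DLL_HEX = "2D003F003F002D003F002D003F002D002D002D003DD808DE2D004F012D00D0052D00"

def decode_utf16le(hex_dump: str) -> str:
    """Interpret a hex dump as UTF-16 LE text."""
    return bytes.fromhex(hex_dump).decode("utf-16-le")

def utf16le_hex(text: str) -> str:
    """Encode text as UTF-16 LE and return an upper-case hex dump."""
    return text.encode("utf-16-le").hex().upper()

exe_text = decode_utf16le(EXE_HEX)
dll_text = decode_utf16le(DLL_HEX)

# The first half of the test string, written with escapes.
first_half = "-\U0001F608-\u014F-\u05D0-"
print(utf16le_hex(first_half))  # 2D003DD808DE2D004F012D00D0052D00
print(utf16le_hex("-??-?-?-"))  # 2D003F003F002D003F002D003F002D00
```

Note that the emoji takes two UTF-16 code units (the surrogate pair 3DD8 08DE), which is why it shows up as two question marks in the DLL rather than one.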
I even changed the output type of the Console App project to be "Class Library" and the string still got embedded correctly in that DLL file. So, for some reason the literal characters are being turned into literal question marks when compiled into a SQLCLR Assembly. I haven't yet figured out what is causing this as a quick look at the config settings and command-line flags for csc.exe seems to show them being effectively the same.
In either case, it should be clear that specifying the Arabic characters via escape sequences, while cumbersome, will at least work, hence providing a (hopefully short-term) work-around so that you can move forward on this. I will continue looking to see what could be causing this difference in behavior.
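Typing the escape sequences by hand is error-prone, so here is a small Python helper that generates them. It is only a sketch: the function name to_csharp_escapes is hypothetical, and it assumes the output will be pasted into a regular (non-verbatim) C# string literal, where \uXXXX and \UXXXXXXXX are recognized. ASCII characters, including any backslashes already in the pattern, are passed through untouched.

```python
def to_csharp_escapes(text: str) -> str:
    """Replace every non-ASCII character with a \\uXXXX escape
    (or \\UXXXXXXXX for characters beyond U+FFFF)."""
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            parts.append(ch)  # plain ASCII stays readable
        elif code_point <= 0xFFFF:
            parts.append(f"\\u{code_point:04X}")
        else:
            parts.append(f"\\U{code_point:08X}")
    return "".join(parts)

# The Arabic word from the question (written with escapes here so
# the snippet survives any file encoding).
word = "\u0634\u0645\u0627\u0631\u0647"
print(to_csharp_escapes(word))  # \u0634\u0645\u0627\u0631\u0647
```

The result contains only ASCII, so it survives being saved in Code Page 1252 or any other ASCII-compatible encoding.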
UPDATE
In order to determine if the string was being converted to an 8-bit encoding or something else, I added two characters to the test string (one in both Windows-1252 and ISO-8859-1, and one only in Windows-1252):
§ = 0xA7 in CP-1252, 0xA7 in ISO-8859-1, and 0x00A7 in UTF-16
œ = 0x9C in CP-1252, not in ISO-8859-1, and 0x0153 in UTF-16
The new test string is:
"-😈-ŏ-א-§-œ---\U0001F608-\u014F-\u05D0-\x00A7-\x0153-"
That string appears in the EXE file (Console App) as:
2D003DD808DE2D004F012D00D0052D00A7002D0053012D002D002D003DD808DE2D004F012D00D0052D00A7002D0053012D00
which is the UTF-16 LE encoding for: -😈-ŏ-א-§-œ---😈-ŏ-א-§-œ-
Yet, it appears in the DLL file (SQLCLR Assembly) as:
2D003F003F002D003F002D003F002D00A7002D0053012D002D002D003DD808DE2D004F012D00D0052D00A7002D0053012D00
which is the UTF-16 LE encoding for: -??-?-?-§-œ---😈-ŏ-א-§-œ-
So, because both § and œ came through correctly in the SQLCLR Assembly, the encoding is clearly not ISO-8859-1, which has no œ. It is either Code Page Windows-1252 or some other code page that supports both of those characters (CP-1252 being the most likely, given that my system is using it).
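The same elimination can be reproduced outside of Visual Studio. The Python sketch below is a simulation, not what the compiler or editor actually runs: the function name save_as_code_page is invented, and it assumes that each UTF-16 code unit with no equivalent in the target code page is replaced by one question mark (which matches the two question marks observed for the emoji).

```python
def save_as_code_page(text: str, codec: str) -> str:
    """Simulate saving text in a single-byte code page, writing one
    '?' per UTF-16 code unit that the code page cannot represent."""
    out = []
    for ch in text:
        try:
            out.append(ch.encode(codec).decode(codec))
        except UnicodeEncodeError:
            code_units = len(ch.encode("utf-16-le")) // 2
            out.append("?" * code_units)
    return "".join(out)

# First half of the extended test string, written with escapes.
sample = "-\U0001F608-\u014F-\u05D0-\u00A7-\u0153-"

as_cp1252 = save_as_code_page(sample, "cp1252")
as_latin1 = save_as_code_page(sample, "latin-1")
print(as_cp1252)  # -??-?-?-§-œ-
print(as_latin1)  # -??-?-?-§-?-
```

Only the Windows-1252 result keeps œ intact, which matches what appeared in the SQLCLR Assembly.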
Still investigating the root cause...
UPDATE 2
Ok, I feel kinda silly. Sometimes it helps to close a file (or the entire solution sometimes) and reopen it. Doing so I noticed that my test string now appeared as:
"-??-?-?-?-?---\U0001F608-\u014F-\u05D0-\x00A7-\x0153-"
Funny, I don't remember pasting that in ;-). So, I checked the file encoding that Visual Studio was saving it as and sure enough it was "Western European (Windows) - Codepage 1252". And just to be extra special certain, I checked the file for the Console App and it was correctly set to "Unicode (UTF-8 with signature) - Codepage 65001". D'oh! Changing the file encoding under "Save As..." to "Unicode (UTF-8 with signature) - Codepage 65001", I then replaced both the test string and the O.P.'s Regular Expression. Both came through perfectly, no errors or question marks.
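If you would rather check a source file from a script than through the "Save As..." dialog, something like the following Python sketch can help. The function name describe_source_encoding and its return labels are my own; it only inspects the raw bytes and makes no claim about how Visual Studio decides on an encoding.

```python
import codecs
from pathlib import Path

def describe_source_encoding(path: Path) -> str:
    """Classify a source file by looking at its raw bytes."""
    raw = path.read_bytes()
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # "UTF-8 with signature"
    if all(byte < 0x80 for byte in raw):
        return "ascii"  # no non-ASCII characters survive in the file
    try:
        raw.decode("utf-8")
        return "utf-8 (no signature)"
    except UnicodeDecodeError:
        return "legacy code page"  # e.g. Windows-1252

# Example usage (hypothetical file name):
# print(describe_source_encoding(Path("MyRegexFunction.cs")))
```

A result of "ascii" for a file that is supposed to contain Arabic text means the characters have already been replaced by question marks on disk. Changing the encoding afterwards will not bring them back, so the original string has to be pasted in again after switching to UTF-8 with signature.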