I have a C# regular expression to match author names in a text document that is written as:

"author":"AUTHOR'S NAME"

The regex is as follows:

new Regex("\"author\":\"[A-Za-z0-9]*\\s?[A-Za-z0-9]*")

This returns "author":"AUTHOR'S NAME. However, I don't want the quotation marks or the word "author" in front of it; I just want the name.

Could anyone help me get the expected value please?

  • Can you post the author content as it appears within the text document? Commented May 20, 2015 at 8:50
  • Unrelated, but your expression matches only a subset of names. It doesn't allow for any special characters, such as the ' in O'Connor, and it allows at most one space, no hyphens, no foreign characters, etc. I don't know your use case, but if you know the author name is going to be enclosed within the quotes, you could get away with just accepting anything that isn't a double quote: \"author\":\"([^\"]+)\". Commented May 20, 2015 at 8:58
  • Thanks David. Good point, I hadn't considered that. Gibbs, the author's name(s) will appear in quotation marks after the text I've managed to find, as shown above. Commented May 20, 2015 at 9:41

2 Answers

Use regex groups to get a part of the string. Parentheses ( ) define a capture group, which can be accessed through the match's .Groups property.

.Groups[0] holds the whole match

.Groups[1] holds the first capture group (and so on)

string pattern = "\"author\":\"([A-Za-z0-9]*\\s?[A-Za-z0-9]*)\"";
var match = Regex.Match("\"author\":\"Name123\"", pattern);
string authorName = match.Groups[1].Value; // .Value gives the captured text; a Group is not implicitly a string
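
For reference, here is a minimal self-contained sketch of the capture-group approach. The class and method names (AuthorExtractor, ExtractAuthor) are hypothetical, and the pattern uses the more permissive [^"]+ suggested in the comments on the question rather than the original character classes, so treat it as one possible variant rather than the definitive pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AuthorExtractor
{
    // Hypothetical helper: returns the author name, or an empty string
    // when the input contains no "author":"..." pair.
    public static string ExtractAuthor(string input)
    {
        // Group 1 captures everything between the quotes after "author":
        Match match = Regex.Match(input, "\"author\":\"([^\"]+)\"");

        // Check Success before reading the group, so a failed match
        // is reported explicitly instead of silently yielding "".
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```

Calling AuthorExtractor.ExtractAuthor on the sample text from the question would give just the name, without the quotes or the "author" key.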
You can also use look-around approach to only get a match value:

var txt = "\"author\":\"AUTHOR'S NAME\"";
var rgx = new Regex(@"(?<=""author"":"")[^""]+(?="")");
var result = rgx.Match(txt).Value;

This regex runs at about 555,020 iterations per second on this input string, which should suffice.

result will be AUTHOR'S NAME.

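(?<="author":") is a look-behind that checks that "author":" comes immediately before the match; [^"]+ matches one or more characters other than a double quote, which looks safe since you only want to match alphanumerics and spaces between the quotes; and (?=") is a look-ahead that checks for the trailing quote.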
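
If the document contains several authors, the same look-around pattern can be applied with Regex.Matches. The sketch below assumes that use case; the class and method names (AuthorLookaround, FindAuthors) are hypothetical and not part of the original question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AuthorLookaround
{
    // Same look-around pattern as above: only the name itself is consumed,
    // so Match.Value is already the bare author name.
    private static readonly Regex AuthorRegex =
        new Regex(@"(?<=""author"":"")[^""]+(?="")");

    // Hypothetical helper: collects every author name found in the text.
    public static List<string> FindAuthors(string text)
    {
        var names = new List<string>();
        foreach (Match m in AuthorRegex.Matches(text))
        {
            names.Add(m.Value);
        }
        return names;
    }
}
```

Because the look-behind and look-ahead are zero-width, no capture group is needed and each match value can be used directly.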

3 Comments

Does it work for you or do you need more assistance?
Not at my computer right now, will let you know when I am.
Sorry to bother, did you have time to check my approach? BTW, if there can be spaces around :, we can enhance the look-behind as @"(?<=""author""\s*:\s*"")[^""]+(?="")".
