
I am trying to use Regex.Replace to convert a string into Pascal case. Regex isn't required, but I thought it might make this easier. Here are some example test cases I'm trying to convert:

simple simon says       => SimpleSimonSays
SIMPLE SIMON SaYs       => SimpleSimonSays
simple_simon_says       => SimpleSimonSays
simple    simon    says => SimpleSimonSays
simpleSimonSays         => SimpleSimonSays
simple___simon___  says => SimpleSimonSays

The method I currently have doesn't use regex, and it works correctly on all of the examples above except one:

internal static string GetPascalCaseName(string name)
{
    string s = System.Globalization.CultureInfo.CurrentCulture.
               TextInfo.ToTitleCase(name.ToLower()).Replace(" ", "").Replace("_", "");

    return s;
}

The one example that fails is simpleSimonSays: it currently returns Simplesimonsays instead of SimpleSimonSays. How can I make this work for all of these cases?

EDIT

Words are separated by spaces, by underscores, or by an upper-case character (which starts a new word). Multiple spaces and/or multiple underscores should be treated as a single separator. In other words, spaces and underscores should simply be dropped and treated as a signal that the next letter should be upper case. Like this:

simple_____simon___   says => SimpleSimonSays
  • How will you determine where 'simple' and 'simon', as well as 'simon' and 'says', start and finish? I think that is the real issue: how do you determine where one word ends and the next begins for casing if it is a single string with nothing marking the start or end of each word? Commented Nov 12, 2018 at 19:00
  • As for single words like simpleSimonSays, there is no boundary from which to infer a case separation. So, unless you're using natural language processing, regex is never going to do that. Commented Nov 12, 2018 at 19:15
  • Otherwise, use \b([^\W_]+)(?:[ _]*([^\W_]+))*\b and read the capture collections inside a delegate callback. Commented Nov 12, 2018 at 19:20
  • Think of the single-word scenario like this: instead of simpleSimonSays, use pkrltUdrXywaT. Commented Nov 12, 2018 at 19:23
  • @Ingenioushax - I updated my question. Words are separated by spaces, by underscores, or whenever an upper-case character is reached (assume it's the start of a new word). Commented Nov 12, 2018 at 21:33
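The regex suggested in the comments above can be used roughly like this. This is a minimal sketch under assumptions: the class name `CaptureDemo` and the helper `Capitalize` are hypothetical, and only the pattern itself comes from the comment. Group 2 sits inside a repeated group, so within one match its `Captures` collection holds every word after the first, and the `MatchEvaluator` lambda capitalizes and joins them.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Pattern from the comment: group 1 is the first word; group 2 repeats, so
    // its Captures collection holds each later word that follows spaces/underscores.
    private static readonly Regex WordRun =
        new Regex(@"\b([^\W_]+)(?:[ _]*([^\W_]+))*\b");

    // Hypothetical helper: upper-case the first character, lower-case the rest.
    private static string Capitalize(string word) =>
        char.ToUpperInvariant(word[0]) + word.Substring(1).ToLowerInvariant();

    public static string ToPascalCase(string input) =>
        WordRun.Replace(input, m =>
            // Join the single group-1 capture with all group-2 captures, in order.
            string.Concat(
                m.Groups[1].Captures.Cast<Capture>()
                    .Concat(m.Groups[2].Captures.Cast<Capture>())
                    .Select(c => Capitalize(c.Value))));
}
```

As the earlier comments predict, a single camel-case word such as simpleSimonSays is one capture, so this sketch turns it into Simplesimonsays; it only handles the space and underscore separators.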

3 Answers


Here's a trick that solves your problem: for camel-case words that contain no spaces (like simpleSimonSays), use a regex to insert a space inside the word wherever a new word starts. Modify your method to this:

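// Requires: using System.Text.RegularExpressions;
internal static string GetPascalCaseName(string name)
{
    // Only split camel case when the input has no spaces; otherwise
    // "SaYs" in "SIMPLE SIMON SaYs" would be split into "Sa Ys".
    if (!name.Contains(" ")) {
        // Insert a space between a lower-case letter and the upper-case letter after it.
        name = Regex.Replace(name, "(?<=[a-z])(?=[A-Z])", " ");
    }
    string s = System.Globalization.CultureInfo.CurrentCulture.
               TextInfo.ToTitleCase(name.ToLower()).Replace(" ", "").Replace("_", "");

    return s;
}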

This new line in your method,

name = Regex.Replace(name, "(?<=[a-z])(?=[A-Z])", " ");

splits the camel-case word by inserting a space between its parts, which makes it look like the other inputs that already worked.

For this input,

simpleSimonSays

It outputs this,

SimpleSimonSays

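For the rest of the inputs it works as before. It also handles words that mix camel case with underscores (for example simple_simonSays becomes SimpleSimonSays), but because of the Contains(" ") check, input that mixes camel case with spaces, such as simple simonSays, is not split and becomes SimpleSimonsays.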
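A self-contained version of the method above, checked against all six inputs from the question, might look like this. It is a sketch: the class name `PascalCaseDemo` is hypothetical, and it swaps in `CultureInfo.InvariantCulture` and `ToLowerInvariant` (not part of the original answer) so the result does not depend on the machine's culture. Under a Turkish culture, for example, `"SIMPLE".ToLower()` yields a dotless ı. The `Contains(" ")` guard and the lookaround pattern come from the answer.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class PascalCaseDemo
{
    public static string GetPascalCaseName(string name)
    {
        // Only split camel case when there are no spaces; otherwise
        // "SaYs" in "SIMPLE SIMON SaYs" would become "Sa Ys".
        if (!name.Contains(" "))
        {
            // Zero-width match between a lower-case and an upper-case ASCII letter.
            name = Regex.Replace(name, "(?<=[a-z])(?=[A-Z])", " ");
        }

        // Assumption: invariant culture instead of CurrentCulture.
        TextInfo textInfo = CultureInfo.InvariantCulture.TextInfo;

        // ToTitleCase treats spaces and underscores as word separators,
        // so both can be removed once every word is capitalized.
        return textInfo.ToTitleCase(name.ToLowerInvariant())
                       .Replace(" ", "")
                       .Replace("_", "");
    }
}
```

Input that mixes spaces with camel case shows the limit of the space check: GetPascalCaseName("simple simonSays") returns SimpleSimonsays.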


4 Comments

How should pkrltUdrXywaT be capitalized?
This almost works. It does not work for my second example though. SIMPLE SIMON SaYs becomes SimpleSimonSaYs instead of SimpleSimonSays
@sln: I don't see any trouble there. pkrltUdrXywaT would simply become PkrltUdrXywaT where only the first letter changes and becomes capital letter. Did you expect something else?
@Icemanind: Sorry, I missed the one case that wasn't working. I've taken care of your 'SIMPLE SIMON SaYs' input, and it now gives your desired result, 'SimpleSimonSays'. Please check my updated code. If you have any other cases, let me know and I will tweak the code further.

Here is a solution without regex. The last case (simpleSimonSays) cannot be handled this way.

// Requires: using System; using System.Linq;
string[] input = {
    "simple simon says",
    "SIMPLE SIMON SaYs",
    "simple_simon_says",
    "simple    simon    says",
    "simpleSimonSays"
};

// Split each input on spaces and underscores (dropping empty pieces),
// upper-case the first character of each piece, lower-case the rest, and join.
string[] output = input
    .Select(x => string.Join("", x
        .Split(new[] { ' ', '_' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(w => char.ToUpper(w[0]) + w.Substring(1).ToLower())))
    .ToArray();
// output[4] is "Simplesimonsays": camel case is not split.


If the input can look like "abc simpleSimonSays", then it's impossible, unless you add more rules or use something like deep learning :)
EDIT:
Possible code (which does not handle "abc simpleSimonSays"):

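// Requires: using System.Globalization; using System.Text.RegularExpressions;
var s = "simple__simon_says __ Hi _ _,,, __coolWa";

// Collapse every run of spaces, underscores and commas into a single space.
var s1 = Regex.Replace(s, "[ _,]+", " ");
// Lower-case first: ToTitleCase leaves all-upper-case words such as "SIMPLE" unchanged.
var s2 = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(s1.ToLower());
// Remove the remaining spaces.
var s3 = s2.Replace(" ", "");

// s1 = "simple simon says Hi coolWa"
// s2 = "Simple Simon Says Hi Coolwa"
// s3 = "SimpleSimonSaysHiCoolwa"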

2 Comments

Words are distinguished if there are spaces separating them, or underscores, or whenever an upper-case character is reached (assume it's the start of a new word). So in your example, it should be "AbcSimpleSimonSays".
I mean the difference between a) SIMPLE SIMON SaYs => SimpleSimonSays and b) abc simpleSimonSays => AbcSimpleSimonSays. A program can't know why in a) the "Y" should become "y" but in b) the "S" should stay upper case. A human understands, but to a program both are just an upper-case letter after a lower-case one.
