C# - Efficient search and replace char array in string

Question

I have, for example:

string str = "Àpple";
string strNew = "";
char[] A = {'À','Á','Â','Ä'};
char[] a = {'à','á','â','ä'};

I want to scan str and, wherever one of these accented characters is found, replace it with the plain ASCII 'A' (or 'a' for the lowercase variants). So the result should be:

strNew = "Apple";

Here is my code:

for (int i = 0; i < str.Length; i++)
{
    if (str[i].CompareTo(A))
        strNew += 'A';
    else if (str[i].CompareTo(a))
        strNew += 'a';
    else
        strNew += str[i];
}

But CompareTo doesn't work here, so what other function can I use?

It looks like you are trying to strip diacritics. Check out this answer for info on how to do it efficiently and reliably for all Unicode characters, not only the A's.
– Sergey Kalinichenko, Commented Jun 19, 2012 at 18:01
Possible duplicate of How do I remove diacritics (accents) from a string in .NET?
– arcain, Commented Jun 19, 2012 at 18:09

Jon Skeet · Accepted Answer · 2012-06-19 18:02:33Z

5

It sounds like you could just use:

if (A.Contains(str[i]))   // Contains on a char[] is LINQ's Enumerable.Contains: needs using System.Linq;

but there are certainly more efficient ways of doing this. In particular, avoid string concatenation in a loop.
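The reason to avoid concatenation: C# strings are immutable, so each `strNew += ...` allocates a new string and copies everything built so far, which makes the loop quadratic in the input length. A minimal sketch using `System.Text.StringBuilder` instead, assuming the same `A`/`a` arrays as the question (the helper name `ReplaceAccentedA` is hypothetical):

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: replaces the accented A/a variants from the question
// with plain ASCII 'A'/'a', appending into a single growable buffer.
static string ReplaceAccentedA(string str)
{
    char[] A = { 'À', 'Á', 'Â', 'Ä' };
    char[] a = { 'à', 'á', 'â', 'ä' };

    // Pre-size the builder: the output has exactly as many chars as the input.
    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
    {
        if (A.Contains(c))
            sb.Append('A');
        else if (a.Contains(c))
            sb.Append('a');
        else
            sb.Append(c);
    }
    return sb.ToString();
}

Console.WriteLine(ReplaceAccentedA("Àpple")); // Apple
```

Each `Append` writes into the builder's internal buffer, so the whole loop is linear in the input length.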

My guess is that there are Unicode normalization approaches which don't require you to hard-code all this data, too. I'm sure I remember one somewhere, around encoding fallbacks, but I can't put my finger on it... EDIT: I suspect it's around String.Normalize - worth a look, at least.

At the very least, this would be more efficient:

char[] mutated = new char[str.Length];
for (int i = 0; i < str.Length; i++)
{
    // You could use a local variable to avoid calling the indexer three
    // times if you really want...
    mutated[i] = A.Contains(str[i]) ? 'A'
               : a.Contains(str[i]) ? 'a'
               : str[i];
}
string strNew = new string(mutated);
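The `Contains` calls above scan each array linearly for every input character. If the replacement table grows, one option (a variation of my own, not part of the original answer) is to build a `Dictionary<char, char>` once and do a single lookup per character. A minimal sketch; `map` and `ReplaceWithMap` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table built once: accented char -> ASCII replacement.
var map = new Dictionary<char, char>();
foreach (char c in new[] { 'À', 'Á', 'Â', 'Ä' }) map[c] = 'A';
foreach (char c in new[] { 'à', 'á', 'â', 'ä' }) map[c] = 'a';

// Hypothetical helper: one dictionary lookup per input character.
string ReplaceWithMap(string str)
{
    char[] mutated = new char[str.Length];
    for (int i = 0; i < str.Length; i++)
    {
        char current = str[i]; // read the indexer once
        // Keep the original character when the table has no mapping for it.
        mutated[i] = map.TryGetValue(current, out char replacement) ? replacement : current;
    }
    return new string(mutated);
}

Console.WriteLine(ReplaceWithMap("Àpple")); // Apple
```

Adding more accented letters then only means adding entries to the table, not more `else if` branches.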

edited Jun 19, 2012 at 18:02

answered Jun 19, 2012 at 17:57

Jon Skeet



3 Comments

Sergey Kalinichenko Over a year ago

It's a little more work than String.Normalize: you need to remove the non-spacing characters (the combining marks) after normalization. Here is a link to an answer on removing diacritics.
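A minimal sketch of that normalization approach: decompose with `NormalizationForm.FormD`, drop every character whose `CharUnicodeInfo.GetUnicodeCategory` is `UnicodeCategory.NonSpacingMark`, then recompose with `FormC`. The helper name `RemoveDiacritics` is hypothetical. Note that it only strips combining marks; letters that have no canonical decomposition (for example 'ø' or 'ß') come through unchanged.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: removes combining diacritical marks from any string.
static string RemoveDiacritics(string text)
{
    // FormD splits e.g. 'À' (U+00C0) into 'A' + U+0300 COMBINING GRAVE ACCENT.
    string decomposed = text.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder(decomposed.Length);
    foreach (char c in decomposed)
    {
        // Skip the combining marks; keep every other character.
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }
    // Recompose whatever remains into its canonical composed form.
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

Console.WriteLine(RemoveDiacritics("Àpple")); // Apple
```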

Benk Over a year ago

Thanks Jon, can you please tell me why it's bad to do string concatenation in a loop?

Jon Skeet Over a year ago

@dasblinkenlight: Yes, it's not just a single call, but that's the crux of it. There are simpler alternatives to explicitly calling GetUnicodeCategory on each character yourself, e.g. using an ASCII encoding with a replacement fallback of "".
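A sketch of that encoding-fallback alternative, assuming the string is first decomposed with FormD so that each accent becomes a separate combining character. Encoding to ASCII with an empty replacement fallback then silently drops every character ASCII cannot represent. The caveat is that this drops all non-ASCII characters, not only accents: 'ß', '€' or CJK text would disappear entirely. The names `asciiDropping` and `ToAsciiDroppingMarks` are hypothetical.

```csharp
using System;
using System.Text;

// ASCII encoding whose fallbacks replace any unencodable char with nothing.
Encoding asciiDropping = Encoding.GetEncoding(
    "us-ascii",
    new EncoderReplacementFallback(string.Empty),
    new DecoderReplacementFallback(string.Empty));

// Hypothetical helper: decompose, then round-trip through the lossy encoding.
string ToAsciiDroppingMarks(string text)
{
    // 'À' becomes 'A' + U+0300; the combining mark is not ASCII, so it is dropped.
    string decomposed = text.Normalize(NormalizationForm.FormD);
    byte[] bytes = asciiDropping.GetBytes(decomposed);
    return asciiDropping.GetString(bytes);
}

Console.WriteLine(ToAsciiDroppingMarks("Àpple")); // Apple
```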

Samy Arous · Accepted Answer · 2012-06-19 17:57:56Z

2

This should work:

for (int i = 0; i < str.Length; i++)
{
    // Contains on a char[] is LINQ's Enumerable.Contains (needs using System.Linq;)
    if (A.Contains(str[i]))
        strNew += 'A';
    else if (a.Contains(str[i]))
        strNew += 'a';
    else
        strNew += str[i];
}

answered Jun 19, 2012 at 17:57

Samy Arous



Marcel N. · Accepted Answer · 2012-06-19 18:24:57Z

0

Try a regex (first replace with "A", then with "a"):

string result = Regex.Replace("Àpple", "([ÀÁÂÄ])", "A", RegexOptions.None);

And then you can do the same for "a".
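Putting both passes together might look like the sketch below. A plain character class is enough; the capturing group in the pattern above is not needed. The helper name `ReplaceWithRegex` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: two Regex.Replace passes, one per letter case.
static string ReplaceWithRegex(string input)
{
    // Character class matching any of the accented capital A variants.
    string result = Regex.Replace(input, "[ÀÁÂÄ]", "A");
    // Same for the lowercase variants.
    return Regex.Replace(result, "[àáâä]", "a");
}

Console.WriteLine(ReplaceWithRegex("Àpple")); // Apple
```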

edited Jun 19, 2012 at 18:24

answered Jun 19, 2012 at 18:06

Marcel N.

