
For example, I have:

string str = "Àpple";
string strNew = "";
char[] A = {'À','Á','Â','Ä'};
char[] a = {'à','á','â','ä'};

I want to scan str and, wherever a character from one of these arrays is found, replace it with the plain ASCII letter ('A' or 'a'). So the result should be:

strNew = "Apple";

Here is my code:

for (int i = 0; i < str.Length; i++)
{ 
    if(str[i].CompareTo(A))
       strNew += 'A'
    else if(str[i].CompareTo(a)) 
       strNew +='a'
    else
       strNew += str[i];
}

But CompareTo doesn't work here, so what other function can I use?

3
  • Look-up table and StringBuilder. Less code and faster. Commented Jun 19, 2012 at 17:59
  • It looks like you are trying to strip diacritics. Check out this answer for info on how to do it efficiently and reliably for all UNICODE characters, not only As. Commented Jun 19, 2012 at 18:01
  • possible duplicate of How do I remove diacritics (accents) from a string in .NET? Commented Jun 19, 2012 at 18:09

3 Answers

5

It sounds like you could just use (with using System.Linq; in scope, since Contains on a char[] is a LINQ extension method):

if (A.Contains(str[i]))

but there are certainly more efficient ways of doing this. In particular, avoid string concatenation in a loop.

My guess is that there are Unicode normalization approaches which don't require you to hard-code all this data, too. I'm sure I remember one somewhere, around encoding fallbacks, but I can't put my finger on it... EDIT: I suspect it's around String.Normalize - worth a look, at least.

At the very least, this would be more efficient:

char[] mutated = new char[str.Length];
for (int i = 0; i < str.Length; i++)
{
    // You could use a local variable to avoid calling the indexer three
    // times if you really want...
    mutated[i] = A.Contains(str[i]) ? 'A'
               : a.Contains(str[i]) ? 'a'
               : str[i];
}
string strNew = new string(mutated);
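
As a rough sketch of the String.Normalize idea mentioned above (and of the follow-up step noted in the comments, removing the non-spacing marks after normalization), something like the following might work. The class and method names here are made up for illustration and are not part of the original answer.

```csharp
using System.Globalization;
using System.Text;

static class DiacriticsSketch
{
    // Hypothetical helper: strips combining accents from any string,
    // not just the 'A' variants listed in the question.
    public static string RemoveDiacritics(string input)
    {
        // FormD decomposes 'À' into 'A' followed by a combining grave accent.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Recompose whatever is left into its canonical composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Calling DiacriticsSketch.RemoveDiacritics("Àpple") should give "Apple". Letters that have no canonical decomposition (for example 'ø') are left unchanged by this approach, so it is not a general "convert to ASCII" routine.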

3 Comments

It's a little more work than String.Normalize - you need to remove non-spacing characters after normalization. Here is a link to an answer on removing diacritics.
thx Jon, can you plz tell me why is it bad to do string concatenation in a loop?
@dasblinkenlight: Yes, it's not just a single call - but that's the crux of it. There are simpler alternatives to explicitly calling GetUnicodeCategory on each character yourself, e.g. using an ASCII encoding with a replacement fallback of "".
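
The encoding-fallback alternative from the last comment could look roughly like this. It is only a sketch: the class and method names are invented here, and the approach assumes it is acceptable to drop every non-ASCII character, not just accents.

```csharp
using System.Text;

static class AsciiFallbackSketch
{
    // Hypothetical helper: decompose, then let the ASCII encoder silently
    // drop anything it cannot represent (including the combining accents).
    public static string ToAscii(string input)
    {
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty), // replace unencodable chars with ""
            new DecoderExceptionFallback());

        // After FormD, 'À' is 'A' plus a combining mark; only the mark is non-ASCII.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        byte[] bytes = ascii.GetBytes(decomposed);
        return ascii.GetString(bytes);
    }
}
```

AsciiFallbackSketch.ToAscii("Àpple") should give "Apple". Unlike filtering on GetUnicodeCategory, this also removes characters that have no ASCII base letter at all, which may or may not be what you want.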
2

This should work:

for (int i = 0; i < str.Length; i++)
{
    if (A.Contains(str[i]))
        strNew += 'A';
    else if (a.Contains(str[i]))
        strNew += 'a';
    else
        strNew += str[i];
}
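
On the question of why concatenation in a loop is discouraged: strings are immutable, so each += allocates a new string and copies everything built so far, which adds up to quadratic work on long inputs. A minimal sketch of the look-up table plus StringBuilder approach suggested in the comments might look like this (the class name and the table are assumptions covering only the characters from the question):

```csharp
using System.Collections.Generic;
using System.Text;

static class LookupSketch
{
    // Hypothetical table: maps each accented character to its plain letter.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        ['À'] = 'A', ['Á'] = 'A', ['Â'] = 'A', ['Ä'] = 'A',
        ['à'] = 'a', ['á'] = 'a', ['â'] = 'a', ['ä'] = 'a',
    };

    public static string Replace(string str)
    {
        // StringBuilder appends into one growing buffer instead of
        // creating a new string on every iteration.
        var sb = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            // Use the mapped letter if there is one, otherwise keep the character.
            sb.Append(Map.TryGetValue(c, out char plain) ? plain : c);
        }
        return sb.ToString();
    }
}
```

LookupSketch.Replace("Àpple") should give "Apple", with one dictionary look-up per character instead of two array scans.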


0

Try a regex: first replace the uppercase variants with "A", and then the lowercase ones with "a":

string result = Regex.Replace("Àpple", "([ÀÁÂÄ])", "A", RegexOptions.None);

And then you can do the same for "a".
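
Put together, the two passes might look like the sketch below. The class and method names are made up for illustration, and the character classes cover only the characters from the question.

```csharp
using System.Text.RegularExpressions;

static class RegexSketch
{
    // Hypothetical helper: two Regex.Replace passes, one per letter case.
    public static string ReplaceAccentedA(string input)
    {
        // First pass: uppercase variants become 'A'.
        string upperDone = Regex.Replace(input, "[ÀÁÂÄ]", "A");
        // Second pass: lowercase variants become 'a'.
        return Regex.Replace(upperDone, "[àáâä]", "a");
    }
}
```

RegexSketch.ReplaceAccentedA("Àpple") should give "Apple". The capturing parentheses from the pattern above are not needed, because the replacement never refers to the group.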
