Another alternative is to use a modified lexer of sorts to isolate each discrete region in your text where a replacement is warranted, and to mark that block so that replacements aren't run on it again.
Here's an example of how you'd do that:
First, we'll create a class that indicates whether a particular string has been used or not:
// Wraps one fragment of the input text together with a flag recording whether
// a replacement has already been applied to it; used fragments are skipped by
// later token rules so the same text is never rewritten twice.
public class UsageIndicator
{
    // The raw text fragment this indicator wraps.
    public string Value { get; private set; }

    // True once a replacement has been run on this fragment.
    public bool IsUsed { get; private set; }

    public UsageIndicator(string value, bool isUsed)
    {
        Value = value;
        IsUsed = isUsed;
    }

    // Render as the raw fragment so joining a list of indicators rebuilds the text.
    public override string ToString() => Value;
}
Then we'll define a class that represents both how to locate a "token" in your text and what to do once it's been found:
// Pairs a regex that locates a "token" in the text with a mutator that rewrites
// the matched text. Regions already marked as used are never matched again, so
// earlier-registered rules effectively claim their text before later ones run.
public class TokenOperation
{
// Regex used to locate occurrences of this token.
public Regex Pattern { get; private set; }
// Transformation applied to each matched substring.
public Func<string, string> Mutator { get; private set; }
public TokenOperation(string pattern, Func<string, string> mutator)
{
Pattern = new Regex(pattern);
Mutator = mutator;
}
// Splits `source` into up to three regions around the match at [index, index+length):
// an unused head (if non-empty), the mutated match (marked used), and an unused
// tail (if non-empty). `matchedIndex` is the position of the mutated region within
// the returned list: 1 when a head region precedes it, otherwise 0.
private List<UsageIndicator> ExtractRegions(string source, int index, int length, out int matchedIndex)
{
var result = new List<UsageIndicator>();
var head = source.Substring(0, index);
matchedIndex = 0;
if (head.Length > 0)
{
result.Add(new UsageIndicator(head, false));
matchedIndex = 1;
}
var body = source.Substring(index, length);
// Apply the rewrite now; the region is marked used so no later rule touches it.
body = Mutator(body);
result.Add(new UsageIndicator(body, true));
var tail = source.Substring(index + length);
if (tail.Length > 0)
{
result.Add(new UsageIndicator(tail, false));
}
return result;
}
// Scans the region list and replaces each match of Pattern in an unused region
// with its head/mutated-body/tail split, in place. Mutates `source` while
// iterating by index, which is safe because of the explicit index adjustment below.
public void Match(List<UsageIndicator> source)
{
for (var i = 0; i < source.Count; ++i)
{
// Never re-match text an earlier rule (or this one) already rewrote.
if (source[i].IsUsed)
{
continue;
}
var value = source[i];
var match = Pattern.Match(value.Value);
if (match.Success)
{
int modifyIBy;
// Replace the matched region with its split parts at the same position.
source.RemoveAt(i);
var regions = ExtractRegions(value.Value, match.Index, match.Length, out modifyIBy);
for (var j = 0; j < regions.Count; ++j)
{
source.Insert(i + j, regions[j]);
}
// Jump to the mutated region; the loop's ++i then lands on the tail (if any)
// so the remainder is re-scanned. Skipping the head is safe: Pattern.Match
// returned the FIRST match, so the head cannot contain another one.
i += modifyIBy;
}
}
}
}
After taking care of those things, putting something together to do the replacement is pretty simple:
// Applies an ordered list of token rules to an input string. Rules registered
// earlier claim their regions first, so later rules cannot rewrite text that an
// earlier rule has already consumed.
public class Rewriter
{
    private readonly List<TokenOperation> _definitions = new List<TokenOperation>();

    // Register a regex rule along with the rewrite to apply to each match.
    public void AddPattern(string pattern, Func<string, string> mutator)
    {
        _definitions.Add(new TokenOperation(pattern, mutator));
    }

    // Register a verbatim string that is always replaced with the same text.
    public void AddLiteral(string pattern, string replacement)
    {
        AddPattern(Regex.Escape(pattern), x => replacement);
    }

    // Run every registered rule, in registration order, over the input and
    // reassemble the resulting regions into the rewritten string.
    public string Rewrite(string value)
    {
        var regions = new List<UsageIndicator> { new UsageIndicator(value, false) };
        foreach (var definition in _definitions)
        {
            definition.Match(regions);
        }
        return string.Concat(regions);
    }
}
In the demo code (below), keep in mind that the order in which pattern or literal expressions are added is important: the rules added first get tokenized first. So, to prevent the `://` in the URL from being picked up as an emoticon plus a slash, we process the image block first — it contains the URL between its tags, and that region is marked as used before the emoticon rule can claim it.
// Demo: registration order matters — the [img]...[/img] rule runs first so the
// URL between its tags is marked used before the emoticon literals can see it.
class Program
{
    static void Main(string[] args)
    {
        var rewriter = new Rewriter();

        // Convert BBCode image tags first so their URLs are claimed as used.
        rewriter.AddPattern(@"\[img\].*?\[/img\]", x => x.Replace("[img]", "<img src=\"").Replace("[/img]", "\"/>"));

        // Emoticon literals run afterwards, only on text still unused.
        rewriter.AddLiteral(":/", "<img src=\"emote-sigh.png\"/>");
        rewriter.AddLiteral(":(", "<img src=\"emote-frown.png\"/>");
        rewriter.AddLiteral(":P", "<img src=\"emote-tongue.png\"/>");

        const string input = "Stacks be [img]http://example.com/overflowing.png[/img] :/";
        Console.WriteLine(rewriter.Rewrite(input));
    }
}
The sample prints:
Stacks be <img src="http://example.com/overflowing.png"/> <img src="emote-sigh.png"/>