
What is a prudent approach to performing multiple String.Replace calls without replacing text that has already been replaced? For example, say I have this string:

str = "Stacks be [img]http://example.com/overflowing.png[/img] :/";

A Regex I wrote matches the [img]url[/img] and lets me replace it with the proper HTML <img> formatting:

str = "Stacks be <img src=\"http://example.com/overflowing.png\"/> :/";

Afterwards, I use String.Replace to replace emoticon codes (:/, :(, :P, etc.) with <img> tags. However, there are unintended results:

Intended Result

str = "Stacks be <img src=\"http://example.com/overflowing.png\"/> " + 
    "<img src=\"emote-sigh.png\"/>";

Actual (and obvious) Result

str = "Stacks be <img src=\"http<img src=\"emote-sigh.png\"/>" +
    "/example.com/overflowing.png\"/> " +
    "<img src=\"emote-sigh.png\"/>";

Unfortunately, with the number of replacements I plan to make, it seems impractical to try to do it all in a single regular expression (though I'd imagine that would be the most performant solution). What is a (slower but) more maintainable way to do this?

  • Please show the replace code that you are using. Commented Oct 8, 2013 at 5:59
  • It feels to me like moving to a system where you actually parse the different parts of the input text would be more useful. Then you'd know what's an image URL, and what's text. You can then perform replacements on just the right bits... Commented Oct 8, 2013 at 6:03
  • @JonSkeet Strongly agree. Doing it this way should also produce the most maintainable result: all you have to do is specify the priority of the tokens to match and what to do with them when they're found, rather than worrying about interactions between replacements. Commented Oct 8, 2013 at 6:40

8 Answers

3

Unfortunately, with the number of replacements I plan to make, it seems impractical to try to do it all in a single regular expression (though I'd imagine that would be the most performant solution). What is a (slower but) more maintainable way to do this?

It might seem so, but it isn't. Take a look at this article.

tl;dr: Regex.Replace has overloads that take a MatchEvaluator delegate instead of a replacement string. So match on a pattern that is a disjunction (alternation) of all the different things you want to replace simultaneously, and in the delegate use a Dictionary, a switch, or a similar strategy to select the correct replacement for the current match.

The strategy in the article depends on the keys being static strings; if the keys contain regex operators, the concept fails. There is a better way: wrap each key in capturing parentheses, then test which capture group succeeded to see which alternative matched.
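A minimal sketch of that idea in C#, assuming a small hypothetical emoticon table (the class name SinglePassRewriter, the dictionary contents and the image file names are illustrative, not taken from the article). One regex alternates between the [img] rule and the escaped emoticon codes, and named groups tell the MatchEvaluator which rule fired:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SinglePassRewriter
{
    // Hypothetical emoticon table: code -> image file name.
    private static readonly Dictionary<string, string> Emoticons = new Dictionary<string, string>
    {
        { ":/", "emote-sigh.png" },
        { ":(", "emote-frown.png" },
        { ":P", "emote-tongue.png" },
    };

    // One pattern: either a whole [img]...[/img] tag (named group "img", URL in "url")
    // or one of the escaped emoticon codes (named group "emote").
    private static readonly Regex Pattern = new Regex(
        @"(?<img>\[img\](?<url>.*?)\[/img\])|(?<emote>" + BuildEmoteAlternation() + ")");

    private static string BuildEmoteAlternation()
    {
        var escaped = new List<string>();
        foreach (var code in Emoticons.Keys)
        {
            escaped.Add(Regex.Escape(code)); // ":(" becomes ":\(" and so on
        }
        return string.Join("|", escaped);
    }

    public static string Rewrite(string input)
    {
        // The MatchEvaluator checks which named group succeeded to choose the replacement.
        return Pattern.Replace(input, m =>
        {
            if (m.Groups["img"].Success)
            {
                return "<img src=\"" + m.Groups["url"].Value + "\"/>";
            }
            return "<img src=\"" + Emoticons[m.Groups["emote"].Value] + "\"/>";
        });
    }
}
```

Because the input is scanned once and the engine resumes after each match, the :/ inside the [img] URL is consumed as part of the tag and never offered to the emoticon alternatives, and text produced by one replacement is never rescanned by another.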


Comments

3

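The most obvious approach would be to use a regular expression to replace whatever text you need. In short, you could use a regex with a negative lookahead, such as :/(?!/), to match :/ but not ://. (A pattern like :/[^/] also excludes ://, but it consumes the following character and cannot match a :/ at the very end of the string, as in your example.)

You could also use capture groups to tell which pattern matched, and therefore which replacement to insert.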
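A short sketch of the lookahead idea, assuming the [img] tags are converted first and showing only the :/ emoticon for brevity (the class name LookaheadDemo is hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    public static string Rewrite(string input)
    {
        // Step 1: turn [img]url[/img] into an <img> tag; the lazy .*? stops at the first [/img].
        string html = Regex.Replace(input, @"\[img\](.*?)\[/img\]", "<img src=\"$1\"/>");

        // Step 2: replace ":/" only when it is NOT followed by another "/".
        // The negative lookahead (?!/) inspects the next character without consuming it,
        // so "http://" is skipped and a ":/" at the very end of the string still matches.
        return Regex.Replace(html, ":/(?!/)", "<img src=\"emote-sigh.png\"/>");
    }
}
```

Only the :// case is guarded here; other emoticons could be added as further alternatives, using capture groups to pick the right image.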

Comments

2

Another alternative is to use a kind of modified lexer to isolate each discrete region of your text where a certain replacement is warranted, and mark that region so that later replacements don't run on it again.

Here's an example of how you'd do that:

First, we'll create a class that indicates whether a particular string has already been used (replaced):

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class UsageIndicator
{
    public string Value { get; private set; }

    public bool IsUsed { get; private set; }

    public UsageIndicator(string value, bool isUsed)
    {
        Value = value;
        IsUsed = isUsed;
    }

    public override string ToString()
    {
        return Value;
    }
}

Then we'll define a class that represents both how to locate a "token" in your text and what to do once it has been found:

public class TokenOperation
{
    public Regex Pattern { get; private set; }

    public Func<string, string> Mutator { get; private set; }

    public TokenOperation(string pattern, Func<string, string> mutator)
    {
        Pattern = new Regex(pattern);
        Mutator = mutator;
    }

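    // Splits source into up to three regions: the untouched text before the match,
    // the mutated match (marked as used), and the untouched text after it.
    // matchedIndex receives the position of the mutated region in the returned list.
    private List<UsageIndicator> ExtractRegions(string source, int index, int length, out int matchedIndex)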
    {
        var result = new List<UsageIndicator>();
        var head = source.Substring(0, index);
        matchedIndex = 0;

        if (head.Length > 0)
        {
            result.Add(new UsageIndicator(head, false));
            matchedIndex = 1;
        }

        var body = source.Substring(index, length);
        body = Mutator(body);
        result.Add(new UsageIndicator(body, true));

        var tail = source.Substring(index + length);

        if (tail.Length > 0)
        {
            result.Add(new UsageIndicator(tail, false));
        }

        return result;
    }

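    // Applies this operation to every region that an earlier operation has not
    // already claimed, replacing each match with its head/body/tail split.
    public void Match(List<UsageIndicator> source)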
    {
        for (var i = 0; i < source.Count; ++i)
        {
            if (source[i].IsUsed)
            {
                continue;
            }

            var value = source[i];
            var match = Pattern.Match(value.Value);

            if (match.Success)
            {
                int modifyIBy;
                source.RemoveAt(i);
                var regions = ExtractRegions(value.Value, match.Index, match.Length, out modifyIBy);

                for (var j = 0; j < regions.Count; ++j)
                {
                    source.Insert(i + j, regions[j]);
                }

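                // Jump to the (used) body region; the loop's ++i then lands on the
                // tail, so further matches in the remaining text are still found.
                i += modifyIBy;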
            }
        }
    }
}

With those pieces in place, putting together something to do the replacement is pretty simple:

public class Rewriter
{
    private readonly List<TokenOperation> _definitions = new List<TokenOperation>();

    public void AddPattern(string pattern, Func<string, string> mutator)
    {
        _definitions.Add(new TokenOperation(pattern, mutator));
    }

    public void AddLiteral(string pattern, string replacement)
    {
        AddPattern(Regex.Escape(pattern), x => replacement);
    }

    public string Rewrite(string value)
    {
        var workingValue = new List<UsageIndicator> { new UsageIndicator(value, false) };

        foreach (var definition in _definitions)
        {
            definition.Match(workingValue);
        }

        return string.Join("", workingValue);
    }
}

In the demo code below, keep in mind that the order in which patterns and literals are added matters: whatever is added first is tokenized first. To prevent the :// in the URL from being picked off as an emoticon plus a slash, we process the image block first. It contains the URL between the tags, so it is marked as used before the emoticon rule can reach it.

class Program
{
    static void Main(string[] args)
    {
        var rewriter = new Rewriter();
        rewriter.AddPattern(@"\[img\].*?\[/img\]", x => x.Replace("[img]", "<img src=\"").Replace("[/img]", "\"/>"));
        rewriter.AddLiteral(":/", "<img src=\"emote-sigh.png\"/>");
        rewriter.AddLiteral(":(", "<img src=\"emote-frown.png\"/>");
        rewriter.AddLiteral(":P", "<img src=\"emote-tongue.png\"/>");

        const string str = "Stacks be [img]http://example.com/overflowing.png[/img] :/";
        Console.WriteLine(rewriter.Rewrite(str));
    }
}

The sample prints:

Stacks be <img src="http://example.com/overflowing.png"/> <img src="emote-sigh.png"/>

Comments

1

If you don't want to use a complex Regex, you can split the text into some kind of container, e.g. a list.

Split based on the tokens found in the text: in your case, a token is the text between [img] and [/img], including the tags themselves, i.e. [img]http://example.com/overflowing.png[/img].

Then apply the [img] replacement to these tokens and the emoticon replacement to the remaining elements of the container. Finally, output a string that concatenates all the container elements.

Below you will find example contents of such a container after the split:

 1. "Stacks be " 
 2. "[img]http://example.com/overflowing.png[/img]" 
 3. " :/" 

Apply the emoticon replacement to elements 1 and 3, and the [img] replacement to token element 2.
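One way to sketch this split in C# is Regex.Split with a capturing group, which keeps the matched [img] tokens in the resulting array. The class SplitRewriter, its helper methods and the emoticon file names are assumptions for illustration:

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class SplitRewriter
{
    // The capturing parentheses make Regex.Split include each matched token in its output.
    private static readonly Regex ImgToken = new Regex(@"(\[img\].*?\[/img\])");

    public static string Rewrite(string input)
    {
        string[] parts = ImgToken.Split(input);
        var output = new StringBuilder();
        for (int i = 0; i < parts.Length; i++)
        {
            // With exactly one capturing group, even indices hold plain text
            // and odd indices hold the captured [img]...[/img] tokens.
            output.Append(i % 2 == 1 ? ReplaceImg(parts[i]) : ReplaceEmoticons(parts[i]));
        }
        return output.ToString();
    }

    private static string ReplaceImg(string token)
    {
        return token.Replace("[img]", "<img src=\"").Replace("[/img]", "\"/>");
    }

    private static string ReplaceEmoticons(string text)
    {
        // Plain String.Replace is safe here: URLs inside [img] tags never reach this method.
        return text.Replace(":/", "<img src=\"emote-sigh.png\"/>")
                   .Replace(":(", "<img src=\"emote-frown.png\"/>")
                   .Replace(":P", "<img src=\"emote-tongue.png\"/>");
    }
}
```

For the example string, Split yields exactly the three elements listed above, so elements 1 and 3 get the emoticon pass and element 2 gets the [img] pass.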

Comments

0

You can replace the tags like this:

str = str.Replace("[img]", "<img src=\"").Replace("[/img]", "\"/>");

It should work for the [img] tags.

Comments

0

Here is a code snippet from my old project:

private string Emoticonize(string originalStr)
{
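    // Build (?<=^|\s)(?:code1|code2|...)(?=$|\s): an emoticon only matches when
    // it is bounded by whitespace or the start/end of the string.
    StringBuilder RegExString = new StringBuilder(@"(?<=^|\s)(?:");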
    foreach (KeyValuePair<string, string> e in Emoticons)
    {
        RegExString.Append(Regex.Escape(e.Key) + "|");
    }
    // Replace the trailing "|" with the ")" that closes the group.
    RegExString.Replace("|", ")", RegExString.Length - 1, 1);
    RegExString.Append(@"(?=$|\s)");
    MatchCollection EmoticonsMatches = Regex.Matches(originalStr, RegExString.ToString());

    RegExString.Clear();
    RegExString.Append(originalStr);
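    // Walk the matches from last to first so that the indices of earlier
    // matches stay valid after each replacement changes the string length.
    for (int i = EmoticonsMatches.Count - 1; i >= 0; i--)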
    {
        RegExString.Replace(EmoticonsMatches[i].Value, Emoticons[EmoticonsMatches[i].Value], EmoticonsMatches[i].Index, EmoticonsMatches[i].Length);
    }

    return RegExString.ToString();
}

Emoticons is a Dictionary<string, string> in which I stored the emoticon codes as keys and the corresponding images as values.
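For comparison, here is a self-contained sketch that builds the same whitespace-bounded pattern but lets Regex.Replace with a MatchEvaluator do the substitution, instead of patching a StringBuilder from the last match backwards. The Emoticonizer class and the dictionary contents are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Emoticonizer
{
    // Hypothetical table: emoticon code -> image markup.
    private static readonly Dictionary<string, string> Emoticons = new Dictionary<string, string>
    {
        { ":/", "<img src=\"emote-sigh.png\"/>" },
        { ":(", "<img src=\"emote-frown.png\"/>" },
        { ":P", "<img src=\"emote-tongue.png\"/>" },
    };

    // Same shape as in Emoticonize: an escaped alternation of the codes, allowed only
    // after whitespace or the start of the string, and only before whitespace or the end.
    private static readonly Regex Pattern = new Regex(
        @"(?<=^|\s)(?:" + string.Join("|", Emoticons.Keys.Select(k => Regex.Escape(k))) + @")(?=$|\s)");

    public static string Emoticonize(string input)
    {
        // Each match is exactly one dictionary key, so a direct lookup suffices.
        return Pattern.Replace(input, m => Emoticons[m.Value]);
    }
}
```

The whitespace lookbehind is what protects the URL: the :/ in http:// is preceded by "p", so it never matches.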

Comments

0
string[] emots = { ":/", ":(", ":)" };
// File names without extension; ".png" is appended in replaceEmots.
string[] emotFiles = { "emote-sigh", "emot-sad", "emot-happy" };

string replaceEmots(string val)
{
    string res = val;
    for (int i = 0; i < emots.Length; i++)
        res = res.Replace(emots[i], "<img src=\"" + emotFiles[i] + ".png\"/>");
    return res;
}

void button1_click()
{
    string str = "Stacks be <img src=\"http://example.com/overflowing.png\"/> :/";
    // Caution: String.Replace also matches the ":/" inside "http://" in this input.
    str = replaceEmots(str);
}

Comments

0

Here is the code that did the replacement in my case, and the output is exactly what you want.

str = "Stacks be <img src=\"http://example.com/overflowing.png\"/> :/";

// If str holds any content, render it into the div.
if (!String.IsNullOrEmpty(str))
{
    divStaticAsset.InnerHtml = str.Replace("[img]", "<img src=\'")
                                  .Replace("[/img]", "\'/>") + "<img src=\'emote-sigh.png'/>";
}

2 Comments

This will output Stacks be <img src="http://example.com/overflowing.png"/> :/<img src="emote-sigh.png"/>. Note that :/ is still present before the second <img> tag.
Then just add :/ inside this Replace("[/img]", "\'/>")
