
I'm looking for a way to strip what I'd call garbage text from a doc-xml file so that placeholders can be replaced with values.

I have a program that takes a doc-xml file and prints out contracts. The user only needs to feed the program a file in doc-xml format containing some parameters, which my program replaces with values.

Let's say I have this chunk of a contract format:

The Contract {@ContractNumber} specified to the contractor {@ContractorName}....

My program looks for the parameters {@ContractNumber} and {@ContractorName} and replaces them with the contract values. I'm only asking the user to provide the file in XML-DOC format, but sometimes the file is written like this:

<w:p w:rsidR="0094616E" w:rsidRDefault="00AC620A"><w:pPr><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/><w:color w:val="000000"/><w:sz w:val="24"/><w:szCs w:val="24"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/><w:color w:val="000000"/><w:sz w:val="24"/><w:szCs w:val="24"/></w:rPr><w:t>{@</w:t></w:r><w:proofErr w:type="spellStart"/><w:r><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/><w:color w:val="000000"/><w:sz w:val="24"/><w:szCs w:val="24"/></w:rPr><w:t>ContractorNumber</w:t></w:r>

and sometimes it does what I'm really hoping for:

<w:p w:rsidR="0094616E" w:rsidRDefault="0094616E"><w:pPr><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/><w:color w:val="000000"/><w:sz w:val="24"/><w:szCs w:val="24"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/><w:color w:val="000000"/><w:sz w:val="24"/><w:szCs w:val="24"/></w:rPr><w:t>{@Value1}</w:t></w:r></w:p>

So what I'm looking for is a regex replace statement that gets rid of all the garbage found between the opening characters of my parameters ({@) and the closing one (}), so that my program can find the entire parameter name and replace it with the value assigned to it.

Edit 1:

To make my question simpler: what I'm looking for is a regex that finds everything between a {@ and the subsequent } and, whenever it finds <> in there, deletes the angle brackets along with everything inside them, so that in the end I have {@Param} instead of {@ <garbage/> Param <garbage/> } or {@Param <garbage/> } or {@Pa <garbage/> am}.

Edit 2:

So far, the most helpful regex has been this one:

{.*?@.*?}

which gives me a result like this:

{</w:t></w:r><w:r><w:t>@Contrato</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>Obrigado</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>Adquisicion</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>Import</w:t></w:r><w:r><w:t>e</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>Acreditado</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>ImporteLetras</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>O</w:t></w:r><w:r><w:t>ficio</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>FechaOficio</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>Gracia</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>M</w:t></w:r><w:r><w:t>ensualidad-Gracia</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>ImporteMensualidad</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>I</w:t></w:r><w:r><w:t>mporteMensualidadLetra</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>D</w:t></w:r><w:r><w:t>ireccionAcreditada</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>EdoC</w:t></w:r><w:r><w:t>ivilAcreditado</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>CiudadOri</w:t></w:r><w:r><w:t>genAcredi</w:t></w:r><w:r><w:t>t</w:t></w:r><w:r><w:t>a</w:t></w:r><w:r><w:t>do</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>IFE</w:t></w:r><w:r><w:t>Acreditado</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>Sexo</w:t></w:r><w:r><w:t>Acreditado}
{@</w:t></w:r><w:r><w:t>EdoCivilAval</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>CiudadOrigenAval</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>IFEAval</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>S</w:t></w:r><w:r><w:t>e</w:t></w:r><w:r><w:t>xoAval</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>NumeroAmortizacion</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>DireccionAval</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>ProgramaCredito</w:t></w:r><w:r><w:t>}
{@</w:t></w:r><w:r><w:t>Por</w:t></w:r><w:r><w:t>cComisionAper</w:t></w:r><w:r><w:t>tura</w:t></w:r><w:r><w:t>}

Now what I need is for the regex to get rid of all the tags that sit between those characters. I can't seem to find a way to delete them :S

  • Are you trying to make the regex to do parser's job? Commented Mar 21, 2012 at 17:38
  • since it's in XML, why not use an XML parser? Commented Mar 21, 2012 at 17:41
  • @RomanR.: actually, yes, that's pretty much it. @jb.: I'm already looking into that option, but since I'm using pretty much pure strings, I think getting rid of the garbage with regex is much faster than XML parsing. Commented Mar 21, 2012 at 18:29

4 Answers


The first XML code block you provided does not contain a } character, so it already breaks your prerequisites. However, if you really want to go through with this approach, follow Jetti's advice: generate a list of matches and perform a replace on each. I would have used the regex

@"{@.*?}"

or

@"{@.*?ContractName.*?}" / @"{@.*?ContractorNumber.*?}"

but how you want to match it is really up to you and what you require.

Edit 1:

After reviewing your most recent edit and getting a better understanding of what you're looking for, I devised a slightly ugly but functional solution. Anyone with privileges is free to clean it up, but I don't have time right now:

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
string yourstring = "{@</w:t></w:r><w:r><w:t>Obrigado</w:t></w:r><w:r><w:t>}{@......}...";
Regex reg1 = new Regex(@"{.*?@.*?}"); // a whole placeholder, tags included
Regex reg2 = new Regex(@"<.*?>");     // any single tag

MatchCollection matches = reg1.Matches(yourstring);
List<string> names = new List<string>();
foreach (Match match in matches)
{
    // Strip every tag from the matched placeholder text.
    names.Add(reg2.Replace(match.Value, ""));
}
for (int i = 0; i < names.Count; i++)
{
    // Swap the original (tag-polluted) placeholder for its cleaned form.
    yourstring = yourstring.Replace(matches[i].Value, names[i]);
}

I tried doing all of this in one foreach loop, but match is read-only, and I can't think of a reasonable way around that right now other than a second pass. I've heard of recursive regex methods, but I don't know much about them.
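
One possible way to avoid the second pass is the Regex.Replace overload that takes a MatchEvaluator callback, which computes each replacement from the match itself. The sketch below reuses the two patterns from the code above (with the braces escaped); the class and method names are made up for illustration. Note that the result is only a cleaned-up string: stripping tags this way can leave the surrounding XML unbalanced.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlaceholderCleaner
{
    // Same idea as reg1 above: a '{', then an '@', up to the next '}'.
    private static readonly Regex Placeholder = new Regex(@"\{.*?@.*?\}");

    // Same idea as reg2 above: any single tag such as <w:t> or </w:r>.
    private static readonly Regex Tag = new Regex(@"<.*?>");

    // Hypothetical helper: removes tags found inside each placeholder only;
    // text outside the placeholders is left untouched.
    public static string StripTagsInsidePlaceholders(string input)
    {
        return Placeholder.Replace(input, m => Tag.Replace(m.Value, ""));
    }
}
```

Calling PlaceholderCleaner.StripTagsInsidePlaceholders(yourstring) would then take the place of both loops.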


1 Comment

Updated; let me know if this works for you. It isn't the prettiest code, but it should get the job done.

Two ways to do it. If the string to replace will be the same each time, you could just do

input = input.Replace("{@ContractNumber}", "Actual Number"); // strings are immutable, so keep the result

If they can call it whatever they want, then you could do:

Regex reg = new Regex(@"{@[\w|\d]+}");
string input = "test {@name} this out";
MatchCollection matches = reg.Matches(input);
foreach (Match m in matches)
{
    // Look up the value or whatever based on m.Value
    Console.WriteLine(m.Value);
}
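
As a rough sketch of the "look up the value" step, the matches can be replaced directly from a dictionary using a MatchEvaluator. The class name, method name and dictionary are assumptions for illustration, and the pattern is a variant of the one above that captures the name and also allows a hyphen, since one of the question's parameters contains one.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderFiller
{
    // Captures the parameter name between "{@" and "}".
    private static readonly Regex Placeholder = new Regex(@"\{@([\w-]+)\}");

    // Hypothetical helper: replaces each known placeholder with its value
    // and leaves unknown placeholders exactly as they were.
    public static string Fill(string input, IDictionary<string, string> values)
    {
        return Placeholder.Replace(input, m =>
        {
            string value;
            return values.TryGetValue(m.Groups[1].Value, out value) ? value : m.Value;
        });
    }
}
```

Like the loop above, this only works on placeholders that are not interrupted by XML tags.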

2 Comments

You don't need | inside of [], in fact that just adds | as a valid character. Also this doesn't handle XML in between. However whether that is possible in Regex is debatable.
@Guvante The way I understood the original question, there is no XML and it is just a string designating a placeholder.

// Regex.Replace returns the new string; it does not modify sourceString in place.
sourceString = Regex.Replace(sourceString, @"\{@ContractName\}", myContractName);
sourceString = Regex.Replace(sourceString, @"\{@ContractNumber\}", myContractNumber);

Make sure to include using System.Text.RegularExpressions; at the top of your code.

2 Comments

The problem with this is that it would match {@ContractName} as well as {@ContractNumber} and replace it with the same value
@CJLopez Well, what output did you get from it?

You can't just "get rid of the garbage" and still have valid XML.

Here are some of the problems with this solution:

  • Do you want to match <w a="{@"> as part of the string?
  • What do you do when </w> is in between but not <w>?
  • What do you do when <w> is in between but not </w>?

It sounds like you are either going to have to clean up your input somehow, or do it the hard way using an XML parsing library and some state.
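
For what the parser-based route might look like, here is a minimal sketch using System.Xml.Linq. It rests on several assumptions that are not from the question: the class and method names are invented, the w: prefix is assumed to be declared on the root element (with the WordprocessingML 2006 namespace as a fallback), and a placeholder is assumed never to span two paragraphs. It joins the text of all w:t elements in a paragraph, replaces placeholders in the joined text, and writes the result back into the first w:t, so the XML stays well-formed but any formatting differences between the runs of that paragraph are lost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ParagraphPlaceholders
{
    // Hypothetical helper; see the assumptions listed above.
    public static void Fill(XDocument doc, IDictionary<string, string> values)
    {
        // Assumption: "w" is declared on the root; otherwise fall back to
        // the WordprocessingML 2006 main namespace.
        XNamespace w = doc.Root.GetNamespaceOfPrefix("w")
            ?? "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

        foreach (XElement paragraph in doc.Descendants(w + "p"))
        {
            List<XElement> texts = paragraph.Descendants(w + "t").ToList();
            if (texts.Count == 0) continue;

            // The state to track: the paragraph's text with run boundaries removed.
            string joined = string.Concat(texts.Select(t => t.Value));
            if (!joined.Contains("{@")) continue;

            string replaced = Regex.Replace(joined, @"\{@([\w-]+)\}", m =>
            {
                string value;
                return values.TryGetValue(m.Groups[1].Value, out value) ? value : m.Value;
            });

            // Put everything in the first text node and empty the others.
            texts[0].Value = replaced;
            foreach (XElement t in texts.Skip(1)) t.Value = string.Empty;
        }
    }
}
```

Paragraphs without a placeholder are left untouched, so only the paragraphs being filled lose their per-run formatting.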

3 Comments

Actually, what I'm looking for is a regex that finds everything between a {@ and the subsequent } and, whenever it finds <> in there, deletes the angle brackets along with everything inside them, so that in the end I have {@Param} instead of {@ <garbage/> Param <garbage/> } or {@Param <garbage/> } or {@Pa <garbage/> am}.
@CJLopez: Then you won't have valid XML. You can delete whole nodes and be okay, but randomly deleting a </garbage> would break it. In fact, in your first example <w:t> is before {@ but </w:t> is after; if you deleted </w:t> you would no longer have valid XML (unless you just got lucky and also deleted enough matching <w:t> later in the section).
Luckily for me, it won't kill it :D
