I need to create a JavaScript object representation of a string that includes style information. The style identifiers are unimportant, but for the sake of this question let's use the identifiers that Stack Overflow uses:

    *text* = italic
    **text** = bold
    ***text*** = bold italic

The data representation I would like to create is an array of objects, in order as they appear in the string, with each object being as follows:

{
  stringpart : (string),
  style : (normal | bold | italic | bold italic)
}

Therefore given the following string:

This is some example text, with some **bold** and *italic* ***styles***.

it should be converted into the following array of objects:

[
    {
      stringpart : "This is some example text, with some ",
      style : "normal"
    },
    {
      stringpart : "bold",
      style : "bold"
    },
    {
      stringpart : " and ",
      style : "normal"
    },
    {
      stringpart : "italic",
      style : "italic"
    },
    {
      stringpart : " ",
      style : "normal"
    },
    {
      stringpart : "styles",
      style : "bold italic"
    },
    {
      stringpart : ".",
      style : "normal"
    }
]

So far I have begun looking at HTML parsers and have come across the following code:

var
    content = 'This is some <b>really important <i>text</i></b> with <i>some <b>very very <br>very important</b> things</i> in it.',
    tagPattern = /<\/?(i|b)\b[^>]*>/ig,
    stack = [],
    tags = [],
    offset = 0,
    match,
    tag;

while (match = tagPattern.exec(content)) {
    if (match[0].substr(1, 1) !== '/') {
        stack.push(match.index - offset);
    } else {
        tags.push({
            tag: match[1],
            from: stack.splice(-1, 1)[0],
            to: match.index - offset
        });
    }
    offset += match[0].length;
}
content = content.replace(tagPattern, '');
// now use tags array and perform needed actions.

// see stuff
console.log(tags);
console.log(content);
//example of correct result
console.log(content.substring(tags[3].from, tags[3].to)); 

While the regex in this code could be adapted to detect the style identifiers mentioned above, it would not output the data in the required format, since it only returns from/to indexes.

How could I efficiently convert a string that uses the above identifiers into the required array/object representation?

2 Answers

I think this will get you pretty far

var str = "This is some example text, with some **bold** and *italic* ***styles***."
str.match(/(\*{1,3})[^*]+(\1)/g);

Output

[ '**bold**',
  '*italic*',
  '***styles***' ]

The handy thing about using the \1 backreference is that you will be able to match * pairs. That is, a single * will look for the next single *, whereas a double ** will look for the next double, etc.
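To see the pairing in action, here is a minimal sketch that also captures the text between the markers. The helper name `findStyledSpans` and the second capture group are additions for illustration, not part of the one-liner above.

```javascript
// Sketch: list each styled span with its marker and inner text.
// `findStyledSpans` is a hypothetical helper name used only for this example.
function findStyledSpans(str) {
  // Group 1 is the opening run of 1 to 3 asterisks, group 2 is the inner text,
  // and \1 requires the closing run to be identical to the opening run.
  var re = /(\*{1,3})([^*]+)\1/g;
  var spans = [];
  var match;
  while ((match = re.exec(str)) !== null) {
    spans.push({ marker: match[1], text: match[2] });
  }
  return spans;
}

var spans = findStyledSpans("some **bold** and *italic* ***styles***.");
spans.forEach(function (span) {
  console.log(span.marker + " -> " + span.text);
});
// ** -> bold
// * -> italic
// *** -> styles
```

Because the closing marker must equal the opening one, an opening `**` is never closed by a lone `*`.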


I wasn't going to do this, but meh, I was kind of bored

var getStyleTokens = function(str) {

  var parts = [];

  var addNode = function(text, style) {
    return parts.push(
      {stringpart: text, style: style}
    );
  };

  var styles = {
    "*":   "italic",
    "**":  "bold",
    "***": "bold italic"
  };

  var re = /(\*{1,3})([^*]+)(?:\1)/g,
      caret = 0,
      match;

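  while ((match = re.exec(str)) !== null) {
    // Plain text between the previous match and this one (skipped if empty).
    if (match.index > caret) {
      addNode(str.substring(caret, match.index), "normal");
    }
    addNode(match[2], styles[match[1]]);
    caret = match.index + match[0].length;
  }

  // Trailing plain text after the last match, if any.
  if (caret < str.length) {
    addNode(str.substring(caret), "normal");
  }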

  return parts;
};

var str = "This is some example text, with some **bold** and *italic* ***styles***."

getStyleTokens(str);

Output

[ { stringpart: 'This is some example text, with some ',
    style: 'normal' },
  { stringpart: 'bold', style: 'bold' },
  { stringpart: ' and ', style: 'normal' },
  { stringpart: 'italic', style: 'italic' },
  { stringpart: ' ', style: 'normal' },
  { stringpart: 'styles',
    style: 'bold italic' },
  { stringpart: '.', style: 'normal' } ]

Note!

Since your tags are not likely to be all *, it would probably be better to write a list of possible tags in the first capture group. But, that means the rest of the RegExp changes, too.

/(\*\*\*|\*\*|\*)(?:.(?!\1))+.(\1)/

This means you could write something like

/(BOLD|ITALIC|BOTH)(?:.(?!\1))+.(\1)/

Which would work on a string like this

This is some example text, with some BOLDboldBOLD and ITALICitalicITALIC BOTHstylesBOTH.

In summary: modify the above expression to use whichever tags you like; as long as you use a symmetrical closing tag, the styles will be parsed just fine.
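As a rough sketch of that idea, the tokenizer can be built from a map of tags to style names. Everything here is an assumption for illustration: the name `makeStyleTokenizer`, the choice to sort tags longest-first so that `**` is not read as `*`, and the choice to skip empty text parts.

```javascript
// Hypothetical helper: builds a tokenizer from a map of symmetric tags to style names.
function makeStyleTokenizer(styles) {
  var tags = Object.keys(styles)
    // Try longer tags first so "***" wins over "**" and "*".
    .sort(function (a, b) { return b.length - a.length; })
    // Escape regex metacharacters such as "*".
    .map(function (tag) { return tag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); });

  // Group 1: opening tag. Group 2: any characters up to the next copy of that tag.
  var source = "(" + tags.join("|") + ")((?:(?!\\1)[\\s\\S])+)\\1";

  return function (str) {
    var re = new RegExp(source, "g");
    var parts = [];
    var caret = 0;
    var match;
    while ((match = re.exec(str)) !== null) {
      if (match.index > caret) {
        parts.push({ stringpart: str.slice(caret, match.index), style: "normal" });
      }
      parts.push({ stringpart: match[2], style: styles[match[1]] });
      caret = re.lastIndex;
    }
    if (caret < str.length) {
      parts.push({ stringpart: str.slice(caret), style: "normal" });
    }
    return parts;
  };
}

// Usage with word-style tags.
var tokenize = makeStyleTokenizer({ BOLD: "bold", ITALIC: "italic", BOTH: "bold italic" });
console.log(tokenize("some BOLDboldBOLD and ITALICitalicITALIC BOTHstylesBOTH."));
```

Passing `{ "*": "italic", "**": "bold", "***": "bold italic" }` instead should give the same kind of result for the asterisk syntax from the question.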

4 Comments

This is definitely useful with regards to the regex, thank you.
Hmmm this is actually very useful, I may be able to get where I need from this. I'll leave the question open a while longer in case someone can provide a complete solution. If not I will award you the answer. Thanks again :)
Fantastic! A seriously in depth and concise answer. Many thanks! :-)
@user2616246, I made some improvements to the post above. Specifically the getStyleTokens code example.

Isn't it JSON you are talking about? There are a number of JSON parsing libraries available. Check them out, or state your requirements clearly. By clearly I mean the language/platform you want to get it done in, and for what purpose (just to get an idea).
