Converting string (with style identifiers) into a javascript object representation

Question

I need to create a JavaScript object representation of a string that includes style information. The style identifiers are unimportant, but for the sake of this question let's use the identifiers that Stack Overflow uses:

    *text* = italic
    **text** = bold
    ***text*** = bold italic

The data representation I would like to create is an array of objects, in order as they appear in the string, with each object being as follows:

{
  stringpart : (string),
  style : (normal | bold | italic | bold italic)
}

Therefore given the following string:

This is some example text, with some **bold** and *italic* ***styles***.

Should be converted into the following array of objects:

[
    {
      stringpart : "This is some example text, with some ",
      style : "normal"
    },
    {
      stringpart : "bold",
      style : "bold"
    },
    {
      stringpart : " and ",
      style : "normal"
    },
    {
      stringpart : "italic",
      style : "italic"
    },
    {
      stringpart : " ",
      style : "normal"
    },
    {
      stringpart : "styles",
      style : "bold italic"
    },
    {
      stringpart : ".",
      style : "normal"
    }
]

So far I have begun looking at HTML parsers and came across the following code:

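var
    content = 'This is some <b>really important <i>text</i></b> with <i>some <b>very very <br>very important</b> things</i> in it.',
    // Opening or closing <i>/<b> tags, case-insensitive; \b keeps <br> from matching.
    tagPattern = /<\/?(i|b)\b[^>]*>/ig,
    stack = [],  // start offsets of tags that are still open
    tags = [],   // finished {tag, from, to} ranges
    offset = 0,  // total length of the tags seen so far
    match;

while ((match = tagPattern.exec(content)) !== null) {
    if (match[0].charAt(1) !== '/') {
        // Opening tag: remember where it starts in the tag-free text.
        stack.push(match.index - offset);
    } else {
        // Closing tag: pair it with the most recently opened tag.
        tags.push({
            tag: match[1],
            from: stack.pop(),
            to: match.index - offset
        });
    }
    offset += match[0].length;
}

// Strip the tags; from/to above are offsets into this tag-free text.
content = content.replace(tagPattern, '');

// Inspect the results.
console.log(tags);
console.log(content);
// Example: the text covered by the fourth closed tag (the outer <i>).
console.log(content.substring(tags[3].from, tags[3].to));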

While the regex in this code could be adapted to detect the style identifiers mentioned above, it would not output the data in the required format, since it only returns from/to indexes.
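That said, from/to ranges can be turned into the required format with a post-processing pass. The sketch below is one possible way, not part of the code above: `rangesToParts` is a hypothetical helper that assumes each range has the `tag`, `from` and `to` fields produced by that loop, with indexes into the tag-free text. It cuts the text at every range boundary and labels each piece with the tags covering it, so nested `<b>`/`<i>` come out as "bold italic".

```javascript
// Hypothetical helper: turn {tag, from, to} ranges (indexes into the
// tag-free text) into {stringpart, style} objects.
function rangesToParts(text, ranges) {
  const names = { b: "bold", i: "italic" };

  // Every range boundary is a place where the style may change.
  const cuts = new Set([0, text.length]);
  for (const r of ranges) {
    cuts.add(r.from);
    cuts.add(r.to);
  }
  const points = [...cuts].sort((a, b) => a - b);

  const parts = [];
  for (let k = 0; k + 1 < points.length; k++) {
    const start = points[k];
    const end = points[k + 1];
    // Tags whose range covers this whole segment.
    const active = new Set(
      ranges
        .filter((r) => r.from <= start && end <= r.to)
        .map((r) => r.tag.toLowerCase())
    );
    // Fixed order gives "bold", "italic" or "bold italic"; nothing gives "normal".
    const style =
      ["b", "i"].filter((t) => active.has(t)).map((t) => names[t]).join(" ") ||
      "normal";
    parts.push({ stringpart: text.slice(start, end), style });
  }
  return parts;
}

// "ab " is bold, "cd" is bold and italic, " ef" is italic.
console.log(rangesToParts("ab cd ef", [
  { tag: "b", from: 0, to: 5 },
  { tag: "i", from: 3, to: 8 },
]));
```

With the loop above, `rangesToParts(content, tags)` would be called after the tags have been stripped; note that the `<br>` stays in the text because the pattern only removes `<i>` and `<b>`.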

How could I efficiently convert a string that uses the above identifiers into the required array/object representation?

Community · Accepted Answer · 2017-02-08 14:43:38Z

I think this will get you pretty far:

var str = "This is some example text, with some **bold** and *italic* ***styles***."
str.match(/(\*{1,3})[^*]+(\1)/g);

Output

[ '**bold**',
  '*italic*',
  '***styles***' ]

The handy thing about using the \1 backreference is that it pairs the * markers: a single * looks for the next single *, a double ** looks for the next double **, and so on.
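To see what the backreference buys you, here is a small sketch that lists each match's capture groups. It uses `String.prototype.matchAll`, which assumes an ES2020-capable engine such as a recent Node.js. Group 1 is the marker that opened the match and, because of `\1`, the same marker has to close it.

```javascript
const str = "This is some example text, with some **bold** and *italic* ***styles***.";

// Group 1: opening marker; group 2: the text; \1: the identical closing marker.
const re = /(\*{1,3})([^*]+)\1/g;

// matchAll yields every match together with its capture groups.
const pairs = [...str.matchAll(re)].map((m) => [m[1], m[2]]);
console.log(pairs); // pairs is [["**", "bold"], ["*", "italic"], ["***", "styles"]]
```

The length of group 1 is all that is needed to pick the style, which is what the lookup table in the next snippet relies on.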

I wasn't going to do this, but meh, I was kind of bored:

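var getStyleTokens = function (str) {

  var parts = [];

  // Append one {stringpart, style} object to the result.
  var addNode = function (text, style) {
    parts.push({ stringpart: text, style: style });
  };

  // Map each marker to the style it represents.
  var styles = {
    "*":   "italic",
    "**":  "bold",
    "***": "bold italic"
  };

  // Group 1: the opening marker; group 2: the styled text.
  // \1 requires the closing marker to be identical to the opening one.
  var re = /(\*{1,3})([^*]+)\1/g,
      caret = 0, // end of the previous match
      match;

  while ((match = re.exec(str)) !== null) {
    // Unstyled text between the previous match and this one.
    // slice(start, end) takes an end index; substr(start, length) would not.
    if (match.index > caret) {
      addNode(str.slice(caret, match.index), "normal");
    }
    addNode(match[2], styles[match[1]]);
    caret = match.index + match[0].length;
  }

  // Any unstyled text after the last match.
  if (caret < str.length) {
    addNode(str.slice(caret), "normal");
  }

  return parts;
};

var str = "This is some example text, with some **bold** and *italic* ***styles***.";

getStyleTokens(str);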

Output

[ { stringpart: 'This is some example text, with some ',
    style: 'normal' },
  { stringpart: 'bold', style: 'bold' },
  { stringpart: ' and ', style: 'normal' },
  { stringpart: 'italic', style: 'italic' },
  { stringpart: ' ', style: 'normal' },
  { stringpart: 'styles',
    style: 'bold italic' },
  { stringpart: '.', style: 'normal' } ]

Note!

Since your tags are not likely to all be *, it would probably be better to list the possible tags in the first capture group. But that means the rest of the RegExp changes too, because [^*]+ only excludes asterisks and cannot stop at an arbitrary closing tag.

/(\*\*\*|\*\*|\*)((?:(?!\1).)+)\1/
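The order of the alternatives matters, because a regex alternation takes the first alternative that leads to an overall match, not the longest one. A quick check, using the tempered form `((?:(?!\1).)+)` for the text between the tags (that form is an assumption here; it keeps the styled text in group 2 and also accepts one-character text):

```javascript
// Same pattern, differing only in the order of the alternatives.
const shortestFirst = /(\*|\*\*|\*\*\*)((?:(?!\1).)+)\1/;
const longestFirst = /(\*\*\*|\*\*|\*)((?:(?!\1).)+)\1/;

const sample = "***styles***";
console.log(sample.match(shortestFirst)[0]); // "***styles**" (opened by "**")
console.log(sample.match(longestFirst)[0]);  // "***styles***" (opened by "***")
```

With `*` listed first, `*` cannot open a match here, so the engine falls back to `**` and then treats the third asterisk as text. Listing the longest tag first avoids that.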

This means you could write something like

/(BOLD|ITALIC|BOTH)((?:(?!\1).)+)\1/

It would work on a string such as this one:

This is some example text, with some BOLDboldBOLD and ITALICitalicITALIC BOTHstylesBOTH.

In summary: modify the above expression to use whichever tags you like; as long as each closing tag is identical to its opening tag (which is what \1 enforces) and styles are not nested, the styles will be parsed just fine.
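As a sketch of that summary, the regex can be built from a tag-to-style map instead of being written by hand. `makeStyleTokenizer` is a hypothetical name; the sketch escapes each tag so that `*` is matched literally, sorts the tags longest first so that `***` wins over `**` and `*`, and reuses the getStyleTokens loop. Like the expressions above, it does not handle nested styles.

```javascript
// Hypothetical generalisation of getStyleTokens: tags come from a map.
function makeStyleTokenizer(tagStyles) {
  // Escape regex metacharacters so tags such as "*" are matched literally.
  const escape = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

  // Longest tags first, so "***" is tried before "**" and "*".
  const alternatives = Object.keys(tagStyles)
    .sort((a, b) => b.length - a.length)
    .map(escape)
    .join("|");

  // Group 1: opening tag; group 2: text up to the next identical tag.
  const re = new RegExp("(" + alternatives + ")((?:(?!\\1)[\\s\\S])+)\\1", "g");

  return function tokenize(str) {
    const parts = [];
    let caret = 0; // end of the previous match
    for (const m of str.matchAll(re)) {
      if (m.index > caret) {
        parts.push({ stringpart: str.slice(caret, m.index), style: "normal" });
      }
      parts.push({ stringpart: m[2], style: tagStyles[m[1]] });
      caret = m.index + m[0].length;
    }
    if (caret < str.length) {
      parts.push({ stringpart: str.slice(caret), style: "normal" });
    }
    return parts;
  };
}

const asterisks = makeStyleTokenizer({ "*": "italic", "**": "bold", "***": "bold italic" });
const words = makeStyleTokenizer({ BOLD: "bold", ITALIC: "italic", BOTH: "bold italic" });

console.log(asterisks("This is some example text, with some **bold** and *italic* ***styles***."));
console.log(words("This is some example text, with some BOLDboldBOLD and ITALICitalicITALIC BOTHstylesBOTH."));
```

Both calls should give the same seven-element array as the getStyleTokens output, since only the tag syntax differs.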

edited Feb 8, 2017 at 14:43

CommunityBot

answered Aug 19, 2013 at 7:26

Mulan

4 Comments

Gordo Over a year ago

This is definitely useful with regards to the regex, thank you.

Gordo Over a year ago

Hmmm this is actually very useful, I may be able to get where I need from this. I'll leave the question open a while longer in case someone can provide a complete solution. If not I will award you the answer. Thanks again :)

Gordo Over a year ago

Fantastic! A seriously in-depth and concise answer. Many thanks! :-)

Mulan Over a year ago

@user2616246, I made some improvements to the post above. Specifically the getStyleTokens code example.

Prince Agrawal · Answer · 2013-08-19 07:22:51Z

Isn't it JSON you are talking about? There are a number of JSON parsing libraries available. Check them, or state your requirements more clearly: by clearly I mean the language/platform you want to get it done in, and for what purpose (just to get an idea).

answered Aug 19, 2013 at 7:22

Prince Agrawal