6

I want to split a command-line string into an array of arguments so I can use it with the optparse-js library. So if I have something like

-f foo -b -a -z baz bar

I want an array like this:

["-f", "foo", "-b", "-a", "-z", "baz", "bar"]

It should work with strings that contain escaped quotes and with long GNU options. So far I have a regex that matches quoted strings:

/("(?:\\"|[^"])*"|'(?:\\'|[^'])*')/g

It matches strings like "das", "asd\"asd", 'asd' or 'sad\'asd'.

Can I use a regex for this, or do I need a parser (for example one built with PEG)? It would be nice if it also matched regex literals, so that I could handle

-p "hello b\"ar baz" -f /^ [^ ]+ $/

UPDATE: With help from @Damask I've created this regex:

/('(\\'|[^'])*'|"(\\"|[^"])*"|\/(\\\/|[^\/])*\/|(\\ |[^ ])+|[\w-]+)/g

It works for strings like this:

echo -p "hello b\"ar baz" -f /^ [^ ]+ $/

It returns

['echo', '-p', '"hello b\"ar baz"', '-f', '/^ [^ ]+ $/']

but it fails on strings like this:

echo "©\\\\" abc "baz"

It matches the command and two arguments instead of three arguments (demo).

If an argument has no spaces, like "foo"baz, it should be one item in the array. The quotes need to be included in the match; I will remove the unescaped ones from the string afterwards (as in bash, where executing echo "foo"bar gives echo the single argument foobar).

4
  • To get from the first string to the array mentioned, you can use split(" ") but I assume you need to elaborate on the first 2 sentences ( ̄(エ) ̄) Commented Dec 10, 2012 at 7:22
  • @mplungjan I need solution that will work with something like -p "hello b\"ar baz" -f /^ [^ ]+ $/ Commented Dec 10, 2012 at 7:28
  • So I suggest you swap your examples and show how the array would look with a real example Commented Dec 10, 2012 at 7:30
  • Is the input "-f foo -b -a -z baz bar " a string or not? Commented May 3, 2017 at 16:54

7 Answers 7

6
+100

Some comments:

  • The raw regex for quotes is this
    "[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'
    Example: http://regex101.com/r/uxqApc/2

  • This part (?= :? | $ ) will always resolve to true, and is useless

  • This part /(\\/|[^/])+/[gimy]*: if this is a regex (or any delimited item),
    you have to blindly let a backslash escape any character, like this: /[^/\\]*(?:\\[\S\s][^/\\]*)*/[gimy]*.
    Otherwise it would match /..\\// which is not correct.

  • This expression (?: \\ \s | \S )+ comes first in the alternation sequence, i.e. before [\w-]+. Since non-whitespace \S is a superset of [\w-], the [\w-]+ branch is never reached.
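
To see the last point in practice, here is a small sketch. It uses a simplified two-branch pattern (a reduction made for illustration, not the full regex), and the names orderDemo and branchHits are made up for this example. It records which branch produced each match:

```javascript
// Simplified two-branch pattern: group 1 is the escaped-space/non-space
// branch, group 2 is the [\w-]+ branch that is listed after it.
const orderDemo = /((?:\\\s|\S)+)|([\w-]+)/g;

// For every match, note which capture group actually participated.
const branchHits = [...'-f foo --long-opt bar_1'.matchAll(orderDemo)].map(
  (m) => (m[1] !== undefined ? 'first' : 'second')
);

console.log(branchHits); // [ 'first', 'first', 'first', 'first' ]
```

Every token is consumed by the first branch, so the second one could be dropped without changing the result.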

Making the corrections and putting it all back together gets this regex:
/("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S)+)/

Demos:

JavaScript - http://regex101.com/r/cuJuQ8/1
PCRE - http://regex101.com/r/cuJuQ8/2

Formatted

 (                             # (1 start)
      "
      [^"\\]* 
      (?: \\ [\S\s] [^"\\]* )*
      "
   |  
      ' 
      [^'\\]* 
      (?: \\ [\S\s] [^'\\]* )*
      '
   |  
      / 
      [^/\\]* 
      (?: \\ [\S\s] [^/\\]* )*
      /
      [gimy]* 
      (?= \s | $ )
   |  
      (?: \\ \s | \S )+
 )                             # (1 end)


If you also need to treat whitespace (outside of quotes or a regex) as the delimiter between arguments, this would be it:

/((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S))+)(?=\s|$)/

Demos:

JavaScript - http://regex101.com/r/cuJuQ8/3
PCRE - https://regex101.com/r/cuJuQ8/4

Formatted

 (                             # (1 start)
      (?:
           "
           [^"\\]* 
           (?: \\ [\S\s] [^"\\]* )*
           "
        |  
           ' 
           [^'\\]* 
           (?: \\ [\S\s] [^'\\]* )*
           '
        |  
           / 
           [^/\\]* 
           (?: \\ [\S\s] [^/\\]* )*
           /
           [gimy]* 
           (?= \s | $ )
        |  
           (?: \\ \s | \S )
      )+
 )                             # (1 end)
 (?= \s | $ )
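
As a usage sketch (not part of the demos above): the second regex is written without the g flag, so match would return only the first token. Assuming you add g yourself, and with splitCommandLine as a hypothetical helper name, tokenizing could look roughly like this:

```javascript
// Second regex from this answer, with the `g` flag added (an assumption) so
// that String.prototype.match returns every token instead of only the first.
const tokenRe = /((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|\/[^\/\\]*(?:\\[\S\s][^\/\\]*)*\/[gimy]*(?=\s|$)|(?:\\\s|\S))+)(?=\s|$)/g;

// Hypothetical helper: returns the raw tokens, with quotes still included.
function splitCommandLine(text) {
  return text.match(tokenRe) || [];
}

// String.raw keeps the backslashes exactly as typed.
const tokens = splitCommandLine(
  String.raw`echo -p "hello b\"ar baz" -f /^ [^ ]+ $/ "foo"bar`
);

console.log(tokens);
// [ 'echo', '-p', '"hello b\\"ar baz"', '-f', '/^ [^ ]+ $/', '"foo"bar' ]
```

The quotes stay in the tokens, so "foo"bar comes back as a single item; stripping the unescaped quotes would be a separate step.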

5

I really love regex but sometimes a combination of simple regex and simple function does the same job but is a lot easier to debug and maintain, especially when developers not familiar with (complex) regex join the project.

So here is another approach, see explanation below.

It's tested using this rather complicated sample, with arguments containing multiple spaces or escaped double quotes, as required:

echo "©\\\\" abc "baz" "foo bar dummy" -d "marty \\\"mc fly" -f "avb eer\"" -p 2 "asd\"asd" -a 3

Code Snippet

function commandArgs2Array(text) {
  const re = /^"[^"]*"$/; // Check if argument is surrounded with double-quotes
  const re2 = /^([^"]|[^"].*?[^"])$/; // Check if argument is NOT surrounded with double-quotes

  let arr = [];
  let argPart = null;

  text && text.split(" ").forEach(function(arg) {
    if ((re.test(arg) || re2.test(arg)) && !argPart) {
      arr.push(arg);
    } else {
      argPart = argPart ? argPart + " " + arg : arg;
      // If part is complete (ends with a double quote), we can add it to the array
      if (/"$/.test(argPart)) {
        arr.push(argPart);
        argPart = null;
      }
    }
  });

  return arr;
}

let result = commandArgs2Array('echo "©\\\\" abc "baz" "foo bar  dummy" -d "marty \\\"mc fly" -f "avb eer\"" -p 2 "asd\"asd" -a 3');
console.log(result);

Explanation

First, the input is split on the space character.

For each argument, we check if it's a complete or an incomplete argument

A complete argument is an argument which is either

  • surrounded with double-quotes
  • NOT surrounded with double-quotes at all

Every other case represents an incomplete argument. It's either

  • The start of an incomplete argument (starts with a double-quote)
  • A space
  • A part of an incomplete argument which can contain escaped double-quotes
  • The end of an incomplete argument (ends with a double-quote)

That's all folks !

2 Comments

Error: { "message": "Syntax error", "filename": "stacksnippets.net/js", "lineno": 20, "colno": 40 }
This is because you're using a browser which does not support ES2015. I've edited answer to fix it. Thanx for pointing it.
2

Why don't you simply use the split function?

var arr = myString.split(/\s+/);

It's better to pass a regexp as the argument, to avoid bugs when the separator is \t or when there are multiple spaces, etc.

EDIT:

If your arguments contain spaces and are enclosed in quote marks, I don't think you can find a single regexp for it. I think you should first find the arguments with spaces (/"(.*?)"/, where group 1 holds the argument), add them to the array, then remove them from the string, and only after that use the split method described above.
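
A literal sketch of that two-pass idea, with splitTwoPass as a made-up helper name. Note two limitations that follow directly from the description: the quoted arguments end up at the front of the array, so the original order is lost, and /"(.*?)"/ does not understand escaped quotes.

```javascript
// Hypothetical helper sketching the two-pass idea; not part of any library.
function splitTwoPass(text) {
  const quotedRe = /"(.*?)"/g;
  const args = [];

  // Pass 1: collect the contents of each double-quoted argument (group 1).
  for (const m of text.matchAll(quotedRe)) {
    args.push(m[1]);
  }

  // Pass 2: remove the quoted parts, then split what is left on whitespace.
  const rest = text.replace(quotedRe, ' ').trim();
  if (rest !== '') {
    args.push(...rest.split(/\s+/));
  }
  return args;
}

console.log(splitTwoPass('-f "foo bar" -b baz'));
// [ 'foo bar', '-f', '-b', 'baz' ]
```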

3 Comments

I suggested that too. But that seems to be too simple to put in an answer
I can't use just that because I can have strings or regexes that have spaces as argument to an options
IIRC, there is a solution for tokenizing a string inside quotes ("), but it requires a feature that is not available in JS regex.
0

Try this:

var a = '-f foo "ds  df s\\" da" -b -a -z baz bar';
a.match(/([\w-]+|"(\\"|[^"])*")/g)

returns ['-f', 'foo', '"ds  df s\\" da"', '-b', '-a', '-z', 'baz', 'bar'] (written as JavaScript string literals)

1 Comment

With your help I created a better regex, /('(\\'|[^'])*'|"(\\"|[^"])*"|\/(\\\/|[^\/])*\/|(\\ |[^ ])+|[\w-]+)/g, that matches regexes, single-quoted strings, and text with escaped spaces.
0

This will work:

var input = '-p "hello b\"ar baz" -f /^ [^ ]+ $/ -c -d -e';
var arr = input.split(' -');
var out = [];
for(var i = 0; i < arr.length; i++){
    // split(' -') strips the leading dash from every piece except the first; restore it
    var piece = (i === 0 ? '' : '-') + arr[i];
    var space = piece.indexOf(' ');
    if(~space){
        // option followed by its value: split at the first space
        out = out.concat([piece.substring(0, space), piece.substring(space+1)]);
    }else{
        out = out.concat(piece);
    }
}

Output:

['-p', '"hello b"ar baz"', '-f', '/^ [^ ]+ $/', '-c', '-d', '-e']

I know it's not a fancy 1-line regex, but it works like expected.

1 Comment

It doesn't work with escaped quotes, e.g. var input = 'echo "asd\\"asd" asd'. In your example the escape \" should be written \\" so that the input contains a backslash followed by the quote, and not just a quote.
0
var string = "-f foo -b -a -z baz bar";
string = string.split(" ");
var stringArray = new Array();
for (var i = 0; i < string.length; i++) {
    stringArray.push(string[i]);
}
console.log(stringArray);

The console output will look like this:

Array [ "-f", "foo", "-b", "-a", "-z", "baz", "bar" ]

1 Comment

This will not work for 'echo "foo bar" baz', and you don't need to iterate over the array: string is already an array after the split, so stringArray and string are the same.
-1

OK, even though I created a bounty for this question, I found the answer myself with help from "Regex match even number of letters",

and my regex looks like this:

/('((?:[^\\]*(?:\\\\)*\\')+|[^']*)*'|"(?:(?:[^\\]*(?:\\\\)*\\")+|[^"]*)*"|(?:\/(\\\/|[^\/])+\/[gimy]*)(?=:? |$)|(\\\s|\S)+|[\w-]+)/

with demo

EDIT: @sin's suggestion makes for a better regex:

/("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\/(\\\/|[^\/])+\/[gimy]*)(?=:? |$)|(\\\s|\S)+|[\w-]+)/

10 Comments

This one is wrong too. It fails because you didn't take into account that characters other than quotes or slashes may also be escaped, for example "ab\cd". You also don't need to know whether the number of backslashes is odd or even. Finally, using [^\\]* (at the beginning) allows the match to run out of the quoted part and into a later quoted part: 'abc' -p 'def\'ghi'. A simple way to match quoted parts with escaped quotes is '[^'\\]*(?:\\.[^'\\]*)*' (replace the dot with [\s\S] or [^] if you also want to match escaped newlines).
Also, I don't understand why you added (?=:? |$) after the pattern part?
The raw regex for quotes is this "[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*' (regex101.com/r/uxqApc/2) what the heck are you trying to do ??
@sin Oh, thanks, that's a much better string-matching regex. I added it to my command-line split regex, which also needs to match normal words, numbers and regexes.
I'm not sure what the other parts of your regex are supposed to do, but this part (?= :? | $ ) will always resolve to true, and is useless.