I'm trying to write a simple parser in Scala, but when I add a repeated token, the parser seems to get stuck in an infinite loop.

I have two parse methods below; one of them uses rep(). The non-repeating version (parse1) works as expected, although it is not what I want. The rep() version (parse2) results in an infinite loop.

EDIT: This was a learning example where I tried to enforce that the '=' is surrounded by whitespace.

If it is helpful this is my actual test file:

a = 1
b = 2
c = 1 2 3

With the parse1 method I was able to parse: K = V

but I ran into this problem when I tried to expand the exercise to: K = V1 V2 V3

import scala.util.parsing.combinator._
import scala.io.Source.fromFile

class MyParser extends RegexParsers {
  override def skipWhitespace(): Boolean = { false }

  def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
  def eq: Parser[String]   = """\s+=\s+""".r ^^ { _.toString.trim }
  def string: Parser[String] = """[^ \t\n]*""".r ^^ { _.toString.trim }
  def value: Parser[List[String]] = rep(string)

  def foo(key: String, value: String): Boolean = {
    println(key + " = " + value)
    true
  }

  def parse1: Parser[Boolean] = key ~ eq ~ string ^^ { case k ~ eq ~ string => foo(k, string) }
  def parse2: Parser[Boolean] = key ~ eq ~ value ^^ { case k ~ eq ~ value => foo(k, value.toString) }

  def parseLine(line: String): Boolean = {
    parse(parse2, line) match {
      case Success(matched, _) => true
      case Failure(msg, _) => false
      case Error(msg, _) => false
    }
  }
}

object TestParser {
  def usage() = {
    System.out.println("<file>")
  }

  def main(args: Array[String]) : Unit = {
    if (args.length != 1) {
      usage()
    } else {
      val mp = new MyParser()

      fromFile(args(0)).getLines().foreach { mp.parseLine }
      println("done")
    }
  }
}

2 Answers

Next time, please provide some concrete examples; it's not obvious what your input is supposed to look like.

Meanwhile, you can try this; maybe you'll find it helpful:

import scala.util.parsing.combinator._
import scala.io.Source.fromFile

class MyParser extends JavaTokenParsers {
  // override def skipWhitespace(): Boolean = { false }

  def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
  def eq: Parser[String]   = "="
  def string: Parser[String] = """[^ \t\n]+""".r
  def value: Parser[List[String]] = rep(string)

  def foo(key: String, value: String): Boolean = {
    println(key + " = " + value)
    true
  }

  def parse1: Parser[Boolean] = key ~ eq ~ string ^^ { case k ~ eq ~ string => foo(k, string) }
  def parse2: Parser[Boolean] = key ~ eq ~ value ^^ { case k ~ eq ~ value => foo(k, value.toString) }

  def parseLine(line: String): Boolean = {
    parseAll(parse2, line) match {
      case Success(matched, _) => true
      case Failure(msg, _) => false
      case Error(msg, _) => false
    }
  }
}

val mp = new MyParser()
for (line <- List("hey = hou", "hello = world ppl", "foo = bar baz blup")) {
  println(mp.parseLine(line))
}

Explanation:

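JavaTokenParsers extends RegexParsers, and both skip white space before each token by default. The code in the question turned that off by overriding skipWhitespace; this version leaves the default in place, so the parser handles the white space for you. JavaTokenParsers is not specific to Java; it works for most non-esoteric languages. As long as you are not trying to parse white space itself, JavaTokenParsers is a good starting point.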

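Your string definition used *, so it can match the empty string. rep keeps applying a parser for as long as it succeeds, and a parser that succeeds without consuming any input never stops succeeding, which caused the infinite loop. Your eq definition matched the surrounding white space itself, which interferes with the built-in white space handling (don't do this unless it's really necessary). Furthermore, if you want to parse the whole line, you must call parseAll; parse succeeds as soon as a prefix of the input matches and leaves the rest unconsumed.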
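To see why the * matters, here is a minimal sketch that uses only the standard library's scala.util.matching.Regex, not the parser combinators themselves. The repeat helper, the Result class, and the maxSteps cap are hypothetical stand-ins for what rep does (keep applying the token at the current offset until it fails); as far as I know the real rep has no such cap.

```scala
import scala.util.matching.Regex

object StarVsPlus {
  val star: Regex = """[^ \t\n]*""".r // may match the empty string
  val plus: Regex = """[^ \t\n]+""".r // needs at least one character

  // Hypothetical summary of one repetition run.
  final case class Result(tokens: List[String], offset: Int, stoppedByFailure: Boolean)

  // Hypothetical stand-in for rep(): apply the token regex at the current
  // offset until it fails. maxSteps only exists so the demo terminates.
  def repeat(token: Regex, input: String, maxSteps: Int): Result = {
    var offset = 0
    var tokens = List.empty[String]
    var steps = 0
    var failed = false
    while (!failed && steps < maxSteps) {
      token.findPrefixOf(input.substring(offset)) match {
        case Some(m) =>
          tokens = m :: tokens
          offset += m.length // an empty match leaves the offset unchanged
        case None =>
          failed = true
      }
      steps += 1
    }
    Result(tokens.reverse, offset, failed)
  }

  def main(args: Array[String]): Unit = {
    // With *, the loop gets stuck in front of the first space and only
    // stops because of the cap; with +, it fails there and stops.
    println(repeat(star, "1 2 3", 5))
    println(repeat(plus, "1 2 3", 5))
  }
}
```

With star the offset stays at 1 and the loop only ends because of the cap; with plus the second attempt fails and the repetition stops cleanly after the first token.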

Final remark: for parsing key-value pairs line by line, some String.split and String.trim would be completely sufficient. Scala Parser Combinators are a little overkill for that.
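For comparison, a minimal sketch of the split/trim approach. The names SplitParser and parseLine are made up for this example, and note the assumption baked in: this version treats the white space around = as optional, so it does not enforce the K = V spacing.

```scala
object SplitParser {
  // Splits "key = v1 v2 v3" into the key and its list of values.
  // Returns None when there is no '=' or the key is empty.
  def parseLine(line: String): Option[(String, List[String])] =
    line.split("=", 2) match {
      case Array(k, v) if k.trim.nonEmpty =>
        // Split the right-hand side on runs of white space and drop empty pieces.
        val values = v.trim.split("\\s+").filter(_.nonEmpty).toList
        Some((k.trim, values))
      case _ =>
        None
    }

  def main(args: Array[String]): Unit =
    List("a = 1", "b = 2", "c = 1 2 3").foreach(line => println(parseLine(line)))
}
```

The limit of 2 in split keeps any further = signs inside the value part instead of cutting the line into more pieces.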

PS: Hmm... Did you want to allow =-signs in your key-names? Then my version would not work here, because it does not enforce an empty space after the key-name.

3 Comments

ugh, I should not post questions late. That was a terrible question.... I will update, but yes the goal was to write a parser (for learning) that enforced K = V with mandatory whitespace. It seemed like a simple starter parser.
I am unsure what to do here. It appears the question has been answered, but it hasn't. This is largely my fault for asking the question very poorly. Is it possible to toss Andrey Tyukin some karma or something for giving a great attempt, but still signal that I am stuck here?
Hey again. Sorry, I did not realize that you cared about the whitespace. I thought it was all about the *-vs-+ bug in your string-Parser. I added another version that explicitly takes care of white space.

This is not a duplicate; it's a different version with RegexParsers that takes care of whitespace explicitly.

If for some reason you really care about the white space, you could stick with RegexParsers and do the following (notice the skipWhitespace = false, the explicit whitespace parser ws, the two ws joined to the equals sign with ~> and <~, and the repsep with ws as the explicit separator):

import scala.util.parsing.combinator._
import scala.io.Source.fromFile

class MyParser extends RegexParsers {
  override def skipWhitespace(): Boolean = false

  def ws: Parser[String] = "[ \t]+".r
  def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
  def eq: Parser[String]   = ws ~> """=""" <~ ws
  def string: Parser[String] = """[^ \t\n]+""".r
  def value: Parser[List[String]] = repsep(string, ws)

  def foo(key: String, value: String): Boolean = {
    print(key + " = " + value)
    true
  }

  def parse1: Parser[Boolean] = (key ~ eq ~ string) ^^ { case k ~ e ~ v => foo(k, v) }
  def parse2: Parser[Boolean] = (key ~ eq ~ value) ^^ { case k ~ e ~ v => foo(k, v.toString) }

  def parseLine(line: String): Boolean = {
    parseAll(parse2, line) match {
      case Success(matched, _) => true
      case Failure(msg, _) => false
      case Error(msg, _) => false
    }
  }
}

val mp = new MyParser()
for (line <- List("hey = hou", "hello = world ppl", "foo = bar baz blup", "foo= bar baz", "foo =bar baz")) {
  println(" (Matches: " + mp.parseLine(line) + ")")
}

Now the parser rejects the lines where there is no whitespace around the equal sign:

hey = List(hou) (Matches: true)
hello = List(world, ppl) (Matches: true)
foo = List(bar, baz, blup) (Matches: true)
(Matches: false)
(Matches: false)

The bug with * instead of + in string has been removed, just like in the previous version.
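If the only requirement is the mandatory white space around =, the same check can also be sketched with a single standard-library regex instead of combinators. This is a different technique from the RegexParsers class above, not a drop-in equivalent: the names WhitespaceCheck and Line are hypothetical, and unlike repsep(string, ws) combined with parseAll, this version tolerates trailing white space.

```scala
object WhitespaceCheck {
  // key, at least one space/tab, '=', at least one space/tab, rest of the line
  private val Line = """([a-zA-Z]+)[ \t]+=[ \t]+(.*)""".r

  // Returns the key and its values, or None when the white space around
  // '=' is missing or the key is not purely alphabetic.
  def parseLine(line: String): Option[(String, List[String])] = line match {
    case Line(k, v) =>
      Some((k, v.split("[ \t]+").filter(_.nonEmpty).toList))
    case _ =>
      None
  }

  def main(args: Array[String]): Unit =
    List("hey = hou", "foo = bar baz blup", "foo= bar baz", "foo =bar baz")
      .foreach(line => println(parseLine(line)))
}
```

A regex used as a pattern has to match the whole string, so the two lines without white space on both sides of = fall through to None.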
