4

I've been staring myself blind on this problem I have and I guess this will probably be a real stupid question. But I have to swallow my pride.

I have this combinator parser that doesn't backtrack like I thought it would. I've been reducing it down to a small example without entirely removing context. Feels like "foobar"-examples are just harder to read. Here I go:

@RunWith(classOf[JUnitRunner])
class ParserBacktrackTest extends RegexParsers with Spec with ShouldMatchers {
  override def skipWhitespace = false

  lazy val optSpace = opt(whiteSpace)
  lazy val number = """\d+([\.]\d+)?""".r
  lazy val numWithOptSpace = number <~ optSpace

  private def litre = numWithOptSpace <~ ("litre" | "l")
  def volume = litre ^^ { case _ => "volume" }

  private def namedPieces = numWithOptSpace <~ ("pcs") ^^ { case _ => "explPcs" }
  private def implicitPieces = number ^^ { case _ => "implPcs" }
  protected def unitAmount = namedPieces | implicitPieces

  def nameOfIngredient = ".*".r

  def amount = volume | unitAmount
//  def amount = unitAmount
  protected def ingredient = (amount <~ whiteSpace) ~ nameOfIngredient

  describe("IngredientParser") {
    it("should parse volume") {
      shouldParse("1 litre lime")
    }
    it("should parse explicit pieces") {
      shouldParse("1 pcs lime")
    }
    it("should parse implicit pieces") {
      shouldParse("1 lime")
    }
  }

  def shouldParse(row: String) = {
    val result = parseAll(ingredient, row)
    result match {
      case Success(value, _) => println(value)
      case x => println(x)
    }
    result.successful should be(true)
  }
}

So what happens is that the third test fails:

(volume~lime)
(explPcs~lime)
[1.4] failure: string matching regex `\s+' expected but `i' found

1 lime
   ^

So it seems the litre-parser consumed the l and then it failed when it couldn't find any space. But I would have thought that it would backtrack then and try the next production rule. Obviously the implicitPieces parser parses this line because if I remove the preceding volume parser (remove the comment), it succeeds

(implPcs~litre lime)
(explPcs~lime)
(implPcs~lime)

Why isn't amount backtracking? What am I misunderstanding?

2 Answers 2

4

I just want to post a minimal example illustrating my misunderstanding. I thought this would succeed:

  def foo = "foo" | "fo"
  def obar = "obar"

  def foobar = foo ~ obar

  describe("foobar-parser") {
    it("should parse it") {
      shouldParseWith(foobar, "foobar")
    }
  }

but backtracking over | doesn't work that way. The disjunctive parser will consume "foo" and never give it back.

It has to be normalized so the disjunction is moved to the top level:

def workingFooBar = ("foo" ~ obar) | ("fo" ~ obar)
Sign up to request clarification or add additional context in comments.

1 Comment

Is there any parser library in scala where the transformation is done automatically?
2

It doesn't backtrack because for 1 lime

  • ingredient starts out with amount
  • amount starts with volume
  • volume starts with litre, and
  • litre successfully consumes 1 l of 1 lime

So litre, volume and amount were all successful! This is why then the whole thing continues with the second part of ingredient, namely whiteSpace.

HTH!

7 Comments

I just don't get that? I mean it has to be able to backtrack when it fails. A non-backtracking CFG parser wouldn't be worth much. I mean this works: {def foo = "foo"; def bar = "bar"; def baz = "baz"; def foobarbaz = foo ~ bar ~ baz; def foobarfoo = foo ~ bar ~ foo; def p = foobarbaz | foobarfoo; describe("foobar-parser") { it("should parse it") { shouldParseWith(p, "foobarfoo") } }}
@hedefalk In this example foobarbaz fails, so | tries the alternative. In your question, volume succeeds, so | doesn't try the alternative.
Yes, that's what I was trying to show. @hedefalk: It may be that it's just a matter of priority so that you would only need to rework a minor part of your grammar. As Daniel said, in a | b the right-hand side will only be evaluated once the left-hand side fails which is not the case in your OP. See if you can rearrange one or more rules to get the priorities that you want to have.
I realize I didn't understand how these parsers work at all. Changing: def amount = (volume | unitAmount) <~ whiteSpace to def amount = (volume <~ whiteSpace) | (unitAmount <~ whiteSpace) makes the backtracking work. So it only backtracks over a disjunction (|) but sequencing with ~ is gready?
But what are then ~! for? I still don't get it. I'm probably real slow today :´(
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.