
I'm trying to find the beginning of named capturing groups in a string to create a simple parser (see related question). To do this, the extract function remembers the last four characters in the last4 variable. If the last four characters are equal to "(?P<", it is the beginning of a capturing group:

package main

import "fmt"

const sample string = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

func main() {
    extract(sample)
}

func extract(regex string) {
    last4 := new([4]int32)
    for _, c := range regex {
        last4[0], last4[1], last4[2], last4[3] = last4[1], last4[2], last4[3], c
        last4String := fmt.Sprintf("%c%c%c%c\n", last4[0], last4[1], last4[2], last4[3])
        if last4String == "(?P<" {
            fmt.Print("start of capturing group")
        }
    }
}

http://play.golang.org/p/pqA-wCuvux

But this code prints nothing! last4String == "(?P<" is never true, although this substring appears in the output if I print last4String inside the loop. How do I compare strings in Go, then?

And is there a more elegant way to convert an int32 array to a string than fmt.Sprintf("%c%c%c%c\n", last4[0], last4[1], last4[2], last4[3])?

Anything else that could be better? My code looks somewhat inelegant to me.

1 Answer


If this isn't just for self-education or similar, you probably want to use the existing regular-expression parser in the standard library (package regexp/syntax) and then "walk" the resulting AST to do whatever is required.

func Parse(s string, flags Flags) (*Regexp, error)

Parse parses a regular expression string s, controlled by the specified Flags, and returns a regular expression parse tree. The syntax is described in the top-level comment for package regexp.

There's even a helper for your task.

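EDIT1: Your code repaired. The comparison itself was fine (== compares Go strings by content); the problem was the "\n" at the end of your Sprintf format string. It made last4String five characters long ("(?P<" followed by a newline), so it could never equal the four-character "(?P<". Printing it inside the loop hid this because the extra character is just a line break. I also replaced new([4]int32) with a plain array value and Print with Println: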

package main

import "fmt"

const sample string = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

func main() {
        extract(sample)
}

func extract(regex string) {
        var last4 [4]int32
        for _, c := range regex {
                last4[0], last4[1], last4[2], last4[3] = last4[1], last4[2], last4[3], c
                // No trailing "\n": the result must be exactly the four runes.
                last4String := fmt.Sprintf("%c%c%c%c", last4[0], last4[1], last4[2], last4[3])
                if last4String == "(?P<" {
                        fmt.Println("start of capturing group")
                }
        }
}


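EDIT2: Your code rewritten with strings.Index, which searches for "(?P<" directly and needs no rolling buffer. For the sample it reports groups starting at byte offsets 1 and 31: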

package main

import (
        "fmt"
        "strings"
)

const sample string = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

func main() {
        extract(sample)
}

func extract(regex string) {
        start := 0
        for {
                i := strings.Index(regex[start:], "(?P<")
                if i < 0 {
                        break
                }

                fmt.Printf("start of capturing group @ %d\n", start+i)
                start += i + 1
        }
}

