13

Let's say for example that I have one string, like this:

<h1>Hello World!</h1>

What Go code would be able to extract Hello World! from that string? I'm still relatively new to Go. Any help is greatly appreciated!

4 Comments
  • Are you looking to parse a specific pattern or format? For example, is the text always surrounded by <h1> tags, general HTML, or something else entirely? There is not enough information to answer the question, so I am downvoting. Commented Nov 13, 2014 at 19:49
  • It's just matching strings. If I hit one matching string and then another, then give me the stuff in the middle. Commented Nov 13, 2014 at 19:52
  • For manipulating HTML, look at GoQuery or golang.org/x/net/html (formerly go.net/html). Commented Nov 14, 2014 at 20:30
  • A good answer to this question is stackoverflow.com/a/62555190/3415984 Commented Jun 24, 2020 at 17:43

9 Answers

20

If the string looks like whatever;START;extract;END;whatever, you can use this function, which returns the string in between:

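import "strings"

// GetStringInBetween returns the text between the first occurrence of start
// and the next occurrence of end after it. It returns an empty string if
// either delimiter is not found.
func GetStringInBetween(str string, start string, end string) (result string) {
    s := strings.Index(str, start)
    if s == -1 {
        return
    }
    s += len(start)
    // Search for end only after start, so an earlier end is ignored.
    e := strings.Index(str[s:], end)
    if e == -1 {
        return
    }
    // e is relative to str[s:], so offset it by s.
    return str[s : s+e]
}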

It finds the first index of START, adds the length of START, and returns everything from there up to the first occurrence of END that follows it.
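Here is a minimal runnable sketch of the function with the end index taken relative to s (that is, returning str[s:s+e]), applied to the question's string. The main function and the extra inputs are illustrative additions, not part of the original answer.

```go
package main

import (
    "fmt"
    "strings"
)

// GetStringInBetween returns the text between the first occurrence of start
// and the next occurrence of end after it, or "" if either is missing.
func GetStringInBetween(str string, start string, end string) (result string) {
    s := strings.Index(str, start)
    if s == -1 {
        return
    }
    s += len(start)
    // Look for end only in the part of str that follows start.
    e := strings.Index(str[s:], end)
    if e == -1 {
        return
    }
    // e is an offset into str[s:], so shift it back into str's coordinates.
    return str[s : s+e]
}

func main() {
    fmt.Println(GetStringInBetween("<h1>Hello World!</h1>", "<h1>", "</h1>"))        // Hello World!
    fmt.Printf("%q\n", GetStringInBetween("<h1>no closing tag", "<h1>", "</h1>"))    // ""
    fmt.Println(GetStringInBetween("a;START;extract;END;b", ";START;", ";END;"))     // extract
}
```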


5 Comments

This is the best answer, but it will panic if END is not found or if END is also found before START; see this play link: play.golang.org/p/C2sZRYC15XN. That play link also includes a revision that fixes the problem. I submitted this revision to SO, where it is under peer review.
@schollz is correct and provides a more correct answer. Copying and pasting this answer is dangerous as it will panic. However, thank you Jan for the original work.
An improved answer is here: stackoverflow.com/a/62555190/3415984
@schollz's comment is good: return str[s:s+e] is required.
''' Panic: runtime error: slice bounds out of range [:27] with length 21 goroutine 1 [running]: main.GetStringInBetween({0x49690b, 0x15}, {0x494327?, 0x4}, {0x4943f3, 0x5}) /tmp/sandbox2821937282/prog.go:20 +0xf0 main.main() /tmp/sandbox2821937282/prog.go:24 +0x38 ''' go.dev/play/p/ju_9M0ZZzqZ
15

There are lots of ways to split strings in all programming languages.

Since I don't know exactly what you are asking for, here is one way to get the output you want from your sample.

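package main

import (
    "fmt"
    "strings"
)

func main() {
    initial := "<h1>Hello World!</h1>"

    // Note: TrimLeft and TrimRight treat their second argument as a set of
    // characters (a cutset), not as a literal prefix or suffix.
    out := strings.TrimLeft(strings.TrimRight(initial, "</h1>"), "<h1>")
    fmt.Println(out)
}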

The code above trims the characters of <h1> from the left of the string and the characters of </h1> from the right.
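Because TrimLeft and TrimRight work on character sets, a variant using strings.TrimPrefix and strings.TrimSuffix, which remove an exact string, is safer when the tags are known literals. This is a sketch that assumes the text is wrapped in one literal <h1> prefix and one literal </h1> suffix; the helper name stripTags is hypothetical.

```go
package main

import (
    "fmt"
    "strings"
)

// stripTags (hypothetical helper) removes one literal "</h1>" suffix and one
// literal "<h1>" prefix. Each is removed only if present.
func stripTags(s string) string {
    return strings.TrimPrefix(strings.TrimSuffix(s, "</h1>"), "<h1>")
}

func main() {
    fmt.Println(stripTags("<h1>Hello World!</h1>")) // Hello World!
    fmt.Println(stripTags("<h1>hhhhhello</h1>"))    // hhhhhello
}
```

With TrimLeft/TrimRight, the second input would lose its leading h characters as well.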

As I said there are hundreds of ways to split specific strings and this is only a sample to get you started.

Hope it helps. Good luck with Golang :)

DB

2 Comments

This is wrong, because the trim argument is a set of characters, not a string. If initial := "<h1>hhhhhello</h1>", the result would be "ello": play.golang.org/p/HkopYJEDg9F
Ignore this answer. It works for @T145's specific case but not in general. The answer below works perfectly.
7

I improved Jan Kardaš's answer. Now you can find a string using start and end delimiters longer than one character, and the function also reports whether a match was found.

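import "strings"

// GetStringInBetweenTwoString returns the text between startS and the next
// endS; found reports whether both delimiters were present.
func GetStringInBetweenTwoString(str string, startS string, endS string) (result string, found bool) {
    s := strings.Index(str, startS)
    if s == -1 {
        return result, false
    }
    // Search for endS only in the text after startS.
    newS := str[s+len(startS):]
    e := strings.Index(newS, endS)
    if e == -1 {
        return result, false
    }
    result = newS[:e]
    return result, true
}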
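A short usage sketch (the main function and inputs are illustrative) showing how the found flag distinguishes a missing delimiter from an empty match:

```go
package main

import (
    "fmt"
    "strings"
)

// Copy of GetStringInBetweenTwoString, repeated so this example is self-contained.
func GetStringInBetweenTwoString(str string, startS string, endS string) (result string, found bool) {
    s := strings.Index(str, startS)
    if s == -1 {
        return result, false
    }
    newS := str[s+len(startS):]
    e := strings.Index(newS, endS)
    if e == -1 {
        return result, false
    }
    result = newS[:e]
    return result, true
}

func main() {
    inputs := []string{"<h1>Hello World!</h1>", "<h1></h1>", "<h1>Hello"}
    for _, in := range inputs {
        r, ok := GetStringInBetweenTwoString(in, "<h1>", "</h1>")
        // "<h1></h1>" gives an empty result with found=true;
        // "<h1>Hello" gives an empty result with found=false.
        fmt.Printf("%q -> %q, found=%v\n", in, r, ok)
    }
}
```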


6

Here is my answer using a regular expression. I'm not sure why no one suggested this approach, which I think is the safest.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    content := "<h1>Hello World!</h1>"
    re := regexp.MustCompile(`<h1>(.*)</h1>`)
    // match[0] is the whole match; match[1] is the captured group.
    match := re.FindStringSubmatch(content)
    if len(match) > 1 {
        fmt.Println("match found -", match[1])
    } else {
        fmt.Println("match not found")
    }
}

Playground - https://play.golang.org/p/Yc61x1cbZOJ
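Because (.*) is greedy, a string with several <h1> elements would yield one match running from the first <h1> to the last </h1>. A sketch using the non-greedy (.*?) with FindAllStringSubmatch returns each heading separately; the helper name headings and the sample input are made up for illustration.

```go
package main

import (
    "fmt"
    "regexp"
)

// Non-greedy (.*?) stops at the first </h1> after each <h1>.
var h1 = regexp.MustCompile(`<h1>(.*?)</h1>`)

// headings (hypothetical helper) returns the text of every <h1> element.
func headings(content string) []string {
    var out []string
    // -1 means "return all matches"; m[1] is the first capture group.
    for _, m := range h1.FindAllStringSubmatch(content, -1) {
        out = append(out, m[1])
    }
    return out
}

func main() {
    content := "<h1>Hello World!</h1><p>text</p><h1>Second</h1>"
    fmt.Println(headings(content)) // [Hello World! Second]
}
```

By default . does not match newlines, so headings that span lines would need the (?s) flag.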


1

Read up on the strings package. Have a look at the SplitAfter function, which can do something like this:

var sample = "[this][is my][string]"
t := strings.SplitAfter(sample, "[")

That produces the slice "[", "this][", "is my][", "string]". Using further trimming functions, you should get your solution. Best of luck.
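One possible way to finish the idea, assuming every item is wrapped in [ and ] with no nesting, is to skip the first element and trim the trailing brackets from the rest. This is a sketch; the helper name bracketed is hypothetical.

```go
package main

import (
    "fmt"
    "strings"
)

// bracketed (hypothetical helper) returns the items of a string like "[a][b c][d]".
func bracketed(sample string) []string {
    parts := strings.SplitAfter(sample, "[")
    var items []string
    // parts[0] holds everything up to and including the first "[", so skip it.
    for _, p := range parts[1:] {
        p = strings.TrimSuffix(p, "[") // drop the "[" that opens the next item
        p = strings.TrimSuffix(p, "]") // drop the "]" that closes this item
        items = append(items, p)
    }
    return items
}

func main() {
    fmt.Printf("%q\n", bracketed("[this][is my][string]")) // ["this" "is my" "string"]
}
```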


1

In the strings package, you can use a Replacer to great effect.

r := strings.NewReplacer("<h1>", "", "</h1>", "")
fmt.Println(r.Replace("<h1>Hello World!</h1>"))

Go play!
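Keep in mind that a Replacer deletes the tags wherever they occur and keeps all other text, so it isolates the content only when the whole string is a single tagged element. A small sketch illustrating this; the helper name removeTags and the second input are made up:

```go
package main

import (
    "fmt"
    "strings"
)

// removeTags (hypothetical helper) deletes every "<h1>" and "</h1>" in s.
func removeTags(s string) string {
    r := strings.NewReplacer("<h1>", "", "</h1>", "")
    return r.Replace(s)
}

func main() {
    fmt.Println(removeTags("<h1>Hello World!</h1>"))          // Hello World!
    fmt.Println(removeTags("Intro <h1>Hello World!</h1> end")) // Intro Hello World! end
}
```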

2 Comments

How does this answer the OP's question about finding the string between the tags? It only shows how to remove the tags.
My answer does exactly what the OP asked for "What Go code would be able to extract Hello World! from that string?"
1
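import (
    "errors"
    "strings"
)

// findInString returns the bytes between the first occurrence of start and
// the next occurrence of end, scanning one byte at a time.
func findInString(str, start, end string) ([]byte, error) {
    var match []byte
    index := strings.Index(str, start)

    if index == -1 {
        return match, errors.New("not found")
    }

    index += len(start)

    for ; index < len(str); index++ {
        // Stop as soon as the remaining text begins with end.
        if strings.HasPrefix(str[index:], end) {
            return match, nil
        }

        match = append(match, str[index])
    }

    // Reached the end of str without finding end.
    return match, errors.New("not found")
}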


1

How about:

func SplitBetween(str, bef, aft string) string {
    sa := strings.SplitN(str, bef, 2)
    if len(sa) == 1 {
        return ""
    }
    sa = strings.SplitN(sa[1], aft, 2)
    if len(sa) == 1 {
        return ""
    }
    return sa[0]
}

It returns an empty string if either delimiter is not found.
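On Go 1.18 and later, strings.Cut performs the same "split once" step and also reports whether the separator was found, so the same logic can be sketched as follows. This swaps SplitN for strings.Cut; the name Between is hypothetical.

```go
package main

import (
    "fmt"
    "strings"
)

// Between (hypothetical) returns the text between the first bef and the
// next aft, plus whether both delimiters were found. Requires Go 1.18+.
func Between(str, bef, aft string) (string, bool) {
    _, rest, ok := strings.Cut(str, bef)
    if !ok {
        return "", false
    }
    mid, _, ok := strings.Cut(rest, aft)
    if !ok {
        return "", false
    }
    return mid, true
}

func main() {
    fmt.Println(Between("<h1>Hello World!</h1>", "<h1>", "</h1>")) // Hello World! true
    fmt.Println(Between("Hello World!", "<h1>", "</h1>"))           // empty string, false
}
```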


-1
func Split(str, before, after string) string {
    a := strings.SplitAfterN(str, before, 2)
    b := strings.SplitAfterN(a[len(a)-1], after, 2)
    if 1 == len(b) {
        return b[0]
    }
    return b[0][0:len(b[0])-len(after)]
}

The first call to SplitAfterN splits the original string into a slice of 2 parts at the first occurrence of before, or produces a slice containing 1 part equal to the original string if before is not found.

The second call to SplitAfterN uses a[len(a)-1], the last item of slice a, as its input: that is either the text after before or the original string str. It splits the input into 2 parts at the first occurrence of after, or produces a slice containing 1 part equal to the input.

If after was not found, then we can simply return b[0], as it is equal to a[len(a)-1].

If after is found, it is included at the end of the b[0] string, so you have to trim it via b[0][0:len(b[0])-len(after)].

All comparisons are case-sensitive.
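A few illustrative calls (the inputs are made up) show the fallback described above: unlike the functions in several other answers, Split returns the remaining input rather than an empty string when a delimiter is missing.

```go
package main

import (
    "fmt"
    "strings"
)

// Same logic as Split above, repeated so this example is self-contained.
func Split(str, before, after string) string {
    a := strings.SplitAfterN(str, before, 2)
    b := strings.SplitAfterN(a[len(a)-1], after, 2)
    if len(b) == 1 {
        return b[0]
    }
    return b[0][0 : len(b[0])-len(after)]
}

func main() {
    fmt.Printf("%q\n", Split("<h1>Hello World!</h1>", "<h1>", "</h1>")) // "Hello World!"
    fmt.Printf("%q\n", Split("<h1>Hello World!", "<h1>", "</h1>"))      // "Hello World!" (after missing)
    fmt.Printf("%q\n", Split("Hello World!</h1>", "<h1>", "</h1>"))     // "Hello World!" (before missing)
    fmt.Printf("%q\n", Split("no tags here", "<h1>", "</h1>"))          // "no tags here"
}
```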
