Golang Regexp.FindSubmatch

Last modified April 20, 2025

This tutorial shows how to use the Regexp.FindSubmatch method in Go. We cover submatch extraction with practical examples.

A regular expression is a sequence of characters that defines a search pattern. It is used for pattern matching within strings.

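The Regexp.FindSubmatch method returns a slice of byte slices ([][]byte) holding the text of the leftmost match and of its submatches. Submatches are the matches of parenthesized subexpressions (capturing groups). If the pattern does not match, the method returns nil.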

Basic FindSubmatch example

The simplest use of FindSubmatch is to extract the parts of a date string. Here we get the year, month, and day components.

basic_submatch.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    date := "2025-04-20"

    matches := re.FindSubmatch([]byte(date))
    if matches != nil {
        fmt.Println("Full match:", string(matches[0]))
        fmt.Println("Year:", string(matches[1]))
        fmt.Println("Month:", string(matches[2]))
        fmt.Println("Day:", string(matches[3]))
    }
}

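We compile a pattern with three capturing groups. FindSubmatch returns a slice in which index 0 is the full match and the following indices are the submatches, in the order of their opening parentheses.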
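
Numeric indices can become hard to follow as patterns grow. One possible alternative, sketched here, is to name the groups and look up their positions with SubexpIndex (available since Go 1.15). The group names and the dateParts helper are illustrative assumptions.

```go
package main

import (
    "fmt"
    "regexp"
)

// dateRe uses named groups; the names are illustrative choices.
var dateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// dateParts is a hypothetical helper. It returns the named submatches
// as a map, or nil if the input contains no date.
func dateParts(input []byte) map[string]string {
    matches := dateRe.FindSubmatch(input)
    if matches == nil {
        return nil
    }
    parts := make(map[string]string)
    for _, name := range []string{"year", "month", "day"} {
        // SubexpIndex maps a group name to its index in matches.
        parts[name] = string(matches[dateRe.SubexpIndex(name)])
    }
    return parts
}

func main() {
    parts := dateParts([]byte("2025-04-20"))
    fmt.Println(parts["year"], parts["month"], parts["day"]) // 2025 04 20
}
```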

Extracting name components

This example shows how to extract the first and last name from a formatted string. The pattern captures two word groups.

name_components.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)\s+(\w+)`)
    name := "John Doe"

    matches := re.FindSubmatch([]byte(name))
    if len(matches) >= 3 {
        fmt.Println("First name:", string(matches[1]))
        fmt.Println("Last name:", string(matches[2]))
    }
}

The pattern (\w+)\s+(\w+) matches two words separated by whitespace. We access the submatches at indices 1 and 2.

Parsing URL components

FindSubmatch can extract the protocol, domain, and path from a URL. This example shows a simple URL parser.

url_parser.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(https?)://([^/]+)(/.*)?`)
    url := "https://example.com/path/to/resource"

    matches := re.FindSubmatch([]byte(url))
    if matches != nil {
        fmt.Println("Protocol:", string(matches[1]))
        fmt.Println("Domain:", string(matches[2]))
        fmt.Println("Path:", string(matches[3]))
    }
}

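The pattern captures the protocol, the domain, and an optional path. Because the third group is optional, matches[3] is nil when the URL has no path, and string(matches[3]) then yields an empty string.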
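
The nil case can be handled explicitly. The following sketch substitutes "/" when the path group did not participate in the match; the splitURL helper and the choice of "/" as a default are assumptions, not something the pattern itself implies.

```go
package main

import (
    "fmt"
    "regexp"
)

var urlRe = regexp.MustCompile(`(https?)://([^/]+)(/.*)?`)

// splitURL is a hypothetical helper built on the pattern above.
// It reports ok=false when the input does not look like an http(s) URL.
func splitURL(raw string) (scheme, host, path string, ok bool) {
    m := urlRe.FindSubmatch([]byte(raw))
    if m == nil {
        return "", "", "", false
    }
    path = "/" // assumed default when the optional group is absent
    if m[3] != nil {
        path = string(m[3])
    }
    return string(m[1]), string(m[2]), path, true
}

func main() {
    scheme, host, path, ok := splitURL("https://example.com/path/to/resource")
    fmt.Println(scheme, host, path, ok) // https example.com /path/to/resource true

    scheme, host, path, ok = splitURL("http://example.com")
    fmt.Println(scheme, host, path, ok) // http example.com / true
}
```

For real URL handling, the standard library's net/url package is usually the more robust choice; the regular expression is kept here only to illustrate optional groups.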

Extracting multiple email addresses

This example finds all email addresses in a string and extracts their components. We loop over the result of FindAllSubmatch.

multiple_emails.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,})`)
    text := "Contact us at info@example.com or support@company.co.uk"

    allMatches := re.FindAllSubmatch([]byte(text), -1)
    for _, match := range allMatches {
        fmt.Println("\nFull email:", string(match[0]))
        fmt.Println("Username:", string(match[1]))
        fmt.Println("Domain:", string(match[2]))
        fmt.Println("TLD:", string(match[3]))
    }
}

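The pattern captures the username, domain, and TLD components. FindAllSubmatch returns all non-overlapping matches in the input; the second argument, -1, means there is no limit on the number of matches. Note that for support@company.co.uk the domain group captures company.co and the TLD group captures uk.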
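
The second argument of FindAllSubmatch can also cap the number of matches. The sketch below wraps the same pattern in a helper that accepts a limit; the emailParts type and the findEmails function are hypothetical names introduced for this illustration.

```go
package main

import (
    "fmt"
    "regexp"
)

var emailRe = regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,})`)

// emailParts holds the three captured components of one address.
type emailParts struct {
    User, Domain, TLD string
}

// findEmails is a hypothetical helper. It returns at most limit matches;
// a negative limit means "all matches".
func findEmails(text string, limit int) []emailParts {
    var out []emailParts
    for _, m := range emailRe.FindAllSubmatch([]byte(text), limit) {
        out = append(out, emailParts{string(m[1]), string(m[2]), string(m[3])})
    }
    return out
}

func main() {
    text := "Contact us at info@example.com or support@company.co.uk"

    fmt.Println(findEmails(text, 1))  // [{info example com}]
    fmt.Println(findEmails(text, -1)) // [{info example com} {support company.co uk}]
}
```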

Handling optional submatches

This example shows how to handle patterns with optional components. We parse log entries that may carry an optional error code.

optional_submatches.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\[(.*?)\]\s(.*?)(?:\s\((\d+)\))?$`)
    logs := []string{
        "[ERROR] Connection failed",
        "[WARNING] Low memory (1024)",
    }

    for _, log := range logs {
        matches := re.FindSubmatch([]byte(log))
        if matches != nil {
            fmt.Println("\nLevel:", string(matches[1]))
            fmt.Println("Message:", string(matches[2]))
            if len(matches[3]) > 0 {
                fmt.Println("Code:", string(matches[3]))
            }
        }
    }
}

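The pattern wraps the error code in an optional non-capturing group, (?:...)?. We check the length of matches[3] before using it; the check is safe even when the group did not participate, because the length of a nil slice is 0.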
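
Captured digits are still bytes, so a numeric code usually needs a conversion step. As a sketch, the listing below parses a line into a struct and converts the optional code with strconv.Atoi. The logEntry type and the parseLog function are hypothetical names for this example.

```go
package main

import (
    "fmt"
    "regexp"
    "strconv"
)

var logRe = regexp.MustCompile(`\[(.*?)\]\s(.*?)(?:\s\((\d+)\))?$`)

// logEntry is a hypothetical record type for one parsed line.
type logEntry struct {
    Level   string
    Message string
    Code    int
    HasCode bool // false when the line carries no code
}

// parseLog returns ok=false when the line does not match the pattern.
func parseLog(line string) (entry logEntry, ok bool) {
    m := logRe.FindSubmatch([]byte(line))
    if m == nil {
        return logEntry{}, false
    }
    entry = logEntry{Level: string(m[1]), Message: string(m[2])}
    if m[3] != nil {
        // Atoi can still fail, for example on a number too large for int.
        if code, err := strconv.Atoi(string(m[3])); err == nil {
            entry.Code, entry.HasCode = code, true
        }
    }
    return entry, true
}

func main() {
    for _, line := range []string{
        "[ERROR] Connection failed",
        "[WARNING] Low memory (1024)",
    } {
        entry, ok := parseLog(line)
        fmt.Printf("%+v %t\n", entry, ok)
    }
}
```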

Extracting nested submatches

This advanced example demonstrates nested capturing groups. We parse a configuration line that contains key-value pairs.

nested_submatches.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+)=((?:"([^"]*)"|([^,]*))(?:,|$))`)
    config := `name="John Doe",age=30,city=New York`

    allMatches := re.FindAllSubmatch([]byte(config), -1)
    for _, match := range allMatches {
        fmt.Println("\nKey:", string(match[1]))
        if len(match[3]) > 0 {
            fmt.Println("Value (quoted):", string(match[3]))
        } else {
            fmt.Println("Value:", string(match[4]))
        }
    }
}

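The pattern handles both quoted and unquoted values. The nested groups let us tell the two formats apart: group 3 holds a quoted value and group 4 an unquoted one.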
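
Groups are numbered by the position of their opening parenthesis, and NumSubexp reports how many there are. The sketch below prints that count and distinguishes the two value formats by testing for nil rather than length, so that an empty quoted value ("") is still reported as quoted. The pairValue helper and the sample input are assumptions for illustration.

```go
package main

import (
    "fmt"
    "regexp"
)

var configRe = regexp.MustCompile(`(\w+)=((?:"([^"]*)"|([^,]*))(?:,|$))`)

// pairValue is a hypothetical helper. It reports the value of one match
// and whether it was quoted. A group that did not participate is nil,
// while a group that matched empty text is an empty, non-nil slice.
func pairValue(match [][]byte) (value string, quoted bool) {
    if match[3] != nil {
        return string(match[3]), true
    }
    return string(match[4]), false
}

func main() {
    fmt.Println("groups:", configRe.NumSubexp()) // groups: 4

    for _, m := range configRe.FindAllSubmatch([]byte(`name="",age=30`), -1) {
        value, quoted := pairValue(m)
        fmt.Printf("%s=%q quoted=%t\n", m[1], value, quoted)
    }
}
```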

Performance considerations

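When processing large texts, consider FindSubmatchIndex, which returns match positions instead of byte slices. This example runs both methods and prints a rough timing for each.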

performance.go

package main

import (
    "fmt"
    "regexp"
    "time"
)

func main() {
    re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    text := []byte("Date: 2025-04-20, Time: 12:30")

    start := time.Now()
    matches := re.FindSubmatch(text)
    fmt.Println("FindSubmatch:", string(matches[1]), time.Since(start))

    start = time.Now()
    indices := re.FindSubmatchIndex(text)
    year := text[indices[2]:indices[3]]
    fmt.Println("FindSubmatchIndex:", string(year), time.Since(start))
}

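FindSubmatchIndex returns byte indices rather than subslices. FindSubmatch does not copy the matched text either, since its results are subslices of the input, so any speed difference between the two is typically small. The index form is mainly useful when you need positions or want to slice only some of the groups. Timing a single call as above is noisy; use benchmarks to measure the difference for your workload.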
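
The sketch below illustrates that the byte slices returned by FindSubmatch share memory with the input: changing the input buffer afterwards changes the submatch as well. The yearBytes and yearSpan helpers are hypothetical names for this example.

```go
package main

import (
    "fmt"
    "regexp"
)

var isoDateRe = regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)

// yearBytes returns the year submatch from FindSubmatch, or nil.
func yearBytes(text []byte) []byte {
    m := isoDateRe.FindSubmatch(text)
    if m == nil {
        return nil
    }
    return m[1]
}

// yearSpan returns the [start, end) byte offsets of the year group
// using FindSubmatchIndex, or ok=false when there is no match.
func yearSpan(text []byte) (start, end int, ok bool) {
    loc := isoDateRe.FindSubmatchIndex(text)
    if loc == nil {
        return 0, 0, false
    }
    return loc[2], loc[3], true
}

func main() {
    text := []byte("Date: 2025-04-20")

    year := yearBytes(text)
    start, end, _ := yearSpan(text)
    fmt.Println(string(year), start, end) // 2025 6 10

    // Overwrite the first digit of the year in the input buffer.
    text[start] = '9'
    fmt.Println(string(year)) // 9025
}
```

If the input buffer is reused or modified later, copy the submatches you want to keep, for example by converting them to strings.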

Source

Go regexp package documentation

This tutorial covered the Regexp.FindSubmatch method in Go with practical examples of extracting submatches.

Author

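My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have written over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.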
