Golang Regexp.FindStringSubmatch

last modified April 20, 2025

This tutorial explains how to use the `Regexp.FindStringSubmatch` method in Go. We cover submatch extraction with practical examples.

A regular expression is a sequence of characters that defines a search pattern. It is used for pattern matching within strings.

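The `Regexp.FindStringSubmatch` method returns a slice of strings holding the text of the leftmost match and the matches of its subexpressions. Submatches are the matches of the parenthesized subexpressions (capturing groups) in the regular expression. If there is no match, the method returns nil.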

Basic FindStringSubmatch example

The simplest use of `FindStringSubmatch` is to extract parts of a date string. Here we break a date into its year, month, and day components.

basic_submatch.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    date := "2025-04-20"

    matches := re.FindStringSubmatch(date)
    if matches != nil {
        fmt.Println("Full match:", matches[0])
        fmt.Println("Year:", matches[1])
        fmt.Println("Month:", matches[2])
        fmt.Println("Day:", matches[3])
    }
}

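The pattern uses parentheses to create capturing groups. Index 0 holds the entire match, and the following indexes hold the submatches in the order of their opening parentheses.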
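The following is a minimal sketch of two properties of the returned slice: it is nil when nothing matches, and on a match it always has one element per capturing group plus one for the full match. The helper `splitDate` and the variable `dateRe` are names made up for this illustration.

```go
package main

import (
    "fmt"
    "regexp"
)

// dateRe has three capturing groups: year, month, and day.
var dateRe = regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)

// splitDate is a hypothetical helper that returns the year, month, and day
// of the first date found in s. ok is false when the pattern does not match.
func splitDate(s string) (year, month, day string, ok bool) {
    m := dateRe.FindStringSubmatch(s)
    if m == nil {
        // No match: the slice is nil, so indexing it would panic.
        return "", "", "", false
    }
    // On a match the slice has NumSubexp()+1 elements.
    return m[1], m[2], m[3], true
}

func main() {
    fmt.Println("Capturing groups:", dateRe.NumSubexp())

    for _, s := range []string{"released 2025-04-20", "no date here"} {
        if y, m, d, ok := splitDate(s); ok {
            fmt.Println("Found:", y, m, d)
        } else {
            fmt.Println("No match in:", s)
        }
    }
}
```

Wrapping the nil check in a helper keeps callers from indexing a nil slice by accident.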

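Extracting URL components

This example shows how to extract the protocol, domain, and path from a URL. It demonstrates a more involved case of string parsing.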

url_components.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`^(https?)://([^/]+)(/.*)?$`)
    url := "https://example.com/path/to/resource"

    matches := re.FindStringSubmatch(url)
    if matches != nil {
        fmt.Println("Protocol:", matches[1])
        fmt.Println("Domain:", matches[2])
        fmt.Println("Path:", matches[3])
    }
}

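The regular expression splits a URL into three parts. The path is optional, as indicated by the question mark after its capturing group. When the path is absent, `matches[3]` is an empty string.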
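As a small sketch of the optional group in practice, the code below treats a missing path as the root path. The helper `pathOrRoot` and the variable `urlRe` are hypothetical names, and falling back to "/" is an assumption made for this example, not something the pattern itself implies.

```go
package main

import (
    "fmt"
    "regexp"
)

// urlRe is the same pattern as in the example above.
var urlRe = regexp.MustCompile(`^(https?)://([^/]+)(/.*)?$`)

// pathOrRoot is a hypothetical helper. It returns the captured path, or "/"
// when the optional path group did not take part in the match.
// ok is false when rawURL does not match the pattern at all.
func pathOrRoot(rawURL string) (path string, ok bool) {
    m := urlRe.FindStringSubmatch(rawURL)
    if m == nil {
        return "", false
    }
    if m[3] == "" {
        // The slice still has four elements; the unmatched group is empty.
        return "/", true
    }
    return m[3], true
}

func main() {
    for _, u := range []string{
        "https://example.com/path/to/resource",
        "https://example.com",
        "ftp://example.com",
    } {
        path, ok := pathOrRoot(u)
        fmt.Printf("%s -> path=%q ok=%t\n", u, path, ok)
    }
}
```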

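Named capturing groups

Go's regexp package supports named capturing groups with the `(?P<name>...)` syntax, but `FindStringSubmatch` still returns a plain slice indexed by position. A simple way to make those positions readable is to name the indexes with constants, as this example does.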

named_groups.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    const (
        fullMatch = iota
        year
        month
        day
    )

    re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    date := "2025-04-20"

    matches := re.FindStringSubmatch(date)
    if matches != nil {
        fmt.Println("Full match:", matches[fullMatch])
        fmt.Println("Year:", matches[year])
        fmt.Println("Month:", matches[month])
        fmt.Println("Day:", matches[day])
    }
}

Using constants as indexes makes the code easier to maintain. The pattern is unchanged, but access to the submatches is clearer.
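For comparison, here is a sketch that uses real named groups together with `SubexpNames` and `SubexpIndex` (the latter is available since Go 1.15). The helper `namedMatches` and the variable `namedDateRe` are names invented for this example.

```go
package main

import (
    "fmt"
    "regexp"
)

// namedDateRe names each capturing group with the (?P<name>...) syntax.
var namedDateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// namedMatches is a hypothetical helper that maps each group name to the
// text it matched. It returns nil when the pattern does not match.
func namedMatches(re *regexp.Regexp, s string) map[string]string {
    m := re.FindStringSubmatch(s)
    if m == nil {
        return nil
    }
    result := make(map[string]string)
    for i, name := range re.SubexpNames() {
        // Index 0 (the full match) and unnamed groups have an empty name.
        if name != "" {
            result[name] = m[i]
        }
    }
    return result
}

func main() {
    groups := namedMatches(namedDateRe, "2025-04-20")
    fmt.Println("Year:", groups["year"])
    fmt.Println("Month:", groups["month"])
    fmt.Println("Day:", groups["day"])

    // SubexpIndex looks up the position of a single group by name.
    m := namedDateRe.FindStringSubmatch("2025-04-20")
    fmt.Println("Day via SubexpIndex:", m[namedDateRe.SubexpIndex("day")])
}
```

Constants are cheaper and checked at compile time, while named groups keep the names next to the pattern, so they stay correct when groups are added or reordered.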

Email address parsing

This example extracts the username and domain from an email address. It shows a more complex pattern with submatches.

email_parsing.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`^([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$`)
    email := "user.name@example.com"

    matches := re.FindStringSubmatch(email)
    if matches != nil {
        fmt.Println("Full email:", matches[0])
        fmt.Println("Username:", matches[1])
        fmt.Println("Domain:", matches[2])
    } else {
        fmt.Println("Invalid email format")
    }
}

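The pattern checks the format of the email address while extracting its components. Note that full email validation requires a considerably more complex pattern.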
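To illustrate the limits of this pattern, the sketch below runs it against a few sample addresses. The helper `parseEmail` and the sample inputs are assumptions chosen for this illustration. The pattern accepts an address with consecutive dots in the local part, which is not allowed in an unquoted local part, and it rejects a host name that has no dot.

```go
package main

import (
    "fmt"
    "regexp"
)

// emailRe is the same pattern as in the example above.
var emailRe = regexp.MustCompile(`^([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$`)

// parseEmail is a hypothetical helper that returns the username and domain
// of s. ok is false when s does not match the pattern.
func parseEmail(s string) (user, domain string, ok bool) {
    m := emailRe.FindStringSubmatch(s)
    if m == nil {
        return "", "", false
    }
    return m[1], m[2], true
}

func main() {
    samples := []string{
        "user.name@example.com",  // accepted
        "user..name@example.com", // accepted, despite the consecutive dots
        "user@localhost",         // rejected: the pattern requires a dotted domain
    }
    for _, s := range samples {
        user, domain, ok := parseEmail(s)
        fmt.Printf("%q ok=%t user=%q domain=%q\n", s, ok, user, domain)
    }
}
```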

Multiple matches in text

To find all matches in a text, we use `FindAllStringSubmatch`. This example extracts all phone numbers from a string.

multiple_matches.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\d{3})-(\d{3})-(\d{4})`)
    text := "Call 555-123-4567 or 888-234-5678 for assistance"

    allMatches := re.FindAllStringSubmatch(text, -1)
    for _, matches := range allMatches {
        fmt.Println("Full number:", matches[0])
        fmt.Println("Area code:", matches[1])
        fmt.Println("Exchange:", matches[2])
        fmt.Println("Line number:", matches[3])
        fmt.Println("---")
    }
}

The second argument of `FindAllStringSubmatch` limits the number of matches. Passing -1 finds all matches in the string.
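The effect of the limit can be sketched as follows. The helper `areaCodes` and the variable `phoneRe` are hypothetical names used only in this example.

```go
package main

import (
    "fmt"
    "regexp"
)

// phoneRe is the same pattern as in the example above.
var phoneRe = regexp.MustCompile(`(\d{3})-(\d{3})-(\d{4})`)

// areaCodes is a hypothetical helper that returns the area codes of at most
// n phone numbers found in text. A negative n means no limit.
func areaCodes(text string, n int) []string {
    var codes []string
    for _, m := range phoneRe.FindAllStringSubmatch(text, n) {
        codes = append(codes, m[1]) // m[1] is the first capturing group
    }
    return codes
}

func main() {
    text := "Call 555-123-4567 or 888-234-5678 for assistance"

    fmt.Println("All:", areaCodes(text, -1))
    fmt.Println("First only:", areaCodes(text, 1))
    fmt.Println("No numbers:", areaCodes("no numbers here", -1))
}
```

When there are no matches, `FindAllStringSubmatch` returns nil, so the loop body simply never runs.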

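Optional submatches

This example shows how to handle optional components in a pattern. We extract both required and optional parts from a string.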

optional_submatches.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`^(Mr|Ms|Mrs)\.?\s+(\w+)(?:\s+(\w+))?$`)
    names := []string{
        "Mr. John Doe",
        "Ms Jane Smith",
        "Mrs. Johnson",
    }

    for _, name := range names {
        matches := re.FindStringSubmatch(name)
        if matches != nil {
            fmt.Println("Title:", matches[1])
            fmt.Println("First name:", matches[2])
            if matches[3] != "" {
                fmt.Println("Last name:", matches[3])
            } else {
                fmt.Println("Last name: (none)")
            }
            fmt.Println("---")
        }
    }
}

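The `?` quantifier after the non-capturing group `(?:...)` makes the last name optional. A capturing group that does not participate in the match yields an empty string, so we check for an empty submatch when processing the results.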
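An empty string cannot tell a group that was skipped apart from a group that matched empty text. With this pattern the distinction does not matter, because `\w+` never matches an empty string, but for other patterns it can. The sketch below uses `FindStringSubmatchIndex`, which reports -1 for a group that did not participate. The helper `lastName` and the variable `nameRe` are hypothetical names.

```go
package main

import (
    "fmt"
    "regexp"
)

// nameRe is the same pattern as in the example above.
var nameRe = regexp.MustCompile(`^(Mr|Ms|Mrs)\.?\s+(\w+)(?:\s+(\w+))?$`)

// lastName is a hypothetical helper. matched reports whether s matches the
// pattern, and present reports whether the optional last-name group took
// part in the match.
func lastName(s string) (last string, present, matched bool) {
    loc := nameRe.FindStringSubmatchIndex(s)
    if loc == nil {
        return "", false, false
    }
    // Group i occupies loc[2*i] and loc[2*i+1]; both are -1 when the group
    // did not participate. The non-capturing group is not counted.
    if loc[6] < 0 {
        return "", false, true
    }
    return s[loc[6]:loc[7]], true, true
}

func main() {
    for _, name := range []string{"Mr. John Doe", "Mrs. Johnson", "Dr. Who"} {
        last, present, matched := lastName(name)
        fmt.Printf("%q matched=%t present=%t last=%q\n", name, matched, present, last)
    }
}
```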

Complex log parsing

This more advanced example parses a log entry with several components. It shows how to extract data from realistic input.

log_parsing.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (\w+): (.+)$`)
    logEntry := "2025-04-20 14:30:45 [ERROR] main: Failed to connect to database"

    matches := re.FindStringSubmatch(logEntry)
    if matches != nil {
        fmt.Println("Timestamp:", matches[1])
        fmt.Println("Log level:", matches[2])
        fmt.Println("Component:", matches[3])
        fmt.Println("Message:", matches[4])
    }
}

The pattern breaks a log entry in this format into its components. Each part is captured in its own submatch for easy access.
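In practice the submatches are often copied into a typed value. The following sketch does that under a few assumptions: the `LogEntry` type and the `parseLogLine` function are hypothetical, and the timestamp is assumed to be in UTC because the log line carries no time zone.

```go
package main

import (
    "fmt"
    "regexp"
    "time"
)

// logRe is the same pattern as in the example above.
var logRe = regexp.MustCompile(`^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (\w+): (.+)$`)

// LogEntry is a hypothetical type that holds the parsed components.
type LogEntry struct {
    Time      time.Time
    Level     string
    Component string
    Message   string
}

// parseLogLine is a hypothetical helper that converts one log line into a
// LogEntry. It returns an error for lines that do not match the pattern or
// whose timestamp is not a valid date and time.
func parseLogLine(line string) (LogEntry, error) {
    m := logRe.FindStringSubmatch(line)
    if m == nil {
        return LogEntry{}, fmt.Errorf("unrecognized log line: %q", line)
    }
    // The layout is written in terms of Go's reference time.
    // time.Parse returns UTC when the input has no zone information.
    ts, err := time.Parse("2006-01-02 15:04:05", m[1])
    if err != nil {
        return LogEntry{}, fmt.Errorf("bad timestamp %q: %w", m[1], err)
    }
    return LogEntry{Time: ts, Level: m[2], Component: m[3], Message: m[4]}, nil
}

func main() {
    entry, err := parseLogLine("2025-04-20 14:30:45 [ERROR] main: Failed to connect to database")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    fmt.Println("Level:", entry.Level)
    fmt.Println("Component:", entry.Component)
    fmt.Println("Message:", entry.Message)
    fmt.Println("Hour:", entry.Time.Hour())
}
```

The regular expression only checks the shape of the timestamp, so the call to `time.Parse` is what rejects values such as month 13.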

Source

Go regexp package documentation

This tutorial covered the `Regexp.FindStringSubmatch` method in Go with practical examples of string parsing and data extraction.

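Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.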
