Golang Regexp.FindAllStringSubmatch

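Last modified April 20, 2025

This tutorial shows how to use the Regexp.FindAllStringSubmatch method in Go. It covers submatch extraction and provides practical examples.

A regular expression is a sequence of characters that defines a search pattern. It is used for pattern matching within strings.

The Regexp.FindAllStringSubmatch method returns all successive matches of a regular expression in a string, together with their submatches. Each match is a slice of strings.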

Basic FindAllStringSubmatch example

The simplest use extracts all matches and their submatches from a string. Here we look for dates in a text.

basic_submatch.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := "Dates: 2025-04-20, 2025-05-15, 2025-06-10"
    re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)

    matches := re.FindAllStringSubmatch(text, -1)
    for _, match := range matches {
        fmt.Printf("Full: %s, Year: %s, Month: %s, Day: %s\n",
            match[0], match[1], match[2], match[3])
    }
}

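The method returns a slice of slices ([][]string). Each inner slice holds the full match followed by its submatches, one per capture group. Index 0 is always the full match.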
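The shape of the result can be checked directly. The following is a minimal sketch; extractDates is a hypothetical helper name introduced only for this illustration. It shows that each inner slice has one element more than the number of capture groups, and that the result is nil when nothing matches.

```go
package main

import (
    "fmt"
    "regexp"
)

// extractDates is a hypothetical helper for this sketch. It returns one
// [full, year, month, day] slice per date found in text.
func extractDates(text string) [][]string {
    re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    return re.FindAllStringSubmatch(text, -1)
}

func main() {
    matches := extractDates("Dates: 2025-04-20, 2025-05-15")
    // Number of matches, then the length of one inner slice
    // (full match plus three capture groups).
    fmt.Println(len(matches), len(matches[0]))

    // With no match the result is nil, so ranging over it is still safe.
    none := extractDates("no dates here")
    fmt.Println(none == nil, len(none))
}
```

Because a nil slice has length zero, a plain range loop over the result needs no extra guard; only direct indexing such as matches[0] requires a length check first.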

Extracting key-value pairs

This example shows how to use capture groups to extract key-value pairs from a string.

key_value_pairs.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := "name=John age=30 city=New York"
    re := regexp.MustCompile(`(\w+)=(\w+)`)

    matches := re.FindAllStringSubmatch(text, -1)
    for _, match := range matches {
        fmt.Printf("Key: %s, Value: %s\n", match[1], match[2])
    }
}

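The pattern captures the word characters before and after the equals sign. Each match contains the full key-value pair and its separate components. Because \w does not match a space, the value of city is captured as New rather than New York.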
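The same extraction can also be written with named capture groups, which avoids hard-coded indexes. This is a sketch under a few assumptions: the group names key and value and the helper parsePairs are made up for this illustration, and Regexp.SubexpIndex requires Go 1.15 or later.

```go
package main

import (
    "fmt"
    "regexp"
)

// pairRe uses named groups; the names "key" and "value" are chosen
// only for this sketch.
var pairRe = regexp.MustCompile(`(?P<key>\w+)=(?P<value>\w+)`)

// parsePairs is a hypothetical helper that collects key=value pairs
// into a map, looking the groups up by name instead of by position.
func parsePairs(text string) map[string]string {
    keyIdx := pairRe.SubexpIndex("key")
    valIdx := pairRe.SubexpIndex("value")

    pairs := make(map[string]string)
    for _, m := range pairRe.FindAllStringSubmatch(text, -1) {
        pairs[m[keyIdx]] = m[valIdx]
    }
    return pairs
}

func main() {
    pairs := parsePairs("name=John age=30 city=New York")
    fmt.Println(pairs["name"], pairs["age"], pairs["city"])
}
```

Naming the groups keeps the loop correct if another group is later added to the pattern, at the cost of a slightly longer expression.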

Finding HTML tags and attributes

Here we extract HTML tags and their attributes from a string. This demonstrates more complex pattern matching.

html_tags.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    html := `<a href="https://example.com" title="Example">Link</a>`
    re := regexp.MustCompile(`<(\w+)([^>]*)>`)

    matches := re.FindAllStringSubmatch(html, -1)
    for _, match := range matches {
        fmt.Println("Tag:", match[1])
        fmt.Println("Attributes:", match[2])
    }
}

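The pattern captures the tag name and the raw attribute string of each opening tag. The closing tag </a> is not matched, because \w+ cannot match the slash. Note that parsing HTML with regular expressions has limitations for complex documents.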
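The raw attribute string can be split further by running a second pattern over the captured group. The sketch below assumes every attribute value is wrapped in double quotes; attrRe and tagAttributes are hypothetical names used only here.

```go
package main

import (
    "fmt"
    "regexp"
)

var (
    // tagRe is the pattern from the example above.
    tagRe = regexp.MustCompile(`<(\w+)([^>]*)>`)
    // attrRe assumes name="value" attributes with double quotes only.
    attrRe = regexp.MustCompile(`(\w+)="([^"]*)"`)
)

// tagAttributes is a hypothetical helper that returns the attributes
// of all opening tags in html as a name-to-value map.
func tagAttributes(html string) map[string]string {
    attrs := make(map[string]string)
    for _, tag := range tagRe.FindAllStringSubmatch(html, -1) {
        // tag[2] is the raw attribute string of one opening tag.
        for _, a := range attrRe.FindAllStringSubmatch(tag[2], -1) {
            attrs[a[1]] = a[2]
        }
    }
    return attrs
}

func main() {
    html := `<a href="https://example.com" title="Example">Link</a>`
    attrs := tagAttributes(html)
    fmt.Println("href:", attrs["href"])
    fmt.Println("title:", attrs["title"])
}
```

Single-quoted or unquoted attribute values are skipped by this sketch, which is one more reason to prefer a real HTML parser for anything beyond simple input.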

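Extracting multiple email addresses

This example shows how to extract multiple email addresses from a text and split them into components with submatches.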

emails_extraction.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := `Contact us at info@example.com or support@company.co.uk`
    re := regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,})`)

    matches := re.FindAllStringSubmatch(text, -1)
    for _, match := range matches {
        fmt.Println("Full email:", match[0])
        fmt.Println("Username:", match[1])
        fmt.Println("Domain:", match[2])
        fmt.Println("TLD:", match[3])
        fmt.Println()
    }
}

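Each email match is broken down into username, domain, and top-level domain components. For support@company.co.uk, the greedy domain group captures company.co and the last group captures uk. The pattern matches common email formats.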

Parsing log entries

Log files often contain structured data that can be extracted with regular expressions. Here we parse an Apache log entry.

log_parsing.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    logEntry := `127.0.0.1 - frank [10/Oct/2025:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326`
    re := regexp.MustCompile(`^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d+) (\d+)$`)

    matches := re.FindAllStringSubmatch(logEntry, -1)
    for _, match := range matches {
        fmt.Println("IP:", match[1])
        fmt.Println("User:", match[3])
        fmt.Println("Date:", match[4])
        fmt.Println("Method:", match[5])
        fmt.Println("Path:", match[6])
        fmt.Println("Status:", match[8])
        fmt.Println("Size:", match[9])
    }
}

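The pattern captures every field of a standard Apache log entry, and each field is available as a separate submatch. The example prints seven of the nine groups; match[2] (the identity field) and match[7] (the protocol) are captured but not printed. Because the pattern is anchored with ^ and $, it only matches a string that consists of exactly one log entry.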
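To apply the same pattern to several entries at once, the (?m) flag can be added so that ^ and $ match at line boundaries. The following is a sketch; the second log line is made-up sample data, and statusByPath is a hypothetical helper name.

```go
package main

import (
    "fmt"
    "regexp"
)

// logRe is the pattern from the example above with the (?m) flag added,
// so ^ and $ match at the start and end of every line.
var logRe = regexp.MustCompile(
    `(?m)^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d+) (\d+)$`)

// statusByPath is a hypothetical helper that maps each request path
// to the status code recorded for it.
func statusByPath(logs string) map[string]string {
    result := make(map[string]string)
    for _, m := range logRe.FindAllStringSubmatch(logs, -1) {
        result[m[6]] = m[8] // group 6 is the path, group 8 the status
    }
    return result
}

func main() {
    // The second line is invented for illustration.
    logs := `127.0.0.1 - frank [10/Oct/2025:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
127.0.0.1 - - [10/Oct/2025:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 209`

    for path, status := range statusByPath(logs) {
        fmt.Println(path, status)
    }
}
```

Map iteration order is not fixed in Go, so the two lines may print in either order.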

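Extracting phone numbers

This example shows how to extract phone numbers in various formats, together with their components, from a text.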

phone_numbers.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := `Call 555-1234 or (555) 987-6543 or 555.456.7890`
    re := regexp.MustCompile(`\(?(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})`)

    matches := re.FindAllStringSubmatch(text, -1)
    for _, match := range matches {
        fmt.Println("Full number:", match[0])
        fmt.Println("Area code:", match[1])
        fmt.Println("Exchange:", match[2])
        fmt.Println("Line number:", match[3])
        fmt.Println()
    }
}

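The pattern handles several common ten-digit phone number formats. The submatches extract the area code, exchange, and line number. The seven-digit number 555-1234 is not matched, because the pattern requires all three digit groups.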
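One possible way to accept numbers without an area code is to make that part optional. The sketch below is a variation on the pattern, not the original author's; phoneRe and findPhones are hypothetical names. When an optional group does not take part in a match, FindAllStringSubmatch reports it as an empty string.

```go
package main

import (
    "fmt"
    "regexp"
)

// phoneRe wraps the area code and its separator in an optional
// non-capturing group, and requires a separator before the last
// four digits.
var phoneRe = regexp.MustCompile(`(?:\(?(\d{3})\)?[-. ]?)?(\d{3})[-. ](\d{4})`)

// findPhones is a hypothetical helper that returns the raw submatches.
func findPhones(text string) [][]string {
    return phoneRe.FindAllStringSubmatch(text, -1)
}

func main() {
    text := `Call 555-1234 or (555) 987-6543 or 555.456.7890`
    for _, m := range findPhones(text) {
        area := m[1]
        if area == "" {
            area = "(none)" // the optional area code group did not participate
        }
        fmt.Printf("Full: %s, Area: %s, Exchange: %s, Line: %s\n",
            m[0], area, m[2], m[3])
    }
}
```

With this variation, group 2 of 555-1234 holds 555 and group 1 is empty, so code that consumes the result has to treat an empty area code as a valid case.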

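Limiting the number of matches

The second parameter of FindAllStringSubmatch controls how many matches are returned. This example shows how to limit the matches.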

limit_matches.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := "apple banana apple cherry apple date"
    re := regexp.MustCompile(`apple`)

    // Get all matches
    allMatches := re.FindAllStringSubmatch(text, -1)
    fmt.Println("All matches:", len(allMatches))

    // Get first 2 matches
    limitedMatches := re.FindAllStringSubmatch(text, 2)
    fmt.Println("Limited matches:", len(limitedMatches))
}

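A negative value returns all matches. A non-negative value n returns at most n matches, so the example reports 3 matches for the first call and 2 for the second. Limiting matches can reduce work on large inputs, because the search stops once the limit is reached.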
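Two edge cases of the limit are worth a quick check: a limit of 0 and a limit larger than the number of matches. This is a small sketch; countMatches is a hypothetical helper name.

```go
package main

import (
    "fmt"
    "regexp"
)

// countMatches is a hypothetical helper that reports how many matches
// of "apple" are returned for a given limit n.
func countMatches(text string, n int) int {
    re := regexp.MustCompile(`apple`)
    return len(re.FindAllStringSubmatch(text, n))
}

func main() {
    text := "apple banana apple cherry apple date"

    fmt.Println(countMatches(text, 0))  // a limit of 0 yields no matches
    fmt.Println(countMatches(text, 2))  // capped at the limit
    fmt.Println(countMatches(text, 10)) // limit above the number of matches
    fmt.Println(countMatches(text, -1)) // no limit
}
```

A limit above the number of available matches behaves like no limit at all, so callers can pass an upper bound without counting matches first.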

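Source

Go regexp package documentation

This tutorial demonstrated, with practical examples, how to use the Regexp.FindAllStringSubmatch method in Go to extract submatches from strings.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.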
