Golang Regexp.FindAll

Last modified April 20, 2025

This tutorial shows how to use the `Regexp.FindAll` method in Go. It covers finding all matches of a regular expression and works through practical examples.

A regular expression is a sequence of characters that defines a search pattern. It is used to match patterns in strings.

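The `Regexp.FindAll` method returns a slice of all successive, non-overlapping matches of the pattern in a byte slice; `FindAllString` does the same for a string. Both are useful for extracting multiple matches in a single call.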
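To make the difference between the byte slice and string variants concrete, here is a minimal sketch. The helper `countMatches` is a hypothetical name introduced only for illustration; the sketch also relies on the fact that these methods return nil when nothing matches.

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches is a hypothetical helper: it reports how many
// non-overlapping matches of pattern occur in s.
func countMatches(pattern, s string) int {
	re := regexp.MustCompile(pattern)
	// FindAllString returns nil when nothing matches, and len(nil) is 0.
	return len(re.FindAllString(s, -1))
}

func main() {
	re := regexp.MustCompile(`\d+`)

	// FindAll takes a byte slice and returns a slice of byte slices.
	for _, m := range re.FindAll([]byte("a1 b22 c333"), -1) {
		fmt.Printf("%s\n", m)
	}

	// FindAllString is the string counterpart.
	fmt.Println(re.FindAllString("a1 b22 c333", -1)) // [1 22 333]

	// No match yields a nil slice.
	fmt.Println(re.FindAllString("no digits", -1) == nil) // true

	fmt.Println(countMatches(`\d+`, "a1 b22 c333")) // 3
}
```

Because a nil slice has length zero, callers can usually range over the result without a separate nil check.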

Basic FindAllString example

The simplest use of `FindAllString` is to find every match of a pattern. Here we find all occurrences of a word in a text.

basic_findall.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := "cat dog cat bird cat fish"
    re := regexp.MustCompile(`cat`)
    
    matches := re.FindAllString(text, -1)
    fmt.Println(matches) // [cat cat cat]
    fmt.Println("Number of matches:", len(matches))
}

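We compile the pattern `cat` and find all of its occurrences in the text. The second argument, -1, means that all matches are returned. The method returns a slice containing every matched string.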
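When the positions of the matches matter rather than their text, the related `FindAllStringIndex` method can be used. The following is a small sketch; `matchStarts` is a hypothetical helper name, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchStarts is a hypothetical helper that returns the byte offset
// at which each match of pattern begins in s.
func matchStarts(pattern, s string) []int {
	re := regexp.MustCompile(pattern)
	var starts []int
	// Each location is a two-element slice: start and end offset.
	for _, loc := range re.FindAllStringIndex(s, -1) {
		starts = append(starts, loc[0])
	}
	return starts
}

func main() {
	text := "cat dog cat bird cat fish"
	fmt.Println(matchStarts(`cat`, text)) // [0 8 17]

	// Matches never overlap: "aaaa" contains two matches of "aa", not three.
	fmt.Println(regexp.MustCompile(`aa`).FindAllString("aaaa", -1)) // [aa aa]
}
```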

Finding all email addresses

This example uses `FindAllString` to find all email addresses in a text.

find_emails.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := `Contact us at info@example.com or support@domain.com 
             for assistance. Sales can be reached at sales@company.net.`

    pattern := `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`
    re := regexp.MustCompile(pattern)
    
    emails := re.FindAllString(text, -1)
    for _, email := range emails {
        fmt.Println(email)
    }
}

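The pattern matches common email address formats; it is a simplification, not a full validator. `FindAllString` scans the whole text and returns every email address it finds.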
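If the same pattern is applied to many inputs, one common arrangement is to compile it once at package level and wrap the call in a function. This is a sketch under that assumption; `emailRe` and `extractEmails` are hypothetical names. It also shows that punctuation following an address is left out of the match.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailRe is compiled once at package level so that repeated calls
// do not compile the pattern again. It is the simplified pattern
// from the example above, not a full RFC 5322 validator.
var emailRe = regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)

// extractEmails is a hypothetical helper returning every
// address-like substring found in text.
func extractEmails(text string) []string {
	return emailRe.FindAllString(text, -1)
}

func main() {
	found := extractEmails("Mail a@b.com, or c.d@e.org.")
	fmt.Println(found) // [a@b.com c.d@e.org]
}
```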

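Limiting the number of matches

The second parameter of the `FindAll` methods controls how many matches are returned. Here we limit the result to the first two matches.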

limit_matches.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := "apple banana apple cherry apple date"
    re := regexp.MustCompile(`apple`)
    
    // Find first 2 matches
    matches := re.FindAllString(text, 2)
    fmt.Println(matches) // [apple apple]
}

Setting the limit to 2 returns only the first two matches. This is useful when you need only a sample of the matches from a large text.
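The effect of different limit values can be compared side by side. In this sketch, `firstN` is a hypothetical wrapper used only to keep the calls short: a negative limit returns every match, a limit larger than the number of matches returns them all, and a limit of 0 returns nothing.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstN is a hypothetical wrapper that returns at most n matches
// of pattern in s; a negative n means no limit.
func firstN(pattern, s string, n int) []string {
	return regexp.MustCompile(pattern).FindAllString(s, n)
}

func main() {
	text := "apple banana apple cherry apple date"

	fmt.Println(len(firstN(`apple`, text, -1))) // 3
	fmt.Println(len(firstN(`apple`, text, 2)))  // 2
	fmt.Println(len(firstN(`apple`, text, 10))) // 3
	fmt.Println(len(firstN(`apple`, text, 0)))  // 0
}
```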

FindAll with submatches

`FindAllStringSubmatch` returns all matches together with their submatches. Here we extract dates along with their components.

findall_submatches.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := "Dates: 2025-04-20, 2026-05-21, and 2027-06-22"
    re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    
    matches := re.FindAllStringSubmatch(text, -1)
    for _, match := range matches {
        fmt.Printf("Full: %s, Year: %s, Month: %s, Day: %s\n",
            match[0], match[1], match[2], match[3])
    }
}

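Each match is a slice in which index 0 holds the full match and the following indices hold the capture groups. This extracts structured data from the text.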
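Numeric indices can become hard to follow as the number of groups grows. One possible alternative, sketched here, is to name the groups with the `(?P<name>...)` syntax and look them up through `SubexpNames`. The names `dateRe` and `parseDates` are hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRe uses named capture groups instead of positional ones.
var dateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// parseDates is a hypothetical helper that returns one map per
// match, keyed by capture group name.
func parseDates(s string) []map[string]string {
	names := dateRe.SubexpNames() // names[0] is always the empty string
	var out []map[string]string
	for _, m := range dateRe.FindAllStringSubmatch(s, -1) {
		fields := make(map[string]string)
		for i, name := range names {
			if name != "" {
				fields[name] = m[i]
			}
		}
		out = append(out, fields)
	}
	return out
}

func main() {
	for _, d := range parseDates("Dates: 2025-04-20, 2026-05-21") {
		fmt.Printf("%s/%s/%s\n", d["day"], d["month"], d["year"])
	}
}
```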

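FindAll with byte slices

The `FindAll` family also works on byte slices; this example uses `FindAllSubmatch`. This is useful when processing binary data or file contents.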

findall_bytes.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    data := []byte("key1=value1,key2=value2,key3=value3")
    re := regexp.MustCompile(`(\w+)=(\w+)`)
    
    matches := re.FindAllSubmatch(data, -1)
    for _, match := range matches {
        fmt.Printf("Key: %s, Value: %s\n", match[1], match[2])
    }
}

The byte slice versions behave like their string counterparts but work directly on `[]byte`. This avoids converting the data to a string first.
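The submatches of a byte slice search are often collected into a lookup structure. A minimal sketch, assuming the keys are unique, is shown here; `pairRe` and `parsePairs` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

var pairRe = regexp.MustCompile(`(\w+)=(\w+)`)

// parsePairs is a hypothetical helper that collects key=value
// pairs found in data into a map.
func parsePairs(data []byte) map[string]string {
	pairs := make(map[string]string)
	for _, m := range pairRe.FindAllSubmatch(data, -1) {
		// string(...) copies the bytes, so the map stays valid
		// even if data is modified later.
		pairs[string(m[1])] = string(m[2])
	}
	return pairs
}

func main() {
	pairs := parsePairs([]byte("key1=value1,key2=value2,key3=value3"))
	fmt.Println(len(pairs), pairs["key2"]) // 3 value2
}
```

Here the conversion to string happens only for the extracted pieces, not for the whole input.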

Finding all words with word boundaries

This example shows how to use word boundaries to find all words in a text.

find_words.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := "The quick brown fox jumps over the lazy dog"
    re := regexp.MustCompile(`\b\w+\b`)
    
    words := re.FindAllString(text, -1)
    for i, word := range words {
        fmt.Printf("Word %d: %s\n", i+1, word)
    }
}

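The pattern `\b\w+\b` matches a run of word characters delimited by word boundaries. `FindAllString` returns all the words in the text. This is a simple approach to tokenization.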
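In Go's regexp syntax, `\w` and `\b` are ASCII-only, so words with accented letters are split at the non-ASCII characters. The sketch below contrasts the pattern with a Unicode letter class, `\p{L}+`, as one possible alternative; note that `\p{L}` covers letters only, not digits or underscores. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	asciiWordRe   = regexp.MustCompile(`\b\w+\b`)
	unicodeWordRe = regexp.MustCompile(`\p{L}+`)
)

// asciiWords is a hypothetical helper using the ASCII-only pattern.
func asciiWords(s string) []string {
	return asciiWordRe.FindAllString(s, -1)
}

// unicodeWords is a hypothetical helper matching runs of Unicode letters.
func unicodeWords(s string) []string {
	return unicodeWordRe.FindAllString(s, -1)
}

func main() {
	// The escapes spell "naïve café".
	text := "na\u00efve caf\u00e9"

	fmt.Println(asciiWords(text))   // [na ve caf]
	fmt.Println(unicodeWords(text)) // [naïve café]
}
```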

Finding all HTML tags

This more advanced example finds all HTML tags in a document. Note that regular expressions may not be the best tool for parsing full HTML.

find_html_tags.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    html := `<html><head><title>Page</title></head>
             <body><p>Content</p></body></html>`
    
    re := regexp.MustCompile(`<[^>]+>`)
    tags := re.FindAllString(html, -1)
    
    for _, tag := range tags {
        fmt.Println(tag)
    }
}

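The pattern matches a `<`, followed by one or more characters other than `>`, followed by a `>`. This works for simple cases; for complex documents, consider using a proper HTML parser.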
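For simple input, a capture group can pull out just the tag names. This is a rough sketch with the same limitations as the pattern above (it does not handle comments, scripts, or a `>` inside attribute values); `tagRe` and `tagNames` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagRe captures the tag name of opening, closing, and
// self-closing tags in simple HTML.
var tagRe = regexp.MustCompile(`</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>`)

// tagNames is a hypothetical helper returning the name of every
// tag in html, in document order.
func tagNames(html string) []string {
	var names []string
	for _, m := range tagRe.FindAllStringSubmatch(html, -1) {
		names = append(names, m[1])
	}
	return names
}

func main() {
	fmt.Println(tagNames("<p>Hi</p><br/>")) // [p p br]
}
```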

Source

Go regexp package documentation

This tutorial covered the `Regexp.FindAll` method in Go with practical examples of finding multiple pattern matches in text.

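Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have written more than 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.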
