Golang Regexp.FindAllIndex

last modified April 20, 2025

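This tutorial explains how to use the Regexp.FindAllIndex method in Go. We cover its behavior with practical examples.

A regular expression is a sequence of characters that defines a search pattern. It is used for pattern matching within strings.

The Regexp.FindAllIndex method returns a slice of all successive, non-overlapping matches of the pattern in the input byte slice. Each match is represented by a two-element integer slice holding the start and end byte offsets. A nil result means the pattern did not match.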

Basic FindAllIndex example

The simplest use of FindAllIndex is finding every occurrence of a word. Here we locate all instances of "go" in the text.

basic_findall.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := []byte("go is good, go is great, go is awesome")
    re := regexp.MustCompile(`go`)

    matches := re.FindAllIndex(text, -1)
    for _, match := range matches {
        fmt.Printf("Found 'go' at %d-%d\n", match[0], match[1])
    }
}

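The method returns a slice of [start, end] index pairs. Each pair is a half-open byte range, so text[start:end] is the matched text. The pattern has no word boundaries, so it also matches the "go" inside "good"; the pattern \bgo\b would restrict it to whole words.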

Finding multiple patterns

A single regular expression can match several alternatives. This example locates both "cat" and "dog" in the text.

multiple_patterns.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := []byte("cat dog bird cat dog")
    re := regexp.MustCompile(`cat|dog`)

    matches := re.FindAllIndex(text, -1)
    for i, match := range matches {
        word := string(text[match[0]:match[1]])
        fmt.Printf("Match %d: %s at %d-%d\n", i+1, word, match[0], match[1])
    }
}

The alternation operator | matches either alternative. We use the returned indices to slice the matched word out of the text.
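When one alternative is a prefix of another, the order of the alternatives matters. The following sketch, which uses a made-up helper named spans, shows Go's default leftmost-first behavior and the effect of the Regexp.Longest method, which switches the expression to leftmost-longest matching.

```go
package main

import (
    "fmt"
    "regexp"
)

// spans is a hypothetical helper that returns the matched substrings,
// extracted through the indices reported by FindAllIndex.
func spans(pattern string, longest bool, text string) []string {
    re := regexp.MustCompile(pattern)
    if longest {
        re.Longest() // prefer the longest match at each position
    }
    b := []byte(text)
    var out []string
    for _, m := range re.FindAllIndex(b, -1) {
        out = append(out, string(b[m[0]:m[1]]))
    }
    return out
}

func main() {
    fmt.Println(spans(`go|golang`, false, "golang gopher")) // [go go]
    fmt.Println(spans(`go|golang`, true, "golang gopher"))  // [golang go]
}
```

Listing the longer alternative first (golang|go) achieves the same result without calling Longest.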

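Limiting the number of matches

The second argument controls how many matches are returned. Here we limit the result to the first two matches.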

limit_matches.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := []byte("one two three four five six")
    re := regexp.MustCompile(`\w+`)

    matches := re.FindAllIndex(text, 2)
    for _, match := range matches {
        word := string(text[match[0]:match[1]])
        fmt.Println(word)
    }
}

Setting n to 2 returns at most the first two matches. Any negative value, conventionally -1, returns all matches.

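Finding overlapping matches

FindAllIndex reports only non-overlapping matches. Other regex engines work around this with a lookahead such as (?=(aba)), but Go's regexp package uses RE2 syntax, which has no lookahead, so compiling that pattern with MustCompile panics. This example finds overlapping matches by restarting the search one byte after the start of each match.

overlapping.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := []byte("ababab")
    re := regexp.MustCompile(`aba`)

    // Search the remaining text, then resume one byte past the start
    // of the match so that overlapping occurrences are also found.
    for pos := 0; pos < len(text); {
        loc := re.FindIndex(text[pos:])
        if loc == nil {
            break
        }
        // loc is relative to text[pos:], so shift it by pos.
        start, end := pos+loc[0], pos+loc[1]
        fmt.Printf("Found at %d-%d: %s\n", start, end, text[start:end])
        pos = start + 1
    }
}

The loop calls FindIndex on the remaining text and shifts the result by the current offset. It reports "aba" at 0-3 and again at 2-5, whereas FindAllIndex alone would return only the first. Slicing the input is safe for a literal pattern like this one; anchors such as ^ would treat the start of each slice as the start of the text.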

Finding all email indices

This practical example finds all email addresses in a text, along with their positions.

email_indices.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := []byte(`Contact us at info@example.com or support@company.com`)
    re := regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)

    matches := re.FindAllIndex(text, -1)
    for _, match := range matches {
        email := string(text[match[0]:match[1]])
        fmt.Printf("Email %s at %d-%d\n", email, match[0], match[1])
    }
}

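The pattern matches common email address formats, though it is not a complete validator. We extract each address together with its exact byte range in the text.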

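Handling empty matches

Empty matches and empty fields need special handling. The comma pattern in this example never matches the empty string, but adjacent commas leave empty fields between them, and the example shows how those fields are reported.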

empty_matches.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := []byte("a,,b,c,,")
    re := regexp.MustCompile(`,`)

    matches := re.FindAllIndex(text, -1)
    fmt.Println("All comma positions:")
    for _, match := range matches {
        fmt.Printf("%d-%d\n", match[0], match[1])
    }

    // Handling empty fields between commas
    fields := re.Split(string(text), -1)
    fmt.Println("\nFields:")
    for i, field := range fields {
        fmt.Printf("%d: %q\n", i, field)
    }
}

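Every range printed here has length one, because the comma pattern always consumes exactly one byte. Adjacent delimiters leave nothing between them, and Split reports that as an empty string, so empty fields are preserved. A true empty match, which a pattern such as a* can produce, appears as a range whose start and end indices are equal.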

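Performance considerations

For large inputs that are already held as a byte slice, calling FindAllIndex directly avoids converting the data to a string first.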

performance.go

package main

import (
    "fmt"
    "regexp"
    "time"
)

func main() {
    // Generate large text
    var text []byte
    for i := 0; i < 10000; i++ {
        text = append(text, "abc123 "...)
    }

    re := regexp.MustCompile(`\d+`)

    start := time.Now()
    matches := re.FindAllIndex(text, -1)
    elapsed := time.Since(start)

    fmt.Printf("Found %d matches in %s\n", len(matches), elapsed)
    fmt.Printf("First match at %d-%d\n", matches[0][0], matches[0][1])
}

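Working on the byte slice directly avoids the allocation and copy of a string conversion, which can matter for large inputs. A single time.Since measurement is only a rough indication; use benchmarks from the testing package for reliable numbers.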

Source

Go regexp package documentation

This tutorial covered the Regexp.FindAllIndex method in Go, with practical examples of pattern matching and index retrieval.

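Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.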
