Golang Regexp.FindIndex

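Last modified April 20, 2025

This tutorial explains how to use the Regexp.FindIndex method in Go. We cover basic usage and provide practical examples of finding match positions.

A regular expression is a sequence of characters that defines a search pattern. It is used for pattern matching within strings.

The Regexp.FindIndex method locates the leftmost match in a byte slice. It returns a two-element slice of integers, loc, such that the match is b[loc[0]:loc[1]]. If there is no match, it returns nil.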

Basic FindIndex example

The simplest use of FindIndex is to find the first match in a byte slice. Here we locate the position of a simple word.

basic_findindex.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`hello`)
    text := []byte("say hello to my little friend")
    
    index := re.FindIndex(text)
    fmt.Println(index) // [4 9]
}

The output shows the start and end positions of "hello" in the byte slice. Positions are zero-based byte offsets, and the end index is exclusive.
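Because the positions are byte offsets, they can be used directly to slice the input, but they do not count characters when the text contains multi-byte UTF-8 sequences. The following is a minimal sketch of both points; the helper name matchedText and the variable helloRe are hypothetical and not part of the regexp package.

```go
package main

import (
    "fmt"
    "regexp"
)

// helloRe is an illustrative package-level pattern.
var helloRe = regexp.MustCompile(`hello`)

// matchedText returns the bytes covered by the first match in text,
// or nil when there is no match.
func matchedText(text []byte) []byte {
    loc := helloRe.FindIndex(text)
    if loc == nil {
        return nil
    }
    // loc[0] is the inclusive start, loc[1] the exclusive end.
    return text[loc[0]:loc[1]]
}

func main() {
    ascii := []byte("say hello to my little friend")
    fmt.Printf("%v %q\n", helloRe.FindIndex(ascii), matchedText(ascii)) // [4 9] "hello"

    // "á" occupies two bytes in UTF-8, so the match starts at byte 5
    // even though only four characters precede it.
    accented := []byte("sáy hello")
    fmt.Printf("%v %q\n", helloRe.FindIndex(accented), matchedText(accented)) // [5 10] "hello"
}
```

If character positions are needed instead, one option is to count runes in text[:loc[0]] with utf8.RuneCount.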

Finding multiple matches

To find every match in a byte slice, we use FindAllIndex. This example shows how to locate all numbers in a text.

find_all_index.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`\d+`)
    text := []byte("2025 is coming after 2024")
    
    indexes := re.FindAllIndex(text, -1)
    for _, idx := range indexes {
        fmt.Printf("Found at %v: %s\n", idx, text[idx[0]:idx[1]])
    }
}

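The pattern matches one or more digits. FindAllIndex returns a slice of index pairs, one for each successive non-overlapping match. The second argument limits the number of matches; -1 means find them all.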
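The second argument can also cap the number of results. As a small sketch (the variable name digitsRe and the sample text are illustrative), passing 2 stops after the first two matches:

```go
package main

import (
    "fmt"
    "regexp"
)

// digitsRe is an illustrative package-level pattern.
var digitsRe = regexp.MustCompile(`\d+`)

func main() {
    text := []byte("2023, 2024, 2025")

    // A non-negative n returns at most n matches.
    fmt.Println(digitsRe.FindAllIndex(text, 2)) // [[0 4] [6 10]]

    // A negative n returns every match.
    fmt.Println(digitsRe.FindAllIndex(text, -1)) // [[0 4] [6 10] [12 16]]
}
```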

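Case-insensitive matching

We can find case-insensitive matches by adding the (?i) flag to the start of the pattern. This example demonstrates case-insensitive word matching.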

case_insensitive.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(?i)hello`)
    text := []byte("Hello world, HELLO universe")
    
    indexes := re.FindAllIndex(text, -1)
    for _, idx := range indexes {
        fmt.Printf("Found at %v: %s\n", idx, text[idx[0]:idx[1]])
    }
}

The (?i) flag makes the match case-insensitive. Both "Hello" and "HELLO" are found at their respective positions.
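When the word to search for is not known in advance, the flag can be prepended at run time. The sketch below assumes the word should be treated as literal text, so it escapes it with regexp.QuoteMeta, and it uses regexp.Compile so that an invalid pattern yields an error instead of a panic. The function name findWordFold is hypothetical.

```go
package main

import (
    "fmt"
    "regexp"
)

// findWordFold returns the byte positions of every case-insensitive
// occurrence of word in text. word is matched literally.
func findWordFold(text, word string) ([][]int, error) {
    re, err := regexp.Compile(`(?i)` + regexp.QuoteMeta(word))
    if err != nil {
        return nil, err
    }
    return re.FindAllIndex([]byte(text), -1), nil
}

func main() {
    locs, err := findWordFold("Hello world, HELLO universe", "hello")
    if err != nil {
        fmt.Println("invalid pattern:", err)
        return
    }
    fmt.Println(locs) // [[0 5] [13 18]]
}
```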

Finding submatch indexes

FindSubmatchIndex locates both the whole match and its capture groups. Here we extract the positions of the components of a date.

submatch_index.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    text := []byte("Date: 2025-04-20")
    
    indexes := re.FindSubmatchIndex(text)
    if indexes != nil {
        fmt.Println("Full match:", indexes[0:2])
        fmt.Println("Year:", indexes[2:4])
        fmt.Println("Month:", indexes[4:6])
        fmt.Println("Day:", indexes[6:8])
    }
}

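The output shows the positions of the whole date and of each component. The returned slice holds one pair per group: indexes[0:2] is the whole match, and capture group n occupies indexes[2*n:2*n+2]. Within each pair, the even index is the start and the odd index is the exclusive end.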
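The pair layout makes it possible to walk the result in steps of two. One detail worth knowing is that a group that did not take part in the match is reported as -1 for both offsets. The sketch below illustrates this under the assumption of a modified pattern whose day component is optional; dateRe and groupTexts are hypothetical names.

```go
package main

import (
    "fmt"
    "regexp"
)

// dateRe makes the day optional, so the third group may be absent.
var dateRe = regexp.MustCompile(`(\d{4})-(\d{2})(?:-(\d{2}))?`)

// groupTexts returns the text of the whole match followed by the text of
// each capture group. Groups that did not participate are returned as "".
func groupTexts(text []byte) []string {
    loc := dateRe.FindSubmatchIndex(text)
    if loc == nil {
        return nil
    }
    out := make([]string, 0, len(loc)/2)
    for i := 0; i < len(loc); i += 2 {
        if loc[i] < 0 {
            // A negative offset marks a group that matched nothing.
            out = append(out, "")
            continue
        }
        out = append(out, string(text[loc[i]:loc[i+1]]))
    }
    return out
}

func main() {
    fmt.Printf("%q\n", groupTexts([]byte("Date: 2025-04-20"))) // ["2025-04-20" "2025" "04" "20"]
    fmt.Printf("%q\n", groupTexts([]byte("Date: 2025-04")))    // ["2025-04" "2025" "04" ""]
}
```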

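Finding indexes in large texts

For large texts, we can use FindReaderIndex, which takes an io.RuneReader rather than a byte slice. This lets the regexp read the text incrementally instead of requiring all of it as one slice in memory. The example below uses an in-memory bytes.Reader for brevity.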

reader_index.go

package main

import (
    "bytes"
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`important`)
    largeText := bytes.NewReader([]byte("This is an important notice about something important"))
    
    idx := re.FindReaderIndex(largeText)
    fmt.Println("First match:", idx)
}

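The method works like FindIndex but reads from a stream. It can be useful for large files or network streams, with one caveat: the regexp may read arbitrarily far into the input before returning, so the reader's position after the call is not necessarily the end of the match.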
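Files and network connections implement io.Reader, not io.RuneReader, so they need an adapter. A common choice is bufio.NewReader, whose result provides ReadRune. The following is a sketch under that assumption; firstMatchInStream, demoSource, and importantRe are hypothetical names, and a strings.Reader stands in for a real file or connection.

```go
package main

import (
    "bufio"
    "fmt"
    "io"
    "regexp"
    "strings"
)

// importantRe is an illustrative package-level pattern.
var importantRe = regexp.MustCompile(`important`)

// demoSource stands in for a file or network connection.
func demoSource() io.Reader {
    return strings.NewReader("This is an important notice about something important")
}

// firstMatchInStream wraps r in a bufio.Reader, which implements
// io.RuneReader, and returns the location of the first match or nil.
func firstMatchInStream(r io.Reader) []int {
    return importantRe.FindReaderIndex(bufio.NewReader(r))
}

func main() {
    fmt.Println("First match:", firstMatchInStream(demoSource())) // First match: [11 20]
}
```

The returned offsets count bytes from the point where reading began, so they line up with the stream only if the reader started at its beginning.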

Handling no matches

When no match is found, FindIndex returns nil. This example demonstrates how to handle that case correctly.

no_match.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`missing`)
    text := []byte("this text doesn't contain the pattern")
    
    index := re.FindIndex(text)
    if index == nil {
        fmt.Println("Pattern not found")
    } else {
        fmt.Println("Found at:", index)
    }
}

Always check for nil before using the result. Indexing a nil slice, as in index[0], causes a run-time panic.
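One way to keep the nil check in a single place is to wrap the lookup in a helper that reports success with a boolean. The sketch below splits a line around the first separator; the names sepRe and splitAtSeparator and the key = value input format are assumptions made for illustration.

```go
package main

import (
    "fmt"
    "regexp"
)

// sepRe matches an equals sign with optional surrounding whitespace.
var sepRe = regexp.MustCompile(`\s*=\s*`)

// splitAtSeparator splits text around the first separator match.
// ok is false when there is no separator, so callers never index a nil result.
func splitAtSeparator(text []byte) (key, value []byte, ok bool) {
    loc := sepRe.FindIndex(text)
    if loc == nil {
        return nil, nil, false
    }
    return text[:loc[0]], text[loc[1]:], true
}

func main() {
    for _, line := range []string{"name = gopher", "no separator here"} {
        key, value, ok := splitAtSeparator([]byte(line))
        if !ok {
            fmt.Printf("%q: pattern not found\n", line)
            continue
        }
        fmt.Printf("%q: key=%q value=%q\n", line, key, value)
    }
}
```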

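Performance considerations

For repeated searches, compile the regular expression once and reuse it. This example times compiling the pattern on every iteration against reusing a single compiled pattern.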

performance.go

package main

import (
    "fmt"
    "regexp"
    "time"
)

func main() {
    text := []byte("sample text with pattern to find")
    
    start := time.Now()
    for i := 0; i < 1000; i++ {
        re := regexp.MustCompile(`pattern`)
        re.FindIndex(text)
    }
    fmt.Println("Recompile each time:", time.Since(start))
    
    start = time.Now()
    re := regexp.MustCompile(`pattern`)
    for i := 0; i < 1000; i++ {
        re.FindIndex(text)
    }
    fmt.Println("Reuse compiled regex:", time.Since(start))
}

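This is an informal timing loop rather than a rigorous benchmark, and exact figures vary between machines and runs. The first loop repeats the compilation work on every iteration, while the second does it once, so the second is expected to be noticeably faster. Compile patterns once whenever possible.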
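A common way to compile once is to store the compiled pattern in a package-level variable. The regexp documentation states that a Regexp is safe for concurrent use by multiple goroutines (except for configuration methods such as Longest), so one shared value usually suffices. The sketch below uses the hypothetical names patternRe and matchStarts.

```go
package main

import (
    "fmt"
    "regexp"
)

// patternRe is compiled once, when the package is initialized.
var patternRe = regexp.MustCompile(`pattern`)

// matchStarts returns the start offset of the first match in each input,
// or -1 for inputs without a match.
func matchStarts(inputs []string) []int {
    starts := make([]int, 0, len(inputs))
    for _, in := range inputs {
        loc := patternRe.FindIndex([]byte(in))
        if loc == nil {
            starts = append(starts, -1)
            continue
        }
        starts = append(starts, loc[0])
    }
    return starts
}

func main() {
    fmt.Println(matchStarts([]string{
        "sample text with pattern to find",
        "nothing here",
        "pattern first",
    })) // [17 -1 0]
}
```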

Source

Go regexp package documentation

This tutorial covered the Regexp.FindIndex method in Go, with practical examples of finding match positions in text.

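Author

My name is Jan Bodnar. I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have written over 1,400 articles and 8 e-books. I have more than ten years of experience teaching programming.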
