Golang Regexp.FindSubmatchIndex

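Last modified April 20, 2025

This tutorial explains how to use the Regexp.FindSubmatchIndex method in Go. We cover submatch indices and provide practical examples.

A regular expression is a sequence of characters that defines a search pattern. It is used for pattern matching within strings.

The Regexp.FindSubmatchIndex method returns a slice of index pairs identifying the leftmost match and its submatches. It returns nil if there is no match. It is useful for extracting matched portions together with their positions.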

Basic FindSubmatchIndex example

The simplest use of FindSubmatchIndex is to find a match and its position. Here we locate a simple word and report where it occurs.

basic_indices.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`hello`)
    str := "hello world hello again"
    
    indices := re.FindSubmatchIndex([]byte(str))
    if indices != nil {
        fmt.Println("Full match:", str[indices[0]:indices[1]])
        fmt.Printf("Positions: %d to %d\n", indices[0], indices[1])
    }
}

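The method returns a slice where indices[0] is the start of the match and indices[1] is the end (exclusive). Both are byte offsets, so we can use them to slice the matched substring out of the input.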

Extracting date components with indices

This example shows how to get the matched date components and their positions in the input string.

date_indices.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    str := "Event date: 2025-04-20, deadline: 2025-05-15"
    
    indices := re.FindSubmatchIndex([]byte(str))
    if indices != nil {
        fmt.Println("Full match:", str[indices[0]:indices[1]])
        fmt.Println("Year:", str[indices[2]:indices[3]])
        fmt.Println("Month:", str[indices[4]:indices[5]])
        fmt.Println("Day:", str[indices[6]:indices[7]])
    }
}

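Each capture group contributes two indices to the result slice. The pattern has three groups, so we get four index pairs (eight integers in total), including the pair for the whole match. Group i occupies indices[2*i] and indices[2*i+1].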

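Finding all matches with their indices

To find all matches and their positions, we use FindAllSubmatchIndex. This example shows every date in the string along with its position.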

all_indices.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    str := "Dates: 2025-04-20, 2025-05-15, 2025-06-30"
    
    allIndices := re.FindAllSubmatchIndex([]byte(str), -1)
    for _, indices := range allIndices {
        fmt.Printf("Found date %s at position %d\n",
            str[indices[0]:indices[1]], indices[0])
    }
}

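The method returns a slice of index slices. Each inner slice holds the positions for one match. The second parameter caps the number of matches returned; -1 means no limit.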

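Named capture groups with indices

Named capture groups make code more readable. This example demonstrates how to work with named groups and their indices.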

named_groups.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
    str := "Today is 2025-04-20"
    
    indices := re.FindSubmatchIndex([]byte(str))
    if indices != nil {
        for i, name := range re.SubexpNames() {
            if i != 0 && name != "" {
                start := indices[2*i]
                end := indices[2*i+1]
                fmt.Printf("%s: %s\n", name, str[start:end])
            }
        }
    }
}

Group names are obtained with SubexpNames. The indices follow the same layout as for numbered groups, but each group carries a meaningful name.

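Email validation with position tracking

This example matches email addresses with a simple pattern while tracking their positions in the input text.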

email_indices.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})`)
    str := "Contact us at info@example.com or support@company.org"
    
    allIndices := re.FindAllSubmatchIndex([]byte(str), -1)
    for _, indices := range allIndices {
        fmt.Printf("Found email %s at position %d\n",
            str[indices[0]:indices[1]], indices[0])
        fmt.Printf("Username: %s\n", str[indices[2]:indices[3]])
        fmt.Printf("Domain: %s\n", str[indices[4]:indices[5]])
    }
}

The pattern captures the whole email address and its components. The indices help locate each part in the original string.
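One practical use of the positions is rewriting part of each match in place. The sketch below masks only the username of every address; maskUsers is a hypothetical helper, and the "***" placeholder is an arbitrary choice for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var emailRe = regexp.MustCompile(`([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})`)

// maskUsers is a hypothetical helper: it replaces the username part
// (capture group 1) of every matched address with "***" and leaves the
// rest of the text untouched.
func maskUsers(s string) string {
	var b strings.Builder
	last := 0
	for _, loc := range emailRe.FindAllStringSubmatchIndex(s, -1) {
		b.WriteString(s[last:loc[2]]) // text up to the username
		b.WriteString("***")
		last = loc[3] // resume right after the username
	}
	b.WriteString(s[last:])
	return b.String()
}

func main() {
	fmt.Println(maskUsers("Contact us at info@example.com or support@company.org"))
	// Contact us at ***@example.com or ***@company.org
}
```

Working from indices keeps the surrounding text byte-for-byte identical, which is harder to guarantee when reassembling the string from extracted substrings.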

Complex patterns with multiple groups

For complex patterns with many groups, FindSubmatchIndex helps extract specific parts while knowing their exact positions.

complex_pattern.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`(\w+):(\d+):(\d+\.\d+)`)
    str := "item1:42:3.14,item2:99:2.72,item3:7:1.61"
    
    allIndices := re.FindAllSubmatchIndex([]byte(str), -1)
    for i, indices := range allIndices {
        fmt.Printf("Match %d:\n", i+1)
        fmt.Printf("  Name: %s\n", str[indices[2]:indices[3]])
        fmt.Printf("  ID: %s\n", str[indices[4]:indices[5]])
        fmt.Printf("  Value: %s\n", str[indices[6]:indices[7]])
    }
}

The pattern matches name-value pairs with IDs. The indices let us extract each component precisely, even in complex strings.
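The extracted substrings are still text. A possible next step, sketched here under the assumption that the ID should be an int and the value a float64, is to convert them with strconv and keep the match offset for error reporting. The record type and parseRecords function are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var recordRe = regexp.MustCompile(`(\w+):(\d+):(\d+\.\d+)`)

// record is a hypothetical type holding one parsed match.
type record struct {
	Name   string
	ID     int
	Value  float64
	Offset int // byte offset of the match in the input
}

// parseRecords is a hypothetical helper: it converts every match into a
// typed record and reports the offset of any field that fails to parse.
func parseRecords(s string) ([]record, error) {
	var out []record
	for _, loc := range recordRe.FindAllStringSubmatchIndex(s, -1) {
		id, err := strconv.Atoi(s[loc[4]:loc[5]])
		if err != nil {
			return nil, fmt.Errorf("offset %d: %w", loc[4], err)
		}
		val, err := strconv.ParseFloat(s[loc[6]:loc[7]], 64)
		if err != nil {
			return nil, fmt.Errorf("offset %d: %w", loc[6], err)
		}
		out = append(out, record{s[loc[2]:loc[3]], id, val, loc[0]})
	}
	return out, nil
}

func main() {
	recs, err := parseRecords("item1:42:3.14,item2:99:2.72,item3:7:1.61")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range recs {
		fmt.Printf("%s id=%d value=%.2f at offset %d\n", r.Name, r.ID, r.Value, r.Offset)
	}
}
```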

HTML tag extraction with positions

This example extracts HTML tags and their attributes while tracking their positions in the document.

html_tags.go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`<([a-zA-Z]+)(\s+[^>]*)?>`)
    // Sample markup for illustration.
    html := `<div class="header"><h1>Title</h1></div>`
    
    allIndices := re.FindAllSubmatchIndex([]byte(html), -1)
    for _, indices := range allIndices {
        tag := html[indices[2]:indices[3]]
        fullTag := html[indices[0]:indices[1]]
        fmt.Printf("Found %s tag at position %d\n", tag, indices[0])
        fmt.Printf("  Full tag: %s\n", fullTag)
    }
}

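The pattern matches opening HTML tags, capturing the tag name and, optionally, the attribute text. The indices help locate each tag in the HTML string for further processing.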
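The attribute group in this pattern is optional, so its indices are -1 for tags without attributes, and slicing with them would panic. The sketch below guards against that; the tag type, the findTags helper, and the sample markup are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var tagRe = regexp.MustCompile(`<([a-zA-Z]+)(\s+[^>]*)?>`)

// tag is a hypothetical type describing one opening tag.
type tag struct {
	Name  string
	Attrs string // empty when the tag has no attributes
	Pos   int    // byte offset of the tag in the input
}

// findTags is a hypothetical helper: it collects every opening tag and
// reads the attribute text only when the optional group took part in
// the match (its start index is not -1).
func findTags(html string) []tag {
	var tags []tag
	for _, loc := range tagRe.FindAllStringSubmatchIndex(html, -1) {
		t := tag{Name: html[loc[2]:loc[3]], Pos: loc[0]}
		if loc[4] >= 0 {
			t.Attrs = strings.TrimSpace(html[loc[4]:loc[5]])
		}
		tags = append(tags, t)
	}
	return tags
}

func main() {
	for _, t := range findTags(`<div class="box"><p>Hi</p></div>`) {
		fmt.Printf("%s at %d, attrs=%q\n", t.Name, t.Pos, t.Attrs)
	}
}
```

Regular expressions only approximate HTML; for real documents a proper HTML parser is usually the safer choice.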

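Source

Go regexp package documentation

This tutorial covered the Regexp.FindSubmatchIndex method in Go with practical examples of pattern matching and position tracking.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.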
