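Regular expressions in Rust

Last modified February 19, 2025

In this article we show how to use regular expressions in Rust.

Regular expressions

Regular expressions are patterns for matching character combinations in strings. They are commonly used for searching, replacing, and validating text. In Rust, regular expressions are provided by the `regex` crate, a fast and reliable library that is maintained by the Rust project but is not part of the standard library; add it to a project with `cargo add regex`.

The following table shows some regular expression patterns.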

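Regex	Meaning
`.`	Matches any single character (except a newline by default).
`?`	Matches the preceding element once or not at all.
`+`	Matches the preceding element one or more times.
`*`	Matches the preceding element zero or more times.
`^`	Matches the starting position within the string.
`$`	Matches the ending position within the string.
`|`	Alternation operator.
`[abc]`	Matches a, b, or c.
`[a-c]`	Range; matches a, b, or c.
`[^abc]`	Negation; matches any character except a, b, or c.
`\s`	Matches a whitespace character.
`\w`	Matches a word character; for ASCII text this is equivalent to `[a-zA-Z_0-9]` (in the `regex` crate, `\w` is Unicode-aware by default).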

The is_match function

The is_match function reports whether the regular expression finds a match anywhere in the input string.

main.rs

use regex::Regex;

fn main() {
    let words = vec!["Seven", "even", "Maven", "Amen", "eleven"];
    let rx = Regex::new(r".even").unwrap();

    for word in &words {
        if rx.is_match(word) {
            println!("{} does match", word);
        } else {
            println!("{} does not match", word);
        }
    }
}

In the example, we have five words in a vector. We check which words match the .even regular expression.

let rx = Regex::new(r".even").unwrap();

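We define the regular expression. The dot character stands for any single character; the rest are literal letters. Regex::new returns a Result, and unwrap panics if the pattern is invalid.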

for word in &words {
    if rx.is_match(word) {
        println!("{} does match", word);
    } else {
        println!("{} does not match", word);
    }
}

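We iterate over the list of words. The is_match method returns true if the word matches the regular expression. Because the pattern is not anchored, a match anywhere in the word is enough, which is why eleven matches.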

$ cargo run -q
Seven does match
even does not match
Maven does not match
Amen does not match
eleven does match

Finding occurrences of a word

The example finds and prints all occurrences of "fox" or "foxes" in the provided text, together with their positions.

main.rs

use regex::Regex;

fn main() {
    let content = "Foxes are omnivorous mammals belonging to several genera
of the family Canidae. Foxes have a flattened skull, upright triangular ears,
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every
continent except Antarctica. By far the most common and widespread species of
fox is the red fox.";

    // Adding (?i) to the regex pattern to ignore case
    let rx = Regex::new(r"(?i)fox(es)?").unwrap();

    for mat in rx.find_iter(content) {
        println!("{} at index {}", mat.as_str(), mat.start());
    }
}

The example finds every occurrence of fox or foxes. Because of the (?i) flag, the search is case insensitive.

let rx = Regex::new(r"(?i)fox(es)?").unwrap();

This line creates a new regular expression that matches "fox" or "foxes", ignoring case.

for mat in rx.find_iter(content) {
    println!("{} at index {}", mat.as_str(), mat.start());
}

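find_iter returns an iterator over all non-overlapping matches of the regular expression in the content. mat.as_str returns the matched text, and mat.start returns the byte offset at which the match starts.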

$ cargo run -q
Foxes at index 0
Foxes at index 80
Foxes at index 194
fox at index 292
fox at index 307

Counting matches

The next example uses count to count all occurrences of a given pattern.

main.rs

use regex::Regex;

fn main() {
    let content = "Foxes are omnivorous mammals belonging to several genera
of the family Canidae. Foxes have a flattened skull, upright triangular ears,
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every
continent except Antarctica. By far the most common and widespread species of
fox is the red fox.";

    let pattern = r"(?i)fox(es)?";

    let rx = Regex::new(pattern).unwrap();
    let n = rx.find_iter(content).count();

    println!("There are {} matches", n);
}

We count how many times fox(es) occurs in the text. The search is case insensitive because of the (?i) flag.

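Regex anchors

Anchors match positions in the text rather than characters. In the next example, we check whether a string is located at the beginning of a sentence.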

main.rs

use regex::Regex;

fn main() {
    let sentences = vec![
        "I am looking for Jane.",
        "Jane was walking along the river.",
        "Kate and Jane are close friends."
    ];

    let rx = Regex::new(r"^Jane").unwrap();

    for sentence in &sentences {
        if rx.is_match(sentence) {
            println!("{} does match", sentence);
        } else {
            println!("{} does not match", sentence);
        }
    }
}

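We have three sentences. The search pattern is ^Jane. The pattern checks whether the "Jane" string is located at the beginning of the text. Jane\.$ would look for "Jane" at the end of a sentence.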

Regex alternation

The alternation operator | creates a regular expression with several choices.

main.rs

use regex::Regex;

fn main() {
    let users = vec![
        "Jane", "Thomas", "Robert", "Lucy", "Beky", "John", "Peter", "Andy",
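    ];

    // Match any name that contains one of the three alternatives.
    let rx = Regex::new(r"Jane|Beky|Robert").unwrap();

    for user in &users {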
        if rx.is_match(user) {
            println!("{} does match", user);
        } else {
            println!("{} does not match", user);
        }
    }
}

There are eight names in the list.

let rx = Regex::new(r"Jane|Beky|Robert").unwrap();

This regular expression looks for the "Jane", "Beky", or "Robert" strings.

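Capturing groups

Round brackets are used to create capturing groups. This allows us to apply a quantifier to the entire group or to restrict alternation to a part of the regular expression.

main.rs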

use regex::Regex;

fn main() {
    let sites = vec!["webcode.me", "zetcode.com", "freebsd.org", "netbsd.org"];

    let rx = Regex::new(r"(\w+)\.(\w+)").unwrap();

    for site in &sites {
        if let Some(caps) = rx.captures(site) {
            println!("{}", &caps[0]); // Whole match
            println!("{}", &caps[1]); // First group
            println!("{}", &caps[2]); // Second group
        }
        println!("*****************");
    }
}

In the example, we divide the domain names into two parts by using groups.

let rx = Regex::new(r"(\w+)\.(\w+)").unwrap();

We define two groups with parentheses.

if let Some(caps) = rx.captures(site)

The condition checks whether the site matches the regular expression pattern and, if it does, captures the groups.

println!("{}", &caps[0]); // Whole match
println!("{}", &caps[1]); // First group
println!("{}", &caps[2]); // Second group

We print the whole match followed by the two captured groups.

$ cargo run -q
webcode.me
webcode
me
*****************
zetcode.com
zetcode
com
*****************
freebsd.org
freebsd
org
*****************
netbsd.org
netbsd
org
*****************

Regex replace

The replace method replaces the first (leftmost) match in the text; replace_all replaces every match.

main.rs

use regex::Regex;

fn main() {
    let text = "My name is John Doe.";
    let re = Regex::new(r"John").unwrap();

    let new_text = re.replace(text, "Jane");

    println!("{}", new_text);
}

$ cargo run -q
My name is Jane Doe.

Regex split

The split method splits the text at each match of the regular expression.

main.rs

use regex::Regex;

fn main() {
    let text = "My name is John Doe.";
    let re = Regex::new(r"\s").unwrap();

    for part in re.split(text) {
        println!("{}", part);
    }
}

$ cargo run -q
My
name
is
John
Doe.

Source

Rust regex crate documentation

In this article we have learned how to use regular expressions in Rust.

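Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.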
