Python re.search function

Last modified April 20, 2025

Introduction to re.search

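The re.search function scans through a string and looks for the first location where the regular expression pattern produces a match. Unlike re.match, which only matches at the beginning of the string, it can find a match anywhere in the string.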
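
To make the difference from re.match concrete, here is a minimal sketch. The sample text is made up for illustration.

```python
import re

text = "The quick brown fox"

# re.match succeeds only if the pattern matches at position 0.
at_start = re.match(r'fox', text)

# re.search scans forward and returns the first match anywhere.
anywhere = re.search(r'fox', text)

print(at_start)          # None, because the text does not start with "fox"
print(anywhere.group())  # fox
print(anywhere.start())  # 16
```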

If a match is found, the function returns a match object; if there is no match, it returns None. It is one of the most commonly used functions in Python's re module.

The search stops at the first match found in the string. To get all matches, use re.findall or re.finditer instead.
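
As a rough comparison of the three functions, the following sketch runs the same pattern over one sample string. The variable names are only illustrative.

```python
import re

text = "Order 12345 was placed on 2023-05-20"

# re.search stops at the first match.
first = re.search(r'\d+', text)

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r'\d+', text)

# re.finditer yields match objects, so positions are available too.
positions = [(m.group(), m.start()) for m in re.finditer(r'\d+', text)]

print(first.group())  # 12345
print(all_numbers)    # ['12345', '2023', '05', '20']
print(positions)      # [('12345', 6), ('2023', 26), ('05', 31), ('20', 34)]
```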

Basic syntax

The syntax of re.search is straightforward:

re.search(pattern, string, flags=0)

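pattern is the regular expression to match. string is the text that is searched for a match. The optional flags argument modifies the matching behavior.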

Basic pattern search

Let's start with a simple example that searches for a pattern in text.

basic_search.py

#!/usr/bin/python

import re

text = "The quick brown fox jumps over the lazy dog"
match = re.search(r'fox', text)

if match:
    print(f"Found '{match.group()}' at position {match.start()}")
else:
    print("Pattern not found")

This example demonstrates the basic usage of re.search. We look for the literal string "fox" in the given text.

match = re.search(r'fox', text)

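The raw string (r'') prevents Python from interpreting backslashes as escape sequences. The function scans the text for the first occurrence of "fox".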

if match:
    print(f"Found '{match.group()}' at position {match.start()}")

If a match is found, we print the matched text and its starting position. The group method returns the matched substring.
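
Besides group and start, the match object also exposes end and span. The sketch below shows how they relate to ordinary string slicing, using the same sample text.

```python
import re

text = "The quick brown fox jumps over the lazy dog"
match = re.search(r'fox', text)

if match:
    start, end = match.span()      # (start, end) as a tuple
    print(match.start())           # 16
    print(match.end())             # 19 (index just past the match)
    print(text[start:end])         # fox, same as match.group()
```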

Using flags with re.search

Flags modify how the pattern is matched. Here is a case-insensitive search.

flags_example.py

#!/usr/bin/python

import re

text = "Python is FUN!"
match = re.search(r'fun', text, re.IGNORECASE)

if match:
    print(f"Found '{match.group()}' ignoring case")
else:
    print("Pattern not found")

The re.IGNORECASE flag makes the search case-insensitive, so the pattern "fun" matches "FUN".
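
Flags can also be combined with the | operator or written inline at the start of the pattern. The following is a small sketch. The two-line sample text is an assumption made for illustration.

```python
import re

text = "python is fun\nPython is FUN!"

# Combine several flags with the bitwise OR operator.
combined = re.search(r'^PYTHON.*fun!$', text, re.IGNORECASE | re.MULTILINE)

# The inline form (?i) at the start of the pattern is equivalent
# to passing re.IGNORECASE.
inline = re.search(r'(?i)fun', "Python is FUN!")

print(combined.group())  # Python is FUN!
print(inline.group())    # FUN
```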

Searching for digits

Regular expressions can match more complex patterns, such as sequences of digits.

digits_example.py

#!/usr/bin/python

import re

text = "Order 12345 was placed on 2023-05-20"
match = re.search(r'\d+', text)

if match:
    print(f"Found number: {match.group()}")
else:
    print("No numbers found")

The pattern \d+ matches one or more digits. It finds the first sequence of digits in the string (12345 in this case).
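
Because re.search returns only the first match, a more specific pattern is one way to reach a later number. The sketch below uses counted quantifiers to pick out the date instead of the order number. The variable names are illustrative.

```python
import re

text = "Order 12345 was placed on 2023-05-20"

# \d+ stops at the first run of digits.
first_number = re.search(r'\d+', text)

# Counted quantifiers describe the date shape, so the order number is skipped.
date = re.search(r'\d{4}-\d{2}-\d{2}', text)

print(first_number.group())  # 12345
print(date.group())          # 2023-05-20
```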

Extracting email addresses

We can use re.search to find structured data such as email addresses.

email_example.py

#!/usr/bin/python

import re

text = "Contact us at support@example.com or sales@example.org"
match = re.search(r'[\w\.-]+@[\w\.-]+', text)

if match:
    print(f"Found email: {match.group()}")
else:
    print("No email found")

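This pattern matches common email formats. The character class [\w\.-] matches word characters, dots, and hyphens, which commonly appear in email addresses.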
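
One caveat of such a loose pattern is that it can swallow trailing punctuation. The sketch below compares it with a slightly tighter variant. The tighter pattern is only an illustrative assumption, not a complete email validator.

```python
import re

text = "Write to support@example.com."

# The loose pattern allows dots anywhere after @, including the
# sentence-ending period.
loose = re.search(r'[\w\.-]+@[\w\.-]+', text)

# Illustrative tighter variant: each dot in the domain must be
# followed by at least one word character or hyphen.
tighter = re.search(r'[\w.-]+@[\w-]+(?:\.[\w-]+)+', text)

print(loose.group())    # support@example.com.
print(tighter.group())  # support@example.com
```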

Using groups in a search

Parentheses create capturing groups that can be extracted individually.

groups_example.py

#!/usr/bin/python

import re

text = "Date: 2023-12-25"
match = re.search(r'Date: (\d{4})-(\d{2})-(\d{2})', text)

if match:
    print(f"Year: {match.group(1)}")
    print(f"Month: {match.group(2)}")
    print(f"Day: {match.group(3)}")
else:
    print("Date pattern not found")

This extracts the date components into separate groups. Group 0 is the entire match, while groups 1-3 contain the captured date parts.
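
Groups can also be given names with the (?P<name>...) syntax, which tends to read better than numeric indexes. A minimal sketch follows. The group names year, month, and day are chosen here for illustration.

```python
import re

text = "Date: 2023-12-25"
match = re.search(
    r'Date: (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', text
)

if match:
    print(match.group('year'))  # 2023
    print(match.groups())       # ('2023', '12', '25')
    print(match.groupdict())    # {'year': '2023', 'month': '12', 'day': '25'}
```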

Searching with word boundaries

The word boundary (\b) ensures that we match only whole words.

word_boundaries.py

#!/usr/bin/python

import re

text = "The cat in the hat"
match = re.search(r'\bcat\b', text)

if match:
    print("Found exact word 'cat'")
else:
    print("Word 'cat' not found")

The \b anchor ensures that "cat" is matched as a whole word, not as part of another word such as "category" or "concatenate".
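
The effect of \b is easiest to see on text where "cat" appears only inside a longer word. The sample sentence below is made up for this sketch.

```python
import re

text = "Browse the category list"

# Without boundaries, "cat" is found inside "category".
without_boundary = re.search(r'cat', text)

# With boundaries, there is no standalone word "cat", so no match.
with_boundary = re.search(r'\bcat\b', text)

print(without_boundary.start())  # 11
print(with_boundary)             # None
```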

Searching in multiline text

The re.MULTILINE flag changes how ^ and $ work in a pattern.

multiline_example.py

#!/usr/bin/python

import re

text = """First line
Second line
Third line"""

match = re.search(r'^Second.*$', text, re.MULTILINE)

if match:
    print(f"Found line: {match.group()}")
else:
    print("Line not found")

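With re.MULTILINE, ^ matches at the beginning of each line, not just at the beginning of the whole string. Likewise, $ matches at the end of each line.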
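
Running the same pattern with and without the flag shows the difference. This is a short sketch using the same three-line text.

```python
import re

text = "First line\nSecond line\nThird line"

# Without the flag, ^ matches only at the very start of the string.
without_flag = re.search(r'^Second.*$', text)

# With the flag, ^ and $ also match at line boundaries.
with_flag = re.search(r'^Second.*$', text, re.MULTILINE)

print(without_flag)       # None
print(with_flag.group())  # Second line
```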

Best practices

Follow these best practices when using re.search:

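Use raw strings (r'') for patterns to avoid escaping problems
Precompile the pattern with re.compile if it is reused often
Check that the match object is not None before using it
Use appropriate flags to simplify patterns where possible
If you need all matches, consider re.findall or re.finditer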
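
Several of these practices can be combined in one short sketch. The ORDER_ID pattern and the sample lines are hypothetical and exist only for illustration.

```python
import re

# Hypothetical pattern: compiled once with a raw string and a flag,
# then reused for every line.
ORDER_ID = re.compile(r'order\s+(\d+)', re.IGNORECASE)

lines = ["Order 12345 shipped", "No identifier here", "ORDER 777 pending"]

order_ids = []
for line in lines:
    match = ORDER_ID.search(line)
    if match is None:
        # Skip lines without a match instead of calling .group() on None.
        continue
    order_ids.append(match.group(1))

print(order_ids)  # ['12345', '777']
```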

Performance considerations

For a single search, re.search is efficient enough. For repeated searches with the same pattern, consider compiling the pattern first.

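Complex patterns with excessive backtracking can be slow. Test performance with realistic input data, especially when working with complex regular expressions.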
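
As a rough illustration of backtracking cost, the sketch below times a pattern with a nested quantifier against a simpler pattern that matches the same strings. The timed_search helper is hypothetical. The input is kept short on purpose, and actual timings depend on the machine and Python version.

```python
import re
import time

# A run of "a" with no trailing "b", so neither pattern can match.
text = "a" * 18

def timed_search(pattern, string):
    """Hypothetical helper: return (match, elapsed seconds)."""
    start = time.perf_counter()
    result = re.search(pattern, string)
    return result, time.perf_counter() - start

# The nested quantifier lets the engine split the run of "a" in many
# ways, and it may try them all before giving up.
nested_result, nested_time = timed_search(r'(a+)+b', text)

# A single quantifier has far fewer alternatives to explore.
flat_result, flat_time = timed_search(r'a+b', text)

print(nested_result, flat_result)  # None None
print(f"nested: {nested_time:.4f}s, flat: {flat_time:.4f}s")
```

Increasing the length of text by a few characters typically makes the nested version noticeably slower, which is why testing with realistic input matters.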

Source

Python re.search() documentation

This tutorial covered the essential aspects of Python's re.search function. Mastering pattern searching is fundamental to effective text processing.

Author

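My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have written over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.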
