Python re.purge() function

Last modified April 20, 2025

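Introduction to re.purge

The re.purge function is part of Python's re module. It clears the regular expression cache in which the module stores compiled patterns.

Python caches recently used regular expressions to improve performance: re.compile and the module-level functions such as re.search and re.match reuse a cached compiled pattern when one is available. The re.purge function lets you clear this cache manually.

This function is particularly useful in memory-sensitive applications, or when working with many unique patterns that should not stay cached for long.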

Basic syntax

The syntax of re.purge is simple, since it takes no arguments:

re.purge()

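Calling this function clears the internal cache of compiled regular expression patterns and returns None. Any pattern that is used again afterwards has to be recompiled.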
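As a quick check of the call signature, the following minimal sketch calls re.purge and inspects the result. The string passed in the second call is an arbitrary value chosen only to trigger the error.

```python
import re

# purge() takes no arguments and returns None
result = re.purge()
print(result)

# Passing any argument raises TypeError
try:
    re.purge("unexpected")
except TypeError as exc:
    print(f"TypeError: {exc}")
```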

Basic cache clearing example

Here is a simple demonstration of clearing the regular expression cache.

basic_purge.py

#!/usr/bin/python

import re

# Compile some patterns to fill the cache
re.compile(r'pattern1')
re.compile(r'pattern2')

# Clear the cache
re.purge()

# These will need to be compiled fresh
re.compile(r'pattern1')
re.compile(r'pattern2')

This example shows the basic use of re.purge: clearing the cache between compilations.

re.compile(r'pattern1')

The first compilation adds the pattern to Python's internal regular expression cache.

re.purge()

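This call removes all compiled patterns from the cache. Their memory is reclaimed only if nothing else refers to them; a pattern object that your code still holds in a variable stays valid and usable.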
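To illustrate that purging affects only the cache and not pattern objects you already hold, here is a small sketch. The variable names and the sample text are made up for this illustration.

```python
import re

# Keep our own reference to the compiled pattern
word = re.compile(r"\w+")

# This drops the cache's reference, not ours
re.purge()

# The pattern object we hold still works as before
words = word.findall("purge keeps held patterns alive")
print(words)
```

Holding frequently used patterns in variables like this also means that a purge elsewhere in the program does not force them to be recompiled.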

Measuring cache impact

Let's measure how the cache affects compilation performance.

cache_impact.py

#!/usr/bin/python

import re
import time

def time_compilation(pattern):
    start = time.perf_counter()
    re.compile(pattern)
    return time.perf_counter() - start

# First compilation (uncached)
t1 = time_compilation(r'\d+')

# Second compilation (cached)
t2 = time_compilation(r'\d+')

# After purge
re.purge()
t3 = time_compilation(r'\d+')

print(f"First: {t1:.6f}s, Second: {t2:.6f}s, After purge: {t3:.6f}s")

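This demonstrates the performance difference between cached and uncached compilation. The second call is usually the fastest because it is only a cache lookup. A single measurement like this is noisy, so the exact numbers vary from run to run.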
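For steadier numbers, one option is to repeat the measurement with the standard timeit module. The sketch below is one way to do that; the pattern, the helper names, and the run count are illustrative choices, and the uncached figure also includes the small cost of the re.purge call itself.

```python
import re
import timeit

PATTERN = r"(\d{1,3}\.){3}\d{1,3}"  # illustrative pattern
RUNS = 500                          # illustrative repeat count

def compile_cached():
    # After the first call this is just a cache lookup
    re.compile(PATTERN)

def compile_uncached():
    # Purging first forces a full recompilation every time
    re.purge()
    re.compile(PATTERN)

cached = timeit.timeit(compile_cached, number=RUNS)
uncached = timeit.timeit(compile_uncached, number=RUNS)

print(f"cached:   {cached / RUNS * 1e6:.2f} us per call")
print(f"uncached: {uncached / RUNS * 1e6:.2f} us per call")
```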

Memory management with purge

The following shows how re.purge can help manage memory use.

memory_management.py

#!/usr/bin/python

import re
import sys

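def show_cache_size():
    # re._cache is a private CPython detail and may change between versions.
    # sys.getsizeof measures only the dict itself, not the cached patterns.
    return sys.getsizeof(re._cache)

# Fill cache with many patterns (it keeps at most re._MAXCACHE of them)
for i in range(1000):
    re.compile(f'pattern_{i}')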

print(f"Cache size before purge: {show_cache_size()} bytes")

# Clear the cache
re.purge()

print(f"Cache size after purge: {show_cache_size()} bytes")

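This example shows how re.purge can reduce memory use by clearing the regular expression cache. Note that sys.getsizeof reports only the size of the cache dictionary itself, not of the compiled patterns stored in it, and that the cache never holds more than re._MAXCACHE entries.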
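To estimate the memory held by the cached patterns themselves without touching private attributes, the standard tracemalloc module can be used. This is a rough sketch: the pattern strings and the count of 300 are arbitrary, and the reported figures depend on the Python version.

```python
import gc
import re
import tracemalloc

re.purge()
tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Compile a few hundred distinct patterns so the cache holds them
for i in range(300):
    re.compile(f"item_{i}_[a-z]+[0-9]*")

filled, _ = tracemalloc.get_traced_memory()

# Drop the cached patterns and let the collector run
re.purge()
gc.collect()
purged, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Traced after filling: {filled - baseline} bytes")
print(f"Traced after purge:   {purged - baseline} bytes")
```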

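Cache behavior with identical patterns

The cache stores an identical pattern only once, so compiling the same pattern repeatedly returns the same object.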

identical_patterns.py

#!/usr/bin/python

import re

# These compile to the same pattern object
pattern1 = re.compile(r'\d+')
pattern2 = re.compile(r'\d+')

print(f"Same object: {pattern1 is pattern2}")

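# After purge, the pattern is compiled into a new object,
# which is then cached and shared again
re.purge()
pattern3 = re.compile(r'\d+')
pattern4 = re.compile(r'\d+')

print(f"Same as before purge: {pattern1 is pattern3}")
print(f"After purge: {pattern3 is pattern4}")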

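This demonstrates that identical patterns share one cached object. In CPython, pattern1 is pattern2 and pattern3 is pattern4 are both True, because the second compilation in each pair is served from the cache. The purge only means that pattern3 is a new object, distinct from pattern1.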
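Sharing applies only when the whole compile request is identical. The sketch below compiles the same pattern text with and without a flag; the sharing of the flag-free objects relies on CPython's cache and is not guaranteed by the language. The variable names are illustrative.

```python
import re

plain = re.compile(r"python")
ignore_case = re.compile(r"python", re.IGNORECASE)
plain_again = re.compile(r"python")

# Same pattern and same flags: served from the cache
print(f"Same flags: {plain is plain_again}")

# Same pattern text but different flags: a separate object
print(f"Different flags: {plain is ignore_case}")
```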

Cache limits and purge timing

Python's regular expression cache has a default size limit that we can inspect.

cache_limits.py

#!/usr/bin/python

import re

# Check default cache size (re._MAXCACHE and re._cache are private
# CPython details and may change between versions)
print(f"Default cache limit: {re._MAXCACHE}")

# Fill cache beyond limit
for i in range(re._MAXCACHE + 10):
    re.compile(f'pattern_{i}')

# Verify cache size
print(f"Actual cache size: {len(re._cache)}")

# Purge resets everything
re.purge()
print(f"After purge: {len(re._cache)}")

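This shows that the cache manages its own size automatically: even after more than re._MAXCACHE patterns have been compiled, it never holds more entries than the limit. re.purge adds manual control on top of that.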

Testing cache-dependent code

re.purge is useful for testing behavior that depends on the cache.

testing.py

#!/usr/bin/python

import re
import time

def test_pattern_compilation():
    # Ensure fresh start
    re.purge()
    
    # Test first compilation
    start = time.perf_counter()
    re.compile(r'complex_pattern')
    first_time = time.perf_counter() - start
    
    # Test cached compilation
    start = time.perf_counter()
    re.compile(r'complex_pattern')
    cached_time = time.perf_counter() - start
    
    assert cached_time < first_time, "Cache not working"
    print("Test passed")

test_pattern_compilation()

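This demonstrates how re.purge can be used to ensure consistent test conditions. Because the assertion compares two timings, it can occasionally fail on a heavily loaded machine.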
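A test that avoids timing altogether can compare object identity instead. The following sketch uses the standard unittest module; the class name, the test names, and the pattern are hypothetical, and the identity checks rely on CPython returning the cached object for a repeated compilation.

```python
import re
import unittest

class PatternCacheTest(unittest.TestCase):
    def setUp(self):
        # Start every test with an empty regex cache
        re.purge()

    def test_reuses_cached_object(self):
        first = re.compile(r"[a-z]+\d{2}")
        second = re.compile(r"[a-z]+\d{2}")
        self.assertIs(first, second)

    def test_recompiles_after_purge(self):
        first = re.compile(r"[a-z]+\d{2}")
        re.purge()
        second = re.compile(r"[a-z]+\d{2}")
        self.assertIsNot(first, second)

# Run the tests programmatically so the script works as a plain file
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PatternCacheTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(f"Tests passed: {result.wasSuccessful()}")
```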

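Best practices

When using re.purge, consider the following best practices:

Use it sparingly: the cache exists to improve performance
If you use many unique patterns, call it before memory-intensive operations
Use it in tests to ensure consistent behavior
Combine it with custom caching for specialized applications
Measure the actual memory impact before optimizing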
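As one possible reading of the custom caching advice, the sketch below wraps re.compile in functools.lru_cache. The helper name compile_cached and the size of 64 are assumptions made for this illustration, not part of the re module.

```python
import re
from functools import lru_cache

# Hypothetical helper: an application-level cache with its own size limit
@lru_cache(maxsize=64)
def compile_cached(pattern, flags=0):
    return re.compile(pattern, flags)

digits = compile_cached(r"\d+")

# Purging the re module's cache does not touch the custom cache
re.purge()
still_cached = compile_cached(r"\d+") is digits
print(f"Custom cache survived purge: {still_cached}")

# The custom cache is cleared separately, when the application decides
compile_cached.cache_clear()
print(compile_cached.cache_info())
```

With this arrangement the application decides how many patterns stay alive and when they are released, while re.purge remains available for clearing the module's own cache.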

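Performance considerations

re.purge should be used sparingly, because clearing the cache forces every pattern used afterwards to be recompiled.

The performance impact depends on the complexity of your patterns and on how often they are used. Measure before optimizing cache behavior.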

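Source

Python re.purge() documentation

This tutorial covered Python's re.purge function and its role in managing the regular expression cache. Use it judiciously for memory management.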

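Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have written over 1,400 articles and 8 e-books. I have more than ten years of experience teaching programming.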
