ZetCode

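Python lazy evaluation

Last modified February 24, 2025

Lazy evaluation is a programming technique in which the evaluation of an expression is deferred until its value is actually needed. This can significantly improve performance, especially when working with large datasets or computationally expensive operations. In this tutorial, we explore lazy evaluation in Python using generators and compare it with non-lazy approaches through profiling.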
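To make the idea of deferred evaluation concrete, here is a minimal sketch. The function name `squares` and the `log` list are illustrative assumptions, not part of the tutorial's own files. The log records when each element is actually processed, which shows that a generator does no work until a value is requested.

```python
def squares(numbers, log):
    """Yield the square of each number, recording when each one is computed."""
    for n in numbers:
        log.append(n)  # runs only when a consumer asks for this value
        yield n * n

log = []
gen = squares([1, 2, 3], log)

print(log)          # [] : creating the generator computed nothing
first = next(gen)
print(first, log)   # 1 [1] : only the first element has been processed
```

Calling `squares` merely creates the generator object; the loop body runs one step at a time, each time `next` is called.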

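Generating the Fibonacci sequence

This example demonstrates the difference between the lazy and non-lazy approaches to generating the Fibonacci sequence.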

fibonacci.py
import time
import itertools
from memory_profiler import memory_usage

# Non-lazy approach
def fibonacci_non_lazy(n):
    result = []
    a, b = 0, 1
    for _ in range(n):
        result.append(a)
        a, b = b, a + b
    return result

# Lazy approach
def fibonacci_lazy(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

# Profiling
def profile_non_lazy():
    start_time = time.time()
    result = fibonacci_non_lazy(100_000)
    
    # Print the first 20 elements
    for e in result[:20]:
        print(e, end=' ')
    print()
    
    duration = time.time() - start_time
    return duration

def profile_lazy():
    start_time = time.time()
    # Named first_items so the built-in slice is not shadowed
    first_items = itertools.islice(fibonacci_lazy(100_000), 20)
    
    # Print the first 20 elements
    for e in first_items:
        print(e, end=' ')
    print()
    
    duration = time.time() - start_time
    return duration

def profile_non_lazy_memory():
    result = fibonacci_non_lazy(100_000)
    
    # Monitor memory usage in the loop
    for e in memory_usage((print, [result[:100]])):
        pass

def profile_lazy_memory():
    # Named first_items so the built-in slice is not shadowed
    first_items = itertools.islice(fibonacci_lazy(100_000), 100)
    
    # Monitor memory usage while printing the first 100 elements
    for e in memory_usage((print, [list(first_items)])):
        pass

if __name__ == "__main__":
    # Sample memory usage while each approach runs
    non_lazy_mem = memory_usage((profile_non_lazy_memory, ))
    print('-------------------------------------')
    lazy_mem = memory_usage((profile_lazy_memory, ))

    # Time each approach
    non_lazy_delta = profile_non_lazy()
    lazy_delta = profile_lazy()

    # Report the peak of the memory samples rather than the first sample
    print(f"Non-lazy approach: {max(non_lazy_mem)} MiB used in {non_lazy_delta:.2f} seconds")
    print('-------------------------------------')
    print(f"Lazy approach: {max(lazy_mem)} MiB used in {lazy_delta:.2f} seconds")

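In this example, the non-lazy approach generates the entire Fibonacci sequence and stores it in a list, while the lazy approach uses a generator to produce values on demand. Because only the first 20 values are consumed, the lazy version computes just those values, which makes it both faster and more memory-efficient for large sequences.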
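As an alternative way to check the memory claim without a third-party package, the following sketch uses the standard-library `tracemalloc` module. The helper `peak_bytes` and the shortened names `fib_list` and `fib_gen` are assumptions made for this illustration, and the sequence length is kept small so the sketch runs quickly.

```python
import itertools
import tracemalloc

def fib_list(n):
    """Eager: build and return the first n Fibonacci numbers as a list."""
    result, a, b = [], 0, 1
    for _ in range(n):
        result.append(a)
        a, b = b, a + b
    return result

def fib_gen(n):
    """Lazy: yield the first n Fibonacci numbers one at a time."""
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

def peak_bytes(func):
    """Return the peak number of bytes allocated while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

N = 5_000  # kept small so the sketch runs quickly

# Both variants end up using only the first 20 values.
eager_peak = peak_bytes(lambda: fib_list(N)[:20])
lazy_peak = peak_bytes(lambda: list(itertools.islice(fib_gen(N), 20)))

print(f"eager peak: {eager_peak} bytes")
print(f"lazy peak:  {lazy_peak} bytes")
```

The exact byte counts depend on the Python version and platform, but the eager variant has to hold all `N` numbers at once, while the lazy variant only ever holds the 20 values that were requested.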

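Note: be careful when generating large Fibonacci sequences; doing so can overwhelm your operating system.

The range function

The built-in range function is evaluated lazily: it computes its values on demand instead of storing them.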

range_fun.py
import time
from memory_profiler import memory_usage

# Non-lazy custom range function
def custom_non_lazy_range(start, end):
    result = []
    current = start
    while current < end:
        result.append(current)
        current += 1
    return result

# Profiling functions
def profile_builtin_range():
    start_time = time.time()
    result = range(1_500_000)
    
    # Print the first 3000 elements
    for e in result[:3000]:
        print(e, end=' ')
    print()
    
    duration = time.time() - start_time
    return duration

def profile_custom_non_lazy_range():
    start_time = time.time()
    result = custom_non_lazy_range(0, 1_500_000)
    
    # Print the first 3000 elements
    for e in result[:3000]:
        print(e, end=' ')
    print()
    
    duration = time.time() - start_time
    return duration

if __name__ == "__main__":
    # Sample memory usage while each approach runs
    builtin_range_memory = memory_usage((profile_builtin_range, ))
    print('-------------------------------------')
    custom_non_lazy_range_memory = memory_usage((profile_custom_non_lazy_range, ))

    # Time each approach
    builtin_range_duration = profile_builtin_range()
    custom_non_lazy_range_duration = profile_custom_non_lazy_range()

    # Report the peak of the memory samples rather than the first sample
    print(f"Built-in range: {max(builtin_range_memory)} MiB used in {builtin_range_duration:.2f} seconds")
    print('-------------------------------------')
    print(f"Custom non-lazy range: {max(custom_non_lazy_range_memory)} MiB used in {custom_non_lazy_range_duration:.2f} seconds")

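In this example, we compare the built-in range function with a custom non-lazy function. We create a sequence of 1.5 million values both lazily and non-lazily, then take the first 3000 values. Finally, we compare the time and memory usage of the two approaches.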
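The laziness of range can also be observed directly, without a profiler. The sketch below is an illustration with assumed variable names; it compares the size of a range object with that of the equivalent list and shows that indexing, slicing, and membership tests work without materializing the values.

```python
import sys

big = range(1_500_000)
as_list = list(big)

# getsizeof reports the container itself (for the list, its pointer array).
print(sys.getsizeof(big))      # small, regardless of the length
print(sys.getsizeof(as_list))  # grows with the number of elements

# Indexing, slicing, and membership are computed arithmetically.
print(big[1_000_000])          # 1000000
print(big[:3000])              # range(0, 3000)
print(1_499_999 in big)        # True
```

Slicing a range returns another range, which is why `result[:3000]` in the listing above does not create a 3000-element list.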

Reading large files

This example compares the lazy and non-lazy approaches to reading a large file.

read_file.py
import time

# Non-lazy approach
def read_file_non_lazy(filename):
    with open(filename, 'r') as file:
        return file.readlines()

# Lazy approach
def read_file_lazy(filename):
    with open(filename, 'r') as file:
        for line in file:
            yield line

# Profiling
start_time = time.time()
read_file_non_lazy('large_file.txt')
print(f"Non-lazy approach: {time.time() - start_time} seconds")

start_time = time.time()
list(read_file_lazy('large_file.txt'))
print(f"Lazy approach: {time.time() - start_time} seconds")

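The non-lazy approach reads the entire file into memory, which is inefficient for large files. The lazy approach reads the file line by line, which reduces memory usage as long as the lines are processed one at a time. Note that the timing code above wraps the generator in list(), which collects every line in memory again, so the memory benefit appears only when the lines are consumed without being stored.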
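The following sketch shows one way to consume a file lazily so that only a single line is held in memory at a time. The function `count_matching_lines` and the temporary file are assumptions made to keep the example self-contained; they are not part of read_file.py.

```python
import os
import tempfile

def count_matching_lines(filename, needle):
    """Count lines containing needle, holding one line in memory at a time."""
    with open(filename, "r", encoding="utf-8") as file:
        # The file object is itself a lazy iterator over lines.
        return sum(1 for line in file if needle in line)

# Build a small throwaway file so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    for i in range(1000):
        tmp.write(f"line {i}\n")
    path = tmp.name

try:
    print(count_matching_lines(path, "99"))
finally:
    os.remove(path)  # clean up the temporary file
```

Because `sum` pulls lines from the generator expression one by one, memory use stays roughly constant no matter how large the file is.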

Filtering data

This example demonstrates lazy evaluation for filtering data.

filter_data.py
import time
import itertools

# Non-lazy approach
def filter_non_lazy(data):
    return [x for x in data if x % 2 == 0]

# Lazy approach
def filter_lazy(data):
    for x in data:
        if x % 2 == 0:
            yield x


# Profiling
data = range(10_000_000)

start_time = time.time()
res = filter_non_lazy(data)

for e in res[:10]:
    print(e)

print(f"Non-lazy approach: {time.time() - start_time} seconds")

start_time = time.time()
res = filter_lazy(data)
for e in itertools.islice(res, 10):
    print(e)

print(f"Lazy approach: {time.time() - start_time} seconds")

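The non-lazy approach filters the entire dataset at once, while the lazy approach filters elements on demand. Because only the first 10 results are consumed here, the lazy approach inspects only a handful of elements, which makes it faster and more memory-efficient for large datasets.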
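The same lazy filtering can be written more compactly. As a brief sketch with assumed variable names, a generator expression is the lazy counterpart of the list comprehension used in `filter_non_lazy`, and the built-in `filter` function also returns a lazy iterator.

```python
import itertools

data = range(10_000_000)

# A generator expression: same syntax as a list comprehension, but lazy.
evens = (x for x in data if x % 2 == 0)
first_ten = list(itertools.islice(evens, 10))
print(first_ten)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# The built-in filter() returns a lazy iterator as well.
odds = filter(lambda x: x % 2 == 1, data)
print(next(odds), next(odds), next(odds))  # 1 3 5
```

In both cases only the elements needed to produce the requested results are examined; the rest of the ten million values are never touched.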

Infinite sequences

This example demonstrates lazy evaluation for generating an infinite sequence.

infinite_sequence.py
import time

# Non-lazy approach (not feasible for infinite sequences)
# Lazy approach
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

# Profiling
start_time = time.time()
sequence = infinite_sequence()
for _ in range(1000000):
    next(sequence)
print(f"Lazy approach: {time.time() - start_time} seconds")

The lazy approach lets us generate an infinite sequence without consuming infinite memory. This is not feasible with a non-lazy approach.
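The standard library already provides an infinite counter, so a hand-written generator is not always necessary. The sketch below, with assumed variable names, uses `itertools.count` together with `islice` and `takewhile` to take finite pieces of an infinite sequence.

```python
import itertools

# itertools.count() behaves like infinite_sequence(): 0, 1, 2, ...
naturals = itertools.count()

# islice() takes a finite window from an infinite iterator.
first_five = list(itertools.islice(naturals, 5))
print(first_five)  # [0, 1, 2, 3, 4]

# takewhile() stops at the first element that fails the predicate.
squares = (n * n for n in itertools.count(1))
small_squares = list(itertools.takewhile(lambda s: s < 50, squares))
print(small_squares)  # [1, 4, 9, 16, 25, 36, 49]
```

An infinite iterator must always be bounded by its consumer; calling `list` on it directly would never finish.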

Chaining iterators

This example demonstrates lazy evaluation for chaining iterators.

chain_iterators.py
import time
from itertools import chain

# Non-lazy approach
def chain_non_lazy(iter1, iter2):
    return list(iter1) + list(iter2)

# Lazy approach
def chain_lazy(iter1, iter2):
    return chain(iter1, iter2)

# Profiling
iter1 = range(1000000)
iter2 = range(1000000)

start_time = time.time()
chain_non_lazy(iter1, iter2)
print(f"Non-lazy approach: {time.time() - start_time} seconds")

start_time = time.time()
list(chain_lazy(iter1, iter2))
print(f"Lazy approach: {time.time() - start_time} seconds")

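The non-lazy approach combines the two iterators into a single list, while the lazy approach chains them together without creating an intermediate list. The lazy approach is more memory-efficient, provided the chained iterator is consumed element by element rather than converted to a list.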
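When the number of iterables is not known in advance, `chain.from_iterable` performs the same lazy chaining. In the sketch below, `numbered_batches` is a hypothetical source invented for illustration; it yields small lists one at a time.

```python
from itertools import chain, islice

def numbered_batches():
    """Hypothetical source that yields batches (lists) one at a time."""
    for start in range(0, 9, 3):
        yield list(range(start, start + 3))

# A batch is requested only after the previous one is exhausted.
flat = chain.from_iterable(numbered_batches())

print(list(islice(flat, 4)))  # [0, 1, 2, 3]
print(list(flat))             # [4, 5, 6, 7, 8]
```

Only one batch needs to exist in memory at any moment, which matters when each batch is large or expensive to produce.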

Processing large datasets

This example demonstrates lazy evaluation for processing a large dataset.

process_data.py
import time

# Non-lazy approach
def process_non_lazy(data):
    return [x * 2 for x in data]

# Lazy approach
def process_lazy(data):
    for x in data:
        yield x * 2

# Profiling
data = range(1000000)

start_time = time.time()
process_non_lazy(data)
print(f"Non-lazy approach: {time.time() - start_time} seconds")

start_time = time.time()
list(process_lazy(data))
print(f"Lazy approach: {time.time() - start_time} seconds")

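The non-lazy approach processes the entire dataset at once, while the lazy approach processes elements on demand. For large datasets, the lazy approach is more memory-efficient when the results are consumed one at a time. When every result is collected into a list, as in the timing code above, the generator is typically no faster than the list comprehension.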
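The memory benefit is clearest when several lazy stages feed a consumer that reduces the stream to a single value. The following sketch chains two generator stages into `sum`; the stage names `double` and `keep_multiples_of_three` are assumptions chosen for this illustration.

```python
def double(values):
    """Lazily yield each value multiplied by two."""
    for x in values:
        yield x * 2

def keep_multiples_of_three(values):
    """Lazily yield only the values divisible by three."""
    for x in values:
        if x % 3 == 0:
            yield x

data = range(1_000_000)

# No intermediate list is built; sum() pulls one value at a time
# through both stages.
total = sum(keep_multiples_of_three(double(data)))
print(total)
```

Each value flows through the whole pipeline before the next one is produced, so memory use does not grow with the size of `data`.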

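Source

Python itertools documentation

In this article, we explored lazy evaluation in Python and demonstrated its effectiveness through practical examples and profiling comparisons.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.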
