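Python os.scandir function

Last modified: April 11, 2025

This guide explores Python's os.scandir function, which provides efficient directory scanning. We cover basic usage, file attributes, performance benefits, and practical examples.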

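Basic definitions

The os.scandir function scans a directory and yields DirEntry objects. It is usually more efficient than os.listdir when file type or attribute information is needed, because much of that information is returned by the directory scan itself and does not require a separate system call per entry.

Key features: it returns an iterator of DirEntry objects, exposes file type and attribute information, and is faster than os.listdir in many use cases. The function was added in Python 3.5; context manager support and the close() method followed in Python 3.6.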

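Basic directory listing

The simplest use of os.scandir is to list all entries in a directory. This example shows how to iterate over directory contents.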

basic_listing.py

import os

# List all entries in current directory
with os.scandir('.') as entries:
    for entry in entries:
        print(entry.name)

# Alternative without context manager (not recommended)
entries = os.scandir('.')
for entry in entries:
    print(entry.name)
entries.close()

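This example shows two ways to scan a directory. The first uses a context manager for automatic resource cleanup. The second requires closing the iterator manually.

The context manager approach is preferred because it releases the iterator's resources even if an exception occurs during iteration.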
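
The same guarantee applies when a loop exits early. The following is a minimal sketch, in which the helper name first_match is hypothetical: it stops at the first entry whose name ends with a given suffix, and the with block still closes the iterator although it was never exhausted.

```python
import os

def first_match(path, suffix):
    """Return the name of the first entry ending with suffix, or None."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.name.endswith(suffix):
                # Returning here leaves the with block, which closes the iterator
                return entry.name
    return None

print(first_match('.', '.py'))
```

Without the context manager, the early return would leave the iterator open until it is garbage collected.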

Filtering files and directories

DirEntry objects provide methods for checking an entry's type. This example separates files from directories.

filter_entries.py

import os

# Scan directory and separate files from directories
with os.scandir('.') as entries:
    files = []
    dirs = []
    
    for entry in entries:
        if entry.is_file():
            files.append(entry.name)
        elif entry.is_dir():
            dirs.append(entry.name)

print("Files:", files)
print("Directories:", dirs)

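This code scans the current directory and sorts the entries into files and directories, using the is_file() and is_dir() methods to check each entry's type.

In most cases these methods use information cached from the initial scan, which avoids the extra system call that os.path.isdir() would make. A system call is still needed when the entry is a symbolic link that has to be followed, or when the file system does not report the entry type during the scan.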
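
To make the saved system call concrete, here is a small sketch that finds subdirectories in both ways. The function names dirs_with_listdir and dirs_with_scandir are made up for this illustration; both should return the same names.

```python
import os

def dirs_with_listdir(path):
    # os.listdir returns bare names, so os.path.isdir needs one stat call per name
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )

def dirs_with_scandir(path):
    # DirEntry.is_dir() can usually answer from data returned by the scan itself
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())

print(dirs_with_listdir('.') == dirs_with_scandir('.'))
```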

Getting file information

DirEntry objects provide access to file metadata. This example shows how to get a file's size and modification time.

file_info.py

import os
import datetime

# Get file information for all entries
with os.scandir('.') as entries:
    for entry in entries:
        if entry.is_file():
            stat = entry.stat()
            mtime = datetime.datetime.fromtimestamp(stat.st_mtime)
            print(f"{entry.name:20} {stat.st_size:8} bytes  {mtime}")

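This code prints each file's name, size, and modification time. The stat() method returns a stat_result object containing the file's attributes.

The stat() method is lazy: nothing is fetched until you call it. On Unix the first call always makes a system call; on Windows one is needed only when a symbolic link has to be followed. The result is cached on the DirEntry object, so repeated calls for the same entry do not go back to the file system.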
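
As an illustration of working with the stat_result, the sketch below picks the largest regular files in a directory. The helper largest_files and its limit parameter are assumptions made for this example, not part of the os module.

```python
import os

def largest_files(path, limit=3):
    """Return up to `limit` (size, name) pairs for regular files, largest first."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False describes the entry itself, not a link target
            if entry.is_file(follow_symlinks=False):
                info = entry.stat(follow_symlinks=False)
                sizes.append((info.st_size, entry.name))
    return sorted(sizes, reverse=True)[:limit]

for size, name in largest_files('.'):
    print(f"{size:10} {name}")
```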

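Recursive directory traversal

Combining os.scandir with recursion allows a full directory tree traversal. This example implements a simple recursive directory listing.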

recursive_scan.py

import os

def scan_directory(path, indent=0):
    with os.scandir(path) as entries:
        for entry in entries:
            print(" " * indent + entry.name)
            # Do not follow symlinks, so a link cycle cannot cause endless recursion
            if entry.is_dir(follow_symlinks=False):
                scan_directory(entry.path, indent + 4)

# Start recursive scan from current directory
scan_directory('.')

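This function scans directories recursively and prints a tree-like structure. Each nesting level is indented by four more spaces.

For large directory trees, consider os.walk() instead. It has been built on os.scandir since Python 3.5 and handles several edge cases, such as unreadable directories and symbolic links to directories, more robustly.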
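
If you do want to stay with os.scandir, a generator keeps the traversal reusable. The following is only a sketch under stated assumptions: the name iter_files is hypothetical, symbolic links are deliberately not followed, and directories that cannot be read are reported and skipped.

```python
import os
from itertools import islice

def iter_files(root):
    """Yield the path of every regular file below root, depth first."""
    try:
        with os.scandir(root) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    yield from iter_files(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry.path
    except OSError as exc:
        # Unreadable or vanished directories are reported and skipped
        print(f"skipping {root}: {exc}")

# Print at most five file paths found below the current directory
for path in islice(iter_files('.'), 5):
    print(path)
```

Because the function is a generator, callers can stop early without walking the whole tree.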

Finding specific file types

os.scandir makes it easy to filter files by extension or other attributes. This example finds all Python files in a directory.

find_py_files.py

import os

# Find all Python files in directory
python_files = []
with os.scandir('.') as entries:
    for entry in entries:
        if entry.is_file() and entry.name.endswith('.py'):
            python_files.append(entry.name)

print("Python files found:", python_files)

# Alternative with list comprehension
with os.scandir('.') as entries:
    py_files = [e.name for e in entries if e.is_file() and e.name.endswith('.py')]
print("Python files (listcomp):", py_files)

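The first approach collects Python files with a traditional loop. The second shows a more concise list comprehension.

Both approaches check the file type and extension without unnecessary system calls; the extension test is a plain string comparison on the entry name.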
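
The same idea extends to several extensions at once, because str.endswith accepts a tuple of suffixes. In this sketch the function name files_with_extensions is hypothetical, and matching is made case-insensitive as a design choice.

```python
import os

def files_with_extensions(path, extensions):
    """Return sorted names of regular files whose name ends with one of `extensions`."""
    wanted = tuple(ext.lower() for ext in extensions)
    with os.scandir(path) as entries:
        return sorted(
            entry.name for entry in entries
            if entry.is_file() and entry.name.lower().endswith(wanted)
        )

print(files_with_extensions('.', ['.py', '.pyw']))
```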

Comparison with os.listdir

This example demonstrates the performance difference between os.scandir and os.listdir when checking file types.

performance_comparison.py

import os
import time

def using_scandir():
    # perf_counter is a monotonic, high-resolution clock suited to short timings
    start = time.perf_counter()
    with os.scandir('.') as entries:
        files = [e.name for e in entries if e.is_file()]
    return time.perf_counter() - start

def using_listdir():
    start = time.perf_counter()
    files = [f for f in os.listdir('.') if os.path.isfile(f)]
    return time.perf_counter() - start

# Time both approaches
scandir_time = using_scandir()
listdir_time = using_listdir()

print(f"os.scandir: {scandir_time:.6f} seconds")
print(f"os.listdir: {listdir_time:.6f} seconds")
# Guard against a zero reading on very small directories
if scandir_time > 0:
    print(f"scandir is {listdir_time/scandir_time:.1f}x faster")

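This code measures the time each approach takes to list files. os.scandir is usually faster because it avoids a separate stat call per entry for the type check. Keep in mind that a single timed run is noisy, and that the second function may benefit from file system caches warmed by the first.

The difference becomes more pronounced for larger directories or when several file attributes are checked.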
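
For a somewhat steadier measurement, the comparison can be repeated with the timeit module on a directory of known size. This is a rough sketch rather than a rigorous benchmark: the 500 empty files, the repeat counts, and the helper names files_scandir and files_listdir are all arbitrary choices made for illustration.

```python
import os
import tempfile
import timeit

def files_scandir(path):
    with os.scandir(path) as entries:
        return [e.name for e in entries if e.is_file()]

def files_listdir(path):
    return [n for n in os.listdir(path) if os.path.isfile(os.path.join(path, n))]

with tempfile.TemporaryDirectory() as tmp:
    # Populate a scratch directory so the workload is the same for both functions
    for i in range(500):
        open(os.path.join(tmp, f"file_{i}.txt"), "w").close()

    same = sorted(files_scandir(tmp)) == sorted(files_listdir(tmp))
    # Take the best of three rounds of 20 calls each to reduce noise
    t_scan = min(timeit.repeat(lambda: files_scandir(tmp), number=20, repeat=3))
    t_list = min(timeit.repeat(lambda: files_listdir(tmp), number=20, repeat=3))

print(f"same result: {same}")
print(f"os.scandir: {t_scan:.4f} s, os.listdir: {t_list:.4f} s")
```

The actual numbers depend on the operating system, the file system, and caching, so treat them as indicative only.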

Handling symbolic links

DirEntry objects provide methods for detecting and handling symbolic links. This example shows how to identify them.

symlink_handling.py

import os

# Create a test symlink if it doesn't exist (lexists also detects broken links)
if not os.path.lexists('test_link'):
    os.symlink(__file__, 'test_link')

# Scan directory and identify symlinks
with os.scandir('.') as entries:
    for entry in entries:
        if entry.is_symlink():
            target = os.readlink(entry.path)
            print(f"Symlink: {entry.name} -> {target}")
        elif entry.is_file():
            print(f"File: {entry.name}")
        elif entry.is_dir():
            print(f"Directory: {entry.name}")

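This code first creates a symbolic link for testing, then scans the directory to identify the different entry types. The is_symlink() method detects links.

Note that is_file() and is_dir() follow symbolic links by default. Pass follow_symlinks=False to is_file(), is_dir(), or stat() to get information about the link itself. Because the example tests is_symlink() first, links are reported as links regardless of what they point to.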
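
The effect of follow_symlinks can be observed in a scratch directory. The sketch below assumes that the platform and the current user are allowed to create symbolic links (on Windows this may require extra privileges); the helper name describe and the file names are made up for the demonstration.

```python
import os
import tempfile

def describe(entry):
    """Return is_file() with links followed, then for the entry itself."""
    return entry.is_file(), entry.is_file(follow_symlinks=False)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.txt")
    open(target, "w").close()
    # Assumption: symlink creation is permitted in this environment
    os.symlink(target, os.path.join(tmp, "link"))

    with os.scandir(tmp) as entries:
        results = {entry.name: describe(entry) for entry in entries}

for name in sorted(results):
    print(name, results[name])
```

The regular file answers True both times, while the link counts as a file only when it is followed.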

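Security considerations

Resource handling: Always close the scandir iterator when finished
Permission errors: Handle OSError for protected directories
Symbolic links: Be aware of potential symlink attacks
Case sensitivity: Behavior varies by file system
Concurrent modification: The directory may change during scanning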
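
The error-handling point can be sketched as follows. The function name list_names and the directory name no_such_directory_for_demo are hypothetical; the exception classes are the standard OSError subclasses that os.scandir may raise when a directory is missing, is not a directory, or is protected.

```python
import os

def list_names(path):
    """Return sorted entry names, or an empty list if the directory is unusable."""
    try:
        with os.scandir(path) as entries:
            return sorted(entry.name for entry in entries)
    except FileNotFoundError:
        print(f"{path}: no such directory")
    except NotADirectoryError:
        print(f"{path}: not a directory")
    except PermissionError:
        print(f"{path}: permission denied")
    return []

print(list_names('.'))
print(list_names('no_such_directory_for_demo'))
```

Returning an empty list is one possible policy; re-raising or logging may suit other programs better.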

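Best practices

Use context managers: Prefer the with statement for resource cleanup
Cache attributes: Store needed attributes to avoid repeated calls
Handle exceptions: Catch OSError for permission problems
Consider os.walk: For recursive scanning with more features
Document assumptions: Record any expected directory structure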
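
One way to apply the attribute-caching advice is to copy the needed values into plain records while the scan is open and work with those records afterwards. This is a sketch only: FileRecord and snapshot are names invented for the example, and skipping entries that raise OSError is an assumed policy.

```python
import os
from typing import NamedTuple

class FileRecord(NamedTuple):
    name: str
    size: int
    mtime: float

def snapshot(path):
    """Read each file's attributes once and keep them in plain records."""
    records = []
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                if entry.is_file():
                    info = entry.stat()
                    records.append(FileRecord(entry.name, info.st_size, info.st_mtime))
            except OSError:
                # The entry may have vanished or be unreadable; skip it
                continue
    return records

records = snapshot('.')
# Both orderings reuse the stored values; the file system is not consulted again
by_size = sorted(records, key=lambda r: r.size, reverse=True)
newest_first = sorted(records, key=lambda r: r.mtime, reverse=True)
print(by_size[:3])
print(newest_first[:3])
```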


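Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have written over 1,400 articles and 8 e-books. I have more than ten years of experience teaching programming.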
