Python os.read function

Last modified: April 11, 2025

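This comprehensive guide explores Python's os.read function, which performs low-level file reading through file descriptors. We'll cover file descriptors, buffer sizes, error handling, and practical examples.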

Basic definitions

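The os.read function reads at most n bytes from a file descriptor. It is a low-level I/O operation that works directly on file descriptors rather than on Python file objects.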

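Key parameters: fd (the file descriptor) and n (the maximum number of bytes to read). It returns the bytes read as a bytes object, which is empty once end of file is reached. It raises OSError on failure, for example when fd is not a valid descriptor.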
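A minimal sketch of these semantics, using an anonymous pipe from os.pipe so that no file on disk is needed (the pipe and the sample bytes are illustrative choices, not part of the original examples). It shows that os.read returns at most n bytes and returns an empty bytes object once the write end is closed and the data has been drained.

```python
import os

# Create an anonymous pipe: r is the read end, w is the write end
r, w = os.pipe()
os.write(w, b"hello world")
os.close(w)  # closing the write end lets the reader see EOF

first = os.read(r, 5)    # at most 5 bytes: b'hello'
rest = os.read(r, 100)   # only 6 bytes are left: b' world'
eof = os.read(r, 100)    # b'' signals end of file
os.close(r)

print(first, rest, eof)
```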

Reading files with os.read

This basic example shows how to open a file and read its contents with os.read. We first obtain a file descriptor with os.open.

basic_read.py

import os

# Open file and get file descriptor
fd = os.open("example.txt", os.O_RDONLY)

try:
    # Read up to 1024 bytes
    data = os.read(fd, 1024)
    print(f"Read {len(data)} bytes:")
    print(data.decode('utf-8'))
finally:
    # Close file descriptor, even if reading or decoding fails
    os.close(fd)

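This example opens the file in read-only mode, reads up to 1024 bytes, and then closes the descriptor. The data comes back as bytes and must be decoded to obtain a string.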

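Note that os.read may return fewer bytes than requested: at end of file, and also with pipes, terminals, and sockets, where a single call returns only the data that is currently available. This is normal behavior for low-level I/O.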
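A common way to cope with short reads is to keep calling os.read until the requested amount has arrived. The helper below is a sketch; the name read_exactly is hypothetical and not part of the os module. It stops early only when os.read signals EOF.

```python
import os

def read_exactly(fd, n):
    """Read exactly n bytes from fd, or fewer only if EOF comes first.

    os.read may return a short read, so keep calling it until the
    requested number of bytes has been collected.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # EOF before n bytes were available
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

# Usage with a pipe: the data arrives in two separate writes
r, w = os.pipe()
os.write(w, b"abc")
os.write(w, b"defg")
os.close(w)
print(read_exactly(r, 5))   # b'abcde'
print(read_exactly(r, 10))  # b'fg' (EOF reached early)
os.close(r)
```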

Reading files in chunks

For large files, reading in chunks is more efficient. This example reads a file one chunk at a time until it reaches EOF.

chunked_read.py

import os

CHUNK_SIZE = 4096  # 4KB chunks
fd = os.open("large_file.bin", os.O_RDONLY)

total_bytes = 0
while True:
    chunk = os.read(fd, CHUNK_SIZE)
    if not chunk:  # EOF reached
        break
    total_bytes += len(chunk)
    # Process chunk here
    print(f"Read chunk of {len(chunk)} bytes")

os.close(fd)
print(f"Total bytes read: {total_bytes}")

The loop keeps reading until os.read returns an empty bytes object, which signals EOF. Each chunk is processed as soon as it is read.

This approach is memory-efficient because the whole file is never loaded at once. The chunk size can be tuned to meet performance needs.
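As a concrete example of processing each chunk as it arrives, the sketch below computes a SHA-256 digest of a file with hashlib while reading it through os.read. The function name sha256_of_file and the default chunk size are illustrative assumptions; memory use stays bounded by the chunk size regardless of file size.

```python
import hashlib
import os

def sha256_of_file(path, chunk_size=4096):
    """Hash a file via os.read without loading it into memory at once."""
    digest = hashlib.sha256()
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # EOF
                break
            digest.update(chunk)  # feed each chunk to the hash incrementally
    finally:
        os.close(fd)
    return digest.hexdigest()
```

For instance, sha256_of_file("large_file.bin") would hash the file from the chunked example above.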

Reading from standard input

os.read can read from standard input (file descriptor 0). This example reads user input directly from stdin.

stdin_read.py

import os
import sys

print("Type something and press Enter (Ctrl+D to end):")

# Read from stdin (fd 0)
try:
    while True:
        data = os.read(sys.stdin.fileno(), 1024)  # fd 0 (STDIN_FILENO)
        if not data:
            break
        print(f"You entered: {data.decode('utf-8').strip()}")
except KeyboardInterrupt:
    print("\nInterrupted by user")

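This reads directly from standard input until EOF (Ctrl+D on Unix-like systems) or an interrupt. The data is decoded from bytes to a string for display.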

Note that this is lower-level than input() and requires you to handle encoding and newlines yourself. It is useful for reading binary data from stdin.
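One encoding pitfall: a multi-byte UTF-8 character can be split across two os.read calls, so decoding each chunk separately may fail. A sketch using an incremental decoder from the codecs module is shown below. The function name decode_stream is hypothetical, and a pipe stands in for stdin; with real input you would pass sys.stdin.fileno() instead.

```python
import codecs
import os

def decode_stream(fd, chunk_size=1024):
    """Read bytes from fd and decode them as UTF-8 without breaking
    multi-byte characters that are split across os.read calls."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:
            # final=True flushes state and raises if input ended mid-character
            parts.append(decoder.decode(b"", final=True))
            break
        parts.append(decoder.decode(chunk))
    return "".join(parts)

# Usage: the 3-byte euro sign gets split across tiny 2-byte reads
r, w = os.pipe()
os.write(w, "price: 5\u20ac".encode("utf-8"))
os.close(w)
text = decode_stream(r, chunk_size=2)  # 'price: 5€'
os.close(r)
```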

Non-blocking file reads

With a non-blocking file descriptor, os.read can be used for asynchronous I/O. This example shows how a non-blocking read behaves.

nonblocking_read.py

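import os

# Open a named pipe (FIFO) in non-blocking mode; with O_NONBLOCK the
# open succeeds immediately even if no writer is connected yet
fd = os.open("fifo_pipe", os.O_RDONLY | os.O_NONBLOCK)

try:
    while True:
        try:
            data = os.read(fd, 1024)
            if data:
                print(f"Data received: {data.decode()}")
            else:
                # Empty bytes means EOF: no writer has the pipe open
                print("End of file - no writer connected")
                break
        except BlockingIOError:
            # A writer is connected, but no data is ready right now
            print("No data ready - would block")
            # In real code, you might wait here (e.g. with select)
            break
finally:
    os.close(fd)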

In non-blocking mode, os.read raises BlockingIOError instead of waiting when no data is available. This is useful for event loops and asynchronous I/O.

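The example assumes that a named pipe (FIFO) called fifo_pipe already exists, for example one created with mkfifo. Real code needs proper error handling and would typically wait for readiness with select or an event loop rather than giving up.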
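A sketch of waiting for readiness instead of giving up, using the standard selectors module. It assumes a Unix-like system (selectors on Windows only support sockets), and it uses an anonymous pipe made non-blocking with os.set_blocking as a stand-in for the FIFO; the message b"ping" is arbitrary.

```python
import os
import selectors

# Stand-in for the FIFO: an anonymous pipe with a non-blocking read end
r, w = os.pipe()
os.set_blocking(r, False)

sel = selectors.DefaultSelector()
sel.register(r, selectors.EVENT_READ)

# Nothing has been written yet, so a direct read raises BlockingIOError
try:
    os.read(r, 1024)
except BlockingIOError:
    print("No data ready - would block")

os.write(w, b"ping")

# Wait (up to 1 second) until the descriptor is readable, then read
events = sel.select(timeout=1.0)
data = os.read(r, 1024) if events else b""
print(f"Data received: {data.decode()}")

sel.unregister(r)
sel.close()
os.close(r)
os.close(w)
```

Here the selector does the waiting, so os.read is only called once data is known to be available.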

Reading binary data

os.read is well suited to reading binary files because it returns raw bytes. This example reads a binary file and processes its contents.

binary_read.py

import os

def read_binary_file(filename):
    fd = os.open(filename, os.O_RDONLY)
    try:
        # Read first 4 bytes (could be a magic number)
        header = os.read(fd, 4)
        print(f"File header: {header.hex()}")

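        # Read the rest of the file, collecting chunks in a list and
        # joining once (repeated bytes concatenation copies data each time)
        chunks = [header]
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:
                break
            chunks.append(chunk)

        return b''.join(chunks)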
    finally:
        os.close(fd)

data = read_binary_file("image.png")
print(f"Total bytes read: {len(data)}")
print(f"First 16 bytes: {data[:16].hex()}")

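This reads a binary file (a PNG image), extracting the header first and then the rest of the contents. The hexadecimal representation makes the binary data easy to inspect.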

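Because os.read returns raw bytes, every byte is preserved exactly as it appears in the file. Text mode in open(), by contrast, decodes the data and may translate newlines.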
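To go beyond printing hex, binary headers are usually decoded with the struct module. The sketch below reads the first 24 bytes of a PNG file: the 8-byte PNG signature, then the IHDR chunk's 4-byte length and 4-byte type, followed by width and height as big-endian 32-bit unsigned integers. The function name png_dimensions is hypothetical.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(path):
    """Return (width, height) from a PNG file's IHDR chunk.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type
    ("IHDR"), then width and height as big-endian unsigned 32-bit ints.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        header = b""
        while len(header) < 24:  # loop in case of a short read
            chunk = os.read(fd, 24 - len(header))
            if not chunk:
                break
            header += chunk
    finally:
        os.close(fd)

    if (len(header) < 24 or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

For example, png_dimensions("image.png") would return the image size of the file used above.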

Error handling with os.read

Proper error handling is essential when working with low-level I/O. This example demonstrates thorough error handling for os.read.

error_handling.py

import os
import errno

def safe_read(fd, n):
    try:
        data = os.read(fd, n)
        if not data:
            print("Reached end of file")
            return None
        return data
    except OSError as e:
        if e.errno == errno.EBADF:
            print("Error: Bad file descriptor")
        elif e.errno == errno.EINTR:
            # Rare since Python 3.5: PEP 475 makes os.read retry after signals
            print("Error: Interrupted system call")
        elif e.errno == errno.EIO:
            print("Error: I/O error occurred")
        else:
            print(f"Unexpected error: {e}")
        return None

# Example usage: open before try, so fd is always defined in finally
fd = os.open("example.txt", os.O_RDONLY)
try:
    data = safe_read(fd, 1024)
    if data:
        print(data.decode('utf-8'))
finally:
    os.close(fd)

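The safe_read function handles the various error conditions os.read can encounter and reports each error code specifically. Note that it returns None both at EOF and on error, so the caller cannot tell the two apart.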

The finally block ensures that the file descriptor is closed even if an error occurs, which prevents resource leaks in error scenarios.
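If the caller needs to distinguish EOF from failure, one option is to return the error alongside the data instead of collapsing both into None. The sketch below uses a hypothetical helper, read_or_report, and errno.errorcode, the standard mapping from numeric error codes to names such as 'EBADF'. Reading from a descriptor that has already been closed triggers EBADF.

```python
import errno
import os

def read_or_report(fd, n):
    """Return (data, error_name); error_name is None when the read worked.

    b"" still means EOF here, so callers can tell end of file apart
    from an error.
    """
    try:
        return os.read(fd, n), None
    except OSError as e:
        # errno.errorcode maps numeric codes to symbolic names like 'EBADF'
        return None, errno.errorcode.get(e.errno, str(e.errno))

r, w = os.pipe()
os.write(w, b"hi")
os.close(w)
print(read_or_report(r, 10))  # (b'hi', None)
print(read_or_report(r, 10))  # (b'', None) -> EOF
os.close(r)
print(read_or_report(r, 10))  # (None, 'EBADF') -> descriptor already closed
```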

Performance considerations

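Buffer size: larger chunks mean fewer system calls but more memory per call
Direct I/O: the O_DIRECT flag, where the platform supports it, bypasses the kernel page cache for special cases
Memory mapping: for large files, mmap may be more efficient
System calls: every os.read call is a system call with its own overhead
Python overhead: high-level file objects add a buffering layer on top of the raw calls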
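To make the buffer-size trade-off above concrete, the sketch below counts how many os.read calls (and therefore system calls) it takes to read the same file with different chunk sizes. The helper name count_read_calls and the 100 KB temporary file are illustrative assumptions; the exact counts depend on the platform, so none are shown here.

```python
import os
import tempfile

def count_read_calls(path, chunk_size):
    """Count how many os.read calls it takes to read a file to EOF,
    including the final call that returns b""."""
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    try:
        while True:
            calls += 1
            if not os.read(fd, chunk_size):
                break
    finally:
        os.close(fd)
    return calls

# Usage: a 100 KB temporary file read with different chunk sizes
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 100_000)
    path = f.name
try:
    for size in (512, 4096, 65536):
        print(size, count_read_calls(path, size))
finally:
    os.remove(path)
```

Smaller chunks need proportionally more calls, which is where the per-call overhead adds up.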

Best practices

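Always close descriptors: use try/finally or a context manager
Handle partial reads: check the length of the returned data
Use an appropriate size: balance the number of calls against memory use
Consider alternatives: for text, the built-in open() is simpler
Be aware of EINTR: system calls can be interrupted by signals; since Python 3.5 (PEP 475), os.read retries automatically unless the signal handler raises an exception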
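Since os.open returns a plain integer rather than a context manager, one way to follow the first practice is a small wrapper built with contextlib.contextmanager. The name open_fd is hypothetical; this is a sketch of the pattern, not an os API.

```python
import contextlib
import os

@contextlib.contextmanager
def open_fd(path, flags=os.O_RDONLY):
    """Open a raw file descriptor and always close it on exit."""
    fd = os.open(path, flags)
    try:
        yield fd
    finally:
        os.close(fd)

# Usage (example.txt is the file assumed by the earlier examples):
# with open_fd("example.txt") as fd:
#     data = os.read(fd, 1024)
```

Alternatively, os.fdopen(fd, "rb") wraps an existing descriptor in a file object that closes it when used in a with statement.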


Author

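My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have written over 1,400 articles and 8 e-books. I have more than ten years of experience teaching programming.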
