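Python os.pread Function

Last modified: April 11, 2025

This comprehensive guide explores Python's os.pread function, which reads data from a file descriptor at a specific offset. We cover file descriptors, offset handling, and practical low-level file I/O examples.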

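Basic Definitions

The os.pread function reads data from a file descriptor at a given offset without changing the file position. Because it never touches the shared position, it is safe to call from several threads on the same descriptor, and it is useful for random-access file operations. It is available on Unix platforms.

Key parameters: fd (the file descriptor), n (the maximum number of bytes to read), and offset (the position to read from). It returns the bytes read as a bytes object.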

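Basic File Reading

This example demonstrates the simplest use of os.pread: reading data at a specific offset in a file. We first open the file to obtain its descriptor.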

basic_read.py

import os

# Create a test file
with open("data.txt", "w") as f:
    f.write("Hello World! This is a test file.")

# Open file and get descriptor
fd = os.open("data.txt", os.O_RDONLY)

# Read 5 bytes from offset 6
data = os.pread(fd, 5, 6)
print(f"Read data: {data.decode()}")  # Output: World

os.close(fd)

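The code creates a test file, opens it to obtain a file descriptor, and then uses os.pread to read "World" from offset 6. The file position is not affected by this operation.

Note that we use os.open for low-level file descriptor access, so the descriptor must be closed manually with os.close.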
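Since the descriptor has to be closed by hand, one option is to wrap os.open in a small context manager. The following is a minimal sketch; the helper name opened_fd is hypothetical and not part of the os module, and the scratch file is created only to keep the example self-contained.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def opened_fd(path, flags=os.O_RDONLY):
    """Yield a raw file descriptor and close it on exit (hypothetical helper)."""
    fd = os.open(path, flags)
    try:
        yield fd
    finally:
        # Runs even if the body of the with-block raises.
        os.close(fd)


# Build a small scratch file so the sketch does not depend on data.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello World! This is a test file.")
    path = tmp.name

with opened_fd(path) as fd:
    word = os.pread(fd, 5, 6)

print(word.decode())
os.remove(path)
```

With this pattern the descriptor is released even when the read raises an exception.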

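Reading Large Files in Chunks

os.pread works well for reading large files in chunks at explicit offsets. This example shows how to process a file in fixed-size chunks.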

chunked_read.py

import os

# Create a large test file (1MB)
with open("large.bin", "wb") as f:
    f.write(os.urandom(1024 * 1024))

fd = os.open("large.bin", os.O_RDONLY)
chunk_size = 4096  # 4KB chunks
offset = 0

while True:
    data = os.pread(fd, chunk_size, offset)
    if not data:
        break  # End of file
    
    print(f"Read {len(data)} bytes from offset {offset}")
    # Process chunk here
    offset += len(data)

os.close(fd)

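This creates a 1 MB file of random bytes and then reads it in 4 KB chunks with os.pread. After each read, the offset is advanced manually by the number of bytes actually returned.

The loop continues until os.pread returns an empty bytes object, which signals end of file. Each read is independent of the file position and therefore thread-safe.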
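The same loop can be packaged as a generator so that the chunk processing stays separate from the offset bookkeeping. This is a sketch under stated assumptions: iter_chunks is a hypothetical helper name, and hashing the chunks with SHA-256 is only an example of "processing".

```python
import hashlib
import os
import tempfile


def iter_chunks(fd, chunk_size=4096, start=0):
    """Yield (offset, data) pairs until os.pread reports end of file."""
    offset = start
    while True:
        data = os.pread(fd, chunk_size, offset)
        if not data:
            return  # Empty bytes object: end of file
        yield offset, data
        # Advance by what was actually read, not by chunk_size.
        offset += len(data)


# Scratch file with a size that is not a multiple of the chunk size.
payload = os.urandom(10_000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    digest = hashlib.sha256()
    total = 0
    for offset, data in iter_chunks(fd):
        digest.update(data)
        total += len(data)
finally:
    os.close(fd)
    os.remove(path)

print(f"Hashed {total} bytes")
```

Because the generator tracks its own offset, several of them can run over the same descriptor without affecting each other.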

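Thread-Safe Parallel Reads

os.pread is thread-safe because it does not modify the file position. This example demonstrates reading from multiple threads in parallel.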

threaded_read.py

import os
import threading

def read_chunk(fd, offset, size):
    data = os.pread(fd, size, offset)
    print(f"Thread {threading.get_ident()}: Read {len(data)} bytes from {offset}")

# Create test file
with open("parallel.txt", "w") as f:
    f.write("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

fd = os.open("parallel.txt", os.O_RDONLY)

# Create threads to read different sections
threads = []
for i in range(0, 26, 5):
    t = threading.Thread(target=read_chunk, args=(fd, i, 5))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

os.close(fd)

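This creates several threads that concurrently read different sections of the file, requesting five bytes each. Every thread specifies its own offset and does not interfere with the others. Because the file holds 26 bytes, the thread that starts at offset 25 receives only one byte.

The thread safety comes from os.pread neither using nor modifying the shared file position that traditional read operations rely on.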
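When the data itself is needed rather than a printed message, a thread pool can collect the results. The sketch below uses concurrent.futures.ThreadPoolExecutor from the standard library; the worker count of 4 and the scratch file are arbitrary choices for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

text = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(text)
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    offsets = range(0, len(text), 5)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() returns results in the order of the offsets,
        # regardless of which thread finishes first.
        parts = list(pool.map(lambda off: os.pread(fd, 5, off), offsets))
finally:
    os.close(fd)
    os.remove(path)

reassembled = b"".join(parts)
print(reassembled.decode())
```

All workers share one descriptor, yet the pieces can be joined back into the original content because no read depends on a shared position.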

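Reading from Specific File Positions

This example shows how to use os.pread to read from a computed position in a structured binary file, similar to accessing a database record.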

structured_read.py

import os
import struct

# Create structured binary file
records = [
    struct.pack("i10s", 1, b"Alice"),
    struct.pack("i10s", 2, b"Bob"),
    struct.pack("i10s", 3, b"Charlie")
]
with open("records.bin", "wb") as f:
    f.write(b"".join(records))

fd = os.open("records.bin", os.O_RDONLY)
record_size = struct.calcsize("i10s")

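# Read second record (index 1, so its offset is one record size)
data = os.pread(fd, record_size, record_size)
record_id, name = struct.unpack("i10s", data)
# "10s" pads short names with NUL bytes, which str.strip() does not remove
name_text = name.rstrip(b"\x00").decode()
print(f"Record 2: ID={record_id}, Name={name_text}")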

os.close(fd)

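We create a binary file of fixed-size records and then use os.pread to access the second record directly by computing its offset.

The record size is computed with struct.calcsize, which allows precise positioning within the file without maintaining a file position pointer.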
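The offset arithmetic generalizes to any record: record k (counting from zero) starts at k times the record size. A minimal sketch follows, assuming the same "i10s" layout; read_record is a hypothetical helper, and returning None for a missing record is one possible convention.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("i10s")  # int id followed by a 10-byte name field


def read_record(fd, index):
    """Return (id, name) for the zero-based record index, or None past EOF."""
    data = os.pread(fd, RECORD.size, index * RECORD.size)
    if len(data) < RECORD.size:
        return None  # Missing or truncated record
    record_id, raw_name = RECORD.unpack(data)
    # Strip the NUL padding that "10s" adds to short names.
    return record_id, raw_name.rstrip(b"\x00").decode()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for record_id, name in [(1, b"Alice"), (2, b"Bob"), (3, b"Charlie")]:
        tmp.write(RECORD.pack(record_id, name))
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    third = read_record(fd, 2)
    missing = read_record(fd, 3)
finally:
    os.close(fd)
    os.remove(path)

print(third, missing)
```

Checking the length before unpacking avoids a struct.error when the requested index lies beyond the end of the file.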

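Handling Partial Reads

os.pread may return fewer bytes than requested. This example shows how to handle partial reads and end-of-file conditions correctly.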

partial_read.py

import os

# Create small test file
with open("small.txt", "w") as f:
    f.write("Short")

fd = os.open("small.txt", os.O_RDONLY)

# Attempt to read more bytes than available
data = os.pread(fd, 100, 0)
print(f"Read {len(data)} bytes: {data.decode()}")  # Output: 5 bytes

# Read beyond EOF
data = os.pread(fd, 10, 10)
print(f"Read {len(data)} bytes from offset 10")  # Output: 0 bytes

os.close(fd)

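The first read asks for 100 bytes but receives only 5, the size of the file. The second read starts beyond EOF and returns an empty bytes object.

Applications must always check the length of the returned data rather than assume that the requested number of bytes was read.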
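One common way to deal with short reads is to keep calling os.pread until the requested amount has arrived or end of file is reached. The helper below is a sketch; the name pread_full is hypothetical and not part of the os module.

```python
import os
import tempfile


def pread_full(fd, n, offset):
    """Read up to n bytes at offset, retrying after short reads until EOF."""
    chunks = []
    remaining = n
    while remaining > 0:
        data = os.pread(fd, remaining, offset)
        if not data:
            break  # End of file before n bytes were available
        chunks.append(data)
        offset += len(data)
        remaining -= len(data)
    return b"".join(chunks)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Short")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    whole = pread_full(fd, 100, 0)
    middle = pread_full(fd, 3, 1)
    beyond = pread_full(fd, 10, 10)
finally:
    os.close(fd)
    os.remove(path)

print(whole, middle, beyond)
```

The caller still has to compare the length of the result with n, because the file may simply be shorter than the request.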

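Comparing with Regular File Reading

This example contrasts os.pread with the traditional way of reading a file and shows that os.pread leaves the file position untouched.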

compare_read.py

import os

# Create test file
with open("compare.txt", "w") as f:
    f.write("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

fd = os.open("compare.txt", os.O_RDONLY)

# Traditional read (affects position)
os.lseek(fd, 10, os.SEEK_SET)
data1 = os.read(fd, 5)
print(f"Traditional read: {data1.decode()}")  # KLMNO

# pread doesn't affect position
data2 = os.pread(fd, 5, 15)
print(f"pread result: {data2.decode()}")  # PQRST
print(f"Current position: {os.lseek(fd, 0, os.SEEK_CUR)}")  # Still 15

os.close(fd)

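After seeking to position 10, the traditional read advances the position to 15. The subsequent os.pread at offset 15 does not change the position, so os.lseek still reports 15.

This demonstrates the key advantage of os.pread: reading data at a specific offset without disturbing the current file position.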
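The same property holds when the descriptor belongs to an ordinary Python file object. The sketch below, which assumes a binary-mode file opened with the built-in open, reads the tail of a file through fileno() while the file object continues sequentially from where it left off.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    path = tmp.name

with open(path, "rb") as f:
    head = f.read(3)                        # Sequential read through the file object
    before = f.tell()
    tail = os.pread(f.fileno(), 3, 23)      # Positional read on the same descriptor
    after = f.tell()
    following = f.read(3)                   # Continues right after the first read

os.remove(path)
print(head, tail, following, before, after)
```

This can be handy for peeking at a header or trailer without a seek and a matching seek back.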

Error Handling

This example shows error handling for os.pread, covering an invalid file descriptor and an invalid offset.

error_handling.py

import os
import errno

try:
    # Attempt read from invalid descriptor
    data = os.pread(9999, 10, 0)
except OSError as e:
    print(f"Error reading: {e.errno} ({errno.errorcode[e.errno]})")

# Valid file but bad offset
fd = None
try:
    fd = os.open("example.txt", os.O_RDONLY | os.O_CREAT, 0o644)
    data = os.pread(fd, 10, -5)  # Invalid offset
except OSError as e:
    print(f"Invalid offset error: {e}")
finally:
    # Close the descriptor only if os.open succeeded
    if fd is not None:
        os.close(fd)

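The first attempt fails with EBADF (bad file descriptor). The second fails with EINVAL because of the negative offset. Always handle such errors gracefully.

Note the use of errno.errorcode to translate numeric error codes into their symbolic names for better error reporting.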
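The errno lookup can be folded into a small helper that reports which error a positional read produced. This is a sketch only: pread_errno_name is a hypothetical name, and the specific codes mentioned in the comments are what Linux is expected to return.

```python
import errno
import os


def pread_errno_name(fd, n, offset):
    """Return the symbolic errno name if os.pread fails, or None on success.

    Hypothetical helper for illustration; not part of the os module.
    """
    try:
        os.pread(fd, n, offset)
    except OSError as exc:
        # Fall back to the number if the code has no symbolic name.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


read_end, write_end = os.pipe()
os.close(write_end)

# Pipes are not seekable, so a positional read is rejected (ESPIPE expected).
pipe_error = pread_errno_name(read_end, 10, 0)

# After closing, the descriptor number is no longer valid (EBADF expected).
os.close(read_end)
closed_error = pread_errno_name(read_end, 10, 0)

print(pipe_error, closed_error)
```

The pipe case is a reminder that os.pread only works on descriptors that support seeking, such as regular files.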

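Performance Considerations

Thread safety: pread neither uses nor modifies the shared file position, so concurrent reads on one descriptor do not interfere
Kernel calls: each pread is a separate system call
Buffer size: larger reads are generally more efficient
Position tracking: the caller supplies the offset, so there is no file position to manage
Caching: the operating system may cache frequently accessed regions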
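Because every os.pread is its own system call, the chunk size directly determines how many calls are needed to cover a file. The sketch below counts calls instead of timing them, so it stays deterministic; count_preads is a hypothetical helper and the 256 KiB scratch file is an arbitrary size.

```python
import os
import tempfile


def count_preads(fd, chunk_size):
    """Return how many os.pread calls it takes to reach end of file."""
    calls = 0
    offset = 0
    while True:
        data = os.pread(fd, chunk_size, offset)
        calls += 1  # The final call that returns b"" is counted too
        if not data:
            return calls
        offset += len(data)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(256 * 1024))
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    small_calls = count_preads(fd, 512)
    large_calls = count_preads(fd, 64 * 1024)
finally:
    os.close(fd)
    os.remove(path)

print(f"512-byte chunks: {small_calls} calls, 64 KiB chunks: {large_calls} calls")
```

Fewer calls usually means less overhead, although very large chunks increase memory use, which is the balance the buffer-size point refers to.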

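Best Practices

Check the returned size: always verify the number of bytes read
Handle errors: catch OSError for robust code
Use appropriate sizes: balance many small reads against a few large ones
Close descriptors: always close file descriptors
Consider alternatives: for simple cases, regular file objects may be enough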

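Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I have more than ten years of experience teaching programming.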
