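Python sqlite3.Connection.text_factory Attribute

Last modified: April 15, 2025

This guide explores Python's sqlite3.Connection.text_factory attribute, which controls how SQLite TEXT values are converted to Python objects. We cover basic usage, customization, and practical examples.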

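Basic Definitions

The text_factory attribute of a sqlite3.Connection object determines how TEXT values from the database are converted to Python objects. By default it is set to str, so TEXT values are returned as Unicode strings.

Key characteristics: it is a callable that accepts a bytes object, it can be changed at runtime, and a change affects all rows fetched afterwards on that connection. It is the main tool for handling text encodings in SQLite databases.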
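
As a minimal sketch of the "changeable at runtime" point, the snippet below runs the same query three times on one connection while switching the factory between str and bytes. The table and variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, text_data TEXT)")
conn.execute("INSERT INTO test VALUES (1, 'Hello World')")

query = "SELECT text_data FROM test WHERE id = 1"

# Default factory: TEXT comes back as str
as_str = conn.execute(query).fetchone()[0]

# Switch the factory on the live connection; later fetches see the change
conn.text_factory = bytes
as_bytes = conn.execute(query).fetchone()[0]

# Restore the default
conn.text_factory = str
as_str_again = conn.execute(query).fetchone()[0]

print(type(as_str).__name__, type(as_bytes).__name__, type(as_str_again).__name__)
# str bytes str
conn.close()
```

The data in the database never changes here; only the conversion applied on fetch does.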

Default Text Factory Behavior

This example demonstrates the default behavior, where TEXT values are returned as Python Unicode strings.

default_text_factory.py

import sqlite3

with sqlite3.connect(':memory:') as conn:
    conn.execute("CREATE TABLE test (id INTEGER, text_data TEXT)")
    conn.execute("INSERT INTO test VALUES (1, 'Hello World')")
    
    # Default text_factory returns str (Unicode)
    cursor = conn.cursor()
    cursor.execute("SELECT text_data FROM test WHERE id = 1")
    result = cursor.fetchone()[0]
    print(type(result), result)  # <class 'str'> Hello World

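The default text_factory converts SQLite TEXT values to Python str objects. This is the most common setting and the recommended one for most applications.

The example uses an in-memory database for simplicity and demonstrates the automatic Unicode conversion of text data.

Returning Bytes Instead of Unicode

Setting text_factory = bytes makes TEXT values come back as raw bytes objects instead of Unicode strings.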

bytes_text_factory.py

import sqlite3

with sqlite3.connect(':memory:') as conn:
    conn.text_factory = bytes
    conn.execute("CREATE TABLE test (id INTEGER, text_data TEXT)")
    conn.execute("INSERT INTO test VALUES (1, 'Hello World')")
    
    cursor = conn.cursor()
    cursor.execute("SELECT text_data FROM test WHERE id = 1")
    result = cursor.fetchone()[0]
    print(type(result), result)  # <class 'bytes'> b'Hello World'

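This example shows how to get raw bytes from a TEXT column. The bytes factory is useful when you need the exact byte sequence that the sqlite3 module reads from SQLite.

Note that the bytes object holds the UTF-8 encoding of the string. UTF-8 is SQLite's default database encoding, and the sqlite3 module reads TEXT values as UTF-8.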
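
The UTF-8 point is easier to see with non-ASCII text. The following sketch, which uses an illustrative sample string, compares the fetched bytes with the string's own UTF-8 encoding.

```python
import sqlite3

text = "H\u00e9ll\u00f6"  # "Héllö", written with escapes to avoid source-encoding surprises

conn = sqlite3.connect(":memory:")
conn.text_factory = bytes
conn.execute("CREATE TABLE test (id INTEGER, text_data TEXT)")
conn.execute("INSERT INTO test VALUES (1, ?)", (text,))

raw = conn.execute("SELECT text_data FROM test").fetchone()[0]

print(raw)                          # b'H\xc3\xa9ll\xc3\xb6'
print(raw == text.encode("utf-8"))  # True
print(len(text), len(raw))          # 5 7 (five characters, seven bytes)
conn.close()
```

Each accented character takes two bytes in UTF-8, which is why the byte length differs from the character count.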

Custom Text Factory Function

You can define a custom function that processes TEXT values in a specific way before they become Python objects.

custom_text_factory.py

import sqlite3

def upper_case_factory(b):
    return b.decode('utf-8').upper()

with sqlite3.connect(':memory:') as conn:
    conn.text_factory = upper_case_factory
    conn.execute("CREATE TABLE test (id INTEGER, text_data TEXT)")
    conn.execute("INSERT INTO test VALUES (1, 'Hello World')")
    
    cursor = conn.cursor()
    cursor.execute("SELECT text_data FROM test WHERE id = 1")
    result = cursor.fetchone()[0]
    print(result)  # HELLO WORLD

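This custom factory converts every TEXT value to upper case. The function receives the raw bytes from SQLite and returns the processed Python object.

Custom factories are a powerful tool for data transformation, but use them with care: they apply to every TEXT value fetched through the connection, not just to selected columns.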
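
To illustrate that scope, here is a small sketch with a hypothetical people table that mixes INTEGER, TEXT, and BLOB columns. Only the TEXT values pass through the factory.

```python
import sqlite3

def upper_case_factory(b):
    # Receives the raw bytes of each TEXT value
    return b.decode("utf-8").upper()

conn = sqlite3.connect(":memory:")
conn.text_factory = upper_case_factory
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, city TEXT, photo BLOB)")
conn.execute("INSERT INTO people VALUES (?, ?, ?, ?)", (1, "alice", "oslo", b"raw"))

row = conn.execute("SELECT id, name, city, photo FROM people").fetchone()
print(row)  # (1, 'ALICE', 'OSLO', b'raw')
conn.close()
```

Both TEXT columns are upper-cased, while the integer and the BLOB are returned unchanged. If only one column needs the transformation, applying it after the fetch is usually the safer choice.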

Handling Different Encodings

text_factory can be used to handle non-UTF-8 text stored in a SQLite database.

encoding_text_factory.py

import sqlite3

def latin1_decoder(b):
    return b.decode('latin-1')

with sqlite3.connect(':memory:') as conn:
    conn.text_factory = latin1_decoder
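    conn.execute("CREATE TABLE test (id INTEGER, text_data TEXT)")
    # A bound bytes parameter is stored as a BLOB, which bypasses text_factory,
    # so CAST it to TEXT to store the Latin-1 bytes as a TEXT value
    conn.execute("INSERT INTO test VALUES (1, CAST(? AS TEXT))",
                 ('Héllö Wørld'.encode('latin-1'),))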
    
    cursor = conn.cursor()
    cursor.execute("SELECT text_data FROM test WHERE id = 1")
    result = cursor.fetchone()[0]
    print(result)  # Héllö Wørld

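This example shows how to handle Latin-1 encoded text. The custom factory decodes the bytes with the correct encoding.

When working with legacy databases or specific encoding requirements, a custom text factory provides the necessary flexibility.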
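
Legacy databases sometimes mix encodings within one column. The sketch below assumes a hypothetical tolerant_decoder helper that tries UTF-8 first and falls back to Latin-1. It also shows that the default str factory raises an error on bytes that are not valid UTF-8. The fallback order is an assumption and should match what the data actually contains.

```python
import sqlite3

def tolerant_decoder(b):
    """Hypothetical helper: try UTF-8 first, then fall back to Latin-1."""
    try:
        return b.decode("utf-8")
    except UnicodeDecodeError:
        return b.decode("latin-1")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, text_data TEXT)")
# Row 1: Latin-1 bytes stored as TEXT (CAST keeps the bytes as they are)
conn.execute("INSERT INTO test VALUES (1, CAST(? AS TEXT))",
             ("caf\u00e9".encode("latin-1"),))
# Row 2: an ordinary str, stored as UTF-8
conn.execute("INSERT INTO test VALUES (2, ?)", ("na\u00efve",))

query = "SELECT text_data FROM test ORDER BY id"

# The default str factory cannot decode row 1
try:
    conn.execute(query).fetchall()
    default_error = None
except sqlite3.Error as exc:
    default_error = exc

# The tolerant factory handles both rows
conn.text_factory = tolerant_decoder
values = [row[0] for row in conn.execute(query)]

print(type(default_error).__name__)
print(values)  # ['café', 'naïve']
conn.close()
```

A Latin-1 fallback never fails, because every byte maps to a character, so wrongly encoded data will come back as mojibake rather than raise an error.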

Using a Lambda as the Text Factory

For simple transformations, a lambda function can serve as a concise text_factory.

lambda_text_factory.py

import sqlite3

with sqlite3.connect(':memory:') as conn:
    conn.text_factory = lambda b: b.decode('utf-8').strip().title()
    conn.execute("CREATE TABLE test (id INTEGER, text_data TEXT)")
    conn.execute("INSERT INTO test VALUES (1, '  hello world  ')")
    
    cursor = conn.cursor()
    cursor.execute("SELECT text_data FROM test WHERE id = 1")
    result = cursor.fetchone()[0]
    print(result)  # Hello World

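This lambda strips surrounding whitespace and converts each TEXT value to title case. Lambdas are handy for simple one-line transformations.

For more complex processing, a named function (as in the earlier examples) is usually easier to maintain.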

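Skipping Text Conversion

text_factory must be a callable that accepts bytes, so None is not a valid setting (unlike row_factory, whose default is None). To skip decoding and receive the bytes unchanged, set text_factory = bytes and decode manually where needed.

raw_text_factory.py

import sqlite3

with sqlite3.connect(':memory:') as conn:
    conn.text_factory = bytes
    conn.execute("CREATE TABLE test (id INTEGER, text_data TEXT)")
    conn.execute("INSERT INTO test VALUES (1, 'Hello World')")
    
    cursor = conn.cursor()
    cursor.execute("SELECT text_data FROM test WHERE id = 1")
    result = cursor.fetchone()[0]
    print(type(result), result)  # <class 'bytes'> b'Hello World'
    print(result.decode('utf-8'))  # Hello World

With text_factory = bytes, TEXT values are returned as raw bytes without any decoding attempt, so a value that is not valid UTF-8 does not raise an error on fetch. The caller decides when and how to decode.

This approach is useful when you need to handle encodings manually or when TEXT columns hold data that is not valid text.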

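Best Practices

Use the default str for Unicode: best for most applications
Handle encodings carefully: be explicit about text encodings
Consider performance: a custom factory adds a Python call for every TEXT value fetched
Document custom factories: make the behavior clear to other developers
Test thoroughly: especially with non-UTF-8 data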
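
As one way to act on the testing advice, the sketch below round-trips a few sample strings through a custom factory. The fetch_text helper and the sample list are hypothetical and would be replaced with data representative of the real database.

```python
import sqlite3

def latin1_decoder(b):
    return b.decode("latin-1")

def fetch_text(raw_bytes, factory):
    """Store raw_bytes as a TEXT value and read it back through factory."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.text_factory = factory
        conn.execute("CREATE TABLE test (text_data TEXT)")
        # CAST keeps the given bytes as the TEXT value
        conn.execute("INSERT INTO test VALUES (CAST(? AS TEXT))", (raw_bytes,))
        return conn.execute("SELECT text_data FROM test").fetchone()[0]
    finally:
        conn.close()

samples = ["plain ascii", "H\u00e9ll\u00f6 W\u00f8rld"]
for text in samples:
    # Each sample must survive the encode, store, fetch, decode round trip
    assert fetch_text(text.encode("latin-1"), latin1_decoder) == text

print("all round trips passed")
```

The same helper can be dropped into a unittest or pytest suite, with one test per encoding the application has to support.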

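Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.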
