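Python sqlite3.Connection.create_collation Method

Last modified: April 15, 2025

This guide covers Python's create_collation method, which lets you define custom collation sequences for SQLite databases.

Basic Definitions

A collation is a set of rules for comparing text strings in database operations. SQLite uses collations whenever it sorts or compares text, for example in ORDER BY and GROUP BY clauses.

The create_collation method registers a Python function as a custom collation sequence on a connection. The function must accept two strings and return -1 if the first sorts before the second, 1 if it sorts after, and 0 if the two are equal.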
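Many orderings are easier to express as a key function than as a three-way comparison. The following is a minimal sketch, assuming a hypothetical helper named collation_from_key (it is not part of sqlite3), that turns a key function into a comparator suitable for create_collation. It also removes the collation again by passing None as the callable, which the sqlite3 documentation describes.

```python
import sqlite3

def collation_from_key(key):
    """Hypothetical helper: build a two-argument comparator from a key function."""
    def compare(a, b):
        ka, kb = key(a), key(b)
        # (ka > kb) - (ka < kb) evaluates to 1, 0, or -1
        return (ka > kb) - (ka < kb)
    return compare

conn = sqlite3.connect(':memory:')
# BY_LENGTH is an illustrative name: shorter strings sort first
conn.create_collation('BY_LENGTH', collation_from_key(len))

conn.execute('CREATE TABLE words (word TEXT)')
conn.executemany('INSERT INTO words VALUES (?)',
                 [('pear',), ('fig',), ('banana',)])

rows = conn.execute('SELECT word FROM words ORDER BY word COLLATE BY_LENGTH')
by_length = [row[0] for row in rows]
print(by_length)  # ['fig', 'pear', 'banana']

# Passing None removes the collation from this connection
conn.create_collation('BY_LENGTH', None)
try:
    conn.execute('SELECT word FROM words ORDER BY word COLLATE BY_LENGTH DESC')
    removed = False
except sqlite3.OperationalError:
    # SQLite no longer knows the collation name
    removed = True
print(removed)
conn.close()
```

A collation belongs to the connection it was registered on, so every connection that runs queries with COLLATE BY_LENGTH has to register it first.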

Basic Collation Example

This example shows how to create a simple case-insensitive collation sequence.

basic_collation.py

import sqlite3

def case_insensitive_collation(a, b):
    a = a.lower()
    b = b.lower()
    if a < b:
        return -1
    elif a > b:
        return 1
    else:
        return 0

with sqlite3.connect(':memory:') as conn:
    conn.create_collation('NOCASE', case_insensitive_collation)
    cursor = conn.cursor()
    
    cursor.execute('CREATE TABLE words (word TEXT)')
    cursor.executemany('INSERT INTO words VALUES (?)', 
                      [('Apple',), ('banana',), ('cherry',)])
    
    cursor.execute('SELECT word FROM words ORDER BY word COLLATE NOCASE')
    print([row[0] for row in cursor.fetchall()])

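The example registers a case-insensitive collation named NOCASE. The Python function converts both strings to lowercase before comparing them. SQLite also has a built-in collation called NOCASE that folds only ASCII letters; registering the same name replaces it for this connection.

With COLLATE NOCASE, "Apple", "banana", and "cherry" are ordered without regard to case. The output is ['Apple', 'banana', 'cherry'].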
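The three sample words happen to sort the same way under SQLite's default BINARY collation, so the effect of the custom function is easier to see with mixed-case data. A small sketch, assuming an illustrative collation name NOCASE_PY and sample words chosen for this comparison:

```python
import sqlite3

def case_insensitive(a, b):
    a, b = a.lower(), b.lower()
    return (a > b) - (a < b)

conn = sqlite3.connect(':memory:')
conn.create_collation('NOCASE_PY', case_insensitive)
conn.execute('CREATE TABLE words (word TEXT)')
conn.executemany('INSERT INTO words VALUES (?)',
                 [('apple',), ('Banana',), ('cherry',)])

# Default BINARY order compares code points, so uppercase 'B' precedes 'a'
binary_order = [r[0] for r in conn.execute(
    'SELECT word FROM words ORDER BY word')]
# The custom collation ignores case
custom_order = [r[0] for r in conn.execute(
    'SELECT word FROM words ORDER BY word COLLATE NOCASE_PY')]

print(binary_order)  # ['Banana', 'apple', 'cherry']
print(custom_order)  # ['apple', 'Banana', 'cherry']
conn.close()
```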

Reverse Collation

This example shows how to create a collation that sorts strings in reverse order.

reverse_collation.py

import sqlite3

def reverse_collation(a, b):
    if a < b:
        return 1
    elif a > b:
        return -1
    else:
        return 0

with sqlite3.connect(':memory:') as conn:
    conn.create_collation('REVERSE', reverse_collation)
    cursor = conn.cursor()
    
    cursor.execute('CREATE TABLE items (name TEXT)')
    cursor.executemany('INSERT INTO items VALUES (?)', 
                      [('A',), ('B',), ('C',), ('D',)])
    
    cursor.execute('SELECT name FROM items ORDER BY name COLLATE REVERSE')
    print([row[0] for row in cursor.fetchall()])

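The reverse_collation function simply inverts the normal comparison logic. Strings that would normally sort earlier now sort later, and vice versa.

The output is ['D', 'C', 'B', 'A'], which shows the reversed sort order.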

Numeric Collation

This example shows how to create a collation that sorts numbers stored as text.

numeric_collation.py

import sqlite3

def numeric_collation(a, b):
    try:
        a_num = float(a)
        b_num = float(b)
        if a_num < b_num:
            return -1
        elif a_num > b_num:
            return 1
        else:
            return 0
    except ValueError:
        # Fall back to regular string comparison if not numbers
        if a < b:
            return -1
        elif a > b:
            return 1
        else:
            return 0

with sqlite3.connect(':memory:') as conn:
    conn.create_collation('NUMERIC', numeric_collation)
    cursor = conn.cursor()
    
    cursor.execute('CREATE TABLE numbers (value TEXT)')
    cursor.executemany('INSERT INTO numbers VALUES (?)', 
                      [('10',), ('2',), ('1',), ('20',)])
    
    cursor.execute('SELECT value FROM numbers ORDER BY value COLLATE NUMERIC')
    print([row[0] for row in cursor.fetchall()])

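The numeric_collation function converts the strings to numbers before comparing them. This ensures that "10" comes after "2" by numeric value rather than before it, as it would in alphabetical order.

The output is ['1', '2', '10', '20'], which shows the correct numeric order.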
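One caveat applies when only some values are numeric: mixing float comparison with a string fallback can produce an inconsistent order. For example, "9" sorts before "10" numerically, while "10" sorts before "1a" and "1a" before "9" as strings. A possible variation, sketched here with a hypothetical helper numeric_key and an illustrative collation name NUMERIC_FIRST, places all parseable numbers first and orders everything else afterwards as plain text.

```python
import math
import sqlite3

def numeric_key(s):
    """Hypothetical helper: numbers sort first by value, other text after."""
    try:
        value = float(s)
    except ValueError:
        return (1, 0.0, s)
    if not math.isfinite(value):
        # 'nan' and 'inf' parse as floats; treat them as ordinary text
        return (1, 0.0, s)
    return (0, value, '')

def numeric_first_collation(a, b):
    ka, kb = numeric_key(a), numeric_key(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(':memory:')
conn.create_collation('NUMERIC_FIRST', numeric_first_collation)
conn.execute('CREATE TABLE entries (value TEXT)')
conn.executemany('INSERT INTO entries VALUES (?)',
                 [('10',), ('9',), ('1a',), ('2',), ('abc',)])

ordered = [r[0] for r in conn.execute(
    'SELECT value FROM entries ORDER BY value COLLATE NUMERIC_FIRST')]
print(ordered)  # ['2', '9', '10', '1a', 'abc']
conn.close()
```

Comparing tuples keeps the ordering consistent because every pair of values is compared by the same rule, whichever of the two is numeric.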

Locale-Aware Collation

This example demonstrates a collation that respects locale-specific sorting rules.

locale_collation.py

import sqlite3
import locale

def locale_collation(a, b):
    return locale.strcoll(a, b)

# Set the locale to the user's default
locale.setlocale(locale.LC_ALL, '')

with sqlite3.connect(':memory:') as conn:
    conn.create_collation('LOCALE', locale_collation)
    cursor = conn.cursor()
    
    cursor.execute('CREATE TABLE words (word TEXT)')
    words = [('été',), ('eté',), ('étage',), ('étalage',)]
    cursor.executemany('INSERT INTO words VALUES (?)', words)
    
    cursor.execute('SELECT word FROM words ORDER BY word COLLATE LOCALE')
    print([row[0] for row in cursor.fetchall()])

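The locale_collation function uses Python's locale module to perform locale-aware string comparison. This matters for sorting accented characters correctly in many languages.

The output depends on the system locale. Under a French locale the accented words are ordered by French rules, whereas under the C locale strcoll falls back to plain code-point comparison.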

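Natural Sort Collation

This example implements natural sorting, in which numbers inside strings are compared by value rather than lexicographically.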

natural_sort.py

import sqlite3
import re

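def natural_sort_key(s):
    # re.split with a capture group alternates text and digit runs,
    # so the odd indexes always hold the digit runs
    parts = re.split(r'([0-9]+)', s)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]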

def natural_collation(a, b):
    a_key = natural_sort_key(a)
    b_key = natural_sort_key(b)
    if a_key < b_key:
        return -1
    elif a_key > b_key:
        return 1
    else:
        return 0

with sqlite3.connect(':memory:') as conn:
    conn.create_collation('NATURAL', natural_collation)
    cursor = conn.cursor()
    
    cursor.execute('CREATE TABLE files (name TEXT)')
    files = [('file1.txt',), ('file10.txt',), ('file2.txt',), ('file20.txt',)]
    cursor.executemany('INSERT INTO files VALUES (?)', files)
    
    cursor.execute('SELECT name FROM files ORDER BY name COLLATE NATURAL')
    print([row[0] for row in cursor.fetchall()])

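The natural_collation function splits each string into text and number parts and converts the number parts to integers so that they are compared by value.

The output is ['file1.txt', 'file2.txt', 'file10.txt', 'file20.txt'], which shows the correct natural sort order.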

Diacritic-Insensitive Collation

This example creates a collation that ignores diacritics when comparing.

diacritic_insensitive.py

import sqlite3
import unicodedata

def remove_diacritics(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                  if not unicodedata.combining(c))

def diacritic_insensitive_collation(a, b):
    a_simple = remove_diacritics(a)
    b_simple = remove_diacritics(b)
    if a_simple < b_simple:
        return -1
    elif a_simple > b_simple:
        return 1
    else:
        return 0

with sqlite3.connect(':memory:') as conn:
    conn.create_collation('DIACRITIC_INSENSITIVE', diacritic_insensitive_collation)
    cursor = conn.cursor()
    
    cursor.execute('CREATE TABLE words (word TEXT)')
    words = [('café',), ('cafe',), ('résumé',), ('resume',)]
    cursor.executemany('INSERT INTO words VALUES (?)', words)
    
    cursor.execute('SELECT word FROM words ORDER BY word COLLATE DIACRITIC_INSENSITIVE')
    print([row[0] for row in cursor.fetchall()])

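The collation strips diacritics before comparing: it decomposes each string with Unicode NFD normalization and drops the combining marks. As a result, "café" and "cafe" compare as equal.

The output groups words with and without diacritics together according to their base characters.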
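Because the collation treats such strings as equal, it affects grouping and equality tests as well as sorting. A brief sketch, assuming an illustrative collation name NO_MARKS and a helper strip_marks that mirrors the approach above:

```python
import sqlite3
import unicodedata

def strip_marks(s):
    # Decompose characters, then drop the combining marks
    decomposed = unicodedata.normalize('NFD', s)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))

def no_marks_collation(a, b):
    a, b = strip_marks(a), strip_marks(b)
    return (a > b) - (a < b)

conn = sqlite3.connect(':memory:')
conn.create_collation('NO_MARKS', no_marks_collation)
conn.execute('CREATE TABLE words (word TEXT)')
conn.executemany('INSERT INTO words VALUES (?)',
                 [('café',), ('cafe',), ('résumé',), ('resume',)])

# GROUP BY with the collation merges accented and unaccented spellings
group_sizes = [r[0] for r in conn.execute(
    'SELECT COUNT(*) FROM words GROUP BY word COLLATE NO_MARKS')]
# An equality test with the collation matches both spellings
matches = conn.execute(
    'SELECT COUNT(*) FROM words WHERE word = ? COLLATE NO_MARKS',
    ('cafe',)).fetchone()[0]

print(group_sizes)  # [2, 2]
print(matches)      # 2
conn.close()
```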

Custom Weighted Collation

This example demonstrates a collation that applies custom weights to certain characters.

weighted_collation.py

import sqlite3

def weighted_collation(a, b):
    # Custom weights for certain characters
    weights = {'@': 0, '#': 1, '$': 2}
    
    def get_weight(c):
        return weights.get(c, ord(c))
    
    for a_char, b_char in zip(a, b):
        a_weight = get_weight(a_char)
        b_weight = get_weight(b_char)
        if a_weight < b_weight:
            return -1
        elif a_weight > b_weight:
            return 1
    
    # If all compared characters were equal, compare lengths
    if len(a) < len(b):
        return -1
    elif len(a) > len(b):
        return 1
    else:
        return 0

with sqlite3.connect(':memory:') as conn:
    conn.create_collation('WEIGHTED', weighted_collation)
    cursor = conn.cursor()
    
    cursor.execute('CREATE TABLE symbols (value TEXT)')
    symbols = [('apple',), ('@pple',), ('#pple',), ('$pple',)]
    cursor.executemany('INSERT INTO symbols VALUES (?)', symbols)
    
    cursor.execute('SELECT value FROM symbols ORDER BY value COLLATE WEIGHTED')
    print([row[0] for row in cursor.fetchall()])

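The weighted_collation function applies custom sort weights to specific symbols (@, #, $) while keeping the normal code-point order for all other characters.

The output is ['@pple', '#pple', '$pple', 'apple'], which shows the custom symbol order followed by the regular alphabetical order.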

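Best Practices

Keep collation functions simple: complex logic slows down queries
Handle edge cases: SQLite passes only text values to a collation, so NULLs and numbers never reach the function, but empty strings and unexpected text do
Use Unicode normalization: it keeps text comparisons consistent
Test thoroughly: verify the behavior for a variety of inputs
Document custom collations: explain their purpose and behavior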
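The advice to test thoroughly can be made concrete by checking that a comparison function behaves like a consistent ordering. The sketch below assumes a hypothetical checker named find_inconsistencies; it tries every combination of a small sample list, so it is only practical for short lists.

```python
from itertools import permutations

def sign(n):
    return (n > 0) - (n < 0)

def find_inconsistencies(compare, samples):
    """Hypothetical checker: describe every ordering rule the comparator breaks."""
    problems = []
    for a in samples:
        if compare(a, a) != 0:
            problems.append(f'not reflexive: {a!r}')
    for a, b in permutations(samples, 2):
        # Swapping the arguments must flip the sign of the result
        if sign(compare(a, b)) != -sign(compare(b, a)):
            problems.append(f'not antisymmetric: {a!r}, {b!r}')
    for a, b, c in permutations(samples, 3):
        # a < b and b < c must imply a < c
        if compare(a, b) < 0 and compare(b, c) < 0 and compare(a, c) >= 0:
            problems.append(f'not transitive: {a!r}, {b!r}, {c!r}')
    return problems

def case_insensitive(a, b):
    a, b = a.lower(), b.lower()
    return (a > b) - (a < b)

def broken(a, b):
    # Deliberately wrong: never reports "greater than"
    return -1 if a != b else 0

samples = ['Apple', 'apple', 'banana', '', 'Zebra']
good_report = find_inconsistencies(case_insensitive, samples)
bad_report = find_inconsistencies(broken, samples)
print(good_report)      # []
print(len(bad_report) > 0)
```

A comparator that fails these checks may still run without errors, but the resulting sort order is not well defined.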

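Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have written over 1,400 articles and 8 e-books. I have more than ten years of experience teaching programming.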