Python encode/decode

last modified January 29, 2024

In this article we show how to encode and decode data in Python.

str.encode(encoding='utf-8', errors='strict')

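The str.encode method encodes a string into a bytes object. The encoding defaults to 'utf-8'. With the default errors='strict', a character that the chosen encoding cannot represent raises a UnicodeEncodeError.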
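To see what the errors parameter changes, here is a small sketch that encodes a string containing é (U+00E9) with the ASCII codec, which cannot represent that character. The handler names used below ('replace', 'ignore', 'xmlcharrefreplace', 'backslashreplace') are standard Python error handlers; the sample string is only an illustration.

```python
text = "café"

# The default errors='strict' raises UnicodeEncodeError for é.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("strict:", err.reason)

# Other standard error handlers substitute or drop the character instead.
for handler in ("replace", "ignore", "xmlcharrefreplace", "backslashreplace"):
    print(handler, text.encode("ascii", errors=handler))

# With no arguments, encode() uses UTF-8, where é takes two bytes.
print(text.encode())
```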

bytes.decode(encoding='utf-8', errors='strict')

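The bytes.decode method decodes a bytes object into a string. It also defaults to 'utf-8', and with errors='strict' a byte sequence that is invalid in the chosen encoding raises a UnicodeDecodeError.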
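The decoding side mirrors this. In the following sketch, the byte 0xE9 is é in Latin-1 but is not a complete UTF-8 sequence, so decoding it as UTF-8 fails unless we pick a different error handler or the encoding that actually produced the bytes. The sample bytes are chosen for illustration.

```python
# "caf" followed by 0xE9: valid Latin-1, but an incomplete UTF-8 sequence.
raw = b"caf\xe9"

# The default errors='strict' raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict:", err.reason)

# 'replace' inserts U+FFFD REPLACEMENT CHARACTER for the invalid byte.
print(raw.decode("utf-8", errors="replace"))

# Decoding with the codec that produced the bytes recovers the text.
print(raw.decode("latin-1"))

# With no arguments, decode() assumes UTF-8.
print(b"caf\xc3\xa9".decode())
```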

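The bytes type is an immutable sequence of bytes. Each element is an integer in the range 0 to 255. This data type is used for storing and transferring data.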
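The following short sketch shows these properties in practice: indexing yields integers, item assignment is rejected, and values outside 0 to 255 are refused. It also uses bytearray, the mutable counterpart of bytes, for contrast.

```python
data = b"hi"

# Indexing a bytes object returns an int, not a one-character string.
print(data[0])        # 104, the code of 'h'
print(list(data))     # [104, 105]

# bytes is immutable: item assignment raises TypeError.
try:
    data[0] = 72
except TypeError as err:
    print("immutable:", err)

# Every element must fit in the range 0..255.
try:
    bytes([256])
except ValueError as err:
    print("out of range:", err)

# bytearray is the mutable counterpart of bytes.
buf = bytearray(data)
buf[0] = 72           # 72 is the code of 'H'
print(bytes(buf))     # b'Hi'
```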

We work with the bytes type when we open network sockets, work with serial I/O, or open binary files.
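As a minimal sketch of the socket case, the code below uses socket.socketpair() as a local stand-in for a real network connection (an assumption made so the example runs without a server). The roundtrip helper is hypothetical: it encodes a string before sending, and the receiving side decodes the bytes it reads. Passing a str directly to sendall is rejected.

```python
import socket

def roundtrip(message: str) -> str:
    """Hypothetical helper: send a str through a socket pair and read it back."""
    left, right = socket.socketpair()
    with left, right:
        # Sockets carry bytes, so the str must be encoded before sending.
        left.sendall(message.encode("utf-8"))
        # Close the writing direction so the reader sees end of data.
        left.shutdown(socket.SHUT_WR)
        chunks = []
        while chunk := right.recv(1024):
            chunks.append(chunk)
    # Join the received bytes and decode them back into a str.
    return b"".join(chunks).decode("utf-8")

print(roundtrip("one 🐘 and three 🐋"))

# Sending a str without encoding it raises TypeError.
a, b = socket.socketpair()
with a, b:
    try:
        a.sendall("not bytes")
    except TypeError as err:
        print(err)
```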

Python has many standard encodings, including utf_8, utf_16, ascii, latin-1, iso8859_2, and cp1252. An encoding may have several aliases; for instance, utf_8 has the aliases utf8 and utf-8.
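One way to check how an alias is resolved is codecs.lookup, which returns the codec registered for a name; its name attribute holds the canonical name. This sketch prints the canonical names for the encodings listed above and confirms that the three UTF-8 spellings behave identically.

```python
import codecs

# codecs.lookup() resolves an alias to the codec's canonical name.
for alias in ("utf_8", "utf8", "utf-8", "latin-1", "iso8859_2", "cp1252"):
    print(f"{alias!r:12} -> {codecs.lookup(alias).name}")

# Different spellings of the same alias select the same codec.
print("é".encode("utf8") == "é".encode("utf-8") == "é".encode("utf_8"))
```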

Python encode example

In the first example, we encode a message that contains emoji characters.

main.py

#!/usr/bin/python

text = "one 🐘 and three 🐋"
print(text)
print(len(text))

e = text.encode('utf8')
print(e)
print(len(e))

e = text.encode('utf16')
print(e)
print(len(e))

The program defines a message and encodes it into bytes with the utf8 and utf16 encodings.

text = "one 🐘 and three 🐋"

We define a Unicode string that contains two emoji characters.
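A Python str is a sequence of Unicode code points, and each of these emoji is a single code point. This sketch uses the standard unicodedata module to print the code point and official name of every non-ASCII character in the message.

```python
import unicodedata

text = "one 🐘 and three 🐋"

# Print the code point and Unicode name of each non-ASCII character.
for ch in text:
    if ord(ch) > 127:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# len() counts code points, so each emoji contributes 1.
print(len(text))
```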

print(text)
print(len(text))

We print the text and its number of characters. The len function counts Unicode code points, not bytes.

e = text.encode('utf8')
print(e)
print(len(e))

We encode the string into bytes with the utf8 encoding and print the bytes. Then we print the number of bytes this encoding produces.
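The byte count differs from the character count because UTF-8 is a variable-width encoding. The following sketch measures each character separately: the ASCII characters take one byte each, while these two emoji take four bytes each.

```python
text = "one 🐘 and three 🐋"

# Bytes used by each character in UTF-8.
sizes = [len(ch.encode("utf-8")) for ch in text]
print(sizes)

ascii_count = sum(1 for ch in text if ch.isascii())
emoji_count = len(text) - ascii_count

# 15 ASCII characters * 1 byte + 2 emoji * 4 bytes = 23 bytes
print(ascii_count * 1 + emoji_count * 4, len(text.encode("utf-8")))
```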

e = text.encode('utf16')
print(e)
print(len(e))

We do the same with the utf16 encoding.

$ ./main.py 
one 🐘 and three 🐋
17
b'one \xf0\x9f\x90\x98 and three \xf0\x9f\x90\x8b'
23
b'\xff\xfeo\x00n\x00e\x00 \x00=\xd8\x18\xdc ... \x00=\xd8\x0b\xdc'
40
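The 40 bytes of the utf16 output can be accounted for as follows: the plain 'utf16' codec writes a 2-byte byte order mark (BOM) in the platform's native byte order, each ASCII character takes one 16-bit unit, and each emoji lies outside the Basic Multilingual Plane and is stored as a surrogate pair of two units. The sketch below compares this with 'utf-16-le', which fixes the byte order and therefore omits the BOM.

```python
import codecs

text = "one 🐘 and three 🐋"

# The plain 'utf16' codec prepends a BOM in native byte order.
with_bom = text.encode("utf16")
print(with_bom[:2] == codecs.BOM_UTF16)

# An emoji is a surrogate pair in UTF-16: two 16-bit units, 4 bytes.
print(len("🐘".encode("utf-16-le")))

# Explicit-endianness variants such as 'utf-16-le' write no BOM.
no_bom = text.encode("utf-16-le")

# 2 (BOM) + 15 ASCII * 2 + 2 emoji * 4 = 40 bytes; 38 without the BOM.
print(len(with_bom), len(no_bom))
```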

Python decode example

In the following example, we read a file in binary mode. Then we decode the data into a string with the utf8 encoding.

data.txt

one 🐘 and three 🐋

We have this data.txt file.

main.py

#!/usr/bin/python

fname = 'data.txt'

with open(fname, mode='rb') as f:
    contents = f.read()

    print(type(contents))
    print(contents)
    print(contents.decode('utf8'))

We open the file in rb (read binary) mode and read its contents.

contents = f.read()

Since it is a small file, we read the whole file into a variable with read.
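For larger files we would typically read in chunks, and then a chunk boundary can fall in the middle of a multi-byte character, so calling bytes.decode on each chunk separately may raise UnicodeDecodeError. A minimal sketch of one remedy, assuming an in-memory io.BytesIO stream in place of a real file, uses an incremental decoder from codecs.getincrementaldecoder, which buffers an incomplete sequence until the rest arrives. The 3-byte chunk size is chosen only to force such splits.

```python
import codecs
import io

# Stand-in for a large binary file holding the same bytes as data.txt.
stream = io.BytesIO("one 🐘 and three 🐋".encode("utf-8"))

decoder = codecs.getincrementaldecoder("utf-8")()
parts = []
# A tiny chunk size splits the 4-byte emoji sequences across reads.
while chunk := stream.read(3):
    # Incomplete trailing bytes are kept inside the decoder until completed.
    parts.append(decoder.decode(chunk))
# final=True flushes the decoder and raises if bytes remain incomplete.
parts.append(decoder.decode(b"", final=True))

print("".join(parts))
```

Opening the file in text mode with an explicit encoding gives the same protection, because the text layer decodes incrementally as well.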

print(type(contents))

We print the data type.

print(contents)
print(contents.decode('utf8'))

We print the raw contents and then the decoded contents to the terminal.

$ ./main.py 
<class 'bytes'>
b'one \xf0\x9f\x90\x98 and three \xf0\x9f\x90\x8b'
one 🐘 and three 🐋
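An alternative to reading bytes and decoding them ourselves is to open the file in text mode with an explicit encoding, so that open decodes while reading. This sketch writes the sample bytes to a temporary file (standing in for data.txt) and compares both approaches. Note that text mode also translates line endings, so results can differ for files that contain \r\n, and that omitting encoding= falls back to a platform-dependent default.

```python
import os
import tempfile

# Write the sample bytes to a temporary file standing in for data.txt.
payload = "one 🐘 and three 🐋".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Binary mode: read bytes, then decode explicitly.
    with open(path, mode="rb") as f:
        via_decode = f.read().decode("utf-8")

    # Text mode: open() decodes with the given encoding while reading.
    with open(path, mode="r", encoding="utf-8") as f:
        via_text_mode = f.read()
finally:
    os.remove(path)

print(via_decode == via_text_mode)
```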

Python transferring bytes

Data is transferred over networks as bytes.

main.py

#!/usr/bin/python

import requests

url = 'http://webcode.me/small.txt'

resp = requests.get(url)
print(resp.content)
print(resp.content.decode('utf8'))
print(resp.text)

We send a GET request to fetch a small text resource.

url = 'http://webcode.me/small.txt'

We define the URL.

resp = requests.get(url)

We send a GET request to the given URL.

print(resp.content)

Printing the content of the response gives us a bytes string.

print(resp.content.decode('utf8'))

We convert the bytes string to a Unicode string with decode.

print(resp.text)

The response object of the requests library also provides the text property, which performs the decoding for us.
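The text property decodes with the response's encoding attribute, which requests derives from the response headers; we can inspect or set resp.encoding before reading resp.text. To illustrate the general idea without a network connection, here is a hypothetical decode_body helper (not part of requests, and not how requests implements it internally) that reads the charset parameter of a Content-Type header with the standard email.message.Message class and falls back to UTF-8 when none is given.

```python
from email.message import Message

def decode_body(body: bytes, content_type: str, fallback: str = "utf-8") -> str:
    """Hypothetical helper: decode an HTTP body using its Content-Type charset."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    charset = msg.get_content_charset() or fallback
    return body.decode(charset)

print(decode_body(b"small text page\n", "text/plain; charset=utf-8"))
print(decode_body("café".encode("latin-1"), "text/html; charset=ISO-8859-1"))
```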

$ ./main.py 
b'small text page\n'
small text page

small text page

In this article, we have worked with the encode and decode methods in Python.

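Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. So far, I have written over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.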
