Python concurrent HTTP requests

Last modified January 29, 2024

In this article, we show how to generate concurrent HTTP requests in Python.

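The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. In HTTP, clients and servers communicate by exchanging messages. The messages sent by the client, usually a web browser, are called requests, and the messages sent by the server in reply are called responses.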

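Note: the term concurrency means that we deal with multiple tasks within a certain time frame, for example within 5 minutes. It does not necessarily mean that the tasks are executed at the same instant.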

Concurrent requests

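Requests can be processed sequentially or concurrently. Sequential requests are processed one by one, which can be inefficient when there are many of them. With concurrent requests, the program does not wait for one request to finish before it starts the next; several requests are in flight at the same time.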

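There are two basic ways to generate concurrent HTTP requests: multiple threads or asynchronous programming. In the multi-threaded approach, each request is handled by a separate thread. In asynchronous programming, there is (usually) a single thread and an event loop, which switches to another task whenever the current one is waiting for I/O.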

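In Python, we can use ThreadPoolExecutor to generate concurrent requests with threads. For asynchronous programming, we can use the asyncio module. Along with it, we also need an HTTP library that supports asynchronous programming, such as aiohttp or httpx.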
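The difference between the three approaches can be sketched without touching the network. In the following minimal sketch, the fake_request and fake_request_async functions are hypothetical stand-ins that only sleep for an assumed latency of 0.2 seconds; they are not part of any HTTP library.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # assumed latency of one simulated request, in seconds
N = 5        # number of simulated requests


def fake_request(i):
    """Blocking stand-in for an HTTP call (hypothetical)."""
    time.sleep(DELAY)
    return i


async def fake_request_async(i):
    """Non-blocking stand-in for an HTTP call (hypothetical)."""
    await asyncio.sleep(DELAY)
    return i


def run_sequential():
    # One request after another: total time is roughly N * DELAY.
    return [fake_request(i) for i in range(N)]


def run_threaded():
    # One thread per request: the waits overlap.
    with ThreadPoolExecutor(max_workers=N) as executor:
        return list(executor.map(fake_request, range(N)))


async def run_async():
    # One thread, one event loop: the waits overlap as well.
    return await asyncio.gather(*(fake_request_async(i) for i in range(N)))


def timed(func):
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


seq_result, seq_time = timed(run_sequential)
thr_result, thr_time = timed(run_threaded)
async_result, async_time = timed(lambda: asyncio.run(run_async()))

print(f'sequential: {seq_time:0.2f} s')
print(f'threaded:   {thr_time:0.2f} s')
print(f'asyncio:    {async_time:0.2f} s')
```

Under these assumptions, the sequential run should take about one second, while the threaded and asyncio runs should each finish in roughly the time of a single simulated request.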

Python synchronous HTTP requests

In the first example, we create multiple HTTP requests synchronously. To measure the elapsed time, we use the perf_counter function.

mul_sync.py

#!/usr/bin/python

import requests as req
import time

urls = ['http://webcode.me', 'https://httpbin.org/get',
    'https://google.com', 'https://stackoverflow.com',
    'https://github.com', 'https://clojure.org',
    'https://fsharp.org']

tm1 = time.perf_counter()

for url in urls:
    resp = req.get(url)
    print(resp.status_code)

tm2 = time.perf_counter()
print(f'Total time elapsed: {tm2-tm1:0.2f} seconds')

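In the example, we send HTTP requests to seven websites and retrieve their status codes. We use the requests library. The requests are executed synchronously, one after another.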

$ ./mul_sync.py
200
200
200
200
200
200
200
Total time elapsed: 2.96 seconds

Python asynchronous HTTP requests

The following example generates asynchronous HTTP requests.

mul_async.py

#!/usr/bin/python

import httpx
import asyncio
import time

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ['http://webcode.me', 'https://httpbin.org/get',
    'https://google.com', 'https://stackoverflow.com',
    'https://github.com', 'https://clojure.org',
    'https://fsharp.org']

async def launch():

    resps = await asyncio.gather(*map(get_async, urls))
    data = [resp.status_code for resp in resps]

    for status_code in data:
        print(status_code)

tm1 = time.perf_counter()

asyncio.run(launch())

tm2 = time.perf_counter()
print(f'Total time elapsed: {tm2-tm1:0.2f} seconds')

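The example uses the httpx module to create an asynchronous client and the asyncio module to create the event loop and schedule the asynchronous tasks.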

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

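In Python asynchronous programming, we work with coroutines. A coroutine is defined with the async def syntax. The await keyword suspends the calling coroutine until the awaited coroutine finishes and then gives us its result.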
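The behaviour of async def and await can be seen in isolation with a small sketch. The add function below is a hypothetical example and is not part of the program above; it only illustrates that calling a coroutine function creates a coroutine object, and that the body runs when the object is awaited.

```python
import asyncio


async def add(a, b):
    """A coroutine function: defined with async def (hypothetical example)."""
    await asyncio.sleep(0)  # yield control to the event loop once
    return a + b


async def main():
    coro = add(2, 3)                     # nothing has run yet
    is_coro = asyncio.iscoroutine(coro)  # we only hold a coroutine object
    result = await coro                  # runs add() and collects its return value
    return is_coro, result


is_coro, result = asyncio.run(main())
print(is_coro, result)
```

The program prints True 5.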

resps = await asyncio.gather(*map(get_async, urls))

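The asyncio.gather function runs multiple coroutines concurrently and returns their results in the order in which the coroutines were passed to it.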
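The ordering of the results can be checked with simulated requests. In this sketch, fake_get is a hypothetical stand-in that sleeps instead of contacting a server. The return_exceptions=True argument is an optional setting that the program above does not use; with it, a failed coroutine appears in the result list as an exception object instead of aborting the whole call.

```python
import asyncio


async def fake_get(name, delay, fail=False):
    """Hypothetical stand-in for an HTTP request."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f'{name} failed')
    return name


async def main():
    # The slowest coroutine is listed first on purpose.
    return await asyncio.gather(
        fake_get('slow', 0.2),
        fake_get('broken', 0.1, fail=True),
        fake_get('fast', 0.05),
        return_exceptions=True,
    )


results = asyncio.run(main())

for item in results:
    print(repr(item))
```

The results come back in argument order (slow, broken, fast), even though the fast coroutine finishes first.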

$ ./mul_async.py
200
200
301
200
200
200
200
Total time elapsed: 0.93 seconds

The third status code, which belongs to https://google.com, is 301 because httpx does not follow redirects by default, whereas requests does.

Python multi-threaded HTTP requests

In the next example, we use ThreadPoolExecutor to generate concurrent HTTP requests. ThreadPoolExecutor uses a pool of threads to execute calls concurrently.

threaded.py

#!/usr/bin/python

import requests
import concurrent.futures
import time

def get_status(url):

    resp = requests.get(url=url)
    return resp.status_code

urls = ['http://webcode.me', 'https://httpbin.org/get',
    'https://google.com', 'https://stackoverflow.com',
    'https://github.com', 'https://clojure.org',
    'https://fsharp.org']

tm1 = time.perf_counter()

with concurrent.futures.ThreadPoolExecutor() as executor:

    futures = []

    for url in urls:
        futures.append(executor.submit(get_status, url=url))

    for future in concurrent.futures.as_completed(futures):
        print(future.result())

tm2 = time.perf_counter()
print(f'Total time elapsed: {tm2-tm1:0.2f} seconds')

ThreadPoolExecutor is located in the concurrent.futures module.

with concurrent.futures.ThreadPoolExecutor() as executor:

We create a ThreadPoolExecutor. The with statement shuts the executor down when the block ends.
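The size of the pool can be set with the max_workers parameter. The following sketch is an illustration only: the work function is a hypothetical stand-in for a blocking request, and the values 3 and fetch are arbitrary choices. It also shows that leaving the with block waits for all submitted calls.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

finished = []


def work(n):
    """Hypothetical stand-in for a blocking HTTP call."""
    time.sleep(0.1)
    finished.append(n)  # list.append is thread-safe in CPython
    return threading.current_thread().name


# max_workers caps the number of threads; 3 is an arbitrary choice here.
with ThreadPoolExecutor(max_workers=3, thread_name_prefix='fetch') as executor:
    futures = [executor.submit(work, n) for n in range(6)]

# Leaving the with block waits for all submitted calls to finish.
all_done = all(f.done() for f in futures)
thread_names = {f.result() for f in futures}

print(all_done, len(finished), len(thread_names) <= 3)
```

The six calls are spread over at most three threads, and all of them are done once the with block has been left.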

futures.append(executor.submit(get_status, url=url))

The submit method schedules the function for execution and returns a Future object that represents the execution of the function.
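A Future can be inspected on its own. In this small sketch, the square and fail functions are hypothetical examples; the result method blocks until the call has finished, and the exception method returns the exception raised inside the call, if there was one.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def square(x):
    return x * x


def fail():
    raise ValueError('boom')


with ThreadPoolExecutor() as executor:
    ok = executor.submit(square, 7)
    bad = executor.submit(fail)

    is_future = isinstance(ok, Future)
    value = ok.result()      # blocks until the call has finished
    error = bad.exception()  # the exception raised inside the call, if any

print(is_future, value, repr(error))
```

Calling result on the failed Future would raise the ValueError again in the calling thread.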

for future in concurrent.futures.as_completed(futures):
    print(future.result())

The as_completed function returns an iterator over the given futures. It yields each Future as soon as it completes.
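Because as_completed yields the futures in completion order, the status codes printed by threaded.py are not necessarily in the order of the urls list. One common way to tell which result belongs to which input is to keep a dictionary that maps each Future to its input. The sketch below assumes hypothetical names and delays instead of real URLs.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical names and delays standing in for real URLs and response times.
delays = {'slow': 0.4, 'fast': 0.05, 'medium': 0.2}


def fake_get(delay):
    """Hypothetical stand-in for requests.get; always reports status 200."""
    time.sleep(delay)
    return 200


completion_order = []

with ThreadPoolExecutor(max_workers=len(delays)) as executor:

    # Remember which name each Future belongs to.
    future_to_name = {executor.submit(fake_get, d): name
                      for name, d in delays.items()}

    for future in as_completed(future_to_name):
        name = future_to_name[future]
        completion_order.append(name)
        print(name, future.result())
```

With these delays, the names are printed in the order fast, medium, slow, which differs from the submission order.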

$ ./threaded.py 
200
200
200
200
200
200
200
Total time elapsed: 0.86 seconds

Source

Python concurrency - documentation

In this article, we have shown how to generate concurrent HTTP requests in Python.

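Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. So far, I have written more than 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.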