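Asynchronous HTTP requests

Last modified January 10, 2023

This tutorial shows how to create asynchronous HTTP requests in Go, C#, F#, Groovy, Python, Perl, Java, JavaScript, and PHP.

Asynchronous requests do not block the client, which lets us issue many HTTP requests more efficiently.

Instead of sending requests one by one and waiting for each to finish before starting the next, we start all of them quickly and then wait for them all to complete.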
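The difference between the two approaches can be illustrated with a small Python sketch. It uses no network: fake_request is a hypothetical stand-in that sleeps for a fixed delay instead of calling a server, and the helper names (run_sequential, run_concurrent, timed) are assumptions made for this illustration only.

```python
import asyncio
import time


async def fake_request(url, delay=0.1):
    # Hypothetical stand-in for a real HTTP call: sleeps instead of using the network.
    await asyncio.sleep(delay)
    return f"response from {url}"


async def run_sequential(urls):
    # Each request starts only after the previous one has finished.
    return [await fake_request(u) for u in urls]


async def run_concurrent(urls):
    # All requests are started at once; gather waits for all of them
    # and returns the results in the order of the input.
    return await asyncio.gather(*(fake_request(u) for u in urls))


def timed(coro_func, urls):
    # Run the coroutine to completion and measure the elapsed wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(coro_func(urls))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["http://webcode.me", "https://example.com", "http://httpbin.org"]
    _, seq = timed(run_sequential, urls)
    _, con = timed(run_concurrent, urls)
    print(f"sequential: {seq:.2f}s, concurrent: {con:.2f}s")
```

With three simulated requests, the sequential run should take roughly three times the delay, while the concurrent run should take roughly one delay, since the waits overlap.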

Go async requests

Go uses goroutines to make asynchronous requests. A goroutine is a lightweight thread managed by the Go runtime.

main.go

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "regexp"
    "sync"
)

func main() {

    urls := []string{
        "http://webcode.me",
        "https://example.com",
        "http://httpbin.org",
        "https://perl.net.cn",
        "https://php.ac.cn",
        "https://pythonlang.cn",
        "https://vscode.js.cn",
        "https://clojure.org",
    }

    var wg sync.WaitGroup

    for _, u := range urls {

        wg.Add(1)
        go func(url string) {

            defer wg.Done()

            content := doReq(url)
            title := getTitle(content)
            fmt.Println(title)
        }(u)
    }

    wg.Wait()
}

func doReq(url string) (content string) {

    resp, err := http.Get(url)

    if err != nil {

        log.Println(err)
        return
    }

    defer resp.Body.Close()

    body, err := ioutil.ReadAll(resp.Body)

    if err != nil {

        log.Println(err)
        return
    }

    return string(body)
}

func getTitle(content string) (title string) {

    re := regexp.MustCompile("<title>(.*)</title>")

    parts := re.FindStringSubmatch(content)

    if len(parts) > 0 {
        return parts[1]
    } else {
        return "no title"
    }
}

We make multiple asynchronous HTTP requests and print the contents of the title tag of each web page.
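For comparison, the title-extraction step can be sketched in Python. This is not a translation of the Go getTitle function: the pattern below is an assumption that is a little more forgiving (non-greedy, case-insensitive, and tolerant of whitespace or line breaks around the title), and the names TITLE_RE and get_title are chosen for this illustration.

```python
import re

# Non-greedy match; DOTALL lets the title span several lines,
# IGNORECASE also accepts <TITLE>, and [^>]* allows attributes on the tag.
TITLE_RE = re.compile(r"<title[^>]*>\s*(.*?)\s*</title>",
                      re.IGNORECASE | re.DOTALL)


def get_title(content):
    # Return the text of the first title tag, or a fallback when none is found.
    m = TITLE_RE.search(content)
    return m.group(1) if m else "no title"


if __name__ == "__main__":
    print(get_title("<html><head><title>My html page</title></head></html>"))
    print(get_title("<p>no head here</p>"))
```

A regular expression is adequate for a quick check like this one; for arbitrary HTML a real parser is usually the more robust choice.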

var wg sync.WaitGroup

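A WaitGroup waits for a collection of goroutines to finish. Each goroutine is registered with wg.Add(1) and signals completion with wg.Done; wg.Wait blocks until all of them are done.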

go func(url string) {

  defer wg.Done()

  content := doReq(url)
  title := getTitle(content)
  fmt.Println(title)
}(u)

A goroutine is created with the go keyword.

$ go run main.go
The Perl Programming Language - www.perl.org
Welcome to Python.org
Visual Studio Code - Code Editing. Redefined
PHP: Hypertext Preprocessor
Example Domain
httpbin.org
Clojure
My html page

C# async requests

In C#, we use HttpClient to make asynchronous requests.

Program.cs

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.Text.RegularExpressions;

var urls = new string[] { "http://webcode.me", "http://example.com",
    "http://httpbin.org", "https://ifconfig.me", "http://termbin.com",
    "https://github.com"
};

var rx = new Regex(@"<title>\s*(.+?)\s*</title>",
  RegexOptions.Compiled);

using var client = new HttpClient();

var tasks = new List<Task<string>>();

foreach (var url in urls)
{
    tasks.Add(client.GetStringAsync(url));
}

Task.WaitAll(tasks.ToArray());

var data = new List<string>();

foreach (var task in tasks)
{
    data.Add(await task);
}

foreach (var content in data)
{
    var matches = rx.Matches(content);

    foreach (var match in matches)
    {
        Console.WriteLine(match);
    }
}

We download the given web pages asynchronously and print their HTML title tags.

tasks.Add(client.GetStringAsync(url));

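GetStringAsync sends a GET request to the specified URL and returns the response body as a string in an asynchronous operation. It returns a new task; in C#, a task represents an asynchronous operation.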

Task.WaitAll(tasks.ToArray());

Task.WaitAll waits for all of the provided tasks to complete execution.

data.Add(await task);

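The await keyword unwraps the result value. Because every task has already completed at this point, await returns immediately.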

$ dotnet run
<title>My html page</title>
<title>Example Domain</title>
<title>httpbin.org</title>
<title>termbin.com - terminal pastebin</title>
<title>GitHub: Where the world builds software · GitHub</title>

F# async requests

The following example uses HttpClient and a task expression to fetch website titles asynchronously.

async_req.fsx

open System.Net.Http
open System.Text.RegularExpressions
open System.Threading.Tasks

let fetchTitleAsync (url: string) =

    task {

        use client = new HttpClient()
        let! html = client.GetStringAsync(url)
        let pattern = "<title>\s*(.+?)\s*</title>"

        let m = Regex.Match(html, pattern)
        return m.Value
    }

let sites =
    [| "http://webcode.me"
       "http://example.com"
       "https://bing.com"
       "http://httpbin.org"
       "https://ifconfig.me"
       "http://termbin.com"
       "https://github.com" |]

let titles =
    sites
    |> Array.map fetchTitleAsync
    |> Task.WhenAll
    |> Async.AwaitTask
    |> Async.RunSynchronously

titles
|> Array.iter (fun title -> printfn $"%s{title}")

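The example asynchronously retrieves the titles of the given URLs.

Another solution is to use WebRequest to generate the requests. Its AsyncGetResponse method returns the response to a request in an asynchronous operation.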

async_req2.fsx

open System.Net
open System
open System.Text.RegularExpressions

let fetchTitleAsync url =

    async {
        let req = WebRequest.Create(Uri(url))
        use! resp = req.AsyncGetResponse()
        use stream = resp.GetResponseStream()

        use reader = new IO.StreamReader(stream)
        let html = reader.ReadToEnd()

        let pattern = "<title>\s*(.+?)\s*</title>"

        let m = Regex.Match(html, pattern)
        return m.Value
    }

let sites =
    [ "http://webcode.me"
      "http://example.com"
      "https://bing.com"
      "http://httpbin.org"
      "https://ifconfig.me"
      "http://termbin.com"
      "https://github.com" ]

let titles = sites
            |> List.map fetchTitleAsync
            |> Async.Parallel
            |> Async.RunSynchronously

titles |> Array.iter (fun title -> printfn $"%s{title}")

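This example also retrieves the titles of the given URLs asynchronously, this time with an async expression and Async.Parallel.

Groovy async requests

In Groovy, we use ExecutorService and HttpClient.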

mul_async_req.gvy

import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

int nThreads = 30

def executor = Executors.newFixedThreadPool(nThreads)

def urls = [
    "https://crunchify.com",
    "https://yahoo.com",
    "https://www.ebay.com",
    "https://google.com",
    "https://www.example.co",
    "https://paypal.com",
    "http://bing.com/",
    "https://techcrunch.com/",
    "http://mashable.com/",
    "https://pro.crunchify.com/",
    "https://wordpress.com/",
    "https://wordpress.ac.cn/",
    "https://example.com/",
    "https://sjsu.edu/",
    "https://ask.crunchify.com/",
    "https://test.com.au/",
    "https://www.wikipedia.org/",
    "https://en.wikipedia.org"
]

for (String url in urls ) {

    executor.execute(() -> {

        worker(url)

        // try {
        //     worker(url)
        // } catch (Exception e) {
        //     e.printStackTrace()
        // }
    })
}

executor.shutdown()

executor.awaitTermination(30, TimeUnit.SECONDS)
println("finished")

def worker(url) {

    def client = HttpClient.newHttpClient()
    def request = HttpRequest.newBuilder()
        .uri(URI.create(url))
        .build()

    HttpResponse<Void> res = client.send(request,
            HttpResponse.BodyHandlers.discarding())

    println "${url}: ${res.statusCode()}"
}

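The example makes multiple asynchronous requests to the URLs and prints their response status codes. Each request runs as a task on a fixed-size thread pool.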

$ groovy mul_async_req.gvy
http://mashable.com/: 301
http://bing.com/: 301
https://paypal.com: 302
https://en.wikipedia.org: 301
https://paypal.com: 302
https://en.wikipedia.org: 301
https://en.wikipedia.org: 301
https://google.com: 301
https://example.com/: 200
https://example.com/: 200
https://yahoo.com: 301
https://test.com.au/: 301
https://wordpress.com/: 200
https://techcrunch.com/: 200
https://www.ebay.com: 200
https://ask.crunchify.com/: 200
https://pro.crunchify.com/: 200
https://sjsu.edu/: 200
finished
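
The same thread-pool idea can be sketched in Python with the standard concurrent.futures module. The function names fetch_status and check_all are hypothetical, and the error handling mirrors the try/catch that is commented out in the Groovy listing; treat it as one possible arrangement under those assumptions, not as a port of the script above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request


def fetch_status(url, timeout=10):
    # Blocking request. urlopen follows redirects, so 3xx codes are not
    # reported here, and 4xx/5xx responses raise an HTTPError instead.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def check_all(urls, fetch=fetch_status, max_workers=30):
    # Submit one task per URL and collect each outcome as it completes.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Keep the exception so one failing URL does not hide the others.
                results[url] = exc
    return results
```

Calling check_all with a list of URLs returns a dictionary that maps each URL to either a status code or the exception raised for it. The fetch parameter is there so that a different (for example, simulated) request function can be passed in.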

Python async requests

In Python, we use the httpx and asyncio modules.

async_req.py

#!/usr/bin/python

import httpx
import asyncio

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ['http://webcode.me', 'https://httpbin.org/get',
    'https://google.com', 'https://stackoverflow.com',
    'https://github.com']

async def launch():
    resps = await asyncio.gather(*map(get_async, urls))
    data = [resp.status_code for resp in resps]

    for status_code in data:
        print(status_code)

asyncio.run(launch())

The example makes asynchronous requests in Python. It prints the response status codes of all the provided URLs.

$ ./async_req.py
200
200
200
200
200
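
By default, asyncio.gather raises the first exception it encounters, so a single unreachable host makes the whole call fail. The sketch below shows one way to keep the other results and to cap the number of requests in flight. It does not use httpx or the network: fetch_status is a hypothetical stand-in for client.get, and gather_statuses and the limit parameter are names assumed for this illustration.

```python
import asyncio


async def fetch_status(url):
    # Hypothetical stand-in for `client.get(url)`; no network is used.
    await asyncio.sleep(0.01)
    if "invalid" in url:
        raise ConnectionError(f"cannot reach {url}")
    return 200


async def gather_statuses(urls, fetch=fetch_status, limit=5):
    sem = asyncio.Semaphore(limit)  # at most `limit` requests in flight

    async def guarded(url):
        async with sem:
            return await fetch(url)

    # return_exceptions=True places exceptions in the result list
    # instead of raising the first one and losing the other results.
    return await asyncio.gather(*(guarded(u) for u in urls),
                                return_exceptions=True)


if __name__ == "__main__":
    urls = ["http://webcode.me", "http://invalid.example", "https://github.com"]
    for url, outcome in zip(urls, asyncio.run(gather_statuses(urls))):
        print(url, "->", outcome)
```

The results come back in the same order as the input URLs, so they can be paired with zip as shown.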

Perl async requests

In Perl, we use the LWP module to generate requests and the Parallel::ForkManager module to make them asynchronous.

$ cpanm Parallel::ForkManager LWP

We install the modules with cpanm.

urls.txt

http://webcode.me
https://example.com
http://httpbin.org
https://google.com
https://perl.net.cn
https://fsharp.org
https://clojure.org
https://rust-lang.net.cn
https://golang.ac.cn
https://pythonlang.cn
https://vscode.js.cn
https://ifconfig.me
http://termbin.com
https://github.com
https://stackoverflow.com
https://php.ac.cn/

The urls.txt file contains a list of websites.

async_req.pl

#!/usr/bin/perl

use warnings;
use 5.30.0;
use Path::Tiny;
use LWP::UserAgent;
use Parallel::ForkManager;

my @urls = split "\n", path('urls.txt')->slurp_utf8;

my $pm = Parallel::ForkManager->new(4);
my $ua = LWP::UserAgent->new;
$ua->agent('Perl script');

say "downloading ", scalar @urls, " files";

my $dir = 'files/';
mkdir $dir if not -d $dir;

foreach my $link (@urls) {

    # a list-context match returns the captured group; avoids the "my ... if" idiom
    my ($name) = $link =~ m%https?://(.+)\.\w+%;
    my $file_name = "$dir/$name" . '.txt';

    $pm->start and next;

    my $resp = $ua->get($link);

    if ($resp->is_success) {

        path($file_name)->spew_utf8($resp->decoded_content);

    } else { warn $resp->status_line }

    $pm->finish;
}

$pm->wait_all_children;

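The example reads the urls.txt file and gets the links. It makes asynchronous requests to the given URLs, with at most four child processes running at a time. The contents of the web pages are written to files.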

$ ./async_req.pl
downloading 15 files
$ ls -1 files/
clojure.txt
code.visualstudio.txt
example.txt
fsharp.txt
github.txt
golang.txt
google.txt
httpbin.txt
ifconfig.txt
stackoverflow.txt
termbin.txt
webcode.txt
www.perl.txt
www.python.txt
www.rust-lang.txt

JS async requests

For JavaScript, we have chosen the axios module.

$ npm i axios

We install the axios module.

async_req.js

const axios = require('axios');

async function makeRequests(urls) {

    const fetchUrl = (url) => axios.get(url);
    const promises = urls.map(fetchUrl);

    let responses = await Promise.all(promises);

    responses.forEach(resp => {
        let msg = `${resp.config.url} -> ${resp.headers.server}: ${resp.status}`;
        console.log(msg);
    });
}

let urls = [
    'http://webcode.me',
    'https://example.com',
    'http://httpbin.org',
    'https://clojure.org',
    'https://fsharp.org',
    'https://symfony.com.cn',
    'https://perl.net.cn',
    'https://php.ac.cn',
    'https://pythonlang.cn',
    'https://vscode.js.cn',
    'https://github.com'
];

makeRequests(urls);

The example makes asynchronous requests to the given list of URLs. It prints the URL, server name, and response status code of each website.

const fetchUrl = (url) => axios.get(url);

axios.get makes an asynchronous request and returns a promise.

let responses = await Promise.all(promises);

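We collect all the promises with Promise.all. The promise it returns is fulfilled once all of the given promises have been fulfilled, and it is rejected as soon as any one of them is rejected.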

$ node async_req.js
http://webcode.me -> nginx/1.6.2: 200
https://example.com -> ECS (dcb/7F83): 200
http://httpbin.org -> gunicorn/19.9.0: 200
https://clojure.org -> AmazonS3: 200
https://fsharp.org -> GitHub.com: 200
https://symfony.com.cn -> cloudflare: 200
https://perl.net.cn -> Combust/Plack (Perl): 200
https://php.ac.cn -> myracloud: 200
https://pythonlang.cn -> nginx: 200
https://vscode.js.cn -> Microsoft-IIS/10.0: 200
https://github.com -> GitHub.com: 200

Java async requests

CompletableFuture is a high-level API for asynchronous programming in Java.

com/zetcode/AsyncReqEx.java

package com.zetcode;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Stream;

import static java.util.stream.Collectors.toList;

public class AsyncReqEx {

    public static void main(String[] args) {

        List<URI> uris = Stream.of(
                "https://www.google.com/",
                "https://clojure.org",
                "https://rust-lang.net.cn",
                "https://golang.ac.cn",
                "https://pythonlang.cn",
                "https://vscode.js.cn",
                "https://ifconfig.me",
                "http://termbin.com",
                "https://www.github.com/"
        ).map(URI::create).collect(toList());

        HttpClient httpClient = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.ALWAYS)
                .build();

        var futures = uris.stream()
                .map(uri -> verifyUri(httpClient, uri))
                .toArray(CompletableFuture[]::new);

        CompletableFuture.allOf(futures).join();
    }

    private static CompletableFuture<Void> verifyUri(HttpClient httpClient,
                                                     URI uri) {
        HttpRequest request = HttpRequest.newBuilder()
                .timeout(Duration.ofSeconds(5))
                .uri(uri)
                .build();

        return httpClient.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                .thenApply(HttpResponse::statusCode)
                .thenApply(statusCode -> statusCode == 200)
                .exceptionally(ex -> false)
                .thenAccept(valid -> {
                    if (valid) {
                        System.out.printf("[SUCCESS] Verified %s%n", uri);
                    } else {
                        System.out.printf("[FAILURE] Failed to verify %s%n", uri);
                    }
                });
    }
}

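In the example, we have a list of URLs. We check the status of the given web pages. The example uses HttpClient to make the web requests and CompletableFuture for asynchronous execution.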

[SUCCESS] Verified http://termbin.com
[SUCCESS] Verified https://clojure.org
[SUCCESS] Verified https://www.google.com/
[SUCCESS] Verified https://ifconfig.me
[SUCCESS] Verified https://pythonlang.cn
[SUCCESS] Verified https://vscode.js.cn
[SUCCESS] Verified https://golang.ac.cn
[SUCCESS] Verified https://rust-lang.net.cn
[SUCCESS] Verified https://www.github.com/
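
The verification pattern used here (a per-request timeout, a check for status 200, and any error mapped to a failure) can also be sketched in Python with asyncio. The names verify and verify_all are assumptions, and fetch is a placeholder for whatever coroutine performs the request and returns a status code; the sketch itself makes no network calls.

```python
import asyncio


async def verify(url, fetch, timeout=5.0):
    # True only when the request finishes in time with status 200.
    # Timeouts and any other error count as a failed verification.
    try:
        status = await asyncio.wait_for(fetch(url), timeout)
        return status == 200
    except Exception:
        return False


async def verify_all(urls, fetch, timeout=5.0):
    # Start all verifications at once and wait for every one of them.
    flags = await asyncio.gather(*(verify(u, fetch, timeout) for u in urls))
    return dict(zip(urls, flags))
```

Because verify never raises, asyncio.gather always returns one boolean per URL, which plays the same role as the exceptionally step in the Java chain.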

PHP async requests

In PHP, we use the cURL library.

async_req.php

<?php

$urls = [
    "http://webcode.me",
    "https://example.com",
    "http://httpbin.org",
    "https://perl.net.cn",
    "https://php.ac.cn",
    "https://pythonlang.cn",
    "https://vscode.js.cn",
    "https://ifconfig.me"
];

$options = [CURLOPT_HEADER => true, CURLOPT_NOBODY => true,
    CURLOPT_RETURNTRANSFER => true];

$mh = curl_multi_init();
$chs = [];


foreach ($urls as $url) {

    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    curl_multi_add_handle($mh, $ch);
    $chs[] = $ch;
}

$running = false;

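do {
    curl_multi_exec($mh, $running);

    // wait for activity on any handle instead of spinning in a busy loop
    if ($running) {
        curl_multi_select($mh);
    }
} while ($running);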

foreach ($chs as $h) {

    curl_multi_remove_handle($mh, $h);
}

curl_multi_close($mh);

foreach ($chs as $h) {

    $status = curl_getinfo($h, CURLINFO_RESPONSE_CODE);
    echo $status . "\n";
}

foreach ($chs as $h) {

    echo "----------------------\n";
    echo curl_multi_getcontent($h);
}

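We print the response status codes and headers of the requested web pages.

$mh = curl_multi_init();

The curl_multi_init function creates a new multi handle, which allows multiple cURL handles to be processed asynchronously.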

$ php async_req.php
200
200
200
200
200
200
200
200
----------------------
HTTP/1.1 200 OK
Server: nginx/1.6.2
Date: Thu, 22 Jul 2021 13:14:22 GMT
Content-Type: text/html
Content-Length: 348
Last-Modified: Sat, 20 Jul 2019 11:49:25 GMT
Connection: keep-alive
ETag: "5d32ffc5-15c"
Accept-Ranges: bytes

----------------------
HTTP/2 200
content-encoding: gzip
accept-ranges: bytes
...

In this tutorial, we have generated asynchronous web requests in Go, C#, F#, Groovy, Python, Perl, Java, JavaScript, and PHP.
