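Asynchronous HTTP requests
last modified January 10, 2023
The asynchronous HTTP requests tutorial shows how to create async HTTP requests in Go, C#, F#, Groovy, Python, Perl, Java, JavaScript, and PHP.
Asynchronous requests do not block the client and allow us to generate HTTP requests more efficiently.
Rather than issuing requests one by one, starting each only after the previous one has finished, we start all of them right away and then wait for all of them to finish.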
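The difference between the two strategies can be sketched in Python without touching the network. This is only an illustration under stated assumptions: fake_request is a hypothetical stand-in that sleeps instead of performing a real HTTP call, and the 0.1 second delay is arbitrary.

```python
import asyncio
import time

async def fake_request(url: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for an HTTP call: waits, then returns a body."""
    await asyncio.sleep(delay)
    return f"response from {url}"

async def sequential(urls):
    # Each request starts only after the previous one has finished.
    return [await fake_request(u) for u in urls]

async def concurrent(urls):
    # All requests are started at once; gather waits for all of them.
    return await asyncio.gather(*(fake_request(u) for u in urls))

def timed(coro_fn, urls):
    """Run coro_fn(urls) to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(urls))
    return result, time.perf_counter() - start

URLS = ["http://webcode.me", "https://example.com", "http://httpbin.org"]

if __name__ == "__main__":
    _, t_seq = timed(sequential, URLS)
    _, t_con = timed(concurrent, URLS)
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")
```

With three simulated requests, the sequential run should take roughly three times the delay, while the concurrent run should take about one delay.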
Go async requests
Go uses goroutines to make asynchronous requests. A goroutine is a lightweight thread managed by the Go runtime.
package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "regexp"
    "sync"
)

func main() {

    urls := []string{
        "http://webcode.me",
        "https://example.com",
        "http://httpbin.org",
        "https://perl.net.cn",
        "https://php.ac.cn",
        "https://pythonlang.cn",
        "https://vscode.js.cn",
        "https://clojure.org",
    }

    var wg sync.WaitGroup

    for _, u := range urls {

        wg.Add(1)
        // Start one goroutine per URL; the URL is passed as an argument.
        go func(url string) {
            defer wg.Done()

            content := doReq(url)
            title := getTitle(content)
            fmt.Println(title)
        }(u)
    }

    // Block until every goroutine has called wg.Done().
    wg.Wait()
}

// doReq fetches url and returns the response body.
// On error it logs the error and returns an empty string.
func doReq(url string) (content string) {

    resp, err := http.Get(url)
    if err != nil {
        log.Println(err)
        return
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Println(err)
        return
    }

    return string(body)
}

// getTitle returns the text of the first title element, or "no title".
func getTitle(content string) (title string) {

    re := regexp.MustCompile("<title>(.*)</title>")

    parts := re.FindStringSubmatch(content)

    if len(parts) > 0 {
        return parts[1]
    } else {
        return "no title"
    }
}
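We make multiple asynchronous HTTP requests and get the contents of the title tag of each web page.
var wg sync.WaitGroup
A WaitGroup is used to manage goroutines. It waits for a collection of goroutines to finish.
go func(url string) {
    defer wg.Done()

    content := doReq(url)
    title := getTitle(content)
    fmt.Println(title)
}(u)
A goroutine is created with the go keyword.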
$ go run async_req.go
The Perl Programming Language - www.perl.org
Welcome to Python.org
Visual Studio Code - Code Editing. Redefined
PHP: Hypertext Preprocessor
Example Domain
httpbin.org
Clojure
My html page
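The start-everything-then-wait pattern behind sync.WaitGroup is not specific to Go. As a rough analogy only, the following Python sketch starts one thread per URL and joins them all. The fetch_title function is a hypothetical stand-in for doReq plus getTitle and performs no network I/O.

```python
import threading

def fetch_title(url: str) -> str:
    """Hypothetical stand-in for doReq + getTitle; no network access."""
    return f"title of {url}"

def fetch_all(urls):
    titles = {}
    lock = threading.Lock()

    def worker(url):
        title = fetch_title(url)
        with lock:  # protect the shared dict
            titles[url] = title

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()   # comparable to `go func(...)`
    for t in threads:
        t.join()    # comparable to wg.Wait()
    return titles

if __name__ == "__main__":
    for url, title in fetch_all(["http://webcode.me", "https://example.com"]).items():
        print(url, "->", title)
```

Python threads are much heavier than goroutines, so the sketch mirrors only the control flow, not the cost model.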
C# async requests
In C#, we use the HttpClient to generate asynchronous requests.
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.Text.RegularExpressions;

var urls = new string[] { "http://webcode.me", "http://example.com",
    "http://httpbin.org", "https://ifconfig.me", "http://termbin.com",
    "https://github.com"
};

var rx = new Regex(@"<title>\s*(.+?)\s*</title>",
    RegexOptions.Compiled);

using var client = new HttpClient();

var tasks = new List<Task<string>>();

foreach (var url in urls)
{
    tasks.Add(client.GetStringAsync(url));
}

Task.WaitAll(tasks.ToArray());

var data = new List<string>();

foreach (var task in tasks)
{
    data.Add(await task);
}

foreach (var content in data)
{
    var matches = rx.Matches(content);

    foreach (var match in matches)
    {
        Console.WriteLine(match);
    }
}
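We download the given web pages asynchronously and print their HTML title tags.
tasks.Add(client.GetStringAsync(url));
GetStringAsync sends a GET request to the specified URL and returns the response body as a string in an asynchronous operation. It returns a new task; in C#, a task represents an asynchronous operation.
Task.WaitAll(tasks.ToArray());
Task.WaitAll waits for all of the provided tasks to complete execution.
data.Add(await task);
The await keyword unwraps the result value. Since Task.WaitAll has already waited for every task, each await returns immediately.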
$ dotnet run
<title>My html page</title>
<title>Example Domain</title>
<title>httpbin.org</title>
<title>termbin.com - terminal pastebin</title>
<title>GitHub: Where the world builds software · GitHub</title>
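Both the Go and the C# examples pull the title out of the page with a regular expression. A minimal Python sketch of the same step is shown below. It returns only the captured text, as the Go example does, whereas the C# example prints the whole match including the tags. The get_title name and the "no title" default are taken from the Go example; the case-insensitive and multi-line handling are additions of this sketch.

```python
import re

# Non-greedy capture; DOTALL lets the title span several lines,
# IGNORECASE accepts <TITLE>, and [^>]* tolerates attributes on the tag.
TITLE_RE = re.compile(r"<title[^>]*>\s*(.+?)\s*</title>",
                      re.IGNORECASE | re.DOTALL)

def get_title(html: str, default: str = "no title") -> str:
    """Return the text of the first title element, or `default` if none."""
    m = TITLE_RE.search(html)
    return m.group(1) if m else default

if __name__ == "__main__":
    print(get_title("<html><title>\n  My html page\n</title></html>"))
```

A regular expression is adequate for a quick check like this one; for arbitrary HTML, a real parser is the more robust choice.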
F# async requests
The following example uses HttpClient and task expressions to fetch website titles asynchronously.
open System.Net.Http
open System.Text.RegularExpressions
open System.Threading.Tasks

let fetchTitleAsync (url: string) =

    task {

        use client = new HttpClient()
        let! html = client.GetStringAsync(url)
        let pattern = "<title>\s*(.+?)\s*</title>"

        let m = Regex.Match(html, pattern)
        return m.Value
    }

let sites =
    [| "http://webcode.me"
       "http://example.com"
       "https://bing.com"
       "http://httpbin.org"
       "https://ifconfig.me"
       "http://termbin.com"
       "https://github.com" |]

let titles =
    sites
    |> Array.map fetchTitleAsync
    |> Task.WhenAll
    |> Async.AwaitTask
    |> Async.RunSynchronously

titles
|> Array.iter (fun title -> printfn $"%s{title}")
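The example asynchronously retrieves the titles of the given URLs.
Another solution is to use WebRequest to generate a request. Its AsyncGetResponse method returns the response to a request in an asynchronous operation. Note that WebRequest is marked obsolete in .NET 6 and later, where HttpClient is the recommended replacement.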
open System.Net
open System
open System.Text.RegularExpressions

let fetchTitleAsync url =

    async {
        let req = WebRequest.Create(Uri(url))
        use! resp = req.AsyncGetResponse()
        use stream = resp.GetResponseStream()

        use reader = new IO.StreamReader(stream)
        let html = reader.ReadToEnd()

        let pattern = "<title>\s*(.+?)\s*</title>"

        let m = Regex.Match(html, pattern)
        return m.Value
    }

let sites =
    [ "http://webcode.me"
      "http://example.com"
      "https://bing.com"
      "http://httpbin.org"
      "https://ifconfig.me"
      "http://termbin.com"
      "https://github.com" ]

let titles = sites
             |> List.map fetchTitleAsync
             |> Async.Parallel
             |> Async.RunSynchronously

titles |> Array.iter (fun title -> printfn $"%s{title}")
The example asynchronously retrieves the titles of the given URLs.
Groovy async requests
In Groovy, we use ExecutorService and HttpClient.
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

int nThreads = 30
def executor = Executors.newFixedThreadPool(nThreads)

def urls = [
    "https://crunchify.com",
    "https://yahoo.com",
    "https://www.ebay.com",
    "https://google.com",
    "https://www.example.co",
    "https://paypal.com",
    "http://bing.com/",
    "https://techcrunch.com/",
    "http://mashable.com/",
    "https://pro.crunchify.com/",
    "https://wordpress.com/",
    "https://wordpress.ac.cn/",
    "https://example.com/",
    "https://sjsu.edu/",
    "https://ask.crunchify.com/",
    "https://test.com.au/",
    "https://www.wikipedia.org/",
    "https://en.wikipedia.org"
]
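
// A closure parameter is bound anew on every call, so each task sees its
// own url. A for-in loop variable is shared by all closures created in the
// loop, which can make several tasks report the same URL.
urls.each { url ->
    executor.execute {
        try {
            worker(url)
        } catch (Exception e) {
            e.printStackTrace()
        }
    }
}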

executor.shutdown()
executor.awaitTermination(30, TimeUnit.SECONDS)
println("finished")

def worker(url) {

    def client = HttpClient.newHttpClient()
    def request = HttpRequest.newBuilder()
        .uri(URI.create(url))
        .build()

    HttpResponse<Void> res = client.send(request,
        HttpResponse.BodyHandlers.discarding())

    println "${url}: ${res.statusCode()}"
}
The example makes multiple asynchronous requests to the URLs and prints their response status codes.
$ groovy mul_async_req.gvy
http://mashable.com/: 301
http://bing.com/: 301
https://paypal.com: 302
https://en.wikipedia.org: 301
https://paypal.com: 302
https://en.wikipedia.org: 301
https://en.wikipedia.org: 301
https://google.com: 301
https://example.com/: 200
https://example.com/: 200
https://yahoo.com: 301
https://test.com.au/: 301
https://wordpress.com/: 200
https://techcrunch.com/: 200
https://www.ebay.com: 200
https://ask.crunchify.com/: 200
https://pro.crunchify.com/: 200
https://sjsu.edu/: 200
finished
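The fixed thread pool used in the Groovy example has a close counterpart in Python's concurrent.futures module. The sketch below is an illustration, not a port: check is a hypothetical stand-in for worker that returns a made-up status code without any network I/O, and the pool size of 30 is copied from the Groovy script.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def check(url: str) -> int:
    """Hypothetical stand-in for worker(): made-up status, no network I/O."""
    return 200 if url.startswith("https://") else 301

def check_all(urls, n_threads: int = 30):
    results = {}
    # Leaving the with-block shuts the pool down and waits for outstanding
    # tasks, much like executor.shutdown() followed by awaitTermination(),
    # except that no timeout is applied here.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = {pool.submit(check, url): url for url in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:  # a failed task must not hide the others
                results[url] = exc
    return results

if __name__ == "__main__":
    for url, status in check_all(["https://example.com/", "http://bing.com/"]).items():
        print(f"{url}: {status}")
```

Passing the URL as an argument to submit gives every task its own value, which sidesteps any question of variables shared between tasks.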
Python async requests
In Python, we use the httpx and asyncio modules.
#!/usr/bin/python

import httpx
import asyncio

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ['http://webcode.me', 'https://httpbin.org/get',
    'https://google.com', 'https://stackoverflow.com',
    'https://github.com']

async def launch():
    resps = await asyncio.gather(*map(get_async, urls))
    data = [resp.status_code for resp in resps]

    for status_code in data:
        print(status_code)

asyncio.run(launch())
The example makes asynchronous requests in Python. It prints the response status codes of all the provided URLs.
$ ./async_req.py
200
200
200
200
200
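By default, asyncio.gather propagates the first exception raised by any of its awaitables, so one unreachable host would hide the results of all the others. A possible refinement, sketched here with a hypothetical fake_get coroutine in place of the httpx call, is to pass return_exceptions=True and inspect each result. The URL containing "unreachable" is made up for the illustration.

```python
import asyncio

async def fake_get(url: str) -> int:
    """Hypothetical stand-in for client.get(url).status_code; no network I/O."""
    await asyncio.sleep(0.01)
    if "unreachable" in url:
        raise ConnectionError(f"cannot connect to {url}")
    return 200

async def launch(urls):
    # return_exceptions=True places exceptions in the result list
    # instead of raising the first one out of gather().
    results = await asyncio.gather(*(fake_get(u) for u in urls),
                                   return_exceptions=True)
    return dict(zip(urls, results))

if __name__ == "__main__":
    sample = ["http://webcode.me", "http://unreachable.invalid"]
    for url, result in asyncio.run(launch(sample)).items():
        if isinstance(result, Exception):
            print(f"{url}: failed ({result})")
        else:
            print(f"{url}: {result}")
```

gather keeps the order of its arguments, so zipping the results with the URL list pairs each URL with its own outcome.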
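Perl async requests
In Perl, we use the LWP module to generate requests and the Parallel::ForkManager module to make them asynchronous.
$ cpanm Parallel::ForkManager LWP
We install the modules with cpanm. The script below also uses Path::Tiny, which can be installed the same way.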
http://webcode.me
https://example.com
http://httpbin.org
https://google.com
https://perl.net.cn
https://fsharp.org
https://clojure.org
https://rust-lang.net.cn
https://golang.ac.cn
https://pythonlang.cn
https://vscode.js.cn
https://ifconfig.me
http://termbin.com
https://github.com
https://stackoverflow.com
https://php.ac.cn/
The urls.txt file contains a list of websites.
#!/usr/bin/perl

use warnings;
use 5.30.0;
use Path::Tiny;
use LWP::UserAgent;
use Parallel::ForkManager;

my @urls = split "\n", path('urls.txt')->slurp_utf8;

my $pm = Parallel::ForkManager->new(4);
my $ua = LWP::UserAgent->new;
$ua->agent('Perl script');

say "downloading ", scalar @urls, " files";

my $dir = 'files/';
mkdir $dir if not -d $dir;

foreach my $link (@urls) {

    my $name = $1 if $link =~ m%https?://(.+)\.\w+%;
    my $file_name = "$dir/$name" . '.txt';

    $pm->start and next;

    my $resp = $ua->get($link);

    if ($resp->is_success) {

        path($file_name)->spew_utf8($resp->decoded_content);

    } else { warn $resp->status_line }

    $pm->finish;
}

$pm->wait_all_children;
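The example reads the urls.txt file and gets the links. It makes asynchronous requests to the given URLs; Parallel::ForkManager runs the downloads in up to four child processes at a time. The contents of the web pages are written to files.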
$ ./async_req.pl
downloading 15 files
$ ls -1 files/
clojure.txt
code.visualstudio.txt
example.txt
fsharp.txt
github.txt
golang.txt
google.txt
httpbin.txt
ifconfig.txt
stackoverflow.txt
termbin.txt
webcode.txt
www.perl.txt
www.python.txt
www.rust-lang.txt
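The Perl script caps the number of simultaneous downloads at four. The same idea can be sketched in Python with an asyncio.Semaphore. Note the difference in mechanism: Parallel::ForkManager uses child processes, while this sketch uses coroutines in a single process. The download coroutine is a hypothetical stand-in that sleeps instead of transferring data, and the stats dictionary exists only to show that the limit holds.

```python
import asyncio

async def download(url: str, sem: asyncio.Semaphore, stats: dict) -> str:
    async with sem:  # at most `limit` coroutines are inside this block
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for the real transfer
        stats["active"] -= 1
        return f"content of {url}"

async def download_all(urls, limit: int = 4):
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    pages = await asyncio.gather(*(download(u, sem, stats) for u in urls))
    return pages, stats["peak"]

if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(10)]
    pages, peak = asyncio.run(download_all(urls))
    print(f"downloaded {len(pages)} pages, at most {peak} at a time")
```

A limit like this keeps a long URL list from opening a connection per entry all at once.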
JS async requests
For JavaScript, we have chosen the axios module.
$ npm i axios
We install the axios module.
const axios = require('axios');

async function makeRequests(urls) {

    const fetchUrl = (url) => axios.get(url);
    const promises = urls.map(fetchUrl);

    let responses = await Promise.all(promises);

    responses.forEach(resp => {
        let msg = `${resp.config.url} -> ${resp.headers.server}: ${resp.status}`;
        console.log(msg);
    });
}

let urls = [
    'http://webcode.me',
    'https://example.com',
    'http://httpbin.org',
    'https://clojure.org',
    'https://fsharp.org',
    'https://symfony.com.cn',
    'https://perl.net.cn',
    'https://php.ac.cn',
    'https://pythonlang.cn',
    'https://vscode.js.cn',
    'https://github.com'
];

makeRequests(urls);
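The example makes asynchronous requests to the given list of URLs. It prints the website's URL, server name, and response status code.
const fetchUrl = (url) => axios.get(url);
axios.get makes an asynchronous request and returns a promise.
let responses = await Promise.all(promises);
We collect all the promises with Promise.all. The returned promise is fulfilled once all of the given promises have been fulfilled, and it is rejected as soon as any one of them is rejected. To wait for every promise regardless of its outcome, Promise.allSettled can be used instead.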
$ node async_req.js
http://webcode.me -> nginx/1.6.2: 200
https://example.com -> ECS (dcb/7F83): 200
http://httpbin.org -> gunicorn/19.9.0: 200
https://clojure.org -> AmazonS3: 200
https://fsharp.org -> GitHub.com: 200
https://symfony.com.cn -> cloudflare: 200
https://perl.net.cn -> Combust/Plack (Perl): 200
https://php.ac.cn -> myracloud: 200
https://pythonlang.cn -> nginx: 200
https://vscode.js.cn -> Microsoft-IIS/10.0: 200
https://github.com -> GitHub.com: 200
Java async requests
CompletableFuture is a high-level API for asynchronous programming in Java.
package com.zetcode;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Stream;

import static java.util.stream.Collectors.toList;

public class AsyncReqEx {

    public static void main(String[] args) {

        List<URI> uris = Stream.of(
                "https://www.google.com/",
                "https://clojure.org",
                "https://rust-lang.net.cn",
                "https://golang.ac.cn",
                "https://pythonlang.cn",
                "https://vscode.js.cn",
                "https://ifconfig.me",
                "http://termbin.com",
                "https://www.github.com/"
        ).map(URI::create).collect(toList());

        HttpClient httpClient = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.ALWAYS)
                .build();

        // Start all requests, then block until every future has completed.
        var futures = uris.stream()
                .map(uri -> verifyUri(httpClient, uri))
                .toArray(CompletableFuture[]::new);

        CompletableFuture.allOf(futures).join();
    }

    private static CompletableFuture<Void> verifyUri(HttpClient httpClient,
                                                     URI uri) {
        HttpRequest request = HttpRequest.newBuilder()
                .timeout(Duration.ofSeconds(5))
                .uri(uri)
                .build();

        // Any exception (timeout, connection failure) counts as not verified.
        return httpClient.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                .thenApply(HttpResponse::statusCode)
                .thenApply(statusCode -> statusCode == 200)
                .exceptionally(ex -> false)
                .thenAccept(valid -> {
                    if (valid) {
                        System.out.printf("[SUCCESS] Verified %s%n", uri);
                    } else {
                        System.out.printf("[FAILURE] Failed to verify %s%n", uri);
                    }
                });
    }
}
In the example, we have a list of URLs. We check the status of the given web pages. The example uses HttpClient to make web requests and CompletableFuture for asynchronous execution.
[SUCCESS] Verified http://termbin.com
[SUCCESS] Verified https://clojure.org
[SUCCESS] Verified https://www.google.com/
[SUCCESS] Verified https://ifconfig.me
[SUCCESS] Verified https://pythonlang.cn
[SUCCESS] Verified https://vscode.js.cn
[SUCCESS] Verified https://golang.ac.cn
[SUCCESS] Verified https://rust-lang.net.cn
[SUCCESS] Verified https://www.github.com/
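The Java pipeline maps a status code to a boolean and turns any exception, including the per-request timeout, into false. A comparable flow can be sketched in Python with asyncio.wait_for. Everything here is illustrative: fake_status is a hypothetical stand-in for the HTTP call, the example hosts are invented, and the 0.05 second timeout is chosen only to keep the demonstration fast.

```python
import asyncio

async def fake_status(url: str) -> int:
    """Hypothetical stand-in for an HTTP request; no network I/O."""
    if "slow" in url:
        await asyncio.sleep(1.0)  # simulates a server that answers too late
    elif "broken" in url:
        raise ConnectionError(url)
    return 200

async def verify(url: str, timeout: float = 0.05) -> bool:
    try:
        status = await asyncio.wait_for(fake_status(url), timeout)
    except (asyncio.TimeoutError, ConnectionError):
        return False  # same role as .exceptionally(ex -> false)
    return status == 200

async def verify_all(urls):
    results = await asyncio.gather(*(verify(u) for u in urls))
    return dict(zip(urls, results))

if __name__ == "__main__":
    sample = ["https://ok.example", "https://slow.example", "https://broken.example"]
    for url, valid in asyncio.run(verify_all(sample)).items():
        label = "[SUCCESS] Verified" if valid else "[FAILURE] Failed to verify"
        print(label, url)
```

Because each failure is converted to a value inside verify, gather never sees an exception and all URLs get a verdict.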
PHP async requests
In PHP, we use the cURL library.
<?php

$urls = [
    "http://webcode.me",
    "https://example.com",
    "http://httpbin.org",
    "https://perl.net.cn",
    "https://php.ac.cn",
    "https://pythonlang.cn",
    "https://vscode.js.cn",
    "https://ifconfig.me"
];

$options = [CURLOPT_HEADER => true, CURLOPT_NOBODY => true,
    CURLOPT_RETURNTRANSFER => true];

$mh = curl_multi_init();
$chs = [];

foreach ($urls as $url) {

    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    curl_multi_add_handle($mh, $ch);
    $chs[] = $ch;
}

$running = 0;

do {
    curl_multi_exec($mh, $running);

    // Wait for activity on any handle instead of spinning in a busy loop.
    if ($running) {
        curl_multi_select($mh);
    }
} while ($running);

foreach ($chs as $h) {
    curl_multi_remove_handle($mh, $h);
}

curl_multi_close($mh);

foreach ($chs as $h) {

    $status = curl_getinfo($h, CURLINFO_RESPONSE_CODE);
    echo $status . "\n";
}

foreach ($chs as $h) {

    echo "----------------------\n";
    echo curl_multi_getcontent($h);
}
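We print the response status codes and headers of the requested web pages.
$mh = curl_multi_init();
The curl_multi_init function creates a new multi handle, which allows multiple cURL handles to be processed asynchronously.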
$ php async_req.php
200
200
200
200
200
200
200
200
----------------------
HTTP/1.1 200 OK
Server: nginx/1.6.2
Date: Thu, 22 Jul 2021 13:14:22 GMT
Content-Type: text/html
Content-Length: 348
Last-Modified: Sat, 20 Jul 2019 11:49:25 GMT
Connection: keep-alive
ETag: "5d32ffc5-15c"
Accept-Ranges: bytes

----------------------
HTTP/2 200
content-encoding: gzip
accept-ranges: bytes
...
In this tutorial, we have generated asynchronous web requests in Go, C#, F#, Groovy, Python, Perl, Java, JavaScript, and PHP.