Perl fc function

last modified April 4, 2025

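Perl's fc function performs Unicode case folding on a string and returns the case-folded copy.

Case folding is similar to case conversion, but more thorough: it maps the case variants of a character to one common form. It is meant for case-insensitive string comparison in Unicode-aware applications.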

Basic fc usage

The simplest way to use fc is to fold a single string.

basic.pl

#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;  # fc introduced in Perl 5.16

my $text = "Hello World";
my $folded = fc($text);

print "Original: $text\n";
print "Folded: $folded\n";

The example shows fc converting a string to its case-folded form. The function returns a new string and leaves the original unmodified.

$ ./basic.pl
Original: Hello World
Folded: hello world
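
The same folding is also available inside double-quoted strings through the \F escape, which Perl 5.16 introduced together with fc. The following is a minimal sketch; the variable names are only illustrative.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;  # enables the fc feature, which also provides \F

my $text = "Hello World";

# \F case-folds everything up to \E (or to the end of the string)
my $folded  = "\F$text";
my $partial = "\F$text\E stays AS-IS";

print "Folded: $folded\n";
print "Partial: $partial\n";
```

The escape is handy for building comparison keys inline, while the function form reads better in longer expressions.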

Case-insensitive comparison

fc enables reliable case-insensitive string comparison.

compare.pl

#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;
use utf8;  # the source contains non-ASCII string literals

my $str1 = "Straße";
my $str2 = "STRASSE";

if (fc($str1) eq fc($str2)) {
    print "Strings are equal when case-folded\n";
} else {
    print "Strings are not equal\n";
}

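This script compares two spellings of a German word that differ in case and form. Case folding maps ß to ss, so the ß/SS equivalence is handled correctly.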

$ ./compare.pl
Strings are equal when case-folded

fc vs lc/uc

fc handles more cases than lc.

comparison.pl

#!/usr/bin/perl

use strict;
use warnings;
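use v5.16.0;
use utf8;  # the source contains non-ASCII string literals

binmode(STDOUT, ':encoding(UTF-8)');  # print wide characters as UTF-8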

my $text = "İstanbul";

print "Original: $text\n";
print "lc: ", lc($text), "\n";
print "fc: ", fc($text), "\n";

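For the Turkish dotted capital İ, lc and fc return the same result: an i followed by a combining dot above (U+0307). The two functions differ for characters such as ß, which lc leaves unchanged while fc folds it to ss.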

$ ./comparison.pl
Original: İstanbul
lc: i̇stanbul
fc: i̇stanbul
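
A case where lc and fc actually disagree is the German ß. The sketch below reuses the Straße/STRASSE pair from the comparison example; the variable names are assumptions made for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;
use utf8;  # the source contains non-ASCII string literals

binmode(STDOUT, ':encoding(UTF-8)');

my $word = "Straße";

my $lowered = lc($word);  # ß is already lowercase, so lc leaves it alone
my $folded  = fc($word);  # full case folding expands ß to ss

print "lc: $lowered\n";
print "fc: $folded\n";
print "lc match: ", (lc($word) eq lc("STRASSE") ? "yes" : "no"), "\n";
print "fc match: ", (fc($word) eq fc("STRASSE") ? "yes" : "no"), "\n";
```

Only the fc comparison reports a match, which is why fc is the safer choice when comparing Unicode text.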

Using fc with arrays

fc can be applied to array elements with map.

array.pl

#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;

my @words = ("Apple", "BANANA", "Cherry");
my @folded = map { fc } @words;

print "Original: @words\n";
print "Folded: @folded\n";

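We use map to apply fc to each array element; called without an argument, fc operates on $_. The result is a new array holding the case-folded version of every string.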

$ ./array.pl
Original: Apple BANANA Cherry
Folded: apple banana cherry
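
Folding also works as a sort key. The following sketch sorts a list case-insensitively and falls back to a plain comparison so that words differing only in case keep a stable order; the word list is made up for the example.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;

my @words = ("banana", "Cherry", "apple", "BANANA");

# Compare folded forms first; break ties with an ordinary string comparison
my @sorted = sort { fc($a) cmp fc($b) or $a cmp $b } @words;

print "Sorted: @sorted\n";
```

For long lists it may be worth computing each folded key once (for example with a Schwartzian transform) instead of calling fc on every comparison.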

Case folding with regular expressions

fc can be combined with regular expressions for more advanced matching.

regex.pl

#!/usr/bin/perl

use strict;
use warnings;
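use v5.16.0;
use utf8;  # the source contains Greek string literals

my $text = "The Greek letter Σ (sigma) has lowercase form σ or ς";

# fc maps Σ, σ and ς all to σ, so the search term must be folded as well
my $needle = fc("ς");

if (fc($text) =~ /\Q$needle\E/) {
    print "Found a sigma form\n";
}

This script uses case folding to find any form of sigma. Because fc folds the final sigma (ς) to σ, the search term is folded with fc too; matching the folded text against a literal ς would never succeed.

$ ./regex.pl
Found a sigma form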
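
When the pattern is known in advance, Perl's /i modifier is an alternative to folding the text by hand, since case-insensitive matching treats Σ, σ and ς as equivalent. A small sketch follows; has_sigma is a hypothetical helper introduced only for this example.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;
use utf8;  # the source contains Greek string literals

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: true if the string contains any form of sigma
sub has_sigma {
    my ($str) = @_;
    return $str =~ /σ/i ? 1 : 0;
}

for my $word ("ΣΟΦΙΑ", "σοφία", "λόγος", "abc") {
    printf "%s: %s\n", $word, has_sigma($word) ? "sigma found" : "no sigma";
}
```

Folding with fc remains the better fit when the folded string is needed afterwards, for instance as a hash key.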

Case folding with hashes

fc helps create case-insensitive hash keys.

hash.pl

#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;

my %color = (fc("Red") => "#FF0000", fc("Green") => "#00FF00");

my $input = "RED";
print "$input is $color{fc($input)}\n";

We use fc to normalize both the hash keys and the lookup key. This gives case-insensitive access to the hash values.

$ ./hash.pl
RED is #FF0000
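
The same key normalization can remove duplicates that differ only in case. This is a minimal sketch with an invented tag list; the first spelling encountered is the one that is kept.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;
use utf8;  # the source contains non-ASCII string literals

binmode(STDOUT, ':encoding(UTF-8)');

my @tags = ("Perl", "PERL", "perl", "Unicode", "UNICODE", "Straße", "STRASSE");

# Track folded forms already seen; keep only the first spelling of each
my %seen;
my @unique = grep { !$seen{ fc($_) }++ } @tags;

print "Unique: @unique\n";
```

Storing the folded form as the key and the original spelling as the value is a common variation when the display form must be preserved.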

Performance considerations

On large texts, case folding can take a noticeable amount of time.

benchmark.pl

#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;
use utf8;  # the source contains non-ASCII string literals
use Benchmark qw(cmpthese);

my $text = "ÄÖÜäöüß" x 1000;

cmpthese(-1, {
    fc => sub { my $x = fc($text) },
    lc => sub { my $x = lc($text) },
});

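This benchmark compares the speed of fc and lc. Case folding is usually slower, but it gives more accurate results for Unicode text; the actual figures depend on the Perl version and the input.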

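Best practices

For comparisons: prefer fc over lc/uc when handling Unicode text.
Normalize early: fold case when data is stored or first processed.
Combine with NFC: consider Unicode normalization if needed.
Document usage: note where case folding is applied.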
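
The point about NFC can be sketched as follows, assuming the core Unicode::Normalize module is available. fold_key is a hypothetical helper, and normalizing to NFC before folding is one simple approach rather than a complete implementation of Unicode caseless matching.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;
use Unicode::Normalize qw(NFC);

my $composed   = "caf\x{E9}";    # é as a single code point
my $decomposed = "CAFE\x{301}";  # E followed by a combining acute accent

# Hypothetical helper: normalize to NFC first, then case-fold
sub fold_key {
    my ($str) = @_;
    return fc(NFC($str));
}

print "fc only: ",
    (fc($composed) eq fc($decomposed) ? "equal" : "differ"), "\n";
print "NFC + fc: ",
    (fold_key($composed) eq fold_key($decomposed) ? "equal" : "differ"), "\n";
```

Folding alone reports the two spellings as different because their code point sequences differ; normalizing first makes them compare equal.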

Source

Perl fc documentation

This tutorial covered Perl's fc function and demonstrated its use in Unicode string processing with practical examples.

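Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have written over 1,400 articles and 8 e-books. I have more than ten years of experience in teaching programming.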
