ZetCode

C# DataFrame

上次修改时间:2023 年 8 月 23 日

在本文中,我们将使用 C# 中的 DataFrameDataFrame 位于 Microsoft.Data.Analysis 命名空间中。

DataFrame 是一种将数据组织成二维表格(类似于电子表格)的数据结构,包含行和列。

$ dotnet add package Microsoft.Data.Analysis

我们将库添加到项目中。

products.csv
id,product_name,category,unit_price,units_in_stock
1,Chai,Beverages,18.0,39
2,Chang,Beverages,19.0,17
3,Aniseed Syrup,Condiments,10.0,13
4,Chef Anton's Cajun Seasoning,Condiments,22.0,53
5,Chef Anton's Gumbo Mix,Condiments,21.35,0
6,Grandma's Boysenberry Spread,Condiments,25.0,120
7,Uncle Bob's Organic Dried Pears,Produce,30.0,15
8,Northwoods Cranberry Sauce,Condiments,40.0,6
9,Mishi Kobe Niku,Meat/Poultry,97.0,29
10,Ikura,Seafood,31.0,31
11,Queso Cabrales,Dairy Products,21.0,22
12,Queso Manchego La Pastora,Dairy Products,38.0,86
13,Konbu,Seafood,6.0,24
14,Tofu,Produce,23.25,35
15,Genen Shouyu,Condiments,15.5,39
16,Pavlova,Confections,17.45,29
17,Alice Mutton,Meat/Poultry,39.0,0
18,Carnarvon Tigers,Seafood,62.5,42
19,Teatime Chocolate Biscuits,Confections,9.2,25
20,Sir Rodney's Marmalade,Confections,81.0,40
21,Sir Rodney's Scones,Confections,10.0,3
22,Gustaf's Knäckebröd,Grains/Cereals,21.0,104
23,Tunnbröd,Grains/Cereals,9.0,61
24,Guaraná Fantástica,Beverages,4.5,20
25,NuNuCa Nuß-Nougat-Creme,Confections,14.0,76
26,Gumbär Gummibärchen,Confections,31.23,15
27,Schoggi Schokolade,Confections,43.9,49
28,Rössle Sauerkraut,Produce,45.6,26
29,Thüringer Rostbratwurst,Meat/Poultry,123.79,0
30,Nord-Ost Matjeshering,Seafood,25.89,10
31,Gorgonzola Telino,Dairy Products,12.5,0
32,Mascarpone Fabioli,Dairy Products,32.0,9
33,Geitost,Dairy Products,2.5,112
34,Sasquatch Ale,Beverages,14.0,111
35,Steeleye Stout,Beverages,18.0,20
36,Inlagd Sill,Seafood,19.0,112
37,Gravad lax,Seafood,26.0,11
38,Côte de Blaye,Beverages,263.5,17
39,Chartreuse verte,Beverages,18.0,69
40,Boston Crab Meat,Seafood,18.4,123
41,Jack'England Clam Chowder,Seafood,9.65,85
42,Singaporean Hokkien Fried Mee,Grains/Cereals,14.0,26
43,Ipoh Coffee,Beverages,46.0,17
44,Gula Malacca,Condiments,19.45,27
45,Rogede sild,Seafood,9.5,5
46,Spegesild,Seafood,12.0,95
47,Zaanse koeken,Confections,9.5,36
48,Chocolade,Confections,12.75,15
49,Maxilaku,Confections,20.0,10
50,Valkoinen suklaa,Confections,16.25,65
51,Manjimup Dried Apples,Produce,53.0,20
52,Filo Mix,Grains/Cereals,7.0,38
53,Perth Pasties,Meat/Poultry,32.8,0
54,Tourtière,Meat/Poultry,7.45,21
55,Pâté chinois,Meat/Poultry,24.0,115
56,Gnocchi di nonna Alice,Grains/Cereals,38.0,21
57,Ravioli Angelo,Grains/Cereals,19.5,36
58,Escargots de Bourgogne,Seafood,13.25,62
59,Raclette Courdavault,Dairy Products,55.0,79
60,Camembert Pierrot,Dairy Products,34.0,19
61,Sirop d'érable,Condiments,28.5,113
62,Tarte au sucre,Confections,49.3,17
63,Vegie-spread,Condiments,43.9,24
64,Wimmers gute Semmelknödel,Grains/Cereals,33.25,22
65,Louisiana Fiery Hot Pepper Sauce,Condiments,21.05,76
66,Louisiana Hot Spiced Okra,Condiments,17.0,4
67,Laughing Lumberjack Lager,Beverages,14.0,52
68,Scottish Longbreads,Confections,12.5,6
69,Gudbrandsdalsost,Dairy Products,36.0,26
70,Outback Lager,Beverages,15.0,15
71,Flotemysost,Dairy Products,21.5,26
72,Mozzarella di Giovanni,Dairy Products,34.8,14
73,Röd Kaviar,Seafood,15.0,101
74,Longlife Tofu,Produce,10.0,4
75,Rhönbräu Klosterbier,Beverages,7.75,125
76,Lakkalikööri,Beverages,18.0,57
77,Original Frankfurter grüne Soße,Condiments,13.0,32

在示例中,我们使用此 CSV 文件。

C# DataFrame 简单示例

以下是一个简单的 DataFrame 示例。

Program.cs
using Microsoft.Data.Analysis;

var fname = Path.GetFullPath("products.csv");
var df = DataFrame.LoadCsv(fname);

Console.WriteLine(df.Info());
Console.WriteLine(df.Description());

我们从 products.csv 文件加载数据,并在 dataframe 上调用 InfoDescription 方法。

var df = DataFrame.LoadCsv(fname);

数据使用 DataFrame.LoadCsv 方法从 CSV 文件加载。


DataFrame.LoadCsv 会自动推断列类型。 我们可以使用 dataTypes 选项显式指定列数据类型。

Program.cs
using Microsoft.Data.Analysis;

var fname = Path.GetFullPath("products.csv");
var df = DataFrame.LoadCsv(fname,
    dataTypes: new Type[] { typeof(int), typeof(string), typeof(string), 
        typeof(decimal), typeof(decimal) });

Console.WriteLine(df.Info());

在示例中,我们为五个列指定了数据类型。

C# DataFrame Head/Tail

Head 方法返回前 n 行,而 Tail 返回后 n 行。

Program.cs
using Microsoft.Data.Analysis;

var fname = Path.GetFullPath("products.csv");
var df = DataFrame.LoadCsv(fname);

Console.WriteLine(df.Head(4));
Console.WriteLine(df.Tail(4));

该示例显示了 dataframe 中的前四行和后四行。

C# DataFrame 筛选

我们可以使用 Filter 方法来筛选数据。

Program.cs
using Microsoft.Data.Analysis;

var fname = Path.GetFullPath("products.csv");
var df = DataFrame.LoadCsv(fname);

PrimitiveDataFrameColumn<bool> fil = df["unit_price"].ElementwiseGreaterThan(100);
Console.WriteLine(df.Filter(fil));

Console.WriteLine(df.Filter(df.Columns[4].ElementwiseLessThan(10)));

在该程序中,我们应用了两个筛选器。

PrimitiveDataFrameColumn<bool> fil = df["unit_price"].ElementwiseGreaterThan(100);
Console.WriteLine(df.Filter(fil));

在第一个筛选器中,我们显示所有价格大于 100 的产品。

Console.WriteLine(df.Filter(df.Columns[4].ElementwiseLessThan(10)));

第二个筛选器用于显示库存供应量少于 10 的产品。

带有 Spectre.Console 的控制台输出

在以下示例中,我们使用 Spectre.Console 库以美观的表格形式输出数据。

$ dotnet add project Spectre.Console

我们将库添加到项目中。

Program.cs
using Microsoft.Data.Analysis;
using Spectre.Console;

var fname = Path.GetFullPath("products.csv");
var df = DataFrame.LoadCsv(fname);

var table = new Table()
    .Border(TableBorder.Ascii)
    .BorderColor(Color.SteelBlue)
    .AddColumn(new TableColumn("Id").RightAligned())
    .AddColumn(new TableColumn("Product name"))
    .AddColumn(new TableColumn("Category").LeftAligned())
    .AddColumn(new TableColumn("Unit price").RightAligned())
    .AddColumn(new TableColumn("Units in stock").RightAligned());

foreach (var e in df.Rows)
{
    string[] row = { $"{e[0]}", $"{e[1]}", $"{e[2]}", $"{e[3]:0.00}", $"{e[4]:0.00}" };
    table.AddRow(row);
}

AnsiConsole.Write(table);

在该程序中,我们以整洁的控制台表格显示所有产品。

var table = new Table()
    .Border(TableBorder.Ascii)
    .BorderColor(Color.SteelBlue)
    .AddColumn(new TableColumn("Id").RightAligned())
    .AddColumn(new TableColumn("Product name"))
    .AddColumn(new TableColumn("Category").LeftAligned())
    .AddColumn(new TableColumn("Unit price").RightAligned())
    .AddColumn(new TableColumn("Units in stock").RightAligned());

我们构建表格。 我们设置边框和列。

foreach (var e in df.Rows)
{
    string[] row = { $"{e[0]}", $"{e[1]}", $"{e[2]}", $"{e[3]:0.00}", $"{e[4]:0.00}" };
    table.AddRow(row);
}

我们遍历 dataframe 的行,并将每一行添加到控制台表格中。

AnsiConsole.Write(table);

我们使用 AnsiConsole.Write 输出表格。

$ dotnet run 
+--------------------------------------------------------------------------------------+
| Id | Product name                     | Category       | Unit price | Units in stock |
|----+----------------------------------+----------------+------------+----------------|
|  1 | Chai                             | Beverages      |      18.00 |          39.00 |
|  2 | Chang                            | Beverages      |      19.00 |          17.00 |
|  3 | Aniseed Syrup                    | Condiments     |      10.00 |          13.00 |
|  4 | Chef Anton's Cajun Seasoning     | Condiments     |      22.00 |          53.00 |
|  5 | Chef Anton's Gumbo Mix           | Condiments     |      21.35 |           0.00 |
|  6 | Grandma's Boysenberry Spread     | Condiments     |      25.00 |         120.00 |
|  7 | Uncle Bob's Organic Dried Pears  | Produce        |      30.00 |          15.00 |
|  8 | Northwoods Cranberry Sauce       | Condiments     |      40.00 |           6.00 |
|  9 | Mishi Kobe Niku                  | Meat/Poultry   |      97.00 |          29.00 |
...
| 71 | Flotemysost                      | Dairy Products |      21.50 |          26.00 |
| 72 | Mozzarella di Giovanni           | Dairy Products |      34.80 |          14.00 |
| 73 | Röd Kaviar                       | Seafood        |      15.00 |         101.00 |
| 74 | Longlife Tofu                    | Produce        |      10.00 |           4.00 |
| 75 | Rhönbräu Klosterbier             | Beverages      |       7.75 |         125.00 |
| 76 | Lakkalikööri                     | Beverages      |      18.00 |          57.00 |
| 77 | Original Frankfurter grüne Soße  | Condiments     |      13.00 |          32.00 |
+--------------------------------------------------------------------------------------+

C# DataFrame 排序

我们使用 OrderBy 对数据进行排序。

using Microsoft.Data.Analysis;
using Spectre.Console;

var file = Path.GetFullPath("products.csv");
var df = DataFrame.LoadCsv(file);

var table = new Table()
    .Border(TableBorder.Ascii)
    .BorderColor(Color.SteelBlue)
    .AddColumn(new TableColumn("Id").RightAligned())
    .AddColumn(new TableColumn("Product name"))
    .AddColumn(new TableColumn("Category").LeftAligned())
    .AddColumn(new TableColumn("Unit price").RightAligned())
    .AddColumn(new TableColumn("Units in stock").RightAligned());

foreach (var e in df.OrderBy("unit_price").Rows)
{
    string[] row = { $"{e[0]}", $"{e[1]}", $"{e[2]}", $"{e[3]:0.00}", $"{e[4]:0.00}" };
    table.AddRow(row);
}

AnsiConsole.Write(table);

在该程序中,我们按单价对数据进行排序。

$ dotnet run
+--------------------------------------------------------------------------------------+
| Id | Product name                     | Category       | Unit price | Units in stock |
|----+----------------------------------+----------------+------------+----------------|
| 33 | Geitost                          | Dairy Products |       2.50 |         112.00 |
| 24 | Guaraná Fantástica               | Beverages      |       4.50 |          20.00 |
| 13 | Konbu                            | Seafood        |       6.00 |          24.00 |
| 52 | Filo Mix                         | Grains/Cereals |       7.00 |          38.00 |
| 54 | Tourtière                        | Meat/Poultry   |       7.45 |          21.00 |
| 75 | Rhönbräu Klosterbier             | Beverages      |       7.75 |         125.00 |
| 23 | Tunnbröd                         | Grains/Cereals |       9.00 |          61.00 |
...
|  9 | Mishi Kobe Niku                  | Meat/Poultry   |      97.00 |          29.00 |
| 29 | Thüringer Rostbratwurst          | Meat/Poultry   |     123.79 |           0.00 |
| 38 | Côte de Blaye                    | Beverages      |     263.50 |          17.00 |
+--------------------------------------------------------------------------------------+

C# DataFrame GroupBy

GroupBy 方法按列名中的唯一值对 DataFrame 的行进行分组。 它返回一个 GroupBy 对象,该对象存储分组信息。

Program.cs
using Microsoft.Data.Analysis;
using Spectre.Console;

var file = Path.GetFullPath("products.csv");
var df = DataFrame.LoadCsv(file);

var table = new Table()
    .Border(TableBorder.Ascii)
    .BorderColor(Color.SteelBlue)
    .AddColumn(new TableColumn("Id").RightAligned())
    .AddColumn(new TableColumn("Product name"))
    .AddColumn(new TableColumn("Category"))
    .AddColumn(new TableColumn("Unit price").RightAligned())
    .AddColumn(new TableColumn("Units in stock").RightAligned());

var g = df.GroupBy("category");

foreach (var e in g.Head(100).Rows)
{
    string[] row = { $"{e[1]}", $"{e[2]}", $"{e[0]}", $"{e[3]:0.00}", $"{e[4]:0.00}" };
    table.AddRow(row);
}

AnsiConsole.Write(table);

在该示例中,我们按类别对产品进行分组,并在表格中显示它们。

$ dotnet run
+--------------------------------------------------------------------------------------+
| Id | Product name                     | Category       | Unit price | Units in stock |
|----+----------------------------------+----------------+------------+----------------|
|  1 | Chai                             | Beverages      |      18.00 |          39.00 |
|  2 | Chang                            | Beverages      |      19.00 |          17.00 |
| 24 | Guaraná Fantástica               | Beverages      |       4.50 |          20.00 |
| 34 | Sasquatch Ale                    | Beverages      |      14.00 |         111.00 |
| 35 | Steeleye Stout                   | Beverages      |      18.00 |          20.00 |
| 38 | Côte de Blaye                    | Beverages      |     263.50 |          17.00 |
| 39 | Chartreuse verte                 | Beverages      |      18.00 |          69.00 |
| 43 | Ipoh Coffee                      | Beverages      |      46.00 |          17.00 |
| 67 | Laughing Lumberjack Lager        | Beverages      |      14.00 |          52.00 |
| 70 | Outback Lager                    | Beverages      |      15.00 |          15.00 |
| 75 | Rhönbräu Klosterbier             | Beverages      |       7.75 |         125.00 |
| 76 | Lakkalikööri                     | Beverages      |      18.00 |          57.00 |
|  3 | Aniseed Syrup                    | Condiments     |      10.00 |          13.00 |
|  4 | Chef Anton's Cajun Seasoning     | Condiments     |      22.00 |          53.00 |
|  5 | Chef Anton's Gumbo Mix           | Condiments     |      21.35 |           0.00 |
|  6 | Grandma's Boysenberry Spread     | Condiments     |      25.00 |         120.00 |
|  8 | Northwoods Cranberry Sauce       | Condiments     |      40.00 |           6.00 |
...
| 52 | Filo Mix                         | Grains/Cereals |       7.00 |          38.00 |
| 56 | Gnocchi di nonna Alice           | Grains/Cereals |      38.00 |          21.00 |
| 57 | Ravioli Angelo                   | Grains/Cereals |      19.50 |          36.00 |
| 64 | Wimmers gute Semmelknödel        | Grains/Cereals |      33.25 |          22.00 |
+--------------------------------------------------------------------------------------+

在分组的数据上,我们可以计算聚合操作,例如 max、min 或 sum。

Program.cs
using Microsoft.Data.Analysis;
using Spectre.Console;

var file = Path.GetFullPath("products.csv");
var df = DataFrame.LoadCsv(file);

var table = new Table()
    .Border(TableBorder.Ascii)
    .BorderColor(Color.SteelBlue)
    .AddColumn(new TableColumn("Category"))
    .AddColumn(new TableColumn("Max price").RightAligned());

var g = df.GroupBy("category");
var df2 = g.Max("unit_price");

foreach (var e in df2.Rows)
{
    string[] row = { $"{e[0]}", $"{e[1]:0.00}" };
    table.AddRow(row);
}

AnsiConsole.Write(table);

在该示例中,我们计算每个类别的最高价格。

var g = df.GroupBy("category");

我们按类别对数据进行分组。

var df2 = g.Max("unit_price");

我们计算每个类别的最高价格。

$ dotnet run
+----------------------------+
| Category       | Max price |
|----------------+-----------|
| Beverages      |    263.50 |
| Condiments     |     43.90 |
| Produce        |     53.00 |
| Meat/Poultry   |    123.79 |
| Seafood        |     62.50 |
| Dairy Products |     55.00 |
| Confections    |     81.00 |
| Grains/Cereals |     38.00 |
+----------------------------+

来源

DataFrame 入门

在本文中,我们使用了 C# 中的 Microsoft 数据分析 DataFrame。

作者

我叫 Jan Bodnar,是一位充满热情的程序员,拥有丰富的编程经验。 我从 2007 年开始撰写编程文章。到目前为止,我已经撰写了超过 1,400 篇文章和 8 本电子书。 我拥有超过十年的编程教学经验。

列出所有 C# 教程