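Ada Programming/Libraries/GNAT.String Split
Breaking a string into components at a set of separators can be done in several different ways. This article focuses on a solution that uses the GNAT.String_Split package.
Using the following examples in your own programs makes them less portable. The GNAT packages are only available with the GNAT GPL and GCC GNAT compilers, so your program will probably not compile with other Ada compilers.
Suppose you want to split a string into a set of individual components, for example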
This is a string
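into
This
is
a
string
That is exactly what the GNAT.String_Split package lets you do.
Let's go straight to the code that solves the string-splitting problem. Create a file named explode.adb and add this code to it: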
-- A procedure to illustrate the use of the GNAT.String_Split package. This
-- is just the simplest, most basic usage; the package can do a lot more, like
-- splitting on a char set, re-splitting the string with new separators, and
-- returning the separators found before and after each substring. Left as an
-- exercise for the reader. ;)
with Ada.Characters.Latin_1;
with Ada.Text_IO;
with GNAT.String_Split;
procedure Explode is
   use Ada.Characters;
   use Ada.Text_IO;
   use GNAT;
   Data : constant String :=
     "This becomes a " & Latin_1.HT & " bunch of substrings";
   -- The input data would normally be read from some external source or
   -- whatever. Latin_1.HT is a horizontal tab.
   Subs : String_Split.Slice_Set;
   -- Subs is populated by the actual substrings.
   Seps : constant String := " " & Latin_1.HT;
   -- Just an arbitrary simple set of whitespace.
begin
   Put_Line ("Splitting '" & Data & "' at whitespace.");
   -- Introduce our job.
   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);
   -- Create the split, using Multiple mode to treat strings of multiple
   -- whitespace characters as a single separator.
   -- This populates the Subs object.
   Put_Line
     ("Got" &
      String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) &
      " substrings:");
   -- Report results, starting with the count of substrings created.
   for I in 1 .. String_Split.Slice_Count (Subs) loop
      -- Loop through the substrings.
      declare
         Sub : constant String := String_Split.Slice (Subs, I);
         -- Pull the next substring out into a string object for easy handling.
      begin
         Put_Line (String_Split.Slice_Number'Image (I) &
                   " -> " &
                   Sub &
                   " (length" & Positive'Image (Sub'Length) &
                   ")");
         -- Output the individual substrings, and their length.
      end;
   end loop;
end Explode;
Compile and run the Explode program like this:
$ gnatmake explode.adb
$ ./explode
You should see output similar to this:
Splitting 'This becomes a bunch of substrings' at whitespace.
Got 6 substrings:
 1 -> This (length 4)
 2 -> becomes (length 7)
 3 -> a (length 1)
 4 -> bunch (length 5)
 5 -> of (length 2)
 6 -> substrings (length 10)
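The header comment of the example mentions that the package can also re-split a string with new separators and report the separators found around each substring. The following is only a rough sketch of that idea, not part of the original tutorial: the procedure name Resplit and the input string are made up for illustration, and the use of String_Split.Set and String_Split.Separators follows the package specification as shipped with GNAT, so check your own compiler's g-arrspl.ads if in doubt. The separators are printed as character codes because the value reported at the start and end of the string is a special marker rather than a printable character.

```ada
with Ada.Text_IO;
with GNAT.String_Split;

procedure Resplit is
   use Ada.Text_IO;
   use GNAT;

   Data : constant String := "key=value;other=thing";
   --  Hypothetical input, chosen only for this sketch.

   Subs : String_Split.Slice_Set;
begin
   --  First split at semicolons only.
   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => ";",
                        Mode       => String_Split.Single);
   Put_Line ("Slices after splitting at ';':" &
             String_Split.Slice_Number'Image
               (String_Split.Slice_Count (Subs)));

   --  Re-split the same data with a different separator set.
   String_Split.Set (S          => Subs,
                     Separators => ";=",
                     Mode       => String_Split.Single);

   for I in 1 .. String_Split.Slice_Count (Subs) loop
      declare
         Around : constant String_Split.Slice_Separators :=
           String_Split.Separators (Subs, I);
         --  The separators found before and after slice I.
      begin
         Put_Line
           (String_Split.Slice (Subs, I) &
            " (separator codes before/after:" &
            Natural'Image (Character'Pos (Around (String_Split.Before))) &
            Natural'Image (Character'Pos (Around (String_Split.After))) &
            ")");
      end;
   end loop;
end Resplit;
```

Set reuses the Slice_Set that Create filled, so the original data does not have to be passed in again.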
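The comments in the example more or less explain what is going on, but for clarity we will walk through the code step by step, starting with the dependencies and the use clauses:
with Ada.Characters.Latin_1;
with Ada.Text_IO;
with GNAT.String_Split;
procedure Explode is
   use Ada.Characters;
   use Ada.Text_IO;
   use GNAT;
The three with lines list the packages our program depends on. When the compiler encounters them, it retrieves those packages from its library. The line "procedure Explode is" marks the start of our program, specifically the declarative part, where we declare and initialize our constants and variables. It also names our program Explode. Note the use clauses. Adding them lets us write this
Put_Line ("Some text");
instead of this
Ada.Text_IO.Put_Line ("Some text");
in the program. Very handy.
As an exercise, try commenting out the three use clauses and adding the full package names to all types and procedures in the program.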
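One possible answer to that exercise, cut down to the essentials, might look like the sketch below. The procedure name Explode_Qualified is made up for this illustration; everything else uses only the packages already named in the example, spelled out in full because there are no use clauses.

```ada
with Ada.Characters.Latin_1;
with Ada.Text_IO;
with GNAT.String_Split;

procedure Explode_Qualified is
   --  No use clauses: every name carries its full package prefix.
   Data : constant String :=
     "This becomes a " & Ada.Characters.Latin_1.HT & " bunch of substrings";
   Seps : constant String := " " & Ada.Characters.Latin_1.HT;
   Subs : GNAT.String_Split.Slice_Set;
begin
   GNAT.String_Split.Create (S          => Subs,
                             From       => Data,
                             Separators => Seps,
                             Mode       => GNAT.String_Split.Multiple);

   --  Print each slice on its own line.
   for I in 1 .. GNAT.String_Split.Slice_Count (Subs) loop
      Ada.Text_IO.Put_Line (GNAT.String_Split.Slice (Subs, I));
   end loop;
end Explode_Qualified;
```

The code is longer, but each name now shows exactly where it comes from, which some teams prefer for larger programs.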
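Next we have this:
   Data : constant String :=
     "This becomes a " & Latin_1.HT & " bunch of substrings";
This is the String we want to split into individual components. Latin_1.HT is a constant declared in Ada.Characters.Latin_1; it inserts a horizontal tab into the string. Because we never change the value of Data anywhere in the program, we have declared it as a constant.
   Subs : String_Split.Slice_Set;
The Subs variable is the container for the individual components, or "slices".
   Seps : constant String := " " & Latin_1.HT;
These are our separators. In this case we split the string at spaces (" ") and horizontal tabs (Latin_1.HT). Note that the separators are not included in the resulting Slice_Set. Try experimenting with different separators.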
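As a starting point for such experiments, here is a small sketch that splits at commas and semicolons instead of whitespace. The procedure name Split_Fields and the input string are invented for this example. Each slice is printed in square brackets so that you can see the separators are gone.

```ada
with Ada.Text_IO;
with GNAT.String_Split;

procedure Split_Fields is
   use Ada.Text_IO;
   use GNAT;

   Data : constant String := "alpha,beta;gamma";
   --  Hypothetical input for this sketch.

   Seps : constant String := ",;";
   --  Every character in this string acts as a separator.

   Subs : String_Split.Slice_Set;
begin
   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);

   for I in 1 .. String_Split.Slice_Count (Subs) loop
      Put_Line ("[" & String_Split.Slice (Subs, I) & "]");
   end loop;
end Split_Fields;
```

Note that Seps is a set of single characters, not a multi-character delimiter: ",;" means "split at a comma or at a semicolon".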
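begin
   Put_Line ("Splitting '" & Data & "' at whitespace.");
begin marks the start of the body of our program. Right after begin we output a short message.
   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);
This is the heart of the program. In this statement the Data String is split into individual slices according to the Seps separators, and the resulting slices are placed in the Subs Slice_Set. Note the Mode => String_Split.Multiple parameter. In Multiple mode, String_Split.Create treats a run of consecutive spaces and horizontal tabs as a single separator.
As an exercise, try changing Multiple to Single and see what happens.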
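If you would rather see the two modes side by side, the following sketch runs the same split twice. The procedure name Compare_Modes, the helper Show and the input "a,,b" are all made up for this illustration. According to the package specification, Single mode cuts at every separator, so two adjacent separators leave an empty slice between them, while Multiple mode merges them.

```ada
with Ada.Text_IO;
with GNAT.String_Split;

procedure Compare_Modes is
   use Ada.Text_IO;
   use GNAT;

   Data : constant String := "a,,b";
   --  Hypothetical input with two adjacent separators.

   --  Split Data at commas in the given mode and list the slices.
   procedure Show (Mode : String_Split.Separator_Mode) is
      Subs : String_Split.Slice_Set;
   begin
      String_Split.Create (S          => Subs,
                           From       => Data,
                           Separators => ",",
                           Mode       => Mode);
      Put_Line (String_Split.Separator_Mode'Image (Mode) & ":" &
                String_Split.Slice_Number'Image
                  (String_Split.Slice_Count (Subs)) &
                " slices");
      for I in 1 .. String_Split.Slice_Count (Subs) loop
         Put_Line ("  [" & String_Split.Slice (Subs, I) & "]");
      end loop;
   end Show;
begin
   Show (String_Split.Single);
   Show (String_Split.Multiple);
end Compare_Modes;
```

The brackets make an empty slice visible as "[]", which is easy to miss otherwise.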
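   Put_Line
     ("Got" &
      String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) &
      " substrings:");
This is the code responsible for the output
Got 6 substrings:
Yes, that looks like a very long statement for so little output, but there is a reason for it:
String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs))
This expression produces the "6" part of the output. It converts the numeric value 6 into a String using the Image attribute. String_Split.Slice_Count (Subs) returns a Slice_Number, a numeric type derived from Natural (so an integer >= 0), and Image converts it into a String suitable for output. Image puts a blank in front of non-negative numbers, which is why the literal "Got" has no trailing space.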
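The behavior of the Image attribute can be tried out in isolation. The sketch below is not part of the original program; the procedure name Image_Demo is made up, and Ada.Strings.Fixed.Trim is brought in only to show one common way of removing the leading blank when it is not wanted.

```ada
with Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Image_Demo is
   use Ada.Text_IO;

   Count : constant Natural := 6;

   Raw : constant String := Natural'Image (Count);
   --  Image of a non-negative number starts with a blank: " 6".

   Trimmed : constant String :=
     Ada.Strings.Fixed.Trim (Raw, Ada.Strings.Left);
   --  The same text with the leading blank removed: "6".
begin
   Put_Line ("Got" & Raw & " substrings:");
   Put_Line ("Got " & Trimmed & " substrings:");
end Image_Demo;
```

Both lines print the same text; the second form is handy when the number is not preceded by a word, for example at the start of a line.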
   for I in 1 .. String_Split.Slice_Count (Subs) loop
      -- Loop through the substrings.
      declare
         Sub : constant String := String_Split.Slice (Subs, I);
         -- Pull the next substring out into a string object for easy handling.
      begin
         Put_Line (String_Split.Slice_Number'Image (I) &
                   " -> " &
                   Sub &
                   " (length" & Positive'Image (Sub'Length) &
                   ")");
         -- Output the individual substrings, and their length.
      end;
   end loop;
Here we start a loop that repeats String_Split.Slice_Count (Subs) times, in this case 6 times. So I is 1 in the first iteration and 6 in the last. Inside the loop we declare a new block. This lets us create the Sub constant afresh on every iteration, initialized with the next slice of our split. That is done with the String_Split.Slice function, which takes our Subs Slice_Set and the loop counter I as parameters and returns a String. In the body of the block we output each slice together with its index in the Subs Slice_Set and its length. As you can see, we again use the Image attribute to convert numeric values to Strings.
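The reason for the inner block is that a String object gets its length when it is declared and keeps it from then on, so a single Sub declared outside the loop could not hold slices of different lengths. The sketch below shows the same idiom without any splitting involved; the procedure name Block_Demo and the Stars constant are made up for this illustration.

```ada
with Ada.Text_IO;

procedure Block_Demo is
   use Ada.Text_IO;
begin
   for I in 1 .. 3 loop
      declare
         Stars : constant String := (1 .. I => '*');
         --  A new constant on every iteration, with a length that
         --  depends on the loop counter.
      begin
         Put_Line (Stars & " has length" & Natural'Image (Stars'Length));
      end;
   end loop;
end Block_Demo;
```

Each pass through the loop elaborates the declare block again, so Stars is one, two and then three characters long.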
You can get rid of the block inside the loop like this:
   for I in 1 .. String_Split.Slice_Count (Subs) loop
      -- Loop through the substrings.
      Put_Line
        (String_Split.Slice_Number'Image (I) &
         " -> " &
         String_Split.Slice (Subs, I) &
         " (length" & Positive'Image (String_Split.Slice (Subs, I)'Length) &
         ")");
      -- Output the individual substrings, and their length.
   end loop;
As you can see, we no longer use the Sub constant. Instead we call String_Split.Slice (Subs, I) directly, twice. It works the same way, but is arguably less readable.
Another option is to use Ada.Strings.Unbounded.Unbounded_String. One possible solution is shown here:
foobar.adb
with Ada.Characters.Latin_1;
with Ada.Strings.Unbounded;
with Ada.Text_IO;
with Ada.Text_IO.Unbounded_IO;
with GNAT.String_Split;
procedure Foobar is
   use Ada.Characters;
   use Ada.Strings.Unbounded;
   use Ada.Text_IO;
   use Ada.Text_IO.Unbounded_IO;
   use GNAT;
   Data : constant String :=
     "This becomes a " & Latin_1.HT & " bunch of substrings";
   -- The input data would normally be read from some external source or
   -- whatever. Latin_1.HT is a horizontal tab.
   Subs : String_Split.Slice_Set;
   -- Subs is populated by the actual substrings.
   Seps : constant String := " " & Latin_1.HT;
   -- Just an arbitrary simple set of whitespace.
   Sub : Unbounded_String;
   -- Object to hold a slice.
begin
   Put_Line ("Splitting '" & Data & "' at whitespace.");
   -- Introduce our job.
   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);
   -- Create the split, using Multiple mode to treat strings of multiple
   -- whitespace characters as a single separator.
   -- This populates the Subs object.
   Put_Line
     ("Got" &
      String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) &
      " substrings:");
   -- Report results, starting with the count of substrings created.
   for I in 1 .. String_Split.Slice_Count (Subs) loop
      -- Loop through the substrings.
      -- Note that we've avoided the block from the first example. This is
      -- possible because our Sub variable is now an Unbounded_String, which
      -- does not have to be declared with an initial length.
      Sub := To_Unbounded_String (String_Split.Slice (Subs, I));
      -- Pull the next substring out into an Unbounded_String object for
      -- easy handling. String_Split.Slice returns a String, which we convert
      -- to an Unbounded_String using the aptly named To_Unbounded_String
      -- function.
      Put (String_Split.Slice_Number'Image (I));
      Put (" -> ");
      Put (Sub);
      Put (" (length" & Positive'Image (Length (Sub)) & ")");
      New_Line;
   end loop;
end Foobar;
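Finally, back in the original program, we have
end Explode;
which simply ends the program.
That completes this small tutorial on how to split a string into individual parts (slices) at a set of separators. I hope you enjoyed reading it as much as I enjoyed writing it.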
External examples
- Search for examples of GNAT.String_Split in: Rosetta Code, GitHub (gists), any Alire crate or this Wikibook.
- Search for posts related to GNAT.String_Split in: Stack Overflow, comp.lang.ada or any Ada related page.