
Ada Programming/Libraries/GNAT.String_Split


Ada. Time-tested, safe and secure.

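Breaking a string into several components according to a set of separators can be done in many different ways. This article focuses on a solution that uses the GNAT.String_Split package.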

Caveat


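If you use the following example in your own programs, the result will be a less portable program. The GNAT packages are found only in the GPL and GCC GNAT compilers, so your program may not compile with other Ada compilers.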

You want to split a string into a set of individual components, for example

 This is a string 

into

 This
 is
 a
 string

This is exactly what the GNAT.String_Split package lets you do.

The GNAT.String_Split solution


Let's go straight to the code that solves the string-splitting problem. Create a file named explode.adb and add this code to it:

--  A procedure to illustrate the use of the GNAT.String_Split package.  This
--  is just the simplest, most basic usage; the package can do a lot more, like
--  splitting on a char set, re-split the string with new separators, and
--  return the separators found before and after each substring.  Left as an
--  exercise for the reader. ;)

with Ada.Characters.Latin_1;
with Ada.Text_IO; 
with GNAT.String_Split;

procedure Explode is
   use Ada.Characters;
   use Ada.Text_IO;
   use GNAT;
   
   Data : constant String :=
            "This becomes a " & Latin_1.HT & " bunch of     substrings";
   --  The input data would normally be read from some external source or 
   --  whatever. Latin_1.HT is a horizontal tab.
   
   Subs : String_Split.Slice_Set;
   --  Subs is populated by the actual substrings.
   
   Seps : constant String := " " & Latin_1.HT;  
   --  just an arbitrary simple set of whitespace.                                 
begin
   Put_Line ("Splitting '" & Data & "' at whitespace.");
   --  Introduce our job.
   
   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);
   --  Create the split, using Multiple mode to treat strings of multiple
   --  whitespace characters as a single separator.
   --  This populates the Subs object.
   
   Put_Line 
     ("Got" & 
      String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) &
      " substrings:");
   --  Report results, starting with the count of substrings created.
   
   for I in 1 .. String_Split.Slice_Count (Subs) loop
      --  Loop through the substrings.
      declare
         Sub : constant String := String_Split.Slice (Subs, I);
         --  Pull the next substring out into a string object for easy handling.
      begin
         Put_Line (String_Split.Slice_Number'Image (I) &
                   " -> " & 
                   Sub & 
                   " (length" & Positive'Image (Sub'Length) & 
                   ")");
         --  Output the individual substrings, and their length.
         
      end;
   end loop;
end Explode;

Compile and run the Explode program like this:

 $ gnatmake explode.adb
 $ ./explode

You should see output similar to this:

 Splitting 'This becomes a   bunch of     substrings' at whitespace.
 Got 6 substrings:
  1 -> This (length 4)
  2 -> becomes (length 7)
  3 -> a (length 1)
  4 -> bunch (length 5)
  5 -> of (length 2)
  6 -> substrings (length 10)

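The comments in the example more or less explain what is going on, but for clarity we will walk through the code step by step, starting with the dependencies and the use clauses: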

with Ada.Characters.Latin_1;
with Ada.Text_IO; 
with GNAT.String_Split;

procedure Explode is
   use Ada.Characters;
   use Ada.Text_IO;
   use GNAT;

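The three with lines list the packages our program depends on. When the compiler encounters them, it retrieves those packages from its library. The line procedure Explode is marks the start of our program, specifically the declarative part, where we declare and initialize our constants and variables. It also names our program Explode. Note the use clauses. Adding them lets us write this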

Put_Line ("Some text");

instead of this

Ada.Text_IO.Put_Line ("Some text");

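in the program. Very handy.

As an exercise, try commenting out the three use clauses and prefixing every type and subprogram in the program with its full package name.

Next we have this: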

Data : constant String :=
            "This becomes a " & Latin_1.HT & " bunch of     substrings";

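This is the String we want to split into individual components. Latin_1.HT is a constant declared in Ada.Characters.Latin_1; it inserts a horizontal tab into the string. Since we never change the value of Data anywhere in the program, we have declared it as a constant.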

Subs : String_Split.Slice_Set;

The Subs variable is the container for the individual components, or "slices".

Seps : constant String := " " & Latin_1.HT;

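These are our separators. In this example, we split the string at spaces (" ") and horizontal tabs (Latin_1.HT). Note that the separators are not included in the resulting Slice_Set. Try experimenting with different separators.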
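The separators do not have to be given as a String. As a minimal sketch, assuming the overload of String_Split.Create that accepts an Ada.Strings.Maps.Character_Set, the following program splits on commas, semicolons and tabs. The procedure name Split_On_Set and the input data are made up for illustration.

```ada
with Ada.Characters.Latin_1;
with Ada.Strings.Maps;
with Ada.Text_IO;
with GNAT.String_Split;

procedure Split_On_Set is
   use Ada.Characters;
   use GNAT;

   Data : constant String := "alpha,beta;gamma" & Latin_1.HT & "delta";
   --  Hypothetical input that mixes three different separator characters.

   Seps : constant Ada.Strings.Maps.Character_Set :=
            Ada.Strings.Maps.To_Set (",;" & Latin_1.HT);
   --  The separators expressed as a character set instead of a String.

   Subs : String_Split.Slice_Set;
begin
   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);

   for I in 1 .. String_Split.Slice_Count (Subs) loop
      --  Print each slice on its own line.
      Ada.Text_IO.Put_Line (String_Split.Slice (Subs, I));
   end loop;
end Split_On_Set;
```

With this input the program should print alpha, beta, gamma and delta, one per line. A Character_Set is convenient when the separators come from a predefined set, such as one built with Ada.Strings.Maps.To_Set from a range of characters.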

begin
   Put_Line ("Splitting '" & Data & "' at whitespace.");

begin marks the start of the body of our program. Right after begin, we output a short message.

String_Split.Create (S          => Subs,
                     From       => Data,
                     Separators => Seps,
                     Mode       => String_Split.Multiple);

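This is the heart of the program. In this statement, the Data String is split into individual slices according to the Seps separators, and the resulting slices are placed in the Subs Slice_Set. Note the Mode => String_Split.Multiple parameter. In Multiple mode, String_Split.Create treats a run of consecutive spaces and horizontal tabs as a single separator.

As an exercise, try changing Multiple to Single and see what happens.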
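To make the difference between the two modes concrete, here is a small sketch that splits the same input twice and reports the slice counts. The procedure name Compare_Modes and the input "a  b" (two consecutive spaces) are chosen purely for illustration.

```ada
with Ada.Text_IO;
with GNAT.String_Split;

procedure Compare_Modes is
   use GNAT;

   Data : constant String := "a  b";
   --  Two consecutive spaces between "a" and "b".

   Single_Subs   : String_Split.Slice_Set;
   Multiple_Subs : String_Split.Slice_Set;
begin
   --  Single mode: every separator character ends a slice, so two
   --  adjacent separators produce an empty slice between them.
   String_Split.Create (S          => Single_Subs,
                        From       => Data,
                        Separators => " ",
                        Mode       => String_Split.Single);

   --  Multiple mode: a run of separators counts as one separator.
   String_Split.Create (S          => Multiple_Subs,
                        From       => Data,
                        Separators => " ",
                        Mode       => String_Split.Multiple);

   Ada.Text_IO.Put_Line
     ("Single:" &
      String_Split.Slice_Number'Image
        (String_Split.Slice_Count (Single_Subs)));
   Ada.Text_IO.Put_Line
     ("Multiple:" &
      String_Split.Slice_Number'Image
        (String_Split.Slice_Count (Multiple_Subs)));
end Compare_Modes;
```

Single mode should report 3 slices here ("a", an empty slice, and "b"), while Multiple mode should report 2. Single mode is the better fit for data such as delimited records, where an empty field carries meaning.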

Put_Line 
     ("Got" & 
      String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) &
      " substrings:");

This is the statement responsible for the output

 Got 6 substrings:

Yes, it looks like a very long statement for so little output, but there is a reason for that:

String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs))

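This expression is responsible for the "6" part of the output. All it does is convert the numeric value 6 to a String, using the Image attribute. String_Split.Slice_Count (Subs) returns a Slice_Number, which is basically just an Integer with a value >= 0, and Image then converts it to a String suitable for output. Image puts a leading blank in front of a non-negative number, which is why the string literal "Got" has no trailing space.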
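If the leading blank produced by Image is unwanted, one option is to strip it with Ada.Strings.Fixed.Trim. The sketch below is independent of GNAT.String_Split; the procedure name Trim_Image and the value 6 are illustrative only.

```ada
with Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Trim_Image is
   Count : constant Natural := 6;

   Raw : constant String := Natural'Image (Count);
   --  Image of a non-negative number starts with a blank.

   Trimmed : constant String :=
               Ada.Strings.Fixed.Trim (Raw, Ada.Strings.Left);
   --  Trim removes the blanks on the left-hand side.
begin
   Ada.Text_IO.Put_Line ("[" & Raw & "]");      --  prints "[ 6]"
   Ada.Text_IO.Put_Line ("[" & Trimmed & "]");  --  prints "[6]"
end Trim_Image;
```

The Explode program does not need this, because it relies on the blank to separate "Got" from the number.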

for I in 1 .. String_Split.Slice_Count (Subs) loop
   --  Loop through the substrings.
   declare
      Sub : constant String := String_Split.Slice (Subs, I);
      --  Pull the next substring out into a string object for easy handling.
   begin
      Put_Line (String_Split.Slice_Number'Image (I) &
                " -> " & 
                Sub & 
                " (length" & Positive'Image (Sub'Length) & 
                ")");
      --  Output the individual substrings, and their length.    
   end;
end loop;

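Here we start a loop that repeats String_Split.Slice_Count (Subs) times, 6 in this example. So I is 1 in the first iteration and 6 in the last. Inside the loop we declare a new block. This lets us initialize the Sub constant afresh on every iteration with the next slice from our split. That is done with the String_Split.Slice function, which takes our Subs Slice_Set and the I loop counter as parameters and returns a String. In the body of the block we output each slice together with its index in the Subs Slice_Set and its length. As you can see, we again use the Image attribute to convert numeric values to Strings.

You can get rid of the block inside the loop like this: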

for I in 1 .. String_Split.Slice_Count (Subs) loop
   --  Loop through the substrings.
   Put_Line 
     (String_Split.Slice_Number'Image (I) &
      " -> " & 
      String_Split.Slice (Subs, I) & 
      " (length" & Positive'Image (String_Split.Slice (Subs, I)'Length) & 
      ")");
   --  Output the individual substrings, and their length.
end loop;

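As you can see, we no longer use the Sub constant. Instead we call String_Split.Slice (Subs, I) directly, twice per iteration. It works the same way, but is perhaps a bit less readable.

Another option is to use Ada.Strings.Unbounded.Unbounded_String. You can see a possible solution here: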

foobar.adb

with Ada.Characters.Latin_1;
with Ada.Strings.Unbounded;
with Ada.Text_IO;
with Ada.Text_IO.Unbounded_IO;
with GNAT.String_Split;

procedure Foobar is

  use Ada.Characters;
  use Ada.Strings.Unbounded;
  use Ada.Text_IO;
  use Ada.Text_IO.Unbounded_IO;
  use GNAT;

  Data : constant String := 
           "This becomes a " & Latin_1.HT & " bunch of     substrings";
  --  The input data would normally be read from some external source or
  --  whatever. Latin_1.HT is a horizontal tab.

  Subs : String_Split.Slice_Set;
  --  Subs is populated by the actual substrings.

  Seps : constant String := " " & Latin_1.HT;  
  --  just arbitrary simple set of whitespace.

  Sub : Unbounded_String;
  --  Object that holds one slice at a time.

begin

  Put_Line ("Splitting '" & Data & "' at whitespace.");
  --  Introduce our job

  String_Split.Create (S          => Subs,
                       From       => Data,
                       Separators => Seps,
                       Mode       => String_Split.Multiple);
  --  Create the split, using Multiple mode to treat strings of multiple
  --  whitespace characters as a single separator.
  --  This populates the Subs object.

  Put_Line 
    ("Got" & 
     String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) &
     " substrings:");
  --  Report results, starting with the count of substrings created

  for I in 1 .. String_Split.Slice_Count (Subs) loop
     --  Loop through the substrings

     --  Note that we've avoided the block from the first example. This is
     --  possible because our Sub variable is now an Unbounded_String, which
     --  does not have to be declared with an initial length.

     Sub := To_Unbounded_String (String_Split.Slice (Subs, I));
     --  Pull the next substring out into an Unbounded_String object for
     --  easy handling. String_Split.Slice returns a String, which we convert
     --  to an Unbounded_String using the aptly named To_Unbounded_String
     --  function.

     Put (String_Split.Slice_Number'Image (I));
     Put (" -> "); 
     Put (Sub); 
     Put (" (length" & Positive'Image (Length (Sub)) & ")");
     New_Line;
  end loop;

end Foobar;

Finally we have

end Explode;

which simply ends the program.
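Because of the portability caveat mentioned at the start, it may be worth comparing the above with a version that avoids GNAT packages altogether. The following is only a rough sketch of one possible approach, using Find_Token from the standard Ada.Strings.Fixed package. It behaves roughly like Multiple mode, does not build a container of slices, and the procedure name Portable_Split is made up for this example.

```ada
with Ada.Characters.Latin_1;
with Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Strings.Maps;
with Ada.Text_IO;

procedure Portable_Split is
   Data : constant String :=
            "This becomes a " & Ada.Characters.Latin_1.HT &
            " bunch of     substrings";

   Seps : constant Ada.Strings.Maps.Character_Set :=
            Ada.Strings.Maps.To_Set (" " & Ada.Characters.Latin_1.HT);

   From  : Positive := Data'First;
   --  Where the search for the next token starts.

   First : Positive;
   Last  : Natural;
   --  Bounds of the token found by Find_Token.
begin
   while From <= Data'Last loop
      --  Find the next run of characters that are outside Seps.
      Ada.Strings.Fixed.Find_Token
        (Source => Data (From .. Data'Last),
         Set    => Seps,
         Test   => Ada.Strings.Outside,
         First  => First,
         Last   => Last);

      exit when Last = 0;
      --  Last = 0 means no further token was found.

      Ada.Text_IO.Put_Line (Data (First .. Last));
      From := Last + 1;
   end loop;
end Portable_Split;
```

This should print the same six substrings as Explode, one per line, without the counts and lengths.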

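And with that, we conclude this little tutorial on how to split a string into individual parts (slices) according to a set of separators. I hope you enjoyed reading it as much as I enjoyed writing it.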
