跳转到内容

结构化查询语言/递归

来自维基教科书,开放的书籍,开放的世界

有时,一个表的行以这样一种方式结构化,即它们表示此表内部的层次结构或网络。典型用例是管理结构、物料清单(一台机器由多台更小的机器组成,…)或网络结构(例如:航班计划)。

为了检索特定行以及与它们相关的全部行,可以使用集合运算结合子查询将它们合并到一个结果集中。但这种技术是有限的,因为必须精确地知道级别数。除了级别数因情况而异之外,子查询语法在不同级别之间也不同。为了克服这些限制,SQL 提供了一种语法以递归方式表达查询。它们检索所有受影响级别的行,而与它们的级别数无关。

SQL 标准使用其WITH 子句的一种特殊形式(在上一页中进行了解释)来定义递归查询。该子句出现在 SELECT、INSERT、UPDATE 或 DELETE 关键字之前,并且是相应命令的一部分。

提示:WITH 子句(带或不带“RECURSIVE”关键字)通常被称为“公用表表达式 (CTE)”。

提示:Oracle 从 11.2 版本开始支持 SQL 标准的语法。MySQL 8.0 支持 RECURSIVE 关键字。早期的 MySQL 版本根本不支持递归,并建议使用过程式解决方法。

-- The command starts with a 'with clause', which contains the optional 'RECURSIVE' key word.
WITH [RECURSIVE] intermediate_table (temp_column_name [,...]) AS 
  (SELECT ... FROM real_table          -- initial query to a real table                          (1)
     UNION ALL                                                                                   (3)
   SELECT ... FROM intermediate_table  -- repetitive query using the intermediate table          (2)
  )
-- The 'with clause' is part of a regular SELECT.
-- This SELECT refers to the final result of the 'with clause'.                                  (4)
SELECT ... FROM intermediate_table
; -- consider the semicolon: the command runs from the 'WITH' up to here.

评估顺序如下

  1. 对真实表或视图的初始查询将被执行,并为步骤 2 创建起点。
  2. 通常,重复查询包含对真实表或视图与迄今为止构建的结果集的连接。重复此步骤,直到没有找到新行。
  3. 将步骤 1 和 2 中的结果集合并在一起。
  4. 最终的 SELECT 对步骤 3 的结果进行操作。

示例表

[编辑 | 编辑源代码]

为了演示递归查询,我们定义了一个示例表。它保存有关人员及其祖先的信息。由于祖先始终是人,因此所有内容都存储在同一张表中。father_idmother_id 充当对存储父亲和母亲信息的行的引用。father_idmother_idfirstname 的组合充当标准,根据这三个值唯一标识行(我们假设父母给孩子起了不同的名字)。

CREATE TABLE family_tree (
  -- define columns (name / type / default value / nullable)
  id             DECIMAL      NOT NULL,
  firstname      VARCHAR(50)  NOT NULL,
  lastname       VARCHAR(50)  NOT NULL,
  year_of_birth  DECIMAL      NOT NULL,
  year_of_death  DECIMAL,
  father_id      DECIMAL,
  mother_id      DECIMAL,
  -- the primary key
  CONSTRAINT family_tree_pk   PRIMARY KEY (id),
  -- an additional criterion to uniquely distinguish rows from each other
  CONSTRAINT family_tree_uniq UNIQUE (father_id, mother_id, firstname),
  -- two foreign keys (to the same table in this special case) to ensure that no broken links arise
  CONSTRAINT family_tree_fk1  FOREIGN KEY (father_id) REFERENCES family_tree(id),
  CONSTRAINT family_tree_fk2  FOREIGN KEY (mother_id) REFERENCES family_tree(id),
  -- plausibility checks
  CONSTRAINT family_tree_check1 CHECK ( year_of_birth >= 1800 AND year_of_birth < 2100),
  CONSTRAINT family_tree_check2 CHECK ((year_of_death >= 1800 AND year_of_death < 2100) OR year_of_death IS NULL)
);

-- a fictional couple
INSERT INTO family_tree VALUES ( 1, 'Karl',   'Miller', 1855, 1905, null, null);
INSERT INTO family_tree VALUES ( 2, 'Lisa',   'Miller', 1851, 1912, null, null);
-- their children
INSERT INTO family_tree VALUES ( 3, 'Ruth',   'Miller', 1878, 1888, 1,    2);
INSERT INTO family_tree VALUES ( 4, 'Helen',  'Miller', 1880, 1884, 1,    2);
INSERT INTO family_tree VALUES ( 5, 'Carl',   'Miller', 1882, 1935, 1,    2);
INSERT INTO family_tree VALUES ( 6, 'John',   'Miller', 1883, 1900, 1,    2);
-- some more people; some of them are descendants of the Millers
INSERT INTO family_tree VALUES ( 7, 'Emily',  'Newton', 1880, 1940, null, null);
INSERT INTO family_tree VALUES ( 8, 'Charly', 'Miller', 1908, 1978, 5,    7);
INSERT INTO family_tree VALUES ( 9, 'Deborah','Brown',  1910, 1980, null, null);
INSERT INTO family_tree VALUES (10, 'Chess',  'Miller', 1948, null, 8,    9);
COMMIT;

基本查询

[编辑 | 编辑源代码]

作为第一个例子,我们检索卡尔·米勒先生及其所有后代。为此,我们必须检索他自己的行并定义一条规则,说明如何在此家谱中从一级到一级“导航”。

-- Choose a name for the intermediate table and its columns. The column names may differ from the names in the real table.
WITH intermediate_table (id, firstname, lastname) AS
(
  -- Retrieve the starting row (or rows)
  SELECT id, firstname, lastname
  FROM   family_tree
  WHERE  firstname = 'Karl'
  AND    lastname  = 'Miller'
    UNION ALL
  -- Define the rule for querying the next level. In most cases this is done with a join operation.
  SELECT f.id, f.firstname, f.lastname        -- the alias 'f' refers to the real table
  FROM   intermediate_table i                 -- the alias 'i' refers to the intermediate table
  JOIN   family_tree f ON f.father_id = i.id OR f.mother_id = i.id  -- the join operation defines, how to reach the next level
)
-- The final SELECT
SELECT * FROM intermediate_table;


可以使用 SQL 的所有语言功能来进一步处理中间表。(它不是一个真正的表,它只是一个具有表结构的中间结果)。例如,要计算后代的数量。

WITH ...  -- The 'with clause' as above
-- The final SELECT
SELECT count(*) FROM intermediate_table
;

为了演示在没有递归 SELECT 的情况下出现的问题,我们将显示一个使用子查询的语法。

-- This query retrieves only Mr. Karl Miller ...
SELECT * 
FROM   family_tree
WHERE  firstname = 'Karl'
AND    lastname  = 'Miller'
  UNION ALL
-- ... and his children
SELECT * 
FROM   family_tree
WHERE  father_id IN (SELECT id
                     FROM   family_tree
                     WHERE  firstname = 'Karl'
                     AND    lastname  = 'Miller'
                    )
;

每一级都有自己的语法,例如,要检索孙子辈,我们需要在子查询中使用子查询。

作为第二个例子,我们反向遍历层次结构:从一个人到他们的父系(男性血统)祖先。与第一个例子相比,两件事发生了变化。查询的起点不再是卡尔·米勒先生,因为他在我们的示例表中没有祖先。我们必须通过交换 id 和 father_id 来更改连接条件。

-- Retrieve ancestors
WITH intermediate_table (id, father_id, firstname, lastname) AS
(
  -- Retrieve the starting row (or rows)
  SELECT id, father_id, firstname, lastname  -- now we need the 'father_id'
  FROM   family_tree
  WHERE  firstname = 'Chess'
  AND    lastname  = 'Miller'
    UNION ALL
  -- Define the rule for querying the next level.
  SELECT f.id, f.father_id, f.firstname, f.lastname
  FROM   intermediate_table i
  JOIN   family_tree f ON f.id = i.father_id  -- compared with the first example this join operation defines the opposite direction
)
-- The final SELECT
SELECT * FROM intermediate_table
;

注意级别

[编辑 | 编辑源代码]

有时我们需要知道行属于层次结构或网络中的哪个级别。为了显示此级别,我们在查询中包含一个带任意名称的伪列。我们选择名称 hier_level(因为 level 是保存点上下文中保留的词)。

-- We extend the above example to show the hierarchy level
WITH intermediate_table (id, firstname, lastname, hier_level) AS
( SELECT id, firstname, lastname, 0 as hier_level         -- set the level of the start point to a fix number
  FROM   family_tree
  WHERE  firstname = 'Karl'
  AND    lastname  = 'Miller'
    UNION ALL
  SELECT f.id, f.firstname, f.lastname, i.hier_level + 1  -- increment the level
  FROM   intermediate_table i
  JOIN   family_tree f ON f.father_id = i.id OR f.mother_id = i.id
)
SELECT * FROM intermediate_table;

现在可以使用该级别,并且可以将其用作附加条件,例如,限制为前两个级别。

-- The with clause remains unchanged
...
SELECT * FROM intermediate_table WHERE hier_level < 2; -- restrict the result to the first two levels

-- or, as with the above solution the intermediate result set is computed over ALL levels and later restricted to the first two:
WITH intermediate_table (id, firstname, lastname, hier_level) AS
( SELECT id, firstname, lastname, 0 as hier_level
  FROM   family_tree
  WHERE  firstname = 'Karl'
  AND    lastname  = 'Miller'
    UNION ALL
  SELECT f.id, f.firstname, f.lastname, i.hier_level + 1
  FROM   intermediate_table i
  JOIN   family_tree f ON f.father_id = i.id OR f.mother_id = i.id
  WHERE  hier_level < 1   -- restrict the join to the expected result
)
SELECT * FROM intermediate_table;

创建路径

[编辑 | 编辑源代码]

有时我们想从层次结构或网络的起点构建到实际行的路径,例如,对于多面分类(如 1.5.3)或简单地对访问的节点进行编号。这可以通过与计算级别类似的方式实现。我们需要一个带任意名称的伪列,并将实际值附加到已经形成的值。

-- Save the path from person to person in an additional column. We choose the name 'hier_path' as its name. 
WITH intermediate_table (id, firstname, lastname, hier_level, hier_path) AS
( SELECT id, firstname, lastname, 0 as hier_level, firstname as hier_path -- we collect the given names
  FROM   family_tree
  WHERE  firstname = 'Karl'
  AND    lastname  = 'Miller'
    UNION ALL
  -- The SQL standard knows only a two-parameter function concat(). We us it twice.
  SELECT f.id, f.firstname, f.lastname, i.hier_level + 1, concat (concat (i.hier_path, ' / ' ), f.firstname)
  FROM   intermediate_table i
  JOIN   family_tree f ON f.father_id = i.id OR f.mother_id = i.id
)
SELECT * FROM intermediate_table;

深度优先/广度优先

[编辑 | 编辑源代码]

有两种遍历层次结构和网络的方式。必须决定要首先处理哪种节点:子节点(下一级的节点)或兄弟节点(同一级的节点)。这两种方法分别称为深度优先广度优先。使用关键字DEPTH FIRSTBREADTH FIRST(默认值)可以在两种变体之间进行选择。

<with_clause>
SEARCH [DEPTH FIRST|BREADTH FIRST] BY <column_name> SET <sequence_number>
<select_clause>

这些关键字出现在WITH 子句SELECT 子句之间。由于与 JAVA 或 C++ 等编程语言中的树不同,或者与 XML 实例不同,表的行没有隐式顺序,因此必须为节点定义其级别内的顺序。这在BY 关键字后面完成。在SET 关键字之后,定义一个额外的伪列的名称,其中自动存储对所有行的编号。

WITH intermediate_table (id, firstname, lastname, hier_level) AS
( SELECT id, firstname, lastname, 0 AS hier_level
  FROM   family_tree
  WHERE  firstname = 'Karl'
  AND    lastname  = 'Miller'
    UNION ALL
  SELECT f.id, f.firstname, f.lastname, i.hier_level + 1
  FROM   intermediate_table i
  JOIN   family_tree f ON f.father_id = i.id OR f.mother_id = i.id
)
-- SEARCH BREADTH FIRST BY firstname SET sequence_number
SEARCH DEPTH FIRST BY firstname SET sequence_number
SELECT * FROM intermediate_table;

对上述查询有一些值得注意的说明

  1. 与本页上的其他查询(我们隐式使用了默认的BREADTH FIRST)相反,家谱以这样一种方式遍历:在每行之后,都处理其“子”行。这在级别 1 处很重要。
  2. 如果每级有多行,则根据BY 定义对这些行进行排序:在本例中为 firstname
  3. 这些行有一个序列号:在本例中为 sequence_number。可以使用此号码进行任何额外的处理。

检索切斯·米勒及其所有女性祖先。

点击查看解决方案
WITH intermediate_table (id, mother_id, firstname, lastname) AS
(
  SELECT id, mother_id, firstname, lastname
  FROM   family_tree
  WHERE  firstname = 'Chess'
  AND    lastname  = 'Miller'
    UNION ALL
  SELECT f.id, f.mother_id, f.firstname, f.lastname
  FROM   intermediate_table i
  JOIN   family_tree f ON f.id = i.mother_id
)
SELECT * FROM intermediate_table;

检索切斯·米勒及其所有祖先:男性和女性。

点击查看解决方案
WITH intermediate_table (id, father_id, mother_id, firstname, lastname) AS
(
  SELECT id, father_id, mother_id, firstname, lastname
  FROM   family_tree
  WHERE  firstname = 'Chess'
  AND    lastname  = 'Miller'
    UNION ALL
  SELECT f.id, f.father_id, f.mother_id, f.firstname, f.lastname
  FROM   intermediate_table i
  -- extend the JOIN condition!
  JOIN   family_tree f ON (f.id = i.mother_id OR f.id = i.father_id)
)
SELECT * FROM intermediate_table;

为了使情况更清晰,在前面的查询中添加一个数字,显示实际级别。

点击查看解决方案
WITH intermediate_table (id, father_id, mother_id, firstname, lastname, hier_level) AS
(
  SELECT id, father_id, mother_id, firstname, lastname, 0  -- we start with '0'
  FROM   family_tree
  WHERE  firstname = 'Chess'
  AND    lastname  = 'Miller'
    UNION ALL
  SELECT f.id, f.father_id, f.mother_id, f.firstname, f.lastname, i.hier_level + 1
  FROM   intermediate_table i
  JOIN   family_tree f ON (f.id = i.mother_id OR f.id = i.father_id)
)
SELECT * FROM intermediate_table;

为了使情况完全透明,用某种路径(孩子/父母/祖父母/...)替换级别。

点击查看解决方案
WITH intermediate_table (id, father_id, mother_id, firstname, lastname, ancestry) AS
(
  SELECT id, father_id, mother_id, firstname, lastname, firstname
  FROM   family_tree
  WHERE  firstname = 'Chess'
  AND    lastname  = 'Miller'
    UNION ALL
  SELECT f.id, f.father_id, f.mother_id, f.firstname, f.lastname, concat (concat (i.ancestry, ' / '), f.firstname)
  FROM   intermediate_table i
  JOIN   family_tree f ON (f.id = i.mother_id OR f.id = i.father_id)
)
SELECT * FROM intermediate_table;

检索卡尔·米勒的所有孙辈。

点击查看解决方案
WITH intermediate_table (id, father_id, mother_id, firstname, lastname, hier_level) AS
(
  SELECT id, father_id, mother_id, firstname, lastname, 0   -- we start with '0'
  FROM   family_tree
  WHERE  firstname = 'Karl'
  AND    lastname  = 'Miller'
    UNION ALL
  SELECT f.id, f.father_id, f.mother_id, f.firstname, f.lastname, i.hier_level + 1
  FROM   intermediate_table i
  JOIN   family_tree f ON (f.father_id = i.id AND hier_level < 2) -- performance: abort joining after the second level
)
SELECT * FROM intermediate_table WHERE hier_level = 2; -- this is the restriction to the grandchildren

检索 family_tree 表中的每个人,并显示其姓氏及其男性血统中第一个已知祖先的姓氏。

点击查看解决方案
WITH intermediate_table (id, father_id, firstname, lastname, initial_row, hier_level) AS
( -- The starting points are persons (more than one in our example table) for which no father is known.
  SELECT id, father_id, firstname, lastname, firstname, 0
  FROM   family_tree
  WHERE  father_id IS NULL
    UNION ALL
  -- The start name is preserved from level to level
  SELECT f.id, f.father_id, f.firstname, f.lastname, i.initial_row, i.hier_level + 1
  FROM   intermediate_table i
  JOIN   family_tree f ON f.father_id = i.id
)
SELECT * FROM intermediate_table;
-- or:
... unchanged 'with clause'
SELECT id, firstname, '-->', initial_row, 'in ', hier_level, 'generation(s)' FROM intermediate_table;

a) 示例表中存储了多少卡尔·米勒的后代?
b) 与之前的问题相同,但按级别区分。

点击查看解决方案
-- a)
WITH intermediate_table (id, firstname, lastname, hier_level) AS
( SELECT id, firstname, lastname, 0 AS hier_level
  FROM   family_tree
  WHERE  firstname = 'Karl'
  AND    lastname  = 'Miller'
    UNION ALL
  SELECT f.id, f.firstname, f.lastname, i.hier_level + 1
  FROM   intermediate_table i
  JOIN   family_tree f ON f.father_id = i.id
)
SELECT count(*) FROM intermediate_table where hier_level > 0;
-- b) Use the same WITH clause. Only the final SELECT changes.
...
SELECT hier_level, count(hier_level) FROM intermediate_table WHERE hier_level > 0 GROUP BY hier_level;


华夏公益教科书