结构化查询语言/递归
有时,一个表的行以这样一种方式结构化,即它们表示此表内部的层次结构或网络。典型用例是管理结构、物料清单(一台机器由多台更小的机器组成,…)或网络结构(例如:航班计划)。
为了检索特定行以及与它们相关的全部行,可以使用集合运算结合子查询将它们合并到一个结果集中。但这种技术是有限的,因为必须精确地知道级别数。除了级别数因情况而异之外,子查询语法在不同级别之间也不同。为了克服这些限制,SQL 提供了一种语法以递归方式表达查询。它们检索所有受影响级别的行,而与它们的级别数无关。
SQL 标准使用其WITH 子句
的一种特殊形式(在上一页中进行了解释)来定义递归查询。该子句出现在 SELECT、INSERT、UPDATE 或 DELETE 关键字之前,并且是相应命令的一部分。
提示:WITH 子句
(带或不带“RECURSIVE”关键字)通常被称为“公用表表达式 (CTE)”。
提示:Oracle 从 11.2 版本开始支持 SQL 标准的语法。MySQL 8.0 支持 RECURSIVE 关键字。早期的 MySQL 版本根本不支持递归,并建议使用过程式解决方法。
-- The command starts with a 'with clause', which contains the optional 'RECURSIVE' key word.
WITH [RECURSIVE] intermediate_table (temp_column_name [,...]) AS
(SELECT ... FROM real_table -- initial query to a real table (1)
UNION ALL (3)
SELECT ... FROM intermediate_table -- repetitive query using the intermediate table (2)
)
-- The 'with clause' is part of a regular SELECT.
-- This SELECT refers to the final result of the 'with clause'. (4)
SELECT ... FROM intermediate_table
; -- consider the semicolon: the command runs from the 'WITH' up to here.
评估顺序如下
- 对真实表或视图的初始查询将被执行,并为步骤 2 创建起点。
- 通常,重复查询包含对真实表或视图与迄今为止构建的结果集的连接。重复此步骤,直到没有找到新行。
- 将步骤 1 和 2 中的结果集合并在一起。
- 最终的 SELECT 对步骤 3 的结果进行操作。
为了演示递归查询,我们定义了一个示例表。它保存有关人员及其祖先的信息。由于祖先始终是人,因此所有内容都存储在同一张表中。father_id 和 mother_id 充当对存储父亲和母亲信息的行的引用。father_id、mother_id 和 firstname 的组合充当标准,根据这三个值唯一标识行(我们假设父母给孩子起了不同的名字)。
CREATE TABLE family_tree (
-- define columns (name / type / default value / nullable)
id DECIMAL NOT NULL,
firstname VARCHAR(50) NOT NULL,
lastname VARCHAR(50) NOT NULL,
year_of_birth DECIMAL NOT NULL,
year_of_death DECIMAL,
father_id DECIMAL,
mother_id DECIMAL,
-- the primary key
CONSTRAINT family_tree_pk PRIMARY KEY (id),
-- an additional criterion to uniquely distinguish rows from each other
CONSTRAINT family_tree_uniq UNIQUE (father_id, mother_id, firstname),
-- two foreign keys (to the same table in this special case) to ensure that no broken links arise
CONSTRAINT family_tree_fk1 FOREIGN KEY (father_id) REFERENCES family_tree(id),
CONSTRAINT family_tree_fk2 FOREIGN KEY (mother_id) REFERENCES family_tree(id),
-- plausibility checks
CONSTRAINT family_tree_check1 CHECK ( year_of_birth >= 1800 AND year_of_birth < 2100),
CONSTRAINT family_tree_check2 CHECK ((year_of_death >= 1800 AND year_of_death < 2100) OR year_of_death IS NULL)
);
-- a fictional couple
INSERT INTO family_tree VALUES ( 1, 'Karl', 'Miller', 1855, 1905, null, null);
INSERT INTO family_tree VALUES ( 2, 'Lisa', 'Miller', 1851, 1912, null, null);
-- their children
INSERT INTO family_tree VALUES ( 3, 'Ruth', 'Miller', 1878, 1888, 1, 2);
INSERT INTO family_tree VALUES ( 4, 'Helen', 'Miller', 1880, 1884, 1, 2);
INSERT INTO family_tree VALUES ( 5, 'Carl', 'Miller', 1882, 1935, 1, 2);
INSERT INTO family_tree VALUES ( 6, 'John', 'Miller', 1883, 1900, 1, 2);
-- some more people; some of them are descendants of the Millers
INSERT INTO family_tree VALUES ( 7, 'Emily', 'Newton', 1880, 1940, null, null);
INSERT INTO family_tree VALUES ( 8, 'Charly', 'Miller', 1908, 1978, 5, 7);
INSERT INTO family_tree VALUES ( 9, 'Deborah','Brown', 1910, 1980, null, null);
INSERT INTO family_tree VALUES (10, 'Chess', 'Miller', 1948, null, 8, 9);
COMMIT;
作为第一个例子,我们检索卡尔·米勒先生及其所有后代。为此,我们必须检索他自己的行并定义一条规则,说明如何在此家谱中从一级到一级“导航”。
-- Choose a name for the intermediate table and its columns. The column names may differ from the names in the real table.
WITH intermediate_table (id, firstname, lastname) AS
(
-- Retrieve the starting row (or rows)
SELECT id, firstname, lastname
FROM family_tree
WHERE firstname = 'Karl'
AND lastname = 'Miller'
UNION ALL
-- Define the rule for querying the next level. In most cases this is done with a join operation.
SELECT f.id, f.firstname, f.lastname -- the alias 'f' refers to the real table
FROM intermediate_table i -- the alias 'i' refers to the intermediate table
JOIN family_tree f ON f.father_id = i.id OR f.mother_id = i.id -- the join operation defines, how to reach the next level
)
-- The final SELECT
SELECT * FROM intermediate_table;
可以使用 SQL 的所有语言功能来进一步处理中间表。(它不是一个真正的表,它只是一个具有表结构的中间结果)。例如,要计算后代的数量。
WITH ... -- The 'with clause' as above
-- The final SELECT
SELECT count(*) FROM intermediate_table
;
为了演示在没有递归 SELECT 的情况下出现的问题,我们将显示一个使用子查询的语法。
-- This query retrieves only Mr. Karl Miller ...
SELECT *
FROM family_tree
WHERE firstname = 'Karl'
AND lastname = 'Miller'
UNION ALL
-- ... and his children
SELECT *
FROM family_tree
WHERE father_id IN (SELECT id
FROM family_tree
WHERE firstname = 'Karl'
AND lastname = 'Miller'
)
;
每一级都有自己的语法,例如,要检索孙子辈,我们需要在子查询中使用子查询。
作为第二个例子,我们反向遍历层次结构:从一个人到他们的父系(男性血统)祖先。与第一个例子相比,两件事发生了变化。查询的起点不再是卡尔·米勒先生,因为他在我们的示例表中没有祖先。我们必须通过交换 id 和 father_id 来更改连接条件。
-- Retrieve ancestors
WITH intermediate_table (id, father_id, firstname, lastname) AS
(
-- Retrieve the starting row (or rows)
SELECT id, father_id, firstname, lastname -- now we need the 'father_id'
FROM family_tree
WHERE firstname = 'Chess'
AND lastname = 'Miller'
UNION ALL
-- Define the rule for querying the next level.
SELECT f.id, f.father_id, f.firstname, f.lastname
FROM intermediate_table i
JOIN family_tree f ON f.id = i.father_id -- compared with the first example this join operation defines the opposite direction
)
-- The final SELECT
SELECT * FROM intermediate_table
;
有时我们需要知道行属于层次结构或网络中的哪个级别。为了显示此级别,我们在查询中包含一个带任意名称的伪列。我们选择名称 hier_level(因为 level 是保存点上下文中保留的词)。
-- We extend the above example to show the hierarchy level
WITH intermediate_table (id, firstname, lastname, hier_level) AS
( SELECT id, firstname, lastname, 0 as hier_level -- set the level of the start point to a fix number
FROM family_tree
WHERE firstname = 'Karl'
AND lastname = 'Miller'
UNION ALL
SELECT f.id, f.firstname, f.lastname, i.hier_level + 1 -- increment the level
FROM intermediate_table i
JOIN family_tree f ON f.father_id = i.id OR f.mother_id = i.id
)
SELECT * FROM intermediate_table;
现在可以使用该级别,并且可以将其用作附加条件,例如,限制为前两个级别。
-- The with clause remains unchanged
...
SELECT * FROM intermediate_table WHERE hier_level < 2; -- restrict the result to the first two levels
-- or, as with the above solution the intermediate result set is computed over ALL levels and later restricted to the first two:
WITH intermediate_table (id, firstname, lastname, hier_level) AS
( SELECT id, firstname, lastname, 0 as hier_level
FROM family_tree
WHERE firstname = 'Karl'
AND lastname = 'Miller'
UNION ALL
SELECT f.id, f.firstname, f.lastname, i.hier_level + 1
FROM intermediate_table i
JOIN family_tree f ON f.father_id = i.id OR f.mother_id = i.id
WHERE hier_level < 1 -- restrict the join to the expected result
)
SELECT * FROM intermediate_table;
有时我们想从层次结构或网络的起点构建到实际行的路径,例如,对于多面分类(如 1.5.3)或简单地对访问的节点进行编号。这可以通过与计算级别类似的方式实现。我们需要一个带任意名称的伪列,并将实际值附加到已经形成的值。
-- Save the path from person to person in an additional column. We choose the name 'hier_path' as its name.
WITH intermediate_table (id, firstname, lastname, hier_level, hier_path) AS
( SELECT id, firstname, lastname, 0 as hier_level, firstname as hier_path -- we collect the given names
FROM family_tree
WHERE firstname = 'Karl'
AND lastname = 'Miller'
UNION ALL
-- The SQL standard knows only a two-parameter function concat(). We us it twice.
SELECT f.id, f.firstname, f.lastname, i.hier_level + 1, concat (concat (i.hier_path, ' / ' ), f.firstname)
FROM intermediate_table i
JOIN family_tree f ON f.father_id = i.id OR f.mother_id = i.id
)
SELECT * FROM intermediate_table;
有两种遍历层次结构和网络的方式。必须决定要首先处理哪种节点:子节点(下一级的节点)或兄弟节点(同一级的节点)。这两种方法分别称为深度优先 和 广度优先。使用关键字DEPTH FIRST
和 BREADTH FIRST
(默认值)可以在两种变体之间进行选择。
<with_clause>
SEARCH [DEPTH FIRST|BREADTH FIRST] BY <column_name> SET <sequence_number>
<select_clause>
这些关键字出现在WITH 子句
和SELECT 子句
之间。由于与 JAVA 或 C++ 等编程语言中的树不同,或者与 XML 实例不同,表的行没有隐式顺序,因此必须为节点定义其级别内的顺序。这在BY
关键字后面完成。在SET
关键字之后,定义一个额外的伪列的名称,其中自动存储对所有行的编号。
WITH intermediate_table (id, firstname, lastname, hier_level) AS
( SELECT id, firstname, lastname, 0 AS hier_level
FROM family_tree
WHERE firstname = 'Karl'
AND lastname = 'Miller'
UNION ALL
SELECT f.id, f.firstname, f.lastname, i.hier_level + 1
FROM intermediate_table i
JOIN family_tree f ON f.father_id = i.id OR f.mother_id = i.id
)
-- SEARCH BREADTH FIRST BY firstname SET sequence_number
SEARCH DEPTH FIRST BY firstname SET sequence_number
SELECT * FROM intermediate_table;
对上述查询有一些值得注意的说明
- 与本页上的其他查询(我们隐式使用了默认的
BREADTH FIRST
)相反,家谱以这样一种方式遍历:在每行之后,都处理其“子”行。这在级别 1 处很重要。 - 如果每级有多行,则根据
BY
定义对这些行进行排序:在本例中为 firstname。 - 这些行有一个序列号:在本例中为 sequence_number。可以使用此号码进行任何额外的处理。
检索切斯·米勒及其所有女性祖先。
WITH intermediate_table (id, mother_id, firstname, lastname) AS
(
SELECT id, mother_id, firstname, lastname
FROM family_tree
WHERE firstname = 'Chess'
AND lastname = 'Miller'
UNION ALL
SELECT f.id, f.mother_id, f.firstname, f.lastname
FROM intermediate_table i
JOIN family_tree f ON f.id = i.mother_id
)
SELECT * FROM intermediate_table;
检索切斯·米勒及其所有祖先:男性和女性。
WITH intermediate_table (id, father_id, mother_id, firstname, lastname) AS
(
SELECT id, father_id, mother_id, firstname, lastname
FROM family_tree
WHERE firstname = 'Chess'
AND lastname = 'Miller'
UNION ALL
SELECT f.id, f.father_id, f.mother_id, f.firstname, f.lastname
FROM intermediate_table i
-- extend the JOIN condition!
JOIN family_tree f ON (f.id = i.mother_id OR f.id = i.father_id)
)
SELECT * FROM intermediate_table;
为了使情况更清晰,在前面的查询中添加一个数字,显示实际级别。
WITH intermediate_table (id, father_id, mother_id, firstname, lastname, hier_level) AS
(
SELECT id, father_id, mother_id, firstname, lastname, 0 -- we start with '0'
FROM family_tree
WHERE firstname = 'Chess'
AND lastname = 'Miller'
UNION ALL
SELECT f.id, f.father_id, f.mother_id, f.firstname, f.lastname, i.hier_level + 1
FROM intermediate_table i
JOIN family_tree f ON (f.id = i.mother_id OR f.id = i.father_id)
)
SELECT * FROM intermediate_table;
为了使情况完全透明,用某种路径(孩子/父母/祖父母/...)替换级别。
WITH intermediate_table (id, father_id, mother_id, firstname, lastname, ancestry) AS
(
SELECT id, father_id, mother_id, firstname, lastname, firstname
FROM family_tree
WHERE firstname = 'Chess'
AND lastname = 'Miller'
UNION ALL
SELECT f.id, f.father_id, f.mother_id, f.firstname, f.lastname, concat (concat (i.ancestry, ' / '), f.firstname)
FROM intermediate_table i
JOIN family_tree f ON (f.id = i.mother_id OR f.id = i.father_id)
)
SELECT * FROM intermediate_table;
检索卡尔·米勒的所有孙辈。
WITH intermediate_table (id, father_id, mother_id, firstname, lastname, hier_level) AS
(
SELECT id, father_id, mother_id, firstname, lastname, 0 -- we start with '0'
FROM family_tree
WHERE firstname = 'Karl'
AND lastname = 'Miller'
UNION ALL
SELECT f.id, f.father_id, f.mother_id, f.firstname, f.lastname, i.hier_level + 1
FROM intermediate_table i
JOIN family_tree f ON (f.father_id = i.id AND hier_level < 2) -- performance: abort joining after the second level
)
SELECT * FROM intermediate_table WHERE hier_level = 2; -- this is the restriction to the grandchildren
检索 family_tree 表中的每个人,并显示其姓氏及其男性血统中第一个已知祖先的姓氏。
WITH intermediate_table (id, father_id, firstname, lastname, initial_row, hier_level) AS
( -- The starting points are persons (more than one in our example table) for which no father is known.
SELECT id, father_id, firstname, lastname, firstname, 0
FROM family_tree
WHERE father_id IS NULL
UNION ALL
-- The start name is preserved from level to level
SELECT f.id, f.father_id, f.firstname, f.lastname, i.initial_row, i.hier_level + 1
FROM intermediate_table i
JOIN family_tree f ON f.father_id = i.id
)
SELECT * FROM intermediate_table;
-- or:
... unchanged 'with clause'
SELECT id, firstname, '-->', initial_row, 'in ', hier_level, 'generation(s)' FROM intermediate_table;
a) 示例表中存储了多少卡尔·米勒的后代?
b) 与之前的问题相同,但按级别区分。
-- a)
WITH intermediate_table (id, firstname, lastname, hier_level) AS
( SELECT id, firstname, lastname, 0 AS hier_level
FROM family_tree
WHERE firstname = 'Karl'
AND lastname = 'Miller'
UNION ALL
SELECT f.id, f.firstname, f.lastname, i.hier_level + 1
FROM intermediate_table i
JOIN family_tree f ON f.father_id = i.id
)
SELECT count(*) FROM intermediate_table where hier_level > 0;
-- b) Use the same WITH clause. Only the final SELECT changes.
...
SELECT hier_level, count(hier_level) FROM intermediate_table WHERE hier_level > 0 GROUP BY hier_level;