Haskell/Arrow 教程

Arrow 教程

高级 Haskell

幺半群
 应用函子
 可折叠
 可遍历
 Arrow 教程
 理解箭头
延续传递风格
 拉链
镜头
 共幺半群
值递归 (MonadFix)
有副作用的流式处理
 可变对象
并发
模板 Haskell
类型族

箭头为用基本函子类结构化计算提供了另一种方法。本章提供了关于它们的动手教程，而下一章，理解箭头，用概念概述作为补充。我们建议您从教程开始，以便您可以体验使用箭头的编程感觉。当然，如果您喜欢以更慢的速度进行，您可以在教程和“理解箭头”的第一部分之间来回切换。务必在 GHC(i) 上遵循教程的每一步。

Stephen 的 Arrow 教程

在本教程中，我将创建自己的箭头，展示如何使用箭头 proc 语法，并展示 ArrowChoice 如何工作。最终我们将得到一个简单的猜词游戏。

首先，我们给出语言 pragma（省略）以在编译器中启用箭头 do 语法。然后是一些导入

module Main where

import Control.Arrow
import Control.Monad
import qualified Control.Category as Cat
import Data.List
import Data.Maybe
import System.Random

任何 Haskell 函数都可以作为箭头，因为函数类型构造函数 (->) 有一个 Arrow 实例。在本教程中，我将构建一个比这更有意思的箭头，它能够维护状态（普通的 Haskell 函数箭头无法做到）。箭头可以产生各种效果，包括 I/O，但我们只会探讨一些简单的例子。

我们将把我们的新箭头称为 Circuit，以暗示我们可以将箭头可视化为电路。^[1]

`Circuit` 的类型定义

作为箭头的普通 Haskell 函数具有 a -> b 类型。我们的 Circuit 箭头有两个显著特征：首先，我们将它包装在一个 newtype 声明中，以清晰地定义一个 Arrow 实例。其次，为了使电路能够维护其内部状态，我们的箭头返回一个替换它自身的箭头，以及正常的 b 输出值。

newtype Circuit a b = Circuit { unCircuit :: a -> (Circuit a b, b) }

为了使它成为一个箭头，我们需要使其成为 Category 和 Arrow 的实例。在这些定义中，我们始终将每个 Circuit 替换为它返回的新版本。

instance Cat.Category Circuit where
    id = Circuit $ \a -> (Cat.id, a)
    (.) = dot
      where
        (Circuit cir2) `dot` (Circuit cir1) = Circuit $ \a ->
            let (cir1', b) = cir1 a
                (cir2', c) = cir2 b
            in  (cir2' `dot` cir1', c)

Cat.id 函数用自身的一个副本替换自身，而不维护任何状态。(.) 函数的目的是将两个箭头从右到左连接起来。(>>>) 和 (<<<) 基于 (.)。它需要用执行参数 Circuits 返回的两个替换的点运算替换自身。

instance Arrow Circuit where
    arr f = Circuit $ \a -> (arr f, f a)
    first (Circuit cir) = Circuit $ \(b, d) ->
        let (cir', c) = cir b
        in  (first cir', (c, d))

arr 将普通 Haskell 函数提升为箭头。与 id 一样，它给出的替换只是自身，因为普通的 Haskell 函数无法维护状态。

现在我们需要一个函数来运行电路

runCircuit :: Circuit a b -> [a] -> [b]
runCircuit _   []     = []
runCircuit cir (x:xs) =
    let (cir',x') = unCircuit cir x
    in  x' : runCircuit cir' xs

对于像我一样的 mapAccumL 爱好者，这也可以写成

runCircuit :: Circuit a b -> [a] -> [b]
runCircuit cir inputs =
    snd $ mapAccumL (\cir x -> unCircuit cir x) cir inputs

或者，在 eta-reduction 后，简单地写成

runCircuit :: Circuit a b -> [a] -> [b]
runCircuit cir = snd . mapAccumL unCircuit cir

`Circuit` 原语

让我们定义一个通用的累加器作为我们之后工作的基础。accum' 是 accum 的一个不那么通用的版本。

-- | Accumulator that outputs a value determined by the supplied function.
accum :: acc -> (a -> acc -> (b, acc)) -> Circuit a b
accum acc f = Circuit $ \input ->
    let (output, acc') = input `f` acc
    in  (accum acc' f, output)

-- | Accumulator that outputs the accumulator value.
accum' :: b -> (a -> b -> b) -> Circuit a b
accum' acc f = accum acc (\a b -> let b' = a `f` b in (b', b'))

这里是一个有用的具体累加器，它保持所有作为输入传递给它的数字的运行总和。

total :: Num a => Circuit a a
total = accum' 0 (+)

我们可以像这样运行这个电路

*Main> runCircuit total [1,0,1,0,0,2]
[1,1,2,2,2,4]
*Main>

Arrow `proc` 语法

这里有一个统计平均值函数

mean1 :: Fractional a => Circuit a a
mean1 = (total &&& (const 1 ^>> total)) >>> arr (uncurry (/))

它维护两个累加器单元，一个用于总和，一个用于元素数量。它使用“扇出”运算符 &&& 拆分输入，在第二个流的输入之前，它丢弃输入值并用 1 替换它。

const 1 ^>> total 是 arr (const 1) >>> total 的简写。第一个流是输入的总和。第二个流是每个输入的 1 之和（即输入数量的计数）。然后，它使用 (/) 运算符合并这两个流。

这里有同一个函数，但使用箭头 proc 语法编写

mean2 :: Fractional a => Circuit a a
mean2 = proc value -> do
    t <- total -< value
    n <- total -< 1
    returnA -< t / n

proc 语法描述了箭头之间的相同关系，但以完全不同的方式。您不是显式描述布线，而是使用变量绑定和纯 Haskell 表达式将箭头粘合在一起，编译器为您完成所有 arr, (>>>), (&&&) 工作。箭头 proc 语法也包含一个纯 'let' 语句，与单子 do 语句完全一样。

proc 是引入箭头符号的关键字，它将箭头输入绑定到一个模式（本例中为 value）。do 块中的箭头语句采用以下形式之一

变量绑定模式 <- 箭头 -> 给出箭头输入的纯表达式
箭头 -> 给出箭头输入的纯表达式

与单子一样，do 关键字仅用于使用 <- 将多个行与变量绑定模式组合在一起。与单子一样，最后一行不允许包含变量绑定模式，最后一行输出的值是箭头的输出值。returnA 就像 'total' 一样是一个箭头（实际上，returnA 只是恒等箭头，定义为 arr id）。

同样与单子一样，除了最后一行以外的其他行可能没有变量绑定，并且你只能得到效果，丢弃返回值。在 Circuit 中，这样做永远没有意义（因为没有状态可以逃逸，除了通过返回值），但在许多箭头中会有意义。

如你所见，对于此示例，proc 符号使代码更易读。让我们试一试

*Main> runCircuit mean1 [0,10,7,8]
[0.0,5.0,5.666666666666667,6.25]
*Main> runCircuit mean2 [0,10,7,8]
[0.0,5.0,5.666666666666667,6.25]
*Main>

Hangman：选择一个词

现在，对于我们的 Hangman 游戏，让我们从字典中选择一个词

generator :: Random a => (a, a) -> StdGen -> Circuit () a
generator range rng = accum rng $ \() rng -> randomR range rng

dictionary = ["dog", "cat", "bird"]

pickWord :: StdGen -> Circuit () String
pickWord rng = proc () -> do
    idx <- generator (0, length dictionary-1) rng -< ()
    returnA -< dictionary !! idx

使用 generator，我们使用累加器功能来保存我们的随机数生成器。pickWord 没有引入任何新内容，只是生成器箭头是由一个接受参数的 Haskell 函数构建的。以下是输出

*Main> rng <- getStdGen
*Main> runCircuit (pickWord rng) [(), (), ()]
["dog","bird","dog"]
*Main>

我们将在稍后使用这些小箭头。第一个在第一次返回 True，之后永远返回 False

oneShot :: Circuit () Bool
oneShot = accum True $ \_ acc -> (acc, False)

*Main> runCircuit oneShot [(), (), (), (), ()]
[True,False,False,False,False]

第二个存储一个值并返回它，当它得到一个新值时

delayedEcho :: a -> Circuit a a
delayedEcho acc = accum acc (\a b -> (b,a))

可以缩短为

delayedEcho :: a -> Circuit a a
delayedEcho acc = accum acc (flip (,))

*Main> runCircuit (delayedEcho False) [True, False, False, False, True] 
[False,True,False,False,False]

游戏的主要箭头将重复执行，我们希望只在第一次迭代时选择单词，并在游戏剩余时间内记住它。我们不希望在后续循环中仅仅掩盖它的输出，而是希望实际运行 pickWord 仅一次（因为在实际实现中它可能非常慢）。然而，就目前而言，Circuit 中的数据流必须经过所有组件箭头的路径。为了允许数据流经过一条路径而不经过另一条路径，我们需要使我们的箭头成为 ArrowChoice 的实例。以下是最小定义

instance ArrowChoice Circuit where
    left orig@(Circuit cir) = Circuit $ \ebd -> case ebd of
        Left b -> let (cir', c) = cir b
                  in  (left cir', Left c)
        Right d -> (left orig, Right d)

getWord :: StdGen -> Circuit () String
getWord rng = proc () -> do
    -- If this is the first game loop, run pickWord. mPicked becomes Just <word>.
    -- On subsequent loops, mPicked is Nothing.
    firstTime <- oneShot -< ()
    mPicked <- if firstTime
        then do
            picked <- pickWord rng -< ()
            returnA -< Just picked
        else returnA -< Nothing
    -- An accumulator that retains the last 'Just' value.
    mWord <- accum' Nothing mplus -< mPicked
    returnA -< fromJust mWord

因为 ArrowChoice 已定义，编译器现在允许我们在 <- 后面放置一个 if，从而选择要执行的箭头（要么运行 pickWord，要么跳过它）。请注意，这不是普通的 Haskell if：编译器使用 ArrowChoice 实现它。编译器在这里也以相同的方式实现 case。

重要的是要理解，所有局部名称绑定，包括 proc 参数，在 <- 和 -> 之间都不在范围内，除了在 if 或 case 的条件中。例如，以下操作是非法的

{-
proc rng -> do
    idx <- generator (0, length dictionary-1) rng -< ()  -- ILLEGAL
    returnA -< dictionary !! idx
-}

要执行的箭头，这里为 generator (0, length dictionary -1) rng，在 'proc' 语句外部存在的范围内进行评估。rng 在此范围内不存在。如果你仔细想想，这很有道理，因为箭头只在开头（在 proc 外部）构建。如果它是在每次执行箭头时构建的，那么它如何保持其状态呢？

让我们试一试 getWord

*Main> rng <- getStdGen
*Main> runCircuit (getWord rng) [(), (), (), (), (), ()]
["dog","dog","dog","dog","dog","dog"]
*Main>

Hangman：主程序

现在，以下是游戏

attempts :: Int
attempts = 5

livesLeft :: Int -> String
livesLeft hung = "Lives: ["
              ++ replicate (attempts - hung) '#'
              ++ replicate hung ' '
              ++ "]"

hangman :: StdGen -> Circuit String (Bool, [String])
hangman rng = proc userInput -> do
    word <- getWord rng -< ()
    let letter = listToMaybe userInput
    guessed <- updateGuess -< (word, letter)
    hung <- updateHung -< (word, letter)
    end <- delayedEcho True -< not (word == guessed || hung >= attempts)
    let result = if word == guessed
                   then [guessed, "You won!"]
                   else if hung >= attempts
                       then [guessed, livesLeft hung, "You died!"]
                       else [guessed, livesLeft hung]
    returnA -< (end, result)
  where
    updateGuess :: Circuit (String, Maybe Char) String
    updateGuess = accum' (repeat '_') $ \(word, letter) guess ->
        case letter of
            Just l  -> map (\(w, g) -> if w == l then w else g) (zip word guess)
            Nothing -> take (length word) guess

    updateHung :: Circuit (String, Maybe Char) Int
    updateHung = proc (word, letter) -> do
        total -< case letter of
            Just l  -> if l `elem` word then 0 else 1
            Nothing -> 0


main :: IO ()
main = do
    rng <- getStdGen
    interact $ unlines                      -- Concatenate lines out output
        . ("Welcome to Arrow Hangman":)     -- Prepend a greeting to the output
        . concat . map snd . takeWhile fst  -- Take the [String]s as long as the first element of the tuples is True
        . runCircuit (hangman rng)          -- Process the input lazily
        . ("":)                             -- Act as if the user pressed ENTER once at the start
        . lines                             -- Split input into lines

这里是一个示例会话。为了获得最佳效果，请编译游戏并从终端而不是从 GHCi 中运行它

Welcome to Arrow Hangman
___
Lives: [#####]
a
___
Lives: [#### ]
g
__g
Lives: [#### ]
d
d_g
Lives: [#### ]
o
dog
You won!

高级内容

在本节中，我将完成对箭头符号的介绍。

将箭头命令与函数组合

我们这样实现 mean2

mean2 :: Fractional a => Circuit a a
mean2 = proc value -> do
    t <- total -< value
    n <- total -< 1
    returnA -< t / n

GHC 定义了一个香蕉括号语法，用于将箭头语句与对箭头进行操作的函数组合在一起。（在 Ross Paterson 的论文 ^[2] 中使用了 form 关键字，但 GHC 代替采用了香蕉括号。）尽管没有真正的原因，但我们可以这样编写 mean

mean3 :: Fractional a => Circuit a a
mean3 = proc value -> do
    (t, n) <- (| (&&&) (total -< value) (total -< 1) |)
    returnA -< t / n

(| ... |) 内的第一个项目是一个函数，它以任意数量的箭头作为输入，并返回一个箭头。这里不能使用中缀符号。它后面跟着参数，参数的形式为 proc 语句。这些语句如果需要，可以包含 do 和使用 <- 的绑定。每个参数都会被转换为一个箭头，并作为参数传递给函数 (&&&)。

你可能会问，这样做有什么意义？我们可以很方便地组合箭头，而无需 proc 符号。好吧，重点是你可以在语句中使用局部变量绑定。

香蕉括号实际上不是必需的。编译器足够智能，可以假设当你这样编写时这就是你的意思（注意，这里允许使用中缀符号）

mean4 :: Fractional a => Circuit a a
mean4 = proc value -> do
    (t, n) <- (total -< value) &&& (total -< 1)
    returnA -< t / n

那么我们为什么要使用香蕉括号呢？对于更简单的语法存在歧义的情况。原因是 proc 命令的箭头部分不是普通的 Haskell 表达式。请记住，对于在 proc 语句中指定的箭头，以下内容是正确的

局部变量绑定仅允许在 -> 后面的输入表达式中，以及在 if 和 case 条件中。箭头本身在 proc 外部存在的范围内进行解释。
if 和 case 语句不是普通的 Haskell。它们使用 ArrowChoice 实现。
用于组合箭头的函数也不是普通的 Haskell。它们是香蕉括号语法的简写。

递归绑定

冒着用完 mean 示例的风险，这里还有另一种使用递归绑定来实现它的方法。为了使它能够正常工作，我们需要一个将输入延迟一步的箭头

delay :: a -> Circuit a a
delay last = Circuit $ \this -> (delay this, last)

以下是 delay 的作用

*Main> runCircuit (delay 0) [5,6,7]
[0,5,6]
*Main>

以下是我们递归版本的 mean

mean5 :: Fractional a => Circuit a a 
mean5 = proc value -> do
    rec
        (lastTot, lastN) <- delay (0,0) -< (tot, n)
        let (tot, n) = (lastTot + value, lastN + 1)
        let mean = tot / n
    returnA -< mean

rec 块类似于 do' 块，只是

最后一行可以（通常是）一个变量绑定。它是否是 let 绑定还是使用 <- 的 do-块绑定并不重要。
rec 块没有返回值。var <- rec ... 是非法的，rec 不允许是 do 块中的最后一个元素。
预期变量的使用会形成一个循环（否则 rec 就没有意义）。

rec 的机制由 ArrowLoop 类的 loop 函数处理，我们为 Circuit 定义如下

instance ArrowLoop Circuit where
    loop (Circuit cir) = Circuit $ \b ->
        let (cir', (c,d)) = cir (b,d)
        in  (loop cir', c)

在幕后，它的工作原理是这样的

在 rec 中定义的任何在 rec 中被向前引用的变量都会通过传递给 loop 的第二个元组元素来循环。实际上，变量绑定及其对它们的引用可以按任何顺序排列（但箭头语句的顺序在效果方面很重要）。
在 rec 中定义的任何在 rec 外部引用的变量都会在 loop 的第一个元组元素中返回。

重要的是要理解，loop（因此 rec）只是绑定变量。它不会保留值并在下次调用时传递回来 - delay 负责这部分。由变量引用形成的循环必须由某种延迟箭头或惰性求值打破，否则代码将陷入无限循环，就像你在普通的 Haskell 中编写 let a = a+1 一样。