Haskell/性能示例
目标:用实际发生的示例逐步解释优化。
dons:像 C 一样快速编写 Haskell:利用严格性、惰性和递归。
dons:像 C 一样快速编写 Haskell:利用严格性、惰性和递归。
haskell-cafe:另一个新手性能问题 我希望他不介意我在这里发布他的代码,我还是得问他。 -- apfeλmus 08:46, 18 May 2008 (UTC)
type CSV = [[String]] main = do args <- getArgs file <- readFile (head args) writeFile (head args ++ "2") (processFile (args !! 1) file) processFile s = writeCSV . doInteraction s . readCSV doInteraction line csv = insertLine (show line) (length csv - 1) csv writeCSV = (\x -> x ++ "\n") . concat . intersperse "\n" . (map (concat . intersperse "," . (map show))) insertLine line pos csv = (take pos csv) ++ [readCSVLine line] ++ drop pos csv readCSVLine = read . (\x -> "["++x++"]") readCSV = map readCSVLine . lines
我认为在邮件列表中还有另一个 cvs 解析线程我认为很合适,但我记不起来了。
jkff 在 #haskell 中询问了一些分析日志文件的代码。基本上,它是在构建直方图
foldl' (\m (x,y) -> insertWith' x (\[y] ys -> y:ys) [y] m) M.empty [(ByteString.copy foo, ByteString.copy bar) | (foo,bar) <- map (match regex) lines]
输入是一个 1GB 的日志文件,程序占用了可用内存,主要是因为 ByteString.copy
没有被强制,整个文件都停留在内存中。
此代码演示了您的 CPU 可能在预测代码分支(即 if 子句)时成功或失败。该代码生成了一个很大的随机 Int 的 Vector。由此,我们计算所有大于 999 的数字之和。如果数据已排序,CPU 可以轻松推断出下一个分支的可能结果。如果数据未排序,这将更加困难且成功率更低。不幸的是,为一个目的而排序很少是一个好主意。该代码使用 criterion 包进行时间测量。例如,使用 -Wall -keep-llvm-files -O2 -optlo-O3 -fllvm -rtsopts -threaded -eventlog -feager-blackholing -fmax-simplifier-iterations=20 -fsimplifier-phases=3 -fno-liberate-case -fspecialise-aggressively 编译此代码。这个想法来自 github.com/Kobzol/hardware-effects。(tkx68hamburg)
{-# LANGUAGE FlexibleInstances, RankNTypes, BangPatterns, CPP, PartialTypeSignatures, UnicodeSyntax, OverloadedStrings #-} {-# OPTIONS_GHC -fno-warn-orphans #-} module Main (main) where import Control.DeepSeq import Control.Exception (evaluate) import Control.Monad.ST import Criterion.Main import qualified Data.Vector.Algorithms.Radix as Radix import qualified Data.Vector.Unboxed as UV import System.Random #ifdef DEFINE_NFDATA_VECTOR instance (NFData a) => NFData (V.Vector a) where rnf v = V.foldr (\a b -> rnf a `seq` b) () v #endif setupUnsorted :: IO [UV.Vector Int] setupUnsorted = do small <- randomSampleUVVector 1024 med <- randomSampleUVVector $ 1024*1024 large <- randomSampleUVVector $ 20*1024*1024 return [small, med, large] setupSorted :: IO [UV.Vector Int] setupSorted = fmap (fmap sortUVec) setupUnsorted main :: IO () main = do defaultMainWith defaultConfig [ env setupUnsorted (\ ~[small, med, large] -> bgroup "Branch without sort" [ bench "small" $ whnf core1 small, bench "medium" $ whnf core1 med, bench "large" $ whnf core1 large ] ), env setupUnsorted (\ ~[small, med, large] -> bgroup "Branch without sort using filter" [ bench "small" $ whnf core1a small, bench "medium" $ whnf core1a med, bench "large" $ whnf core1a large ] ), env setupUnsorted (\ ~[small, med, large] -> bgroup "Branch including sort" [ bench "small" $ whnf core2 small, bench "medium" $ whnf core2 med, bench "large" $ whnf core2 large ] ), env setupSorted (\ ~[small, med, large] -> bgroup "Branch after sort" [ bench "small" $ whnf core1 small, bench "medium" $ whnf core1 med, bench "large" $ whnf core1 large ] ) ] {- INLINE -} core1 :: UV.Vector Int -> Int core1 = UV.foldl' condSum 0 {- INLINE -} core2 :: UV.Vector Int -> Int core2 = UV.foldl' condSum 0 . sortUVec {- INLINE -} core1a :: UV.Vector Int -> Int core1a = UV.sum . (UV.filter (>999)) {- INLINE -} condSum :: Int -> Int -> Int condSum !x !y = if y>999 then x+y else x randomSampleUVVector :: Int -> IO (UV.Vector Int) randomSampleUVVector i = evaluate $ force $ UV.fromList (take i (randoms (mkStdGen 0) :: [Int])) sortUVec :: forall a . (Ord a, Radix.Radix a, UV.Unbox a) => UV.Vector a -> UV.Vector a sortUVec vec = runST (do mv <- UV.thaw vec Radix.sort mv UV.unsafeFreeze mv)