MINC/教程/NumericPyminc

Numeric Pyminc，或者说，python 如何让我爱上 fortran，或者，Matlab！滚出我的地盘！

之前的教程介绍了一些 pyminc 的基础知识，但留下了为什么使用 python 和 pyminc 的基本问题。如果 C 和 perl 对 Peter Neelin 来说已经足够好了，为什么还要在乎？在我看来，主要原因是你可以使用 python 作为一种编程语言，使用起来很愉快（不像 Matlab，令人作呕），可以自由分发（Matlab？哼！），同时可以让你获得接近原生 C 的速度。而且不止一种方法可以做到这一点。继续阅读。

皮质厚度，或献给皮埃尔-西蒙，拉普拉斯侯爵

我们测量皮质厚度的一种方法是使用拉普拉斯方程在皮质内外之间创建流线。因此，对于下面的数值示例，我将采用该问题的一个核心部分并在 python 可访问的所有不同数值方法中对其进行求解。简而言之，给定定义的内外边界（见上面链接中的 labels.mnc），它会在它们之间创建等势线。这是通过一种简单的迭代六点平均风格完成的。作为前瞻，样本标签和代码都可以在这里找到

http://www.bic.mni.mcgill.ca/users/jason/pyminc-tutorial/

标准 python

给定边界，简单的解决方案如下

# the simplest solver in plain old unfettered python
def singleSolveLaplace(grid, output):
    convergence = 0
    for v0 in range(grid.sizes[0]):
        for v1 in range(grid.sizes[1]):
            for v2 in range(grid.sizes[2]):
                if grid.data[v0,v1,v2] > innerValue and \
                       grid.data[v0,v1,v2] < outerValue:
                    oldvalue = output.data[v0,v1,v2]
                    output.data[v0,v1,v2] = (output.data[v0+1,v1,v2] + \
                                             output.data[v0-1,v1,v2] + \
                                             output.data[v0,v1+1,v2] + \
                                             output.data[v0,v1-1,v2] + \
                                             output.data[v0,v1,v2+1] + \
                                             output.data[v0,v1,v2-1]) / 6.0
                    convergence += (oldvalue - output.data[v0,v1,v2])
    return(convergence)

遍历每个体素，将所有邻居加起来并除以 6。就是这样。在示例网格上，这在我的计算机上大约需要 7 秒，这就是问题所在。python 是一种解释型语言，在像上面描述的那样遍历体素时效率并不高。

向量化的 python

正如任何 Matlab 专家都会告诉你的，一个解决方案是将问题向量化。这在 python 中也可以做到，从而产生以下代码

# vectorizing all the operations - much faster, not possible for all
# routines.
def vectorizedLaplace(grid, output):
    output.data[1:-1,1:-1,1:-1] = (output.data[0:-2,1:-1,1:-1] + \
                                   output.data[2:,1:-1,1:-1] + \
                                   output.data[1:-1,0:-2,1:-1] + \
                                   output.data[1:-1,2:,1:-1] + \
                                   output.data[1:-1,1:-1,0:-2] + \
                                   output.data[1:-1,1:-1,2:,]) / 6
    output.data[grid.data ==10] = 10
    output.data[grid.data == 0] = 0
    return(0)

快多了 - 大约 0.4 秒。然而，阅读起来更难看，而且并非所有问题都适合这种技巧。

Weave 登场

因此，你可以用 C 或 C++ 重写整个代码。但这样一来，你就得处理用这两种语言进行编码的痛苦。那么为什么不只是重写内部循环呢？在 python 中很简单 - 进入一个名为 weave 的很好的扩展。这里，一个 C++ 字符串可以嵌入到 python 中。它第一次看到它时，就会动态地编译它，除非代码发生变化，否则它会一直重用编译后的输出。

    convergence = 0
    nv0 = sizes[0]
    nv1 = sizes[1]
    nv2 = sizes[2]
    code = r"""
#line 45 "laplacian-python-test.py"
for (int v0=0; v0 < nv0; v0++) {
   for (int v1=0; v1 < nv1; v1++) {
      for (int v2=0; v2 < nv2; v2++) {
         if (grid(v0,v1,v2) > 0 && grid(v0,v1,v2) < 10) {
            double oldvalue = output(v0, v1, v2);
            output(v0,v1,v2) = (output(v0+1,v1,v2) +
                                output(v0-1,v1,v2) +
                                output(v0,v1+1,v2) +
                                output(v0,v1-1,v2) +
                                output(v0,v1,v2+1) +
                                output(v0,v1,v2-1))/6;
            convergence += oldvalue - output(v0,v1,v2);
}}}}
"""
    weave.inline(code, ['grid', 'output', 'convergence', 'nv0', 'nv1', 'nv2'],
                 type_converters = converters.blitz,
                 compiler='gcc')
    return(convergence)

同样大约需要 0.35 秒。不错。

do 300, v0=1, nv0 - 重返老式风格

除了 weave 中的 C++ 接口，还有一个很棒的 Fortran 接口，f2py，可以连接到 python。你不会相信，python 让 Fortran 又变得可爱了。虽然可以使用任何 Fortran 方言，但我坚持使用 F77，纯粹是为了自虐。以下 Fortran 代码

C FILE: laplace.f
      subroutine laplace (grid, output, nv0, nv1, nv2, convergence)
C     
C     single evaluation of laplace's function
C
      INTEGER grid(nv0, nv1, nv2)
      REAL output(nv0, nv1, nv2)
      double precision convergence
      double precision oldvalue

      INTEGER v0, v1, v2
cf2py intent(in) :: grid, n0, nv1, nv2
cf2py intent(inplace) :: output
cf2py intent(out) :: convergence

      do 300, v0 = 1, nv0
         do 200, v1 = 1, nv1
            do 100, v2 = 1, nv2
               if (output(v0,v1,v2) .lt. 10) then
                  if (output(v0,v1,v2) .gt. 0) then
                     oldvalue = output(v0,v1,v2)
                     
                     output(v0,v1,v2) = (output(v0-1, v1, v2) + 
     +                    output(v0+1,v1,v2) + 
     +                    output(v0,v1-1,v2) + 
     +                    output(v0,v1+1,v2) + 
     +                    output(v0,v1,v2-1) + 
     +                    output(v0,v1,v2+1))/6
                     convergence = convergence + 
     +                    (oldvalue - output(v0,v1,v2))
                  endif
               endif
 100        continue
 200     continue
 300  continue
      end

然后，在命令行中，使用 'f2py -c laplace.f -m laplace' 编译它，并在 python 中像这样调用它

# in fortran (F77, no less - rocking it old school!)
# actual code is in laplace.f
def flaplaceSolve(g,o):
    convergence = 0
    convergence = laplace.laplace(g,o, grid.sizes[0],
                                  grid.sizes[1], grid.sizes[2])
    return(convergence)

完成！现在是 0.315 秒。注意，如果你查看完整的源代码，我不得不添加一个转置，因为行主数组和列主数组之间的差异令人愉快。

漂亮的 Cython

但如果你想编写干净的 python 代码而不深入研究 Fortran，该怎么办？另一个扩展来解救你 - Cython。它是一种 python 方言，允许你添加类型信息以获得接近原生速度。代码如下

# compile as follows:
# cython cython_laplace.pyx
# gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.5 -o cython_laplace.so cython_laplace.c

# note - there are smarter ways of doing this in distutils.

import numpy as np
cimport numpy as np

# some type definitions - first for the floats
FDTYPE = np.float64
ctypedef np.float64_t FDTYPE_t

# ... and then for the unsigned bytes
BDTYPE = np.uint8
ctypedef np.uint8_t BDTYPE_t

# the function definition - notice the types and dims, which are key for speed.
def cythonLaplaceStep(np.ndarray[BDTYPE_t, ndim=3] g,
                      np.ndarray[FDTYPE_t, ndim=3] o):

    # die a horrible flaming death if inputs had different types
    assert g.dtype == BDTYPE and o.dtype == FDTYPE

    # get the dimension info - again, notice the importance of defining types
    cdef int nv0 = g.shape[0]
    cdef int nv1 = g.shape[1]
    cdef int nv2 = g.shape[2]

    cdef int v0, v1, v2

    cdef double convergence = 0.0
    cdef double oldvalue

    # the actual loop - looks identical to the python code
    for v0 in range(nv0):
        for v1 in range(nv1):
            for v2 in range(nv2):
                if g[v0,v1,v2] > 0 and g[v0,v1,v2] < 10:
                    oldvalue = o[v0,v1,v2]
                    o[v0,v1,v2] = (o[v0+1,v1,v2] +
                                   o[v0-1,v1,v2] +
                                   o[v0,v1+1,v2] +
                                   o[v0,v1-1,v2] +
                                   o[v0,v1,v2+1] +
                                   o[v0,v1,v2-1])/6.0
                    convergence += oldvalue - o[v0,v1,v2]
    return(convergence)

然后，这段代码被编译 - 注意代码看起来几乎与简单且缓慢的 python 循环完全相同，只是添加了类型 - 然后从主 python 代码中调用，如下所示

# in cython - the actual code is in cython_laplace.pyx
def cythonLaplace(grid, output):
    convergence = cython_laplace.cythonLaplaceStep(asarray(grid.data),
                                                   asarray(output.data))
    return(convergence)

大约需要 0.32 秒。

结论

以上是使用 pyminc 的所有原因 - 许多选择可以让你在一种让人愉快地使用的语言中编写高效的代码。下面是使用的完整代码 - 玩得开心

#!/usr/bin/env python

# lots of imports
from pyminc.volumes.factory import *
from numpy import *
import sys
from optparse import OptionParser
from scipy import weave
from scipy.weave import converters

import laplace # the fortran version
import cython_laplace # and the cython version

# the simplest solver in plain old unfettered python
def singleSolveLaplace(grid, output):
    convergence = 0
    for v0 in range(grid.sizes[0]):
        for v1 in range(grid.sizes[1]):
            for v2 in range(grid.sizes[2]):
                if grid.data[v0,v1,v2] > innerValue and \
                       grid.data[v0,v1,v2] < outerValue:
                    oldvalue = output.data[v0,v1,v2]
                    output.data[v0,v1,v2] = (output.data[v0+1,v1,v2] + \
                                             output.data[v0-1,v1,v2] + \
                                             output.data[v0,v1+1,v2] + \
                                             output.data[v0,v1-1,v2] + \
                                             output.data[v0,v1,v2+1] + \
                                             output.data[v0,v1,v2-1]) / 6.0
                    convergence += (oldvalue - output.data[v0,v1,v2])
    return(convergence)

# vectorizing all the operations - much faster, not possible for all
# routines.
def vectorizedLaplace(grid, output):
    output.data[1:-1,1:-1,1:-1] = (output.data[0:-2,1:-1,1:-1] + \
                                   output.data[2:,1:-1,1:-1] + \
                                   output.data[1:-1,0:-2,1:-1] + \
                                   output.data[1:-1,2:,1:-1] + \
                                   output.data[1:-1,1:-1,0:-2] + \
                                   output.data[1:-1,1:-1,2:,]) / 6
    output.data[grid.data ==10] = 10
    output.data[grid.data == 0] = 0
    return(0)

# in cython - the actual code is in cython_laplace.pyx
def cythonLaplace(grid, output):
    convergence = cython_laplace.cythonLaplaceStep(asarray(grid.data),
                                                   asarray(output.data))
    return(convergence)

# in weave - allows you to embed a C++ string in the code
def weaveLaplace(grid, output, sizes):
    convergence = 0
    nv0 = sizes[0]
    nv1 = sizes[1]
    nv2 = sizes[2]
    code = r"""
#line 45 "laplacian-python-test.py"
for (int v0=0; v0 < nv0; v0++) {
   for (int v1=0; v1 < nv1; v1++) {
      for (int v2=0; v2 < nv2; v2++) {
         if (grid(v0,v1,v2) > 0 && grid(v0,v1,v2) < 10) {
            double oldvalue = output(v0, v1, v2);
            output(v0,v1,v2) = (output(v0+1,v1,v2) +
                                output(v0-1,v1,v2) +
                                output(v0,v1+1,v2) +
                                output(v0,v1-1,v2) +
                                output(v0,v1,v2+1) +
                                output(v0,v1,v2-1))/6;
            convergence += oldvalue - output(v0,v1,v2);
}}}}
"""
    weave.inline(code, ['grid', 'output', 'convergence', 'nv0', 'nv1', 'nv2'],
                 type_converters = converters.blitz,
                 compiler='gcc')
    return(convergence)

# in fortran (F77, no less - rocking it old school!)
# actual code is in laplace.f
def flaplaceSolve(g,o):
    convergence = 0
    convergence = laplace.laplace(g,o, grid.sizes[0],
                                  grid.sizes[1], grid.sizes[2])
    return(convergence)

if __name__ == "__main__":

    usage = "%prog input-grid.mnc output-solved.mnc"
    description = "Simple solver of laplace's equation - designed to help teach people how to use numeric python"


    # argument handling - an option chooses which way to do the computation
    parser = OptionParser(usage=usage, description=description)
    parser.add_option("--slow-python", dest="mode", action="store_const",
                      const="slow-python")
    parser.add_option("--fortran", dest="mode", action="store_const",
                      const="fortran")
    parser.add_option("--vectorized-python", dest="mode", action="store_const",
                      const="vector-python")
    parser.add_option("--weave", dest="mode", action="store_const",
                      const="weave")
    parser.add_option("--cython", dest="mode", action="store_const",
                      const="cython")

    (options, args) = parser.parse_args()
    if len(args) != 2:
        parser.error("Incorrect number of arguments")
        
    gridfile = args[0]
    outputfile = args[1]

    # get the volumes
    grid = volumeFromFile(gridfile, dtype='ubyte')
    output = volumeLikeFile(gridfile, outputfile)
    # initialize the output to the boundaries
    output.data[:,:,:] = grid.data[:,:,:]

    innerValue = 0
    outerValue = 10

    convergence = 0

    # transpose for the fortran bits - the joys of column versus row major
    g = asfortranarray(grid.data)
    o = asfortranarray(output.data)

    for i in range(100):
        if options.mode == "slow-python":
            convergence = singleSolveLaplace(grid, output)
        elif options.mode == "fortran":
            convergence = flaplaceSolve(g,o)
        elif options.mode == "vector-python":
            convergence = vectorizedLaplace(grid, output)
        elif options.mode == "weave":
            # notice the cast to array - no speed penalty, but necessary
            # since weave does not understand the hyperslab subclass
            # that pyminc uses
            convergence = weaveLaplace(asarray(grid.data),
                                       asarray(output.data),
                                       grid.sizes)
        elif options.mode == "cython":
            convergence = cythonLaplace(grid, output)
    if options.mode == "fortran":
        output.data = ascontiguousarray(o)

    output.writeFile()
    output.closeVolume()
    grid.closeVolume()