Cg Programming/Unity/Computing the Brightest Pixel

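This tutorial shows how to use a compute shader in Unity to compute the position of the brightest pixel in an image. In particular, it shows how the threads of a thread group can use “groupshared” data and how to synchronize the execution of these threads. If you are not familiar with compute shaders in Unity, you should first read the section “Computing Image Effects” and the section “Computing Color Histograms”. Note that compute shaders are not available on every platform and graphics API; at the time this tutorial was written, they were not supported on macOS.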

Why Do This?

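Finding the brightest pixel in a captured image is useful in some applications of optical motion capture. Another application is template matching, where a template is compared against the image at every pixel position and the likelihood of a match is stored in the corresponding pixel of an intermediate image. In this case, the “brightest” pixel of the intermediate image marks the best match with the template. Finding this best match is useful for template-based feature detection and tracking.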

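Furthermore, the problem of finding the brightest pixel is closely related to many other problems: finding the darkest pixel, the two (or more) brightest pixels, the two (or more) brightest pixels that are a certain distance apart, or the sum or average of all pixels of an image. In fact, once you can find the brightest pixel, you are very close to solving several of these related problems.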

Finding the Brightest Pixel with a Compute Shader

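To find the brightest pixel in an image, every pixel of the image has to be inspected; thus, the problem benefits greatly from parallelization.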

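In this tutorial, we implement a compute shader that first finds the brightest pixel in one row of the image by simply looping over all pixels of that row and keeping track of the brightest pixel seen so far. This compute shader is executed in parallel for all rows of the image. The result is an array with the brightest pixel of each row, which can be relatively large (depending on the height of the image). Therefore, we reduce the size of this array by computing the brightest pixel of each thread group at the end of the shader. Since we use thread groups of 64 threads, this reduces the length of the result array by a factor of 64; the new result is an array with the brightest pixel of each thread group. One could try to reduce this array further in parallel, but since it is already relatively small, we simply transfer the data to the CPU and find the brightest pixel of the whole image with a linear search on the CPU. Note: for any tool, it is important to know not only when to use it but also when not to use it.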
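The three stages described above can be sketched on the CPU. The following Python code is only an illustration under simplifying assumptions, not the shader itself: the image is a list of rows of precomputed luminance values, the rows are non-empty, and the names row_maximum, brightest_pixel and GROUP_SIZE are hypothetical.

```python
# CPU-side sketch of the three stages; all names are hypothetical.
GROUP_SIZE = 64  # mirrors [numthreads(64,1,1)] in the shader


def row_maximum(row_index, row):
    """Stage 1: brightest pixel of one row as (x, y, luminance)."""
    best_x = 0
    for x, value in enumerate(row):
        if value > row[best_x]:  # strict comparison: first maximum wins
            best_x = x
    return (best_x, row_index, row[best_x])


def brightest_pixel(image):
    """image: list of non-empty rows, each a list of luminance values."""
    # Stage 1 runs once per row (one GPU thread per row in the shader).
    row_maxima = [row_maximum(y, row) for y, row in enumerate(image)]

    # Stage 2: keep one candidate per group of GROUP_SIZE rows.
    group_maxima = []
    for start in range(0, len(row_maxima), GROUP_SIZE):
        group = row_maxima[start:start + GROUP_SIZE]
        group_maxima.append(max(group, key=lambda c: c[2]))

    # Stage 3: linear search over the short list of group maxima.
    return max(group_maxima, key=lambda c: c[2])
```

For an image with 130 rows, stage 2 leaves only 3 candidates for the final linear search.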

Here is the first version of the compute shader:

#pragma kernel MaximumMain

Texture2D<float4> InputTexture;
int InputTextureWidth;

struct maxStruct 
{
   uint xMax; // column of maximum
   uint yMax; // row of maximum
   uint lMax; // luminance of maximum (0, ..., 1023)
};

RWStructuredBuffer<maxStruct> GroupMaxBuffer;

groupshared maxStruct rowMaxData[64];

[numthreads(64,1,1)]
void MaximumMain (uint3 groupID : SV_GroupID, 
      // 3D ID of thread group; range depends on Dispatch call
   uint3 groupThreadID : SV_GroupThreadID, 
      // 3D ID of thread in a thread group; range depends on numthreads
   uint groupIndex : SV_GroupIndex, 
      // flattened/linearized SV_GroupThreadID. 
      // groupIndex specifies the index within the group (0 to 63)
   uint3 id : SV_DispatchThreadID) 
      // = SV_GroupID * numthreads + SV_GroupThreadID
      // id.x specifies the row in the input texture image
{
   int column;

   // find the maximum of this row 
   // and store its data in rowMaxData[groupIndex]
   rowMaxData[groupIndex].xMax = 0; 
   rowMaxData[groupIndex].yMax = id.x; 
   rowMaxData[groupIndex].lMax = 0;
   for (column = 0; column < InputTextureWidth; column++) 
   {
      float4 color = InputTexture[uint2(column, id.x)];
      uint luminance = (uint)(1023.0 * 
         (0.21 * color.r + 0.72 * color.g + 0.07 * color.b));
      if (luminance > rowMaxData[groupIndex].lMax) 
      {
         rowMaxData[groupIndex].xMax = column;
         rowMaxData[groupIndex].lMax = luminance;
      }
   }

   // find the maximum of this group 
   // and store its data in GroupMaxBuffer[groupID.x]
   GroupMemoryBarrierWithGroupSync(); 
   if (0 == groupIndex) 
   {
      int row; 
      int rowMax = 0;
      for (row = 1; row < 64; row++) 
      { 
         if (rowMaxData[row].lMax > rowMaxData[rowMax].lMax) 
         {
            rowMax = row;
         }
      }
      GroupMaxBuffer[groupID.x] = rowMaxData[rowMax];
   }
}

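The first line (which is specific to Unity), #pragma kernel MaximumMain, specifies that the function MaximumMain() is a compute shader function (a kernel) that can be called from a script.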

Texture2D<float4> InputTexture is a uniform variable for accessing the RGBA input texture, and int InputTextureWidth is a uniform variable that holds its width, i.e., the length of one row of pixels.

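The next lines define a struct that stores the data of a candidate for the brightest pixel. xMax and yMax are its coordinates, and lMax is its relative luminance, scaled to the range from 0 to 1023: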

struct maxStruct 
{
   uint xMax; // column of maximum
   uint yMax; // row of maximum
   uint lMax; // luminance of maximum (0, ..., 1023)
};

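The definition RWStructuredBuffer<maxStruct> GroupMaxBuffer uses this struct to define a RWStructuredBuffer (which corresponds to a compute buffer in Unity) that stores the information about the brightest pixel of each thread group.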

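The definition groupshared maxStruct rowMaxData[64] uses the same struct to define a groupshared array that stores the information about the brightest pixel of each thread (i.e., each row) in the current thread group. Note that the total size of groupshared data is limited to 32 KB in Direct3D 11. Since a uint in HLSL is 32 bits (4 bytes) wide, the array rowMaxData requires 64 × 3 × 4 = 768 bytes, which is far below the limit of 32 KB.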
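The size estimate can be checked with a few lines of arithmetic. The sketch below uses Python only as a calculator and assumes that each field of maxStruct is a 32-bit unsigned integer, as uint is in HLSL; the constant names are hypothetical.

```python
import struct

GROUP_SIZE = 64                      # entries in rowMaxData
GROUPSHARED_LIMIT_BYTES = 32 * 1024  # Direct3D 11 limit named in the text

# "<3I" describes three 32-bit unsigned integers without padding,
# which is the assumed layout of maxStruct (xMax, yMax, lMax).
bytes_per_entry = struct.calcsize("<3I")
total_bytes = GROUP_SIZE * bytes_per_entry

print(bytes_per_entry, total_bytes, total_bytes <= GROUPSHARED_LIMIT_BYTES)
```

The per-entry size matches the stride sizeof(uint) * 3 that the C# script passes when it creates the compute buffer.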

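We use [numthreads(64, 1, 1)] instead of [numthreads(1, 64, 1)] to define the dimensions of the thread group because each thread group is supposed to process a “one-dimensional array” of 64 rows, and it is usually simpler to use the x dimension for one-dimensional groups.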

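The compute shader function MaximumMain() requests all available thread-related indices (although it does not use groupThreadID). The index of the thread group, groupID.x, is used to index GroupMaxBuffer; the index of the thread within its group, groupIndex, is used to index rowMaxData; and the overall dispatch index, id.x, specifies the row of the image that the thread processes.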
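How these indices relate for a one-dimensional group of 64 threads can be written down in a small CPU-side sketch. It follows the relation SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID from the shader comments; the function name thread_ids is hypothetical.

```python
NUM_THREADS_X = 64  # from [numthreads(64,1,1)]


def thread_ids(group_id_x, group_thread_id_x):
    """Return (groupIndex, id.x) for a thread in a 64x1x1 thread group."""
    # For a one-dimensional group, the flattened index equals the x component.
    group_index = group_thread_id_x
    # Overall dispatch index: this is the image row the thread processes.
    dispatch_id_x = group_id_x * NUM_THREADS_X + group_thread_id_x
    return group_index, dispatch_id_x
```

For example, thread 5 of group 2 writes to rowMaxData[5] and processes image row 2 * 64 + 5 = 133.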

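The function MaximumMain() then loops over all pixels of the thread's row by counting the variable column from 0 to InputTextureWidth - 1. For each pixel, it computes the relative luminance (scaled by 1023 and truncated so that it fits an unsigned int) and compares it to the maximum luminance found so far. If the new luminance is greater, it updates the data in rowMaxData[groupIndex], which therefore contains the information about the brightest pixel of the row when the loop ends.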
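The per-row loop can be mirrored on the CPU to check values by hand. This is a minimal sketch that assumes color components in the range 0 to 1 and uses the same weights as the shader; the function names luminance_1023 and brightest_in_row are hypothetical.

```python
def luminance_1023(r, g, b):
    """Relative luminance scaled to 0..1023 and truncated, as in the shader."""
    return int(1023.0 * (0.21 * r + 0.72 * g + 0.07 * b))


def brightest_in_row(row):
    """row: list of (r, g, b) tuples. Returns (column, luminance) of the maximum."""
    x_max, l_max = 0, 0
    for column, (r, g, b) in enumerate(row):
        luminance = luminance_1023(r, g, b)
        if luminance > l_max:  # strict comparison keeps the first maximum
            x_max, l_max = column, luminance
    return x_max, l_max
```

Because the comparison is strict, the leftmost pixel wins when several pixels of a row share the maximum luminance.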

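After computing the brightest pixel of its row, the function computes the brightest pixel of the thread group. Since this requires comparing the data of different threads, we first have to make sure that all threads have determined the brightest pixel of their row. This is achieved with GroupMemoryBarrierWithGroupSync(), which not only ensures that all writes to groupshared memory issued by the threads of the group before that line have completed, but also waits until all threads of the group have reached that line. The code then checks whether groupIndex is 0, i.e., whether this is the zeroth thread of the thread group. Only this thread determines the brightest of the pixels in rowMaxData and writes it to GroupMaxBuffer[groupID.x]. While this solution works (and is easy to implement), it is somewhat wasteful because the other 63 threads of the thread group have nothing to do while the zeroth thread works through this loop.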

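A more efficient alternative is given below as the second version of the compute shader. It implements a reduction (or fold) that works like a knockout tournament: in the first step, each even-numbered thread compares its brightest pixel with that of the next thread (groupIndex + 1). In the second step, each thread whose index is divisible by 4 compares its best candidate with that of the thread at groupIndex + 2, and so on, with the distance doubling in each step. In the sixth (and last) step, the zeroth thread compares its best candidate with that of the 32nd thread. The “winner” of this last comparison is the brightest pixel of the group. There are still many idle threads in this version, but it needs only 6 steps instead of a loop of 63 comparisons, which is a worthwhile improvement. Avoiding idle threads altogether would require multiple dispatch calls, which come with some overhead and therefore might not save any time.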
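The tournament can be simulated sequentially to see why 64 candidates need exactly 6 steps. The Python sketch below is an illustration, not shader code: it assumes that the number of candidates is a power of two, and the name tournament_max is hypothetical. On the GPU, the comparisons of one step run in parallel and a barrier separates consecutive steps; here the inner loop simply visits the active indices one after another.

```python
def tournament_max(candidates):
    """candidates: list of (x, y, luminance); length must be a power of two.

    Returns (winner, number_of_steps).
    """
    data = list(candidates)  # stands in for the groupshared array
    stride = 1
    steps = 0
    while stride < len(data):
        # Active "threads" are those whose index is divisible by 2 * stride.
        for index in range(0, len(data), 2 * stride):
            if data[index + stride][2] > data[index][2]:
                data[index] = data[index + stride]
        # On the GPU, a group barrier would be required at this point.
        stride *= 2
        steps += 1
    return data[0], steps
```

With 64 candidates the stride takes the values 1, 2, 4, 8, 16 and 32, which are the six steps described above.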

Here is the improved shader:

#pragma kernel MaximumMain

Texture2D<float4> InputTexture;
int InputTextureWidth;

struct maxStruct 
{
   uint xMax; // column of maximum
   uint yMax; // row of maximum
   uint lMax; // luminance of maximum (0, ..., 1023)
};

RWStructuredBuffer<maxStruct> GroupMaxBuffer;

groupshared maxStruct rowMaxData[64];

[numthreads(64,1,1)]
void MaximumMain (uint3 groupID : SV_GroupID, 
      // 3D ID of thread group; range depends on Dispatch call
   uint3 groupThreadID : SV_GroupThreadID, 
      // 3D ID of thread in a thread group; range depends on numthreads
   uint groupIndex : SV_GroupIndex, 
      // flattened/linearized SV_GroupThreadID. 
      // groupIndex specifies the index within the group (0 to 63)
   uint3 id : SV_DispatchThreadID) 
      // = SV_GroupID * numthreads + SV_GroupThreadID
      // id.x specifies the row in the input texture image
{
   int column;

   // find the maximum of this row 
   // and store its data in rowMaxData[groupIndex]
   rowMaxData[groupIndex].xMax = 0; 
   rowMaxData[groupIndex].yMax = id.x; 
   rowMaxData[groupIndex].lMax = 0;
   for (column = 0; column < InputTextureWidth; column++) 
   {
      float4 color = InputTexture[uint2(column, id.x)];
      uint luminance = (uint)(1023.0 * 
         (0.21 * color.r + 0.72 * color.g + 0.07 * color.b));
      if (luminance > rowMaxData[groupIndex].lMax) 
      {
         rowMaxData[groupIndex].xMax = column;
         rowMaxData[groupIndex].lMax = luminance;
      }
   }

   // find the maximum of this group
   // and store its data in GroupMaxBuffer[groupID.x]
   GroupMemoryBarrierWithGroupSync(); 
      // we have to wait for all writes to rowMaxData by the group's threads
   if (0 == (groupIndex & 1)) { // is groupIndex even?
      if (rowMaxData[groupIndex + 1].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 1];
      }
   }
   GroupMemoryBarrierWithGroupSync(); 
   if (0 == (groupIndex & 3)) { // is groupIndex divisible by 4?
      if (rowMaxData[groupIndex + 2].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 2];
      }
   }
   GroupMemoryBarrierWithGroupSync(); 
   if (0 == (groupIndex & 7)) { // is groupIndex divisible by 8?
      if (rowMaxData[groupIndex + 4].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 4];
      }
   }
   GroupMemoryBarrierWithGroupSync();
   if (0 == (groupIndex & 15)) { // is groupIndex divisible by 16?
      if (rowMaxData[groupIndex + 8].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 8];
      }
   }
   GroupMemoryBarrierWithGroupSync(); 
   if (0 == (groupIndex & 31)) { // is groupIndex divisible by 32?
      if (rowMaxData[groupIndex + 16].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 16];
      }
   }
   GroupMemoryBarrierWithGroupSync();
   if (0 == (groupIndex & 63)) { // is groupIndex divisible by 64?
      if (rowMaxData[groupIndex + 32].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 32];
      }
      GroupMaxBuffer[groupID.x] = rowMaxData[groupIndex];
         // copy maximum of group to buffer
   }
}

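Note that the code tests whether groupIndex is divisible by a power of two by applying the bitwise AND operator & with that power of two minus 1; the result is 0 exactly when groupIndex is divisible. We could also use the modulo operator % with the power of two instead.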
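The equivalence of the two tests for non-negative indices can be checked exhaustively for a thread group of 64. This is a small sketch with hypothetical function names.

```python
def divisible_by_mask(index, power_of_two):
    """True if the low bits selected by (power_of_two - 1) are all zero."""
    return (index & (power_of_two - 1)) == 0


def divisible_by_modulo(index, power_of_two):
    """The same test written with the modulo operator."""
    return index % power_of_two == 0


# Compare both tests for every thread index and every step of the reduction.
all_agree = all(
    divisible_by_mask(i, p) == divisible_by_modulo(i, p)
    for i in range(64)
    for p in (2, 4, 8, 16, 32, 64)
)
print(all_agree)
```

The mask form works only because the divisor is a power of two; for other divisors the modulo operator is needed.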

Calling the Compute Shader

The C# script that calls the compute shader is relatively simple:

using UnityEngine;

public class maximumScript : MonoBehaviour 
{
   public ComputeShader shader;
   public Texture2D inputTexture;

   public uint[] groupMaxData;
   public int groupMax;

   private ComputeBuffer groupMaxBuffer;
   
   private int handleMaximumMain;

   void Start () 
   {
      if (null == shader || null == inputTexture) 
      {
         Debug.Log("Shader or input texture missing.");
         return;
      }

      handleMaximumMain = shader.FindKernel("MaximumMain");
      groupMaxBuffer = new ComputeBuffer((inputTexture.height + 63) / 64, sizeof(uint) * 3);
      groupMaxData = new uint[((inputTexture.height + 63) / 64) * 3];

      if (handleMaximumMain < 0 || null == groupMaxBuffer || null == groupMaxData) 
      {
         Debug.Log("Initialization failed.");
         return;
      }
      
      shader.SetTexture(handleMaximumMain, "InputTexture", inputTexture);
      shader.SetInt("InputTextureWidth", inputTexture.width);
      shader.SetBuffer(handleMaximumMain, "GroupMaxBuffer", groupMaxBuffer);
   }

   void OnDestroy() 
   {
      if (null != groupMaxBuffer) 
      {
         groupMaxBuffer.Release();
      }
   }

   void Update()
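   {
      // skip the frame if Start() could not initialize everything
      if (null == shader || null == inputTexture || null == groupMaxBuffer) 
      {
         return;
      }

      shader.Dispatch(handleMaximumMain, (inputTexture.height + 63) / 64, 1, 1);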
         // divided by 64 in x because of [numthreads(64,1,1)] in the compute shader code
         // added 63 to make sure that there is a group for all rows
      
      // get maxima of groups
      groupMaxBuffer.GetData(groupMaxData);
      
      // find maximum of all groups
      groupMax = 0;
      for (int group = 1; group < (inputTexture.height + 63) / 64; group++) 
      {
         if (groupMaxData[3 * group + 2] > groupMaxData[3 * groupMax + 2]) 
         {
            groupMax = group;
         }
      }
   }
}

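The script has public variables for the compute shader and the input texture image, which you have to set. It returns its result in the array uint[] groupMaxData at the position determined by groupMax.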

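The RWStructuredBuffer of the compute shader corresponds to the compute buffer groupMaxBuffer. Note that each element of this buffer consists of 3 unsigned ints. The array groupMaxData has the same memory layout but consists of individual unsigned ints; thus, it has three times as many elements as groupMaxBuffer.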

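The Start() function performs some error checking, finds the handle of the compute shader function, creates groupMaxBuffer and groupMaxData, and sets the uniform variables of the compute shader.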

The OnDestroy() function releases the compute buffer because it is not released by the garbage collector.

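The Update() function dispatches the compute shader function; the number of thread groups is the number of rows of the image (i.e., its height) divided by the number of threads in a thread group (64 in this case). We add 63 to the number of rows before the integer division to make sure that there are enough thread groups for image heights that are not divisible by 64.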
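Adding 63 before the integer division rounds the quotient up. The sketch below reproduces this arithmetic in Python; the function name group_count is hypothetical.

```python
def group_count(height, group_size=64):
    """Number of thread groups needed to cover all rows (rounded up)."""
    return (height + group_size - 1) // group_size


# A few image heights and the resulting number of thread groups.
for height in (1, 64, 65, 1080):
    print(height, group_count(height))
```

An image of height 1080 therefore needs 17 groups (1088 threads), so the last 8 threads have no row of their own to process.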

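groupMaxBuffer.GetData(groupMaxData) copies the data from the compute buffer to the array groupMaxData. The code then finds the brightest pixel in this array by looping over all groups. Note that the relative luminance of the group with index group is located at groupMaxData[3 * group + 2] because groupMaxData is a “flattened” array of unsigned ints instead of an array of structs with 3 unsigned ints each.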

In the end, the relative luminance of the brightest pixel is located at groupMaxData[3 * groupMax + 2]. Its x coordinate is at groupMaxData[3 * groupMax + 0] and its y coordinate at groupMaxData[3 * groupMax + 1].
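The indexing into the flattened array can be tried out on a small example. The following Python sketch mirrors the final search loop of the script under the assumption that the array holds xMax, yMax and lMax for each group in that order; the function name brightest_group is hypothetical.

```python
def brightest_group(group_max_data):
    """group_max_data: flat list [x0, y0, l0, x1, y1, l1, ...].

    Returns (group index, (x, y, luminance)) of the brightest entry.
    """
    group_total = len(group_max_data) // 3
    group_max = 0
    for group in range(1, group_total):
        # The luminance of a group sits at offset 2 within its three values.
        if group_max_data[3 * group + 2] > group_max_data[3 * group_max + 2]:
            group_max = group
    x = group_max_data[3 * group_max + 0]
    y = group_max_data[3 * group_max + 1]
    luminance = group_max_data[3 * group_max + 2]
    return group_max, (x, y, luminance)
```

Given the flat list [5, 10, 300, 7, 70, 900, 1, 130, 450], the search selects group 1, whose pixel at column 7 and row 70 has luminance 900.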

Summary

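You have reached the end of this tutorial! Some of the things you have learned are:

How to perform a parallel search over all pixels of an image.
How to synchronize the execution of the threads of a thread group.
How to communicate data between the threads of a thread group.
How to use a reduction to speed up the search in a “groupshared” array.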

Further Reading

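If you still want to know more

about compute shaders in Unity, see the section “Computing Image Effects”.
about compute buffers in Unity, see the description in Unity's documentation.
about groupshared variables in HLSL, see the description of the variable syntax in the Microsoft Developer Network.
about GroupMemoryBarrierWithGroupSync() and other intrinsic functions in HLSL, see the description of the intrinsic functions in the Microsoft Developer Network.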


Unless stated otherwise, all example source code on this page is granted to the public domain.