MATLAB 编程/相位声码器和编码器

以下相位声码器及其相应的频谱编码例程是为 MATLAB 克隆编译器编写的，它可能在GNU Octave中运行。有关背景信息，请查阅

相位声码器在英文维基百科上；
Laroche, J. 和 Dolson, M. (1999) "改进的相位声码器音频时标修改" IEEE 语音和音频处理学报 7(3):323-32;
Miller Puckette (1995) "锁相声码器" [PostScript 文件] IEEE ASSP 应用信号处理到音频和声学会议论文集 (莫洪克，纽约：1995) - 通过在重新合成期间锁相相邻频谱箱和帧来开创相位相干传播；
Axel Röbel (2003) "相位声码器中瞬态处理的新方法" 第 6 届国际数字音频效果大会论文集 (伦敦：2003 年 9 月 8-11 日)；以及
MATLAB 编程，其中此代码作为示例给出。

源代码

以下代码、注释和相关评论由作者发布到公共领域，或者如果这不可行，则允许所有人将此材料用于任何目的。

% fs.m - synthesis from frequency spectra.
%     [output: waveform] = fs(freqresp: series of spectra)
%
% James Salsman, 1999-2000
%
% Note: sometimes this produces data outside the (1,-1) range, even when 
% run with input from that smaller domain given to af.m.  It is safe to 
% clamp outlying values to the extrema of the corresponding input domain.
% If the resulting signal sounds clipped, it is NOT because of clamping 
% domain outliers of the range (e.g., 1.2, -1.3) to the domain edges (1,-1). 

% To improve the quality, try increasing the length of the fft; if that
% doesn't work, then try increasing the sampling rate.  To improve the speed, 
% rewrite it in C; fftPrep is the only complex vector, and the call to 'sort' 
% is only for the ordering of the dot-product magnitudes, not the sorting.

function [output] = fs(freqresp)

samplingRate = 16000;                  % changed from 8000
frameRate    = 100;                    % should be even, maybe try 160
windowStep   = samplingRate/frameRate; % 16000/100 = 160, shall be even
% windowSize   = samplingRate/40;        % must be even; use 256 for 11k/s rate 
windowSize   = windowStep*2;           % this is better [2002]

[rows, cols] = size(freqresp);         % input dimensions
origFFtSize = rows*2;                  % should be power of two

% Allocate (preextend) all space needed for output array and helpers.
%
output     =  zeros(cols*windowStep + (windowSize - windowStep), 1);
FFtPrep    =  zeros(1, origFFtSize) + 0*i;  % complex ifft input
iFFtReal   =  zeros(1, origFFtSize);        % real ifft output
phases     =  zeros(1, origFFtSize/2);      % resynthesis phase array
newPhases  =  zeros(1, origFFtSize/2);      % uses "-1" for unassigned phases
transInd   =  zeros(1, origFFtSize/2);      % index into bins by magnitude
ignore     =  zeros(1, origFFtSize/2);      % sort values unused but can't hurt

for n = 1:origFFtSize/2,              % Initialize synthesis phases.
    phases(n) = pi*mod(n, 2);         % Each starting phase 180 deg's apart
end;                                  %   from the next one.

% pre-calculate phase transition from one frame to the next for base 2nd bin
%
phaseFactor = 2*pi*samplingRate/(frameRate*origFFtSize);

% resynthesis COLA window should be Portnoff (sinc) because iFFT will 
% shape like the window it was taken from
%
wind = sinc((-windowSize/2:windowSize/2-1)/(windowSize/2));  % pure sinc

for c = 1:cols,                       % iterate over columns
 
    first = (c-1)*windowStep + 1;     % output index
    last = first + windowSize - 1;    %   boundaries
 
    for b = 2:origFFtSize/2,          % iterate over bins 

        % prepare resynthesis
        %
        [re, im] = pol2cart(phases(b), freqresp(b, c)); % polar to cartesian
        FFtPrep(b) = re + i*im;       % rectangular to complex
        FFtPrep(origFFtSize - b+2) = conj(FFtPrep(b)); % the conjugates count

    end;
 
    iFFtReal = real(ifft(FFtPrep));   % resynthesize
 
    output(first:last) = output(first:last) ... % note: column vector (')
      + (iFFtReal( origFFtSize/2 - windowSize/2 : ... 
                   origFFtSize/2 + windowSize/2 - 1  ) .* wind )';  % COLA

    if c < cols,                      % figure out the next set of phases
	
        newPhases = ones(1, origFFtSize/2)*-1;  % "-1" for unassigned phase

        % set transInd to index the least-to-greatest transitioning magnitudes
        %
        [ignore, transInd] = sort(freqresp(:,c)' .* freqresp(:,c+1)');

        % iterate downwards such that indexed bin will be greatest mags first
        %
        for idx = origFFtSize/2:-1:1, % step by -1 from greatest to least
        
            b = transInd(idx);        % get bin from index

            if b-1 > 0,               % adjacent lower neighboring bin
                lower = newPhases(b-1); 
            else
                lower = -1;           % nonexistent = unassigned
            end;

            if b+1 < origFFtSize/2,   % adjacent higher neighboring bin
                higher = newPhases(b+1); 
            else
                higher = -1;          % nonexistent = unassigned
            end;

            if (lower == -1) & (higher == -1),    % both unassigned: propagate
                newPhases(b) = mod(phases(b) + (b-1)*phaseFactor, 2*pi);
            elseif lower == -1,                   % make 180 degs from higher
                newPhases(b) = mod(higher + pi, 2*pi);
            elseif higher == -1,                  % make 180 degs from lower
                newPhases(b) = mod(lower + pi, 2*pi);
            else                                  % both neighbors assigned
                avgAng = (lower + higher)/2;      % Mean will be 180 deg off
                if abs(lower - higher) > pi,      %   when the angles are that 
                   avgAng = avgAng + pi;          %   far apart; if so, adjust
                end;                              %   180 degs from mean angle.
                newPhases(b) = mod(avgAng + pi, 2*pi);
            end;

        end;

        phases = newPhases;   % replace array for next iteration

    end; % if -- skips last iteration

end;

以下是相应的编码器

% af.m - frequency spectrum analysis.
%   [freqresp: sequence of spectra] = af(input: waveform)
%
% James Salsman, '99-2000; adapted from Malcolm Slaney's mfcc.m [from his 
% "Auditory Toolbox" at www.interval.com.]
%
% Also works when samplingRate is 8000 or fftSize is 256, but I haven't tried 
% both together.

function [freqresp] = af(input)

samplingRate = 16000;                % samples/sec
frameRate = 100;                     % started at 100/sec, maybe use 160 later
windowStep = samplingRate/frameRate; % 16000/100 = 160; shall be even
windowSize = 400;                    % large, but from ETSI standard ES 201108
windowSize = samplingRate/40;        % must be even; use 256 for 11k/s rate 
fftSize = 512;                       % larger better but slower; usually 2^int.

[rows, cols] = size(input);          % sanity-check input dimensions
if (rows > cols)                     % if there are more rows than columns
   input=input';                     % then transpose 
end

% hamming window shapes input signal
%
hamWindow = 0.54 - 0.46*cos(2*pi*(0:windowSize-1)/windowSize);

% how many columns of data we will end up with
%
cols = ceil((length(input) - windowSize)/windowStep);

% Allocate (preextend) all the space we need for the output arrays.
%
freqresp  =  zeros(fftSize/2, cols);
fftData   =  zeros(1, fftSize);
fftMag    =  zeros(1, fftSize);

for c = 1:cols,                             % for each column of data

    first = (c-1)*windowStep + 1;           % determine
    last = first + windowSize - 1;          %   endpoints
    
    % shape the data with a hamming window
    %
    fftData(1:windowSize) = input(first:last).*hamWindow;

    % find the magnitude of the fft
    %
    fftMag = abs(fft(fftData));             % keep only real magnitudes

    freqresp(:, c) = fftMag(1:fftSize/2)';  % emit

end;