跳转到内容

可计算性与复杂性/形式语言/乔姆斯基层次结构/上下文无关语言

来自维基教科书,开放的书籍,开放的世界

上下文无关语言

[编辑 | 编辑源代码]

上下文无关语言是乔姆斯基层次结构中第二类最受限制的语言。该类语言可以通过使用“非终结符”和“终结符”的一组生成规则来描述,其中终结符是语言的字母表。这些规则将非终结符替换为终结符或非终结符的字符串,或替换为空字符串。惯例是用大写字母表示非终结符,用小写字母表示字母表中的符号(以及终结符)。这组替换规则称为上下文无关文法。如果存在一个上下文无关文法,它从起始非终结符生成语言中的每个字符串(而不生成其他字符串),那么语言就是上下文无关的。

例如,考虑以下文法

  • S -> ε(ε 代表空字符串)
  • S -> A
  • A -> aAb
  • A -> ab

从起始符号 S 开始,该文法可以生成空字符串或一个单词,该单词由任意数量的 a 后跟相同数量的 b 组成。您可能还记得,关于 正则语言 的部分,该语言也可以写成 ,它不属于正则语言类别,但我们现在看到它是上下文无关的。

请注意,在示例中,文法提供了在 S 和 A 非终结符上应用什么规则的选择。正是这些选择使文法能够生成语言中的所有字符串,而不仅仅是一个字符串。

下推自动机

[编辑 | 编辑源代码]

上下文无关语言类别与称为下推自动机的机器所识别的语言类别相同。下推自动机 (PDA) 是一种非确定性机器,它由有限个状态组成,这些状态之间存在着转换,就像 NFA(见 正则语言)一样,但它还增加了一个大小无限的栈。作为转换的一部分,机器可以弹出栈顶元素,并将其内容用作转换的一部分,也可以将新元素压入栈中。它的转换可以写成 {A,x,y} -> {B,z},其中 A 和 B 是起始状态和结束状态,x 是下一个输入字符,y 是从栈中弹出的内容,z 是压入栈的内容。x、y 或 z 中的任何一个都可以是 ε,这意味着在该转换中没有放置或消耗任何内容。

栈的添加为机器提供了有限但任意大的内存量,使它能够识别比有限自动机更复杂的语言。如上所述,语言 是一种上下文无关语言,因此存在一个识别它的 PDA。它可以通过为字符串中的每个 a 添加一个项目到栈中来实现,从而存储它们的计数,并为每个 b 从栈中删除一个项目。如果 ab 的数量相同,则当没有更多 b 时,栈将清空,因此可以有效地比较这两个数字。有关识别此语言的机器的更详细示例,请参阅下面的示例机器。

如果 PDA 根本不使用栈,它就等效于 NFA,因此等效于 DFA,因此有限自动机识别的任何语言都可以由 PDA 识别。因此,由于存在非正则的上下文无关语言,正则语言类别是上下文无关语言类别的真子集。

局限性

[编辑 | 编辑源代码]

与正则语言一样,还有许多语言不是上下文无关的。PDA 上的栈虽然提供了无限的存储容量,但仍然是一个栈,因此在任何给定时间只能访问最后一个放置在它上面的元素。访问更早的元素需要删除并因此丢失后面的元素,因为没有其他栈可以放置它们。PDA 还有另一个限制,它必须按接收到的顺序消耗输入字符,并且不能再次访问它们,除非将它们放到栈中。

这些限制使得 PDA 无法识别语言 。通过使用栈,PDA 可以计算 a 的数量,并将它们与 b 的数量进行比较,但在这样做时,它必须消耗对 a 的记录,因此无法以相同的方式比较 c。机器无法在不遮蔽 a 的计数的情况下创建 b 的记录来与 c 进行比较,因此该方法同样不成功。简而言之,PDA 内存缺乏任何类型的随机或直接访问阻止了它识别此语言和许多其他语言,由于没有 PDA 能够识别它们,因此它们不可能是上下文无关语言。

以下代码是 Perl 中的 PDA 模拟器示例。给定机器的描述和输入字符串,它模拟机器处理输入字符串,并显示机器是否接受。

语法为:progname.pl PDAFile inputFile,其中 PDAFile 是包含 TM 指令的文本文件,inputFile 是包含输入字符串的文本文件。一些示例输入,包括一组用于识别 的机器的 PDA 指令集,可以在 示例 PDA 输入 中找到。

#!usr/bin/perl
use Text::ParseWords;
use strict;
use warnings;

my (@branches, @stack, %alphabet);

# Grabs the filenames for the machine and the word to be run on it.
my $pdaFile = $ARGV[0];
my $input = $ARGV[1];

# We use subroutines to parse and verify the data in the input files.
# The machine data is stored in the $machine structure as the keys rules, accepts, alphabet, and startState.
my $machine = readPDA($pdaFile);
# Rules and accepts are extracted from the $machine structure for ease of access.
my @rules = @{$machine->{rules}};
my %accepts = %{$machine->{accepts}};
# This reads the input file and parses it into an array of strings, with each element being one input symbol.
# It checks to make sure the elements are all in the machine's alphabet.
my @string = readInput($input, $machine->{alphabet});
# The newstate is a temporary storage point for when a new state is calculated.
my $newstate;
# The usedRule is a temporary storage point for when a new state is calculated.
my $usedRule;
# The changed variable represents whether or the current branch is unfinished.
my $changed = 1;
push(@stack, "");
# The top level of the branches array corresponds to each branch of possibilities created by the non-determinism of the PDA.
# Each element contains the conditions of the machine for that branched possibility.
# The first element of each collection is the state of the branch.
$branches[0][0] = $machine->{startState};
# The second element is how much of the input string the branch has read.
$branches[0][1] = 0;
# The third element is an array containing the stack for that branch.
$branches[0][2][0] = "";
# Now that the first branch is initialized, the processing can begin

for my $i (0..$#branches)
{
   # When we start a branch, print the branch number
    print "\nBeginning branch ".$i.".\n";
   # As long as things keep changing, keep cycling through the rules.
    while($changed)
    {
       # Unless it changes while going through the rules, this branch will quit.
        $changed = 0;
       # The input word is printed, with the next symbol highlighted.
        print "Input: @string[0..$branches[$i][1]-1] <".$string[$branches[$i][1]]."> @string[$branches[$i][1]+1..@string-1]\n";
       # The current state of the stack is printed.
        print "Stack: @{$branches[$i][2]}\n";
       # A new state is calculated by checking conditions against the list of rules
        for my $rNum (0..$#rules)
        {
#            print "::$rules[$rNum][0]??$branches[$i][0]";
#            print "::$rules[$rNum][1]??$string[$branches[$i][1]]";
#            print "::$rules[$rNum][2]??".$branches[$i][2]->[-1]."::\n";
           # Checks the current state, input, and top stack item against the rule
            if ($rules[$rNum][0] eq $branches[$i][0] &&
                ($rules[$rNum][1] eq "e" || $rules[$rNum][1] eq $string[$branches[$i][1]]) &&
                ($rules[$rNum][2] eq "e" || $rules[$rNum][2] eq $branches[$i][2]->[-1]))
            {
                if ($changed == 0)
                {
                   # Set the new state.
                    $newstate = $rules[$rNum][3];
                   # The state transition is printed.
                    print "State: ".$branches[$i][0]." -> ".$newstate."\n\n";
                    $changed = 1;
                   # Because possible branched depend on this state, we can't update it yet.
                   # When we can update this state, $usedRule will help us remember which rule to base those updates on.
                    $usedRule = $rNum;
                }
                else
                {
                   # Set the new state.
                    my $branchState = $rules[$rNum][3];
                   # The state transition is printed.
                    print "(branching) State: ".$branches[$i][0]." -> ".$branchState."\n\n";
                    my $newBranch = @branches;
                   # The state in the new branch is set.
                    $branches[$newBranch][0] = $branchState;
                   # The new branch starts with the same string position as the old branch,
                    $branches[$newBranch][1] = $branches[$i][1];
                   # and the same stack, so the stack has to be replicated.
                    @{$branches[$newBranch][2]} = @{$branches[$i][2]};
                   # If we read a symbol from the input to make the transition,
                    unless ($rules[$rNum][1] eq "e")
                    {
                       # then we should move to the next symbol.
                        $branches[$newBranch][1]++;
                    }
                   # If we used an element from the stack to make the transition,
                    unless ($rules[$rNum][2] eq "e")
                    {
                       # then it's used up and should be removed.
                        pop(@{$branches[$newBranch][2]});
                    }
                   # If the rule adds something to the stack,
                    unless ($rules[$rNum][4] eq "e")
                    {
                       # then it gets added.
                        push(@{$branches[$newBranch][2]}, $rules[$rNum][4]);
                    }
                }
            }
        }
       # Now that any branching has been finished, we can update the original branch.
        if ($changed)
        {
           # If we read a symbol from the input to make the transition,
            unless ($rules[$usedRule][1] eq "e")
            {
               # then we should move to the next symbol.
                $branches[$i][1]++;
            }
           # If we used an element from the stack to make the transition,
            unless ($rules[$usedRule][2] eq "e")
            {
               # then it's used up and should be removed.
                pop(@{$branches[$i][2]});
            }
           # If the rule adds something to the stack,
            unless ($rules[$usedRule][4] eq "e")
            {
               # then it gets added.
                push(@{$branches[$i][2]}, $rules[$usedRule][4]);
            }
           # The state changes to the new state.
            $branches[$i][0] = $newstate;
        }
    }
   # When the input is exhausted, the branch is in its final state.
    print "Final state of branch ".$i." is ".$branches[$i][0]."\n";
   # If that state is in the accept states list, the machine accepts the input and halts.
    if (defined($accepts{$branches[$i][0]}) && $branches[$i][1] == $#string)
    {
        print "The machine accepts the string.\n";
        exit;
    }
   # If that state doesn't, point it out.
    else { print "The branch does not accept the string.\n"; }
   # And move on.
    $changed = 1;
}
print "The machine does not accept the string.\n";

###################################################

sub readPDA
# This subroutine reads the machine data from the specified file into variables (mostly hashes).
{
    my (%states, %stackAlphabet, %accepts, %alphabet, @rules);
    open(INFILE, shift) or die "Can't open machine file: $!";
    
# This block reads the list of states from the machine file.
   # Discards the section header,
    <INFILE>;
    my $line = <INFILE>;
    chomp($line);
    my @words = &parse_line('\s+', 0, $line);
    for (@words)
    {
       # records the state names for checking the rules,
        $states{$_} = 0;
    }

# This block reads the start state from the machine file.
   # Discards the header,
    <INFILE>;
    my $startState = <INFILE>;
   # takes the whole line as the start state,
    chomp($startState);
   # and makes sure that the start state is defined in the list of states.
    defined($states{$startState}) or die "The start state $startState isn't a state!";
    
# This block reads the list of accepting states from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into state names,
    @words = parse_line('\s+', 0, $line);
    for (@words)
    {
   # checks to make sure that the accept states are defined states,
        defined($states{$_}) or die "$_ isn't a state!";
   # and defines those names in a new hash.  The use of a hash makes it easier to determine later if a specific state name accepts or not.
        $accepts{$_} = 1;
    }
    
# This block reads the list of symbols in the alphabet from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into alphabet symbols (note that the symbols can be of arbitrary length),
    @words = parse_line('\s+', 0, $line);
   # e is used as the empty symbol in the rules.
    $alphabet{e} = 1;
    for (@words)
    {
       # This records which symbols are in the alphabet for checking the rules.
        $alphabet{$_} = 0;
    }
    
# This block reads the list of symbols in the stack alphabet from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into alphabet symbols (note that the symbols can be of arbitrary length),
    @words = parse_line('\s+', 0, $line);
   # e is used as the empty symbol in the rules.
    $stackAlphabet{e} = 1;
    for (@words)
    {
       # This records which symbols are in the alphabet for checking the rules.
        $stackAlphabet{$_} = 0;
    }
    
# This block reads the state transition rules from the machine file.
   # Discards the header,
    <INFILE>;
   # This variable synchronizes the position of each rule in the rules array.
    my $rulesCounter=0;
    while(<INFILE>)
    {
   # breaks each rule into start state, input symbol, stack symbol, end state, and new stack symbol.
        chomp;
        @words = parse_line('\s+', 0, $_);
   # checks that all five pieces are defined in the state and alphabet hashes,
        defined($states{$words[0]}) or die "$words[0] isn't a defined state!";
        defined($alphabet{$words[1]}) or die "$words[1] isn't defined in the alphabet!";
        defined($stackAlphabet{$words[2]}) or die "$words[2] isn't defined in the stack alphabet!";
        defined($states{$words[3]}) or die "$words[3] isn't a defined state!";
        defined($stackAlphabet{$words[4]}) or die "$words[4] isn't defined in the stack alphabet!";
   # then creates an array of each rule.
        for (0..4)
        {
            $rules[$rulesCounter][$_] = $words[$_];
        }
       # The synchronization variable has to be updated.
        $rulesCounter++;
    }

   # Reading complete, the subroutine closes the file and returns the name of the start state.
    close INFILE;
   # The relevant data is stored in the $machine structure and returned to the main routine.
    my $machine = 
    {
        rules      => \@rules,
        accepts    => \%accepts,
        alphabet   => \%alphabet,
        startState => $startState
    };
    return $machine;
}

sub readInput
# This subroutine reads the input string from the specified file into an array of symbols.
{
    open(INFILE, shift) or die "Can't open ".$input.": $!";
    my $alphaRef = shift;
   # The first line of the file is read as the input, with symbols delimited by spaces.
    my $line = <INFILE>."";
    chomp($line);
    my @string = parse_line('\s+', 0, $line);
   # This makes sure every symbol in the input string was defined in the machine's alphabet.
    for (@string)
    { exists($alphaRef->{$_}) or die "$_ in $input isn't in this machine's alphabet!"; }
    close INFILE;
   # Since the machine can continue to make transitions after the string is exhausted, this adds a blank element to keep the string from overrunning.
    push(@string, "");
    return @string;
}

上一步 | 下一步

华夏公益教科书