Computability and Complexity/Formal Languages/Chomsky Hierarchy/Context Sensitive Language
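A context-sensitive language is a language whose strings can be generated by a context-sensitive grammar. Such a grammar resembles a context-free grammar (see Context Free Language), but instead of rules of the form A -> X, where A is a nonterminal and X is a string of terminals and nonterminals, it has rules of the form X -> Y, where X and Y are both strings of terminals and nonterminals, subject to the restriction that Y must be at least as long as X. There is one exception: if S is the start symbol and S never appears on the right-hand side of any rule, then the rule S -> ε is allowed. This lets the grammar generate the empty string.
An example of such a grammar is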
- S -> ε|aAbc|abc
- A -> aAbC|abC
- Cb -> bC
- Cc -> cc
This grammar generates the language $\{a^n b^n c^n : n \ge 0\}$, which is not a context-free language.
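As a rough check of the example grammar, the following Python sketch enumerates every terminal string of bounded length that can be derived from S, using a breadth-first search over sentential forms. The names `RULES` and `generate`, and the convention that uppercase letters are nonterminals, are choices made for this illustration only.

```python
from collections import deque

# Rules of the example grammar as (left side, right side) pairs.
# Assumption: uppercase letters are nonterminals, lowercase are terminals.
RULES = [
    ("S", ""), ("S", "aAbc"), ("S", "abc"),
    ("A", "aAbC"), ("A", "abC"),
    ("Cb", "bC"),
    ("Cc", "cc"),
]

def generate(max_len):
    """Return every terminal string of length <= max_len derivable from S."""
    seen = {"S"}
    queue = deque(["S"])
    words = set()
    while queue:
        form = queue.popleft()
        # A form with no nonterminals left is a finished word.
        if form == "" or form.islower():
            words.add(form)
            continue
        for left, right in RULES:
            start = form.find(left)
            while start != -1:
                new = form[:start] + right + form[start + len(left):]
                # Apart from S -> ε, no rule shrinks a form, so any form
                # longer than max_len can be pruned safely.
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = form.find(left, start + 1)
    return sorted(words, key=len)

print(generate(9))
```

With a bound of 9 the search yields the empty string, abc, aabbcc, and aaabbbccc. The pruning step relies on the length restriction on rules, which is what makes the search finite.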
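An alternative definition of a context-sensitive grammar is one in which every rule has the form YAZ -> YXZ, where A is a nonterminal and X, Y, and Z are strings of terminals and nonterminals (Y and Z may be empty, X may not). Because Y and Z are left unchanged, such a rule acts like the rule A -> X, but it can require a particular context around A before it applies. In [1], Noam Chomsky proved that the two definitions are equivalent, in the sense that they generate the same class of languages.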
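To make the YAZ -> YXZ shape concrete, the following Python sketch tests whether a single rule fits it. The function name `context_split` and the convention that uppercase letters are nonterminals are assumptions made for this illustration.

```python
def context_split(lhs, rhs):
    """Return (Y, A, X, Z) if lhs -> rhs has the form YAZ -> YXZ, else None.

    Assumption: uppercase letters are nonterminals. X must be nonempty.
    If several splits are possible, the leftmost nonterminal wins.
    """
    for i, symbol in enumerate(lhs):
        if not symbol.isupper():
            continue
        y, z = lhs[:i], lhs[i + 1:]
        x_len = len(rhs) - len(y) - len(z)
        # The right side must keep Y in front and Z at the back,
        # with at least one symbol of X in between.
        if x_len >= 1 and rhs.startswith(y) and rhs.endswith(z):
            return y, symbol, rhs[len(y):len(rhs) - len(z)], z
    return None

print(context_split("Cc", "cc"))  # C is rewritten to c in the context of a following c
print(context_split("Cb", "bC"))  # not of the form YAZ -> YXZ
```

The rule Cb -> bC from the earlier example satisfies the length restriction of the first definition but does not have this shape, so `context_split("Cb", "bC")` returns `None`. The equivalence of the two definitions means that an equivalent grammar in the second form exists, not that every rule already has that form.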
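Under the second definition, a context-free grammar is simply a context-sensitive grammar that uses only empty contexts, so the set of context-free languages is a subset of the set of context-sensitive languages. Since, as the example above shows, a context-sensitive grammar can generate the language $\{a^n b^n c^n : n \ge 0\}$, which no PDA can recognize, the context-free languages must be a proper subset of the context-sensitive languages.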
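The class of context-sensitive languages is equivalent to the class of languages that can be recognized by linear bounded automata. A linear bounded automaton (LBA) is a state-based deterministic machine that has a "tape" holding the input string and a read/write head that moves left and right along the tape. Based on its current state and the symbol on the tape under the head, the machine uses a finite set of rules to determine its next state, the symbol to write to the tape, and the direction (left/right) in which to move the head. Like finite automata and pushdown automata, an LBA accepts or rejects an input according to whether it halts in an accepting state. An LBA halts when no rule matches the combination of its current state and the symbol being read. (Strictly speaking, the equivalence is established for nondeterministic LBAs; whether deterministic LBAs recognize exactly the same class is a long-standing open question. The machines described here are deterministic.)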
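The only other restriction on an LBA is that its tape must be finite, with a size that is a linear function of the size of the input string. For example, if a particular machine uses a tape of length 2*s+5, where s is the size of the input, then its tape has length 25 when it processes a string of size 10.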
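This design gives an LBA a great deal of computational freedom, so it can recognize any language whose strings can be recognized in space that grows linearly with their length. One example is the language of strings of the form $a^n b^n c^n d^n$. This language can be recognized in s+1 space, meaning that the machine needs only the part of the tape the input is written on, plus one blank cell at the end. This is because, unlike a PDA or a DFA, an LBA does not have to process its input in order: it can (and, in the sample machine below, does) mark off characters in groups, first marking an "a", then a "b", then a "c", then a "d", and repeating. If it does not run out of every type of character in the same round, it does not accept.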
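The marking strategy can be sketched directly in Python. This is not a transition-table machine; it only mimics the rounds of marking on a tape of s+1 cells. The function name `accepts`, the mark symbol "X", the up-front ordering check (which a real LBA would perform with its states while scanning), and the decision to accept the empty string are all assumptions of this sketch.

```python
def accepts(word):
    """Decide whether word has the form a^n b^n c^n d^n by marking in rounds."""
    if any(ch not in "abcd" for ch in word):
        return False
    # Simplification: check once that the letters appear in the order a, b, c, d.
    if list(word) != sorted(word):
        return False
    tape = list(word) + ["_"]  # s + 1 cells: the input plus one blank
    while True:
        marked = 0
        for letter in "abcd":
            # Mark the first unmarked occurrence of this letter, if any.
            for i, cell in enumerate(tape):
                if cell == letter:
                    tape[i] = "X"
                    marked += 1
                    break
        if marked == 0:
            return True   # every letter type ran out in the same round
        if marked < 4:
            return False  # some letter type ran out before the others

print(accepts("aabbccdd"), accepts("aabcd"))
```

The only working storage is the tape itself, which is why the space needed grows linearly with the input.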
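Neither a DFA nor a PDA can recognize this language, because they would have to count and store the number of "a"s as they pass over them. A DFA can only remember a count of "a"s up to the number of its states, so it cannot check the number of "b"s in arbitrarily long strings. A PDA can store the number of "a"s on its stack, but it necessarily destroys that information while comparing it with the number of "b"s, so it cannot then compare it with the number of "c"s.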
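Note that although an LBA can enter an infinite loop, it has only a finite tape, a finite number of symbols that can be placed on it, and a finite number of states it can be in, so the total number of conditions (configurations) for a given machine and input is finite. This means that any infinite loop must, within a finite number of steps, return to a set of conditions it has already encountered. Moreover, because the LBA is deterministic, a machine that repeats a set of conditions will go on repeating the cycle it has just traversed forever. Together, these two statements show that an LBA is in an infinite loop (and so will never accept or reject) if and only if it repeats its complete set of conditions: state, head position, and tape contents. Since $S \cdot T \cdot A^T$ is the total number of possible conditions, where S is the number of states, T is the size of the tape, and A is the size of the alphabet, any machine that runs for more than $S \cdot T \cdot A^T$ steps must have repeated a set of conditions and is therefore in an infinite loop. Although this number can be very large, it provides a definite, finite-time method for determining whether a given LBA loops on a given input. The similar but more powerful Turing machine (see Unrestricted Language) has no such method.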
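The step bound can be turned into a loop check. The Python sketch below is a minimal illustration under stated assumptions: the names `config_bound` and `run` are hypothetical, rules are given as a dictionary from (state, symbol) to (new state, new symbol, move), the tape is nonempty and has a fixed size, and the head is held at the ends of the tape as in the Perl simulator. Because the bound grows exponentially with the tape size, this is only practical for very small machines.

```python
def config_bound(num_states, tape_size, alphabet_size):
    """Number of distinct (state, head position, tape contents) combinations."""
    return num_states * tape_size * alphabet_size ** tape_size

def run(rules, start, accepting, tape, num_states, alphabet_size):
    """Simulate a deterministic LBA; return 'accept', 'reject', or 'loop'."""
    tape = list(tape)
    limit = config_bound(num_states, len(tape), alphabet_size)
    state, head = start, 0
    for _ in range(limit):
        key = (state, tape[head])
        if key not in rules:
            # No matching rule: the machine halts.
            return "accept" if state in accepting else "reject"
        state, tape[head], move = rules[key]
        # Keep the head on the tape.
        head = min(max(head + move, 0), len(tape) - 1)
    # 'limit' transitions without halting means some configuration repeated.
    return "loop"

# A one-state machine that rereads the same cell forever.
print(run({("q", "a"): ("q", "a", 1)}, "q", {"q"}, "a", 1, 1))
# A machine that overwrites each "a" with "b" and then halts in q.
print(run({("q", "a"): ("q", "b", 1)}, "q", {"q"}, "aa", 1, 2))
```

An alternative design is to remember every configuration seen so far and stop at the first repeat, which detects loops sooner at the cost of extra memory.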
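The code below is a sample LBA simulator in Perl. Given a description of a machine and an input string, it simulates the machine processing the input string and reports whether the machine accepts.
The syntax is: progname.pl LBAFile inputFile, where LBAFile is a text file containing the LBA instructions and inputFile is a text file containing the input string. Some sample inputs, including a set of LBA instructions for a machine that recognizes strings of the form $a^n b^n c^n d^n$, can be found at LBA Sample Input.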
#!/usr/bin/perl
use Text::ParseWords;
use strict;
use warnings;
my (@tape, $tapeIndex, $tapeMax, %accepts, %alphabet, @rules);
# Grabs the filenames for the machine and the word to be run on it.
my $lbaFile = $ARGV[0];
my $input = $ARGV[1];
# We use subroutines to parse and verify the data in the input files.
# The machine data is stored in the $machine structure as the keys rules, accepts, alphabet, startState, and tapeBound.
my $machine = readLBA($lbaFile);
# Rules and accepts are extracted from the $machine structure for ease of access.
@rules = @{$machine->{rules}};
%accepts = %{$machine->{accepts}};
# This reads the input file and parses it into an array of strings, with each element being one input symbol.
# It checks to make sure the elements are all in the machine's alphabet.
($tapeMax, @tape) = readInput($input, $machine->{alphabet}, $machine->{tapeBound});
# $changed records whether or not a rule has been used when running through the rules list to make transitions.
my $changed = 1;
# $state is the state the LBA is in, and is initialized to the start state from the machine file.
my $state = $machine->{startState};
# $tapeIndex is the position of the machine's head on the tape.
$tapeIndex = 0;
# Now that the machine is initialized, we can begin making transitions
# As long as things keep changing, keep cycling through the rules.
while ($changed)
{
    # Unless it changes while going through the rules, the machine will terminate.
    $changed = 0;
    # The current tape is printed, with the symbol under the head highlighted.
    print "@tape[0..$tapeIndex-1]<".$tape[$tapeIndex].">@tape[$tapeIndex+1..@tape-1]\n";
    # The current state of the machine is printed.
    print "$state\n";
    # A new state is calculated by checking conditions against the list of rules
    for my $rNum (0..@rules-1)
    {
        # Checks the current state and tape symbol against the rule being examined
        if (($rules[$rNum][0] eq $state) and
            ($rules[$rNum][1] eq $tape[$tapeIndex]))
        {
            # The state transition is printed.
            # print "State: ".$state." -> ".$rules[$rNum][2]."\n\n";
            # Set the new state,
            $state = $rules[$rNum][2];
            # Write the new symbol to the tape,
            $tape[$tapeIndex] = $rules[$rNum][3];
            # Move the head to the new index,
            $tapeIndex += $rules[$rNum][4];
            # and make sure it hasn't run past the left edge of the tape.
            if ($tapeIndex < 0) { $tapeIndex = 0; }
            # If the machine nears the end of the allocated tape, and more tape is allowed, expand the tape.
            if (($tapeIndex >= @tape-2) and (@tape < $tapeMax))
            {
                push(@tape, "_");
            }
            # If the head runs past the right end of the tape, keep it on the right end.
            if ($tapeIndex >= $tapeMax-1) { $tapeIndex = $tapeMax-1; }
            $changed = 1;
            # Once we've made a transition, we can stop and begin looking for the next one.
            last;
        }
    }
}
# When there are no more possible transitions, if the machine is in an accepting state,
if (defined($accepts{$state}))
{
# Print that it accepts and quit.
print "The machine accepts the string.\n";
exit;
}
# Otherwise, print that it does not accept, and quit.
print "The machine does not accept the string.\n";
exit;
###################################################
# This subroutine reads the machine data from the specified file into variables (mostly hashes).
sub readLBA
{
    my (%states, %accepts, %alphabet, @rules, @tapeBound);
    my $file = shift;
    open(my $fh, '<', $file) or die "Can't open machine file: $!";
    # This block reads the list of states from the machine file.
    # Discards the section header,
    <$fh>;
    my $line = <$fh>;
    chomp($line);
    my @words = parse_line('\s+', 0, $line);
    for (@words)
    {
        # records the state names for checking the rules,
        $states{$_} = 0;
    }
    # This block reads the start state from the machine file.
    # Discards the header,
    <$fh>;
    my $startState = <$fh>;
    # takes the whole line as the start state,
    chomp($startState);
    # and makes sure that the start state is defined in the list of states.
    defined($states{$startState}) or die "The start state $startState isn't a state!";
    # This block reads the list of accepting states from the machine file.
    # Discards the header,
    <$fh>;
    $line = <$fh>;
    chomp($line);
    # breaks up the line into state names,
    @words = parse_line('\s+', 0, $line);
    for (@words)
    {
        # checks to make sure that the accept states are defined states,
        defined($states{$_}) or die "$_ isn't a state!";
        # and defines those names in a new hash. The use of a hash makes it easier to determine later if a specific state name accepts or not.
        $accepts{$_} = 1;
    }
    # This block reads the list of symbols in the alphabet from the machine file.
    # Discards the header,
    <$fh>;
    $line = <$fh>;
    chomp($line);
    # breaks up the line into alphabet symbols (note that the symbols can be of arbitrary length),
    @words = parse_line('\s+', 0, $line);
    # e is used as the empty symbol in the rules.
    $alphabet{e} = 1;
    for (@words)
    {
        # This records which symbols are in the alphabet for checking the rules.
        $alphabet{$_} = 0;
    }
    # This block reads the linear bound on the tape size from the machine file.
    # Discards the header,
    <$fh>;
    $line = <$fh>;
    chomp($line);
    # breaks up the line into two parts - the "slope" and the "intercept",
    @words = parse_line('\s+', 0, $line);
    # and stores them in the @tapeBound array.
    $tapeBound[0] = $words[0];
    $tapeBound[1] = $words[1];
    # This block reads the state transition rules from the machine file.
    # Discards the header,
    <$fh>;
    while (<$fh>)
    {
        # breaks each rule into start state, symbol read, end state, symbol written, and head direction.
        chomp($_);
        @words = parse_line('\s+', 0, $_);
        # checks that the first four pieces are defined in the state and alphabet hashes,
        defined($states{$words[0]}) or die "$words[0] isn't a defined state!";
        defined($alphabet{$words[1]}) or die "$words[1] isn't defined in the alphabet!";
        defined($states{$words[2]}) or die "$words[2] isn't a defined state!";
        defined($alphabet{$words[3]}) or die "$words[3] isn't defined in the alphabet!";
        # and converts the left/right instruction into a number to be added to the position counter.
        if (($words[4] eq "left") or ($words[4] eq "-1"))
        {
            $words[4] = -1;
        }
        elsif (($words[4] eq "right") or ($words[4] eq "+1"))
        {
            $words[4] = 1;
        }
        else
        {
            die "$words[4] isn't left, right, -1, or +1!";
        }
        # then appends the five pieces of the rule to the rules array.
        push(@rules, [@words[0..4]]);
    }
    # Reading complete, the subroutine closes the file.
    close $fh;
    # The relevant data is stored in the $machine structure and returned to the main routine.
    my $machine =
    {
        rules => \@rules,
        accepts => \%accepts,
        alphabet => \%alphabet,
        startState => $startState,
        tapeBound => \@tapeBound
    };
    return $machine;
}
# This subroutine reads the starting tape from the specified file into an array of symbols.
sub readInput
{
    my ($file, $alphaRef, $boundRef) = @_;
    my %alphabet = %{$alphaRef};
    my @tapeBound = @{$boundRef};
    open(my $fh, '<', $file) or die "Can't open $file: $!";
    # The first line of the file is read as the initial state of the tape, with symbols delimited by spaces.
    my $line = <$fh>;
    # An empty file is treated as an empty input string.
    $line = "" unless defined $line;
    chomp($line);
    my @tape = parse_line('\s+', 0, $line);
    # This makes sure every symbol in the input string was defined in the machine's alphabet.
    for (@tape)
    {
        defined($alphabet{$_}) or die "$_ in $file isn't in this machine's alphabet!";
    }
    close $fh;
    # The maximum tape size is a linear function of the input length: slope * length + intercept.
    my $tapeMax = @tape*$tapeBound[0]+$tapeBound[1];
    return ($tapeMax, @tape);
}