Statistical Analysis: an Introduction using R / R / Logical operations

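When accessing elements of vectors, we saw how to use a simple logical expression involving the less-than sign (<) to produce a logical vector, which could then be used to select the elements less than a certain value. This type of logical operation is very useful. As well as <, there are several other comparison operators. Here is the full set (see ?Comparison for more details):
  • < (less than) and <= (less than or equal to)
  • > (greater than) and >= (greater than or equal to)
  • == (equal to[1]) and != (not equal to)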
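
As a small sketch of these operators in action, the following applies each of them to a short numeric vector. The vector x and its values are made up for illustration and are not part of the states data used later. The last two lines illustrate the rounding issue described in note 1, using all.equal wrapped in isTRUE() as one possible way of comparing decimal numbers.

```r
# A hypothetical vector, used only for illustration
x <- c(3, 7, 10, 10, 15)

x < 10     # TRUE  TRUE  FALSE FALSE FALSE
x <= 10    # TRUE  TRUE  TRUE  TRUE  FALSE
x > 10     # FALSE FALSE FALSE FALSE TRUE
x >= 10    # FALSE FALSE TRUE  TRUE  TRUE
x == 10    # FALSE FALSE TRUE  TRUE  FALSE
x != 10    # TRUE  TRUE  FALSE FALSE TRUE

# Rounding error with decimal numbers: these two values look equal but are not
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal within a small tolerance
```

Each comparison returns a logical vector of the same length as x, which is what makes it usable inside the index operator [].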

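Greater flexibility can be gained by combining logical vectors. For example, we might want to find which US states have an area of less than 10 000 or more than 100 000 square miles, or which have an area of more than 100 000 square miles and a short name. The code below shows how this can be done, using the following R symbols:

  • & ("and")
  • | ("or")
  • ! ("not")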
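
Before turning to the states data, it may help to see these three symbols applied to two very short logical vectors. This is only a sketch: the vectors a and b are hypothetical and chosen so that every combination of TRUE and FALSE appears once.

```r
# Two hypothetical logical vectors of the same length
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

a & b    # TRUE  FALSE FALSE FALSE  (TRUE only where both are TRUE)
a | b    # TRUE  TRUE  TRUE  FALSE  (TRUE where at least one is TRUE)
!a       # FALSE FALSE TRUE  TRUE   (each element flipped)
a & !b   # FALSE TRUE  FALSE FALSE  (TRUE where a is TRUE and b is not)
```

All three work element by element, so the result has one value for each position in the input vectors.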

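When working with logical vectors, the following functions are particularly useful, as illustrated below:

  • which() identifies which elements of a logical vector are TRUE
  • sum() can be used to give the number of elements of a logical vector that are TRUE. This is because sum() coerces its input to numbers, and when TRUE and FALSE are converted to numbers they take the values 1 and 0 respectively.
  • ifelse() returns different values depending on whether each element of a logical vector is TRUE or FALSE. Specifically, a command such as ifelse(aLogicalVector, vectorT, vectorF) takes aLogicalVector and returns, for each element that is TRUE, the corresponding element from vectorT, and for each element that is FALSE, the corresponding element from vectorF. In addition, if vectorT or vectorF is shorter than aLogicalVector, it is extended to the correct length by duplication.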
Input
### In these examples, we'll reuse the American states data, especially the state names
### To remind yourself of them, you might want to look at the vector "state.name"

nchar(state.name)       # nchar() returns the number of characters in strings of text ...
nchar(state.name) <= 6  #so this indicates which states have names of 6 letters or fewer
ShortName <- nchar(state.name) <= 6         #store this logical vector for future use
sum(ShortName)          #With a logical vector, sum() tells us how many are TRUE (11 here)
which(ShortName)        #These are the positions of the 11 elements which have short names
state.name[ShortName]   #Use the index operator [] on the original vector to get the names
state.abb[ShortName]    #Or even on other vectors (e.g. the 2 letter state abbreviations)

isSmall <- state.area < 10000  #Store a logical vector indicating states <10000 sq. miles
isHuge  <- state.area > 100000 #And another for states >100000 square miles in area
sum(isSmall)                   #there are 8 "small" states
sum(isHuge)                    #coincidentally, there are also 8 "huge" states

state.name[isSmall | isHuge]   # | means OR. So these are states which are small OR huge
state.name[isHuge & ShortName] # & means AND. So these are huge AND with a short name
state.name[isHuge & !ShortName]# ! means NOT. So these are huge and with a longer name

### Examples of ifelse() ###

ifelse(ShortName, state.name, state.abb) #mix short names with abbreviations for long ones
# (think of this as "*if* ShortName is TRUE then use state.name *else* use state.abb")

### Many functions in R increase input vectors to the correct size by duplication ###
ifelse(ShortName, state.name, "tooBIG")   #A silly example: the 3rd argument is duplicated
size <- ifelse(isSmall, "small", "large") #A more useful example, for both 2nd & 3rd args
size                                      #might be useful as an indicator variable?             
ifelse(size=="large", ifelse(isHuge, "huge", "medium"), "small") #A more complex example
Result
> ### In these examples, we'll reuse the American states data, especially the state names
> ### To remind yourself of them, you might want to look at the vector "state.name"
>  
> nchar(state.name)       # nchar() returns the number of characters in strings of text ...
 [1]  7  6  7  8 10  8 11  8  7  7  6  5  8  7  4  6  8  9  5  8 13  8  9 11  8  7  8  6 13
[30] 10 10  8 14 12  4  8  6 12 12 14 12  9  5  4  7  8 10 13  9  7
> nchar(state.name) <= 6  #so this indicates which states have names of 6 letters or fewer
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
[15]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[29] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[43]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> ShortName <- nchar(state.name) <= 6         #store this logical vector for future use
> sum(ShortName)          #With a logical vector, sum() tells us how many are TRUE (11 here)
[1] 11
> which(ShortName)        #These are the positions of the 11 elements which have short names
 [1]  2 11 12 15 16 19 28 35 37 43 44
> state.name[ShortName]   #Use the index operator [] on the original vector to get the names
 [1] "Alaska" "Hawaii" "Idaho"  "Iowa"   "Kansas" "Maine"  "Nevada" "Ohio"   "Oregon"
[10] "Texas"  "Utah"  
> state.abb[ShortName]    #Or even on other vectors (e.g. the 2 letter state abbreviations)
 [1] "AK" "HI" "ID" "IA" "KS" "ME" "NV" "OH" "OR" "TX" "UT"
>  
> isSmall <- state.area < 10000  #Store a logical vector indicating states <10000 sq. miles
> isHuge  <- state.area > 100000 #And another for states >100000 square miles in area
> sum(isSmall)                   #there are 8 "small" states
[1] 8
> sum(isHuge)                    #coincidentally, there are also 8 "huge" states
[1] 8
>  
> state.name[isSmall | isHuge]   # | means OR. So these are states which are small OR huge
 [1] "Alaska"        "Arizona"       "California"    "Colorado"      "Connecticut"  
 [6] "Delaware"      "Hawaii"        "Massachusetts" "Montana"       "Nevada"       
[11] "New Hampshire" "New Jersey"    "New Mexico"    "Rhode Island"  "Texas"        
[16] "Vermont"      
> state.name[isHuge & ShortName] # & means AND. So these are huge AND with a short name
[1] "Alaska" "Nevada" "Texas" 
> state.name[isHuge & !ShortName]# ! means NOT. So these are huge and with a longer name
[1] "Arizona"    "California" "Colorado"   "Montana"    "New Mexico"
>  
> ### Examples of ifelse() ###
>  
> ifelse(ShortName, state.name, state.abb) #mix short names with abbreviations for long ones
 [1] "AL"     "Alaska" "AZ"     "AR"     "CA"     "CO"     "CT"     "DE"     "FL"    
[10] "GA"     "Hawaii" "Idaho"  "IL"     "IN"     "Iowa"   "Kansas" "KY"     "LA"    
[19] "Maine"  "MD"     "MA"     "MI"     "MN"     "MS"     "MO"     "MT"     "NE"    
[28] "Nevada" "NH"     "NJ"     "NM"     "NY"     "NC"     "ND"     "Ohio"   "OK"    
[37] "Oregon" "PA"     "RI"     "SC"     "SD"     "TN"     "Texas"  "Utah"   "VT"    
[46] "VA"     "WA"     "WV"     "WI"     "WY"    
> # (think of this as "*if* ShortName is TRUE then use state.name *else* use state.abb")
>  
> ### Many functions in R increase input vectors to the correct size by duplication ###
> ifelse(ShortName, state.name, "tooBIG")   #A silly example: the 3rd argument is duplicated
 [1] "tooBIG" "Alaska" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG"
[10] "tooBIG" "Hawaii" "Idaho"  "tooBIG" "tooBIG" "Iowa"   "Kansas" "tooBIG" "tooBIG"
[19] "Maine"  "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG"
[28] "Nevada" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "Ohio"   "tooBIG"
[37] "Oregon" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "Texas"  "Utah"   "tooBIG"
[46] "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG"
> size <- ifelse(isSmall, "small", "large") #A more useful example, for both 2nd & 3rd args
> size                                      #might be useful as an indicator variable?             
 [1] "large" "large" "large" "large" "large" "large" "small" "small" "large" "large"
[11] "small" "large" "large" "large" "large" "large" "large" "large" "large" "large"
[21] "small" "large" "large" "large" "large" "large" "large" "large" "small" "small"
[31] "large" "large" "large" "large" "large" "large" "large" "large" "small" "large"
[41] "large" "large" "large" "large" "small" "large" "large" "large" "large" "large"
> ifelse(size=="large", ifelse(isHuge, "huge", "medium"), "small") #A more complex example
 [1] "medium" "huge"   "huge"   "medium" "huge"   "huge"   "small"  "small"  "medium"
[10] "medium" "small"  "medium" "medium" "medium" "medium" "medium" "medium" "medium"
[19] "medium" "medium" "small"  "medium" "medium" "medium" "medium" "huge"   "medium"
[28] "huge"   "small"  "small"  "huge"   "medium" "medium" "medium" "medium" "medium"
[37] "medium" "medium" "small"  "medium" "medium" "medium" "huge"   "medium" "small" 
[46] "medium" "medium" "medium" "medium" "medium"
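If you have done any computer programming, you may be more used to dealing with logic in the context of "if" statements. While R does have an if() statement, it is less useful when dealing with vectors. For example, the following R expression
if(aVariable == 0) print("zero") else print("not zero")
expects aVariable to be a single number: it outputs "zero" if this number is 0, or "not zero" if it is a number other than zero[2]. If aVariable is a vector of 2 or more values, older versions of R used only the first element and ignored all the others (with a warning), whereas R 4.2.0 and later stop with an error[3]. There are also logical operators that work on single values rather than on whole vectors: these are && (for AND) and || (for OR)[4]. Older versions of R likewise considered only the first element of a vector given to these operators; from R 4.3.0, giving them a vector of more than one element is an error.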
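
One way to bridge the two styles is sketched below with a made-up vector v. It uses ifelse() when a result is wanted for every element, and reduces the logical vector to a single TRUE or FALSE before handing it to if(). The reduction uses any() and all(), which are base R functions not otherwise covered on this page; they are offered here as a suggestion rather than as the only approach.

```r
v <- c(0, 2, 0, 5)   # hypothetical data, for illustration only

# ifelse() is vectorised: it gives one answer per element
labels <- ifelse(v == 0, "zero", "not zero")
labels   # "zero" "not zero" "zero" "not zero"

# if() needs a single TRUE or FALSE, so summarise the vector first
if (any(v == 0)) print("at least one zero") else print("no zeros")
if (all(v == 0)) print("all zeros") else print("not all zeros")

# Testing a single element is also fine
if (v[1] == 0) print("zero") else print("not zero")
```

With this data the three if() statements print "at least one zero", "not all zeros" and "zero" respectively.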


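  1. Note that when using continuous (fractional) numbers, rounding error may mean that the results of calculations are not exactly equal to each other, even if they seem as if they should be. For this reason, you should be careful when using == with continuous numbers. R provides the function all.equal to help in this case.
  2. Unlike ifelse, however, it cannot cope with NA values.
  3. For this reason, using == in if statements may not be a good idea; see the Note in ?"==" for details.
  4. These are particularly helpful in more advanced computer programming in R; see ?"&&" for details.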
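
The difference described in note 2 can be seen in a short sketch. The vector y is hypothetical, and wrapping the test in isTRUE() is just one possible way of guarding an if statement against NA, not something prescribed by this page.

```r
y <- c(1, NA, 0)   # hypothetical data containing a missing value

# ifelse() simply passes the NA through to the result
ifelse(y == 0, "zero", "not zero")   # "not zero" NA "zero"

# if() stops with an error when its condition is NA;
# try() is used here only so that the script can carry on
result <- try(if (y[2] == 0) "zero" else "not zero", silent = TRUE)
inherits(result, "try-error")        # TRUE

# isTRUE() treats NA as "not TRUE", so the if statement can run
guarded <- if (isTRUE(y[2] == 0)) "zero" else "not zero"
guarded                              # "not zero"
```

Note that the guarded version labels a missing value as "not zero", which may or may not be what is wanted in a particular analysis.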