

A Detailed Guide to String Functions in R

CDA数据分析师

2016-06-22


I. String-handling functions in the stringr package:

1. Converting the case of a string

str_to_upper(string, locale = "")

str_to_lower(string, locale = "")

str_to_title(string, locale = "")

2. invert_match: takes a matrix of match positions (such as one element of the str_locate_all result) and returns the start and end positions of the non-matching stretches

3. modifiers: specify how a pattern is interpreted

fixed(pattern, ignore_case = FALSE): Compare literal bytes in the string. This is very fast, but not usually what you want for non-ASCII character sets.

coll(pattern, ignore_case = FALSE, locale = NULL, ...): Compare strings respecting standard collation rules.

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...): Interpret the pattern as a regular expression. This is the default.

boundary(type = c("character", "line_break", "sentence", "word"), skip_word_none = TRUE, ...): Match boundaries between things.

pattern: Pattern to modify behaviour.

ignore_case: Should case differences be ignored in the match?

locale: Locale to use for comparisons. See stri_locale_list() for all possible options.

...: Other less frequently used arguments passed on to stri_opts_collator, stri_opts_regex, or stri_opts_brkiter

multiline: If TRUE, ^ and $ match the beginning and end of each line. If FALSE, the default, they only match the start and end of the input.

comments: If TRUE, whitespace and comments beginning with # are ignored. Escape literal spaces with \ .

dotall: If TRUE, . will also match line terminators.

type: Boundary type to detect.

skip_word_none: Ignore "words" that don't contain any characters or numbers, i.e. punctuation.
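The difference between the modifiers is easiest to see on a pattern that contains a regular-expression metacharacter. The following is a minimal sketch, assuming the stringr package is installed; the sample strings are made up for illustration.

```r
library(stringr)

x <- c("a.b", "axb", "A.B")

# Default behaviour: the pattern is a regular expression, so "." matches any character.
as_regex <- str_detect(x, "a.b")

# fixed(): the pattern is compared literally, so only a real "." matches.
as_fixed <- str_detect(x, fixed("a.b"))

# regex() with ignore_case = TRUE also accepts the upper-case element.
no_case <- str_detect(x, regex("a\\.b", ignore_case = TRUE))

# boundary("word") matches word boundaries, so str_count() counts words.
n_words <- str_count("R makes string work easy", boundary("word"))

print(as_regex)
print(as_fixed)
print(no_case)
print(n_words)
```

Wrapping the pattern in fixed() is usually the simplest choice when the pattern comes from user input and should not be treated as a regular expression.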

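4. str_c: join strings together

str_c(..., sep = "", collapse = NULL)

str_join(..., sep = "", collapse = NULL) (an older alias of str_c; later stringr releases deprecate it in favour of str_c)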
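The sep and collapse arguments do different jobs: sep is placed between the arguments, element by element, while collapse merges the resulting vector into a single string. A short sketch, assuming stringr is installed and using invented inputs:

```r
library(stringr)

# sep joins the arguments position by position.
joined <- str_c("a", "b", sep = "-")

# Vectors are combined element-wise, so the result has two elements.
pairs <- str_c(c("x", "y"), 1:2, sep = "_")

# collapse merges all elements of the result into one string.
one_string <- str_c(c("x", "y"), collapse = ", ")

print(joined)
print(pairs)
print(one_string)
```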

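5. str_conv: declare the encoding of a string (the result is re-encoded as UTF-8)

str_conv(string, encoding)

6. str_count: count the number of matches of a pattern in a string

str_count(string, pattern = "")

7. str_detect: test whether a pattern is present in a string

str_detect(string, pattern)

8. str_dup: repeat the strings of a character vector and join the copies

str_dup(string, times)

9. str_extract: extract the text that matches a pattern

str_extract(string, pattern): extracts the first match

str_extract_all(string, pattern, simplify = FALSE): extracts all matches

10. str_length: the length of a string (number of characters)

str_length(string)

11. str_locate: locate the position of a pattern match in a string

str_locate(string, pattern): returns the position of the first match

str_locate_all(string, pattern): returns the positions of all matches

12. str_match: extract matched groups from a string

str_match(string, pattern): extracts the first match together with its capture groups

str_match_all(string, pattern): extracts all matches together with their capture groups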
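str_extract and str_match are easy to confuse: the first returns only the matched text, while the second also returns each parenthesised group as a separate matrix column. A small sketch, assuming stringr is installed; the input vector is invented for illustration.

```r
library(stringr)

x <- c("tel: 010-1234", "no number")

# First run of digits in each element; NA where nothing matches.
first_digits <- str_extract(x, "\\d+")

# Every run of digits, returned as a list with one entry per element.
all_digits <- str_extract_all(x, "\\d+")

# Character matrix: column 1 is the whole match, columns 2 and 3 are the groups.
groups <- str_match(x, "(\\d+)-(\\d+)")

print(first_digits)
print(all_digits)
print(groups)
```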

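13. str_order / str_sort: order or sort a character vector

str_order(x, decreasing = FALSE, na_last = TRUE, locale = "", ...): returns the integer indices that would sort x

str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "", ...): returns the sorted vector

14. str_pad: pad a string at the start and/or end with a character (such as a space)

str_pad(string, width, side = c("left", "right", "both"), pad = " ")

width: the minimum length of the string after padding;
side: the side on which the padding is added; the default is left;
pad: the single character used for padding;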
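A few calls make the roles of width, side and pad concrete. This is a minimal sketch, assuming stringr is installed, with invented inputs.

```r
library(stringr)

# Left padding (the default side) to width 3 with zeros.
zero_padded <- str_pad("7", width = 3, pad = "0")

# Padding on both sides splits the extra characters between left and right.
centered <- str_pad("ab", width = 6, side = "both", pad = "*")

# Right padding.
right_padded <- str_pad("ab", width = 5, side = "right", pad = "-")

# A string that is already longer than width is returned unchanged.
untouched <- str_pad("abcdef", width = 3)

print(c(zero_padded, centered, right_padded, untouched))
```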

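15. str_replace: replace matches of a pattern in a string

str_replace(string, pattern, replacement): replaces the first match

str_replace_all(string, pattern, replacement): replaces all matches

16. str_replace_na: replace missing values with the string "NA"

str_replace_na(string, replacement = "NA")

17. str_split: split a string on a separator

str_split(string, pattern, n = Inf): returns a list

str_split_fixed(string, pattern, n): returns a character matrix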
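The two splitting functions differ mainly in the shape of the result. The sketch below assumes stringr is installed and uses invented comma-separated strings.

```r
library(stringr)

x <- c("a,b,c", "d,e")

# A list with one character vector per input element.
as_list <- str_split(x, ",")

# A matrix with exactly n columns; missing pieces are filled with "".
as_matrix <- str_split_fixed(x, ",", n = 3)

# With n = 2 the string is split at most once, so the rest stays together.
two_pieces <- str_split("a,b,c", ",", n = 2)[[1]]

print(as_list)
print(as_matrix)
print(two_pieces)
```

str_split_fixed is convenient when every element is expected to have the same number of fields, because the matrix can be bound directly to a data frame.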

18. str_sub: extract or replace substrings of a character vector by position

str_sub(string, start = 1L, end = -1L): extracts a substring

str_sub(string, start = 1L, end = -1L) <- value: replaces a substring
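Negative positions in str_sub count from the end of the string, which is why the default end = -1L means "up to the last character". A minimal sketch, assuming stringr is installed, with an invented sample string:

```r
library(stringr)

x <- "datascience"

# Characters 1 to 4.
head_part <- str_sub(x, 1, 4)

# From the 7th character from the end up to the last character.
tail_part <- str_sub(x, -7)

# The replacement form modifies the variable in place.
y <- x
str_sub(y, 1, 4) <- "DATA"

print(c(head_part, tail_part, y))
```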

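19. str_subset: keep the elements of a character vector that match a pattern

str_subset(string, pattern)

20. str_trim: remove whitespace from the start and end of a string

str_trim(string, side = c("both", "left", "right"))

21. str_wrap: wrap a string into lines of a given width

str_wrap(string, width = 80, indent = 0, exdent = 0)

width: the target width of each line
indent: the indentation of the first line
exdent: the indentation of every line after the first

22. word: extract words from a sentence

word(string, start = 1L, end = start, sep = fixed(" "))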

II. String-handling functions in base R:

23. paste(): join strings:

paste(..., sep = " ", collapse = NULL)
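As with str_c, sep separates the arguments and collapse merges the result into one string; the visible difference is that paste separates with a space by default. A short base R sketch with invented inputs:

```r
# The default separator is a single space.
spaced <- paste("a", "b")

# Shorter arguments are recycled, so "x" is combined with each number.
labels <- paste("x", 1:3, sep = "")

# collapse merges the elements of one vector into a single string.
formula_text <- paste(c("a", "b", "c"), collapse = "+")

print(spaced)
print(labels)
print(formula_text)
```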

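24. strsplit(): split strings:

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)

split: the separator to split on; by default it is interpreted as a regular expression

fixed: logical, default FALSE; if TRUE, split is matched literally instead of as a regular expression

perl: logical, default FALSE; if TRUE, split is interpreted as a Perl-compatible regular expression

useBytes: logical, default FALSE; if TRUE, matching is done byte by byte rather than character by character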
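Because split is a regular expression by default, splitting on a metacharacter such as "." gives a surprising result unless fixed = TRUE is set. A minimal base R sketch with invented inputs:

```r
# "." as a regular expression matches every character, so every character
# is treated as a separator and only empty strings remain.
regex_split <- strsplit("a.b.c", ".")[[1]]

# fixed = TRUE treats "." as a literal dot.
literal_split <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]

# perl = TRUE selects the Perl-compatible engine; for a simple pattern like
# this one the default engine gives the same pieces.
digit_split <- strsplit("a1b22c", "\\d+", perl = TRUE)[[1]]

print(regex_split)
print(literal_split)
print(digit_split)
```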

25. nchar(): count the characters in a string:

nchar(x, type = "chars", allowNA = FALSE)

26. substr: extract or replace substrings:

(1) substr(x, start, stop)

(2) substring(text, first, last = 1000000L)

(3) substr(x, start, stop) <- value

(4) substring(text, first, last = 1000000L) <- value
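substr needs both start and stop, whereas substring defaults last to a very large number and is vectorised over first and last. A base R sketch with an invented sample string:

```r
x <- "abcdef"

# Characters 2 to 4.
middle <- substr(x, 2, 4)

# From character 4 to the end, relying on the default value of last.
rest <- substring(x, 4)

# Vectorised positions return several substrings at once.
windows <- substring(x, 1:3, 3:5)

# The replacement form overwrites characters in place.
y <- x
substr(y, 1, 2) <- "ZZ"

print(c(middle, rest))
print(windows)
print(y)
```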

27. Character translation and case conversion:

chartr(old, new, x)

tolower(x)

toupper(x)

casefold(x, upper = FALSE)
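chartr maps characters one to one: the i-th character of old becomes the i-th character of new. casefold is a wrapper that calls tolower or toupper depending on upper. A base R sketch with invented inputs:

```r
# a -> x, b -> y, c -> z, applied to every character of the input.
translated <- chartr("abc", "xyz", "aabbcc")

upper <- toupper("r language")
lower <- tolower("R LANGUAGE")

# casefold() lowers by default and raises when upper = TRUE.
folded_down <- casefold("ABC")
folded_up <- casefold("abc", upper = TRUE)

print(c(translated, upper, lower, folded_down, folded_up))
```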

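28. Pattern matching and replacement

(1) grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE): returns the indices of the elements of x that match

ignore.case: logical, default FALSE, meaning the match is case-sensitive;

perl: logical, default FALSE; if TRUE, pattern is interpreted as a Perl-compatible regular expression (with FALSE, pattern is still a regular expression, in R's default extended syntax);

value: logical; controls whether the matching elements themselves or their indices are returned; the default FALSE returns indices;

fixed: logical, default FALSE; if TRUE, pattern is matched literally as a fixed string;

useBytes: logical, default FALSE; if TRUE, matching is done byte by byte rather than character by character;

invert: logical, default FALSE; if TRUE, the non-matching elements are returned instead of the matching ones;

(2) grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): returns a logical vector of the same length as x, TRUE for elements that match and FALSE for those that do not.

(3) sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): replaces the first match in each element

(4) gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): replaces all matches in each element

(5) regexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): returns, for each element, the position of the first match and its length in characters; for elements without a match both the position and the length are -1.

(6) gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE): returns, for each element, the positions of all matches and their lengths

(7) regexec(pattern, text, ignore.case = FALSE, fixed = FALSE, useBytes = FALSE): returns, for each element, the position and length of the first match together with those of each parenthesised sub-expression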
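The functions in this family differ mainly in what they return for the same pattern. The sketch below runs each of them on one invented vector; regmatches() is a base R helper, not listed above, that turns the position data from regexec into the matched text.

```r
x <- c("apple", "banana", "cherry")

idx <- grep("an", x)                     # indices of matching elements
vals <- grep("an", x, value = TRUE)      # the matching elements themselves
others <- grep("an", x, invert = TRUE)   # indices of non-matching elements
flags <- grepl("an", x)                  # one logical value per element

first_replaced <- sub("an", "AN", x)     # only the first match per element
all_replaced <- gsub("an", "AN", x)      # every match per element

# Position of the first match per element; -1 means no match.
first_pos <- as.integer(regexpr("an", x))

# Positions of all matches in the second element.
all_pos <- as.integer(gregexpr("an", x)[[2]])

# regexec() also records the parenthesised groups.
phone <- "010-1234"
groups <- regmatches(phone, regexec("(\\d+)-(\\d+)", phone))[[1]]

print(idx); print(vals); print(others); print(flags)
print(first_replaced); print(all_replaced)
print(first_pos); print(all_pos); print(groups)
```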

Source | 数据人网

Original article: http://shujuren.org/article/162.html
