regex - R - Split a character vector so that each unique element is added to a new character vector

Tags: regex r vector strsplit

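I have a character vector in which each element contains several comma-separated strings. I got this vector by extracting it from a data frame, and it looks like this: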

 [1] "Acworth, Crescent Lake, East Acworth, Lynn, South Acworth"                                                                              
 [2] "Ferncroft, Passaconaway, Paugus Mill"                                                                                                   
 [3] "Alexandria, South Alexandria"                                                                                                           
 [4] "Allenstown, Blodgett, Kenison Corner, Suncook (part)"                                                                                   
 [5] "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow"                                                                 
 [6] "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands"
 [7] "Amherst, Baboosic Lake, Cricket Corner, Ponemah"                                                                                        
 [8] "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover"                                                        
 [9] "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch"                                                                    
[10] "Ashland" 
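When the vector comes from a data frame column, keep in mind that strsplit() only accepts character input. A column stored as a factor makes it fail with "non-character argument". Factors were the default for data.frame() and read.csv() before R 4.0.0. The sketch below uses a hypothetical data frame `df` with a hypothetical column `villages`. It forces a factor column to show the error, then converts the column with as.character() first.

```r
# Hypothetical data frame standing in for the asker's source data;
# stringsAsFactors = TRUE simulates the pre-R-4.0.0 default
df <- data.frame(
  villages = c("Ferncroft, Passaconaway, Paugus Mill",
               "Alexandria, South Alexandria"),
  stringsAsFactors = TRUE
)

# strsplit() refuses factors; capture the error message instead of stopping
res <- tryCatch(strsplit(df$villages, ","),
                error = function(e) conditionMessage(e))
print(res)

# Converting the column to character first makes the split possible
v <- as.character(df$villages)
str(v)
```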

I would like to get a new character vector in which each of those strings is a separate element, i.e.:

 [1] "Acworth", "Crescent Lake", "East Acworth", "Lynn", "South Acworth"                                                                              
 [6] "Ferncroft", "Passaconaway", "Paugus Mill", "Alexandria", "South Alexandria"

I used the strsplit() function, but it returns a list. When I try to convert that list to a character vector, it goes back to the way it was.
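The behaviour described here can be reproduced with two of the asker's strings. strsplit() returns a list with one character vector per input element. Calling as.character() on that list deparses each element back into a single string such as `c("Ferncroft", ...)`, which is probably why the result looks unchanged. unlist() is what concatenates the pieces into one flat vector. The plain ", " separator is a simplification for this demonstration only.

```r
v <- c("Ferncroft, Passaconaway, Paugus Mill", "Alexandria, South Alexandria")

# One list element per input string
parts <- strsplit(v, ", ")
str(parts)

# as.character() turns each list element into one deparsed string like 'c("Ferncroft", ...)'
deparsed <- as.character(parts)
print(deparsed)

# unlist() flattens the list into a single character vector
flat <- unlist(parts)
print(flat)
```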

I'm sure this is a very simple question; any help would be greatly appreciated. Thanks!

Best answer

You can split the character vector with the regex "\\s*,\\s*", which also strips the whitespace around each comma. Then unlist the result:

v <- c("Acworth, Crescent Lake, East Acworth, Lynn, South Acworth", "Ferncroft, Passaconaway, Paugus Mill", "Alexandria, South Alexandria",  "Allenstown, Blodgett, Kenison Corner, Suncook (part)", "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow", "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands", "Amherst, Baboosic Lake, Cricket Corner, Ponemah",  "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover",  "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch",  "Ashland" )
# Split every element on commas (consuming surrounding whitespace), then flatten the resulting list
s <- unlist(strsplit(v, "\\s*,\\s*"))
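The title asks for each *unique* element. If the same place name can appear in more than one input string, one option is to wrap the split-and-flatten step in unique(). unique() keeps the first occurrence of each value in its original order. The input below is hypothetical data with a deliberate repeat. The asker's sample shown above contains no duplicates.

```r
# Hypothetical input where "Alton Bay" appears twice
v <- c("Alton, Alton Bay, East Alton", "Alton Bay, West Alton")

# Split, flatten, then drop repeated names (first occurrence wins)
s <- unique(unlist(strsplit(v, "\\s*,\\s*")))
print(s)
```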


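The regex matches zero or more whitespace characters (\s*) on either side of the comma, so the resulting values come out trimmed. It also handles stray whitespace *before* a comma in the original character vector, not just after it.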
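The trimming covers only whitespace next to a comma. Leading or trailing spaces at the very start or end of an input string are left in place, because no comma is adjacent to them. The sketch below uses made-up inputs to show both effects. Base R's trimws() is one way to clean the remaining ends if that matters.

```r
# Made-up inputs: space before a comma, padding with no comma, several spaces after a comma
x <- c("Lynn ,South Acworth", "  Ashland  ", "Alton,   Alton Bay")

s <- unlist(strsplit(x, "\\s*,\\s*"))
print(s)   # "  Ashland  " keeps its padding: no comma is adjacent to it

# trimws() strips whitespace from both ends of every element
s_trimmed <- trimws(s)
print(s_trimmed)
```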

For the original question on Stack Overflow, see: https://stackoverflow.com/questions/34927662/
