python - How do I split the values in one column into two columns, preferably with a regex pattern?

I have a text file that I load into a DataFrame. After loading, all the values sit in a single column, in this format:

0 Alabama[edit]
1 Auburn (something somethign)
2 Florence (something somethign)
. . .
12 California[edit]
13 Angwin (something something)
14 Arcata (something something)

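I have to split these values into two columns: State and RegionName.

State should be the index.

Every state name has an [edit] suffix, and every region name ends with (....). Before cleaning the data, I thought I could use [edit] and (..) as masks.

I tried to separate the two kinds of values: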

df=pd.read_table("file.txt", names=["State","RegionName"])
state=df[df["State"].str.contains(r"\[edit\]")]
region=df[df["State"].str.contains(r"\s+\(.*\)")]

and then tried to merge them somehow, but had no luck. If I try to build a new df from state and region, I get an index error.

I also tried .str.extract:

df.row.str.extract("(?P<State>\r\[\edit\]")

but I get an error message saying that df has no .row (or .str) attribute, and I am sure the pattern is wrong too.

Any help would be appreciated.

Thanks and regards

Best answer

Something like this?

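import numpy as np

# Rows containing "edit" are states; rows containing "(" are regions.
df['state'] = np.where(df.place.str.contains('edit'), df.place, np.nan)
df['region'] = np.where(df.place.str.contains(r'\('), df.place, np.nan)
df = df.drop(columns='place')
# Forward-fill so each region row carries the state above it.
df['state'] = df['state'].ffill()
df = df.set_index('state')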

                    region
state   
Alabama[edit]       NaN
Alabama[edit]       Auburn (something somethign)
Alabama[edit]       Florence (something somethign)
California[edit]    NaN
California[edit]    Angwin (something something)
California[edit]    Arcata (something something)
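
Since the question asks for a regex, here is a minimal sketch of a variant that uses .str.extract with named groups, which also strips the [edit] suffix and the trailing (...) and drops the state-only rows. It is not part of the answer above. It assumes the file holds one value per line with no leading index numbers and no tab characters; the column name place follows the answer, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Stand-in for the text file: one value per line (sample rows, an assumption).
raw = io.StringIO(
    "Alabama[edit]\n"
    "Auburn (something something)\n"
    "Florence (something something)\n"
    "California[edit]\n"
    "Angwin (something something)\n"
    "Arcata (something something)\n"
)

# sep="\t" keeps each line as a single value, since the sample has no tabs.
df = pd.read_csv(raw, sep="\t", header=None, names=["place"])

# Named groups become column names; rows that do not match yield NaN.
state = df["place"].str.extract(r"^(?P<State>.+?)\[edit\]\s*$")
region = df["place"].str.extract(r"^(?P<RegionName>.+?)\s*\(.*\)\s*$")

result = pd.concat([state, region], axis=1)
# Carry each state name down to the region rows that follow it.
result["State"] = result["State"].ffill()
# Drop the state header rows, which have no region, then index by state.
result = result.dropna(subset=["RegionName"]).set_index("State")
print(result)
```

Note that .str is an accessor on a Series (a single column such as df["place"]), not on the DataFrame itself, which would explain the attribute error mentioned in the question.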

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/47101613/
