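python - Split a Pandas Series into multiple DataFrame columns by string position

Given a Pandas Series of strings, I want to create a DataFrame with one column for each section of the Series, where the sections are defined by character position.

For example, given the following input: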

import pandas as pd

s = pd.Series(['abcdef', '123456'])
ind = [2, 3, 1]  # width of each section, in characters

Ideally I would get this:

target_df = pd.DataFrame({
  'col1': ['ab', '12'],
  'col2': ['cde', '345'],
  'col3': ['f', '6']
})

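One way is to create the columns one at a time, for example:

df = pd.DataFrame()
df['col1'] = s.str[:2]   # first 2 characters
df['col2'] = s.str[2:5]  # next 3 characters
df['col3'] = s.str[5]    # last character

But I would guess this is slower than a single split.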

I tried a regular expression, but I am not sure how to parse the result:

pd.DataFrame(s.str.split(r"(^(\w{2})(\w{3})(\w{1}))"))
#                          0
# 0 [, abcdef, ab, cde, f, ]
# 1 [, 123456, 12, 345, 6, ]

Best answer

Your regex is almost there (note that Series.str.extract(expand=True) returns a DataFrame):

# Each capture group becomes one column of the result.
df = s.str.extract(r"^(\w{2})(\w{3})(\w{1})", expand=True)
df.columns = ['col1', 'col2', 'col3']
#   col1    col2    col3
# 0 ab      cde     f
# 1 12      345     6
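As a side note, the separate renaming step can be avoided: when the pattern uses named groups, str.extract uses the group names as column names. The snippet below is a minimal sketch of that variant on the same input; the names pattern and named_df are only illustrative.

```python
import pandas as pd

s = pd.Series(['abcdef', '123456'])

# Named groups (?P<name>...) become the column names of the result,
# so no separate assignment to df.columns is needed.
pattern = r"^(?P<col1>\w{2})(?P<col2>\w{3})(?P<col3>\w{1})"
named_df = s.str.extract(pattern, expand=True)
print(named_df)
```

This only helps when the column names are valid regex group names (identifier-like strings); for arbitrary labels, assigning df.columns afterwards is still the simpler route.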

Here is a function that generalizes this:

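def split_series_by_position(s, ind, cols):
  # Construct the regex: one capture group per width in ind,
  # e.g. ind = [2, 3, 1] gives r"^(\w{2})(\w{3})(\w{1})".
  regex = r"^(\w{" + r"})(\w{".join(map(str, ind)) + "})"
  # Each capture group becomes one column of the result.
  df = s.str.extract(regex, expand=True)
  df.columns = cols
  return df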

# Example which will produce the result above.
split_series_by_position(s, ind, ['col1', 'col2', 'col3'])
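One caveat with the regex approach: \w matches only letters, digits, and underscores, so a string containing a space or punctuation does not match and that row comes back as NaN in every column. If the sections are purely positional, a slicing-based version may be a reasonable alternative. The following is a rough sketch, not part of the original answer; slice_series_by_widths is a hypothetical helper name, and it has not been benchmarked against the regex version.

```python
from itertools import accumulate

import pandas as pd


def slice_series_by_widths(s, widths, cols):
    # Hypothetical helper: turn widths such as [2, 3, 1] into
    # slice boundaries [0, 2, 5, 6].
    bounds = [0, *accumulate(widths)]
    # Slice each section with the vectorized .str.slice accessor;
    # this works for any character, not only those matched by \w.
    return pd.DataFrame({
        col: s.str.slice(start, stop)
        for col, start, stop in zip(cols, bounds[:-1], bounds[1:])
    })


# Sample input with a space and punctuation, which \w would not match.
mixed = pd.Series(['ab cd!', '123456'])
sliced_df = slice_series_by_widths(mixed, [2, 3, 1], ['col1', 'col2', 'col3'])
print(sliced_df)
```

Unlike the regex version, this does not validate the input: strings shorter than the total width simply produce shorter or empty sections instead of NaN.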

Regarding "python - Split a Pandas Series into multiple DataFrame columns by string position", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52432051/
