python - How to split a dataframe column into separate columns based on a condition

Tags: python python-3.x regex pandas

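I'm trying to split the following dataframe column into separate columns. I want all of the text in one column, and the numbers split apart at the whitespace.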

df[0].head(10)

0                                                   []
1               [Andaman and Nicobar, 194, 52, 142, 0]
2        [Andhra Pradesh, 40,646, 19,814, 20,298, 534]
3                [Arunachal Pradesh, 609, 431, 175, 3]
4                   [Assam, 20,646, 6,490, 14,105, 51]
5                  [Bihar, 23,589, 8,767, 14,621, 201]
6                      [Chandigarh, 660, 169, 480, 11]
7              [Chhattisgarh, 4,964, 1,429, 3,512, 23]
8    [Dadra and Nagar Haveli and Daman, 585, 182, 4...
9                          [Daman and Diu, 0, 0, 0, 0]
Name: 0, dtype: object

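If I just split on whitespace and expand, the numbers are split correctly, but the text is also split across multiple columns. Because the text of different rows spans a different number of columns, I can't join it back together. The solution is presumably to write the right regular expression and split on it, but I can't figure out which regex I need, so I'm asking for input.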

df1 = df[0].str.split(' ', expand= True)
df1.head(10)
    0   1   2   3   4   5   6   7   8   9
0   []  None    None    None    None    None    None    None    None    None
1   [Andaman    and     Nicobar,    194,    52,     142,    0]  None    None    None
2   [Andhra     Pradesh,    40,646,     19,814,     20,298,     534]    None    None    None    None
3   [Arunachal  Pradesh,    609,    431,    175,    3]  None    None    None    None
4   [Assam,     20,646,     6,490,  14,105,     51]     None    None    None    None    None
5   [Bihar,     23,589,     8,767,  14,621,     201]    None    None    None    None    None
6   [Chandigarh,    660,    169,    480,    11]     None    None    None    None    None
7   [Chhattisgarh,  4,964,  1,429,  3,512,  23]     None    None    None    None    None
8   [Dadra  and     Nagar   Haveli  and     Daman,  585,    182,    401,    2]
9   [Daman  and     Diu,    0,  0,  0,  0]  None    None    None
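Because the fields are separated by a comma followed by a space, while the thousands separators inside numbers such as 40,646 have no space after them, one possible regex to split on is `,\s+`. A minimal sketch, assuming df[0] holds plain strings shaped like the rows above and that no name itself contains ", " (the sample frame is hypothetical, rebuilt from those rows):

```python
import pandas as pd

# Hypothetical sample rebuilt from rows shown in the question (assumed to be plain strings)
df = pd.DataFrame({0: [
    "[]",
    "[Andaman and Nicobar, 194, 52, 142, 0]",
    "[Andhra Pradesh, 40,646, 19,814, 20,298, 534]",
    "[Dadra and Nagar Haveli and Daman, 585, 182, 401, 2]",
    "[Daman and Diu, 0, 0, 0, 0]",
]})

# Strip the surrounding brackets, then split on a comma followed by whitespace.
# "40,646" has no space after its comma, so it stays a single field,
# and multi-word names stay together because they contain no ", ".
parts = df[0].str.strip("[]").str.split(r",\s+", expand=True, regex=True)
print(parts)
```

Row 0 ("[]") becomes a single empty string with the remaining columns missing. The `regex=` argument of `str.split` requires pandas 1.4 or later.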

The result I expect should look like this:

        0                                   1       2       3       4       5       6       7       8       9
    0   []                                  None    None    None    None    None    None    None    None    None
    1   [Andaman and Nicobar,               194,    52,     142,    0]      None    None    None    None    None
    2   [Andhra Pradesh,                    40,646, 19,814, 20,298, 534]    None    None    None    None    None
    3   [Arunachal Pradesh,                 609,    431,    175,    3]      None    None    None    None    None
    4   [Assam,                             20,646, 6,490,  14,105, 51]     None    None    None    None    None
    5   [Bihar,                             23,589, 8,767,  14,621, 201]    None    None    None    None    None
    6   [Chandigarh,                        660,    169,    480,    11]     None    None    None    None    None
    7   [Chhattisgarh,                      4,964,  1,429,  3,512,  23]     None    None    None    None    None
    8   [Dadra and Nagar Haveli and Daman,  585,    182,    401,    2]      None    None    None    None    None
    9   [Daman and Diu,                     0,      0,      0,      0]      None    None    None    None    None

Best answer

You can use str.replace and str.extract to reshape the dataframe.

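# Name: the leading run of non-digits, with '[' and ',' removed
names = df[0].str.extract(r'(\D+)').replace(r'\[|,', '', regex=True).rename(columns={0: 'names'})

# Numbers: remove the "[name," prefix (regex=True is required on pandas >= 2.0), drop the ']', split on spaces
df_new = names.join(df[0].str.replace(r'\D+,', '', regex=True).str.strip(']').str.split(' ', expand=True))

print(df_new)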

                                  names 0        1        2        3     4
0                   Andaman and Nicobar       194,      52,     142,     0
1                        Andhra Pradesh    40,646,  19,814,  20,298,   534
2                     Arunachal Pradesh       609,     431,     175,     3
3                                 Assam    20,646,   6,490,  14,105,    51
4                                 Bihar    23,589,   8,767,  14,621,   201
5                            Chandigarh       660,     169,     480,    11
6                          Chhattisgarh     4,964,   1,429,   3,512,    23
7      Dadra and Nagar Haveli and Daman       585,     182,     4...  None
8                         Daman and Diu         0,       0,       0,     0
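The trailing commas in the number columns, and an empty string at the start of each split, come from the two patterns used above. A small check on a single row with Python's `re` module (which pandas' regex-based `.str` methods follow) shows what each step produces, assuming the cells are plain strings as in the question:

```python
import re

row = "[Andhra Pradesh, 40,646, 19,814, 20,298, 534]"

# r'(\D+)' captures the leading run of non-digits: bracket, name, comma and a space
name_chunk = re.match(r"(\D+)", row).group(1)
name = re.sub(r"\[|,", "", name_chunk)  # '[' and ',' removed; trailing space is kept

# r'\D+,' removes "[Andhra Pradesh," but leaves the space before the first number,
# so splitting on ' ' yields a leading empty string; the number fields keep their commas
tail = re.sub(r"\D+,", "", row).strip("]")
fields = tail.split(" ")

print(repr(name))  # 'Andhra Pradesh '
print(fields)      # ['', '40,646,', '19,814,', '20,298,', '534']
```

If those leftovers matter, stripping the remaining string before splitting, or splitting on `,\s+` instead of a bare space, avoids them.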

Regarding python - How to split a dataframe column into separate columns based on a condition, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62968023/
