python - Pandas dummies from a list of categories encoded as a string

I have a dataframe in the following format:

id    amenities                     ...
1     "TV,Internet,Shower,..."      ...
2     "TV,Hot tub,Internet,..."     ...
3     "Internet,Heating,Shower..."  ...
...

I want to split each string on the commas and create a dummy column for every category, so that the result looks like this:

id    TV    Internet    Shower    Hot tub    Heating    ...
1     1     1           1         0          0          ...
2     1     1           0         1          0          ...
3     0     1           1         0          1          ...
...

How can I do this?

Thanks

Best answer

You can use str.get_dummies together with join or concat:

df = df[['id']].join(df['amenities'].str.get_dummies(','))
print(df)
   id  Heating  Hot tub  Internet  Shower  TV
0   1        0        0         1       1   1
1   2        0        1         1       0   1
2   3        1        0         1       1   0

Or:

df = pd.concat([df['id'], df['amenities'].str.get_dummies(',')], axis=1)
print(df)
   id  Heating  Hot tub  Internet  Shower  TV
0   1        0        0         1       1   1
1   2        0        1         1       0   1
2   3        1        0         1       1   0
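For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the sample frame holds only the three rows visible in the question, with the elided "..." entries left out, and the names dummies, joined and concatenated are introduced purely for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question; the elided "..." values are omitted.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "amenities": [
        "TV,Internet,Shower",
        "TV,Hot tub,Internet",
        "Internet,Heating,Shower",
    ],
})

# Split each string on "," and build one 0/1 indicator column per category.
dummies = df["amenities"].str.get_dummies(sep=",")

# Variant 1: join the indicator columns onto the id column (aligned on the index).
joined = df[["id"]].join(dummies)

# Variant 2: concatenate the id column and the indicator columns side by side.
concatenated = pd.concat([df["id"], dummies], axis=1)

print(joined)
print(concatenated)
```

Note that str.get_dummies returns the indicator columns in sorted order, so they will not follow the order in which categories first appear in the strings. If a particular order matters, the columns can be reindexed afterwards.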

The original question, "python - Pandas dummies from a list of categories encoded as a string", can be found on Stack Overflow: https://stackoverflow.com/questions/44703748/
