I have a .txt file that contains lists of words like this:
5.91686268506 exclusively, catering, provides, arms, georgia, formal, purchase, choose
5.91560417296 hugh, senlis
5.91527936181 italians
5.91470429433 soil, cultivation, fertile
5.91468087491 increases, moderation
....
5.91440227412 farmers, descendants
I would like to load this data into a pandas table, which I then want to display in an HTML/Bootstrap template, like this (*):
COL_A COL_B
5.91686268506 exclusively, catering, provides, arms, georgia, formal, purchase, choose
5.91560417296 hugh, senlis
5.91527936181 italians
5.91470429433 soil, cultivation, fertile
5.91468087491 increases, moderation
....
5.91440227412 farmers, descendants
So I tried the following with pandas:
import pandas as pd
df = pd.read_csv('file.csv',
sep = ' ', names=['Col_A', 'Col_B'])
df.head(20)
However, my table does not have the desired structure shown above:
COL_A COL_B
6.281426 engaged, chance, makes, meeting, nations, things, believe, tries, believing, knocked, admits, awkward
6.277438 sweden NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6.271190 artificial, ammonium NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6.259790 boats, prefix NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6.230612 targets, tactical, wing, missile, squadrons NaN NaN NaN NaN NaN NaN NaN
Any idea how to get the data into the (*) table format?
Best answer
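Because the words are also separated by spaces, specifying a space as the separator splits them apart as well. To get what you need, you can set sep to the regular expression "(?<!,) ", that is, a negative lookbehind followed by a single space. The ?<! part is the negative lookbehind syntax: it means split on a space only when that space is not preceded by a comma, which should work for your case: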
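# A regex separator is only supported by the Python parsing engine, so request it explicitly.
pd.read_csv("~/test.csv", sep=r"(?<!,) ", names=['weight', 'topics'], engine='python')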
# weight topics
#0 5.916863 exclusively, catering, provides, arms, georgia...
#1 5.915604 hugh, senlis
#2 5.915279 italians
#3 5.914704 soil, cultivation, fertile
#4 5.914681 increases, moderation
#5 5.914402 farmers, descendants
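To make the behaviour easy to check without a file on disk, here is a minimal, self-contained sketch of the same idea. It assumes the rows look exactly like the sample in the question. The io.StringIO buffer stands in for the real file. The to_html call with Bootstrap class names is only a guess at how the table might be passed to the template, not something the original answer covers.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the real file.
raw = io.StringIO(
    "5.91686268506 exclusively, catering, provides, arms, georgia, formal, purchase, choose\n"
    "5.91560417296 hugh, senlis\n"
    "5.91527936181 italians\n"
    "5.91470429433 soil, cultivation, fertile\n"
)

# Split only on a space that is NOT preceded by a comma, so the
# comma-separated word list stays together in the second column.
# A regex separator needs the Python parsing engine.
df = pd.read_csv(raw, sep=r"(?<!,) ", names=["weight", "topics"], engine="python")

# Render the frame as an HTML table. The Bootstrap class names are an
# assumption about the template; they only matter if Bootstrap's CSS is loaded.
html = df.to_html(index=False, classes=["table", "table-striped"])

print(df)
```

The resulting html string can then be embedded in the page. If a line ever contains a space that follows something other than a comma inside the word list, this pattern would split there as well, so it relies on the "word, word" format shown above.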
Regarding "python - How to convert multiple word lists into a pandas DataFrame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39339560/