python - Pandas creating a DataFrame whose first header column is on its own row

Tags: python pandas csv gdelt

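I'm working with the GDELT dataset and I'm having trouble creating a pandas DataFrame with pd.DataFrame.from_csv(path_to_data, sep=","). It seems to load the data fine, except that the first header column is moved down onto row 1, as shown below: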

[Screenshot: the loaded DataFrame, with the first header column displaced onto its own row]

The arrow in the screenshot shows where it should be. Here is a snippet of the raw data in CSV format:

Source,Actor1Type1Code,Actor1Type2Code,Actor1Geo_CountryCode,Target,Actor2Type1Code,Actor2Type2Code,Actor2Geo_CountryCode,EventCode,f0_
PRINCE,GOV,,CA,CITIZEN,CVL,,CA,051,61
MEDIA,MED,,CA,MINIST,GOV,,CA,090,39
SUPREME COURT,JUD,,CA,DOCTOR,HLH,,CA,060,31
POLICE,COP,,CA,TORONTO,,,CA,173,31
PUBLISHER,MED,,CA,BUSINESS,BUS,,CA,010,29
HOSPITAL,HLH,,CA,POLICE,COP,,CA,043,28
HOSPITAL,HLH,,CA,TORONTO,,,CA,043,26
POLICE,COP,,CA,HOSPITAL,HLH,,CA,042,26
PRIME MINISTER,GOV,,CA,GERMANY,,,FR,042,22

Thanks!

Calvin

Best answer

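Don't use from_csv, which is no longer maintained (it was deprecated in pandas 0.21 and removed in pandas 1.0). Use read_csv instead: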

In [244]:

import io
import pandas as pd

t="""Source,Actor1Type1Code,Actor1Type2Code,Actor1Geo_CountryCode,Target,Actor2Type1Code,Actor2Type2Code,Actor2Geo_CountryCode,EventCode,f0_
PRINCE,GOV,,CA,CITIZEN,CVL,,CA,051,61
MEDIA,MED,,CA,MINIST,GOV,,CA,090,39
SUPREME COURT,JUD,,CA,DOCTOR,HLH,,CA,060,31
POLICE,COP,,CA,TORONTO,,,CA,173,31
PUBLISHER,MED,,CA,BUSINESS,BUS,,CA,010,29
HOSPITAL,HLH,,CA,POLICE,COP,,CA,043,28
HOSPITAL,HLH,,CA,TORONTO,,,CA,043,26
POLICE,COP,,CA,HOSPITAL,HLH,,CA,042,26
PRIME MINISTER,GOV,,CA,GERMANY,,,FR,042,22"""
# read_csv uses a default RangeIndex, so Source stays an ordinary column
df = pd.read_csv(io.StringIO(t))
df
Out[244]:
           Source Actor1Type1Code  Actor1Type2Code Actor1Geo_CountryCode  \
0          PRINCE             GOV              NaN                    CA   
1           MEDIA             MED              NaN                    CA   
2   SUPREME COURT             JUD              NaN                    CA   
3          POLICE             COP              NaN                    CA   
4       PUBLISHER             MED              NaN                    CA   
5        HOSPITAL             HLH              NaN                    CA   
6        HOSPITAL             HLH              NaN                    CA   
7          POLICE             COP              NaN                    CA   
8  PRIME MINISTER             GOV              NaN                    CA   

     Target Actor2Type1Code  Actor2Type2Code Actor2Geo_CountryCode  EventCode  \
0   CITIZEN             CVL              NaN                    CA         51   
1    MINIST             GOV              NaN                    CA         90   
2    DOCTOR             HLH              NaN                    CA         60   
3   TORONTO             NaN              NaN                    CA        173   
4  BUSINESS             BUS              NaN                    CA         10   
5    POLICE             COP              NaN                    CA         43   
6   TORONTO             NaN              NaN                    CA         43   
7  HOSPITAL             HLH              NaN                    CA         42   
8   GERMANY             NaN              NaN                    FR         42   

   f0_  
0   61  
1   39  
2   31  
3   31  
4   29  
5   28  
6   26  
7   26  
8   22  
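Applied to the call from the question, a minimal sketch might look like the following. Here path_to_data is only a stand-in that points at a temporary file holding a few of the sample rows, so the snippet runs on its own. The dtype={"EventCode": str} argument is an optional addition not in the original answer: as the output above shows, read_csv parses codes such as 051 as the integer 51, and reading the column as text keeps the leading zeros if you want the codes exactly as written.

```python
import os
import tempfile

import pandas as pd

# A few rows from the GDELT sample in the question
sample = """Source,Actor1Type1Code,Actor1Type2Code,Actor1Geo_CountryCode,Target,Actor2Type1Code,Actor2Type2Code,Actor2Geo_CountryCode,EventCode,f0_
PRINCE,GOV,,CA,CITIZEN,CVL,,CA,051,61
MEDIA,MED,,CA,MINIST,GOV,,CA,090,39
POLICE,COP,,CA,TORONTO,,,CA,173,31
"""

# path_to_data is a hypothetical stand-in for the question's file path;
# it points at a temporary file so this sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path_to_data = os.path.join(tmp, "gdelt_sample.csv")
    with open(path_to_data, "w", encoding="utf-8") as fh:
        fh.write(sample)

    # Same call shape as the question, but with read_csv.
    # dtype keeps EventCode as text so "051" is not turned into 51.
    df = pd.read_csv(path_to_data, sep=",", dtype={"EventCode": str})

print(df.columns[0])             # Source
print(df["EventCode"].tolist())  # ['051', '090', '173']
```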

Alternatively, on older pandas versions that still provide from_csv, pass index_col=None:

df = pd.DataFrame.from_csv(io.StringIO(t), index_col=None)

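so that it does not interpret the first column as the index column. from_csv defaults to index_col=0, which is why Source became the index name and was displayed on its own row above the data.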
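To see the mechanism directly, the sketch below uses read_csv with index_col=0 to mimic from_csv's default on a trimmed, three-column version of the question's data. It also shows one possible fix (not from the original answer) for a frame that was already loaded this way: reset_index() moves the index back into an ordinary first column.

```python
import io

import pandas as pd

# Trimmed version of the question's data (three columns only)
csv_text = """Source,Target,EventCode
PRINCE,CITIZEN,051
MEDIA,MINIST,090
"""

# index_col=0 mimics from_csv's default: the first column becomes the index,
# and its header ("Source") becomes the index name, which pandas prints on a
# separate header row.
shifted = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(shifted.index.name)     # Source
print(list(shifted.columns))  # ['Target', 'EventCode']

# If a frame was already loaded this way, reset_index() turns the index
# back into a regular first column named "Source".
restored = shifted.reset_index()
print(list(restored.columns))  # ['Source', 'Target', 'EventCode']
```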

A similar question, "python - Pandas creating a DataFrame whose first header column is on its own row", can be found on Stack Overflow: https://stackoverflow.com/questions/30098482/
