python - Pandas: reading a CSV file with a multi-character delimiter?

Tags: python pandas dataframe

I have the following CSV file, which I want to read with pandas.read_csv, but I can't get it to work properly:

                                                                Mat  Pur Mat    Mat  Proc ABC   TimePrice            Crncy Supplier      
Plant Material Number   Material Description                    Grp  Grp Status Type Type Class daysper each         Key   Consignment   
-----------------------------------------------------------------------------------------------------------------------------------------
0009  076/JJJJJJJ331    DUMMY UNIT/Dummy Unit 265x225x15        ZEEJJMA9   P5   JERI   F         99          99.9900 SEK               0
0009  1/JJJJJJJJJ/1R3   EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P8   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJJJJ/4     EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P5   JERI   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ/1     BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305  MA9   P5   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJ04        EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9   P5   JERI   F         99      99,999.9900 SEK               0
0009  1/JJJJJJJJ/6      CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9   P5   JCOM   F         99           9.9900 SEK               0
0009  1/JJJJJJJJJ       PACKAGE/Pallet 800*114*600              ZEEJJMA9   P5   JVER   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ       PACKING MATERIAL/Pallet 1200*800*160    ZEEJJMA9   P5   JCOM   F        999         999.9900 SEK               0
0009  1/JJJJJJJJ/06     BAG/PåSE/MINIGRIP/300*250 MM            ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0
0009  1/JJJJJJJJ        BAG/Antistatic zip lock bag 75x100      ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0

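I tried the code below, but ran into these problems:

  • the Material Description values contain spaces
  • the header is spread over two lines, which makes it hard to read in
  • in rows 2, 3, etc., there is no space between the Material Description and Mat Grp values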
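import pandas as pd

file_path = "materials.txt"  # hypothetical path to the report file

# sep=r"\s+" replaces delim_whitespace=True (deprecated in pandas 2.2);
# on_bad_lines="skip" replaces error_bad_lines=False (removed in pandas 2.0)
df = pd.read_csv(file_path, sep=r"\s+", skiprows=4, header=None, on_bad_lines="skip", engine="python")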
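To see why no choice of separator makes read_csv work here, it helps to count the fields each row yields when split on whitespace. A small sketch, using three rows copied verbatim from the sample above (plain `str.split()` stands in for `sep=r"\s+"`, which is an approximation, not pandas itself):

```python
# Three data rows copied verbatim from the sample file in the question
rows = [
    "0009  076/JJJJJJJ331    DUMMY UNIT/Dummy Unit 265x225x15        ZEEJJMA9   P5   JERI   F         99          99.9900 SEK               0",
    "0009  1/JJJJJJJJ/6      CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9   P5   JCOM   F         99           9.9900 SEK               0",
    "0009  1/JJJJJJJJJ       PACKAGE/Pallet 800*114*600              ZEEJJMA9   P5   JVER   F         99         999.9900 SEK               0",
]

# str.split() with no argument splits on runs of whitespace, which is roughly
# what delim_whitespace=True (or sep=r"\s+") asks read_csv to do
field_counts = [len(row.split()) for row in rows]
print(field_counts)  # [14, 17, 12]

# Where the description fills its whole field, it fuses with the next column
print(rows[1].split()[9])  # M25ZEEJJMA9
```

A delimited parser expects the same number of fields on every row, so the varying counts (driven by spaces inside the description) and fused tokens such as M25ZEEJJMA9 mean the columns have to be identified by character position instead.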

Best answer

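I believe you are looking for the pandas read_fwf function (for fixed-width formatted lines). read_fwf can infer column boundaries from the data (colspecs="infer", the default), but that relies on whitespace gaps between columns, and here some fields run into each other, so you have to specify the column positions manually. Here is an example for the first few columns: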

s = '''
0009  076/JJJJJJJ331    DUMMY UNIT/Dummy Unit 265x225x15        ZEEJJMA9   P5   JERI   F         99          99.9900 SEK               0
0009  1/JJJJJJJJJ/1R3   EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P8   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJJJJ/4     EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P5   JERI   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ/1     BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305  MA9   P5   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJ04        EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9   P5   JERI   F         99      99,999.9900 SEK               0
0009  1/JJJJJJJJ/6      CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9   P5   JCOM   F         99           9.9900 SEK               0
0009  1/JJJJJJJJJ       PACKAGE/Pallet 800*114*600              ZEEJJMA9   P5   JVER   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ       PACKING MATERIAL/Pallet 1200*800*160    ZEEJJMA9   P5   JCOM   F        999         999.9900 SEK               0
0009  1/JJJJJJJJ/06     BAG/PåSE/MINIGRIP/300*250 MM            ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0
0009  1/JJJJJJJJ        BAG/Antistatic zip lock bag 75x100      ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0
'''

from io import StringIO
import pandas as pd
# colspecs are half-open (start, end) character offsets into each line;
# Material Number values are up to 15 characters wide, so that field runs to offset 24
df = pd.read_fwf(StringIO(s), colspecs=[(0,5), (6,24), (24,64), (64,72)])

Here is the output DataFrame:

   Unnamed: 0       Unnamed: 1                                Unnamed: 2  \
0           9   076/JJJJJJJ331          DUMMY UNIT/Dummy Unit 265x225x15   
1           9  1/JJJJJJJJJ/1R3  EQUIPPED MAGAZINE/SUP 6601; Equipped mag   
2           9    1/JJJJJJJJJ/4  EQUIPPED MAGAZINE/SUP 6601; Equipped mag   
3           9    1/JJJJJJJJJ/1  BASIC EQUIP.MAGAZINE/Remote IRU Enclosur   
4           9       1/JJJJJJ04   EQUIPPED CABINET/BYB 504 Multi-Pack Kit   
5           9     1/JJJJJJJJ/6  CABLE BUSHING/O-Ring id 21, th 2 for M25   
6           9      1/JJJJJJJJJ                PACKAGE/Pallet 800*114*600   
7           9      1/JJJJJJJJJ      PACKING MATERIAL/Pallet 1200*800*160   
8           9    1/JJJJJJJJ/06              BAG/PåSE/MINIGRIP/300*250 MM   
9           9       1/JJJJJJJJ        BAG/Antistatic zip lock bag 75x100   

  Unnamed: 3  
0   ZEEJJMA9  
1   ZEEJJMA9  
2   ZEEJJMA9  
3   305  MA9  
4   ZEEJJMA9  
5   ZEEJJMA9  
6   ZEEJJMA9  
7   ZEEJJMA9  
8   ZEEJJMA9  
9   ZEEJJMA9  
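The same approach can be extended to the whole report, including the header block. This is a hedged sketch, not part of the original answer: the offsets are read off the sample rows, splitting ZEEJJMA9 at offset 69 into Mat Grp and Pur Grp is an assumption based on the two "Grp" labels in the header and the "305  MA9" row, and skiprows=3 assumes the file starts directly with the two header lines and the dashed ruler (the question used skiprows=4, so the real file may have one more leading line).

```python
from io import StringIO

import pandas as pd

# Header block (two label lines + dashed ruler) and three data rows from the question
raw = "\n".join([
    " " * 64 + "Mat  Pur Mat    Mat  Proc ABC   TimePrice            Crncy Supplier",
    "Plant Material Number   Material Description                    Grp  Grp Status Type Type Class daysper each         Key   Consignment",
    "-" * 137,
    "0009  076/JJJJJJJ331    DUMMY UNIT/Dummy Unit 265x225x15        ZEEJJMA9   P5   JERI   F         99          99.9900 SEK               0",
    "0009  1/JJJJJJJJJ/1R3   EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P8   JERI   F         99       9,999.9900 SEK               0",
    "0009  1/JJJJJJJJJ/1     BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305  MA9   P5   JERI   F         99       9,999.9900 SEK               0",
])

# (name, start, end) as half-open offsets; these are assumptions taken from
# the sample's alignment, so check them against the full report
columns = [
    ("Plant", 0, 5),
    ("Material Number", 5, 24),
    ("Material Description", 24, 64),
    ("Mat Grp", 64, 69),        # assumed split of "ZEEJJMA9" / "305  MA9"
    ("Pur Grp", 69, 72),
    ("Mat Status", 72, 78),
    ("Mat Type", 78, 85),
    ("Proc Type", 85, 90),
    ("ABC Class", 90, 95),
    ("Time days", 95, 100),
    ("Price per each", 100, 116),
    ("Crncy Key", 116, 121),
    ("Supplier Consignment", 121, 140),
]

df = pd.read_fwf(
    StringIO(raw),
    colspecs=[(start, end) for _, start, end in columns],
    names=[name for name, _, _ in columns],
    header=None,
    skiprows=3,             # two header lines plus the dashed ruler
    thousands=",",          # "9,999.9900" is parsed as 9999.99
    dtype={"Plant": str},   # keep the leading zeros in "0009"
)
print(df[["Plant", "Material Number", "Mat Grp", "Pur Grp", "Price per each"]])
```

Reading Plant as str keeps the leading zeros that the output above loses (0009 shows up as 9). If the real report contains wider values than the sample, the offsets need adjusting.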

A similar question, "python - Pandas: reading a CSV file with a multi-character delimiter?", is on Stack Overflow: https://stackoverflow.com/questions/59737399/

相关文章:

r - 根据条件将值从一行复制到另一行

Python如何按字典键对嵌套字典列表进行排序?

python - 哪些方法在 Python 中实现了缓冲区接口(interface)?

python - 将列插入数据框而不修改原始框架

pandas - 处理 Pandas 中的缺失数据

python - 如何将字符串映射到数据框python3的每一列中的数字ID

使用 R 中另一个数据帧的其他匹配 ID 替换数据帧中的值

python - 为什么我的文本字符串在 Pygame 中呈现为实心/填充矩形?

python - 在 anaconda 中安装 OpenCV 未显示在 Windows 10 的 VS Code 中

python - 使用三元图的元组键将 Pandas 数据框转换为字典