I have the following CSV file and want to read it with pandas.read_csv,
but it does not parse correctly.
Mat Pur Mat Mat Proc ABC TimePrice Crncy Supplier
Plant Material Number Material Description Grp Grp Status Type Type Class daysper each Key Consignment
-----------------------------------------------------------------------------------------------------------------------------------------
0009 076/JJJJJJJ331 DUMMY UNIT/Dummy Unit 265x225x15 ZEEJJMA9 P5 JERI F 99 99.9900 SEK 0
0009 1/JJJJJJJJJ/1R3 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P8 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJJJJ/4 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P5 JERI F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ/1 BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305 MA9 P5 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJ04 EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9 P5 JERI F 99 99,999.9900 SEK 0
0009 1/JJJJJJJJ/6 CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9 P5 JCOM F 99 9.9900 SEK 0
0009 1/JJJJJJJJJ PACKAGE/Pallet 800*114*600 ZEEJJMA9 P5 JVER F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ PACKING MATERIAL/Pallet 1200*800*160 ZEEJJMA9 P5 JCOM F 999 999.9900 SEK 0
0009 1/JJJJJJJJ/06 BAG/PåSE/MINIGRIP/300*250 MM ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
0009 1/JJJJJJJJ BAG/Antistatic zip lock bag 75x100 ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
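I tried the code below, but ran into these problems:
- The Material Description values contain spaces.
- The header is hard to parse.
- In some rows (rows 2 and 3, among others) there is no space between Material Description and Mat Grp.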
import pandas as pd
df = pd.read_csv(file_path, delim_whitespace=True, skiprows=4, header=None, error_bad_lines=False, engine="python")
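Best answer
I believe you are looking for the pandas read_fwf function, which reads fixed-width files. Unfortunately, you will have to specify the column boundaries by hand here, because adjacent fields sometimes touch with no whitespace between them. Here is an example for the first few columns: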
s = '''
0009 076/JJJJJJJ331 DUMMY UNIT/Dummy Unit 265x225x15 ZEEJJMA9 P5 JERI F 99 99.9900 SEK 0
0009 1/JJJJJJJJJ/1R3 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P8 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJJJJ/4 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P5 JERI F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ/1 BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305 MA9 P5 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJ04 EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9 P5 JERI F 99 99,999.9900 SEK 0
0009 1/JJJJJJJJ/6 CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9 P5 JCOM F 99 9.9900 SEK 0
0009 1/JJJJJJJJJ PACKAGE/Pallet 800*114*600 ZEEJJMA9 P5 JVER F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ PACKING MATERIAL/Pallet 1200*800*160 ZEEJJMA9 P5 JCOM F 999 999.9900 SEK 0
0009 1/JJJJJJJJ/06 BAG/PåSE/MINIGRIP/300*250 MM ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
0009 1/JJJJJJJJ BAG/Antistatic zip lock bag 75x100 ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
'''
from io import StringIO
import pandas as pd
df = pd.read_fwf(StringIO(s), colspecs=[(0,5), (6,20), (24,64), (64,72)])
Here is the resulting DataFrame:
Unnamed: 0 Unnamed: 1 Unnamed: 2 \
0 9 076/JJJJJJJ331 DUMMY UNIT/Dummy Unit 265x225x15
1 9 1/JJJJJJJJJ/1R EQUIPPED MAGAZINE/SUP 6601; Equipped mag
2 9 1/JJJJJJJJJ/4 EQUIPPED MAGAZINE/SUP 6601; Equipped mag
3 9 1/JJJJJJJJJ/1 BASIC EQUIP.MAGAZINE/Remote IRU Enclosur
4 9 1/JJJJJJ04 EQUIPPED CABINET/BYB 504 Multi-Pack Kit
5 9 1/JJJJJJJJ/6 CABLE BUSHING/O-Ring id 21, th 2 for M25
6 9 1/JJJJJJJJJ PACKAGE/Pallet 800*114*600
7 9 1/JJJJJJJJJ PACKING MATERIAL/Pallet 1200*800*160
8 9 1/JJJJJJJJ/06 BAG/PåSE/MINIGRIP/300*250 MM
9 9 1/JJJJJJJJ BAG/Antistatic zip lock bag 75x100
Unnamed: 3
0 ZEEJJMA9
1 ZEEJJMA9
2 ZEEJJMA9
3 305 MA9
4 ZEEJJMA9
5 ZEEJJMA9
6 ZEEJJMA9
7 ZEEJJMA9
8 ZEEJJMA9
9 ZEEJJMA9
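As a follow-up, here is a minimal sketch of how the same idea could be extended to named columns, a Plant column that keeps its leading zeros, and prices with thousands separators. The column names, the widths, and the two sample rows are assumptions made up for this illustration (the real file's offsets have to be measured against the actual data), and widths= is used instead of colspecs= only because the fields in this sketch are contiguous.

```python
from io import StringIO

import pandas as pd

# Hypothetical layout: (column name, field width). These numbers are
# assumptions for this sketch, not the offsets of the real file.
layout = [
    ("plant", 5),
    ("material", 16),
    ("description", 40),
    ("mat_grp", 8),
    ("price", 12),
]

# Two made-up rows modelled on the question's data. The descriptions are
# exactly 40 characters long, so they touch the next field with no gap.
rows = [
    ("0009", "1/JJJJJJJJJ/1R3", "EQUIPPED MAGAZINE/SUP 6601; Equipped mag", "ZEEJJMA9", "9,999.9900"),
    ("0009", "1/JJJJJJJJJ/1", "BASIC EQUIP.MAGAZINE/Remote IRU Enclosur", "305 MA9", "99.9900"),
]

# Pad every value to its field width to build fixed-width text.
text = "\n".join(
    "".join(value.ljust(width) for value, (_, width) in zip(row, layout))
    for row in rows
)

df = pd.read_fwf(
    StringIO(text),
    widths=[width for _, width in layout],  # contiguous field widths
    names=[name for name, _ in layout],     # supply our own column names
    header=None,                            # the text has no header row
    dtype={"plant": str},                   # keep "0009" instead of 9
    thousands=",",                          # parse "9,999.9900" as a number
)

print(df.dtypes)
print(df)
```

With a real file you would pass the file path instead of StringIO(text), and probably skiprows= to skip the multi-line header and the dashed separator line.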
The original question, "python - Pandas: reading a csv file with multi-character delimiters?", is on Stack Overflow: https://stackoverflow.com/questions/59737399/