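I have a CSV file with many columns, and I only need the two whose headers are "address" and "port". I'm trying this simple function with pandas, but I get an error. Any idea what the problem is? Thanks.
The input file has columns with these headers: Start Time, End time, address, vendor, hostname, port, state, service, script, output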
import pandas as pd

def trim_scan(infile, outdir):
    df = pd.read_csv(infile)
    keep_cols = ["address", "port"]
    new_df = df[keep_cols]
    new_df.to_csv(outdir + '/' + 'nmap-ip-ports.csv', index=False)

trim_scan('nmap-scan.csv', '2015-07-27')
Here is the error:
Traceback (most recent call last):
File "test3.py", line 64, in <module>
trim_scan('nmap-scan.csv', '2015-07-27')
File "test3.py", line 59, in trim_scan
df = pd.read_csv(infile)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 474, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 260, in _read
return parser.read()
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 721, in read
ret = self._engine.read(nrows)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 1170, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 769, in pandas.parser.TextReader.read (pandas/parser.c:7544)
File "pandas/parser.pyx", line 791, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7784)
File "pandas/parser.pyx", line 844, in pandas.parser.TextReader._read_rows (pandas/parser.c:8401)
File "pandas/parser.pyx", line 831, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8275)
File "pandas/parser.pyx", line 1742, in pandas.parser.raise_parser_error (pandas/parser.c:20691)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 12 fields in line 6, saw 14
Best answer
The problem comes from your CSV file, which is irregular: the number of fields differs from line to line.
Expected 12 fields in line 6, saw 14
Another hint that your CSV file is malformed is that you defined 10 distinct fields:
Start Time, End time, address, vendor, hostname, port, state, service, script, output
but pandas expects 12.
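One way to locate the offending lines before handing the file to pandas is to count the fields in each record with the standard `csv` module. The following is only a minimal sketch: the helper name `irregular_rows` and the three-column sample data are made up for illustration, and it assumes the file is comma-delimited and that the header row has the intended number of fields.

```python
import csv
import io


def irregular_rows(lines, delimiter=","):
    """Return (expected_field_count, [(line_number, field_count), ...]).

    The expected count is taken from the header row (an assumption);
    every later record with a different count is reported.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    bad = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of the source line just read
            bad.append((reader.line_num, len(row)))
    return expected, bad


# Hypothetical sample standing in for the real scan file
sample = io.StringIO(
    "address,port,state\n"
    "10.0.0.1,22,open\n"
    "10.0.0.2,80,open,extra,field\n"
    "10.0.0.3,443,open\n"
)
expected, bad = irregular_rows(sample)
print(expected, bad)
```

For a real file, pass an open file object instead of the `StringIO`, for example `open('nmap-scan.csv', newline='')`.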
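You can check the format of the timestamps, or any other field whose value contains a separator character such as ; or ,. Check the header, skip the bad lines, and so on.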
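If skipping the malformed lines is acceptable, pandas can be asked to drop them while reading. The sketch below is an illustration rather than a definitive fix: it assumes pandas 1.3 or newer, where `read_csv` accepts `on_bad_lines="skip"` (older releases used `error_bad_lines=False` instead), and the function name `trim_scan_skipping_bad_lines` and the sample data are hypothetical.

```python
import io

import pandas as pd


def trim_scan_skipping_bad_lines(source, keep_cols=("address", "port")):
    """Read a CSV, drop lines with too many fields, keep selected columns."""
    # Assumption: pandas >= 1.3, which provides on_bad_lines
    df = pd.read_csv(source, on_bad_lines="skip")
    return df[list(keep_cols)]


# Hypothetical sample standing in for the real scan file
sample = io.StringIO(
    "address,port,state\n"
    "10.0.0.1,22,open\n"
    "10.0.0.2,80,open,extra,field\n"
    "10.0.0.3,443,open\n"
)
trimmed = trim_scan_skipping_bad_lines(sample)
print(trimmed)
```

Skipped lines are lost silently, so this is only suitable when the dropped records do not matter; otherwise it is safer to repair the file, for instance by quoting fields that contain the delimiter.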
For more on "python - Trimming columns in a CSV with Python and pandas", see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/31658114/