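I wrote some code to process many CSV files. For each file, I want to extract every row whose cell in the column named "20201-2.0" is non-empty. See the attached example (this is the LCE column):
I wrote the following code to do this: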
import pandas as pd
import glob
import os

path = './'
#column = ['20201-2.0']
all_files = glob.glob(path + "/*.csv")
for filename in all_files:
    # Option 1 below worked, although without isolating the non-nulled values
    # 1. df = pd.read_csv(filename, encoding="ISO-8859-1")
    df = pd.read_csv(filename, header=0)
    df = df[df['20201-2.0'].notnull()]
    print('extracting info from csv...')
    print(df)
    # You can now export all outcomes in new csv files
    file_name = filename + 'new' + '.csv'
    save_path = os.path.abspath(
        os.path.join(
            path, file_name
        )
    )
    print('saving ...')
    export_csv = df.to_csv(save_path, index=None)
    del df
    del export_csv
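To see what the filtering line does in isolation, here is a minimal sketch on a small in-memory CSV; the data and the `eid`/`other` column names are made up. `read_csv` turns empty fields into NaN, and `notnull()` turns the column into a boolean mask that keeps only the rows that have a value.

```python
import io

import pandas as pd

# Made-up CSV standing in for one of the real files.
csv_text = """eid,20201-2.0,other
1,,a
2,3.5,b
3,,c
4,7.0,d
"""

df = pd.read_csv(io.StringIO(csv_text), header=0)

# notnull() gives a boolean Series; indexing with it keeps the True rows.
mask = df["20201-2.0"].notnull()
filtered = df[mask]
print(filtered)
```

Only the rows with `eid` 2 and 4 survive, with their original index labels (1 and 3) preserved.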
However, although the first file is generated successfully, I then get the following error:
Traceback (most recent call last):
File "/home/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '20201-2.0'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/PycharmProjects/OPTIMAT/Read_MR_from_all_csv.py", line 21, in <module>
df = df[df['20201-2.0'].notnull()]
File "/home/giorgos/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '20201-2.0'
I don't understand why this happens. Any ideas would be greatly appreciated.
Best answer
Happy to say I found a way to do this:
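A `KeyError` raised from `self.columns.get_loc(key)` means the label '20201-2.0' is not among that DataFrame's column names. The files themselves are not shown, so the following is only an assumption: the file that fails probably has a different header (no such column, a slightly different name, or stray whitespace around it). A minimal diagnostic sketch that reads only the header of each file and reports which ones lack the exact label; `files_missing_column` is a hypothetical helper, and the demo files are made up.

```python
import glob
import os
import tempfile

import pandas as pd

COLUMN = "20201-2.0"


def files_missing_column(path, column=COLUMN):
    """Return the CSV files in `path` whose header row lacks `column`.

    files_missing_column is a hypothetical helper, not part of pandas.
    """
    missing = []
    for filename in sorted(glob.glob(os.path.join(path, "*.csv"))):
        # nrows=0 parses only the header, which is enough to list the columns.
        header = pd.read_csv(filename, nrows=0).columns
        if column not in header:
            missing.append(filename)
            # Report near-misses such as the same name with stray whitespace.
            near = [c for c in header if str(c).strip() == column]
            print(filename, "lacks", repr(column), "near-misses:", near)
    return missing


# Usage demo with two throwaway files (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.csv"), "w") as fh:
        fh.write("eid,20201-2.0\n1,\n2,5\n")
    with open(os.path.join(tmp, "b.csv"), "w") as fh:
        # Trailing space after the column name in the header.
        fh.write("eid,20201-2.0 \n1,\n2,5\n")
    missing = [os.path.basename(f) for f in files_missing_column(tmp)]

print(missing)
```

If the real directory reports the file that fails, printing its `df.columns`, or normalizing them with `df.columns = df.columns.str.strip()`, would confirm or rule out this explanation.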
import pandas as pd
import glob
import os

path = './'
#column = ['20201-2.0']
# all_files = glob.glob(path + "/*.csv")
all_files = os.listdir(path)
for filename in all_files:
    if not filename.endswith('.csv'):
        continue
    print('extracting info from ' + filename)
    # Option 1 below worked, although without isolating the non-nulled values
    # 1. df = pd.read_csv(filename, encoding="ISO-8859-1")
    # os.listdir returns bare names, so join them with the directory
    df = pd.read_csv(os.path.join(path, filename), header=0)
    #df = df[df['20201-2.0'].notnull()]
    # keep only rows where '20201-2.0' is not NaN
    df_subset = df.dropna(subset=['20201-2.0'])
    print('processed ' + filename)
    # You can now export all outcomes in new csv files
    file_name = os.path.splitext(filename)[0] + '_new' + '.csv'
    print('saving to ' + file_name)
    export_csv = df_subset.to_csv(os.path.join(path, file_name), index=None)
    del df
    del export_csv
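Two caveats about the code above, based on pandas' documented behavior rather than on the original files: `df.dropna(subset=['20201-2.0'])` keeps the same rows as `df[df['20201-2.0'].notnull()]`, and it also raises a `KeyError` when a file has no such column. So if some file genuinely lacks that header, a guard is still needed. The sketch below adds one. The function name `extract_non_null`, the skip-and-report behavior, and the rule that ignores `*_new.csv` outputs from earlier runs are assumptions of this sketch, not part of the original answer.

```python
import os
import tempfile

import pandas as pd

COLUMN = "20201-2.0"


def extract_non_null(path, column=COLUMN):
    """Write <stem>_new.csv for each CSV in `path` that contains `column`.

    Files without the column are skipped instead of raising KeyError.
    Returns two lists: output paths written and input paths skipped.
    extract_non_null is a hypothetical helper name.
    """
    written, skipped = [], []
    for filename in sorted(os.listdir(path)):
        stem, ext = os.path.splitext(filename)
        # Only real .csv files; also skip outputs left by an earlier run
        # (assumes input files never end in "_new" themselves).
        if ext.lower() != ".csv" or stem.endswith("_new"):
            continue
        full_path = os.path.join(path, filename)
        df = pd.read_csv(full_path, header=0)
        if column not in df.columns:
            skipped.append(full_path)
            continue
        # Same rows as df[df[column].notnull()].
        subset = df.dropna(subset=[column])
        out_path = os.path.join(path, stem + "_new.csv")
        subset.to_csv(out_path, index=False)
        written.append(out_path)
    return written, skipped


# Usage demo: one file with the column, one without (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.csv"), "w") as fh:
        fh.write("eid,20201-2.0\n1,\n2,5\n3,\n4,8\n")
    with open(os.path.join(tmp, "b.csv"), "w") as fh:
        fh.write("eid,other\n1,x\n")
    written, skipped = extract_non_null(tmp)
    names_written = [os.path.basename(p) for p in written]
    names_skipped = [os.path.basename(p) for p in skipped]
    result = pd.read_csv(written[0])

print(names_written, names_skipped, result["eid"].tolist())
```

Returning the skipped paths instead of printing and moving on makes it easy to check afterwards which files did not match the expected header.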
Regarding "python - Loop over multiple csv files with Python and extract rows from the non-empty cells of a specific column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58428610/