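I'd like to know whether there is a way to use Pandas to iterate over every row of a CSV file and check whether a word appears anywhere in that row (similar to using grep on a Linux system). It doesn't matter which column the word is found in; as long as it is found, the whole row should be parsed. I came across the iterrows() function, but I've read that it is very inefficient for files with more than 1,000 rows, and my program may read more than 100,000 rows. Any suggestions are much appreciated!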
#Code was tested using Python v3.9.5
import os
import pandas as pd

def parse_row(grep_value):
    global import_file_path
    global export_file_path

    #Initializes loop counter for folder name
    folder_counter = 0
    path = os.path.join(export_file_path, "File Parser Exports")

    #Creates extra directory if current directory exists
    while os.path.isdir(path):
        #Appends a number to the name of the folder
        folder_counter += 1
        path = os.path.join(export_file_path, "File Parser Exports" + " (" + str(folder_counter) + ")")

    #Creates folder for exports after finding a folder name that is available
    os.mkdir(path)

    #Export file path for parsed file (os.path.join avoids the invalid "\E" escape)
    full_export_path = os.path.join(path, "Export.csv")
    file_count = 0 #Initializer for file number of exported files
    tmp_export_path = full_export_path #Temporary place holder for slicing export path

    #Reads file with headers
    file_data = pd.read_csv(import_file_path, lineterminator='\n')

    #Iterate through file
    for index, row in file_data.iterrows():
        print(index)
        print(row)

    #Checks if export file exists in the newly created directory
    while os.path.isfile(full_export_path):
        #Appends a number to the file name
        file_count += 1
        tmp_export_path = tmp_export_path.rsplit('.', 1)[0]
        file_name = "-" + str(file_count) + ".csv"
        full_export_path = tmp_export_path + file_name

    #Exports file after finding a file name that is available
    file_data.to_csv(full_export_path, index=False)

    print()
    print("File(s) exported to \"" + path + "\"")
    print("Successfully completed!")

export_file_path = "C:\\Users\\exportpath"
import_file_path = "C:\\Users\\importpath"
grep_value = "The"
parse_row(grep_value)
Best answer
I made a sample DataFrame:
dd = pd.DataFrame({'name': ['pete', 'reuben', 'michelle'],
                   'number': [1, 2, 3],
                   'lunch': ['pizza', 'hamburger', 'reuben']})
I suggest this to flag which rows match (it returns a boolean Series, one value per row):
dd[dd.columns[dd.dtypes == 'object']]\
    .apply(lambda x: ' '.join(x), axis=1).str.contains('reuben')
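Reading left to right, the code 1) pulls out the columns whose dtype is object (strings), 2) joins them into one long string per row, and 3) checks that string for the keyword.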
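One caveat with the join-based approach: `' '.join(x)` raises a TypeError if a string column contains a missing value, and numeric columns are never searched. Below is a minimal sketch of an alternative, assuming you want to search every column the way grep searches a whole line. The helper name `row_mask` and the `None` placed in the sample data are hypothetical, added only for illustration.

```python
import pandas as pd

def row_mask(df, keyword):
    """Return a boolean Series that is True for rows containing keyword in any column."""
    # Cast every cell to str so numeric columns can be searched too.
    as_text = df.astype(str)
    # regex=False treats the keyword as a literal substring, not a pattern;
    # na=False counts missing values as non-matches.
    hits = as_text.apply(lambda col: col.str.contains(keyword, regex=False, na=False))
    # A row matches if at least one of its columns matched.
    return hits.any(axis=1)

dd = pd.DataFrame({'name': ['pete', 'reuben', 'michelle'],
                   'number': [1, 2, 3],
                   'lunch': ['pizza', None, 'reuben']})
mask = row_mask(dd, 'reuben')
print(dd[mask])
```

Testing each column separately also avoids accidental matches that span the space between two joined cells.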
To get the index labels of the matching rows:
matches = dd.index[dd[dd.columns[dd.dtypes == 'object']]\
    .apply(lambda x: ' '.join(x), axis=1).str.contains('reuben')]
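To connect this back to the question, here is a rough sketch of the full round trip: read a CSV, find the matching rows, and write only those rows out. It assumes every column should be searched as text. The sample CSV content is made up, and `io.StringIO` stands in for the real import and export paths.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the asker's import file.
csv_text = "name,number,lunch\npete,1,pizza\nreuben,2,hamburger\nmichelle,3,reuben\n"

# dtype=str reads every column as text so each one takes part in the search;
# keep_default_na=False leaves empty cells as '' so ' '.join never sees NaN.
df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

# Same idea as the answer: one long string per row, then a substring test.
joined = df.apply(lambda row: ' '.join(row), axis=1)
matches = df.index[joined.str.contains('reuben', regex=False)]

# Select the matching rows and write them out; a real path could replace the buffer.
out = io.StringIO()
df.loc[matches].to_csv(out, index=False)
print(out.getvalue())
```

Because the filtering is done with vectorized string methods rather than iterrows(), no explicit Python loop over the rows is needed.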
Regarding python-3.x - Python Pandas - parsing rows in a CSV file based on a string value, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68535630/