python-3.x - Python Pandas - Parse rows in a CSV file based on a string value

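I'd like to know whether there is a way to use Pandas to go through each row of a CSV file and check whether a word appears in that row (similar to using grep on a Linux system). It doesn't matter which column the word is in: if the word is found anywhere in the row, the whole row should be parsed. I found the iterrows() function, but I've read that it is very inefficient once a file has more than 1,000 rows, and my program may read more than 100,000 rows. Any suggestions are much appreciated!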

#Code was tested using Python v3.9.5
import os
import pandas as pd

def parse_row(grep_value):

    global import_file_path
    global export_file_path

    #Initializes loop counter for folder name
    folder_counter = 0
    path = os.path.join(export_file_path, "File Parser Exports")

    #Finds an unused folder name if the default one already exists
    while os.path.isdir(path): 

        #Appends a number to the name of the folder
        folder_counter += 1
        path = os.path.join(export_file_path, "File Parser Exports" + " (" + str(folder_counter) + ")")

    #Creates folder for exports after finding a folder name that is available
    os.mkdir(path)

    #Export file path for parsed file
    full_export_path = os.path.join(path, "Export.csv")

    file_count = 0    #Initializer for file number of exported files
    tmp_export_path = full_export_path    #Temporary place holder for slicing export path

    #Reads file with headers
    file_data = pd.read_csv(import_file_path, lineterminator='\n')

    #Iterate through file
    for index, row in file_data.iterrows():
        print(index)
        print(row)

    #Checks if export file exists in the newly created directory
    while os.path.isfile(full_export_path):
        
        #Appends a number to the file name
        file_count += 1
        tmp_export_path = tmp_export_path.rsplit('.', 1)[0]
        file_name = "-" + str(file_count) + ".csv"
        full_export_path = tmp_export_path + file_name

    #Exports file after finding a file name that is available
    file_data.to_csv(full_export_path, index=False)

    print()
    print("File(s) exported to \"" + path + "\"")
    print("Successfully completed!")

export_file_path = "C:\\Users\\exportpath"
import_file_path = "C:\\Users\\importpath"
grep_value = "The"

parse_row(grep_value)
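The question's code never uses grep_value. As a minimal sketch, assuming the CSV has a header row and a plain case-sensitive substring match (like grep without -i) is wanted, the filter could be applied to whole chunks instead of looping with iterrows(). The function name grep_csv and the chunksize of 50,000 are hypothetical choices, not part of the original code; the original lineterminator='\n' argument is omitted here.

```python
import io
import pandas as pd

def grep_csv(source, word, chunksize=50_000):
    """Hypothetical helper: keep only rows where `word` appears in any column.

    The CSV is read in chunks so very large files need not fit in memory at once.
    """
    matches = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Convert every column to text, then test each cell for a literal substring
        hits = chunk.astype(str).apply(
            lambda col: col.str.contains(word, regex=False, na=False))
        # A row is kept if any of its cells matched
        matches.append(chunk[hits.any(axis=1)])
    return pd.concat(matches, ignore_index=True) if matches else pd.DataFrame()

# Small in-memory CSV standing in for import_file_path
csv_text = "name,lunch\npete,pizza\nThe Reuben,hamburger\nmichelle,The special\n"
result = grep_csv(io.StringIO(csv_text), "The", chunksize=2)
print(result)
```

Inside parse_row, the iterrows() loop could then be replaced by something like file_data = grep_csv(import_file_path, grep_value) before the existing to_csv call. For about 100,000 rows a single pd.read_csv call usually fits in memory as well; chunking only matters for much larger files.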

Best answer

I made a sample dataframe:

dd = pd.DataFrame({'name':['pete','reuben','michelle'],
                   'number':[1,2,3],"lunch":['pizza','hamburger','reuben']})

I suggest this to get the matching rows:

dd[dd[dd.columns[dd.dtypes == 'object']]\
    .apply(lambda x: ' '.join(x), axis=1).str.contains('reuben')]

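Reading from left to right, the code: 1) selects the columns of dtype object (strings), 2) joins each row's values into one long string, and 3) checks that string for the keyword. The resulting boolean mask then selects the matching rows.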
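The three steps can be sketched as a self-contained script on the same sample frame. One assumption differs from the answer: the text columns are picked with select_dtypes(exclude='number') rather than dtypes == 'object', which selects the same columns here and should also keep working if a newer pandas version infers a dedicated string dtype instead of object.

```python
import pandas as pd

# Sample frame from the answer
dd = pd.DataFrame({'name': ['pete', 'reuben', 'michelle'],
                   'number': [1, 2, 3],
                   'lunch': ['pizza', 'hamburger', 'reuben']})

# 1) keep only the non-numeric (text) columns
text_part = dd.select_dtypes(exclude='number')

# 2) join each row's text values into one space-separated string
joined = text_part.apply(lambda row: ' '.join(row), axis=1)

# 3) boolean mask: True where the keyword appears anywhere in the joined row
mask = joined.str.contains('reuben')

# Select the matching rows with the mask
matching_rows = dd[mask]
print(matching_rows)
```

Because the values are joined with spaces, a search term that itself contains a space could match across two adjacent columns; searching each column separately avoids that.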

To get the indexes of the matching rows:

matches = dd.index[dd[dd.columns[dd.dtypes =='object']]\
    .apply(lambda x: ' '.join(x),axis=1).str.contains('reuben')]
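An alternative not taken from the answer is to test each column separately and keep a row if any cell matches. This sketch assumes a literal, case-sensitive substring search; the helper name grep_rows is hypothetical. Converting with astype(str) means numeric columns such as number are searched too.

```python
import pandas as pd

def grep_rows(df: pd.DataFrame, word: str) -> pd.DataFrame:
    """Hypothetical helper: rows of df where `word` occurs in any column."""
    # One boolean column per original column: does the cell contain `word`?
    hits = df.astype(str).apply(
        lambda col: col.str.contains(word, regex=False, na=False))
    # Keep a row if any of its cells matched
    return df[hits.any(axis=1)]

dd = pd.DataFrame({'name': ['pete', 'reuben', 'michelle'],
                   'number': [1, 2, 3],
                   'lunch': ['pizza', 'hamburger', 'reuben']})

print(grep_rows(dd, 'reuben').index)  # indexes of the matching rows
print(grep_rows(dd, '2'))             # matches the numeric column as text
```

Passing regex=False treats the search word literally, so characters such as '.' or '(' are not interpreted as regular-expression syntax.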

Original question on Stack Overflow: https://stackoverflow.com/questions/68535630/
