python - Conditional formatting for partial duplicates in Pandas and Excel

Tags: python excel csv pandas

I have the following csv data in a file named reviews.csv:

Movie,Reviewer,Sentence,Tag,Sentiment,Text,
Jaws,John,s1,Plot,Positive,The plot was great,
Jaws,Mary,s1,Plot,Positive,The plot was great,
Jaws,John,s2,Acting,Positive,The acting was OK,
Jaws,Mary,s2,Acting,Neutral,The acting was OK,
Jaws,John,s3,Scene,Positive,The visuals blew me away,
Jaws,Mary,s3,Effects,Positive,The visuals blew me away,
Vertigo,John,s1,Scene,Negative,The scenes were terrible,
Vertigo,Mary,s1,Acting,Negative,The scenes were terrible,
Vertigo,John,s2,Plot,Negative,The actors couldn’t make the story believable,
Vertigo,Mary,s2,Acting,Positive,The actors couldn’t make the story believable,
Vertigo,John,s3,Effects,Negative,The effects were awful,
Vertigo,Mary,s3,Effects,Negative,The effects were awful,

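My goal is to convert this csv file into an Excel spreadsheet with conditional formatting. Specifically, I want to apply the following rules:

  1. If the "Movie", "Sentence", "Tag" and "Sentiment" values are the same, the entire row should be green.

  2. If the "Movie", "Sentence" and "Tag" values are the same but the "Sentiment" values differ, the row should be blue.

  3. If the "Movie" and "Sentence" values are the same but the "Tag" values differ, the row should be red.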
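One way to read the three rules in pandas terms is as nested duplicate checks, where the strictest match wins. The sketch below is only an illustration: the helper name `classify_rows` and the `Color` column are made up for this example, the Text column is left out for brevity, and it assumes that a row counts as "the same" when at least one other row shares the listed columns.

```python
import io

import pandas as pd

# Sample rows from reviews.csv (Text column omitted for brevity).
CSV = """Movie,Reviewer,Sentence,Tag,Sentiment
Jaws,John,s1,Plot,Positive
Jaws,Mary,s1,Plot,Positive
Jaws,John,s2,Acting,Positive
Jaws,Mary,s2,Acting,Neutral
Jaws,John,s3,Scene,Positive
Jaws,Mary,s3,Effects,Positive
Vertigo,John,s1,Scene,Negative
Vertigo,Mary,s1,Acting,Negative
Vertigo,John,s2,Plot,Negative
Vertigo,Mary,s2,Acting,Positive
Vertigo,John,s3,Effects,Negative
Vertigo,Mary,s3,Effects,Negative
"""


def classify_rows(df):
    """Return 'green', 'blue', 'red' or '' for each row (hypothetical helper)."""
    # keep=False marks every member of a duplicate group, not just the repeats.
    same_sentence = df.duplicated(subset=["Movie", "Sentence"], keep=False)
    same_tag = df.duplicated(subset=["Movie", "Sentence", "Tag"], keep=False)
    same_all = df.duplicated(
        subset=["Movie", "Sentence", "Tag", "Sentiment"], keep=False
    )

    colors = pd.Series("", index=df.index)
    # Broadest rule first; each stricter rule overwrites the previous color.
    colors.loc[same_sentence] = "red"
    colors.loc[same_tag] = "blue"
    colors.loc[same_all] = "green"
    return colors


df = pd.read_csv(io.StringIO(CSV))
df["Color"] = classify_rows(df)
print(df[["Movie", "Sentence", "Tag", "Sentiment", "Color"]])
```

With more than two reviewers per sentence this reading may need refining, since two agreeing rows would be marked green or blue while a third, disagreeing row would be marked red.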

So I would like to create an Excel spreadsheet (.xlsx) that looks like this:

[Image: Spreadsheet with color-coded partial duplicates]

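I have been looking at the Pandas styling documentation and the conditional formatting tutorial for XlsxWriter, but I can't seem to put the two together. Here is what I have so far. I can read the csv into a Pandas dataframe, sort it (although I'm not sure that is necessary), and write it back out to an Excel spreadsheet. How do I do the conditional formatting, and where does it go in the code?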

import pandas as pd


def csv_to_xls(source_path, dest_path):
    """
    Convert a csv file to a formatted xlsx spreadsheet
    Input: path to review csv file
    Output: formatted xlsx spreadsheet
    """
    # Read the source file and convert to Pandas dataframe
    df = pd.read_csv(source_path)

    # Sort by movie, then by sentence number
    df.sort_values(['Movie', 'Sentence'], ascending=[True, True], inplace=True)

    # Create the xlsx file that we'll be writing to
    orig = pd.ExcelWriter(dest_path, engine='xlsxwriter')

    # Convert the dataframe to Excel, create the sheet
    df.to_excel(orig, index=False, sheet_name='report')

    # Variables for the workbook and worksheet
    workbook = orig.book
    worksheet = orig.sheets['report']

    # Formats for exact, partial and mismatched duplicates
    exact = workbook.add_format({'bg_color': '#B7F985'})  # green
    partial = workbook.add_format({'bg_color': '#D3F6F4'})  # blue
    mismatch = workbook.add_format({'bg_color': '#F6D9D3'})  # red

    # Do the conditional formatting somehow

    # Close the writer to save the file (ExcelWriter.save() was removed in pandas 2.0)
    orig.close()

Best answer

Disclaimer: I am one of the authors of the library I am about to recommend.

This can be achieved easily with StyleFrame and DataFrame.duplicated:

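import pandas as pd
from styleframe import StyleFrame, Styler

# Load the data from the question
df = pd.read_csv('reviews.csv')
sf = StyleFrame(df)

green = Styler(bg_color='#B7F985')
blue = Styler(bg_color='#D3F6F4')
red = Styler(bg_color='#F6D9D3')

# Apply the broadest match first; each stricter match below overrides its color.
# keep=False flags every row of a duplicate group, not only the repeats.
sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence'], keep=False)],
                          styler_obj=red)
sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence', 'Tag'], keep=False)],
                          styler_obj=blue)
sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence', 'Tag', 'Sentiment'],
                                           keep=False)],
                          styler_obj=green)

# to_excel returns an ExcelWriter; close() writes the file to disk
sf.to_excel('test.xlsx').close()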

The output is as follows:

[Image: screenshot of the resulting spreadsheet]
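For readers who would rather stay with plain XlsxWriter, as in the question's code, a formula-based conditional format is one possible alternative. The sketch below is not part of the answer above. The helper names `countifs_rule` and `add_duplicate_rules` are hypothetical, and it assumes the columns are written in CSV order starting at column A (Movie in A, Sentence in C, Tag in D, Sentiment in E, Text in F) with the header in row 1. It relies on XlsxWriter's `conditional_format` with `'type': 'formula'` and on rules being prioritized in the order they are added.

```python
# Column letters as they would appear in the sheet (assumption: CSV order from A).
COLUMN_LETTERS = {"Movie": "A", "Sentence": "C", "Tag": "D", "Sentiment": "E"}


def countifs_rule(columns, first_row, last_row):
    """Build an Excel formula that is TRUE when another row shares these columns."""
    parts = []
    for name in columns:
        letter = COLUMN_LETTERS[name]
        # Absolute range over all data rows, row-relative reference to the current row.
        parts.append(
            f"${letter}${first_row}:${letter}${last_row},${letter}{first_row}"
        )
    return "=COUNTIFS(" + ",".join(parts) + ")>1"


def add_duplicate_rules(worksheet, n_rows, exact, partial, mismatch):
    """Attach the three rules to an XlsxWriter worksheet holding n_rows data rows."""
    first_row, last_row = 2, n_rows + 1  # row 1 holds the header
    cell_range = f"A{first_row}:F{last_row}"
    rules = [
        (["Movie", "Sentence", "Tag", "Sentiment"], exact),  # green
        (["Movie", "Sentence", "Tag"], partial),  # blue
        (["Movie", "Sentence"], mismatch),  # red
    ]
    # Strictest rule first; stop_if_true keeps a looser rule from also firing.
    for columns, fmt in rules:
        worksheet.conditional_format(cell_range, {
            "type": "formula",
            "criteria": countifs_rule(columns, first_row, last_row),
            "format": fmt,
            "stop_if_true": True,
        })


print(countifs_rule(["Movie", "Sentence"], 2, 13))
```

In the question's function, a call such as `add_duplicate_rules(worksheet, len(df), exact, partial, mismatch)` would sit where the placeholder comment is, before the writer is closed. Unlike the pandas-side approaches, the coloring here is computed by Excel, so it updates if the cells are edited later.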

The original question, "python - Conditional formatting for partial duplicates in Pandas and Excel", can be found on Stack Overflow: https://stackoverflow.com/questions/44814798/
