python - How can I output all differences when `pandas.testing.assert_frame_equal` fails?

Tags: python pandas unit-testing

I am unit testing DataFrame output. I have two DataFrames whose values differ in more than one column:

df1 = pd.DataFrame({"col1": [1, 1], "col2":[1, 1]})
df2 = pd.DataFrame({"col1": [1, 2], "col2":[1, 2]})

When I run pandas.testing.assert_frame_equal, I get the following error, which reports only one column:

DataFrame.iloc[:, 0] (column name="col1") values are different (50.0 %)
[index]: [0, 1]
[left]:  [1, 1]
[right]: [1, 2]

However, I get no information about the second column. Is there a way to show all mismatches, not just the first one in the leftmost column?

Best answer

Another (hacky, but slightly more performant) way to do this:

def assert_frame_equal_extended_diff(df1, df2):
    try:
        pd.testing.assert_frame_equal(df1, df2)

    except AssertionError as e:
        # if this was a shape or index/col error, then re-raise
        try:
            pd.testing.assert_index_equal(df1.index, df2.index)
            pd.testing.assert_index_equal(df1.columns, df2.columns)
        except AssertionError:
            raise e

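        # if not, we have a value error; positions where both frames hold
        # NaN are not differences, since assert_frame_equal treats them as equal
        diff = (df1 != df2) & ~(df1.isna() & df2.isna())
        diffcols = diff.any(axis=0)
        diffrows = diff.any(axis=1)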
        cmp = pd.concat(
            {'left': df1.loc[diffrows, diffcols], 'right': df2.loc[diffrows, diffcols]},
            names=['dataframe'],
            axis=1,
        )

        raise AssertionError(e.args[0] + f'\n\nDifferences:\n{cmp}') from None

This uses the repr of pandas.DataFrame to display the differences:

In [5]: df1 = pd.DataFrame({
   ...:     'samecol': np.arange(1500),
   ...:     'diffcol': np.arange(1500),
   ...:     'anothercol': np.ones(shape=1500),
   ...: })

In [6]: df2 = df1.copy()
   ...: df2.iloc[1000:1014, 1] = range(14)

In [7]: assert_frame_equal_extended_diff(df1, df2)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 assert_frame_equal_extended_diff(df1, df2)

Input In [6], in assert_frame_equal_extended_diff(df1, df2)
     11 diffrows = diff.any(axis=1)
     12 cmp = pd.concat(
     13     {'left': df1.loc[diffrows, diffcols], 'right': df2.loc[diffrows, diffcols]},
     14     names=['dataframe'],
     15     axis=1,
     16 )
---> 18 raise AssertionError(e.args[0] + f'\n\nDifferences:\n{cmp}') from None

AssertionError: DataFrame.iloc[:, 1] (column name="diffcol") are different

DataFrame.iloc[:, 1] (column name="diffcol") values are different (0.93333 %)
[index]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
[left]:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
[right]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]

Differences:
dataframe    left   right
          diffcol diffcol
1000         1000       0
1001         1001       1
1002         1002       2
1003         1003       3
1004         1004       4
1005         1005       5
1006         1006       6
1007         1007       7
1008         1008       8
1009         1009       9
1010         1010      10
1011         1011      11
1012         1012      12
1013         1013      13
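
For the two small frames from the question, the same helper should list both mismatched columns in one message. The sketch below repeats the helper in its basic form so that it runs on its own; the `message` variable and the `try`/`except` wrapper are only there for illustration, and the example frames contain no missing values.

```python
import pandas as pd


def assert_frame_equal_extended_diff(df1, df2):
    # Copy of the helper from the answer, repeated to keep this snippet self-contained.
    try:
        pd.testing.assert_frame_equal(df1, df2)
    except AssertionError as e:
        # Shape or index/column mismatches are re-raised unchanged.
        try:
            pd.testing.assert_index_equal(df1.index, df2.index)
            pd.testing.assert_index_equal(df1.columns, df2.columns)
        except AssertionError:
            raise e

        # Otherwise the values differ: keep only the rows and columns involved.
        diff = df1 != df2
        diffcols = diff.any(axis=0)
        diffrows = diff.any(axis=1)
        cmp = pd.concat(
            {'left': df1.loc[diffrows, diffcols], 'right': df2.loc[diffrows, diffcols]},
            names=['dataframe'],
            axis=1,
        )
        raise AssertionError(e.args[0] + f'\n\nDifferences:\n{cmp}') from None


# The frames from the question: row 1 differs in both col1 and col2.
df1 = pd.DataFrame({"col1": [1, 1], "col2": [1, 1]})
df2 = pd.DataFrame({"col1": [1, 2], "col2": [1, 2]})

try:
    assert_frame_equal_extended_diff(df1, df2)
except AssertionError as err:
    message = str(err)
    print(message)
```

The first part of the message is still the original pandas error for col1; the appended "Differences" table is where col2 shows up as well.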

Note: this answer is meant as a debugging aid, not a comprehensive approach free of edge cases. Edits are welcome, but use at your own risk.
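
If the goal is only to inspect the mismatches, and not to raise an assertion, `DataFrame.compare` (available since pandas 1.1) may be a simpler option. This is a minimal sketch and not part of the original answer; it assumes both frames share the same index and columns, which `compare` requires.

```python
import pandas as pd

# The frames from the question: row 1 differs in both col1 and col2.
df1 = pd.DataFrame({"col1": [1, 1], "col2": [1, 1]})
df2 = pd.DataFrame({"col1": [1, 2], "col2": [1, 2]})

# compare() keeps only the rows and columns that contain at least one
# difference and labels the two sides "self" (df1) and "other" (df2).
differences = df1.compare(df2)
print(differences)
```

The result has two-level columns, the original column name on top and `self`/`other` underneath, so every differing column appears side by side.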

Source: a similar question on Stack Overflow about how to output all differences when `pandas.testing.assert_frame_equal` fails: https://stackoverflow.com/questions/71412691/
