Scala unit test: how to validate the returned RDD

I've written a method that filters duplicates out of an RDD and decided to write a unit test for it. Here is my method:

  def filterDupes(salesWithDupes: RDD[((String, String), SalesData)]): RDD[((String, String), SalesData)] = {
    salesWithDupes
      .map(sale => ((sale._2.saleType, sale._2.saleDate), sale)) // re-key each row by (saleType, saleDate)
      .reduceByKey((a, _) => a)                                   // keep a single row per key
      .map(_._2)                                                  // drop the temporary key again
  }

Since this is my first time writing tests in Scala, I've run into a few complications. Am I passing the elements of the list to the filter method correctly?

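Right now I'm stuck on how to validate the result returned by the method. The only approach I've come up with so far is collecting the RDD's data into a list and then checking its size. Is that the right way?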

Here is what I have for the test logic so far:

  "Sales" should "be filtered" in {

    Given("Sales RDD")

    val rddWithDupes = sc.parallelize(Seq(
      (("metric1", "metric2"), createSale("1", saleType = "Type1", saleDate = "2014-10-12")),
      (("metric1", "metric2"), createSale("2", saleType = "Type1", saleDate = "2014-10-12")),
      (("metric1", "metric2"), createSale("3", saleType = "Type3", saleDate = "2010-11-01"))
    ))

    When("Sales RDD is filtered")

    val filteredResult = SalesProcessor.filterDupes(rddWithDupes).collect().toList

    Then("Sales are filtered")
    filteredResult.size should be(2)
    // ???? what else should be asserted here?
  }

Best answer

The only approach I came up with for now is collecting the RDD's data to a list and then checking its size. Is it the right way?

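Yes, it is. Distributed objects have no meaningful notion of equality, and short of tricks like:

checking that the sizes are the same,
checking that b minus a is empty,
checking that a minus b is empty,

you cannot really compare two RDDs.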
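As a rough sketch of those three checks, assuming Spark's `RDD.count()`, `RDD.subtract` and `RDD.isEmpty()`, a test helper could look like the following. The `RddAssertions` object and the `roughlyEqual` name are hypothetical, not part of Spark. Because `subtract` removes every copy of a matching element (set semantics), two RDDs that differ only in how often an element repeats can still pass, which is why this is a trick rather than real equality.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical test helper (not a Spark API): approximates "equality" of two RDDs
// using the three checks above. subtract() has set semantics, so RDDs that differ
// only in how many times an element is repeated may still be reported as equal.
object RddAssertions {
  def roughlyEqual[T](a: RDD[T], b: RDD[T]): Boolean =
    a.count() == b.count() &&     // 1. same number of elements
      b.subtract(a).isEmpty() &&  // 2. nothing in b that is missing from a
      a.subtract(b).isEmpty()     // 3. nothing in a that is missing from b
}
```

Because `reduceByKey` may keep either duplicate, comparing the output of `filterDupes` against a hand-built expected RDD this way is only reliable when the expected result is uniquely determined.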

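There is another problem: the non-determinism of shuffle operations (such as reduceByKey). You have to assume that the result may differ from run to run and design your tests accordingly. In this case, for example, either sale "1" or sale "2" may be the one that survives deduplication.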
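One way to design the test for `filterDupes` accordingly is to assert only properties that hold no matter which duplicate survives. The sketch below works on collected lists; `SalesData` is a hypothetical stand-in for the asker's class (an id plus the two fields `filterDupes` reads), and `DedupInvariants` is an illustrative name.

```scala
// Hypothetical stand-in for the question's SalesData class.
case class SalesData(id: String, saleType: String, saleDate: String)

object DedupInvariants {
  type Row = ((String, String), SalesData)

  // The key filterDupes deduplicates on.
  private def dedupKey(row: Row): (String, String) = (row._2.saleType, row._2.saleDate)

  // Properties that hold whichever duplicate reduceByKey happened to keep:
  // 1. every output row comes from the input,
  // 2. no dedup key appears twice in the output,
  // 3. every dedup key present in the input is still present in the output.
  def holds(input: Seq[Row], output: Seq[Row]): Boolean = {
    val outKeys = output.map(dedupKey)
    output.forall(input.contains) &&
      outKeys.distinct.size == outKeys.size &&
      outKeys.toSet == input.map(dedupKey).toSet
  }
}
```

In the question's test this could be called as `DedupInvariants.holds(rddWithDupes.collect().toList, filteredResult)`, assuming the real `SalesData` has the `saleType` and `saleDate` fields used by `filterDupes` and the test data is small enough to collect.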

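This makes testing quite challenging. In practice, I would rather suggest testing each function used in the transformations (avoiding an untestable mess of anonymous functions) and testing only the guaranteed invariants (size, key set, and so on).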
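A minimal sketch of that advice: the two lambdas in `filterDupes` are pulled out into named functions that can be unit-tested without a `SparkContext`. The `SalesFunctions` object, its method names and the `SalesData` case class are hypothetical; the commented line shows how `filterDupes` might call them.

```scala
// Hypothetical stand-in for the question's SalesData class.
case class SalesData(id: String, saleType: String, saleDate: String)

// Hypothetical refactoring: the lambdas from filterDupes become named functions.
// filterDupes could then be written as:
//   salesWithDupes.map(SalesFunctions.keyBySaleTypeAndDate)
//     .reduceByKey(SalesFunctions.keepLeft _)
//     .map(_._2)
object SalesFunctions {
  type Row = ((String, String), SalesData)

  // First map step: re-key a row by (saleType, saleDate), keeping the row as the value.
  def keyBySaleTypeAndDate(row: Row): ((String, String), Row) =
    ((row._2.saleType, row._2.saleDate), row)

  // reduceByKey step: returns its first argument. Which duplicate ends up as `a`
  // is not guaranteed by reduceByKey, so tests should not rely on it.
  def keepLeft(a: Row, b: Row): Row = a
}
```

Each function is now plain Scala, so its behaviour can be pinned down deterministically, while the Spark-level test only checks the invariants.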

The original question, "Scala unit test: how to validate the returned RDD", is on Stack Overflow: https://stackoverflow.com/questions/51209326/
