python - Sum two columns and compute the maximum, minimum, and average in MapReduce

Tags: python hadoop mapreduce sum

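I have sample code for a mapper, shown below. The key is UCO and the value is TaxiTotal, which should be the sum of the two columns TaxiIn and TaxiOut. How do I sum these two columns?

My current solution, TaxiIn + TaxiOut, concatenates the digits instead of adding them: for example, 333 + 444 gives 333444, but I need 777. How should I write the code?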

#! /usr/bin/env python

import sys

# -- Airline Data
# Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum,
# TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, Origin, Dest, Distance, TaxiIn,
# TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay

for line in sys.stdin:
    line = line.strip()
    unpacked = line.split(",")
    Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, Origin, Dest, Distance, TaxiIn,TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay = line.split(",")
    UCO = "-".join([UniqueCarrier, Origin])
    results = [UCO, TaxiIn+TaxiOut]
    print("\t".join(results))

Best answer

Change TaxiIn + TaxiOut to:

int(TaxiIn) + int(TaxiOut)

See the following example:
In [1612]: TaxiIn = '333'

In [1613]: TaxiOut = '444'

In [1614]: TaxiIn + TaxiOut
Out[1614]: '333444'

In [1615]: int(TaxiIn) + int(TaxiOut)
Out[1615]: 777

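You cannot get a numeric sum from strings: applying + to two str values concatenates them. Convert each str to int (or float) before adding.

Your code should be: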
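One caveat with a bare int() call: it raises ValueError for anything that is not an integer literal, such as an empty field or a placeholder like NA. Whether this dataset contains such values is an assumption here, not something the question states. A minimal sketch of a tolerant conversion, where to_int is a hypothetical helper name:

```python
def to_int(text, default=None):
    """Return int(text), or default when text is not a valid integer."""
    try:
        return int(text.strip())
    except ValueError:
        return default


print(to_int("333") + to_int("444"))  # 777
print(to_int("NA"))                   # None
```

A mapper could skip any row where either conversion returns None instead of failing the whole task.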
results = [UCO, str(int(TaxiIn) + int(TaxiOut))]
print("\t".join(results))
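The title also asks about the maximum, minimum, and average, which the answer above does not cover. With Hadoop Streaming, the reducer receives the mapper output sorted by key, so all lines for one UCO arrive together and can be grouped in a single pass. The following is only a sketch under that assumption; the function names parse and summarize and the sample rows are made up for illustration and do not come from the original post.

```python
from itertools import groupby


def parse(lines):
    """Yield (key, value) pairs from tab-separated 'key<TAB>int' lines."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        try:
            yield key, int(value)
        except ValueError:
            continue  # skip malformed or non-numeric rows


def summarize(lines):
    """Yield (key, max, min, mean) per key; input must be sorted by key."""
    for key, group in groupby(parse(lines), key=lambda pair: pair[0]):
        values = [value for _, value in group]
        yield key, max(values), min(values), sum(values) / float(len(values))


# Made-up sample rows in the mapper's output format, already sorted by key.
sample = ["AA-JFK\t10\n", "AA-JFK\t20\n", "UA-SFO\t7\n"]
for key, hi, lo, mean in summarize(sample):
    print("\t".join([key, str(hi), str(lo), str(mean)]))
```

In an actual streaming reducer script, summarize(sys.stdin) would take the place of the sample list. Collecting each group's values in a list keeps the sketch short; for very large groups, running max, min, sum, and count variables would avoid holding every value in memory.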

This page is based on a similar question found on Stack Overflow about python - summing two columns and computing the maximum, minimum, and average in MapReduce: https://stackoverflow.com/questions/61602272/
