python-3.x - Pandas DataFrame: find the max value and return the adjacent column value, not the entire row

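I'm new to Pandas, so apologies if there is an obvious solution. I imported a CSV with only 2 columns and created a third column. Here is a screenshot of the first 10 rows and the header: [Screen shot of DataFrame]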

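I've figured out how to find the min and max values in the ['Amount Changed'] column, but I also need to pull the date associated with the min and max, without the index and the ['Profit/Losses'] value. I've tried iloc and loc and read about groupby, but I can't get any of them to return a single value (in this case, a date) that I can use again.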

My goal is to create a new variable 'Gi_Date' that holds the value from the ['Date'] column in the same row as the max value in ['Amount Changed'].

I'm trying to keep the variables separate so that I can use them in print statements, write them to a txt file, and so on.

import os
import csv
import pandas as pd
import numpy as np

#path for CSV file
csvpath = ("budget_data.csv")
#Read CSV into pandas and give it a variable name Bank_pd
Bank_pd = pd.read_csv(csvpath, parse_dates=True)

#Number of month records in the CSV
Months = Bank_pd["Date"].count()

#Total amount of money captured in the data converted to currency
Total_Funds = '${:.0f}'.format(Bank_pd["Profit/Losses"].sum())

#Determine the amount of increase or decrease from the previous month
AmtChange = Bank_pd["Profit/Losses"].diff()
Bank_pd["Amount Changed"] = AmtChange

#Identify the greatest positive change
GreatestIncrease = '${:.0f}'.format(Bank_pd["Amount Changed"].max())
Gi_Date = Bank_pd[Bank_pd["Date"] == GreatestIncrease]

#Identify the greatest negative change
GreatestDecrease =  '${:.0f}'.format(Bank_pd["Amount Changed"].min())
Gd_Date = Bank_pd[Bank_pd['Date'] == GreatestDecrease]

print(f"Total Months: {Months}")
print(f"Total: {Total_Funds}")
print(f"Greatest Increase in Profits: {Gi_Date}  ({GreatestIncrease})")
print(f"Greatest Decrease in Profits: {Gd_Date} ({GreatestDecrease})")

When I run the script in git bash I no longer get an error, so I think I'm close, but instead of showing the date it prints:

$ python PyBank.py
Total Months: 86
Total: $38382578
Greatest Increase in Profits: Empty DataFrame
Columns: [Date, Profit/Losses, Amount Changed]
Index: []  ($1926159)
Greatest Decrease in Profits: Empty DataFrame
Columns: [Date, Profit/Losses, Amount Changed]
Index: [] ($-2196167)

I'd like it to print like this:

$ python PyBank.py
Total Months: 86
Total: $38382578
Greatest Increase in Profits: Feb-2012  ($1926159)
Greatest Decrease in Profits: Sept-2013 ($-2196167)

Here is one year's worth of values from the original DataFrame:

bank_pd = pd.DataFrame({'Date':['Jan-10', 'Feb-10', 'Mar-10', 'Apl-10', 'May-10', 'Jun-10', 'Jul-10', 'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10'],
                        'Profit/Losses':[867884, 984655, 322013, -69417, 310503, 522857, 1033096, 604885, -216386, 477532, 893810, -80353]})

Expected output for the sample df:

Total Months: 12
Total Funds: $5651079
Greatest Increase in Profits: Oct-10 ($693918)
Greatest Decrease in Profits: Dec-10 ($-974163)

I also had an error in the sample DataFrame above: I missed a month when I typed it in quickly. It is fixed now.

Thanks!

Best answer

I see a few issues with the variables being used.

Bank_pd["Amount Changed"] = AmtChange

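The statement above adds the "Amount Changed" column to the DataFrame; after it runs, you can use that column for any operation. The filter in the question, however, compares the "Date" column with GreatestIncrease, which by then is a formatted currency string, so no row matches and the result is an empty DataFrame. Keep the maximum as a number, compare it with the "Amount Changed" column, and then read "Date" from the matching row.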
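As a minimal sketch of why the question's filter comes back empty, the snippet below uses a three-row stand-in for the data. The values are taken from the sample in the question; the names `df`, `max_change`, `formatted`, `empty` and `match` are only for illustration.

```python
import pandas as pd

# Small stand-in for the data; values taken from the question's sample
df = pd.DataFrame({
    "Date": ["Jan-10", "Feb-10", "Mar-10"],
    "Profit/Losses": [867884, 984655, 322013],
})
df["Amount Changed"] = df["Profit/Losses"].diff()

max_change = df["Amount Changed"].max()      # numeric maximum of the changes
formatted = "${:.0f}".format(max_change)     # currency string, e.g. "$116771"

# The question's filter: no date string ever equals a currency string
empty = df[df["Date"] == formatted]

# Comparing the numeric column with the numeric maximum finds the row
match = df[df["Amount Changed"] == max_change]

print(len(empty))                  # 0
print(match["Date"].values[0])     # Feb-10
```

Formatting the number only at print time keeps the numeric value available for comparisons like the one above.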

Below is the updated code. You can add further formatting as needed:

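import pandas as pd

# Path to the CSV file
csvpath = "budget_data.csv"

# Read the CSV into a DataFrame
Bank_pd = pd.read_csv(csvpath, parse_dates=True)

# Number of month records and total funds formatted as currency
Months = Bank_pd["Date"].count()
Total_Funds = '${:.0f}'.format(Bank_pd["Profit/Losses"].sum())

# Month-over-month change, stored as a new column
AmtChange = Bank_pd["Profit/Losses"].diff()
Bank_pd["Amount Changed"] = AmtChange

# Keep the maximum numeric so it can be compared with the column
GreatestIncrease = Bank_pd["Amount Changed"].max()

# Select the row whose change equals the maximum
Gi_Date = Bank_pd.loc[Bank_pd["Amount Changed"] == GreatestIncrease]

print(Months)
print(Total_Funds)
print(Gi_Date['Date'].values[0])
print(GreatestIncrease)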
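An alternative sketch, not part of the answer above: `Series.idxmax()` and `Series.idxmin()` return the index label of the largest and smallest value, and `.loc[label, "Date"]` then yields a single scalar instead of a DataFrame. The example uses the one-year sample from the question (with its 'Apl-10' label kept as posted); the names `gi_idx` and `gd_idx` are illustrative.

```python
import pandas as pd

# One-year sample from the question
bank_pd = pd.DataFrame({
    "Date": ["Jan-10", "Feb-10", "Mar-10", "Apl-10", "May-10", "Jun-10",
             "Jul-10", "Aug-10", "Sep-10", "Oct-10", "Nov-10", "Dec-10"],
    "Profit/Losses": [867884, 984655, 322013, -69417, 310503, 522857,
                      1033096, 604885, -216386, 477532, 893810, -80353],
})
bank_pd["Amount Changed"] = bank_pd["Profit/Losses"].diff()

# Index labels of the largest and smallest change (the NaN in row 0 is skipped)
gi_idx = bank_pd["Amount Changed"].idxmax()
gd_idx = bank_pd["Amount Changed"].idxmin()

# .loc[row_label, column] returns one scalar value
Gi_Date = bank_pd.loc[gi_idx, "Date"]
Gd_Date = bank_pd.loc[gd_idx, "Date"]
GreatestIncrease = "${:.0f}".format(bank_pd.loc[gi_idx, "Amount Changed"])
GreatestDecrease = "${:.0f}".format(bank_pd.loc[gd_idx, "Amount Changed"])

print(f"Greatest Increase in Profits: {Gi_Date} ({GreatestIncrease})")
print(f"Greatest Decrease in Profits: {Gd_Date} ({GreatestDecrease})")
```

Because Gi_Date and Gd_Date are plain strings here, they can be reused in print statements or written to a text file. If several rows tie for the maximum or minimum, idxmax and idxmin return only the first one.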

Regarding "python-3.x - Pandas DataFrame: find the max value and return the adjacent column value, not the entire row", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52196651/
