python - 如何对数据帧的行进行求和,以便它仅根据月、日或年对值进行求和。然后将所有结果形成报告

标签 python pandas dataframe report filtering

我正在分析来自雅虎财经的股票数据,我目前的 DataFrame = Df 过滤后仅显示 1991 年以来 3 月份的天数。我希望能够找出每年 3 月份的月返回率。或者任何其他月份组合,即自 1991 年以来 1 月到 3 月的返回是多少。我还希望按年份分割。

我还希望能够完成这一天的操作,即自 1991 年以来所有星期五苹果股票的变化是多少。这将是另一个示例问题

我正在尝试将其打印到可以打印实际纸质副本的位置,并按年份将其全部分割;就像一份报告。

我尝试阅读 pandas.pydata.org/上的多重索引教程和分组,但它非常令人困惑,我不确定这是否 是我需要的。

这是我当前的代码

from pandas_datareader import data as dreader
import pandas as pd
from datetime import datetime
import dateutil.parser
from tkinter import *


# Sets the max rows that can be displayed
# when the program is executed
pd.options.display.max_rows = 120



# df is the name of the dataframe, it is 
# reading the csv file containing data loaded
# from yahoo finance(Date,Open,High,Low,Close
# volume,adj close,)the name of the ticker
# is placed before _data.csv i.e. the ticker aapl
# would have a csv file named aapl_data.csv.
df = pd.read_csv("cde_data.csv")


# resets the index back to the pandas default
# i.e. index starts at 0 for the first row and
# 1 for the second and continues by one till the
# end of the data in the above csv file. 
df.reset_index()



# the following code will allow for filtering of the datafram
# based on the year, day of week (dow), and month. It then gets
# applied to the dataframe and then can be used to sort data i.e
# print(df[(df.year == 2015) & (df.month == 5) & (df.dow == 4)])
# which will give you all the days in the month of May(df.month == 5), 
# that fall on a Thursday(df.dow == 4), in the year 2015 
# (df.year == 2015)
#
#      Month          Dow                      Year
# January    = 1  Monday    = 1  The year will be dispaly in a four
# February   = 2  Tuesday   = 2  digit format i.e. 2015
# March      = 3  Wednesday = 3
# April      = 4  Thursday  = 4
# May        = 5  Friday    = 5
# June       = 6
# July       = 7
# August     = 8
# September  = 9
# October    = 10
# November   = 11
# December   = 12
def year(x):
    return(x.year)
def dow(x):
    return(x.isoweekday())
def month(x):
    return(x.month)
df.Date            = df.Date.apply(dateutil.parser.parse)
df['year']         = df.Date.apply(year)
df['dow']          = df.Date.apply(dow)
df['month']        = df.Date.apply(month)


# The code below has a total of five sections all labeled by number.
# They are #1, #2, #3, #4, #5. Number one adds new columns to the df
# and populates them with data, number two filters out all the days
# that the market went down or flat for the day, number three filters
# out all of the days that the market went up or flat, number four 
# filters all of the days that  the market went up or down, and
# number five drops the excess columns and concats steps #2, #3, & #4. 


# 1
# there are five columns that are being added, up_down, up, down, 
# flat, and %chg. up, down, and flat are temporary and will be 
# deleted later on the other two up_down, and %chg will be permeant.
# The up_down column is derived from taking the 'close' column minus the
# 'open'column, this tells you how much the stock has moved for the day.
# The 'up' column is temporary and has a value of 'up' for all the rows
# of the DataFrame df. The 'down' column is temporary and has a value of  
# 'down' for all the rows of the DataFrame df. The 'down' column is   
# temporary and has a value of 'flat' for all the rows of the DataFrame 
# df. The '%chg' column is calculated by taking the results of the 
# 'up_down' divided by the 'close' column, and then times 100, which
# turns it into a percentage show what percent the stock moved up or 
# down for the day. All of the columns added below are added to the 
# DataFrame called df, which contains a a csv file(see code lines 14-20
# for information on the csv file contained in the DataFrame df). 

df['up']           = 'up'   
df['down']         = 'down'
df['flat']         = 'flat'
df['up_down']      = df['Close'] - df['Open'] 
df['%chg']         = ((df['up_down']/df['Close'])*100)      


# 2
# df column[up_down] is first filtered on the greater than zero
# criteria from the year 1984 on up and then is turned into df2.
# If the up_down column is greater than zero than this means that 
# the stock went up. Next df3 is set = to df2['up'], df3 now holds 
# just the days where the asset went up  
df2= (df[(df.year > 1984) & (df.up_down > 0)])
df3 = df2['up']



# 3
# df column[up_down] is first filtered on the less than zero
# criteria from the year 1984 on up and then is turned into df4.
# If the up_down column is less than zero than this means that 
# the stock went Down. Next df5 is set = to df4['down'], df5 now holds 
# just the days where the asset went down 
df4= (df[(df.year > 1984) & (df.up_down < 0)])
df5 = df4['down']



# 4
# df column[up_down] is first filtered on the equal to zero
# criteria from the year 1984 on up and then is turned into df6.
# If the up_down column is equal to zero than this means that 
# the stock did not move. Next df7 is set = to df6['flat'],df5 
# now holds just the days where the asset did not move at all 
df6= (df[(df.year > 1984) & (df.up_down == 0)])
df7 = df6['flat']



# 5
# The code below starts by droping the columns 'up', 'down', and 'flat'.
# These were temporary and were used to help filter data in the above
# code in sections two, three, and four. Finally we concat the 
# DataFrames df, df3, df5, and df7. We now have new 'up', 'down' and
# 'flat' columns that only display up, down, or flat when the criteria 
# is true.
df = df.drop(['up'], axis = 1)
df = df.drop(['down'], axis = 1)
df = df.drop(['flat'], axis = 1)
df = pd.concat([df,df3,df5,df7],axis =1, join_axes=[df.index])


# The difference between the close of current day and the previous day
# non percentage
df['Up_Down']          = df.Close.diff()


# The percentage of change on the Up_Down column
df['%Chg']             = ((df['up_down']/df['Close'])*100)

# How much the current opening price has moved up from the previous
# opening price in terms of percentage,
df['Open%pd']          = df.Open.pct_change()*100


# How much the current high price has moved up from the previous high
# price in terms of percentage.
df['High%pd']          = df.High.pct_change()*100


# How much the current low price has moved up from the previous low
# price in terms of percentage
df['Low%pd']           = df.Low.pct_change()*100


# How much the current close price has moved up from the previous close
# price in terms of percentage
df['Close%pd']         = df.Close.pct_change()*100


# How much the current volume price has moved up from the previous days
# volume in terms of percetage
df['Volume%pd']        = df.Volume.pct_change()*100


# Both columns take the percentage of change from open to high and open
# to low
df['High%fo']          = ((df.High - df.Open)/(df.Open))*100
df['Low%fo']           = ((df.Open - df.Low) / (df.Open))*100


# Takes the difference from the high price and the low price non 
# percentage
df['HighLowRange']     = df.High - df.Low


# Measures how much the range the high minus low has changed verses the
# previous day
df['HighLowRange%pd']  = df.HighLowRange.pct_change()*100


# df now is equal to only the months of March and only has the date and
# Close%pd column
df=df[['Date','Close%pd']][(df.month == 3)]


print(df)

这就是 df 打印的内容(自 1991 年以来三月的所有日子)例如

      Date   Close%pd
223  1991-03-01   2.097902
224  1991-03-04   1.369863
225  1991-03-05  -2.702703
226  1991-03-06   1.388889
227  1991-03-07   0.000000
228  1991-03-08   6.164384
229  1991-03-11  -4.516129
230  1991-03-12   0.675676
231  1991-03-13   2.684564
232  1991-03-14  -2.614379
233  1991-03-15  -1.342282
234  1991-03-18  -7.482993
235  1991-03-19   0.000000
236  1991-03-20   0.735294
237  1991-03-21   0.000000
238  1991-03-22   0.000000
239  1991-03-25  -0.729927
240  1991-03-26   0.000000
241  1991-03-27   0.735294
242  1991-03-28   0.000000
476  1992-03-02   0.000000
477  1992-03-03   0.000000
478  1992-03-04   0.000000
479  1992-03-05   0.000000
480  1992-03-06   0.000000
481  1992-03-09  -3.174603
482  1992-03-10   2.459016
483  1992-03-11   0.000000
484  1992-03-12  -1.600000
485  1992-03-13   0.813008
486  1992-03-16  -1.612903
487  1992-03-17  -1.639344
488  1992-03-18  -2.500000
489  1992-03-19   1.709402
490  1992-03-20  -1.680672
491  1992-03-23  -1.709402
492  1992-03-24  -1.739130
493  1992-03-25   2.654867
494  1992-03-26  -0.862069
495  1992-03-27   4.347826
496  1992-03-30  -1.666667
497  1992-03-31  -1.694915
728  1993-03-01   2.000000
729  1993-03-02   0.980392
730  1993-03-03   0.000000
731  1993-03-04   1.941748
732  1993-03-05   1.904762
733  1993-03-08   1.869159
734  1993-03-09   0.000000
735  1993-03-10   0.000000
736  1993-03-11  -1.834862
737  1993-03-12   3.738318
738  1993-03-15   4.504505
739  1993-03-16  -1.724138
740  1993-03-17   0.000000
741  1993-03-18   2.631579
742  1993-03-19   0.000000
743  1993-03-22   5.128205
744  1993-03-23   0.000000
745  1993-03-24   2.439024
...         ...        ...
6023 2014-03-10  -3.372835
6024 2014-03-11  -0.943396
6025 2014-03-12   2.761905
6026 2014-03-13  -1.019462
6027 2014-03-14   1.029963
6028 2014-03-17  -1.853568
6029 2014-03-18   4.815864
6030 2014-03-19  -2.792793
6031 2014-03-20   1.297498
6032 2014-03-21  -0.548948
6033 2014-03-24  -7.083717
6034 2014-03-25   0.198020
6035 2014-03-26  -8.003953
6036 2014-03-27   0.214823
6037 2014-03-28   3.215434
6038 2014-03-31  -3.530633
6269 2015-03-02   2.226027
6270 2015-03-03   0.670017
6271 2015-03-04  -4.991681
6272 2015-03-05  -2.101576
6273 2015-03-06 -10.017889
6274 2015-03-09  -5.367793
6275 2015-03-10  -6.722689
6276 2015-03-11   4.504505
6277 2015-03-12   1.293103
6278 2015-03-13   1.063830
6279 2015-03-16   6.315789
6280 2015-03-17  -5.544554
6281 2015-03-18   8.805031
6282 2015-03-19  -3.853565
6283 2015-03-20   7.014028
6284 2015-03-23   0.561798
6285 2015-03-24   1.862197
6286 2015-03-25  -3.473492
6287 2015-03-26  -1.325758
6288 2015-03-27  -3.262956
6289 2015-03-30  -3.968254
6290 2015-03-31  -2.685950
6521 2016-03-01  -1.295337
6522 2016-03-02   4.986877
6523 2016-03-03  12.250000
6524 2016-03-04   0.668151
6525 2016-03-07  11.061947
6526 2016-03-08  -7.370518
6527 2016-03-09   1.720430
6528 2016-03-10   3.382664
6529 2016-03-11   2.862986
6530 2016-03-14  -2.783300
6531 2016-03-15  -1.226994
6532 2016-03-16   9.730849
6533 2016-03-17   3.207547
6534 2016-03-18   2.193784
6535 2016-03-21   3.041145
6536 2016-03-22   0.694444
6537 2016-03-23  -9.482759
6538 2016-03-24   0.571429
6539 2016-03-28   3.030303
6540 2016-03-29   4.963235
6541 2016-03-30  -1.050788
6542 2016-03-31  -0.530973

[568 rows x 2 columns]
Press any key to continue . . .

最佳答案

如评论中所述,您可以使用 groupby 查找每月总计:

#change the previous last line of code to this
df=df[['Date','year','month','Close%pd']][(df.month == 3)]

#make a new dataframe
new_df = df.groupby(['year','month']).sum()

另一种方法是使用resample命令( docs )。这可能是获取每周总计的最佳方法,特别是因为您没有指示“一年中的哪一周”的变量,而这正是您要传递给 groupby 的变量。

df = df.resample('W', how='sum') #weekly totals
df = df.resample('M', how='sum') #monthly totals

关于python - 如何对数据帧的行进行求和,以便它仅根据月、日或年对值进行求和。然后将所有结果形成报告,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39501354/

相关文章:

python - 从空格分隔的字符串创建 Pandas DataFrame

python - 用于多项式回归的 GridsearchCV

Python:如何找到用户三胞胎访问过的项目的三胞胎

python - 相当于 numpy 数组的 pandas read_sql_query?

python - 从 CSV 文件读取时,pandas 如何处理时区?

python - Pandas 版本 0.22.0 - drop_duplicates() 得到意外的关键字参数 'keep'

r - 如何删除某些句子后面的文字?

python - NumPy 的 : assembling multiple slices into new array

Python Astroid (Pylint) 当类通过工厂时如何模拟继承?

python - 使用PIL的渐变只是一种纯色