I want to generate a linear best-fit trend line for the daily mean CPU usage.
My data looks like this:
host_df_means['cpu_usage_percent']
history_datetime
2020-03-03 9.727273
2020-03-04 9.800000
2020-03-05 9.727273
2020-03-06 10.818182
2020-03-07 9.500000
2020-03-08 10.909091
2020-03-09 15.000000
2020-03-10 14.333333
2020-03-11 15.333333
2020-03-12 16.000000
2020-03-13 21.000000
2020-03-14 28.833333
Name: cpu_usage_percent, dtype: float64
I then plot it as follows:
plot = host_df_means['cpu_usage_percent'].plot()
plot.set_xlim([datetime.date(2020, 3, 3), datetime.date(2020, 3, 31)])
plot;
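This produces a line plot of the daily means, with the x-axis running through 2020-03-31.
Now I want to add a trend line that extends into the future.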
Best answer
Keeping your data as a pd.DataFrame, the trick is to convert the dates to a numeric type that can be used to perform the linear regression.
import datetime
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats
from io import StringIO
# Set up data as in question
host_df_means = pd.read_csv(StringIO("""
2020-03-03 9.727273
2020-03-04 9.800000
2020-03-05 9.727273
2020-03-06 10.818182
2020-03-07 9.500000
2020-03-08 10.909091
2020-03-09 15.000000
2020-03-10 14.333333
2020-03-11 15.333333
2020-03-12 16.000000
2020-03-13 21.000000
2020-03-14 28.833333
"""),
                            sep=r'\s+', header=None, parse_dates=[0], index_col=0)
host_df_means.columns = ['cpu_usage_percent']
host_df_means.index.name = 'history_datetime'
fig, ax = plt.subplots(1, 1)
ax.plot(host_df_means.index, host_df_means)
ax.set_xlim([datetime.date(2020, 3, 3), datetime.date(2020, 3, 31)])
# To perform the linear regression we need the dates to be numeric.
# Keep the datetime index for plotting and use ordinals only for the fit
# (since matplotlib 3.3, raw ordinals no longer line up with a date axis).
x_ordinal = host_df_means.index.map(datetime.date.toordinal)
# Perform linear regression
slope, y0, r, p, stderr = stats.linregress(x_ordinal,
                                           host_df_means['cpu_usage_percent'])
# x co-ordinates for the start and end of the line, kept as dates
x_endpoints = host_df_means.index[[0, -1]]
# Compute predicted values from linear regression
y_endpoints = y0 + slope * x_endpoints.map(datetime.date.toordinal)
# Overlay the line
ax.plot(x_endpoints, y_endpoints, c='r')
ax.set_xlabel('history_datetime')
plt.show()
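The question also asked for the trend line to continue past the last observation. Below is a minimal sketch of one way to do that. It assumes the same ordinal-based fit as above, and that a straight-line extrapolation to 2020-03-31 (the x-axis limit from the question) is acceptable. The helper name `trend_line` and the output file name are hypothetical, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats


def trend_line(series, end_date):
    """Fit a straight line to a date-indexed series and evaluate it at the
    first observed date and at end_date (hypothetical helper)."""
    # Regress on ordinal day numbers, because linregress needs numeric x
    x = series.index.map(datetime.date.toordinal)
    fit = stats.linregress(x, series.to_numpy())
    # Endpoints stay as timestamps so matplotlib places them on the date axis
    dates = pd.DatetimeIndex([series.index[0], pd.Timestamp(end_date)])
    ordinals = dates.map(datetime.date.toordinal).to_numpy()
    values = fit.intercept + fit.slope * ordinals
    return dates, values, fit


# Daily means from the question
cpu = pd.Series(
    [9.727273, 9.800000, 9.727273, 10.818182, 9.500000, 10.909091,
     15.000000, 14.333333, 15.333333, 16.000000, 21.000000, 28.833333],
    index=pd.date_range("2020-03-03", periods=12, freq="D"),
    name="cpu_usage_percent",
)

dates, values, fit = trend_line(cpu, datetime.date(2020, 3, 31))

fig, ax = plt.subplots()
ax.plot(cpu.index, cpu, label="daily mean")
ax.plot(dates, values, c="r", ls="--", label="linear trend")
ax.set_xlim([datetime.date(2020, 3, 3), datetime.date(2020, 3, 31)])
ax.set_xlabel("history_datetime")
ax.legend()
fig.savefig("cpu_trend.png")  # hypothetical output file name
```

Treat the extrapolated part of the line as a rough guide only: with twelve data points and a sharp rise in the last two days, a straight line may understate recent growth.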
Source: a similar question on Stack Overflow, "python - How to plot a linear trend line of datetime vs. value with matplotlib and pandas?": https://stackoverflow.com/questions/60704798/