Python: How to create a scatter plot with a fixed range across multiple intervals?

Tags: python pandas matplotlib dataframe scatter-plot

I have the following pandas DataFrame:

import pandas as pd
df = pd.read_table(...)
df

>>> df
   interval  location type  y_axis
0        01      1230    X      50
1        01      1609    X      55
2        01      1903    Y      54
3        01      2574    A      58
4        01      3151    A      57
5        01      3198    B      46
6        01      3312    X      50
...                 .....
         02      42      X      31
         02      214     A      23
         02      598     X      28
....

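There are several intervals, e.g. 01, 02, etc. Within each interval, the data points lie in the range 1 to 15,000. In df, the first data point of interval 01 is at 1230, the next at 1609, and so on.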

Interval 02 likewise ranges from 1 to 15,000.

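I would like to create a scatter plot in which the range 1 to 15,000 is drawn to scale for every interval. The first point would then be plotted at 1230, the next at 1609, and so on. I would also like a vertical line showing where each interval is. Every interval is a "region" with its own x-axis running from 1 to 15,000, so the coordinates along the x-axis are interval 1: 1 to 15000, interval 2: 1 to 15000, interval 3: 1 to 15000, etc. (It is almost like several separate scatter plots joined together.)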

How can this be done? Without the complication of intervals, a scatter plot of this DataFrame could be created with:

df.plot(kind='scatter', x = "location", y = "y_axis")

Here are the first 50 rows:

d = {
    "interval": ["01"] * 50,
    "location": [
        1230, 1609, 1903, 2574, 3151, 3198, 3312, 3659, 3709, 3725,
        4172, 4542, 4860, 4900, 5068, 5220, 5260, 5339, 5442, 5529,
        5773, 6128, 6165, 6177, 6269, 6275, 6460, 7167, 7361, 7361,
        8051, 8222, 8305, 8992, 9104, 9439, 9844, 10045, 10764, 10787,
        11104, 11478, 11508, 11684, 12490, 12590, 12794, 12803, 13823, 13982,
    ],
    "type": [
        "X", "X", "Y", "A", "A", "B", "X", "X", "X", "B",
        "B", "A", "A", "A", "B", "B", "X", "B", "Y", "X",
        "X", "Y", "Y", "C", "A", "X", "X", "Z", "Z", "B",
        "X", "X", "A", "A", "Y", "X", "A", "X", "X", "Z",
        "Z", "C", "X", "Y", "Y", "Z", "Z", "Z", "Z", "Z",
    ],
    "y_axis": [
        50, 55, 54, 58, 57, 46, 50, 55, 46, 42,
        56, 55, 55, 45, 52, 51, 45, 48, 50, 49,
        53, 55, 45, 40, 49, 37, 52, 58, 52, 4,
        58, 52, 49, 58, 50, 55, 56, 53, 58, 43,
        55, 55, 44, 52, 59, 49, 53, 39, 60, 52,
    ],
}

Best answer

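The main challenge here seems to be that you want the x-axis to be both categorical (intervals 01, 02, etc.) and metric (values 1-15000). As you even note in your post, you are really talking about drawing multiple scatter plots with a shared y-axis. I suggest doing this with subplots and groupby. You can adjust the space between the plots with subplots_adjust().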

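First, generate some sample data using d from the OP. We also randomly pick half of the observations and change them to interval=02, to demonstrate the desired paneling: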

import pandas as pd
import numpy as np

df = pd.DataFrame(d)

# shuffle rows 
# (taken from this answer: http://stackoverflow.com/a/15772330/2799941)
df = df.reindex(np.random.permutation(df.index))

# randomly select half of the rows for changing to interval 02
interval02 = df.sample(int(df.shape[0]/2.)).index
df.loc[interval02, 'interval'] = "02"
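
Because the selection above is random, the panels will differ from run to run. If a repeatable split is wanted, one possible variant is to pass a seed to DataFrame.sample. This is only a sketch: the small DataFrame below is a hypothetical stand-in for the OP's data, and the seed value 0 is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in with the same columns as the OP's data.
df = pd.DataFrame({
    "interval": ["01"] * 10,
    "location": [1230, 1609, 1903, 2574, 3151, 3198, 3312, 3659, 3709, 3725],
    "type": list("XXYAABXXXB"),
    "y_axis": [50, 55, 54, 58, 57, 46, 50, 55, 46, 42],
})

# random_state fixes the random draw, so the same rows are chosen every run.
interval02 = df.sample(n=len(df) // 2, random_state=0).index

# Relabel the chosen half as interval 02; the rest stay in interval 01.
df.loc[interval02, "interval"] = "02"

print(df["interval"].value_counts().sort_index())
```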

Now use pyplot to specify side-by-side subplots and remove any padding between the plots.

from matplotlib import pyplot as plt

# n_plots = number of different interval values
n_plots = len(df.interval.unique())

fig, axes = plt.subplots(1, n_plots, figsize=(10,5), sharey=True)

# remove space between plots
fig.subplots_adjust(hspace=0, wspace=0)
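
One caveat with the call above: when the data contains only a single interval, plt.subplots returns a single Axes object rather than an array, so indexing it as axes[i] would fail. A minimal sketch of one way around this, assuming you want the same indexing to work for any number of intervals, is to pass squeeze=False and flatten the result. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
from matplotlib import pyplot as plt

n_plots = 1  # e.g. the data happens to contain just one interval

# squeeze=False makes subplots return a 2D array of Axes, shape (nrows, ncols),
# even when there is only one subplot.
fig, axes = plt.subplots(1, n_plots, figsize=(10, 5), sharey=True, squeeze=False)

# Flatten to 1D so that axes[i] works regardless of n_plots.
axes = axes.ravel()

# remove space between plots
fig.subplots_adjust(hspace=0, wspace=0)

print(len(axes))
```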

Finally, group by interval and plot:

for i, (name, group) in enumerate(df.groupby('interval')):
    group.plot(kind="scatter", x='location', y='y_axis', 
               ax=axes[i], title="Interval {}".format(name))

[Figure: side-by-side plot]
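
Note that the loop above leaves each panel free to pick its own x-limits, so the panels will not necessarily all span 1 to 15,000. A minimal sketch of one way to pin every panel to the same range, assuming the 1 to 15,000 range stated in the question. X_MIN, X_MAX and the small stand-in DataFrame are illustrative and not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
from matplotlib import pyplot as plt
import pandas as pd

X_MIN, X_MAX = 1, 15000  # assumed fixed range per interval, from the question

# A few rows in the shape of the question's data, as a stand-in.
df = pd.DataFrame({
    "interval": ["01", "01", "01", "02", "02", "02"],
    "location": [1230, 1609, 13982, 42, 214, 598],
    "y_axis": [50, 55, 52, 31, 23, 28],
})

groups = list(df.groupby("interval"))

# One panel per interval, shared y-axis, no horizontal gap between panels.
fig, axes = plt.subplots(1, len(groups), figsize=(10, 5),
                         sharey=True, squeeze=False)
axes = axes.ravel()
fig.subplots_adjust(wspace=0)

for ax, (name, group) in zip(axes, groups):
    group.plot(kind="scatter", x="location", y="y_axis",
               ax=ax, title="Interval {}".format(name))
    # Set the limits after plotting so every panel shows the same fixed range.
    ax.set_xlim(X_MIN, X_MAX)
```

With wspace=0 the touching panel borders double as the vertical separators between intervals. Tick labels next to a shared border may overlap; reducing the number of ticks, for example with ax.locator_params(axis="x", nbins=4), is one option.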

Source: Stack Overflow, https://stackoverflow.com/questions/43728708/