python - Merging nearby scatter points into one and increasing its size

Tags: python pandas pandas-groupby scatter-plot

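I have three variables: RZ, PRP and TC. I plotted RZ against PRP as a scatter plot, using TC for the colour. RZ ranges from 0 to 800, PRP from 0 to 4000, and TC from 0 to 100. The code and the resulting figure are below: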

import matplotlib.pyplot as plt

# RZS_P is a DataFrame with columns PRP, RZ and TC
fig = plt.figure(figsize=(12, 10))
points = plt.scatter(RZS_P.PRP, RZS_P.RZ, c=RZS_P.TC, cmap="Spectral",
                     lw=1, s=60, vmax=100, vmin=0, alpha=0.7, edgecolors='b')
plt.colorbar(points)

[Figure 1: scatter plot of RZ against PRP, coloured by TC; image not included]

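What I want to do is combine nearby, similar points, for example those within PRP (± 250), RZ (± 50) and TC (± 5) of each other [or something along those lines], into a single point and increase its size. This would give a clearer visualisation than the scatter plot I currently have. In other words, I want to merge scatter points with nearly similar values (or values within the same range or bin) by taking their mean and plotting that instead.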
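
The binning idea described above can be sketched with pandas `cut` and `groupby`. This is only a minimal illustration under stated assumptions: `df` is a synthetic stand-in for `RZS_P`, and the bin edges (widths of 500, 100 and 10) are chosen to match the ± 250, ± 50 and ± 5 tolerances; none of these names or values come from the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for RZS_P (assumption: columns PRP, RZ, TC as in the question).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "PRP": rng.uniform(0, 4000, 500),
    "RZ": rng.uniform(0, 800, 500),
    "TC": rng.uniform(0, 100, 500),
})

# Bin edges: width 500 for PRP, 100 for RZ, 10 for TC (centre ± 250, ± 50, ± 5).
bins = {
    "PRP": np.arange(0, 4001, 500),
    "RZ": np.arange(0, 801, 100),
    "TC": np.arange(0, 101, 10),
}

# Label every row with the interval it falls into on each axis.
keys = [pd.cut(df[col], edges, include_lowest=True) for col, edges in bins.items()]

# One row per occupied 3-D bin: mean position, mean colour, and number of points merged.
merged = (
    df.groupby(keys, observed=True)
      .agg(PRP=("PRP", "mean"), RZ=("RZ", "mean"), TC=("TC", "mean"), count=("PRP", "count"))
      .reset_index(drop=True)
)
```

Each row of `merged` could then be drawn as one marker, for example with `s=merged["count"] * 40`, so that bins holding more points appear larger.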

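Below is some code I came up with (although it only increases a point's size where points overlap exactly, and does not take neighbours into account):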

# First define a dict mapping each PRP bin's upper edge to the RZ values in that bin
data_dict = {250: np.array(RZS_P['RZ'][RZS_P.PRP < 250]),
             500:np.array(RZS_P['RZ'][(RZS_P.PRP > 250) & (RZS_P.PRP < 500)]),
             ....................
             4000:np.array(RZS_P['RZ'][(RZS_P.PRP > 3750) & (RZS_P.PRP< 4000)])}
size_constant = 20

for xe, ye in data_dict.items():
    xAxis = [xe] * len(ye)

    # cube the count to amplify the effect; with ye.tolist().count(num) * size_constant alone the effect is barely noticeable
    sizes = [ye.tolist().count(num)**3 * size_constant for num in ye]
    plt.scatter(xAxis, ye, s=sizes)
plt.show()
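
Calling `list.count` once per point rescans the whole list every time. As a hedged alternative, the same per-point counts can be obtained in one pass with `np.unique`. The array `ye` below is made-up illustrative data for a single PRP bin, not values from the real dataset.

```python
import numpy as np

size_constant = 20  # same scaling constant as in the snippet above

# Hypothetical RZ values falling in one PRP bin (illustrative data only).
ye = np.array([10.0, 10.0, 25.0, 10.0, 40.0, 25.0])

# Count how often each distinct value occurs, then map the counts back to every point.
values, inverse, counts = np.unique(ye, return_inverse=True, return_counts=True)
per_point_counts = counts[inverse]

# Cube the count to exaggerate the size difference, as the loop above does.
sizes = per_point_counts ** 3 * size_constant
print(sizes.tolist())  # [540, 540, 160, 540, 20, 160]
```

Like the original loop, this only reacts to exactly equal values, so it shares the limitation of ignoring near neighbours.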

My ideal figure would look like this: [image not included]. Could someone help me with this?

Additional information: further detail on coding this dynamically

### Divide the dataset into categories first and then plot
P_range = np.arange(0,4000,500); RZ_range = np.arange(0,1000,100); TC_range = np.arange(0,100,10)

i = 0; j = 0; k = 0; 
RZS_P[(RZS_P.PRP >= P_range[i]-250) & (RZS_P.PRP < P_range[i]+250) & (RZS_P.RZ >= RZ_range[j]-50) & 
      (RZS_P.RZ < RZ_range[j]+50) & (RZS_P.TC >= TC_range[k]-5) & (RZS_P.TC < TC_range[k]+5)].describe()
[Output]:
        RZ          PRP         TC  
count   1.000000    1.000000    1.000000    
mean    43.614338   220.068451  2.179487    
std      NaN        NaN         NaN         
### For above, I want my scatter point to remain same

i = 0; j = 1; k = 0; 
[Output]:
        RZ          PRP         TC  
count   28.000000   28.000000   28.000000   
mean    104.511887  124.827377  1.982593    
std      29.474167  62.730640   0.977752    
## For this subset I want my scatter point to have a size of 29 and 62 (as std) on x and 
## y-axis, respectively (so basically an oval) with centre at 104 and 124 (as mean) on x and y respectively. 
## Since the count is 28, I want my scatter point to be relatively bigger than 
## previous (based on this count throughout the analysis). The values of mean TC 
## would be used as the colour axis (same as Fig. 1).

The closest I have got to my goal:

P_range = np.arange(0,4000,200); RZ_range = np.arange(0,1000,50); TC_range = np.arange(0,110,10)

x = []; y = []; z = []; height = []; width = []; size = [] 
for i in range(P_range.shape[0]):
    for j in range(RZ_range.shape[0]):
        for k in range(TC_range.shape[0]):
            stats = RZS_P[(RZS_P.PRP>= P_range[i]-100) & (RZS_P.PRP< P_range[i]+100) & (RZS_P.RZ>= RZ_range[j]-25) & 
                          (RZS_P.RZ< RZ_range[j]+25) & (RZS_P.TC>= TC_range[k]-5) & (RZS_P.TC< TC_range[k]+5)].describe()
            x.append(stats.to_numpy()[1,1]) 
            y.append(stats.to_numpy()[1,0])
            z.append(stats.to_numpy()[1,2])
            width.append(stats.to_numpy()[2,1])
            height.append(stats.to_numpy()[2,0])
            size.append(stats.to_numpy()[0,0])

final_scatters = pd.DataFrame({'PRP': x, 'RZ': y, 'TC': z, 'height': height, 'width': width, 'size': size})
#final_scatters looks like this
    PRP         RZ          TC           height      width      size
22  84.423500   91.315781   2.492503    17.500629   18.499458   2.0
33  61.671188   137.650848  1.305071    18.169079   20.138525   6.0
143 53.673630   634.536926  3.443243    1.000000    1.000000    1.0
231 202.459641  62.480145   2.156926    8.962382    46.061661   21.0
242 217.588333  98.111694   2.011893    15.964933   59.468643   20.0
....................................................................
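
Because the triple loop visits every bin, empty bins produce rows with a count of 0 and NaN means, and bins holding a single point produce a NaN standard deviation (as in the first `describe()` output earlier). A possible clean-up step is sketched below. The rows are made up for illustration, and `min_extent` is a hypothetical fallback value; the 1.000000 entries in the table above suggest a similar fill was applied, but the question does not show that step.

```python
import numpy as np
import pandas as pd

# Illustrative rows in the final_scatters layout (values are made up, not the question's data).
final_scatters = pd.DataFrame({
    "PRP":    [84.4, np.nan, 53.7],
    "RZ":     [91.3, np.nan, 634.5],
    "TC":     [2.5, np.nan, 3.4],
    "height": [17.5, np.nan, np.nan],
    "width":  [18.5, np.nan, np.nan],
    "size":   [2.0, 0.0, 1.0],
})

# Empty bins have count 0 and NaN means: they carry no information, so drop them.
cleaned = final_scatters[final_scatters["size"] > 0].copy()

# A bin with a single point has NaN std; give it a small fixed extent so it stays visible.
min_extent = 1.0  # assumption: any small positive value in data units would do
cleaned[["height", "width"]] = cleaned[["height", "width"]].fillna(min_extent)
```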

fig = plt.figure(figsize=(12, 10))

points = plt.scatter(final_scatters.PRP, final_scatters.RZ, c=final_scatters.TC, cmap="Spectral",
                     s = final_scatters['size']*40, vmax = 100, vmin =0, alpha = 0.9, edgecolors= 'black')
plt.colorbar(points)

[Figure: scatter plot of the binned means in final_scatters, with marker size scaled by the point count; image not included]

I am now trying the following for the ellipses, but I get an empty frame:

ells = [Ellipse(xy = np.array([np.array(final_scatters)[i,0], np.array(final_scatters)[i,1]]), width=np.array(final_scatters)[i,4], 
                height=np.array(final_scatters)[i,3]) for i in range(len(final_scatters))]
fig = plt.figure(0)
ax = fig.add_subplot(111)
for e in ells:
    ax.add_artist(e)
    e.set_clip_box(ax.bbox)
    e.set_alpha(rnd.rand())
    e.set_facecolor(rnd.rand(3))

Best answer

If your final_scatters DataFrame is well structured, with one row per intended ellipse:

final_scatters = pd.DataFrame({'PRP': x, 'RZ': y, 'TC': z, 'height': height, 'width': width, 'size': size})
#final_scatters looks like this
    PRP         RZ          TC           height      width      size
22  84.423500   91.315781   2.492503    17.500629   18.499458   2.0
33  61.671188   137.650848  1.305071    18.169079   20.138525   6.0
143 53.673630   634.536926  3.443243    1.000000    1.000000    1.0
231 202.459641  62.480145   2.156926    8.962382    46.061661   21.0
242 217.588333  98.111694   2.011893    15.964933   59.468643   20.0

you can iterate over it row by row and draw the ellipses:

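import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

fig, ax = plt.subplots()  # plt.subplots() returns (figure, axes); plt.subplot() returns only the axes

for i, row in final_scatters.iterrows():
    ax.add_patch(Ellipse(
        xy = (row['PRP'], row['RZ']),
        width = row['width'], 
        height = row['height'],
        alpha = 0.5  # in case you want some transparency 
    ))

# Patches do not rescale the axes on their own; fit the limits to the ellipses.
ax.autoscale_view()
plt.show()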
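
The question also asks for the mean TC to drive the colour, as in the first scatter plot. One way this might be done is sketched below, assuming the same `Spectral` colormap and 0 to 100 colour range as the question's scatter call. The DataFrame holds only the first three rows of the table above, and `norm` and `cmap` are names introduced here for illustration. Since an empty frame is typical when patches are added without the axis limits being updated, the sketch uses `add_patch` followed by `autoscale_view`.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize
from matplotlib.patches import Ellipse

# First rows of final_scatters as shown above.
final_scatters = pd.DataFrame({
    "PRP": [84.423500, 61.671188, 53.673630],
    "RZ": [91.315781, 137.650848, 634.536926],
    "TC": [2.492503, 1.305071, 3.443243],
    "height": [17.500629, 18.169079, 1.0],
    "width": [18.499458, 20.138525, 1.0],
})

norm = Normalize(vmin=0, vmax=100)  # same colour range as the scatter plot
cmap = plt.get_cmap("Spectral")

fig, ax = plt.subplots()
for _, row in final_scatters.iterrows():
    ax.add_patch(Ellipse(
        xy=(row["PRP"], row["RZ"]),
        width=row["width"],
        height=row["height"],
        facecolor=cmap(norm(row["TC"])),  # colour taken from the mean TC of the bin
        edgecolor="black",
        alpha=0.5,
    ))

# add_patch records the patch extents; autoscale_view applies them to the axis limits.
ax.autoscale_view()
fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="TC")
```

Call `plt.show()` (or `fig.savefig(...)`) afterwards to view the result. Note that the ellipse width and height are in data units, so with very different x and y ranges the shapes may look stretched.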

Regarding python - Merging nearby scatter points into one and increasing its size, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59490101/
