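I'm trying to measure the overall structure of xy points to represent recurring particle formations. Rather than averaging the raw Cartesian coordinates, I'd like to use a pairwise approach that determines the structure from each point's position relative to its neighbors.
To do this, I want to calculate, at each timestamp, the vector between each point and its neighboring points. Averaging these vectors for each pair of points should then give the overall structure.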
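The per-timestamp step above can be sketched with NumPy broadcasting, which computes the vector between every pair of points in one frame in a single operation. This is a minimal sketch: `pairwise_vectors` is a hypothetical helper name, and the sign convention (the vector points from point i to point j) is an assumption.

```python
import numpy as np


def pairwise_vectors(xy: np.ndarray) -> np.ndarray:
    """Return an (n, n, 2) array where vectors[i, j] = xy[j] - xy[i].

    xy is an (n, 2) array holding the coordinates of a single timestamp.
    """
    xy = np.asarray(xy, dtype=float)
    # Broadcasting (1, n, 2) - (n, 1, 2) yields every pairwise difference at once
    return xy[np.newaxis, :, :] - xy[:, np.newaxis, :]


# Frame 1 of the first example below (points A..E, in that order)
frame1 = np.array([[1.0, 1.0], [2.8, 1.0], [4.0, 0.0], [2.0, 0.0], [2.0, 2.0]])
vectors = pairwise_vectors(frame1)
print(vectors[0, 1])  # vector from A to B
```

The array is antisymmetric (`vectors[i, j] == -vectors[j, i]`) and zero on the diagonal, so each pair is effectively stored twice.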
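Note: hard-coding the vectors between specific points will not identify the structure correctly. If points swap positions, or different points take their place while the same structure is kept, the final result will be inaccurate. I want the function to determine the overall structure from neighboring points alone.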
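So the final structure should be built pairwise, with the final spatial distribution determined as follows: 1) set the centroid of the structure to the position of the point in the densest part of the structure, as determined by the average distance to the third-nearest neighbor; 2) determine the relative position of its nearest neighbor, then the relative position of that point's nearest neighbor, and so on, until the positions of all points have been determined.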
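Step 1 can be sketched directly from the third-nearest-neighbor criterion rather than from a kernel density estimate. This is a minimal sketch under one interpretation: within a single frame, each point is scored by its distance to its third-nearest neighbor, and the smallest score marks the densest point. How the "average" in that criterion should be taken (for example, over time) is left open here, and `densest_point_index` is a hypothetical helper.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def densest_point_index(xy: np.ndarray, k: int = 3) -> int:
    """Index of the point whose k-th nearest neighbor is closest (assumed density proxy)."""
    dist = squareform(pdist(np.asarray(xy, dtype=float)))
    # Sorting each row puts the zero self-distance in column 0 (assuming no
    # duplicate points), so column k holds the distance to the k-th nearest neighbor
    kth_distance = np.sort(dist, axis=1)[:, k]
    # A smaller distance to the k-th neighbor means a denser neighborhood
    return int(np.argmin(kth_distance))


# Frame 1 of df2 below (points A..E, in that order)
frame1 = np.array([[1.0, 1.0], [3.0, 1.0], [4.0, 0.0], [2.0, 0.0], [2.0, 2.0]])
print(densest_point_index(frame1))  # 1, i.e. point B
```

Because only distances are used, the choice does not depend on point ids, which fits the requirement that swapped or substituted points should not change the result.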
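I'll generate two example dfs below. Using df1, frame 1 shows the vectors between the points at the first timestamp. Frame 2 moves some points to new positions and swaps others (points A and B swap positions between frames). The final frame shows every vector from all frames, while the points show the average structure.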
import pandas as pd
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import pdist, squareform
import matplotlib.pyplot as plt
import numpy as np

# Example 1:
df = pd.DataFrame({
    'Time' : [1,1,1,1,1,2,2,2,2,2],
    'id' : ['A','B','C','D','E','B','A','C','D','E'],
    'X' : [1.0,2.8,4.0,2.0,2.0,1.5,3.0,5.0,3.0,2.5],
    'Y' : [1.0,1.0,0.0,0.0,2.0,1.0,1.0,0.0,0.0,2.0],
})

def calculate_distances(group):
    group_distances = pd.DataFrame(
        squareform(pdist(group[["X", "Y"]].to_numpy())),  # Default is Euclidean distance
        columns=group["id"],
        index=group["id"],
    )
    return group_distances

# Calculate the distances between the points, per timeframe
df_distances = df.groupby("Time").apply(calculate_distances)

# Create a placeholder to store the relative positions at every timestamp
relative_positions = {timestamp: [] for timestamp in df["Time"].values}

# Go over the timeframes
for timestamp, group in df.groupby("Time"):
    # ---
    # "... first, we set the centroid of the structure to be the position of the point in the densest part of the structure ..."
    # Determine the density of the group, within this timeframe
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(group[["X", "Y"]])
    log_density = kde.score_samples(group[["X", "Y"]])
    # Centroid is the most dense point in the structure
    centroid = group.iloc[np.argmax(log_density)]

    # Make a list of the other points to keep track of which points we've handled
    other_points = group["id"].to_list()
    # Start by making the centroid the active point
    active_point_id = centroid["id"]

    # ---
    # "... the relative position of that point’s nearest neighbor (ignoring any point already considered
    # in the process) and so on, until the positions of all points in the team have been determined."
    # Keep handling the next point until there are no points left
    while len(other_points) > 1:
        # Remove the active point from the list
        other_points = [point for point in other_points if point != active_point_id]
        # For the active point, get the nearest neighbor
        nearest_neighbor = df_distances.loc[[timestamp]][active_point_id].droplevel(0).loc[other_points].sort_values().reset_index().iloc[0]["id"]

        # ---
        # "... We then identify the relative position of his nearest neighbor ..."
        # Determine the relative position of the nearest neighbor (compared to the active point)
        active_point_coordinates = group.loc[group["id"] == active_point_id, ["X", "Y"]].iloc[0].values
        nearest_neighbor_coordinates = group.loc[group["id"] == nearest_neighbor, ["X", "Y"]].iloc[0].values
        relative_position = active_point_coordinates - nearest_neighbor_coordinates
        # Add the relative position to the list, for this timestamp
        relative_positions[timestamp].append(relative_position)

        # The neighbor becomes the active point
        active_point_id = nearest_neighbor

# ---
# "... averaging the vectors between each pair of points over a specified time interval to gain a
# clear measure of their designated relative positions ..."
# Take the average vector, across timeframes
averages = np.mean([t for t in relative_positions.values()], axis=0)

# Plot the relative positions, NOTE: The centroid is always at (0, 0), and is not plotted
plt.scatter(averages[:,0], averages[:,1])
plt.show()
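The loop above stores each vector relative to the previous point in the chain, not relative to the centroid, and as `active - neighbor`, so it points from the neighbor back to the active point. Plotting `averages` directly therefore shows step vectors rather than the structure itself. Under that sign convention, a running sum of the negated steps recovers coordinates relative to the centroid. This is a sketch; `chain_to_positions` is a hypothetical helper, and averaging the k-th step across frames (as `averages` does) still assumes the chain visits corresponding points in the same order in every frame.

```python
import numpy as np


def chain_to_positions(step_vectors) -> np.ndarray:
    """Turn the chain's step vectors into coordinates relative to the centroid.

    Assumes step_vectors[k] = active - neighbor, as in the loop above, so each
    neighbor sits at (previous point) - step_vectors[k].
    """
    steps = np.asarray(step_vectors, dtype=float).reshape(-1, 2)
    # Running sum of the negated steps walks from the centroid along the chain
    walked = -np.cumsum(steps, axis=0)
    # Prepend the centroid itself at the origin
    return np.vstack([np.zeros((1, 2)), walked])


# Hypothetical chain: centroid at (2, 1), then (1, 1), then (1, 2)
steps = [np.array([2.0, 1.0]) - np.array([1.0, 1.0]),
         np.array([1.0, 1.0]) - np.array([1.0, 2.0])]
print(chain_to_positions(steps))  # rows: (0, 0), (-1, 0), (-1, 1)
```

Calling `chain_to_positions(averages)` would then give the averaged chain as points, with the centroid included at (0, 0).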
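If I manually plot the centroid at (0, 0), the output is:
Point structure, frame 1:
Point structure, frame 2:
The summed vectors for both frames are highlighted, so the average point structure should be:
If I generate the same point structure but shift the points to the right in the following frame, the underlying point structure should be the same.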
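One quick way to confirm that the two frames of df2 share the same underlying structure, without relying on which id sits where, is to compare the sorted lists of pairwise distances, which are unchanged by translation and relabeling. This is a sanity check rather than part of the requested algorithm, and it cannot tell a structure from its mirror image, since reflections also preserve distances.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist

df2 = pd.DataFrame({
    "Time": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "id": ["A", "B", "C", "D", "E", "B", "A", "C", "D", "E"],
    "X": [1.0, 3.0, 4.0, 2.0, 2.0, 3.0, 5.0, 6.0, 4.0, 4.0],
    "Y": [1.0, 1.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0, 0.0, 2.0],
})

# Sorted pairwise distances ignore both translation and which id sits where
signatures = {
    t: np.sort(pdist(g[["X", "Y"]].to_numpy()))
    for t, g in df2.groupby("Time")
}
print(np.allclose(signatures[1], signatures[2]))  # True
```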
df2 = pd.DataFrame({
'Time' : [1,1,1,1,1,2,2,2,2,2],
'id' : ['A','B','C','D','E','B','A','C','D','E'],
'X' : [1.0,3.0,4.0,2.0,2.0,3.0,5.0,6.0,4.0,4.0],
'Y' : [1.0,1.0,0.0,0.0,2.0,1.0,1.0,0.0,0.0,2.0],
})
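Expected structure:
Best answer
I've tried to follow the paper you cited to a T, but their description of the algorithm is quite vague. Here is my solution: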
import numpy
import pandas
import random
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import pdist, squareform

# From the paper:
# ---------------
# Formations are measured by calculating the vectors between each player and the rest of his
# teammates at successive instants during a match, averaging the vectors between each pair of
# players over a specified time interval to gain a clear measure of their designated relative positions.
# The final spatial distribution of the outfield players is determined by the following algorithm:
# first, we set the centroid of the formation to be the position of the player in the densest part of the
# team, as determined by the average distance to the third-nearest neighbor. We then identify the
# relative position of his nearest neighbor, the relative position of that player’s nearest neighbor
# (ignoring any player already considered in the process) and so on, until the positions of all players
# in the team have been determined.

# Your data, I've added some randomness to get a more realistic setting
df = pandas.DataFrame(
    {
        "Time": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
        "id": ["A", "B", "C", "D", "E", "A", "B", "C", "D", "E"],
        "Y": [element + random.random() * 0.25 for element in [1.0, 1.0, 0.0, 1.25, 2.0, 1.0, 1.0, 0.0, 1.25, 2.0]],
        "X": [element + random.random() * 0.25 for element in [1.0, 3.0, 2.0, 2.25, 2.0, 3.0, 5.0, 4.0, 4.25, 4.0]],
    }
)

# Plot the different timeframes (for reference)
for timestamp in df["Time"].unique():
    df.loc[df["Time"] == timestamp].plot(kind="scatter", x="X", y="Y")

def calculate_distances(group: pandas.DataFrame) -> pandas.DataFrame:
    """ Calculate the distances between the players, within a specific timeframe.

    Args:
        group (pandas.DataFrame): The data from a specific timeframe

    Returns:
        pandas.DataFrame: The distances
    """
    group_distances = pandas.DataFrame(
        squareform(pdist(group[["X", "Y"]].to_numpy())),  # Default is Euclidean distance
        columns=group["id"],
        index=group["id"],
    )
    return group_distances

# Calculate the distances between the points, per timeframe
df_distances = df.groupby("Time").apply(calculate_distances)

# Create a placeholder to store the relative positions at every timestamp
relative_positions = {timestamp: [] for timestamp in df["Time"].values}

# Go over the timeframes
for timestamp, group in df.groupby("Time"):
    # ---
    # "... first, we set the centroid of the formation to be the position of the player in the densest part of the team ..."
    # Determine the density of the group, within this timeframe
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(group[["X", "Y"]])
    log_density = kde.score_samples(group[["X", "Y"]])
    # Centroid is the most dense point in the formation
    centroid = group.iloc[numpy.argmax(log_density)]

    # Make a list of the other players to keep track of which players we've handled
    other_players = group["id"].to_list()
    # Start by making the centroid the active player
    active_player_id = centroid["id"]

    # ---
    # "... the relative position of that player’s nearest neighbor (ignoring any player already considered
    # in the process) and so on, until the positions of all players in the team have been determined."
    # Keep handling the next player until there are no players left
    while len(other_players) > 1:
        # Remove the active player from the list
        other_players = [player for player in other_players if player != active_player_id]
        # For the active player, get the nearest neighbor
        nearest_neighbor = df_distances.loc[[timestamp]][active_player_id].droplevel(0).loc[other_players].sort_values().reset_index().iloc[0]["id"]

        # ---
        # "... We then identify the relative position of his nearest neighbor ..."
        # Determine the relative position of the nearest neighbor (compared to the active player)
        active_player_coordinates = group.loc[group["id"] == active_player_id, ["X", "Y"]].iloc[0].values
        nearest_neighbor_coordinates = group.loc[group["id"] == nearest_neighbor, ["X", "Y"]].iloc[0].values
        relative_position = active_player_coordinates - nearest_neighbor_coordinates
        # Add the relative position to the list, for this timestamp
        relative_positions[timestamp].append(relative_position)

        # The neighbor becomes the active player
        active_player_id = nearest_neighbor

# ---
# "... averaging the vectors between each pair of players over a specified time interval to gain a
# clear measure of their designated relative positions ..."
# Take the average vector, across timeframes
averages = numpy.mean([t for t in relative_positions.values()], axis=0)

# Plot the relative positions, NOTE: The centroid is always at (0, 0), and is not plotted
pandas.DataFrame(averages, columns=["X", "Y"]).plot(kind="scatter", x="X", y="Y")
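`numpy.mean(..., axis=0)` in the code above averages the k-th vector of each frame, which only lines up if the nearest-neighbor chain visits geometrically corresponding points in the same order in every frame. A label-free alternative, not taken from the paper, is to express each frame relative to its centroid, match every frame's points to a reference frame with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`), and only then average. The sketch assumes every frame has the same number of points and ignores rotation; `average_structure` is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def average_structure(frames) -> np.ndarray:
    """Average several centroid-relative point sets without using point ids.

    frames: list of (n, 2) arrays, each already expressed relative to that
    frame's centroid. Every later frame is matched to the first one with the
    Hungarian algorithm before averaging.
    """
    reference = np.asarray(frames[0], dtype=float)
    aligned = [reference]
    for frame in frames[1:]:
        frame = np.asarray(frame, dtype=float)
        # cost[i, j] = distance between reference point i and frame point j
        cost = cdist(reference, frame)
        rows, cols = linear_sum_assignment(cost)
        # For a square cost matrix rows == arange(n), so frame[cols] follows the reference order
        aligned.append(frame[cols])
    return np.mean(aligned, axis=0)


# Two frames with the same shape: rows shuffled and slightly perturbed in the second
frame_a = np.array([[0.0, 0.0], [-2.0, 0.0], [1.0, -1.0]])
frame_b = np.array([[1.2, -1.0], [0.0, 0.0], [-1.8, 0.0]])
print(average_structure([frame_a, frame_b]))  # rows: (0, 0), (-1.9, 0), (1.1, -1)
```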
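Previous answer:
The first part (fixing your code example) isn't too hard. scipy has a function called pdist, which calculates the distances between a set of points in any number of dimensions (2 in this case). If you want to do this per timeframe, just use groupby.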
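The second part is harder, because it isn't entirely clear what you want to achieve. You can find the most "dense" point in the formation without the previous distance calculation; sklearn has a KernelDensity class for that. Beyond that, I can't really do what you want, because your formation has no single nearest neighbor (the centroid is equally far from every other point, so all neighbors are equally close). However, I think you could use the mean distance matrix (df_distances_mean) for this purpose, since it contains all the distances. You just need to pick the next point with the smallest distance from the centroid.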
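The "pick the next closest point" step described above can be sketched as a greedy walk over a distance matrix. The matrix here reproduces the df_distances_mean values printed in the output of the code that follows. Because D is at distance 1 from all four other points, the first step is a pure tie; this sketch simply takes the first minimum in column order (the documented behavior of `Series.idxmin`), which is an arbitrary choice. `nearest_neighbor_chain` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def nearest_neighbor_chain(distances: pd.DataFrame, start: str) -> list:
    """Visit points greedily: always step to the closest point not yet visited.

    Ties are broken by column order (idxmin returns the first minimum).
    """
    order = [start]
    remaining = [p for p in distances.columns if p != start]
    while remaining:
        # Distances from the current point to every unvisited point
        row = distances.loc[order[-1], remaining]
        nxt = row.idxmin()
        order.append(nxt)
        remaining.remove(nxt)
    return order


ids = ["A", "B", "C", "D", "E"]
s2 = np.sqrt(2.0)
df_distances_mean = pd.DataFrame(
    [[0, 2, s2, 1, s2],
     [2, 0, s2, 1, s2],
     [s2, s2, 0, 1, 2],
     [1, 1, 1, 0, 1],
     [s2, s2, 2, 1, 0]],
    index=ids, columns=ids, dtype=float,
)
print(nearest_neighbor_chain(df_distances_mean, "D"))  # ['D', 'A', 'C', 'B', 'E']
```

Starting from the density-based centroid D, the resulting order depends entirely on how that first tie is broken, which is exactly the ambiguity described above.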
Below is the code that calculates the distances and finds the centroid using the KernelDensity class:
import numpy
import pandas
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import pdist, squareform
# Your data
df = pandas.DataFrame(
{
"Time": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
"id": ["A", "B", "C", "D", "E", "A", "B", "C", "D", "E"],
"X": [1.0, 3.0, 2.0, 2.0, 2.0, 3.0, 5.0, 4.0, 4.0, 4.0],
"Y": [1.0, 1.0, 0.0, 1.0, 2.0, 1.0, 1.0, 0.0, 1.0, 2.0],
}
)
def calculate_distances(group: pandas.DataFrame) -> pandas.DataFrame:
    group_distances = pandas.DataFrame(
        squareform(pdist(group[["X", "Y"]].to_numpy())),  # Default is Euclidean distance
        columns=group["id"],
        index=group["id"],
    )
    return group_distances
# Calculate the distances between the points, per timeframe
df_distances = df.groupby("Time").apply(calculate_distances)
# Take the mean distance across timeframes (since your points are just shifted right over time, this should be constant)
df_distances_mean = pandas.DataFrame(
numpy.mean([group.to_numpy() for _, group in df_distances.groupby("Time")], axis=0),
columns=df_distances.columns,
index=df_distances.columns,
)
# df_distances_mean will now contain the mean distance between points
# =====================
# Not a 100% sure what you want to achieve from this point forward
# Determining the density of the formation (at each time step)
for _, group in df.groupby("Time"):
    # Determine the density
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(group[["X", "Y"]])
    log_density = kde.score_samples(group[["X", "Y"]])
    # Centroid is the most dense point in the formation (?)
    centroid = group.iloc[numpy.argmax(log_density)]
    print("Centroid based on density:", centroid)
Output:
Centroid based on density: Time 1
id D
X 2
Y 1
Name: 3, dtype: object
Centroid based on density: Time 2
id D
X 4
Y 1
Name: 8, dtype: object
print(df_distances_mean)
id A B C D E
id
A 0.000000 2.000000 1.414214 1.0 1.414214
B 2.000000 0.000000 1.414214 1.0 1.414214
C 1.414214 1.414214 0.000000 1.0 2.000000
D 1.000000 1.000000 1.000000 0.0 1.000000
E 1.414214 1.414214 2.000000 1.0 0.000000
Related question on Stack Overflow (python - Measuring structure of xy points): https://stackoverflow.com/questions/65001543/