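I'm not happy with the title, but I don't know how to make it better.
I have a list of 3-element tuples in which the first two elements take repeated values. Essentially, it represents a kind of matrix (say 2 x 3):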
[(a0, b0, val), (a0, b1, val), (a0, b2, val), (a1, b0, val), (a1, b1, val), (a1, b2, val)]
It represents a matrix that looks like this:
     b0   b1   b2
a0   val  val  val
a1   val  val  val
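To make the layout concrete, here is a minimal sketch of how such a flat list maps onto the matrix. The toy values, the name `triples`, and `n_b` are made up for illustration, and the reshape assumes the tuples are complete and ordered by a, then b, as in the list above.

```python
import numpy as np

# Hypothetical toy data: a in {0, 1}, b in {10, 20, 30}; the val numbers are invented.
triples = [(0, 10, 5.0), (0, 20, 1.5), (0, 30, 4.0),
           (1, 10, 2.0), (1, 20, 3.5), (1, 30, 0.5)]

arr = np.array(triples)
n_b = 3  # size of the b set, assumed to be known in advance

# One row per a value, one column per b value.
# This relies on the tuples being ordered by a and then by b.
matrix = arr[:, 2].reshape(-1, n_b)
print(matrix)
```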
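For each row, I want to get the b value corresponding to the lowest val. I'm tired, and the ideas I've come up with are unsatisfying at best.
As a concrete example, this is what I did: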
res = [(915, 1584, 2618.40972202602), (915, 3835, 323.293876052119), (915, 7483, 1521.50879590718), (916, 1584, 2609.47030580952), (916, 3835, 314.354459835623), (916, 7483, 1512.56937969069), (1346, 1584, 3012.63009273444), (1346, 3835, 717.514246760547), (1346, 7483, 1582.83428580677), (4281, 1584, 2603.7125461067), (4281, 3835, 308.596700132804), (4281, 7483, 1464.5140765524), (4282, 1584, 2608.78719959729), (4282, 3835, 313.671353623393), (4282, 7483, 1459.43942306181), (4283, 1584, 2614.00974611433), (4283, 3835, 318.89390014043), (4283, 7483, 1454.21687654477), (4284, 1584, 2619.17131078887), (4284, 3835, 324.05546481497), (4284, 7483, 1449.05531187023), (4287, 1584, 2634.63255731146), (4287, 3835, 339.516711337566), (4287, 7483, 1433.59406534764), (4288, 1584, 2639.73617965108), (4288, 3835, 344.620333677179), (4288, 7483, 1428.49044300803), (4290, 1584, 2650.08066128732), (4290, 3835, 354.96481531342), (4290, 7483, 1418.14596137178), (4297, 1584, 2592.7709526482), (4297, 3835, 297.655106674305), (4297, 7483, 1475.4556700109), (4298, 1584, 2597.94359872049), (4298, 3835, 302.827752746592), (4298, 7483, 1470.28302393861), (4299, 1584, 2603.13534825911), (4299, 3835, 308.019502285211), (4299, 7483, 1465.09127439999), (4305, 1584, 2580.83715850017), (4305, 3835, 285.721312526271), (4305, 7483, 1487.38946415893), (4306, 1584, 2575.62363753943), (4306, 3835, 280.507791565529), (4306, 7483, 1492.60298511968), (4310, 1584, 2555.06067283699), (4310, 3835, 259.94482686309), (4310, 7483, 1513.16594982211), (8350, 1584, 2618.12918933735), (8350, 3835, 323.013343363448), (8350, 7483, 1478.93071978304), (8351, 1584, 2632.5746391363), (8351, 3835, 337.458793162408), (8351, 7483, 1493.376169582)]
import numpy as np

r = np.array(res)
c = np.unique(r[:,0])
for val in c:
    # For each a, pick the row whose val equals the minimum val among that a's rows
    d = (val, r[r[:,2]==np.amin(r[r[:,0]==val][:,2])][0,1], r[r[:,2]==np.amin(r[r[:,0]==val][:,2])][0,2])
    print(d)
>>> (915.0, 3835.0, 323.293876052119)
>>> (916.0, 3835.0, 314.354459835623)
>>> (1346.0, 3835.0, 717.514246760547)
>>> (4281.0, 3835.0, 308.596700132804)
>>> (4282.0, 3835.0, 313.671353623393)
>>> (4283.0, 3835.0, 318.89390014043)
>>> (4284.0, 3835.0, 324.05546481497)
>>> (4287.0, 3835.0, 339.516711337566)
>>> (4288.0, 3835.0, 344.620333677179)
>>> (4290.0, 3835.0, 354.96481531342)
>>> (4297.0, 3835.0, 297.655106674305)
>>> (4298.0, 3835.0, 302.827752746592)
>>> (4299.0, 3835.0, 308.019502285211)
>>> (4305.0, 3835.0, 285.721312526271)
>>> (4306.0, 3835.0, 280.507791565529)
>>> (4310.0, 3835.0, 259.94482686309)
>>> (8350.0, 3835.0, 323.013343363448)
>>> (8351.0, 3835.0, 337.458793162408)
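This can then be put back into another "minimum rows only" numpy array. Note that in this particular case every row happens to point to the same b value; that is a coincidence and should not be assumed.
While this technically works, I must say I have rarely seen uglier code. I'm sure there must be a smarter, cleaner way to achieve what I want. Any suggestions?
Also note that, if absolutely necessary, I can know in advance how many "repeats" there are (i.e. the size of the b set).
Best answer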
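import pandas as pd
df = pd.DataFrame(data=res, columns=['0', '1', '2'])
# For each group of column '0', idxmin returns the index label of the smallest '2'
print(df.loc[df.groupby('0')['2'].idxmin()])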
Result:
       0     1           2
1    915  3835  323.293876
4    916  3835  314.354460
7   1346  3835  717.514247
10  4281  3835  308.596700
13  4282  3835  313.671354
16  4283  3835  318.893900
19  4284  3835  324.055465
22  4287  3835  339.516711
25  4288  3835  344.620334
28  4290  3835  354.964815
31  4297  3835  297.655107
34  4298  3835  302.827753
37  4299  3835  308.019502
40  4305  3835  285.721313
43  4306  3835  280.507792
46  4310  3835  259.944827
49  8350  3835  323.013343
52  8351  3835  337.458793
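If you would rather stay within NumPy, one possible approach is to sort by the first column and then by val, and keep the first row of each group. The following is only a sketch, not part of the answer above: the helper name `min_rows` and the toy data are hypothetical. It does not assume the input is ordered or that the size of the b set is known.

```python
import numpy as np

def min_rows(triples):
    """For each distinct first element, return the row with the smallest third element."""
    r = np.asarray(triples, dtype=float)
    # lexsort treats the LAST key as primary: sort by a, then by val within each a.
    order = np.lexsort((r[:, 2], r[:, 0]))
    r = r[order]
    # After sorting, the first occurrence of each a is its minimum-val row.
    _, first = np.unique(r[:, 0], return_index=True)
    return r[first]

# Hypothetical toy data, deliberately not ordered by a.
toy = [(2, 10, 2.0), (1, 10, 5.0), (1, 20, 1.5),
       (2, 30, 0.5), (1, 30, 4.0), (2, 20, 3.5)]
result = min_rows(toy)
print(result)
```

Calling `min_rows(res)` on the data from the question should select the same rows as the pandas version, returned as a float array rather than a DataFrame.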
Regarding python - finding the minimum rows in a list of tuples with repeated values, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60348446/