python - Is there a better way to determine cross-mapping indices for numpy arrays

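I need the cross-mapping indices for numpy union and intersection operations. My code below works fine, but I'd like to vectorize it before applying it to large datasets. Or, if there is a better built-in way, what is it?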

import numpy as np

# ------- define the arrays and set operations ---------
A = np.array(['a','b','c','e','f','g','h','j'])
B = np.array(['h','i','j','k','m'])
C = np.union1d(A, B)
D = np.intersect1d(A,B)

# ------- get the mapped indices for the union ----
zc = np.empty((len(C),3,))
zc[:]=np.nan
zc[:,0] = range(0,len(C))
for iy in range(0,len(C)):
    for ix in range(0, len(A)):
        if A[ix] == C[iy]:
            zc[iy,1] = ix
    for ix in range(0, len(B)):
        if B[ix] == C[iy]:
            zc[iy,2] = ix

# ------- get the mapped indices for the intersection ----
zd = np.empty((len(D),3,))
zd[:]=np.nan
zd[:,0] = range(0,len(D))
for iy in range(0,len(D)):
    for ix in range(0, len(A)):
        if A[ix] == D[iy]:
            zd[iy,1] = ix
    for ix in range(0, len(B)):
        if B[ix] == D[iy]:
            zd[iy,2] = ix

Best answer

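For cases like this, you may want to convert the strings to numbers, since working with numbers is far more efficient. Also, given that the output is a numeric array, it makes more sense to have the elements as numeric IDs up front. For the conversion to numeric IDs, I have seen people use lambdas and similar approaches, but I would use np.unique, which is quite efficient for such cases. The implementation below starts with that numeric-ID conversion.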
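As a quick illustration of what that conversion yields for the arrays in the question, here is a minimal sketch. The names uniq and inverse are chosen here for illustration and do not appear in the answer.

```python
import numpy as np

A = np.array(['a', 'b', 'c', 'e', 'f', 'g', 'h', 'j'])
B = np.array(['h', 'i', 'j', 'k', 'm'])

# uniq holds the sorted unique labels of A and B combined (the union);
# inverse maps every element of the concatenated array to its position in uniq.
uniq, inverse = np.unique(np.concatenate((A, B)), return_inverse=True)

# Split the IDs back into the part that belongs to A and the part for B.
A_ID = inverse[:A.size]
B_ID = inverse[A.size:]

print(uniq)   # ['a' 'b' 'c' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'm']
print(A_ID)   # [0 1 2 3 4 5 6 8]
print(B_ID)   # [ 6  7  8  9 10]
```

Each ID is simply the row of the union array C where that element lives, which is why the IDs can be used directly as row indices. The answer's full implementation follows.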

# ------------------------ Setup work -------------------------------
_,idx1 = np.unique(np.append(A,B),return_inverse=True)
A_ID = idx1[:A.size]
B_ID = idx1[A.size:]

# ------------------------ Union work -------------------------------
# Get length of zc, which would be the max of ID+1.
lenC = idx1.max()+1

# Initialize output array zc and fill with NaNs.
zc1 = np.empty((lenC,3,))
zc1[:]=np.nan

# Fill first column with consecutive numbers starting with 0
zc1[:,0] = range(0,lenC)

# Most important part of the code:
# In the rows given by the IDs of A and B, set columns 1 and 2
# to each element's position within A and B respectively
zc1[A_ID,1] = np.arange(A_ID.size)
zc1[B_ID,2] = np.arange(B_ID.size)

# ------------------------ Intersection work -------------------------------
# Get intersecting indices between A and B
intersect_ID = np.argwhere(A_ID[:,None] == B_ID)

# Initialize output zd based on the number of intersections
lenD = intersect_ID.shape[0]
zd1 = np.empty((lenD,3,))
zd1[:] = np.nan

# Fill first column with consecutive numbers starting with 0
zd1[:,0] = range(0,lenD)
zd1[:,1:] = intersect_ID
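
Two things are worth keeping in mind about the code above. The comparison A_ID[:,None] == B_ID builds a len(A) by len(B) boolean array, which can get large for big inputs, and np.argwhere returns rows in the order the matches occur in A, which coincides with the sorted order of D only when A is sorted. As a possible alternative, here is a sketch that relies on np.searchsorted and on the return_indices option of np.intersect1d (available in NumPy 1.15 and later). The function name cross_map is hypothetical, and the sketch assumes that the elements within A and within B are unique.

```python
import numpy as np

def cross_map(A, B):
    """Return (zc, zd), the index maps for the union and intersection.

    Assumption: elements are unique within A and within B.
    """
    C = np.union1d(A, B)
    zc = np.full((C.size, 3), np.nan)
    zc[:, 0] = np.arange(C.size)
    # C is sorted and contains every element of A and B, so searchsorted
    # returns the exact row of C for each element.
    zc[np.searchsorted(C, A), 1] = np.arange(A.size)
    zc[np.searchsorted(C, B), 2] = np.arange(B.size)

    # ia and ib are the positions in A and B of the common values,
    # listed in the sorted order of D.
    D, ia, ib = np.intersect1d(A, B, return_indices=True)
    zd = np.column_stack((np.arange(D.size), ia, ib)).astype(float)
    return zc, zd

A = np.array(['a', 'b', 'c', 'e', 'f', 'g', 'h', 'j'])
B = np.array(['h', 'i', 'j', 'k', 'm'])
zc, zd = cross_map(A, B)
print(zd)
```

For the arrays from the question, zd comes out as the rows [0, 6, 0] and [1, 7, 2], matching both the loop version and the argwhere version.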

Regarding "python - Is there a better way to determine cross-mapping indices for numpy arrays", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33698592/
