python - Is there a better way to determine cross-mapping indices for numpy arrays?

Tags: python arrays numpy vectorization

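I need cross-mapping indices for NumPy union and intersection operations. My code below works, but I would like to vectorize it before applying it to large datasets. Alternatively, if there is a better built-in way, what is it?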

import numpy as np

# ------- define the arrays and set operations ---------
A = np.array(['a','b','c','e','f','g','h','j'])
B = np.array(['h','i','j','k','m'])
C = np.union1d(A, B)
D = np.intersect1d(A,B)

# ------- get the mapped indices for the union ----
# columns: [index in C, index in A (NaN if absent), index in B (NaN if absent)]
zc = np.empty((len(C),3,))
zc[:]=np.nan
zc[:,0] = range(0,len(C))
for iy in range(0,len(C)):
    for ix in range(0, len(A)):
        if A[ix] == C[iy]:
            zc[iy,1] = ix
    for ix in range(0, len(B)):
        if B[ix] == C[iy]:
            zc[iy,2] = ix

# ------- get the mapped indices for the intersection ----
# columns: [index in D, index in A, index in B]
zd = np.empty((len(D),3,))
zd[:]=np.nan
zd[:,0] = range(0,len(D))
for iy in range(0,len(D)):
    for ix in range(0, len(A)):
        if A[ix] == D[iy]:
            zd[iy,1] = ix
    for ix in range(0, len(B)):
        if B[ix] == D[iy]:
            zd[iy,2] = ix

Best answer

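For cases like this, you may want to convert the strings to numeric IDs, since working with numbers is much more efficient. Also, given that the outputs are numeric arrays, it makes sense to have the inputs as numeric IDs up front. For the conversion, I have seen people use lambda-based approaches and the like, but I would use np.unique, which is very efficient for this. Here is an implementation, starting with the conversion to numeric IDs: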

# ------------------------ Setup work -------------------------------
_,idx1 = np.unique(np.append(A,B),return_inverse=True)
A_ID = idx1[:A.size]
B_ID = idx1[A.size:]

# ------------------------ Union work -------------------------------
# Get length of zc, which would be the max of ID+1.
lenC = idx1.max()+1

# Initialize output array zc and fill with NaNs.
zc1 = np.empty((lenC,3,))
zc1[:]=np.nan

# Fill first column with consecutive numbers starting with 0
zc1[:,0] = range(0,lenC)

# Key step: for each element of A (resp. B), write its position within
# A (resp. B) into column 1 (resp. 2) of the row selected by its ID.
# Rows whose ID does not occur in A (resp. B) keep NaN in that column.
zc1[A_ID,1] = np.arange(A_ID.size)
zc1[B_ID,2] = np.arange(B_ID.size)

# ------------------------ Intersection work -------------------------------
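# Get (index in A, index in B) pairs for the elements common to A and B.
# Note: the broadcasted comparison builds a len(A) x len(B) boolean array.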
intersect_ID = np.argwhere(A_ID[:,None] == B_ID)

# Initialize output zd based on the number of intersecting elements
lenD = intersect_ID.shape[0]
zd1 = np.empty((lenD,3,))
zd1[:] = np.nan

# Fill first column with consecutive numbers starting with 0
zd1[:,0] = range(0,lenD)
zd1[:,1:] = intersect_ID
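
As a possible alternative to the broadcasted comparison above (not part of the original answer), the sketch below wraps both steps in one function and gets the intersection indices from np.intersect1d with return_indices=True, which is available in NumPy 1.15 and later. The helper name cross_map is hypothetical, and the sketch assumes each input array contains unique values, as in the question.

```python
import numpy as np

def cross_map(a, b):
    """Hypothetical helper: index maps for the union and intersection of a and b.

    Assumes a and b are 1-D arrays whose values are unique within each array.
    """
    # Shared integer IDs: the position of every element in the sorted union.
    union, inv = np.unique(np.concatenate((a, b)), return_inverse=True)
    a_id, b_id = inv[:a.size], inv[a.size:]

    # Union map: [index in union, index in a or NaN, index in b or NaN].
    union_map = np.full((union.size, 3), np.nan)
    union_map[:, 0] = np.arange(union.size)
    union_map[a_id, 1] = np.arange(a.size)
    union_map[b_id, 2] = np.arange(b.size)

    # Intersection map: [index in common, index in a, index in b].
    # No NaN is needed here, so the result keeps an integer dtype.
    common, ia, ib = np.intersect1d(a, b, return_indices=True)
    common_map = np.column_stack((np.arange(common.size), ia, ib))
    return union, union_map, common, common_map

A = np.array(['a', 'b', 'c', 'e', 'f', 'g', 'h', 'j'])
B = np.array(['h', 'i', 'j', 'k', 'm'])
C, zc2, D, zd2 = cross_map(A, B)
print(zd2)
```

Unlike zd1 above, zd2 is an integer array, and its construction avoids the len(A) x len(B) boolean array; whether that matters depends on how large the inputs are.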

Regarding "python - Is there a better way to determine cross-mapping indices for numpy arrays", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33698592/
