python - 在一组固定的元素上生成特定秩的 "random"矩阵

我想生成大小为 m 的矩阵x n并排名 r ，元素来自指定的有限集，例如{0,1}或 {1,2,3,4,5} .我希望它们在这个词的某种非常宽松的意义上是“随机的”，即我想从算法中获得各种可能的输出，其分布与具有指定秩的该组元素上的所有矩阵的分布模糊相似。

事实上，我并不关心它的等级是 r , 只是它接近一个秩为 r 的矩阵(由 Frobenius 范数测量)。

当手头的集合是实数时，我一直在执行以下操作，这完全可以满足我的需要:生成矩阵 U尺寸m x r和 V的 n x r ，元素独立采样自例如正常 (0, 2)。然后 U V'是一个 m x n秩矩阵 r (嗯，<= r，但我认为它很有可能是 r)。

不过，如果我只是这样做然后四舍五入到二进制/1-5，排名会增加。

也可以通过执行 SVD 并取第一个 r 来获得矩阵的低阶近似值奇异值。但是，这些值不会位于所需的集合中，将它们四舍五入将再次提高排名。

This question是相关的，但接受的答案不是“随机的”，另一个答案建议 SVD，如前所述，它在这里不起作用。

我想到的一种可能性是制作r从集合中线性独立的行或列向量，然后通过这些的线性组合得到矩阵的其余部分。不过，我不是很清楚如何获得“随机”线性无关向量，或者之后如何以准随机方式组合它们。

(并不是说它 super 相关，但我在 numpy 中这样做。)

更新:我已经尝试了 EMS 在评论中建议的方法，使用这个简单的实现:

real = np.dot(np.random.normal(0, 1, (10, 3)), np.random.normal(0, 1, (3, 10)))
bin = (real > .5).astype(int)
rank = np.linalg.matrix_rank(bin)
niter = 0

while rank > des_rank:
    cand_changes = np.zeros((21, 5))
    for n in range(20):
        i, j = random.randrange(5), random.randrange(5)
        v = 1 - bin[i,j]
        x = bin.copy()
        x[i, j] = v
        x_rank = np.linalg.matrix_rank(x)
        cand_changes[n,:] = (i, j, v, x_rank, max((rank + 1e-4) - x_rank, 0))
    cand_changes[-1,:] = (0, 0, bin[0,0], rank, 1e-4)

    cdf = np.cumsum(cand_changes[:,-1])
    cdf /= cdf[-1]
    i, j, v, rank, score = cand_changes[np.searchsorted(cdf, random.random()), :]
    bin[i, j] = v
    niter += 1
    if niter % 1000 == 0:
        print(niter, rank)

它对小矩阵工作很快，但对例如10x10 -- 它似乎卡在第 6 或 7 级，至少在数十万次迭代中是这样。

看起来这可能与更好(即不太平坦)的目标函数一起工作得更好，但我不知道那会是什么。

我还尝试了一种简单的拒绝方法来构建矩阵:

def fill_matrix(m, n, r, vals):
    assert m >= r and n >= r
    trans = False
    if m > n: # more columns than rows I think is better
        m, n = n, m
        trans = True

    get_vec = lambda: np.array([random.choice(vals) for i in range(n)])

    vecs = []
    n_rejects = 0

    # fill in r linearly independent rows
    while len(vecs) < r:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            vecs.append(v)
        else:
            n_rejects += 1
    print("have {} independent ({} rejects)".format(r, n_rejects))

    # fill in the rest of the dependent rows
    while len(vecs) < m:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            n_rejects += 1
            if n_rejects % 1000 == 0:
                print(n_rejects)
        else:
            vecs.append(v)
    print("done ({} total rejects)".format(n_rejects))

    m = np.vstack(vecs)
    return m.T if trans else m

这适用于例如具有任何等级的 10x10 二进制矩阵，但不适用于 0-4 矩阵或具有较低等级的更大的二进制矩阵。 (例如，获得一个 20x20 的 15 阶二进制矩阵需要 42,000 次拒绝；对于 20x20 的 10 阶二进制矩阵，需要 120 万次。)

这显然是因为第一个 r 所跨越的空间rows 是我从中采样的空间的一部分太小了，例如{0,1}^10 ，在这些情况下。

我们想要第一个 r 的跨度的交集具有有效值集的行。所以我们可以尝试从跨度中采样并寻找有效值，但由于跨度涉及实值系数，因此永远不会找到我们有效的向量(即使我们进行归一化以便例如第一个分量在有效集合中)。

也许这可以表述为一个整数规划问题，或者什么？

最佳答案

我的 friend Daniel Johnson 在上面发表了评论，他提出了一个想法，但我看到他从未发布过。它不是很充实，但您可以对其进行调整。

If A is m-by-r and B is r-by-n and both have rank r then AB has rank r. Now, we just have to pick A and B such that AB has values only in the given set. The simplest case is S = {0,1,2,...,j}.

One choice would be to make A binary with appropriate row/col sums that guaranteed the correct rank and B with column sums adding to no more than j (so that each term in the product is in S) and row sums picked to cause rank r (or at least encourage it as rejection can be used).

I just think that we can come up with two independent sampling schemes on A and B that are less complicated and quicker than trying to attack the whole matrix at once. Unfortunately, all my matrix sampling code is on the other computer. I know it generalized easily to allowing entries in a bigger set than {0,1} (i.e. S), but I can't remember how the computation scaled with m*n.

关于python - 在一组固定的元素上生成特定秩的 "random"矩阵，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/10132585/

python - 在一组固定的元素上生成特定秩的 "random"矩阵

上一篇：python - py.test 运行 Python2 和 Python3

下一篇：python - 什么 Python 正则表达式匹配所有字母字符但不匹配数字？ [unicode 感知]