python - How does this vector work?

I am trying to understand the code below.

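import numpy as np

# Read the vocabulary: every whitespace-separated token becomes one column.
with open('/Users/nk/Vocab.txt','r') as f:
    vocab_temp = f.read().split()
col = len(vocab_temp)
print("Training column size:")
print(col)

# NOTE: `run` is not defined in this snippet; it is presumably a helper that
# executes a shell command and returns its output as a string.
row = run('cat '+'/Users/nk/X_train.txt'+" | wc -l").split()[0]
print("Training row size:")
print(row)
# One row per training line, one column per vocabulary word, all zeros at first.
matrix_tmp = np.zeros((int(row),col), dtype=np.int64)
print("Train Matrix size:")
print(matrix_tmp.size)  # total number of elements (rows * columns)

label_tmp = np.zeros((int(row)), dtype=np.int64)
with open('/Users/nk/X_train.txt','r') as f:
    count = 0  # index of the current training line (matrix row)
    for line in f:
        line_tmp = line.split()
        #print(line_tmp)
        for word in line_tmp[0:]:
            # Skip words that are not in the vocabulary.
            if word not in vocab_temp:
                continue
            # Mark the column of this vocabulary word in the current row.
            matrix_tmp[count][vocab_temp.index(word)] = 1
        count = count + 1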

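As I understand it, col is essentially the number of words in the vocabulary, and row is the number of lines of text in the training set. I also understand that the loop is checking which vocabulary words appear in the training text. Can someone explain what the line after continue does?

Best answer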

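Take the line matrix_tmp[count][vocab_temp.index(word)] = 1. If you look at the code, count increases by 1 for every line of the training file, so matrix_tmp[count] is the word vector for the current line.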

Now consider vocab_temp.index(word). Where vocab_temp is assigned near the top of the code, you can see that it holds the list produced by f.read().split().
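
To make the lookup concrete, here is a minimal sketch. The three-word string is a made-up stand-in for the contents of Vocab.txt, which the question does not show, so the words themselves are assumptions for illustration only.

```python
# Hypothetical stand-in for the text read from Vocab.txt.
vocab_text = "apple banana cherry"

# str.split() with no arguments splits on any whitespace and returns a list.
vocab_temp = vocab_text.split()

# list.index(x) returns the position of the first item equal to x.
position = vocab_temp.index("banana")

print(vocab_temp)
print(position)
```

Note that list.index raises ValueError when the word is missing, which is why the original loop checks `word not in vocab_temp` before reaching that line.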

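In other words, by taking the index from vocab_temp, the code gets the position in the matrix row (the column that corresponds to word) and sets it to 1, meaning that word occurs in the current line.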
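
The whole loop can be sketched on toy data. The vocabulary and the two training lines below are hypothetical stand-ins for Vocab.txt and X_train.txt, and enumerate replaces the manual count variable; otherwise the logic follows the code in the question.

```python
import numpy as np

# Hypothetical vocabulary and training lines (not from the original files).
vocab_temp = ["apple", "banana", "cherry"]
train_lines = ["banana apple", "cherry durian cherry"]

# One row per training line, one column per vocabulary word.
matrix_tmp = np.zeros((len(train_lines), len(vocab_temp)), dtype=np.int64)

for count, line in enumerate(train_lines):
    for word in line.split():
        if word not in vocab_temp:
            continue  # "durian" is not in the vocabulary, so it is skipped
        # Column index = position of the word in the vocabulary list.
        matrix_tmp[count][vocab_temp.index(word)] = 1

print(matrix_tmp)
```

Each row is a binary presence vector: repeated words such as "cherry" still produce a single 1, because the assignment sets a flag rather than counting. For a large vocabulary, both `in` and `.index` scan the list linearly, so a dictionary that maps each word to its column is a common alternative, though that is not what the original code does.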

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/34903222/
