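I'm new to the world of graphs and would appreciate some help :-)
I have a data frame containing 10 sentences, and I have computed the cosine similarity between every pair of sentences.
Original data frame: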
text
0 i like working with text
1 my favourite colour is blue and i like beans
2 i have a cat and a dog that are both chubby Pets
3 reading is also working with text just in anot...
4 cooking is great and i love making beans with ...
5 my cat likes cheese and my dog likes beans
6 in some way text is a bit boring
7 cooking is stressful when it is too complicated
8 pets can be so cute but they are often a lot o...
9 working with pets would be a dream job
Computing the cosine similarity:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
k = test_df['text'].tolist()
# Vectorise the data
vec = TfidfVectorizer()
X = vec.fit_transform(k)
# Calculate the pairwise cosine similarities
S = cosine_similarity(X)
# add output to new dataframe
print(len(S))
T = S.tolist()
df = pd.DataFrame.from_records(T)
Cosine similarity output:
0 1 2 3 4 5 6 7 8 9
0 1.000000 0.204491 0.000000 0.378416 0.110185 0.000000 0.158842 0.000000 0.000000 0.282177
1 0.204491 1.000000 0.072468 0.055438 0.333815 0.327299 0.064935 0.112483 0.000000 0.000000
2 0.000000 0.072468 1.000000 0.000000 0.064540 0.231068 0.000000 0.000000 0.084140 0.000000
3 0.378416 0.055438 0.000000 1.000000 0.110590 0.000000 0.375107 0.097456 0.000000 0.156774
4 0.110185 0.333815 0.064540 0.110590 1.000000 0.205005 0.057830 0.202825 0.000000 0.071145
5 0.000000 0.327299 0.231068 0.000000 0.205005 1.000000 0.000000 0.000000 0.000000 0.000000
6 0.158842 0.064935 0.000000 0.375107 0.057830 0.000000 1.000000 0.114151 0.000000 0.000000
7 0.000000 0.112483 0.000000 0.097456 0.202825 0.000000 0.114151 1.000000 0.000000 0.000000
8 0.000000 0.000000 0.084140 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.185502
9 0.282177 0.000000 0.000000 0.156774 0.071145 0.000000 0.000000 0.000000 0.185502 1.000000
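I now want to build a graph from these two data frames, in which the nodes are the sentences and the edges connect them by cosine similarity. I have added the nodes as shown below, but I'm not sure how to add the edges.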
import networkx as nx
### Build graph
G = nx.Graph()
# Add node
G.add_nodes_from(test_df['text'].tolist())
# Add edges
G.add_edges_from()
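Best answer
You can set the index and column names of df to the text column of the input data frame (the nodes of the network), and build the graph from it as an adjacency matrix using nx.from_pandas_adjacency: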
df_adj = pd.DataFrame(df.to_numpy(), index=test_df['text'], columns=test_df['text'])
G = nx.from_pandas_adjacency(df_adj)
G.edges(data=True)
EdgeDataView([('i like working with text ', 'i like working with text ', {'weight': 1.0}),
('i like working with text ', 'my favourite colour is blue and i like beans', {'weight': 0.19953178577876396}),
('i like working with text ', 'reading is also working with text just in anot...', {'weight': 0.39853956570404026})
...
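Note that the first edge in the output above is a self-loop, because the diagonal of the similarity matrix is 1. The following is a minimal, self-contained sketch of the same approach that also drops those self-loops, plus a manual alternative using add_weighted_edges_from. It uses only three of the question's sentences, and the names sentences, H and threshold are assumptions introduced for illustration, not part of the original answer.

```python
import itertools

import networkx as nx
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three sentences taken from the question's data frame (illustrative subset).
sentences = [
    "i like working with text",
    "my cat likes cheese and my dog likes beans",
    "in some way text is a bit boring",
]

# Pairwise cosine similarities of the TF-IDF vectors.
S = cosine_similarity(TfidfVectorizer().fit_transform(sentences))

# Option 1: label the matrix with the sentences and read it as an adjacency matrix.
df_adj = pd.DataFrame(S, index=sentences, columns=sentences)
G = nx.from_pandas_adjacency(df_adj)
# The diagonal (each sentence compared with itself) produces self-loops; remove them.
G.remove_edges_from(list(nx.selfloop_edges(G)))

# Option 2: add nodes and weighted edges by hand.
threshold = 0.0  # hypothetical cut-off; raise it to keep only stronger links
H = nx.Graph()
H.add_nodes_from(sentences)
H.add_weighted_edges_from(
    (sentences[i], sentences[j], S[i, j])
    for i, j in itertools.combinations(range(len(sentences)), 2)
    if S[i, j] > threshold
)

print(list(G.edges(data=True)))
```

Both options should give the same edges here. Pairs with a similarity of 0 get no edge in either graph, since from_pandas_adjacency skips zero entries and the manual loop filters them with the threshold.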
Regarding python - creating a NetworkX graph from a similarity matrix, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64027427/