tensorflow - Can't understand keras.datasets.imdb

Tags: tensorflow keras dataset tensorflow-datasets imdb

I have two questions:

First, the documentation for tf.keras.datasets.imdb.get_word_index says

Retrieves the dictionary mapping word indices back to words.

In fact it does the opposite: it maps words to indices.

print(tf.keras.datasets.imdb.get_word_index())

{'fawn': 34701, 'tsukino': 52006, 'nunnery': 52007

Second, I tried running this in TensorFlow 2.0:

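import tensorflow as tf

# Load the reviews as lists of integer word indices.
(train_data_raw, train_labels), (test_data_raw, test_labels) = tf.keras.datasets.imdb.load_data()
# get_word_index() maps word -> index, so invert it to look words up by index.
words2idx = tf.keras.datasets.imdb.get_word_index()
idx2words = {idx: word for word, idx in words2idx.items()}
# Decode the first training review.
train_ex = [idx2words[x] for x in train_data_raw[0]]
train_ex = ' '.join(train_ex)
print(train_ex)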

This produces a nonsensical string:

the as you with out themselves powerful lets loves their [...]

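Shouldn't I get a valid movie review?

Best answer

I did some digging and found that the preprocessing applies some offsets, which have to be undone to get readable review text back. I changed your line to subtract 3 from each index in the raw sequence (by default load_data uses index_from=3, which shifts every word index up by 3) and to skip the first element, which is a dummy start token (set to 1). The real text therefore starts at position 2 (index 1 in Python).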

train_ex = [idx2words[x-3] for x in train_data_raw[0][1:]]

With this change, I get the following for the review you originally selected:

this film was just brilliant casting location scenery story direction everyone's really suited the part they played ...

Punctuation and capitalization appear to have been stripped, but otherwise this returns a sensible review.
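The offsets described above can also be folded into a small helper that does not crash on the reserved indices. The following is only a sketch, not part of the Keras API: the name decode_review, the placeholder strings such as "<start>", and the toy word index are made up for illustration. It assumes the default load_data() arguments (start_char=1, oov_char=2, index_from=3).

```python
# Assumed defaults of tf.keras.datasets.imdb.load_data():
# start_char=1, oov_char=2, index_from=3; 0 is conventionally used for padding.
INDEX_FROM = 3
# Placeholder strings are hypothetical, chosen only for readability.
RESERVED = {0: "<pad>", 1: "<start>", 2: "<oov>"}


def decode_review(sequence, word_index, index_from=INDEX_FROM):
    """Turn a list of encoded integers back into a space-separated string."""
    # Invert word -> rank into (rank + offset) -> word.
    idx2word = {rank + index_from: word for word, rank in word_index.items()}
    # Add the reserved indices last so they take precedence on any collision.
    idx2word.update(RESERVED)
    # Fall back to "<unk>" for any index that is not in the mapping.
    return " ".join(idx2word.get(i, "<unk>") for i in sequence)


# Toy word index standing in for tf.keras.datasets.imdb.get_word_index().
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}
toy_sequence = [1, 4, 5, 6, 7, 2]
decoded = decode_review(toy_sequence, toy_word_index)
print(decoded)
```

With the real dataset, the equivalent call would be decode_review(train_data_raw[0], tf.keras.datasets.imdb.get_word_index()). Unlike the one-line fix, this keeps the start token visible and also copes with sequences loaded with a num_words limit, where out-of-vocabulary words are encoded as 2.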

I hope this helps.

Regarding "tensorflow - Can't understand keras.datasets.imdb", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58635125/
