python - Custom text preprocessing saved in a TensorFlow model

Tags: python tensorflow keras nlp

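How can I write custom text preprocessing that can be saved as part of a model?

Suppose I want two features:

1. Autocorrect the string input with some function. Words may change after this operation.
2. Perform query expansion on the string input, so that the resulting text/tokens may contain a few additional words (for which weights will be trained).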

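Something like this:

1. fli to London -> fly to London
2. fly to London -> fly to London loc_city

-> this token (loc_city) needs to be in the vocabulary in advance, which can be done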

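After step 1 and/or step 2, should the result be fed to a TextVectorization/Embedding layer?

There is a standardize callback, but I don't see an obvious way to do this with the existing tf.strings operations.

Ideally, there would be a callback function/layer that accepts a string (or tokens) and maps it to another string (or string of tokens).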

Best answer

You can get the first character of a string like this:

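import tensorflow as tf

class StringLayer(tf.keras.layers.Layer):
  """Returns the first character (byte) of each input string."""

  def __init__(self):
    super().__init__()

  def call(self, inputs):
    # bytes_split yields a ragged tensor with one byte per element;
    # squeeze drops the size-1 axis, to_tensor pads to a dense matrix,
    # and [:, 0] keeps the first byte of each string.
    return tf.squeeze(tf.strings.bytes_split(inputs), axis=1).to_tensor()[:, 0]

s = tf.constant([['next_string'], ['some_string']])
layer = StringLayer()
print(layer(s))
# tf.Tensor([b'n' b's'], shape=(2,), dtype=string)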
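
The same idea of a layer that maps strings to strings can be pushed toward the two features in the question. Below is a minimal sketch, not part of the original answer, that assumes the corrections and expansions can be written as a small set of regular-expression rules. The names CORRECTIONS, EXPANSIONS, rewrite_text and RewriteLayer are hypothetical, and lowercasing the input first is an extra assumption made to keep the rules short. It uses only tf.strings.lower and tf.strings.regex_replace, so the logic stays inside the TensorFlow graph.

```python
import tensorflow as tf

# Hypothetical rules, made up for illustration: RE2 pattern -> rewrite string.
CORRECTIONS = {r"\bfli\b": "fly"}                # step 1: autocorrect
EXPANSIONS = {r"\b(london)\b": r"\1 loc_city"}   # step 2: query expansion


def rewrite_text(text):
    """Autocorrects and expands a string tensor using only tf.strings ops."""
    # Assumption: inputs are lowercased so the rules can stay case-free.
    text = tf.strings.lower(text)
    # Corrections run before expansions (dicts keep insertion order).
    for pattern, rewrite in {**CORRECTIONS, **EXPANSIONS}.items():
        # regex_replace rewrites every match by default; \1 in the rewrite
        # string refers to the first capture group of the pattern.
        text = tf.strings.regex_replace(text, pattern, rewrite)
    return text


class RewriteLayer(tf.keras.layers.Layer):
    """Thin wrapper so the rewriting can run as a layer inside a model."""

    def call(self, inputs):
        return rewrite_text(inputs)


print(rewrite_text(tf.constant(["Fli to London", "fly to Paris"])))
```

With these rules, "Fli to London" should come out as "fly to london loc_city" and "fly to Paris" as "fly to paris". A function like rewrite_text could probably also be passed as the standardize callable of TextVectorization, but whether it is serialized together with the model depends on the Keras version and save format, so that is worth verifying. A rule list like this does not scale to real spell-checking; it only shows the string-to-string mapping mechanism.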

Regarding "python - Custom text preprocessing saved in a TensorFlow model", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/72992870/
