python - Input/output/recurrent dropout layers in a BiLSTM_Classifier and how they affect the model and predictions

Tags: python tensorflow nlp lstm dropout

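I would like some understanding of the input, output and recurrent dropout in a BiLSTM_Classifier, and of how each one affects the model and its predictions.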

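from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

# vocab_size, embedding_dim, maxlen and embedding_matrix are assumed to be defined
# earlier (tokenizer vocabulary size and a pretrained embedding matrix).

# Output dropout: a separate Dropout layer on the concatenated BiLSTM output
model_out_dp = Sequential()
model_out_dp.add(Embedding(vocab_size, embedding_dim, input_length=maxlen, weights=[embedding_matrix], trainable=False))
model_out_dp.add(Bidirectional(LSTM(64)))
model_out_dp.add(Dropout(0.5))
model_out_dp.add(Dense(8, activation='softmax'))

# Input dropout: LSTM(dropout=...) masks the inputs x_t fed into the LSTM gates
model_input_dp = Sequential()
model_input_dp.add(Embedding(vocab_size, embedding_dim, input_length=maxlen, weights=[embedding_matrix], trainable=False))
model_input_dp.add(Bidirectional(LSTM(64, dropout=0.5)))
model_input_dp.add(Dense(8, activation='softmax'))

# Recurrent dropout: LSTM(recurrent_dropout=...) masks the previous hidden state h_{t-1}
model_rec_dp = Sequential()
model_rec_dp.add(Embedding(vocab_size, embedding_dim, input_length=maxlen, weights=[embedding_matrix], trainable=False))
model_rec_dp.add(Bidirectional(LSTM(64, recurrent_dropout=0.5)))
model_rec_dp.add(Dense(8, activation='softmax'))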
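The three models differ only in where the 0.5 dropout mask is applied. To make that concrete, here is a minimal NumPy sketch of a single-direction LSTM that exposes all three positions. It is an illustration under stated assumptions, not Keras's actual implementation: the helper names lstm_last_state and dropout_mask are hypothetical, one mask is sampled per sequence and reused at every timestep, and the real Keras cell may differ in details (for example it can use a separate mask per gate). A Bidirectional wrapper would run a second copy with its own weights over the reversed sequence and concatenate the two final states.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dropout_mask(size, rate, rng):
    # Inverted dropout (hypothetical helper): each unit is zeroed with probability
    # `rate`, survivors are scaled by 1 / (1 - rate) so the expected value is unchanged.
    keep = rng.random(size) >= rate
    return keep / (1.0 - rate)


def lstm_last_state(x, W, U, b, dropout=0.0, recurrent_dropout=0.0,
                    output_dropout=0.0, training=False, seed=0):
    # Toy LSTM, not the Keras code path.
    # x: [T, d_in], W: [d_in, 4*units], U: [units, 4*units], b: [4*units]
    rng = np.random.default_rng(seed)
    d_in = x.shape[1]
    units = U.shape[0]
    h = np.zeros(units)
    c = np.zeros(units)
    # Assumption: masks are sampled once per sequence and reused at every timestep;
    # at inference (training=False) they are all ones, i.e. no dropout.
    in_mask = dropout_mask(d_in, dropout, rng) if training else np.ones(d_in)
    rec_mask = dropout_mask(units, recurrent_dropout, rng) if training else np.ones(units)
    for x_t in x:
        # Input dropout masks x_t, recurrent dropout masks h_{t-1}.
        z = (x_t * in_mask) @ W + (h * rec_mask) @ U + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    # Output dropout (a Dropout layer after the LSTM) masks only the final state.
    if training:
        h = h * dropout_mask(units, output_dropout, rng)
    return h


rng = np.random.default_rng(42)
T, d_in, units = 5, 3, 4
x = rng.normal(size=(T, d_in))
W = rng.normal(size=(d_in, 4 * units)) * 0.5
U = rng.normal(size=(units, 4 * units)) * 0.5
b = np.zeros(4 * units)

h_plain = lstm_last_state(x, W, U, b)
rates = dict(dropout=0.5, recurrent_dropout=0.5, output_dropout=0.5)
h_infer = lstm_last_state(x, W, U, b, **rates)                 # dropout off
h_train = lstm_last_state(x, W, U, b, training=True, **rates)  # dropout on
print(np.allclose(h_plain, h_infer))  # True
print(h_train)
```

The practical consequence for predictions: all three masks act only during training (fit, or a call with training=True); model.predict runs with dropout disabled, so dropout changes what the weights learn rather than how a trained model is evaluated. Input and recurrent dropout perturb every timestep of the sequence, while output dropout only perturbs the final feature vector that reaches the Dense layer. In TensorFlow 2.x, a nonzero recurrent_dropout also prevents LSTM from using the cuDNN kernel on GPU, so that variant can train noticeably slower.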

Best answer

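First, we split the 'S' and 'A' rows into groups according to the rule: each S starts a new, unique 'group', followed by any number (including zero) of A rows. We also number the elements within each group in order: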

df['group'] = (df['First']=='S').cumsum()
df['el'] = df.groupby('group').cumcount()

This gives:

    First    Second                                               group    el
--  -------  -------------------------------------------------  -------  ----
 0  S        Keeping the Secret of Genetic Testing                    1     0
 1  S        What is genetic risk ?                                   2     0
 2  S        Genetic risk refers more to your chance of inh...        3     0
 3  A        3 4|||Rloc-||||||REQUIRED|||-NONE-|||0                   3     1
 4  S        People get certain disease because of genetic ...        4     0
 5  A        1 2|||Wci|||develop|||REQUIRED|||-NONE-|||0              4     1
 6  A        3 4|||Nn|||diseases|||REQUIRED|||-NONE-|||0              4     2
 7  S        How much a genetic change tells us about your ...        5     0
 8  S        If your genetic results indicate that you have...        6     0
 9  A        8 8|||ArtOrDet|||the|||REQUIRED|||-NONE-|||0             6     1
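A self-contained version of this grouping step, run on a hypothetical six-row DataFrame (the asker's real data is only partly shown above, so the short strings here are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: 'S' rows are sentences, 'A' rows are their annotations.
df = pd.DataFrame({
    "First":  ["S", "S", "A", "S", "A", "A"],
    "Second": ["s1", "s2", "a2-1", "s3", "a3-1", "a3-2"],
})

# Every 'S' increments the running count, so an S and the A rows after it share a group id.
df["group"] = (df["First"] == "S").cumsum()
# Position of each row inside its group: 0 for the S, then 1, 2, ... for its A rows.
df["el"] = df.groupby("group").cumcount()
print(df)
```

Because the group id is a running count of 'S' rows, any 'A' rows that appear before the first 'S' would end up in group 0.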

Now we set a MultiIndex on 'group' and 'el', then unstack 'el' into the column headers:

df.set_index(['group','el'])['Second'].unstack(level=1)

So it looks like:

  group  0                                                  1                                             2
-------  -------------------------------------------------  --------------------------------------------  -------------------------------------------
      1  Keeping the Secret of Genetic Testing              nan                                           nan
      2  What is genetic risk ?                             nan                                           nan
      3  Genetic risk refers more to your chance of inh...  3 4|||Rloc-||||||REQUIRED|||-NONE-|||0        nan
      4  People get certain disease because of genetic ...  1 2|||Wci|||develop|||REQUIRED|||-NONE-|||0   3 4|||Nn|||diseases|||REQUIRED|||-NONE-|||0
      5  How much a genetic change tells us about your ...  nan                                           nan
      6  If your genetic results indicate that you have...  8 8|||ArtOrDet|||the|||REQUIRED|||-NONE-|||0  nan

This is almost what you want; you can rename the columns as needed with .rename(columns={...}), and use .fillna(0) if you want to replace NaN with 0.
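The whole pipeline, including the optional .rename and .fillna steps, on the same kind of hypothetical toy data; the new column names (sentence, annotation_1, annotation_2) are only an illustration:

```python
import pandas as pd

# Hypothetical toy data, same shape as the asker's 'First'/'Second' columns.
df = pd.DataFrame({
    "First":  ["S", "S", "A", "S", "A", "A"],
    "Second": ["s1", "s2", "a2-1", "s3", "a3-1", "a3-2"],
})
df["group"] = (df["First"] == "S").cumsum()
df["el"] = df.groupby("group").cumcount()

wide = (
    df.set_index(["group", "el"])["Second"]
      .unstack(level=1)  # one row per group, one column per position 'el'
      .rename(columns={0: "sentence", 1: "annotation_1", 2: "annotation_2"})  # hypothetical names
      .fillna(0)         # groups with fewer annotations get 0 instead of NaN
)
print(wide)
```

The number of columns after unstack equals the largest group size, so a sentence with many annotations widens the whole table.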

A similar question about input/output/recurrent dropout layers in a BiLSTM_Classifier and how they affect the model and predictions can be found on Stack Overflow: https://stackoverflow.com/questions/66204478/
