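machine-learning - Concept of stationarity in reinforcement learning sequential decisions

Tags: machine-learning artificial-intelligence reinforcement-learning

The following is a text snippet on sequential decision problems from Chapter 17, Section 17.1 of Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.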

Stationarity for preferences means the following:

if two state sequences [s0, s1, s2, ...] and [s0', s1', s2', ...] begin with the same state (i.e., s0 = s0'), then the two sequences should be preference-ordered the same way as the sequences [s1, s2, ...] and [s1', s2', ...].

In English, this means that if you prefer one future to another starting tomorrow, then you should still prefer that future if it were to start today instead.

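I am having a hard time understanding the last sentence:

In English, this means that if you prefer one future to another starting tomorrow, then you should still prefer that future if it were to start today instead.

Could someone please elaborate?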

Best answer

Another definition of stationarity, from Wikipedia, may help in understanding the idea:

In mathematics and statistics, a stationary process is a stochastic process whose unconditional joint probability distribution does not change when shifted in time.

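The key idea is "does not change when shifted in time." Applied to preferences, this means a preference should be the same regardless of when it is made. That is, whether you are at day 2 (tomorrow) or at day 1 (today), your preference regarding day 3 should be the same.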
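A small numerical check may make this concrete. The sketch below is illustrative only and is not taken from the book or the original answer. It assumes each sequence is given directly as a list of per-step rewards and that utility is a weighted sum of those rewards. With exponential discounting, U([s0, s1, ...]) = R(s0) + gamma * U([s1, ...]), so prepending a shared first step cannot change which sequence is preferred. Hyperbolic discounting is swapped in purely as a counterexample whose preferences do flip. All function names and reward values are hypothetical.

```python
from fractions import Fraction  # exact arithmetic, so comparisons never suffer rounding


def utility(rewards, discount):
    """Utility of a reward sequence: sum over t of discount(t) * rewards[t]."""
    return sum(discount(t) * r for t, r in enumerate(rewards))


def exponential(t, gamma=Fraction(9, 10)):
    """Exponential discount weight gamma**t (assumed gamma = 0.9)."""
    return gamma ** t


def hyperbolic(t, k=1):
    """Hyperbolic discount weight 1 / (1 + k*t), used here only as a counterexample."""
    return Fraction(1, 1 + k * t)


def preference(a, b, discount):
    """Return 1 if a is preferred to b, -1 if b is preferred, 0 if indifferent."""
    ua, ub = utility(a, discount), utility(b, discount)
    return (ua > ub) - (ua < ub)


def is_stationary_for(a, b, shared_first_reward, discount):
    """Check the stationarity condition for one pair of futures.

    'Starting today' compares a and b directly; 'starting tomorrow' compares
    them after both are preceded by the same first step.
    """
    starting_today = preference(a, b, discount)
    starting_tomorrow = preference(
        [shared_first_reward] + a, [shared_first_reward] + b, discount
    )
    return starting_today == starting_tomorrow


# Two hypothetical futures: a smaller reward sooner vs. a larger reward later.
sooner = [10, 0]
later = [0, 16]

print(is_stationary_for(sooner, later, 0, exponential))  # preference ordering is preserved
print(is_stationary_for(sooner, later, 0, hyperbolic))   # preference ordering flips
```

Under the hyperbolic weights, `sooner` is preferred when both futures start today, but `later` is preferred once both are pushed back by one shared step. This is exactly the kind of reversal that the stationarity assumption rules out.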

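Source: this Q&A on "machine-learning - Concept of stationarity in reinforcement learning sequential decisions" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/51626985/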
