machine-learning - Kohonen self-organizing map: determining the number of neurons and grid size

Tags: machine-learning neural-network self-organizing-maps

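I have a large dataset on which I am trying to run a cluster analysis with a SOM. The dataset is huge (on the order of billions of records), and I am not sure how many neurons and what SOM grid size to start with. Any pointers to material on estimating the number of neurons and the grid size would be greatly appreciated.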

Thanks!

Best answer

Quoting the som_make function documentation from the SOM Toolbox:

It uses a heuristic formula of 'munits = 5*dlen^0.54321'. The 'mapsize' argument influences the final number of map units: a 'big' map has x4 the default number of map units and a 'small' map has x0.25 the default number of map units.

Here dlen is the number of records in the dataset.
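The quoted heuristic is easy to evaluate before training anything. The sketch below computes munits = 5*dlen^0.54321 and applies the 'small' (x0.25) and 'big' (x4) multipliers from the quote. Rounding up with math.ceil is an assumption of this sketch, not something the quoted documentation states. The helper grid_shape is hypothetical: it just splits munits into rows and columns for a side-length ratio you supply. To my understanding the SOM Toolbox instead derives that ratio from the two largest principal-component eigenvalues of the data, so treat ratio=1.0 (a square grid) as a placeholder.

```python
import math

# Multipliers for the 'mapsize' argument, as described in the quoted som_make docs.
MAPSIZE_SCALE = {"small": 0.25, "normal": 1.0, "big": 4.0}


def som_munits(dlen, mapsize="normal"):
    """Number of map units from the heuristic munits = 5 * dlen**0.54321.

    Rounding up (math.ceil) is an assumption of this sketch.
    """
    if dlen < 1:
        raise ValueError("dlen must be a positive number of records")
    if mapsize not in MAPSIZE_SCALE:
        raise ValueError("mapsize must be 'small', 'normal' or 'big'")
    munits = 5 * dlen ** 0.54321 * MAPSIZE_SCALE[mapsize]
    return max(1, math.ceil(munits))


def grid_shape(munits, ratio=1.0):
    """Hypothetical helper: split munits into (rows, cols) with rows/cols ~= ratio."""
    # rows * cols = munits and rows / cols = ratio  =>  cols = sqrt(munits / ratio)
    cols = max(1, round(math.sqrt(munits / ratio)))
    rows = max(1, round(munits / cols))
    return rows, cols


if __name__ == "__main__":
    for dlen in (10**4, 10**6, 10**9):
        n = som_munits(dlen)
        print(dlen, n, grid_shape(n))
```

For dlen = 10**9 the heuristic already asks for roughly 3.9e5 map units, which gives a sense of how large a map it implies at the scale described in the question.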

You can also read about the classic WEBSOM work, which addresses the problem of very large datasets: http://www.cs.indiana.edu/~bmarkine/oral/self-organization-of-a.pdf http://websom.hut.fi/websom/doc/ps/Lagus04Infosci.pdf

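Keep in mind that the map size is also a parameter, and an application-specific one: it depends on what you want to do with the resulting clusters. A large map produces many small but "compact" clusters (the records assigned to each cluster are very similar to each other). A small map produces fewer but more general clusters. A "right number of clusters" does not exist, especially for real-world datasets. It all depends on the level of detail at which you want to examine your data.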
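The large-map versus small-map trade-off can be seen on toy data. This is a minimal pure-Python online SOM (rectangular grid, Gaussian neighbourhood, linearly decaying learning rate and radius), written only for illustration; it is not the SOM Toolbox's training algorithm, and the three Gaussian blobs are synthetic data made up for the demo. It trains a 2x2 and an 8x8 map on the same records and reports how many units end up as a best-matching unit and how many records each occupied unit holds on average.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(weights[k], x)))


def train_som(data, rows, cols, epochs=5, lr0=0.5, seed=0):
    """Minimal online SOM on a rectangular grid (illustrative sketch only)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each unit's weight vector from a randomly chosen record.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    coords = [(i // cols, i % cols) for i in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            br, bc = coords[best_matching_unit(weights, x)]
            for k, (r, c) in enumerate(coords):
                # Gaussian neighbourhood measured on the grid, not in data space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma * sigma))
                w = weights[k]
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights


def records_per_unit(weights, data):
    """Map each occupied unit index to the number of records assigned to it."""
    counts = {}
    for x in data:
        k = best_matching_unit(weights, x)
        counts[k] = counts.get(k, 0) + 1
    return counts


# Synthetic data: three 2-D Gaussian blobs of 100 records each.
rng = random.Random(42)
data = [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)]
        for cx, cy in [(0, 0), (6, 0), (3, 5)] for _ in range(100)]

summary = {}
for rows, cols in [(2, 2), (8, 8)]:
    counts = records_per_unit(train_som(data, rows, cols), data)
    occupied = len(counts)
    mean = len(data) / occupied
    summary[(rows, cols)] = (occupied, mean)
    print(f"{rows}x{cols} map: {occupied} occupied units, {mean:.1f} records per occupied unit")
```

The 8x8 map splits the same records across many more units, each holding fewer and more similar records, while the 2x2 map lumps them into a handful of broad groups. Neither is "correct"; which one is useful depends on the level of detail you need.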

Original question on Stack Overflow: https://stackoverflow.com/questions/19163214/
