docker - NiFi 1.6.0 memory leak

Tags: docker memory-leaks apache-nifi

We are running NiFi 1.6.0 in a Docker container in production and appear to have hit a memory leak.

Once started, the application runs fine; however, after 4-5 days the memory consumption on the host keeps increasing. When checked in the NiFi cluster UI, JVM heap usage is barely around 30%, yet memory at the OS level reaches 80-90%.

Running the docker stats command, we found that it is the NiFi Docker container that is consuming the memory.
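(For reference, a one-shot snapshot of per-container memory can be taken with docker stats itself; the container name nifi below is only an illustrative assumption:)

# single snapshot instead of a live stream; "nifi" is a hypothetical container name
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}" nifi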

After collecting JMX metrics, we found that the RSS memory keeps growing. What could be the potential cause of this? In the JVM tab of the cluster dialog, young GC also seems to happen in a timely manner, while the old GC count shows 0.

How can we identify what is causing the RSS memory to grow?
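(One generic way to narrow this down — a sketch, not part of the original question: it assumes the JVM was started with -XX:NativeMemoryTracking=summary, that jcmd and ps are available in the image, and a hypothetical container name nifi — is to compare the JVM's own accounting with the OS-level RSS:)

# list JVM processes inside the container to find the NiFi pid
docker exec nifi jcmd -l
# JVM-side accounting: heap, metaspace, threads, etc. (requires NMT enabled at startup)
docker exec nifi jcmd <pid> VM.native_memory summary
# OS-side view: resident set size of the same process
docker exec nifi ps -o pid,rss,vsz -p <pid>

If the NMT total stays flat while RSS keeps climbing, the growth is outside the JVM's tracked memory (or, as the answer below suggests, an artifact of how Docker reports memory).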

Best Answer

You would need to reproduce this in a non-Docker environment, because with Docker the memory is known to rise.
As I explained in "Difference between Resident Set Size (RSS) and Java total committed memory (NMT) for a JVM running in Docker container", Docker has some bugs (such as issue 10824 and issue 15020) that prevent accurate reporting of the memory consumed by a Java process inside a Docker container.

That is why a plugin like signalfx/docker-collectd-plugin mentioned (two weeks ago), in its PR (Pull Request) 35, that it would "deduct the cache figure from the memory usage percentage metric":

Currently the calculation for memory usage of a container/cgroup being returned to SignalFX includes the Linux page cache.
This is generally considered to be incorrect, and may lead people to chase phantom memory leaks in their application.

For a demonstration on why the current calculation is incorrect, you can run the following to see how I/O usage influences the overall memory usage in a cgroup:

docker run --rm -ti alpine                        # start a throwaway container; the rest runs inside it
cat /sys/fs/cgroup/memory/memory.stat             # note the "cache" field
cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # baseline usage
dd if=/dev/zero of=/tmp/myfile bs=1M count=100    # write a 100MB file (lands in the page cache)
cat /sys/fs/cgroup/memory/memory.stat             # "cache" has grown by ~100MB
cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # usage rises ~100MB with no new anonymous memory

You should see that the usage_in_bytes value rises by 100MB just from creating a 100MB file. That file hasn't been loaded into anonymous memory by an application, but because it's now in the page cache, the container memory usage is appearing to be higher.
Deducting the cache figure in memory.stat from the usage_in_bytes shows that the genuine use of anonymous memory hasn't risen.

The signalFX metric now differs from what is seen when you run docker stats which uses the calculation I have here.
It seems like knowing the page cache use for a container could be useful (though I am struggling to think of when), but knowing it as part of an overall percentage usage of the cgroup isn't useful, since it then disguises your actual RSS memory use.
In a garbage collected application with a max heap size as large, or larger than the cgroup memory limit (e.g the -Xmx parameter for java, or .NET core in server mode), the tendency will be for the percentage to get close to 100% and then just hover there, assuming the runtime can see the cgroup memory limit properly.
If you are using the Smart Agent, I would recommend using the docker-container-stats monitor (to which I will make the same modification to exclude cache memory).
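To apply the deduction the PR describes by hand, here is a minimal sketch (run inside the container, assuming the cgroup v1 paths used in the demonstration above):

# genuine anonymous-memory use = usage_in_bytes minus the page cache
CACHE=$(awk '$1=="cache"{print $2}' /sys/fs/cgroup/memory/memory.stat)
USAGE=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
echo "anonymous memory (bytes): $((USAGE - CACHE))"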

Regarding docker - NiFi 1.6.0 memory leak, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53057052/

Related articles:

docker - Why can't I access my NiFi flow's open HTTP port through Docker?

java - Streaming large data into flowfiles in Apache NiFi without OOM

apache-nifi - How to fix flowfiles expiring prematurely in Apache NiFi?

java - java.lang.NoSuchFieldError: SIGNING_REGION on startup in ECS, does not happen locally

c++ - Detecting memory leaks in an MFC application

c++ - Valgrind output with addresses and question marks?

java - Finding memory leaks in a Spring MVC application

docker - Do privileged containers respect CPU limits?

docker - Dockerfile for a GitLab container

networking - Multicast with Docker Swarm and overlay networks