I have a busy server running Percona 5.6.34-79.1-1.xenial on Ubuntu 16.04. It works well, but every few weeks mysqld gets killed by the out-of-memory (OOM) killer, and I can't figure out why.
root@master02:~# grep Out /var/log/syslog
Apr 6 13:37:03 master02 kernel: [17420955.874564] Out of memory: Kill process 36138 (mysqld) score 659 or sacrifice child
So it was killed at 13:37:03.
However, just 2 seconds earlier it was using about 110 GB of RAM (the system has ~160 GB), and about 55 GB was still available:
root@master02:~# cat /root/logs/free/free-2017-04-06-13\:36\:01.log
total used free shared buff/cache available
Mem: 165050752 109560372 593508 189240 54896872 54434632
Swap: 0 0 0
root@master02:~# cat /root/logs/free/free-2017-04-06-13\:37\:01.log
total used free shared buff/cache available
Mem: 165050752 109582416 602704 189624 54865632 54412072
Swap: 0 0 0
root@master02:~# cat /root/logs/free/free-2017-04-06-13\:38\:01.log
total used free shared buff/cache available
Mem: 165050752 17982728 92226488 189200 54841536 146007904
Swap: 0 0 0
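To put these `free` numbers in perspective, here is a small Python sketch that parses the `Mem:` rows quoted above and converts them to GiB. It assumes the logs were written by plain `free`, which reports KiB by default; the helper names are made up for illustration. Note that the `free` column is only about 0.6 GiB just before the kill, while `available` is the kernel's estimate of how much memory new applications could get without swapping, most of it reclaimable page cache.

```python
# Parse the "Mem:" row of `free` output; values are assumed to be KiB (free's default unit).
FIELDS = ("total", "used", "free", "shared", "buff_cache", "available")
KIB_PER_GIB = 1024 ** 2


def parse_mem_row(row: str) -> dict:
    """Turn a 'Mem: ...' line from free into a {column: KiB} dict."""
    label, *values = row.split()
    if label != "Mem:" or len(values) != len(FIELDS):
        raise ValueError(f"not a free 'Mem:' row: {row!r}")
    return dict(zip(FIELDS, map(int, values)))


def kib_to_gib(kib: int) -> float:
    return kib / KIB_PER_GIB


# Rows copied from the 13:37:01 log (just before the kill) and the 13:38:01 log (just after).
before = parse_mem_row("Mem: 165050752 109582416 602704 189624 54865632 54412072")
after = parse_mem_row("Mem: 165050752 17982728 92226488 189200 54841536 146007904")

for name in ("total", "used", "free", "available"):
    print(f"{name:>9}: {kib_to_gib(before[name]):7.2f} GiB -> {kib_to_gib(after[name]):7.2f} GiB")
```

In GiB, "used" goes from roughly 104.5 to roughly 17.2 across the kill, out of about 157.4 GiB total.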
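my.cnf sets "innodb-buffer-pool-size = 130G". I don't think it's likely that mysqld allocated an extra 50 GB within 2 seconds and got killed (though of course I could be wrong).
Here is the dmesg output showing the full OOM report. Is there some memory-allocation problem here that can be tuned? Any hints would be much appreciated.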
[17420955.874279] mysqld invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
[17420955.874282] mysqld cpuset=/ mems_allowed=0-1
[17420955.874287] CPU: 4 PID: 36138 Comm: mysqld Not tainted 4.4.0-59-generic #80-Ubuntu
[17420955.874288] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
[17420955.874290] 0000000000000286 00000000dba8c85b ffff88142f4bbaf0 ffffffff813f7583
[17420955.874292] ffff88142f4bbcc8 ffff8827ac8db800 ffff88142f4bbb60 ffffffff8120ad5e
[17420955.874295] ffffffff81cd2dc7 0000000000000000 ffffffff81e67760 0000000000000206
[17420955.874297] Call Trace:
[17420955.874304] [<ffffffff813f7583>] dump_stack+0x63/0x90
[17420955.874307] [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
[17420955.874311] [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
[17420955.874312] [<ffffffff81192b49>] out_of_memory+0x219/0x460
[17420955.874315] [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
[17420955.874317] [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
[17420955.874319] [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
[17420955.874323] [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
[17420955.874326] [<ffffffff811c164d>] ? handle_mm_fault+0xcbd/0x1820
[17420955.874328] [<ffffffff810805a0>] _do_fork+0x80/0x360
[17420955.874329] [<ffffffff81080929>] SyS_clone+0x19/0x20
[17420955.874333] [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[17420955.874343] Mem-Info:
[17420955.874354] active_anon:27160197 inactive_anon:28926 isolated_anon:0
active_file:5497699 inactive_file:7563747 isolated_file:0
unevictable:914 dirty:2486 writeback:0 unstable:0
slab_reclaimable:556865 slab_unreclaimable:45056
mapped:20876 shmem:47414 pagetables:71927 bounce:0
free:154548 free_pcp:64 free_cma:0
[17420955.874357] Node 0 DMA free:15904kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[17420955.874361] lowmem_reserve[]: 0 3706 80497 80497 80497
[17420955.874370] Node 0 DMA32 free:311248kB min:2072kB low:2588kB high:3108kB active_anon:3411180kB inactive_anon:1348kB active_file:4kB inactive_file:8kB unevictable:280kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3835152kB mlocked:280kB dirty:0kB writeback:0kB mapped:524kB shmem:2308kB slab_reclaimable:65844kB slab_unreclaimable:14292kB kernel_stack:1760kB pagetables:13784kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[17420955.874373] lowmem_reserve[]: 0 0 76791 76791 76791
[17420955.874376] Node 0 Normal free:143060kB min:42940kB low:53672kB high:64408kB active_anon:62267420kB inactive_anon:31360kB active_file:7207656kB inactive_file:7550284kB unevictable:3296kB isolated(anon):0kB isolated(file):0kB present:79953920kB managed:78634400kB mlocked:3296kB dirty:3116kB writeback:0kB mapped:18940kB shmem:49312kB slab_reclaimable:838024kB slab_unreclaimable:78600kB kernel_stack:51840kB pagetables:159192kB unstable:0kB bounce:0kB free_pcp:136kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? no
[17420955.874380] lowmem_reserve[]: 0 0 0 0 0
[17420955.874383] Node 1 Normal free:147980kB min:45088kB low:56360kB high:67632kB active_anon:42962188kB inactive_anon:82996kB active_file:14783136kB inactive_file:22704696kB unevictable:80kB isolated(anon):0kB isolated(file):0kB present:83886080kB managed:82565296kB mlocked:80kB dirty:6828kB writeback:0kB mapped:64040kB shmem:138036kB slab_reclaimable:1323592kB slab_unreclaimable:87332kB kernel_stack:48624kB pagetables:114732kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[17420955.874387] lowmem_reserve[]: 0 0 0 0 0
[17420955.874390] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
[17420955.874398] Node 0 DMA32: 13480*4kB (UME) 6994*8kB (UME) 2710*16kB (UME) 714*32kB (UMEH) 354*64kB (UMEH) 225*128kB (UMEH) 107*256kB (UMEH) 52*512kB (UME) 29*1024kB (UMEH) 0*2048kB 0*4096kB = 311248kB
[17420955.874408] Node 0 Normal: 29953*4kB (UMEH) 2985*8kB (UMEH) 1*16kB (H) 0*32kB 2*64kB (H) 2*128kB (H) 2*256kB (H) 0*512kB 1*1024kB (H) 0*2048kB 0*4096kB = 145628kB
[17420955.874416] Node 1 Normal: 36438*4kB (UME) 412*8kB (UMEH) 2*16kB (H) 2*32kB (H) 0*64kB 1*128kB (H) 1*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 150552kB
[17420955.874425] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[17420955.874427] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[17420955.874428] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[17420955.874429] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[17420955.874430] 13109543 total pagecache pages
[17420955.874431] 0 pages in swap cache
[17420955.874440] Swap cache stats: add 0, delete 0, find 0/0
[17420955.874441] Free swap = 0kB
[17420955.874442] Total swap = 0kB
[17420955.874443] 41942941 pages RAM
[17420955.874444] 0 pages HighMem/MovableOnly
[17420955.874444] 680253 pages reserved
[17420955.874445] 0 pages cma reserved
[17420955.874446] 0 pages hwpoisoned
[17420955.874447] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[17420955.874457] [ 677] 0 677 38915 23334 80 3 0 0 systemd-journal
[17420955.874461] [ 698] 0 698 25742 47 17 3 0 0 lvmetad
[17420955.874469] [ 729] 0 729 11545 1219 23 3 0 -1000 systemd-udevd
[17420955.874472] [ 1112] 109 1112 25081 492 19 3 0 0 systemd-timesyn
[17420955.874474] [ 1333] 0 1333 4030 637 12 3 0 0 dhclient
[17420955.874475] [ 1541] 0 1541 6932 547 19 3 0 0 cron
[17420955.874477] [ 1552] 0 1552 365572 5403 79 7 0 0 snapd
[17420955.874479] [ 1559] 102 1559 10725 667 26 3 0 -900 dbus-daemon
[17420955.874480] [ 1570] 0 1570 77473 728 21 3 0 0 lxcfs
[17420955.874482] [ 1576] 101 1576 64099 1008 27 3 0 0 rsyslogd
[17420955.874484] [ 1578] 106 1578 1884 403 9 3 0 0 vnstatd
[17420955.874486] [ 1580] 0 1580 1100 322 8 3 0 0 acpid
[17420955.874488] [ 1582] 0 1582 68674 1100 37 3 0 0 accounts-daemon
[17420955.874489] [ 1584] 0 1584 6511 378 18 3 0 0 atd
[17420955.874491] [ 1593] 0 1593 16380 700 36 3 0 -1000 sshd
[17420955.874493] [ 1595] 0 1595 7157 635 18 3 0 0 systemd-logind
[17420955.874495] [ 1599] 0 1599 7470 139 19 3 0 0 cgmanager
[17420955.874497] [ 1618] 0 1618 3344 116 11 3 0 0 mdadm
[17420955.874498] [ 1622] 0 1622 1306 441 8 3 0 0 iscsid
[17420955.874500] [ 1623] 0 1623 1431 916 8 3 0 -17 iscsid
[17420955.874511] [ 1717] 0 1717 3619 388 12 3 0 0 agetty
[17420955.874513] [ 1718] 0 1718 3665 343 12 3 0 0 agetty
[17420955.874515] [ 1731] 0 1731 5025 653 15 3 0 0 irqbalance
[17420955.874517] [ 1744] 0 1744 69272 719 39 3 0 0 polkitd
[17420955.874519] [ 2484] 1001 2484 11312 218 27 3 0 0 systemd
[17420955.874520] [ 2488] 1001 2488 15805 475 34 3 0 0 (sd-pam)
[17420955.874523] [ 6289] 0 6289 7718 1364 19 3 0 0 tmux
[17420955.874531] [ 6290] 0 6290 5381 924 15 3 0 0 bash
[17420955.874533] [ 6306] 0 6306 2158 395 10 3 0 0 mysqld_safe
[17420955.874537] [36138] 107 36138 43942177 27121187 70971 157 0 0 mysqld
[17420955.874539] [78610] 108 78610 5992 577 16 3 0 0 nrpe
[17420955.874542] [19441] 0 19441 24876 1740 54 3 0 0 sshd
[17420955.874543] [19447] 1008 19447 11312 1147 27 3 0 0 systemd
[17420955.874555] [19449] 1008 19449 15817 487 34 3 0 0 (sd-pam)
[17420955.874557] [19575] 1008 19575 24876 834 51 3 0 0 sshd
[17420955.874558] [19576] 0 19576 14970 933 34 3 0 0 sudo
[17420955.874560] [19577] 0 19577 5388 1343 16 3 0 0 bash
[17420955.874561] [26883] 0 26883 8430 5113 22 5 0 0 mysqld_exporter
[17420955.874564] Out of memory: Kill process 36138 (mysqld) score 659 or sacrifice child
[17420955.890336] Killed process 36138 (mysqld) total-vm:175768708kB, anon-rss:108470860kB, file-rss:13888kB
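The header line says `order=2`, meaning the kernel needed 2^2 = 4 physically contiguous pages (16 kB), and the call trace shows the request came from `clone()` via `copy_process`, i.e. while creating a new process or thread. The Python sketch below tallies the buddy-allocator free lists of the two Normal zones from the dump, assuming 4 kB pages as the `*4kB` entries indicate; the function and variable names are hypothetical. It suggests each zone still had roughly 140 MB free, but almost all of it in 4 kB and 8 kB blocks, and every larger block is tagged `(H)`, i.e. held in the high-order atomic reserve. That hints at fragmentation rather than a plain shortage of bytes, but treat it as a lead to investigate, not a diagnosis.

```python
import re

# Buddy-allocator free lists of the two Normal zones, copied from the OOM dump above.
NORMAL_ZONES = {
    "Node 0 Normal": "29953*4kB (UMEH) 2985*8kB (UMEH) 1*16kB (H) 0*32kB 2*64kB (H) "
                     "2*128kB (H) 2*256kB (H) 0*512kB 1*1024kB (H) 0*2048kB 0*4096kB = 145628kB",
    "Node 1 Normal": "36438*4kB (UME) 412*8kB (UMEH) 2*16kB (H) 2*32kB (H) 0*64kB "
                     "1*128kB (H) 1*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 150552kB",
}

PAGE_KB = 4          # the smallest block size listed is 4kB, i.e. one page
REQUEST_ORDER = 2    # "order=2" in the oom-killer header: 2**2 contiguous pages
BLOCK = re.compile(r"(\d+)\*(\d+)kB")  # "<count>*<block size>kB" entries; the "= NkB" total has no '*'


def free_kb(free_list: str, min_order: int = 0) -> int:
    """Sum free memory (kB) held in blocks of at least 2**min_order pages."""
    min_block_kb = PAGE_KB * 2 ** min_order
    return sum(int(count) * int(size_kb)
               for count, size_kb in BLOCK.findall(free_list)
               if int(size_kb) >= min_block_kb)


for zone, free_list in NORMAL_ZONES.items():
    total = free_kb(free_list)
    big_enough = free_kb(free_list, REQUEST_ORDER)
    print(f"{zone}: {total} kB free, {big_enough} kB in blocks >= "
          f"{PAGE_KB * 2 ** REQUEST_ORDER} kB")
```

For Node 0 Normal this gives 145628 kB free (matching the kernel's own total) but only 1936 kB in blocks of 16 kB or more; for Node 1 Normal, 150552 kB and 1504 kB.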
Here is my .cnf file:
[mysql]
# CLIENT #
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld]
# GENERAL #
user = mysql
default-storage-engine = InnoDB
socket = /var/run/mysqld/mysqld.sock
pid-file = /var/run/mysqld/mysqld.pid
# MyISAM #
key-buffer-size = 32M
myisam-recover = FORCE,BACKUP
# SAFETY #
max-allowed-packet = 16M
max-connect-errors = 1000000
# DATA STORAGE #
datadir = /var/lib/mysql/
# BINARY LOGGING #
log-bin = /var/lib/mysql-binlogs/mysql-bin
expire-logs-days = 7
sync-binlog = 1
binlog-format = MIXED
# REPLICATION #
server-id = 1
auto_increment_offset = 1
# total number of master servers
auto_increment_increment = 2
log-slave-updates = 1
relay-log = /var/lib/mysql-binlogs/relay-bin
slave-net-timeout = 60
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 4096
# INNODB #
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 512M
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table = 1
innodb-buffer-pool-size = 125G
innodb_large_prefix = 1
innodb_file_format = Barracuda
# LOGGING #
log-error = /var/log/mysql/mysql-error.log
log-queries-not-using-indexes = 0
slow-query-log = 0
slow-query-log-file = /var/log/mysql/mysql-slow.log
# OTHER SETUP #
character_set_server = utf8mb4
collation-server = utf8mb4_unicode_ci
init-connect = 'SET NAMES utf8mb4'
skip-name-resolve = 1
max_connections = 12288
wait_timeout = 120
connect_timeout = 30
interactive_timeout = 120
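One rough way to sanity-check these settings is a worst-case estimate: global buffers plus per-connection buffers multiplied by max_connections. This is only a back-of-the-envelope heuristic, not how MySQL actually allocates memory; many per-session buffers are only allocated when a query needs them, and most connections never use all of them. Only innodb-buffer-pool-size, key-buffer-size, max_connections and tmp-table-size come from the file above. The per-thread sizes are assumptions (MySQL 5.6 defaults, since the file doesn't set them) and should be checked with SHOW VARIABLES on the real server.

```python
# Worst-case memory heuristic for the my.cnf above: a rough upper bound, not an exact model.
KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3

# Values taken from the my.cnf above.
innodb_buffer_pool_size = 125 * GIB
key_buffer_size = 32 * MIB
max_connections = 12288
tmp_table_size = 32 * MIB  # max-heap-table-size is also 32M, so this is the in-memory limit

# ASSUMPTION: not set in my.cnf, so assumed to be MySQL 5.6 defaults.
per_thread_buffers = {
    "sort_buffer_size": 256 * KIB,
    "join_buffer_size": 256 * KIB,
    "read_buffer_size": 128 * KIB,
    "read_rnd_buffer_size": 256 * KIB,
    "thread_stack": 256 * KIB,
    "binlog_cache_size": 32 * KIB,
}

per_thread = sum(per_thread_buffers.values())
global_buffers = innodb_buffer_pool_size + key_buffer_size
worst_case = global_buffers + max_connections * per_thread
# Even more pessimistic: every connection also holds one in-memory temp table at the limit.
worst_case_with_tmp = worst_case + max_connections * tmp_table_size

print(f"per-connection buffers : {per_thread / MIB:.2f} MiB")
print(f"worst case             : {worst_case / GIB:.2f} GiB")
print(f"worst case + tmp tables: {worst_case_with_tmp / GIB:.2f} GiB")
```

Under these assumptions the estimate is roughly 1.16 MiB per connection and about 138.9 GiB in total, which fits in the ~157 GiB of RAM; the temp-table variant (about 522.9 GiB) shows how much max_connections = 12288 amplifies any per-connection allocation.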
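Best answer
I assume you're running this alongside PHP and Apache. Most likely you'll find the cause of the problem in one of those applications' log files.
Note the time of the crash in the MySQL log, then look through the other log files for the same period; you should find clues that eventually lead you to the answer.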
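As an illustration of that approach, here is a minimal Python sketch that keeps only the log lines whose timestamp falls within a window around the kill. It assumes the classic syslog prefix ("Apr 6 13:37:03 host ..."), which carries no year, so the year (2017, taken from the free log file names) has to be supplied. PHP and Apache logs use other timestamp formats and would need their own parser. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator


def parse_syslog_time(line: str, year: int) -> datetime:
    """Parse the 'Mon D HH:MM:SS' prefix of a classic syslog line (the log has no year)."""
    month, day, clock = line.split(None, 3)[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def lines_near(lines: Iterable[str], incident: datetime,
               window: timedelta, year: int) -> Iterator[str]:
    """Yield lines stamped within +/- window of the incident; skip lines without a valid prefix."""
    for line in lines:
        try:
            stamp = parse_syslog_time(line, year)
        except ValueError:
            continue
        if abs(stamp - incident) <= window:
            yield line


# The kill time, taken from the kernel line quoted in the question.
oom_line = ("Apr 6 13:37:03 master02 kernel: [17420955.874564] Out of memory: "
            "Kill process 36138 (mysqld) score 659 or sacrifice child")
incident = parse_syslog_time(oom_line, 2017)

# Example: print everything in syslog within five minutes of the kill.
# with open("/var/log/syslog") as log:
#     for line in lines_near(log, incident, timedelta(minutes=5), 2017):
#         print(line, end="")
```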
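I should also point out that when the system runs out of memory, the OOM killer chooses which service to kill, and the one it kills may not be the one causing the problem.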
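To see why mysqld was the one chosen, here is a simplified Python re-creation of how, as far as I can tell, a 4.4-era kernel computes the score printed in the kill message: resident pages plus page-table and swap entries, adjusted by oom_score_adj (with a small discount for processes holding CAP_SYS_ADMIN), scaled to 0..1000 of total memory. Total memory is assumed to be the sum of the zones' "managed:" values from the dump, since there is no swap. It's a sketch for intuition, not the kernel code, and later kernels changed the details. Fed mysqld's row from the process table, it reproduces the logged score of 659, while every other process in the table scores 0, so the killer simply picks the biggest process, whether or not it caused the pressure.

```python
PAGE_KB = 4

# Sum of the "managed:" values (kB) of the DMA, DMA32 and both Normal zones in the dump.
MANAGED_KB = 15904 + 3835152 + 78634400 + 82565296
SWAP_PAGES = 0  # "Total swap = 0kB"
TOTAL_PAGES = MANAGED_KB // PAGE_KB + SWAP_PAGES

OOM_SCORE_ADJ_MIN = -1000  # processes with this adjustment are never chosen


def oom_score(rss: int, nr_ptes: int, nr_pmds: int, swapents: int,
              oom_score_adj: int = 0, cap_sys_admin: bool = False) -> int:
    """Approximate the 'score' in the kill message (simplified 4.4-era oom_badness)."""
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return 0
    points = rss + swapents + nr_ptes + nr_pmds
    if cap_sys_admin:
        points -= points * 3 // 100  # small discount for root-capable tasks
    points += oom_score_adj * (TOTAL_PAGES // 1000)
    points = max(points, 1)
    return points * 1000 // TOTAL_PAGES


# mysqld's row from the table: rss=27121187, nr_ptes=70971, nr_pmds=157, swapents=0, adj=0.
print("mysqld score:", oom_score(27121187, 70971, 157, 0))

# Cross-check: rss pages * 4 kB equals anon-rss + file-rss from the "Killed process" line.
assert 27121187 * PAGE_KB == 108470860 + 13888
```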
There is a similar question on Stack Overflow, "mysqld out of memory but plenty of memory (?)": https://stackoverflow.com/questions/43259136/