先说一下环境:
一、硬件
8台服务器做的超融合架构,软件存储池,
每台服务器是96G内存,两颗Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz,32线程。
每台服务器是1T SSD加三块4T SATA,无raid卡,做的两副本。
服务器集成2个1G的网口,做端口聚合,对外提供服务。
另加一块四端口千兆网卡,做端口聚合,做内部数据传输。
二、软件
基础系统是Ubuntu 16.04 LTS,基于KVM,libvirt的虚拟化管理平台。
虚拟机:windows7中文专业版,32位,
每台虚拟机分配3071M内存,两颗CPU(2 sockets) 。
libvirt:1.3.1
三、现象
服务器上的windows,不定时的会卡死,多数每天发生一次到两次,少数三四天卡死一次,极少数会间隔时间更久,
多数发生在电脑空闲时间,
能上互联网的虚拟机多发,
此外无其他明显规律。
四、问题排查
1、先是排查了服务器配置,终端的配置,网络
因为这套架构已经在多地部署,并稳定运行多年,排查完客户的现场网络后,此原因基本排除。
2、后来怀疑存储的问题
使用iostat持续检测两台服务器的三个小时。
iostat -x -m 1
从记录下的日志看服务器存储的平均响应时间一直在正常范围,读写负载也不高。
3、检查服务器的日志
默认libvirt的日志是关闭的,编辑/etc/libvirt/libvirtd.conf,打开warnning级别的日志记录。
log_level = 3
log_outputs="3:file:/var/log/libvirt/libvirtd.log"
从/var/log/libvirt/libvirtd.log看libvirt日志:
2017-03-15 06:03:08.894+0000: 13919: info : libvirt version: 1.3.1, package: 1ubuntu10.5 (Stefan Bader <stefan.bader@canonical.com> Thu, 06 Oct 2016 13:07:20 +0200)
2017-03-15 06:03:08.894+0000: 13919: info : hostname: sc7
2017-03-15 06:03:08.894+0000: 13919: warning : virStorageBackendVolOpen:1535 : ignoring permission error for '/media/vmpool/win7_test.img'
2017-03-15 06:03:48.180+0000: 13914: error : qemuMonitorIO:689 : internal error: End of file from monitor
2017-03-15 06:03:48.180+0000: 13914: error : virNetClientProgramDispatchError:177 : Cannot open log file: '/var/log/libvirt/qemu/vm105.log': Device or resource busy
2017-03-15 06:04:22.540+0000: 13918: error : qemuMigrationCheckJobStatus:2641 : operation failed: migration job: unexpectedly failed
2017-03-15 06:04:57.546+0000: 13918: warning : virStorageBackendVolOpen:1535 : ignoring permission error for '/media/vmpool/win7_test.img'
2017-03-15 06:06:12.507+0000: 13916: error : qemuMigrationCheckJobStatus:2641 : operation failed: migration job: unexpectedly failed
2017-03-15 06:11:21.182+0000: 13918: warning : virStorageBackendVolOpen:1535 : ignoring permission error for '/media/vmpool/win7_test.img'
2017-03-15 06:13:08.112+0000: 13920: error : qemuMigrationCheckJobStatus:2641 : operation failed: migration job: unexpectedly failed
2017-03-15 06:15:09.467+0000: 13919: warning : virStorageBackendVolOpen:1535 : ignoring permission error for '/media/vmpool/win7_test.img'
2017-03-15 06:35:17.680+0000: 13914: error : qemuMonitorIO:689 : internal error: End of file from monitor
2017-03-15 06:35:17.680+0000: 13914: error : virNetClientProgramDispatchError:177 : Cannot open log file: '/var/log/libvirt/qemu/vm102.log': Device or resource busy
2017-03-16 00:33:08.324+0000: 13914: error : virNetClientProgramDispatchError:177 : Cannot open log file: '/var/log/libvirt/qemu/vm105.log': Device or resource busy
2017-03-16 02:47:26.712+0000: 13919: warning : virStorageBackendVolOpen:1535 : ignoring permission error for '/media/vmpool/win7_test.img'
2017-03-16 02:52:03.184+0000: 13914: error : qemuMonitorIO:689 : internal error: End of file from monitor
2017-03-16 02:52:03.184+0000: 13914: error : virNetClientProgramDispatchError:177 : Cannot open log file: '/var/log/libvirt/qemu/vm113.log': Device or resource busy
2017-03-16 02:52:26.683+0000: 13919: warning : virStorageBackendVolOpen:1535 : ignoring permission error for '/media/vmpool/win7_test.img'
看到日志中有一些virStorageBackendVolOpen,virNetClientProgramDispatchError,qemuMonitorIO的报错(重复部分略去),但都不是关键问题,不会导致虚拟机死机。
从/var/log/libvirt/qemu/里看虚拟机的日志:
2017-03-16 02:55:46.227+0000: starting up libvirt version: 1.3.1, package: 1ubuntu10.5 (Stefan Bader <stefan.bader@canonical.com> Thu, 06 Oct 2016 13:07:20 +0200), qemu version: 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.8), hostname: sc7
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=spice /usr/bin/kvm-spice -name vm113 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -cpu SandyBridge,+pdpe1gb,+osxsave,+dca,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 3072 -realtime mlock=off -smp 2,sockets=1,cores=2,threads=1 -uuid 3fec386e-23ee-4853-b46a-37feb9ee3cf9 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-vm113/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=cd,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/media/vmpool/vm113.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/media/vmdpool/vmd113.qcow2,format=qcow2,if=none,id=drive-virtio-disk1 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 -drive if=none,id=drive-ide0-1-1,readonly=on -device ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=42 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a6:3a:68,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -spice port=5908,addr=0.0.0.0,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x6 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
warning: host doesn't support requested feature: CPUID.01H:EDX.ds [bit 21]
warning: host doesn't support requested feature: CPUID.01H:EDX.acpi [bit 22]
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:EDX.tm [bit 29]
warning: host doesn't support requested feature: CPUID.01H:EDX.pbe [bit 31]
warning: host doesn't support requested feature: CPUID.01H:ECX.dtes64 [bit 2]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.ds_cpl [bit 4]
warning: host doesn't support requested feature: CPUID.01H:ECX.smx [bit 6]
warning: host doesn't support requested feature: CPUID.01H:ECX.est [bit 7]
warning: host doesn't support requested feature: CPUID.01H:ECX.tm2 [bit 8]
warning: host doesn't support requested feature: CPUID.01H:ECX.xtpr [bit 14]
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:ECX.dca [bit 18]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
warning: host doesn't support requested feature: CPUID.01H:EDX.ds [bit 21]
warning: host doesn't support requested feature: CPUID.01H:EDX.acpi [bit 22]
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:EDX.tm [bit 29]
warning: host doesn't support requested feature: CPUID.01H:EDX.pbe [bit 31]
warning: host doesn't support requested feature: CPUID.01H:ECX.dtes64 [bit 2]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.ds_cpl [bit 4]
warning: host doesn't support requested feature: CPUID.01H:ECX.smx [bit 6]
warning: host doesn't support requested feature: CPUID.01H:ECX.est [bit 7]
warning: host doesn't support requested feature: CPUID.01H:ECX.tm2 [bit 8]
warning: host doesn't support requested feature: CPUID.01H:ECX.xtpr [bit 14]
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:ECX.dca [bit 18]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 15.188000 ms, bitrate 120548590 bps (114.964094 Mbps)
inputs_connect: inputs channel client create
red_dispatcher_set_cursor_peer:
red_channel_client_disconnect: rcc=0x55f0084c69c0 (channel=0x55f007faa6f0 type=3 id=0)
red_channel_client_disconnect: rcc=0x7f9120236180 (channel=0x7f912020b990 type=2 id=0)
red_channel_client_disconnect: rcc=0x7f912022ff60 (channel=0x7f912020bf60 type=4 id=0)
red_peer_receive: Connection reset by peer
red_channel_client_disconnect: rcc=0x55f009402710 (channel=0x55f007f9f160 type=1 id=0)
main_channel_client_on_disconnect: rcc=0x55f009402710
snd_receive: Connection reset by peer
red_channel_client_disconnect_dummy: rcc=0x55f00919e910 (channel=0x55f00925ab90 type=6 id=0)
snd_channel_put: SndChannel=0x55f0087d8c50 freed
snd_receive: Connection reset by peer
red_channel_client_disconnect_dummy: rcc=0x55f0092349b0 (channel=0x55f00925a8a0 type=5 id=0)
snd_channel_put: SndChannel=0x55f008af3550 freed
red_client_destroy: destroy client 0x55f009496b80 with #channels=6
red_dispatcher_disconnect_cursor_peer:
red_dispatcher_disconnect_display_peer:
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 16.994000 ms, bitrate 115386782 bps (110.041410 Mbps)
inputs_connect: inputs channel client create
中间的warnning信息是libvirt迁移虚拟机时添加的cpu feature不支持。也没有会导致死机的关键的错误。
4、分析虚拟机本身的问题
该虚拟机经过用户的修改之后,使用了多款优化工具进行优化,出于管理的目的,手动调整了一些系统的设置,做了一些权限上的修改。
从windows里拷出崩溃时的内存转储文件。memory.dmp。
运行windbg,按ctrl+s配置symboal地址:
SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols
打开dmp文件:
Microsoft (R) Windows Debugger Version 6.11.0001.404 X86
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\dmp\(98)MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available
Symbol search path is: SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols;SRV*c:\mysymbol* http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 7 Kernel Version 7601 (Service Pack 1) MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 7601.18205.x86fre.win7sp1_gdr.130708-1532
Machine Name:
Kernel base = 0x83c13000 PsLoadedModuleList = 0x83d5c4d0
Debug session time: Wed Mar 15 13:11:54.777 2017 (GMT+8)
System Uptime: 0 days 5:00:52.740
Loading Kernel Symbols
...............................................................
................................................................
.........................
Loading User Symbols
Loading unloaded module list
.......
可以看出系统是运行了5个小时之后死机的。
在windbg里执行:
0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
CLOCK_WATCHDOG_TIMEOUT (101)
An expected clock interrupt was not received on a secondary processor in an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
Arguments:
Arg1: 00000061, Clock interrupt time out interval in nominal clock ticks.
Arg2: 00000000, 0.
Arg3: 807a3120, The PRCB address of the hung processor.
Arg4: 00000001, 0.
Debugging Details:
------------------
*** ERROR: Module load completed but symbols could not be loaded for Ntfs.sys
*** ERROR: Module load completed but symbols could not be loaded for qutmdrv.sys
BUGCHECK_STR: CLOCK_WATCHDOG_TIMEOUT_2_PROC
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
PROCESS_NAME: System
CURRENT_IRQL: 1c
STACK_TEXT:
807fc5e4 83c8e117 00000101 00000061 00000000 nt!KeBugCheckEx+0x1e
807fc620 83c8d763 00026160 00000000 0011a800 nt!KeAccumulateTicks+0x242
807fc660 83c8d610 83c4a2be 2e50d92a 00000000 nt!KeUpdateRunTime+0x145
807fc6b8 83c8ce13 8736e71b 8736e71b 000000d1 nt!KeUpdateSystemTime+0x613
807fc6b8 83c4a2be 8736e71b 8736e71b 000000d1 nt!KeUpdateSystemTimeAssist+0x13
807fc75c 83c93268 94484000 00000000 807fc7b4 nt!KeFlushMultipleRangeTb+0x2d5
807fc770 83c70ca0 00001000 876424e8 00000000 nt!MiFlushPteList+0x22
807fc838 83c9acdb 94485000 00001000 94272250 nt!MmSetAddressRangeModified+0x205
807fc8cc 8c0ebd3f 8763f998 00000000 00000000 nt!CcFlushCache+0x294
WARNING: Stack unwind information not available. Following frames may be wrong.
807fc91c 8c0f2dfc 87604da8 942670f8 00000000 Ntfs+0x7ed3f
807fc990 8c0ecd92 87604da8 87baa428 0c757330 Ntfs+0x85dfc
807fc9f8 83c49c1e 87361020 87baa428 87baa428 Ntfs+0x7fd92
807fca10 843ce20c 8735a300 87baa428 00000000 nt!IofCallDriver+0x63
807fca34 843ce3cb 807fca54 8735a300 00000000 FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x2aa
807fca6c 83c49c1e 8735a300 87baa428 874e9db0 FLTMGR!FltpDispatch+0xc5
807fca84 937684db 807fcaa4 83c49c1e 874e9db0 nt!IofCallDriver+0x63
807fca8c 83c49c1e 874e9db0 87baa428 87baa428 qutmdrv+0x4db
807fcaa4 83e3dbe9 87baa428 87646c98 00000000 nt!IofCallDriver+0x63
807fcac4 83e1c4bb 874e9db0 87646c98 00000000 nt!IopSynchronousServiceTail+0x1f8
807fcb30 83c508c6 87f0a590 807fcbc8 807fcbd0 nt!NtFlushBuffersFile+0x1d7
807fcb30 83c4e7ad 87f0a590 807fcbc8 807fcbd0 nt!KiSystemServicePostCall
807fcbb0 83e22b5d 80000564 807fcbc8 863e7798 nt!ZwFlushBuffersFile+0x11
807fcbd0 83eab328 8ce1a740 00000000 00000000 nt!CmpFileFlush+0x5a
807fcc00 83c9014b 00000000 00000000 863e7798 nt!CmpLazyFlushWorker+0x35
807fcc50 83e1c12b 00000001 b2d8e202 00000000 nt!ExpWorkerThread+0x10d
807fcc90 83cc3559 83c9003e 00000001 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
STACK_COMMAND: kb
SYMBOL_NAME: ANALYSIS_INCONCLUSIVE
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: Unknown_Module
IMAGE_NAME: Unknown_Image
DEBUG_FLR_IMAGE_TIMESTAMP: 0
FAILURE_BUCKET_ID: CLOCK_WATCHDOG_TIMEOUT_2_PROC_ANALYSIS_INCONCLUSIVE
BUCKET_ID: CLOCK_WATCHDOG_TIMEOUT_2_PROC_ANALYSIS_INCONCLUSIVE
Followup: MachineOwner
看到关键的错误信息了:
CLOCK_WATCHDOG_TIMEOUT (101)
An expected clock interrupt was not received on a secondary processor in an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
Arguments:
Arg1: 00000061, Clock interrupt time out interval in nominal clock ticks.
Arg2: 00000000, 0.
Arg3: 807a3120, The PRCB address of the hung processor.
Arg4: 00000001, 0.
CLOCK_WATCHDOG_TIMEOUT
就是这个了。
看msdn上解释:
https://msdn.microsoft.com/en-us/library/windows/hardware/ff557211(v=vs.85).aspx
就是在多处理器的情况中,第二颗处理器没有及时响应系统的中断请求。
此外就没有更多的信息了。
五、解决问题
问题找到了。
尝试的解决办法:
1、将虚拟机改成单CPU
观察了一段时间,故障消失,但这是治标不治本的办法。
2、将虚拟机改成单 socket双核心的模式
(windows7默认最多支持双路cpu,kvm默认分配两颗vcpu时是双路)
#libvirt xml <CPU>
<topology sockets=‘1‘ cores=‘2‘ threads=‘1‘/>
观察无效。
3、bugzilla.redhat.com里发现:
Bug 784836 – new cpu’s flags, to control hyper-v related features
- “hv_relaxed” to depress some ugly Windows watchdog timer, and enable “relaxed” timing
直觉感觉和这个BSOD相关。
libvirt xml里增加:
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
......
.....
<qemu:commandline>
<qemu:arg value='-cpu'/>
<qemu:arg value='host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,kvm=off,hv_vendor_id=xeon'/>
</qemu:commandline>
</domain>
观察一晚上,问题解决。
这时从命令行看qemu的执行参数如下:
libvirt+ 4573 3.1 3.3 6576684 3363956 ? Sl 06:53 26:10 qemu-system-x86_64 -enable-kvm -name vm84 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -cpu SandyBridge,+pdpe1gb,+osxsave,+dca,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 3072 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 1bae86e4-4e9e-44cb-a8ae-2a8a87142f75 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-vmtest/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=cd,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/media/vmpool/vmtest.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/media/vmdpool/vmdtest.qcow2,format=qcow2,if=none,id=drive-virtio-disk1 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 -drive if=none,id=drive-ide0-1-1,readonly=on -device ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=40 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:6c:0f:62,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -spice port=5906,addr=0.0.0.0,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x6 -device hda-duplex,id=sound0-codec0,bus=sfaxianound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,kvm=off,hv_vendor_id=xeon -msg timestamp=on
可以看到启用了:hv_relaxed 。
六、补充
后来发现libvirt-1.1.1之后开始支持hyper-v的特性,HV开头。
在libvirt xml <feature>里增加:
<hyperv>
<relaxed state='on'/>
<vapic state='on'/>
<spinlocks state='on' retries='4096'/>
</hyperv>
也可以起到同样的效果。
七、参考资料
https://bugzilla.redhat.com/show_bug.cgi?id=784836
http://www.windowswally.com/how-to-fix-the-clock_watchdog_timeout-error/
https://msdn.microsoft.com/en-us/library/windows/hardware/ff557211(v=vs.85).aspx
http://lists.nongnu.org/archive/html/qemu-devel/2014-02/msg00335.html