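Hello,
I've only been lurking until now, but I've hit something I really can't figure out and would appreciate some advice.
- Threadripper PRO 5955WX
- ASUS PRO WS WRX80E-SAGE SE WIFI
- SAMSUNG DDR4 64GB * 8
I set up Proxmox and was busy developing in an Ubuntu LXC container.
The container runs Ubuntu 22.04 with docker.io and podman, and I use Portainer to run about 30 WordPress instances. I tested about two LXCs set up like that.
With only Proxmox booted there is no problem, but once I start the Ubuntu LXC and run the Docker containers, Proxmox logs the errors below and reboots after about 20 to 30 minutes, or sometimes after a few hours.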
Feb 25 02:17:01 v4 CRON[35659]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 25 02:17:01 v4 CRON[35660]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 25 02:17:01 v4 CRON[35659]: pam_unix(cron:session): session closed for user root
Feb 25 02:21:37 v4 kernel: mce: [Hardware Error]: Machine check events logged
Feb 25 02:21:37 v4 kernel: [Hardware Error]: Corrected error, no action required.
Feb 25 02:21:37 v4 kernel: [Hardware Error]: CPU:1 (19:8:2) MC1_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd8200000060a0859
Feb 25 02:21:37 v4 kernel: [Hardware Error]: PPIN: 0x02b68f671f2d007b
Feb 25 02:21:37 v4 kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a000586
Feb 25 02:21:37 v4 kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error.
Feb 25 02:21:37 v4 kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
Feb 25 02:24:34 v4 pmxcfs[1191]: [dcdb] notice: data verification successful
Feb 25 02:31:05 v4 pvedaemon[1316]: successful auth for user 'root@pam'
Feb 25 02:46:30 v4 pvedaemon[1317]: successful auth for user 'root@pam'
Feb 25 02:52:45 v4 kernel: mce: [Hardware Error]: Machine check events logged
Feb 25 02:52:45 v4 kernel: [Hardware Error]: Corrected error, no action required.
Feb 25 02:52:45 v4 kernel: [Hardware Error]: CPU:1 (19:8:2) MC1_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd8200000060a0859
Feb 25 02:52:45 v4 kernel: [Hardware Error]: PPIN: 0x02b68f671f2d007b
Feb 25 02:52:45 v4 kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a000581
Feb 25 02:52:45 v4 kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error.
Feb 25 02:52:45 v4 kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
Feb 25 03:03:38 v4 pvedaemon[1315]: successful auth for user 'root@pam'
Feb 25 03:10:01 v4 CRON[49309]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 25 03:10:01 v4 CRON[49310]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Feb 25 03:10:01 v4 CRON[49309]: pam_unix(cron:session): session closed for user root
Feb 25 03:17:01 v4 CRON[51083]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 25 03:17:01 v4 CRON[51084]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 25 03:17:01 v4 CRON[51083]: pam_unix(cron:session): session closed for user root
Feb 25 03:24:34 v4 pmxcfs[1191]: [dcdb] notice: data verification successful
Feb 25 03:29:04 v4 kernel: mce: [Hardware Error]: Machine check events logged
Feb 25 03:29:04 v4 kernel: [Hardware Error]: Corrected error, no action required.
Feb 25 03:29:04 v4 kernel: [Hardware Error]: CPU:1 (19:8:2) MC1_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd8200000060a0859
Feb 25 03:29:04 v4 kernel: [Hardware Error]: PPIN: 0x02b68f671f2d007b
Feb 25 03:29:04 v4 kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a000a98
Feb 25 03:29:04 v4 kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error.
Feb 25 03:29:04 v4 kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
Feb 25 03:29:08 v4 pvedaemon[1317]: successful auth for user 'root@pam'
-- Reboot --
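As shown above, the machine reboots, and Proxmox does not come back up properly afterwards. I couldn't even work out why it reboots in the first place.
Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error.
That line looked suspicious, so I searched around and found reports that updating the motherboard BIOS fixed the problem, so I updated the BIOS too.
But it keeps dying with the same symptom.
Why would that be?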
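For anyone who wants to double-check what the kernel is reporting, here is a minimal sketch that unpacks the `MC1_STATUS` value from the log by hand. The bit positions and lookup tables are my own reading of the status word layout, picked so that the result agrees with the kernel's decoded lines in the log; treat them as assumptions rather than an official reference, and `decode_status` is just a name made up for this example.

```python
# Sketch: unpack an MCA status word like the one in the log above.
# ASSUMPTION: bit positions and tables below are inferred to match the
# kernel's own decoded output, not copied from vendor documentation.

STATUS_FLAGS = {63: "Val", 62: "Over", 61: "UC", 59: "MiscV", 53: "SyndV"}

# Lookup tables for the low 16-bit error code when it is in bus-error form.
CACHE_LEVEL = ["RESV", "L1", "L2", "L3/GEN"]
MEM_IO = ["MEM", "RESV", "IO", "GEN"]
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]
PART_PROC = ["SRC", "RES", "OBS", "GEN"]


def decode_status(status: int) -> dict:
    """Return the flags and sub-fields found in a 64-bit status value."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    result = {
        "flags": flags,
        # UC bit clear means the error was corrected ("CE" in the log).
        "corrected": not ((status >> 61) & 1),
        # Extended error code, printed as "Ext. Error Code" in the log.
        "ext_error_code": (status >> 16) & 0x3F,
    }
    code = status & 0xFFFF
    if (code & 0xF800) == 0x0800:  # bus-error format
        tx = (code >> 4) & 0xF
        result.update(
            cache_level=CACHE_LEVEL[code & 0x3],
            mem_io=MEM_IO[(code >> 2) & 0x3],
            mem_tx=MEM_TX[tx] if tx < len(MEM_TX) else "-",
            part_proc=PART_PROC[(code >> 9) & 0x3],
            timed_out=bool((code >> 8) & 1),
        )
    return result


if __name__ == "__main__":
    for key, value in decode_status(0xD8200000060A0859).items():
        print(f"{key}: {value}")
```

Run on `0xd8200000060a0859`, this gives a corrected error with extended code 10 and the fields L1 / IO / IRD / SRC with no timeout, the same as the kernel printed. So the decoding itself is consistent, and the question is only why the CPU keeps raising it.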
---
It isn't solved yet, but I installed Windows instead of Proxmox as the OS and took a look. In Event Viewer, the error turned out to be Kernel-Power 41.
I'll start working on a fix from here.
Thank you for all the comments.
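It would be worth trying a kernel update once.
Get the latest stable from kernel.org,
copy the config in /boot to .config,
run make oldconfig to add the new options,
make menuconfig,
then build packages with make deb-pkg and install them.
The default is an LTS kernel, and the newest processors need the newest kernel.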
apt update && apt install pve-kernel-5.15
Feb 25 02:21:37 v4 kernel: [Hardware Error]: CPU:1 (19:8:2) MC1_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd8200000060a0859
Feb 25 02:21:37 v4 kernel: [Hardware Error]: PPIN: 0x02b68f671f2d007b
Feb 25 02:21:37 v4 kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a000586
Feb 25 02:21:37 v4 kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error.
Feb 25 02:21:37 v4 kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
*/***CPU ʵǰ... ġ ..
׳ Ʈ ϵ ʴ ϴ.
ٸ ִ ..
-- ̿ Ʈ
-- Ʈ ٸ ġ
̷ ȭ 帮۶...
÷ ŷ ô° ??
谡 ÷ ϰ ִµ..
Ȱϴ.
̹ ִ°ſ ε
ȭ ֳ
ΰ ʾƼ AMD ߽ϴ
ǽɰ κ̶ fastboot Ƹ ϵ ߴŰϴ. ٸ ͵ ǵ ʾҴµ.
̤̾Ʊ
=> MCE => Ʈѷ ƴѰ ˴ϴ.. ==> α ø ˷ּ..
Feb 25 02:21:37 v4 kernel: [Hardware Error]: Corrected error, no action required.
Feb 25 02:21:37 v4 kernel: [Hardware Error]: CPU:1 (19:8:2) MC1_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd8200000060a0859
Feb 25 02:21:37 v4 kernel: [Hardware Error]: PPIN: 0x02b68f671f2d007b
Feb 25 02:21:37 v4 kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a000586
Feb 25 02:21:37 v4 kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error.
Feb 25 02:21:37 v4 kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
==> cache level: L1, mem/io: IO, mem-tx ==> L1 cache / I/O / transmit ==> the CPU cache controller
is what I suspect..
Logs like this need to be analyzed by the manufacturer or on the OS side..
```
Feb 25 14:46:30 v4 systemd[86368]: Stopped target Main User Target.
Feb 25 14:46:30 v4 systemd[86368]: Stopped target Basic System.
Feb 25 14:46:30 v4 systemd[86368]: Stopped target Paths.
Feb 25 14:46:30 v4 systemd[86368]: Stopped target Sockets.
Feb 25 14:46:30 v4 systemd[86368]: Stopped target Timers.
Feb 25 14:46:30 v4 systemd[86368]: dirmngr.socket: Succeeded.
Feb 25 14:46:30 v4 systemd[86368]: Closed GnuPG network certificate management daemon.
Feb 25 14:46:30 v4 systemd[86368]: gpg-agent-browser.socket: Succeeded.
Feb 25 14:46:30 v4 systemd[86368]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Feb 25 14:46:30 v4 systemd[86368]: gpg-agent-extra.socket: Succeeded.
Feb 25 14:46:30 v4 systemd[86368]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Feb 25 14:46:30 v4 systemd[86368]: gpg-agent-ssh.socket: Succeeded.
Feb 25 14:46:30 v4 systemd[86368]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Feb 25 14:46:30 v4 systemd[86368]: gpg-agent.socket: Succeeded.
Feb 25 14:46:30 v4 systemd[86368]: Closed GnuPG cryptographic agent and passphrase cache.
Feb 25 14:46:30 v4 systemd[86368]: Removed slice User Application Slice.
Feb 25 14:46:30 v4 systemd[86368]: Reached target Shutdown.
Feb 25 14:46:30 v4 systemd[86368]: systemd-exit.service: Succeeded.
Feb 25 14:46:30 v4 systemd[86368]: Finished Exit the Session.
Feb 25 14:46:30 v4 systemd[86368]: Reached target Exit the Session.
Feb 25 14:46:30 v4 systemd[1]: user@0.service: Succeeded.
Feb 25 14:46:30 v4 systemd[1]: Stopped User Manager for UID 0.
Feb 25 14:46:30 v4 systemd[1]: Stopping User Runtime Directory /run/user/0...
Feb 25 14:46:30 v4 systemd[1]: run-user-0.mount: Succeeded.
Feb 25 14:46:30 v4 systemd[1]: user-runtime-dir@0.service: Succeeded.
Feb 25 14:46:30 v4 systemd[1]: Stopped User Runtime Directory /run/user/0.
Feb 25 14:46:30 v4 systemd[1]: Removed slice User Slice of UID 0.
Feb 25 14:46:33 v4 pmxcfs[1191]: [dcdb] notice: data verification successful
-- Reboot --
```
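Since two log excerpts in this thread both end in `-- Reboot --`, one quick check is how long before each reboot marker the last `[Hardware Error]` line appeared. The sketch below is only an illustration: `reboot_report` and `ASSUMED_YEAR` are names invented here, the year is a placeholder because syslog timestamps don't carry one, and month names are assumed to be in English.

```python
# Sketch: for each "-- Reboot --" marker, report the gap between the last
# "[Hardware Error]" line and the last timestamped entry before the marker.
from datetime import datetime

ASSUMED_YEAR = 2000  # ASSUMPTION: placeholder, the log lines have no year


def parse_stamp(line):
    """Parse the leading 'Feb 25 03:29:04' style stamp, or return None."""
    try:
        return datetime.strptime(
            f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


def reboot_report(lines):
    reports = []
    last_mce = None   # time of the most recent hardware error line
    last_seen = None  # time of the most recent timestamped line
    for line in lines:
        if line.strip() == "-- Reboot --":
            gap = None
            if last_mce is not None and last_seen is not None:
                gap = (last_seen - last_mce).total_seconds()
            reports.append({"last_entry": last_seen, "seconds_since_mce": gap})
            last_mce = None
            last_seen = None
            continue
        when = parse_stamp(line)
        if when is None:
            continue
        last_seen = when
        if "[Hardware Error]" in line:
            last_mce = when
    return reports


if __name__ == "__main__":
    sample = [
        "Feb 25 03:29:04 v4 kernel: mce: [Hardware Error]: Machine check events logged",
        "Feb 25 03:29:08 v4 pvedaemon[1317]: successful auth for user 'root@pam'",
        "-- Reboot --",
        "Feb 25 14:46:33 v4 pmxcfs[1191]: [dcdb] notice: data verification successful",
        "-- Reboot --",
    ]
    for report in reboot_report(sample):
        print(report["last_entry"], report["seconds_since_mce"])
```

With the lines quoted in this thread, the first reboot comes 4 seconds after a hardware error entry, while the second excerpt has no hardware error before the marker at all. That doesn't prove anything by itself, but it is worth running over the full journal before blaming the corrected MCE for every reboot.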
ּż մϴ.
ý Ʈ α մϴ..
USB ̹ ġؼ USB ؼ ˴ϴ..
Ӻ Ͻø ǰ ÿ Ʋ ð ƸԽϴ..
غø 뼳 Ƹ ã ̴ϴ..
CPU Ʈ ϴ α ս ̶ ֽϴ..
https://netlib.org/benchmark/hpl/
ġϰ HPL α ġ ؼ ˴ϴ..
̰ ư CPU 100% ϸ ְ 100% ϸ ־ Ʈ մϴ..
α ž ϴ ưϴ..
α ҷ ִ ˾ƺ մϴٸ ý Ҿϰų ϸ α ߰ų ý ų ֽϴ..
ϸ ִ ֱ ε ý ֽϴ..
ý α ִµ
https://www.passmark.com/products/burnintest/index.php
https://www.ocbase.com/download
System Burn in Test OCCT ΰ αԴϴ..
ư ؼ ̽ ϸ 100% ְ ϴ..
̷ α ϸ ִ ֱ ý ϸ ý 峯 ֽϴ..
û Խϴ..
̷ α ð ̻ OS ġ غø մϴ..
Ʈ õغ ҽϴ. ǰ߰մϴ!
帮 ý۰ proxmox ȣȯ ´ ̶ ߰ڳ.. (ġ ٸ)
proxmox ʿ ȣȯ Ʈ ڷᰡ ѵ Ȩ ãƺô proxmox 翡 غ Դϴ..
ũ̼ ýۿ OS ġ Ǵµ
ص ư 찡 Ȥ ʴ 찡 ְ ̰ ڰ ȥ Ǯ ʽϴ..
ʿ Դϴ..
翡 Ƹ ȿ ..
ش ̹ ʸ ̴ϴ..
Ȥó ־ մϴ..
proxmox threadripper ˻ص threadripper ý۸ Ȥ ̰ threadripper pro ڴ Ⱥ̳..
ý̸. ʹ. ֽϴ
CPUƮ ȵǸ ־ϴ. ڱ pci ٰų, ٰų ϴµ ã ֽϴ..
cpu ٽ ð? ϼż غô ׳ Ǵ. ˾ƺ ū Ĩ з ̽ , Ȥ ִ ϴ.
1. Ʈ
2. κ A/S (̻)
3. Ŀö ü
4. CPU A/S ( ã)
kernel 41 Ȯ
ᱹ CPU ý ϴ.
ֽ е鲲 մϴ.