帮 proxmox ׽ϴ

   

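Hello,

I have only ever lurked here, but I have run into something I really cannot figure out and would appreciate some advice.

  • Threadripper PRO 5955WX
  • ASUS PRO WS WRX80E-SAGE SE WIFI
  • Samsung DDR4 64GB x 8

I set up Proxmox, started an Ubuntu LXC, and had been busy developing in it.

The LXC runs Ubuntu 22.04 with docker.io and podman, and about 30 WordPress instances managed through portainer. I tested with about two LXCs like that.

With only Proxmox booted there is no problem, but once I start the Ubuntu LXC and run the docker containers, Proxmox reboots after about 20 to 30 minutes, or sometimes a few hours, logging the errors below.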

Feb 25 02:17:01 v4 CRON[35659]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)  
Feb 25 02:17:01 v4 CRON[35660]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)  
Feb 25 02:17:01 v4 CRON[35659]: pam_unix(cron:session): session closed for user root  
Feb 25 02:21:37 v4 kernel: mce: [Hardware Error]: Machine check events logged  
Feb 25 02:21:37 v4 kernel: [Hardware Error]: Corrected error, no action required.  
Feb 25 02:21:37 v4 kernel: [Hardware Error]: CPU:1 (19:8:2) MC1_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd8200000060a0859  
Feb 25 02:21:37 v4 kernel: [Hardware Error]: PPIN: 0x02b68f671f2d007b  
Feb 25 02:21:37 v4 kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a000586  
Feb 25 02:21:37 v4 kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error.  
Feb 25 02:21:37 v4 kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)  
Feb 25 02:24:34 v4 pmxcfs[1191]: [dcdb] notice: data verification successful  
Feb 25 02:31:05 v4 pvedaemon[1316]:  successful auth for user 'root@pam'  
Feb 25 02:46:30 v4 pvedaemon[1317]:  successful auth for user 'root@pam'  
Feb 25 02:52:45 v4 kernel: mce: [Hardware Error]: Machine check events logged  
Feb 25 02:52:45 v4 kernel: [Hardware Error]: Corrected error, no action required.  
Feb 25 02:52:45 v4 kernel: [Hardware Error]: CPU:1 (19:8:2) MC1_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd8200000060a0859  
Feb 25 02:52:45 v4 kernel: [Hardware Error]: PPIN: 0x02b68f671f2d007b  
Feb 25 02:52:45 v4 kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a000581  
Feb 25 02:52:45 v4 kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error.  
Feb 25 02:52:45 v4 kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
Feb 25 03:03:38 v4 pvedaemon[1315]:  successful auth for user 'root@pam'  
Feb 25 03:10:01 v4 CRON[49309]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)  
Feb 25 03:10:01 v4 CRON[49310]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)  
Feb 25 03:10:01 v4 CRON[49309]: pam_unix(cron:session): session closed for user root  
Feb 25 03:17:01 v4 CRON[51083]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)  
Feb 25 03:17:01 v4 CRON[51084]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)  
Feb 25 03:17:01 v4 CRON[51083]: pam_unix(cron:session): session closed for user root  
Feb 25 03:24:34 v4 pmxcfs[1191]: [dcdb] notice: data verification successful  
Feb 25 03:29:04 v4 kernel: mce: [Hardware Error]: Machine check events logged  
Feb 25 03:29:04 v4 kernel: [Hardware Error]: Corrected error, no action required.  
Feb 25 03:29:04 v4 kernel: [Hardware Error]: CPU:1 (19:8:2) MC1_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd8200000060a0859  
Feb 25 03:29:04 v4 kernel: [Hardware Error]: PPIN: 0x02b68f671f2d007b  
Feb 25 03:29:04 v4 kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a000a98  
Feb 25 03:29:04 v4 kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error.  
Feb 25 03:29:04 v4 kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)  
Feb 25 03:29:08 v4 pvedaemon[1317]:  successful auth for user 'root@pam'  
-- Reboot --
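
The spacing of the machine check events in the log above can be tallied with a few lines of Python. This is only a rough sketch: the sample lines are copied from the log, the year 2023 is an assumption (syslog timestamps carry none), and `mce_times` and `gaps_in_minutes` are hypothetical helper names.

```python
import re
from datetime import datetime

# Sample lines copied from the log in this post. In practice the text would
# come from the journal or a syslog file; how it is read is left open here.
SAMPLE_LOG = """\
Feb 25 02:21:37 v4 kernel: mce: [Hardware Error]: Machine check events logged
Feb 25 02:52:45 v4 kernel: mce: [Hardware Error]: Machine check events logged
Feb 25 03:29:04 v4 kernel: mce: [Hardware Error]: Machine check events logged
"""

# Captures the "Mon DD HH:MM:SS" prefix of lines that report a machine check.
MCE_LINE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) .*Machine check events logged")


def mce_times(log_text, year=2023):
    """Return a datetime for every 'Machine check events logged' line."""
    times = []
    for line in log_text.splitlines():
        match = MCE_LINE.match(line)
        if match:
            # The year is an assumption supplied by the caller; month names
            # are parsed with %b, which assumes an English/C locale.
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_minutes(times):
    """Return the gap between consecutive events, in minutes (one decimal)."""
    return [round((b - a).total_seconds() / 60, 1) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    events = mce_times(SAMPLE_LOG)
    print(len(events), "events, gaps in minutes:", gaps_in_minutes(events))
```

For the three events in the log this gives gaps of roughly 31 and 36 minutes, and the last event is logged at 03:29:04, four seconds before the final line ahead of the reboot.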


After the reboot shown above, Proxmox does not come back up properly. I cannot even tell why it reboots in the first place.


Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error.

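This message looked suspicious, so I searched around and found reports that updating the motherboard BIOS had solved the problem, so I updated the BIOS as well.

However, it keeps dying with the same symptom.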


Why is this happening?
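
For reference, the MC1_STATUS value from the log can be unpacked with a short script. This is only a sketch under assumptions: the bit positions (Val 63, Over 62, UC 61, En 60, MiscV 59, AddrV 58, PCC 57, SyndV 53, extended error code in bits 21:16) reflect my reading of how the Linux kernel decodes AMD machine check status registers, and `decode_mca_status` is a made-up helper name, not a kernel or library API.

```python
# Bit positions are assumptions based on the AMD MCA status layout as decoded
# by the Linux kernel; check the processor documentation before relying on them.
STATUS_FLAGS = {
    "Val": 63,    # register holds a valid error
    "Over": 62,   # an earlier error was overwritten (overflow)
    "UC": 61,     # uncorrected error; clear means the error was corrected
    "En": 60,     # error reporting was enabled
    "MiscV": 59,  # MISC register holds valid data
    "AddrV": 58,  # ADDR register holds valid data
    "PCC": 57,    # processor context corrupt
    "SyndV": 53,  # syndrome register holds valid data
}


def decode_mca_status(status):
    """Split a 64-bit MCA status value into flags and the extended error code."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    # Extended error code: assumed to be the 6-bit field in bits 21:16.
    decoded["ext_error_code"] = (status >> 16) & 0x3F
    return decoded


if __name__ == "__main__":
    # Value taken from the MC1_STATUS lines in the log.
    for name, value in decode_mca_status(0xD8200000060A0859).items():
        print(f"{name}: {value}")
```

Under those assumptions the value decodes to Over and MiscV and SyndV set, UC clear (a corrected error), and an extended error code of 10, which lines up with the `Over|CE|MiscV|...|SyndV` flags and "Ext. Error Code: 10" that the kernel printed.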



---


It is not solved yet, but I installed Windows as the OS instead of Proxmox to take a look. Checking Event Viewer, the error was identified as Kernel-Power 41.

I will start working on a fix from here.

Thank you for all the comments.

2023-02
proxmox Űǰ? lxc Ⱦ ?
     
2023-02
promxox 7.3-3 ϰֽϴ lxc ϴ
     
2023-02
nuc11 7.2-3 Ҷ ƹ µ ٿ׷̵ؼ ׽Ʈغ߰ڳ׿ մϴ 🙇‍️
2023-02
Ŀ ΰ? Ŀ Ʈ ̳?
     
2023-02
The kernel version is Linux 5.15.74-1-pve #1 SMP PVE 5.15.74-1 (Mon, 14 Nov 2022 20:17:15 +0100), and I have not tried a kernel update.
I should give a kernel update a try.
          
dateno1 2023-02
Ŀ ״

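Get the latest stable from kernel.org
Take the config from /boot and use it as .config
Run make oldconfig to handle the newly added options
make menuconfig
Build a package with make deb-pkg and install it (that way you can also roll back if a problem occurs)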

⺻ LTS , ֽ μ ֽ ʿؿ ( )
     
2023-02
Ŀ ̶ֽ Ʈ ׿

apt update && apt install pve-kernel-5.15
2023-02
Feb 25 02:21:37 v4 kernel: [Hardware Error]: Corrected error, no action required. 
Feb 25 02:21:37 v4 kernel: [Hardware Error]: CPU:1 (19:8:2) MC1_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd8200000060a0859 
Feb 25 02:21:37 v4 kernel: [Hardware Error]: PPIN: 0x02b68f671f2d007b 
Feb 25 02:21:37 v4 kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a000586 
Feb 25 02:21:37 v4 kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error. 
Feb 25 02:21:37 v4 kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout) 

*/***CPU ʵǰ... ġ ׵ ..
׳ Ʈ ϵ ʴ ϴ.

ٸ ִ ..
 -- ̿ Ʈ
 -- Ʈ ٸ ġ

̷ ȭ 帮۶... 

÷ ŷ ô° ??
谡 ÷ ϰ ִµ..
Ȱϴ.
     
2023-02
̷ ۵ Ȱ׿

̹ ִ°ſ ε
ȭ ֳ
     
binaryeast 2023-02
ȭ Ʋ X Ẹ ƴµ... ƴ.  帮 µ ȵdz׿.
     
2023-02
۰մϴ! Ȩ غ; AMD ˾ƺ غ ؼ ׽Ʈغֽϴ٤

ΰ ʾƼ AMD ߽ϴ
binaryeast 2023-02
α׸ L1 ij κ 귣ġ Ÿ (BTB) бⰡ ÿ ĪǴ , CPU ҷ ϴ. ϴ 1. CPU/ û , BIOS Ʈ cstate, amd-v, fastboot , proxmox 缳ġ privileged lxc ̳ Ŀ ġغð ׷ ߻Ѵٸ CPU ȯž ƿ. p.s. proxmox ٷ Ŀ ġϽ ƴϽ? lxc Ŀ ̳ʰ ȣƮ 򸮸 Ǵ ˰ ֽϴ. lxc ȿ Ŀ , vm и ϴ ˰ ִµ Ȯغ
     
2023-02
LXC̳ʾȿ VM Ƚϴ! ̰ ϵ Ǵ󱸿. ˷ֽŴ ѹ CPU/ û õڽϴ.
ǽɰ κ̶ fastboot Ƹ ϵ ߴŰϴ. ٸ ͵ ǵ ʾҴµ.
̤̾Ʊ
ڹ 2023-02
Feb 25 02:21:37 v4 kernel: mce: [Hardware Error]: Machine check events logged 

=> MCE => ޸Ʈѷ ƴѰ ˴ϴ.. ==> α ø ˷ּ..

Feb 25 02:21:37 v4 kernel: [Hardware Error]: Corrected error, no action required. 
Feb 25 02:21:37 v4 kernel: [Hardware Error]: CPU:1 (19:8:2) MC1_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd8200000060a0859 
Feb 25 02:21:37 v4 kernel: [Hardware Error]: PPIN: 0x02b68f671f2d007b 
Feb 25 02:21:37 v4 kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a000586 
Feb 25 02:21:37 v4 kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 10, L1 BTB Multi-Match Error. 
Feb 25 02:21:37 v4 kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)

==>  cache level: L1, mem/io: IO, mem-tx ==> L1 ij /޸ I/O / ޸ ۽  ==> CPU ij ޸ ޸ Ʈѷ ׸ ޸
ǽɵ˴ϴ..


̷ α״ 糪 OS ʿ α м ޶ ؾ մϴ..
     
2023-02
ٸ ǰ մϴ. ޸ ִ. CPU, ޸ ֽ RebootǴ ̷ α׸ 󱸿

```
Feb 25 14:46:30 v4 systemd[86368]: Stopped target Main User Target. 
Feb 25 14:46:30 v4 systemd[86368]: Stopped target Basic System. 
Feb 25 14:46:30 v4 systemd[86368]: Stopped target Paths. 
Feb 25 14:46:30 v4 systemd[86368]: Stopped target Sockets. 
Feb 25 14:46:30 v4 systemd[86368]: Stopped target Timers. 
Feb 25 14:46:30 v4 systemd[86368]: dirmngr.socket: Succeeded. 
Feb 25 14:46:30 v4 systemd[86368]: Closed GnuPG network certificate management daemon. 
Feb 25 14:46:30 v4 systemd[86368]: gpg-agent-browser.socket: Succeeded. 
Feb 25 14:46:30 v4 systemd[86368]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers). 
Feb 25 14:46:30 v4 systemd[86368]: gpg-agent-extra.socket: Succeeded. 
Feb 25 14:46:30 v4 systemd[86368]: Closed GnuPG cryptographic agent and passphrase cache (restricted). 
Feb 25 14:46:30 v4 systemd[86368]: gpg-agent-ssh.socket: Succeeded. 
Feb 25 14:46:30 v4 systemd[86368]: Closed GnuPG cryptographic agent (ssh-agent emulation). 
Feb 25 14:46:30 v4 systemd[86368]: gpg-agent.socket: Succeeded. 
Feb 25 14:46:30 v4 systemd[86368]: Closed GnuPG cryptographic agent and passphrase cache. 
Feb 25 14:46:30 v4 systemd[86368]: Removed slice User Application Slice. 
Feb 25 14:46:30 v4 systemd[86368]: Reached target Shutdown. 
Feb 25 14:46:30 v4 systemd[86368]: systemd-exit.service: Succeeded. 
Feb 25 14:46:30 v4 systemd[86368]: Finished Exit the Session. 
Feb 25 14:46:30 v4 systemd[86368]: Reached target Exit the Session. 
Feb 25 14:46:30 v4 systemd[1]: user@0.service: Succeeded. 
Feb 25 14:46:30 v4 systemd[1]: Stopped User Manager for UID 0. 
Feb 25 14:46:30 v4 systemd[1]: Stopping User Runtime Directory /run/user/0... 
Feb 25 14:46:30 v4 systemd[1]: run-user-0.mount: Succeeded. 
Feb 25 14:46:30 v4 systemd[1]: user-runtime-dir@0.service: Succeeded. 
Feb 25 14:46:30 v4 systemd[1]: Stopped User Runtime Directory /run/user/0. 
Feb 25 14:46:30 v4 systemd[1]: Removed slice User Slice of UID 0. 
Feb 25 14:46:33 v4 pmxcfs[1191]: [dcdb] notice: data verification successful 
-- Reboot --
```

ּż մϴ.
          
ڹ 2023-02
https://www.memtest86.com/

ý ޸ ׽Ʈ α׷ մϴ..

USB ̹ ġؼ USB ؼ ˴ϴ..

Ӻ Ͻø ǰ ÿ Ʋ ð ƸԽϴ..

غø 뼳 Ƹ ã ̴ϴ..

޸𸮿 CPU ׽Ʈ ϴ α׷ ս ̶ ֽϴ..

https://netlib.org/benchmark/hpl/

ġϰ HPL α׷ ġ ؼ ˴ϴ..

̰ ư CPU 100% ϸ ְ ޸𸮵 100% ϸ ־ ׽Ʈ մϴ..

޸𸮴 α׷ ž ׽ϴ ưϴ..


α׷ ޸ ҷ ִ ˾ƺ մϴٸ ý Ҿϰų ϸ α׷ ߰ų ý ų ֽϴ..

ϸ ִ ֱ ε ý ֽϴ..


ý α׷ ִµ

https://www.passmark.com/products/burnintest/index.php

https://www.ocbase.com/download

System Burn in Test OCCT ΰ α׷Դϴ..

ư ؼ ̽ ϸ 100% ְ ϴ..

̷ α׷ ϸ ִ ֱ⿡ ý ϸ ý 峯 ֽϴ..

û ⵵ Խϴ..

̷  α׷ ð ̻ OS ġ غø մϴ..
               
2023-02
stress-ng ̿ؼ CPU 32ھ 90%, ޸ 512GB 90% Ʈ ׽Ʈ Ͽ ϴ. 뷫 ð 60 ׽Ʈغýϴ
׷ ׽Ʈ õغ ҽϴ. ǰ߰մϴ!
                    
ڹ 2023-02
Ʈ ׽Ʈ н ߴµ ϸ

帮 ý۰ proxmox ȣȯ ´ ̶ ߰ڳ׿.. (ġ ٸ)

proxmox ʿ ȣȯ ׽Ʈ ڷᰡ ѵ Ȩ ãƺô proxmox 翡 غ Դϴ..

ũ̼ ýۿ OS ġ Ǵµ

׷ ص ư 찡 Ȥ ׷ ʴ 찡 ְ ̰ ڰ ȥ  Ǯ ʽϴ..

ʿ Դϴ..

׸ 翡 Ƹ ȿ ׿..

ش ̹ ʸ ̴ϴ..

׷ Ȥó 𸣴 ־ մϴ..

proxmox threadripper ˻ص   threadripper ý۸ Ȥ ̰  threadripper pro ڴ Ⱥ̳׿..
gmltj 2023-02
ۿ ãƺ...иƼ Ǵ ε.... ҷ ƴ......
     
2023-02
̱ ϴ. ѹ Ȯغڽϴ. ǰ߰մϴ!
2023-02
Ȥ ޸ ִ° ϼ ֽϴ ޸ Ŭ Ϻη 缭 غ

𷯰 ý̸. ʹ. ֽϴ
kyile 2023-02
mв L1̾߱⸦ ּż ش ̽ ƴϰ.. Ȥó ؼ ϴ.
CPUƮ ȵǸ ׷ ־ϴ. ڱ pci ٰų, ޸ ٰų ϴµ ã ֽϴ..
cpu ٽ ð? ϼż غô ׳ Ǵ󱸿. ˾ƺ ū Ĩ з ޶ ̽ , Ȥ ִ ϴ.
2023-04
õغð ᱹ

1. ׽Ʈ
2. κ A/S (̻)
3. Ŀö ü
4. CPU A/S ( ã)

kernel 41 Ȯ
ᱹ CPU ý ϴ.

Thank you to everyone who left a comment.

