Fixing PyTorch multi-GPU hangs on AMD CPUs
dataparallel not working on nvidia gpus and amd cpus
https://github.com/pytorch/pytorch/issues/13045

Problem: when running on multiple GPUs, the network hangs and never makes progress. The system is an AMD Ryzen 5 1600X with two Titan Xp cards. An earlier pairing of a 2070 and a Titan Xp ran multi-GPU jobs fine; the two cards simply had different amounts of memory.

The logs show nothing but errors like the following. These error messages were found in the dmesg log:

    [1118468.873266] nvidia 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x00000000ea13a000 flags=0x0020]
    [1118468.942145] nvidia 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x00000000ea139068 flags=0x0020]
    [1118468.942189] nvidia 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x00000000d0000040 flags=0x0020]
    [1118468.942227] nvidia 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x00000000d00007c0 flags=0x0020]
    [1118468.942265] nvidia 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x00000000d0001040 flags=0x0020]
    [1118468.942303] nvidia 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x00000000d0000f40 flags=0x0020]
    [1118468.942340] nvidia 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x00000000d00016c0 flags=0x0020]
    [1118468.942377] nvidia 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x00000000d0002040 flags=0x0020]
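To check whether another machine shows the same symptom, the kernel log can be scanned for these events. The following is a minimal Python sketch and not part of the original post: the function name find_iommu_page_faults and the regular expression are assumptions modeled on the dmesg excerpt above, and other kernel versions may format the message differently.

```python
import re

# Pattern modeled on the dmesg excerpt in this post (an assumption; other
# kernels may print AMD-Vi events in a different format).
_FAULT_RE = re.compile(
    r"nvidia (?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def find_iommu_page_faults(log_text):
    """Return one dict per AMD-Vi IO_PAGE_FAULT event found in log_text."""
    return [match.groupdict() for match in _FAULT_RE.finditer(log_text)]


# Two lines copied from the log excerpt in this post, used as sample input.
SAMPLE_LOG = (
    "[1118468.873266] nvidia 0000:0a:00.0: AMD-Vi: Event logged "
    "[IO_PAGE_FAULT domain=0x000f address=0x00000000ea13a000 flags=0x0020]\n"
    "[1118468.942145] nvidia 0000:0a:00.0: AMD-Vi: Event logged "
    "[IO_PAGE_FAULT domain=0x000f address=0x00000000ea139068 flags=0x0020]\n"
)

for fault in find_iommu_page_faults(SAMPLE_LOG):
    print(fault["device"], fault["address"])
```

On a real system the input would be the text printed by dmesg (reading it may require root) rather than SAMPLE_LOG. A non-empty result while a multi-GPU job is hanging suggests the same failure described here.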
After some searching, this appears to be a bug.

Temporary workaround: edit /etc/default/grub

    GRUB_DEFAULT=0
    GRUB_TIMEOUT_STYLE=hidden
    GRUB_TIMEOUT=10
    GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
    GRUB_CMDLINE_LINUX="iommu=soft"  # this is the line to change
    ...
Then run sudo update-grub, and finally reboot. After that, multi-GPU runs work normally.

Reposted from: https://www.cnblogs.com/JiangOil/p/10513906.html
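After the reboot, it can be worth confirming that the running kernel actually picked up the new parameter. The sketch below is an illustration rather than part of the original post: the helper names has_kernel_param and running_with_soft_iommu are hypothetical, and it assumes a Linux system that exposes the boot command line at /proc/cmdline.

```python
def has_kernel_param(cmdline, param="iommu=soft"):
    """Return True if param appears as a whole token in a kernel command line."""
    return param in cmdline.split()


def running_with_soft_iommu(path="/proc/cmdline"):
    """Check the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return has_kernel_param(f.read())


# Example with a made-up command line string, for illustration only.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro iommu=soft quiet splash"
print(has_kernel_param(example))
```

If running_with_soft_iommu() returns False after rebooting, the most likely cause is that the GRUB configuration was not regenerated or that a different boot entry was selected.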