

Hadoop Cluster Crash Recovery Notes

Published: 2024/4/15

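I. Cause of the Crash

This was a Hadoop test cluster, so the replication factor had been set to dfs.replication=1, which means that any data stored on a DataNode is lost if that node fails. Unfortunately, one machine did suffer data corruption because of excessive load. The whole Hadoop cluster then had to be restarted, and after the restart the NameNode would not start. It reported the following error: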

  • FSNamesystem initialization failed saveLeases found path /tmp/xxx/aaa.txt but no matching entry in namespace.

II. Repairing the NameNode

The Hadoop cluster had crashed, leaving the NameNode unable to start.

1. Delete the metadata directory on the NameNode master node

rm -fr /data/hadoop-tmp/hadoop-hadoop/dfs/name
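
The command above removes the metadata permanently. As a more cautious variant (an assumption, not what the original author did), the directory can be moved aside and recreated empty, so the old metadata is still available if the import fails. A minimal sketch, where `move_aside` is a hypothetical helper name:

```python
import os
import shutil
import time


def move_aside(name_dir: str) -> str:
    """Rename name_dir to <name_dir>.bak-<timestamp> and recreate it empty.

    Returns the path of the backup directory.
    """
    name_dir = os.path.normpath(name_dir)
    backup = "%s.bak-%s" % (name_dir, time.strftime("%Y%m%d-%H%M%S"))
    shutil.move(name_dir, backup)   # keep the old metadata instead of deleting it
    os.makedirs(name_dir)           # leave an empty directory in its place
    return backup
```

For the path used in this article, the call would be `move_aside("/data/hadoop-tmp/hadoop-hadoop/dfs/name")`.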

2. Start the SecondaryNameNode

Run start-all.sh to start the SecondaryNameNode. The NameNode will still fail to start at this point; ignore that for now.

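3. Recover from the SecondaryNameNode

Use the command: hadoop namenode -importCheckpoint

In Hadoop releases of that era, this loads the latest checkpoint from the directory configured as fs.checkpoint.dir and saves it into dfs.name.dir, so the checkpoint written by the SecondaryNameNode must be readable from the NameNode machine.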

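During recovery it turned out that some data files were already corrupted (because dfs.replication=1), so the NameNode could not leave safe mode (safemode) and kept reporting the following message: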

  • The ratio of reported blocks 0.8866 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
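
The message compares the fraction of blocks reported by the DataNodes against a threshold. A minimal sketch of that comparison follows, assuming a plain ratio test and the 0.999 threshold shown in the log. The function names are hypothetical, and the real NameNode applies further conditions.

```python
def reported_block_ratio(reported_blocks: int, total_blocks: int) -> float:
    """Fraction of known blocks that DataNodes have reported so far."""
    if total_blocks == 0:
        return 1.0  # an empty namespace has nothing left to wait for
    return reported_blocks / total_blocks


def can_leave_safe_mode(reported_blocks: int, total_blocks: int,
                        threshold: float = 0.999) -> bool:
    """True once the reported ratio reaches the threshold (0.999 in the log)."""
    return reported_block_ratio(reported_blocks, total_blocks) >= threshold
```

With dfs.replication=1, the blocks that lived only on the damaged machine can never be reported, so the ratio stays below the threshold and safe mode does not end on its own.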

4. Force the NameNode out of safe mode

  • hadoop dfsadmin -safemode leave

The cluster finally started successfully, but the HDFS web page showed a warning:

  • WARNING : There are about 257 missing blocks. Please check the log or run fsck.

5. List the corrupted HDFS files

The following command prints the list of corrupted files:

  • ./hadoop fsck /

Output:

  • /user/hive/warehouse/pay_consume_orgi/dt=2011-06-28/consume_2011-06-28.sql: MISSING 1 blocks of total size 1250990 B..
  • /user/hive/warehouse/pay_consume_orgi/dt=2011-06-29/consume_2011-06-29.sql: CORRUPT block blk_977550919055291594
  • /user/hive/warehouse/pay_consume_orgi/dt=2011-06-29/consume_2011-06-29.sql: MISSING 1 blocks of total size 1307147 B..................Status: CORRUPT
  •  Total size:    235982871209 B
  •  Total dirs:    1213
  •  Total files:   1422
  •  Total blocks (validated):      4550 (avg. block size 51864367 B)
  •   ********************************
  •   CORRUPT FILES:        277
  •   MISSING BLOCKS:       509
  •   MISSING SIZE:         21857003415 B
  •   CORRUPT BLOCKS:       509
  •   ********************************
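
Before deleting anything, it can help to pull the affected paths out of the fsck report. The sketch below assumes the per-file line format shown in the output above (`<path>: MISSING ...` or `<path>: CORRUPT ...`). The name `damaged_files` is hypothetical, and other Hadoop versions may format these lines differently.

```python
import re
from typing import Dict, Set

# A per-file problem line starts with an absolute HDFS path, then ": MISSING" or ": CORRUPT".
PROBLEM_LINE = re.compile(r"^(?P<path>/.+?): (?P<kind>MISSING|CORRUPT)\b")


def damaged_files(fsck_output: str) -> Dict[str, Set[str]]:
    """Map each damaged file path to the kinds of problems fsck reported for it."""
    problems: Dict[str, Set[str]] = {}
    for line in fsck_output.splitlines():
        match = PROBLEM_LINE.match(line.strip())
        if match:
            problems.setdefault(match.group("path"), set()).add(match.group("kind"))
    return problems
```

Summary lines such as `CORRUPT FILES: 277` do not start with a path, so they are skipped.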

With no redundant replicas, the only option was to delete the corrupted files. fsck needs a path argument, and the option takes a single dash:

  • ./hadoop fsck / -delete

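III. Summary

Be sure to run the SecondaryNameNode and the NameNode on two separate machines. This improves the NameNode's fault tolerance and makes it possible to recover data from the SecondaryNameNode if the cluster crashes.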
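
One way to check this recommendation is to compare the NameNode host with the hosts that run the SecondaryNameNode. The sketch below assumes the Hadoop 0.20/1.x layout, where conf/masters lists the SecondaryNameNode hosts and core-site.xml sets fs.default.name. The function names and example hostnames are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse


def namenode_host(core_site_xml: str) -> str:
    """Return the host from fs.default.name, or '' if the property is absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == "fs.default.name":
            return urlparse((prop.findtext("value") or "").strip()).hostname or ""
    return ""


def secondary_hosts(masters_text: str) -> List[str]:
    """Return the hosts listed in conf/masters, ignoring comments and blank lines."""
    hosts = []
    for line in masters_text.splitlines():
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


def shares_host(core_site_xml: str, masters_text: str) -> bool:
    """True if the NameNode host also appears as a SecondaryNameNode host."""
    return namenode_host(core_site_xml) in secondary_hosts(masters_text)
```

A `True` result from `shares_host` means both daemons are configured on the same machine, which is the setup this summary advises against.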

Reposted from: https://www.cnblogs.com/JohnLiang/archive/2011/11/10/2244572.html
