Date: 2021-03-21
Hadoop metadata merge exception and how to fix it

Over the past few days I have been watching the logs on the StandbyNN. Every time an Fsimage merge finishes, the following exception shows up while the StandbyNN is notifying the ActiveNN to download the merged Fsimage:

    2014-04-23 14:42:54,964 ERROR org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Exception in doCheckpoint
    java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.doGetUrl(TransferFsImage.java:268)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:247)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:162)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:174)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1100(StandbyCheckpointer.java:53)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:297)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$300(StandbyCheckpointer.java:210)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:230)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:226)

The log pasted above is a little messy, so the original post also showed it as a screenshot:

[Screenshot: StandbyCheckpointer]

Out of habit I went to Google, but after a long search I could not find anything similar.
So I had to work it out myself.

Going through the logs turned up something even stranger: the last checkpoint time never changes. It stays at the time of the first checkpoint after the StandbyNN started, as this entry shows:

    2014-04-23 14:50:54,429 INFO org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering checkpoint because it has been 70164 seconds since the last checkpoint, which exceeds the configured interval 600

Could this be a Hadoop bug? I took the error messages above to the source code and, after reading it carefully, found that both of them are emitted by the StandbyCheckpointer class:

    private void doWork() {
      // Reset checkpoint time so that we don't always checkpoint
      // on startup.
      lastCheckpointTime = now();
      while (shouldRun) {
        try {
          Thread.sleep(1000 * checkpointConf.getCheckPeriod());
        } catch (InterruptedException ie) {
        }
        if (!shouldRun) {
          break;
        }
        try {
          // We may have lost our ticket since last checkpoint, log in again,
          // just in case
          if (UserGroupInformation.isSecurityEnabled()) {
            UserGroupInformation.getCurrentUser().checkTGTAndReloginFromKeytab();
          }

          long now = now();
          long uncheckpointed = countUncheckpointedTxns();
          long secsSinceLast = (now - lastCheckpointTime) / 1000;

          boolean needCheckpoint = false;
          if (uncheckpointed >= checkpointConf.getTxnCount()) {
            LOG.info("Triggering checkpoint because there have been " +
                uncheckpointed + " txns since the last checkpoint, which " +
                "exceeds the configured threshold " +
                checkpointConf.getTxnCount());
            needCheckpoint = true;
          } else if (secsSinceLast >= checkpointConf.getPeriod()) {
            LOG.info("Triggering checkpoint because it has been " +
                secsSinceLast + " seconds since the last checkpoint, which " +
                "exceeds the configured interval " + checkpointConf.getPeriod());
            needCheckpoint = true;
          }

          synchronized (cancelLock) {
            if (now ...

[The rest of doWork, the text that followed it, and the start of the next excerpt are missing from the source. The fragment below appears to be the HTTP request code, presumably in TransferFsImage.doGetUrl, which is the frame in the stack trace that calls getResponseCode.]

    ... 0) {
      connection.setConnectTimeout(timeout);
      connection.setReadTimeout(timeout);
    }

    if (connection.getResponseCode() != HttpURLConnection.HTTP_OK) {
      throw new HttpGetFailedException(
          "Image transfer servlet at " + url +
          " failed with status code " + connection.getResponseCode() +
          "\nResponse message:\n" + connection.getResponseMessage(),
          connection);
    }

The timeout used here (DFS_IMAGE_TRANSFER_TIMEOUT_KEY) is set by the dfs.image.transfer.timeout parameter. Its default value is 10 * 60 * 1000, in milliseconds.
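To make the connection between the two log symptoms easier to see, here is a minimal sketch of the trigger decision from doWork. It is not Hadoop code: the class CheckpointLoopSketch, the CheckpointAction interface, and runOnce are hypothetical names. It also assumes that the last checkpoint timestamp is advanced only after a checkpoint finishes without an exception, which is the part of doWork that is missing from the excerpt.

```java
import java.io.IOException;

/** Illustrative model of the checkpoint loop; names here are hypothetical. */
final class CheckpointLoopSketch {

    /** Hypothetical stand-in for the real checkpoint-and-upload step. */
    interface CheckpointAction {
        void run() throws IOException;
    }

    private final long txnThreshold;
    private final long periodSecs;
    private long lastCheckpointMillis;

    CheckpointLoopSketch(long txnThreshold, long periodSecs, long startMillis) {
        this.txnThreshold = txnThreshold;
        this.periodSecs = periodSecs;
        this.lastCheckpointMillis = startMillis;
    }

    /** Same two conditions as the excerpt: txn count first, then elapsed time. */
    static boolean needCheckpoint(long uncheckpointedTxns, long txnThreshold,
                                  long secsSinceLast, long periodSecs) {
        if (uncheckpointedTxns >= txnThreshold) {
            return true;
        }
        return secsSinceLast >= periodSecs;
    }

    /**
     * Runs one loop iteration and returns the "seconds since the last
     * checkpoint" value that this iteration would log.
     */
    long runOnce(long nowMillis, long uncheckpointedTxns, CheckpointAction action) {
        long secsSinceLast = (nowMillis - lastCheckpointMillis) / 1000;
        if (needCheckpoint(uncheckpointedTxns, txnThreshold, secsSinceLast, periodSecs)) {
            try {
                action.run();
                // Assumption: the timestamp moves forward only on success.
                lastCheckpointMillis = nowMillis;
            } catch (IOException e) {
                // A failed checkpoint leaves the old timestamp in place,
                // so the next iteration reports an even larger interval.
            }
        }
        return secsSinceLast;
    }
}
```

Under that assumption, an action that always throws SocketTimeoutException makes the reported interval grow on every iteration (600, 1200, ... seconds), which would match a log line reporting 70164 seconds against a configured interval of 600.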
Then I read the description of this property:

    Timeout for image transfer in milliseconds. This timeout and the related dfs.image.transfer.bandwidthPerSec parameter should be configured such that normal image transfer can complete within the timeout. This timeout prevents client hangs when the sender fails during image transfer, which is particularly important during checkpointing. Note that this timeout applies to the entirety of image transfer, and is not a socket timeout.
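That is when the problem became clear. This setting is closely tied to dfs.image.transfer.bandwidthPerSec: the ActiveNN has to finish downloading the merged Fsimage from the StandbyNN within dfs.image.transfer.timeout, otherwise the exception is thrown.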
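The "Read timed out" in the stack trace can be reproduced in isolation. The sketch below is only an illustration, not Hadoop code: ReadTimeoutSketch and timesOut are hypothetical names, and a local com.sun.net.httpserver.HttpServer with an artificial delay stands in for a peer that does not answer until its own work (here, the image download) has finished. The client applies the same timeout to connect and read, as in the excerpt.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

/** Illustrative only: a slow HTTP peer versus a client-side read timeout. */
final class ReadTimeoutSketch {

    /** Returns true if getResponseCode() fails with SocketTimeoutException. */
    static boolean timesOut(int serverDelayMillis, int timeoutMillis) {
        HttpServer server;
        try {
            // Port 0 lets the OS pick a free port on the loopback interface.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/", exchange -> {
            try {
                // Simulates the time the peer needs before it can respond.
                Thread.sleep(serverDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            exchange.sendResponseHeaders(200, -1);
            exchange.close();
        });
        server.start();

        HttpURLConnection connection = null;
        try {
            URL url = URI.create(
                "http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            connection = (HttpURLConnection) url.openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            connection.getResponseCode(); // blocks until headers arrive or the timeout fires
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
            server.stop(0);
        }
    }
}
```

For example, timesOut(1500, 200) models a peer that needs longer than the configured timeout, while timesOut(0, 2000) models one that answers in time.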
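Then I checked my own configuration:

    dfs.image.transfer.timeout            60000
    dfs.image.transfer.bandwidthPerSec    1048576

That is a 60-second timeout with the copy limited to 1 MB per second, while the metadata on my cluster is over 800 MB, so it clearly cannot be copied within 60 seconds. I raised dfs.image.transfer.timeout and kept watching: the exception above never appeared again, and some other exceptions I had been seeing went away as well.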
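The arithmetic behind this conclusion can be written down as a small helper. This is a rough sketch with hypothetical names (ImageTransferEstimate, minTransferMillis, fitsInTimeout); it assumes the bandwidth limit is the only bottleneck and ignores protocol overhead, so the result is a lower bound on the timeout rather than a recommended value.

```java
/** Illustrative lower-bound estimate for an Fsimage transfer. */
final class ImageTransferEstimate {

    /** Minimum transfer time in milliseconds at the given rate, rounded up. */
    static long minTransferMillis(long imageBytes, long bandwidthBytesPerSec) {
        if (imageBytes < 0 || bandwidthBytesPerSec <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and bandwidth > 0");
        }
        // Ceiling division so a partial millisecond still counts.
        return (imageBytes * 1000 + bandwidthBytesPerSec - 1) / bandwidthBytesPerSec;
    }

    /** True if the transfer could finish within the timeout at that rate. */
    static boolean fitsInTimeout(long imageBytes, long bandwidthBytesPerSec,
                                 long timeoutMillis) {
        return minTransferMillis(imageBytes, bandwidthBytesPerSec) <= timeoutMillis;
    }

    public static void main(String[] args) {
        long imageBytes = 800L * 1024 * 1024; // about 800 MB, as in the post
        long bandwidth = 1048576L;            // dfs.image.transfer.bandwidthPerSec
        System.out.println(minTransferMillis(imageBytes, bandwidth) + " ms needed");
        System.out.println("fits in 60000 ms: "
            + fitsInTimeout(imageBytes, bandwidth, 60000L));
    }
}
```

With the values from the post, the estimate is 800000 ms against a 60000 ms timeout. Under the same assumptions, a 10 * 60 * 1000 ms timeout would also be too short for an image of that size at 1 MB per second, so either the timeout or the bandwidth limit has to be sized against the actual image.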
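Unless otherwise stated, all articles on this blog are original!
Copyright of original articles belongs to 过往记忆大数据 (过往记忆). Reproduction without permission is prohibited.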