合并查看源代码

查看源代码  时间:2021-03-21  阅读:()
Hadoop元数据合并异常及解决方法这几天观察了一下StandbyNN上面的日志,发现每次Fsimage合并完之后,StandbyNN通知ActiveNN来下载合并好的Fsimage的过程中会出现以下的异常信息:2014-04-2314:42:54,964ERRORorg.
apache.
hadoop.
hdfs.
server.
namenode.
ha.
StandbyCheckpointer:ExceptionindoCheckpointjava.
net.
SocketTimeoutException:Readtimedoutatjava.
net.
SocketInputStream.
socketRead0(NativeMethod)atjava.
net.
SocketInputStream.
read(SocketInputStream.
java:152)atjava.
net.
SocketInputStream.
read(SocketInputStream.
java:122)atjava.
io.
BufferedInputStream.
fill(BufferedInputStream.
java:235)atjava.
io.
BufferedInputStream.
read1(BufferedInputStream.
java:275)atjava.
io.
BufferedInputStream.
read(BufferedInputStream.
java:334)atsun.
net.
www.
http.
HttpClient.
parseHTTPHeader(HttpClient.
java:687)atsun.
net.
www.
http.
HttpClient.
parseHTTP(HttpClient.
java:633)atsun.
net.
www.
protocol.
http.
HttpURLConnection.
getInputStream(HttpURLConnection.
java:1323)atjava.
net.
HttpURLConnection.
getResponseCode(HttpURLConnection.
java:468)atorg.
apache.
hadoop.
hdfs.
server.
namenode.
TransferFsImage.
doGetUrl(TransferFsImage.
java:268)atorg.
apache.
hadoop.
hdfs.
server.
namenode.
TransferFsImage.
getFileClient(TransferFsImage.
java:247)atorg.
apache.
hadoop.
hdfs.
server.
namenode.
TransferFsImage.
uploadImageFromStorage(TransferFsImage.
java:162)atorg.
apache.
hadoop.
hdfs.
server.
namenode.
ha.
StandbyCheckpointer.
doCheckpoint(StandbyCheckpointer.
java:174)atorg.
apache.
hadoop.
hdfs.
server.
namenode.
ha.
StandbyCheckpointer.
access$1100(StandbyCheckpointer.
java:53)atorg.
apache.
hadoop.
hdfs.
server.
namenode.
ha.
StandbyCheckpointer$CheckpointerThread.
doWork(StandbyCheckpointer.
java:297)atorg.
apache.
hadoop.
hdfs.
server.
namenode.
ha.
StandbyCheckpointer$CheckpointerThread.
access$300(StandbyCheckpointer.
java:210)atorg.
apache.
hadoop.
hdfs.
server.
namenode.
ha.
StandbyCheckpointer$CheckpointerThread$1.
run(StandbyCheckpointer.
java:230)atorg.
apache.
hadoop.
security.
SecurityUtil.
doAsLoginUserOrFatal(SecurityUtil.
java:456)atorg.
apache.
hadoop.
hdfs.
server.
namenode.
ha.
StandbyCheckpointer$CheckpointerThread.
run(StandbyCheckpointer.
java:226)1/5上面的代码贴出来有点乱啊,可以看下下面的图片截图:StandbyCheckpointer于是习惯性的去Google了一下,找了好久也没找到类似的信息.
只能自己解决.
我们通过分析日志发现更奇怪的问题,上次Checkpoint的时间一直都不变(一直都是StandbyNN启动的时候第一次Checkpoint的时间),如下:2014-04-2314:50:54,429INFOorg.
apache.
hadoop.
hdfs.
server.
namenode.
ha.
StandbyCheckpointer:Triggeringcheckpointbecauseithasbeen70164secondssincethelastcheckpoint,whichexceedstheconfiguredinterval600难道这是Hadoop的bug于是我就根据上面的错误信息去查看源码,经过仔细的分析,发现上述的问题都是由StandbyCheckpointer类输出的:privatevoiddoWork(){//Resetcheckpointtimesothatwedon'talwayscheckpoint//onstartup.
lastCheckpointTime=now();while(shouldRun){try{Thread.
sleep(1000*checkpointConf.
getCheckPeriod());}catch(InterruptedExceptionie){}if(!
shouldRun){break;}try{//Wemayhavelostourticketsincelastcheckpoint,loginagain,//justincaseif(UserGroupInformation.
isSecurityEnabled()){UserGroupInformation.
getCurrentUser().
checkTGTAndReloginFromKeytab();}longnow=now();longuncheckpointed=countUncheckpointedTxns();longsecsSinceLast=(now-lastCheckpointTime)/1000;2/5booleanneedCheckpoint=false;if(uncheckpointed>=checkpointConf.
getTxnCount()){LOG.
info("Triggeringcheckpointbecausetherehavebeen"+uncheckpointed+"txnssincethelastcheckpoint,which"+"exceedstheconfiguredthreshold"+checkpointConf.
getTxnCount());needCheckpoint=true;}elseif(secsSinceLast>=checkpointConf.
getPeriod()){LOG.
info("Triggeringcheckpointbecauseithasbeen"+secsSinceLast+"secondssincethelastcheckpoint,which"+"exceedstheconfiguredinterval"+checkpointConf.
getPeriod());needCheckpoint=true;}synchronized(cancelLock){if(now0){connection.
setConnectTimeout(timeout);connection.
setReadTimeout(timeout);}if(connection.
getResponseCode()!
=HttpURLConnection.
HTTP_OK){thrownewHttpGetFailedException("Imagetransferservletat"+url+"failedwithstatuscode"+connection.
getResponseCode()+"\nResponsemessage:\n"+connection.
getResponseMessage(),connection);}DFS_IMAGE_TRANSFER_TIMEOUT_KEY这个时间是由dfs.
image.
transfer.
timeout参数所设置的,默认值为10*60*1000,单位为毫秒.
然后我看了一下这个属性的解释:Timeoutforimagetransferinmilliseconds.
Thistimeoutandtherelateddfs.
image.
transfer.
bandwidthPerSecparametershouldbeconfiguredsuchthatnormalimagetransfercancompletewithinthetimeout.
Thistimeoutpreventsclienthangswhenthesender4/5failsduringimagetransfer,whichisparticularlyimportantduringcheckpointing.
Notethatthistimeoutappliestotheentiretyofimagetransfer,andisnotasockettimeout.
这才发现问题,这个参数的设置和dfs.
image.
transfer.
bandwidthPerSec息息相关,要保证ActiveNN在dfs.
image.
transfer.
timeout时间内把合并好的Fsimage从StandbyNN上下载完,要不然会出现异常.
然后我看了一下我的配置dfs.
image.
transfer.
timeout60000dfs.
image.
transfer.
bandwidthPerSec104857660秒超时,一秒钟拷贝1MB,而我的集群上的元数据有800多MB,显然是不能在60秒钟拷贝完,后来我把dfs.
image.
transfer.
timeout设置大了,观察了一下,集群再也没出现过上述异常信息,而且以前的一些异常信息也由于这个而解决了.
.
本博客文章除特别声明,全部都是原创!
原创文章版权归过往记忆大数据(过往记忆)所有,未经许可不得转载.
本文链接:【】()PoweredbyTCPDF(www.
tcpdf.
org)5/5

hostio荷兰10Gbps带宽,10Gbps带宽,€5/月,最低配2G内存+2核+5T流量

成立于2006年的荷兰Access2.IT Group B.V.(可查:VAT: NL853006404B01,CoC: 58365400) 一直运作着主机周边的业务,当前正在对荷兰的高性能AMD平台的VPS进行5折优惠,所有VPS直接砍一半。自有AS208258,vps母鸡配置为Supermicro 1024US-TRT 1U,2*AMD Epyc 7452(64核128线程),16条32G D...

香港2GB内存DIYVM2核(¥50月)香港沙田CN2云服务器

DiyVM 香港沙田机房,也是采用的CN2优化线路,目前也有入手且在使用中,我个人感觉如果中文业务需要用到的话虽然日本机房也是CN2,但是线路的稳定性不如香港机房,所以我们在这篇文章中亲测看看香港机房,然后对比之前看到的日本机房。香港机房的配置信息。CPU内存 硬盘带宽IP价格购买地址2核2G50G2M1¥50/月选择方案4核4G60G3M1¥100/月选择方案4核8G70G3M4¥200/月选择...

DediPath($1.40),OpenVZ架构 1GB内存

DediPath 商家成立时间也不过三五年,商家提供的云服务器产品有包括KVM和OPENVZ架构的VPS主机。翻看前面的文章有几次提到这个商家其中机房还是比较多的。其实对于OPENVZ架构的VPS主机以前我们是遇到比较多,只不过这几年很多商家都陆续的全部用KVM和XEN架构替代。这次DediPath商家有基于OPENVZ架构提供低价的VPS主机。这次四折的促销活动不包括512MB内存方案。第一、D...

查看源代码为你推荐
microcenterGPU和CPU的区别h连锁酒店什么连锁酒店好www.983mm.comwww.47683.combbs.99nets.com做一款即时通讯软件难吗 像hi qq这类的刘祚天你们知道21世纪的DJ分为几种类型吗?(答对者重赏)杰景新特谁给我一个李尔王中的葛罗斯特这个人物的分析?急 ....先谢谢了冯媛甑冯媛甄多大啊?haole16.com国色天香16 17全集高清在线观看 国色天香qvod快播迅雷下载地址pp43.com登录www.bdnpxzl.com怎么进入网站后台啊www.mfav.org邪恶动态图587期 www.zqzj.org
买域名 免费二级域名申请 lamp安装 awardspace l5639 博客主机 cloudstack godaddy支付宝 网通代理服务器 腾讯云分析 爱奇艺vip免费试用7天 卡巴斯基免费试用 移动服务器托管 视频服务器是什么 上海电信测速 电信网络测速器 阿里云手机官网 实惠 网站防护 hdchina 更多