launchlocalhost

localhost  时间:2021-05-20  阅读:()
CS246:MiningMassiveDatasetsWinter2014ProblemSet0Due9:30amJanuary14,2014GeneralInstructionsThishomeworkistobecompletedindividually(nocollaborationisallowed).
Also,youarenotallowedtouseanylatedaysforthehomework.
Thishomeworkisworth1%ofthetotalcoursegrade.
ThepurposeofthishomeworkistogetyoustartedwithHadoop.
Hereyouwilllearnhowtowrite,compile,debugandexecuteasimpleHadoopprogram.
FirstpartofthehomeworkservesasatutorialandthesecondpartasksyoutowriteyourownHadoopprogram.
Section1describesthevirtualmachineenvironment.
Insteadofthevirtualmachine,youarewelcometosetupyourownpseudo-distributedorfullydistributedclusterifyoupre-fer.
AnyversionofHadoopthatisatleast1.
0willsuce.
(Foraneasywaytosetupacluster,tryClouderaManager:http://archive.
cloudera.
com/cm4/installer/latest/cloudera-manager-installer.
bin.
)Ifyouchoosetosetupyourowncluster,youarere-sponsibleformakingsuretheclusterisworkingproperly.
TheTAswillbeunabletohelpyoudebugcongurationissuesinyourowncluster.
Section2explainshowtousetheEclipseenvironmentinthevirtualmachine,includinghowtocreateaproject,howtorunjobs,andhowtodebugjobs.
Section2.
5givesanend-to-endexampleofcreatingaproject,addingcode,building,running,anddebuggingit.
Section3istheactualhomeworkassignment.
Therearenodeliverablesforsections1and2.
Insection3,youareaskedtowriteandsubmityourownMapReducejob.
Thishomeworkrequiresyoutouploadthecodeandhand-inaprint-outoftheoutputforSection3.
Regular(non-SCPD)studentsshouldsubmithardcopiesoftheanswers(Section3)eitherinclassorinthesubmissionbox(seecoursewebsiteforlocation).
Forpapersubmis-sion,pleasellthecoversheetandsubmititasafrontpagewithyouranswers.
Youshoulduploadyoursourcecodeandanyotherlesyouused.
SCPDstudentsshouldsubmittheiranswersthroughSCPDandalsouploadthecode.
ThesubmissionmustincludetheanswerstoSection3,thecoversheetandtheusualSCPDrout-ingform(http://scpd.
stanford.
edu/generalInformation/pdf/SCPD_HomeworkRouteForm.
pdf).
CoverSheet:http://cs246.
stanford.
edu/cover.
pdfUploadLink:http://snap.
stanford.
edu/submit/CS246:MiningMassiveDatasets-ProblemSet02Questions1SettingupavirtualmachineDownloadandinstallVirtualBoxonyourmachine:http://virtualbox.
org/wiki/DownloadsDownloadtheClouderaQuickstartVMathttp://www.
cloudera.
com/content/dev-center/en/home/developer-admin-resources/quickstart-vm.
htmlUncompresstheVMarchive.
Itiscompressedwith7-Zip.
Ifneeded,youcandownloadatooltouncompressthearchiveathttp://www.
7-zip.
org/.
StartVirtualBoxandclickImportAppliance.
Clickthefoldericonbesidethelocationeld.
Browsetotheuncompressedarchivefolder,selectthe.
ovfle,andclicktheOpenbutton.
ClicktheContinuebutton.
ClicktheImportbutton.
Yourvirtualmachineshouldnowappearintheleftcolumn.
SelectitandclickonStarttolaunchit.
Usernameandpasswordare"cloudera"and"cloudera".
Optional:Openthenetworkpropertiesforthevirtualmachine.
ClickontheAdapter2tab.
EnabletheadapterandselectHost-onlyAdapter.
Ifyoudothisstep,youwillbeabletoconnecttotherunningvirtualmachinefromthehostOSat192.
168.
56.
101.
VirtualmachineincludesthefollowingsoftwareCentOS6.
2JDK6(1.
6.
032)Hadoop2.
0.
0Eclipse4.
2.
6(Juno)Theloginuseriscloudera,andthepasswordforthataccountiscloudera.
2RunningHadoopjobsGenerallyHadoopcanberuninthreemodes.
1.
Standalone(orlocal)mode:Therearenodaemonsusedinthismode.
HadoopusesthelocallesystemasansubstituteforHDFSlesystem.
Thejobswillrunasifthereis1mapperand1reducer.
CS246:MiningMassiveDatasets-ProblemSet032.
Pseudo-distributedmode:Allthedaemonsrunonasinglemachineandthissettingmimicsthebehaviorofacluster.
AllthedaemonsrunonyourmachinelocallyusingtheHDFSprotocol.
Therecanbemultiplemappersandreducers.
3.
Fully-distributedmode:ThisishowHadooprunsonarealcluster.
InthishomeworkwewillshowyouhowtorunHadoopjobsinStandalonemode(veryusefulfordevelopinganddebugging)andalsoinPseudo-distributedmode(tomimicthebehaviorofaclusterenvironment).
2.
1CreatingaHadoopprojectinEclipse(ThereisapluginforEclipsethatmakesitsimpletocreateanewHadoopprojectandexecuteHadoopjobs,butthepluginisonlywellmaintainedforHadoop1.
0.
4,whichisaratheroldversionofHadoop.
Thereisaprojectathttps://github.
com/winghc/hadoop2x-eclipse-pluginthatisworkingtoupdatethepluginforHadoop2.
0.
Youcantryitoutifyoulike,butyourmilagemayvary.
)Tocreateaproject:1.
Openorcreatethe~/.
m2/settings.
xmlleandmakesureithasthefollowingcon-tents:standardextrarepostruecentralhttp://repo.
maven.
apache.
org/maven2/truetrueclouderaCS246:MiningMassiveDatasets-ProblemSet04https://repository.
cloudera.
com/artifactory/clouderarepostruetrue2.
OpenEclipseandselectFile→New→Project.
.
.
.
3.
ExpandtheMavennode,selectMavenProject,andclicktheNext>button.
4.
Onthenextscreen,clicktheNext>button.
5.
Onthenextscreen,whenthearchetypeshaveloaded,selectmaven-archetype-quickstartandclicktheNext>button.
6.
Onthenextscreen,enteragroupnameintheGroupIdeld,andenteraprojectnameintheArtifactId.
ClicktheFinishbutton.
7.
Inthepackageexplorer,expandtheprojectnodeanddouble-clickthepom.
xmlletoopenit.
8.
Replacethecurrent"dependencies"sectionwiththefollowingcontent:jdk.
toolsjdk.
tools1.
6org.
apache.
hadoophadoophdfs2.
0.
0cdh4.
0.
0org.
apache.
hadoophadoopauth2.
0.
0cdh4.
0.
0CS246:MiningMassiveDatasets-ProblemSet05org.
apache.
hadoophadoopcommon2.
0.
0cdh4.
0.
0org.
apache.
hadoophadoopcore2.
0.
0mr1cdh4.
0.
1junitjunitdep4.
8.
2org.
apache.
hadoophadoophdfsorg.
apache.
hadoophadoopauthorg.
apache.
hadoophadoopcommonorg.
apache.
hadoophadoopcorejunitjunit4.
10testCS246:MiningMassiveDatasets-ProblemSet06org.
apache.
maven.
pluginsmavencompilerplugin2.
11.
61.
69.
Savethele.
10.
Right-clickontheprojectnodeandselectMaven→UpdateProject.
Youcannowcreateclassesinthesrcdirectory.
Afterwritingyourcode,buildtheJARlebyright-clickingontheprojectnodeandselectingRunAs→Maveninstall.
2.
2RunningHadoopjobsinstandalonemodeAftercreatingaproject,addingsourcecode,andbuildingtheJARleasoutlinedabove,theJARlewillbelocatedat/workspace//targetdirectory.
Openaterminalandrunthefollowingcommand:hadoopjar~/workspace//target/-0.
0.
1-SNAPSHOT.
jar\-Dmapped.
task.
tracker=local-Dfs.
defaultFS=localYouwillseealloftheoutputfromthemapandreducetasksintheterminal.
2.
3RunningHadoopjobsinpseudo-distributedmodeOpenaterminalandrunthefollowingcommand:hadoopjar~/workspace//target/-0.
0.
1-SNAPSHOT.
jarToseeallrunningjobs,runthefollowingcommand:hadoopjob-listTokillarunningjob,ndthejob'sIDandthenrunthefollowingcommand:hadoopjob-killCS246:MiningMassiveDatasets-ProblemSet072.
4DebuggingHadoopjobsTodebuganissuewithajob,theeasiestapproachistoaddprintstatementsintothesourceleandrunthejobinstandalonemode.
Theprintstatementswillappearintheterminaloutput.
Whenrunningyourjobinpseudo-distributedmode,theoutputfromthejobisloggedinthetasktracker'slogles,whichcanbeaccessedmosteasilybypointingawebbrowsertoport50030oftheserver.
Fromthejobtrackerwebpage,youcandrilldownintothefailingjob,thefailingtask,thefailedattempt,andnallythelogles.
Notethatthelogsforstdoutandstderrareseparated,whichcanbeusefulwhentryingtoisolatespecicdebuggingprintstatements.
IfyouenabledthesecondnetworkadapterintheVMsetup,youcanpointyourlocalbrowsertohttp://192.
168.
56.
101:50030/toaccessthejobtrackerpage.
Note,though,thatwhenyoufollowlinksthatleadtothetasktrackerwebpage,thelinkspointtolocalhost.
locadomain,whichmeansyourbrowserwillreturnapagenotfounderror.
Sim-plyreplacelocalhost.
locadomainwith192.
168.
56.
101intheURLbarandpressentertoloadthecorrectpage.
2.
5ExampleprojectInthissectionyouwillcreateanewEclipseHadoopproject,compile,andexecuteit.
Theprogramwillcountthefrequencyofallthewordsinagivenlargetextle.
Inyourvirtualmachine,Hadoop,JavaenvironmentandEclipsehavealreadybeenpre-installed.
Editthe~/.
m2/settings.
xmlleasoutlinedabove.
SeeFigure1Figure1:CreateaHadoopProject.
OpenEclipseandcreateanewprojectasoutlinedabove.
SeeFigures2-9.
CS246:MiningMassiveDatasets-ProblemSet08Figure2:CreateaHadoopProject.
Figure3:CreateaHadoopProject.
CS246:MiningMassiveDatasets-ProblemSet09Figure4:CreateaHadoopProject.
Figure5:CreateaHadoopProject.
CS246:MiningMassiveDatasets-ProblemSet010Figure6:CreateaHadoopProject.
Figure7:CreateaHadoopProject.
CS246:MiningMassiveDatasets-ProblemSet011Figure8:CreateaHadoopProject.
CS246:MiningMassiveDatasets-ProblemSet012Figure9:CreateaHadoopProject.
Theprojectwillcontainastubsourceleinthesrc/main/javadirectorythatwewillnotuse.
Instead,createanewclasscalledWordCount.
FromtheFilemenu,selectNew→Class.
SeeFigure10Figure10:Createjavale.
Onthenextscreen,enterthepackagename(e.
g,thegroupIDplustheprojectname)inthePackageeld.
EnterWordCountastheName.
SeeFigure11.
CS246:MiningMassiveDatasets-ProblemSet013Figure11:Createjavale.
IntheSuperclasseld,enterConfiguredandclicktheBrowsebutton.
Fromthepop-upwindowselectCongured—org.
apache.
hadoop.
confandclicktheOKbutton.
SeeFigure12.
CS246:MiningMassiveDatasets-ProblemSet014Figure12:Createjavale.
IntheInterfacessection,clicktheAddbutton.
Fromthepop-upwindowselectTool—org.
apache.
hadoop.
utilandclicktheOKbutton.
SeeFigure13.
CS246:MiningMassiveDatasets-ProblemSet015Figure13:Createjavale.
Checktheboxesforpublicstaticvoidmain(Stringargs[])andInheritedabstractmeth-odsandclicktheFinishbutton.
SeFigure14CS246:MiningMassiveDatasets-ProblemSet016Figure14:CreateWordCount.
java.
YouwillnowhavearoughskeletonofaJavaleasinFigure15.
YoucannowaddcodetothisclasstoimplementyourHadoopjob.
CS246:MiningMassiveDatasets-ProblemSet017Figure15:CreateWordCount.
java.
Ratherthanimplementajobfromscratch,copythecontentsfromhttp://snap.
stanford.
edu/class/cs246-data-2014/WordCount.
javaandpasteitintotheWordCount.
javale.
Becarefultoleavethepackagestatementatthetopintact.
SeeFigure16.
ThecodeinWordCount.
javacalculatesthefrequencyofeachwordinagivendataset.
CS246:MiningMassiveDatasets-ProblemSet018Figure16:CreateWordCount.
java.
Buildtheprojectbyright-clickingtheprojectnodeandselectingRunAs→Maveninstall.
SeeFigure17.
CS246:MiningMassiveDatasets-ProblemSet019Figure17:CreateWordCount.
java.
DownloadtheCompleteWorksofWilliamShakespearefromProjectGutenbergathttp://www.
gutenberg.
org/cache/epub/100/pg100.
txt.
Openaterminalandchangetothedirectorywherethedatasetwasstored.
Runthecommand:hadoopjar~/workspace/wordcount/target/wordcount-0.
0.
1-SNAPSHOT.
jar\edu.
stanford.
cs246.
wordcount.
WordCount-Dmapred.
job.
tracker=local\-Dfs.
defaultFS=localdatasetoutputCS246:MiningMassiveDatasets-ProblemSet020SeeFigure18Figure18:RunWordCountjob.
Ifthejobsucceeds,youwillseeanoutputdirectoryinthecurrentdirectorythatcontainsalecalledpart-00000.
Thepart-00000lecontainstheoutputfromthejob.
SeeFigure19Figure19:RunWordCountjob.
Runthecommand:hadoopfs-lsThecommandwilllistthecontentsofyourhomedirectoryinHDFS,whichshouldbeempty,resultinginnooutput.
Runthecommand:hadoopfs-copyFromLocalpg100.
txttocopythedatasetfolderintoHDFS.
Runthecommand:hadoopfs-lsCS246:MiningMassiveDatasets-ProblemSet021again.
Youshouldseethedatasetdirectorylisted,asinFigure20indicatingthatthedatasetisinHDFS.
Figure20:RunWordCountjob.
Runthecommand:hadoopjar~/workspace/WordCount/target/WordCount-0.
0.
1-SNAPSHOT.
jar\edu.
stanford.
cs246.
wordcount.
WordCountpg100.
txtoutputSeeFigure21.
Ifthejobfails,youwillseeamessageindicatingthatthejobfailed.
Otherwise,youcanassumethejobsucceeded.
Figure21:RunWordCountjob.
Runthecommand:hadoopfs-lsoutputYoushouldseeanoutputleforeachreducer.
Sincetherewasonlyonereducerforthisjob,youshouldonlyseeonepart-*le.
Notethatsometimestheleswillbecalledpart-NNNNN,andsometimesthey'llbecalledpart-r-NNNNN.
SeeFigure22Figure22:RunWordCountjob.
Runthecommand:hadoopfs-catoutput/part\*|headYoushouldseethesameoutputaswhenyouranthejoblocally,asshowninFigure23CS246:MiningMassiveDatasets-ProblemSet022Figure23:RunWordCountjob.
Toviewthejob'slogs,openthebrowserintheVMandpointittohttp://localhost:50030asinFigure24.
Figure24:ViewWordCountjoblogs.
Clickonthelinkforthecompletedjob.
SeeFigure25.
CS246:MiningMassiveDatasets-ProblemSet023Figure25:ViewWordCountjoblogs.
Clickthelinkforthemaptasks.
SeeFigure26.
CS246:MiningMassiveDatasets-ProblemSet024Figure26:ViewWordCountjoblogs.
Clickthelinkfortherstattempt.
SeeFigure27.
CS246:MiningMassiveDatasets-ProblemSet025Figure27:ViewWordCountjoblogs.
Clickthelinkforthefulllogs.
SeeFigure28.
CS246:MiningMassiveDatasets-ProblemSet026Figure28:ViewWordCountjoblogs.
2.
6UsingyourlocalmachinefordevelopmentIfyouenabledthesecondnetworkadapter,youcanuseyourownlocalmachineforde-velopment,includingyourlocalIDE.
Ifordertodothat,you'llneedtoinstallacopyofHadooplocally.
Theeasiestwaytodothatistosimplydownloadthearchivefromhttp://archive.
cloudera.
com/cdh4/cdh/4/hadoop-2.
0.
0-cdh4.
4.
0.
tar.
gzandunpackit.
Intheunpackedarchive,you'llndaetc/hadoop-mapreduce1directory.
Inthatdirectory,openthecore-site.
xmlleandmodifyitasfollows:fs.
default.
namehdfs://192.
168.
56.
101:8020CS246:MiningMassiveDatasets-ProblemSet027Next,openthemapred-site.
xmlleinthesamedirectoryandmodifyitasfollows:mapred.
job.
tracker192.
168.
56.
101:8021Aftermakingthosemodications,updateyourcommandpathtoincludethebin-mapreduce1directoryandsettheHADOOPCONFDIRenvironmentvariabletobethepathtotheetc/hadoop-mapreduce1directory.
YoushouldnowbeabletoexecuteHadoopcommandsfromyourlocalterminaljustasyouwouldfromtheterminalinthevirtualmachine.
YoumayalsowanttosettheHADOOPUSERNAMEenvironmentvariabletoclouderatoletyoumasqueradeastheclouderauser.
WhenyouusetheVMdirectly,you'rerunningastheclouderauser.
FurtherHadooptutorialsYahoo!
HadoopTutorial:http://developer.
yahoo.
com/hadoop/tutorial/ClouderaHadoopTutorial:http://www.
cloudera.
com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial.
htmlHowtoDebugMapReducePrograms:http://wiki.
apache.
org/hadoop/HowToDebugMapReduceProgramsFurtherEclipsetutorialsGeneraEclipsetutorial:http://www.
vogella.
com/articles/Eclipse/article.
html.
TutorialonhowtousetheEclipsedebugger:http://www.
vogella.
com/articles/EclipseDebugging/article.
html.
3Task:WriteyourownHadoopJobNowyouwillwriteyourrstMapReducejobtoaccomplishthefollowingtask:CS246:MiningMassiveDatasets-ProblemSet028WriteaHadoopMapReduceprogramwhichoutputsthenumberofwordsthatstartwitheachletter.
Thismeansthatforeveryletterwewanttocountthetotalnumberofwordsthatstartwiththatletter.
Inyourimplementationignorethelettercase,i.
e.
,considerallwordsaslowercase.
Youcanignoreallnon-alphabeticcharacters.
Runyourprogramoverthesameinputdataasabove.
Whattohand-in:Hand-intheprintoutoftheoutputleanduploadthesourcecode.

瓜云互联-美国洛杉矶高防CN2高防云服务器,新老用户均可9折促销!低至32.4元/月!

瓜云互联一直主打超高性价比的海外vps产品,主要以美国cn2、香港cn2线路为主,100M以内高宽带,非常适合个人使用、企业等等!安全防护体系 弹性灵活,能为提供简单、 高效、智能、快速、低成本的云防护,帮助个人、企业从实现网络攻击防御,同时也承诺产品24H支持退换,不喜欢可以找客服退现,诚信自由交易!官方网站:点击访问瓜云互联官网活动方案:打折优惠策略:新老用户购买服务器统统9折优惠预存返款活动...

RAKsmart美国VPS上市,活动期间5折抢购仅$30,$1.99/月

RAKsmart机房将于7月1日~7月31日推出“年中大促”活动,多重惊喜供您选择;爆款I3-2120仅30美金秒杀、V4新品上市,活动期间5折抢购、爆款产品持续热卖、洛杉矶+硅谷+香港+日本站群恢复销售、G口不限流量产品超低价热卖。美国VPS、日本VPS及香港VPS享全场7折优惠;爆款VPS $ 1.99/月限量秒杀,10台/天,售完即止, VPS 7折优惠码:VPS-TP-disRAKsmar...

企鹅小屋:垃圾服务商有跑路风险,站长注意转移备份数据!

企鹅小屋:垃圾服务商有跑路风险!企鹅不允许你二次工单的,二次提交工单直接关服务器,再严重就封号,意思是你提交工单要小心,别因为提交工单被干了账号!前段时间,就有站长说企鹅小屋要跑路了,站长不太相信,本站平台已经为企鹅小屋推荐了几千元的业绩,CPS返利达182.67CNY。然后,站长通过企鹅小屋后台申请提现,提现申请至今已经有20几天,企鹅小屋也没有转账。然后,搞笑的一幕出现了:平台账号登录不上提示...

localhost为你推荐
psbAchromefunctionscss"2014年全国民营企业招聘会现场A区域企业信息",,,,Anthemmy支持ipad支持ipadwin7关闭445端口如何快速关闭445端口iphone连不上wifi苹果手机为什么突然连不上家里的wifi?iexplore.exe应用程序错误iexplore.exe---应用程序错误.是什么意思?win10445端口win的22端口和23端口作用分别是什么 ?
fc2最新域名 万网域名空间 vps服务器 北京vps主机 重庆vps租用 edis 10t等于多少g wdcp 南昌服务器托管 html空间 英文站群 可外链相册 免费申请网站 免费phpmysql空间 gtt 鲁诺 个人免费主页 永久免费空间 中国联通宽带测速 上海联通 更多