CS246: Mining Massive Datasets
Winter 2014

Problem Set 0

Due 9:30am January 14, 2014

General Instructions

This homework is to be completed individually (no collaboration is allowed). Also, you are not allowed to use any late days for the homework. This homework is worth 1% of the total course grade.

The purpose of this homework is to get you started with Hadoop. Here you will learn how to write, compile, debug and execute a simple Hadoop program. The first part of the homework serves as a tutorial and the second part asks you to write your own Hadoop program.

Section 1 describes the virtual machine environment. Instead of the virtual machine, you are welcome to set up your own pseudo-distributed or fully distributed cluster if you prefer. Any version of Hadoop that is at least 1.0 will suffice. (For an easy way to set up a cluster, try Cloudera Manager: http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin.) If you choose to set up your own cluster, you are responsible for making sure the cluster is working properly. The TAs will be unable to help you debug configuration issues in your own cluster.

Section 2 explains how to use the Eclipse environment in the virtual machine, including how to create a project, how to run jobs, and how to debug jobs. Section 2.5 gives an end-to-end example of creating a project, adding code, building, running, and debugging it.

Section 3 is the actual homework assignment. There are no deliverables for Sections 1 and 2. In Section 3, you are asked to write and submit your own MapReduce job.

This homework requires you to upload the code and hand in a printout of the output for Section 3.

Regular (non-SCPD) students should submit hard copies of the answers (Section 3) either in class or in the submission box (see course website for location). For paper submission, please fill out the cover sheet and submit it as a front page with your answers. You should upload your source code and any other files you used.

SCPD students should submit their answers through SCPD and also upload the code. The submission must include the answers to Section 3, the cover sheet and the usual SCPD routing form (http://scpd.stanford.edu/generalInformation/pdf/SCPD_HomeworkRouteForm.pdf).

Cover Sheet: http://cs246.stanford.edu/cover.pdf
Upload Link: http://snap.stanford.edu/submit/

Questions

1 Setting up a virtual machine

Download and install VirtualBox on your machine: http://virtualbox.org/wiki/Downloads
Download the Cloudera Quickstart VM at http://www.cloudera.com/content/dev-center/en/home/developer-admin-resources/quickstart-vm.html
Uncompress the VM archive.
It is compressed with 7-Zip. If needed, you can download a tool to uncompress the archive at http://www.7-zip.org/.
Start VirtualBox and click Import Appliance. Click the folder icon beside the location field. Browse to the uncompressed archive folder, select the .ovf file, and click the Open button. Click the Continue button. Click the Import button.
Your virtual machine should now appear in the left column. Select it and click on Start to launch it.
Username and password are "cloudera" and "cloudera".
Optional: Open the network properties for the virtual machine. Click on the Adapter 2 tab. Enable the adapter and select Host-only Adapter. If you do this step, you will be able to connect to the running virtual machine from the host OS at 192.168.56.101.

The virtual machine includes the following software:
CentOS 6.2
JDK 6 (1.6.0_32)
Hadoop 2.0.0
Eclipse 4.2.6 (Juno)

The login user is cloudera, and the password for that account is cloudera.

2 Running Hadoop jobs

Generally Hadoop can be run in three modes.

1. Standalone (or local) mode: There are no daemons used in this mode. Hadoop uses the local file system as a substitute for the HDFS file system. The jobs will run as if there is 1 mapper and 1 reducer.
2. Pseudo-distributed mode: All the daemons run on a single machine and this setting mimics the behavior of a cluster. All the daemons run on your machine locally using the HDFS protocol. There can be multiple mappers and reducers.
3. Fully-distributed mode: This is how Hadoop runs on a real cluster.

In this homework we will show you how to run Hadoop jobs in Standalone mode (very useful for developing and debugging) and also in Pseudo-distributed mode (to mimic the behavior of a cluster environment).
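
2.1 Creating a Hadoop project in Eclipse

(There is a plugin for Eclipse that makes it simple to create a new Hadoop project and execute Hadoop jobs, but the plugin is only well maintained for Hadoop 1.0.4, which is a rather old version of Hadoop. There is a project at https://github.com/winghc/hadoop2x-eclipse-plugin that is working to update the plugin for Hadoop 2.0. You can try it out if you like, but your mileage may vary.)

To create a project:

1. Open or create the ~/.m2/settings.xml file and make sure it has the following contents:

    <settings>
      <profiles>
        <profile>
          <id>standard-extra-repos</id>
          <activation>
            <activeByDefault>true</activeByDefault>
          </activation>
          <repositories>
            <repository>
              <id>central</id>
              <url>http://repo.maven.apache.org/maven2/</url>
              <releases>
                <enabled>true</enabled>
              </releases>
              <snapshots>
                <enabled>true</enabled>
              </snapshots>
            </repository>
            <repository>
              <id>cloudera</id>
              <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
              <releases>
                <enabled>true</enabled>
              </releases>
              <snapshots>
                <enabled>true</enabled>
              </snapshots>
            </repository>
          </repositories>
        </profile>
      </profiles>
    </settings>

2. Open Eclipse and select File → New → Project...
3. Expand the Maven node, select Maven Project, and click the Next > button.
4. On the next screen, click the Next > button.
5. On the next screen, when the archetypes have loaded, select maven-archetype-quickstart and click the Next > button.
6. On the next screen, enter a group name in the Group Id field, and enter a project name in the Artifact Id field. Click the Finish button.
7. In the package explorer, expand the project node and double-click the pom.xml file to open it.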
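8. Replace the current "dependencies" section with the following content:

    <dependencyManagement>
      <dependencies>
        <dependency>
          <groupId>jdk.tools</groupId>
          <artifactId>jdk.tools</artifactId>
          <version>1.6</version>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-hdfs</artifactId>
          <version>2.0.0-cdh4.0.0</version>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-auth</artifactId>
          <version>2.0.0-cdh4.0.0</version>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>2.0.0-cdh4.0.0</version>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-core</artifactId>
          <version>2.0.0-mr1-cdh4.0.1</version>
        </dependency>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit-dep</artifactId>
          <version>4.8.2</version>
        </dependency>
      </dependencies>
    </dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-auth</artifactId>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
      </dependency>
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.10</version>
        <scope>test</scope>
      </dependency>
    </dependencies>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>2.1</version>
          <configuration>
            <source>1.6</source>
            <target>1.6</target>
          </configuration>
        </plugin>
      </plugins>
    </build>

9. Save the file.
10. Right-click on the project node and select Maven → Update Project.

You can now create classes in the src directory. After writing your code, build the JAR file by right-clicking on the project node and selecting Run As → Maven install.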

2.2 Running Hadoop jobs in standalone mode

After creating a project, adding source code, and building the JAR file as outlined above, the JAR file will be located in the ~/workspace/<project>/target directory. Open a terminal and run the following command:

    hadoop jar ~/workspace/<project>/target/<project>-0.0.1-SNAPSHOT.jar \
        -D mapred.job.tracker=local -D fs.defaultFS=local

You will see all of the output from the map and reduce tasks in the terminal.

2.3 Running Hadoop jobs in pseudo-distributed mode

Open a terminal and run the following command:

    hadoop jar ~/workspace/<project>/target/<project>-0.0.1-SNAPSHOT.jar

To see all running jobs, run the following command:

    hadoop job -list

To kill a running job, find the job's ID and then run the following command:

    hadoop job -kill <job-id>

2.4 Debugging Hadoop jobs

To debug an issue with a job, the easiest approach is to add print statements into the source file and run the job in standalone mode. The print statements will appear in the terminal output. When running your job in pseudo-distributed mode, the output from the job is logged in the task tracker's log files, which can be accessed most easily by pointing a web browser to port 50030 of the server. From the job tracker web page, you can drill down into the failing job, the failing task, the failed attempt, and finally the log files. Note that the logs for stdout and stderr are separated, which can be useful when trying to isolate specific debugging print statements.
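
The print-statement approach can be sketched in plain Java without any Hadoop classes. The class and method names below (DebugPrintSketch, tokenize) are hypothetical stand-ins for the body of a map function, and the code sticks to Java 6 syntax to match the VM. Diagnostics go to stderr and regular messages to stdout, so the two end up in separate logs when the same statements run inside a task.

```java
import java.util.ArrayList;
import java.util.List;

public class DebugPrintSketch {
    // Hypothetical stand-in for the body of a map() method: splits a line
    // into whitespace-separated words and logs what it sees along the way.
    static List<String> tokenize(String line) {
        // Diagnostics go to stderr so they stay separate from normal output.
        System.err.println("DEBUG tokenize: input line = [" + line + "]");
        List<String> words = new ArrayList<String>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                System.err.println("DEBUG tokenize: skipping empty token");
                continue;
            }
            words.add(token);
        }
        // Regular progress messages go to stdout.
        System.out.println("tokenize: emitted " + words.size() + " words");
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("to be or not to be"));
    }
}
```

In standalone mode both streams appear in the terminal; redirecting one of them (for example with 2> debug.log) is a simple way to isolate the debugging lines.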

If you enabled the second network adapter in the VM setup, you can point your local browser to http://192.168.56.101:50030/ to access the job tracker page. Note, though, that when you follow links that lead to the task tracker web page, the links point to localhost.localdomain, which means your browser will return a page not found error. Simply replace localhost.localdomain with 192.168.56.101 in the URL bar and press enter to load the correct page.

2.5 Example project

In this section you will create a new Eclipse Hadoop project, compile, and execute it. The program will count the frequency of all the words in a given large text file. In your virtual machine, Hadoop, the Java environment and Eclipse have already been pre-installed.

Edit the ~/.m2/settings.xml file as outlined above. See Figure 1.

Figure 1: Create a Hadoop Project.

Open Eclipse and create a new project as outlined above. See Figures 2-9.

Figures 2-9: Create a Hadoop Project.

The project will contain a stub source file in the src/main/java directory that we will not use. Instead, create a new class called WordCount. From the File menu, select New → Class. See Figure 10.

Figure 10: Create java file.

On the next screen, enter the package name (e.g., the group ID plus the project name) in the Package field. Enter WordCount as the Name. See Figure 11.

Figure 11: Create java file.

In the Superclass field, enter Configured and click the Browse button. From the pop-up window select Configured - org.apache.hadoop.conf and click the OK button. See Figure 12.

Figure 12: Create java file.

In the Interfaces section, click the Add button. From the pop-up window select Tool - org.apache.hadoop.util and click the OK button. See Figure 13.

Figure 13: Create java file.

Check the boxes for public static void main(String args[]) and Inherited abstract methods and click the Finish button. See Figure 14.

Figure 14: Create WordCount.java.

You will now have a rough skeleton of a Java file as in Figure 15. You can now add code to this class to implement your Hadoop job.

Figure 15: Create WordCount.java.

Rather than implement a job from scratch, copy the contents from http://snap.stanford.edu/class/cs246-data-2014/WordCount.java and paste it into the WordCount.java file. Be careful to leave the package statement at the top intact. See Figure 16. The code in WordCount.java calculates the frequency of each word in a given dataset.
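
To make the idea of a word-frequency job concrete, here is a minimal plain-Java sketch of the map, shuffle and reduce steps. It is not the course's WordCount.java and uses no Hadoop classes. The class name WordCountSketch, its method names, and the choice to split words on whitespace are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Map step: conceptually emits one (word, 1) pair per token.
    // The 1 is implicit here, so only the words are returned.
    static List<String> map(String line) {
        List<String> words = new ArrayList<String>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Shuffle step: group the 1s emitted for each word under that word.
    static Map<String, List<Integer>> shuffle(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (String word : words) {
            List<Integer> ones = grouped.get(word);
            if (ones == null) {
                ones = new ArrayList<Integer>();
                grouped.put(word, ones);
            }
            ones.add(1);
        }
        return grouped;
    }

    // Reduce step: sum all values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<String> emitted = new ArrayList<String>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount(Arrays.asList("to be or not", "to be")));
    }
}
```

In a real job the framework performs the shuffle and runs many map and reduce calls in parallel; the sketch only mirrors the data flow.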

Figure 16: Create WordCount.java.

Build the project by right-clicking the project node and selecting Run As → Maven install. See Figure 17.

Figure 17: Create WordCount.java.

Download the Complete Works of William Shakespeare from Project Gutenberg at http://www.gutenberg.org/cache/epub/100/pg100.txt. Open a terminal and change to the directory where the dataset was stored. Run the command:

    hadoop jar ~/workspace/wordcount/target/wordcount-0.0.1-SNAPSHOT.jar \
        edu.stanford.cs246.wordcount.WordCount -D mapred.job.tracker=local \
        -D fs.defaultFS=local dataset output

See Figure 18.

Figure 18: Run WordCount job.

If the job succeeds, you will see an output directory in the current directory that contains a file called part-00000. The part-00000 file contains the output from the job. See Figure 19.

Figure 19: Run WordCount job.
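
If you want to inspect the contents of part-00000 programmatically, the following sketch shows one way to parse it. It assumes the job writes one key and one count per line, separated by a tab, which is the default layout of Hadoop's text output format; if the job uses a different output format this will not apply. The class name PartFileSketch and the sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartFileSketch {
    // Parses lines of the assumed form "<word>\t<count>" into a map that
    // preserves the order in which the lines were read.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines that do not match the assumed layout
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines standing in for the contents of output/part-00000.
        List<String> sample = Arrays.asList("be\t2", "not\t1", "to\t2");
        System.out.println(parse(sample));
    }
}
```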

Run the command:

    hadoop fs -ls

The command will list the contents of your home directory in HDFS, which should be empty, resulting in no output. Run the command:

    hadoop fs -copyFromLocal pg100.txt

to copy the dataset folder into HDFS. Run the command:

    hadoop fs -ls

again. You should see the dataset directory listed, as in Figure 20, indicating that the dataset is in HDFS.

Figure 20: Run WordCount job.

Run the command:

    hadoop jar ~/workspace/WordCount/target/WordCount-0.0.1-SNAPSHOT.jar \
        edu.stanford.cs246.wordcount.WordCount pg100.txt output

See Figure 21. If the job fails, you will see a message indicating that the job failed. Otherwise, you can assume the job succeeded.

Figure 21: Run WordCount job.

Run the command:

    hadoop fs -ls output

You should see an output file for each reducer. Since there was only one reducer for this job, you should only see one part-* file. Note that sometimes the files will be called part-NNNNN, and sometimes they'll be called part-r-NNNNN. See Figure 22.

Figure 22: Run WordCount job.
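
Because the output files may follow either naming scheme, any script that post-processes the output directory should accept both. The sketch below is one way to do that, assuming NNNNN always stands for exactly five digits. The class name PartFileNames and the sample listing are illustrative only.

```java
import java.util.regex.Pattern;

public class PartFileNames {
    // Matches the two naming schemes mentioned in the text:
    // part-NNNNN and part-r-NNNNN (five digits assumed).
    private static final Pattern PART_FILE = Pattern.compile("part-(r-)?\\d{5}");

    static boolean isPartFile(String name) {
        return PART_FILE.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Sample names standing in for a listing of the output directory.
        String[] listing = {"_SUCCESS", "part-00000", "part-r-00000", "notes.txt"};
        for (String name : listing) {
            System.out.println(name + " -> " + isPartFile(name));
        }
    }
}
```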

Run the command:

    hadoop fs -cat output/part\* | head

You should see the same output as when you ran the job locally, as shown in Figure 23.

Figure 23: Run WordCount job.

To view the job's logs, open the browser in the VM and point it to http://localhost:50030 as in Figure 24.

Figure 24: View WordCount job logs.

Click on the link for the completed job. See Figure 25.

Figure 25: View WordCount job logs.

Click the link for the map tasks. See Figure 26.

Figure 26: View WordCount job logs.

Click the link for the first attempt. See Figure 27.

Figure 27: View WordCount job logs.

Click the link for the full logs. See Figure 28.

Figure 28: View WordCount job logs.

2.6 Using your local machine for development

If you enabled the second network adapter, you can use your own local machine for development, including your local IDE. In order to do that, you'll need to install a copy of Hadoop locally. The easiest way to do that is to simply download the archive from http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.4.0.tar.gz and unpack it. In the unpacked archive, you'll find an etc/hadoop-mapreduce1 directory. In that directory, open the core-site.xml file and modify it as follows:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.56.101:8020</value>
      </property>
    </configuration>

Next, open the mapred-site.xml file in the same directory and modify it as follows:

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>192.168.56.101:8021</value>
      </property>
    </configuration>

After making those modifications, update your command path to include the bin-mapreduce1 directory and set the HADOOP_CONF_DIR environment variable to be the path to the etc/hadoop-mapreduce1 directory. You should now be able to execute Hadoop commands from your local terminal just as you would from the terminal in the virtual machine. You may also want to set the HADOOP_USER_NAME environment variable to cloudera to let you masquerade as the cloudera user. When you use the VM directly, you're running as the cloudera user.

Further Hadoop tutorials

Yahoo! Hadoop Tutorial: http://developer.yahoo.com/hadoop/tutorial/
Cloudera Hadoop Tutorial: http://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial.html
How to Debug MapReduce Programs: http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms

Further Eclipse tutorials

General Eclipse tutorial: http://www.vogella.com/articles/Eclipse/article.html
Tutorial on how to use the Eclipse debugger: http://www.vogella.com/articles/EclipseDebugging/article.html

3 Task: Write your own Hadoop Job

Now you will write your first MapReduce job to accomplish the following task:

Write a Hadoop MapReduce program which outputs the number of words that start with each letter. This means that for every letter we want to count the total number of words that start with that letter. In your implementation ignore the letter case, i.e., consider all words as lower case. You can ignore all non-alphabetic characters.

Run your program over the same input data as above.

What to hand-in: Hand in the printout of the output file and upload the source code.
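
Before writing the Hadoop job, it can help to pin down the counting logic in plain Java. The sketch below is one possible reading of the task, not a reference solution: it assumes words are split on whitespace, that non-alphabetic characters are stripped from each token before its first letter is taken, and that tokens with no letters at all are skipped. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class LetterCountSketch {
    // Map-side logic: returns the lower-cased first letter of a token, or
    // '\0' if nothing is left after removing characters other than a-z.
    static char firstLetter(String token) {
        String letters = token.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        return letters.isEmpty() ? '\0' : letters.charAt(0);
    }

    // Reduce-side logic folded into one loop: count tokens per first letter.
    static Map<Character, Integer> countByFirstLetter(Iterable<String> lines) {
        Map<Character, Integer> counts = new TreeMap<Character, Integer>();
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                char letter = firstLetter(token);
                if (letter == '\0') {
                    continue; // token contained no letters
                }
                Integer old = counts.get(letter);
                counts.put(letter, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {b=2, i=1, n=1, o=1, q=1, t=4}
        System.out.println(countByFirstLetter(
                Arrays.asList("To be, or not to be", "That is the question!")));
    }
}
```

In a MapReduce version, firstLetter would decide the key emitted by the mapper and the per-letter summation would move into the reducer.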
