launchlocalhost

localhost  时间:2021-05-20  阅读:()
CS246:MiningMassiveDatasetsWinter2014ProblemSet0Due9:30amJanuary14,2014GeneralInstructionsThishomeworkistobecompletedindividually(nocollaborationisallowed).
Also,youarenotallowedtouseanylatedaysforthehomework.
Thishomeworkisworth1%ofthetotalcoursegrade.
ThepurposeofthishomeworkistogetyoustartedwithHadoop.
Hereyouwilllearnhowtowrite,compile,debugandexecuteasimpleHadoopprogram.
FirstpartofthehomeworkservesasatutorialandthesecondpartasksyoutowriteyourownHadoopprogram.
Section1describesthevirtualmachineenvironment.
Insteadofthevirtualmachine,youarewelcometosetupyourownpseudo-distributedorfullydistributedclusterifyoupre-fer.
AnyversionofHadoopthatisatleast1.
0willsuce.
(Foraneasywaytosetupacluster,tryClouderaManager:http://archive.
cloudera.
com/cm4/installer/latest/cloudera-manager-installer.
bin.
)Ifyouchoosetosetupyourowncluster,youarere-sponsibleformakingsuretheclusterisworkingproperly.
TheTAswillbeunabletohelpyoudebugcongurationissuesinyourowncluster.
Section2explainshowtousetheEclipseenvironmentinthevirtualmachine,includinghowtocreateaproject,howtorunjobs,andhowtodebugjobs.
Section2.
5givesanend-to-endexampleofcreatingaproject,addingcode,building,running,anddebuggingit.
Section3istheactualhomeworkassignment.
Therearenodeliverablesforsections1and2.
Insection3,youareaskedtowriteandsubmityourownMapReducejob.
Thishomeworkrequiresyoutouploadthecodeandhand-inaprint-outoftheoutputforSection3.
Regular(non-SCPD)studentsshouldsubmithardcopiesoftheanswers(Section3)eitherinclassorinthesubmissionbox(seecoursewebsiteforlocation).
Forpapersubmis-sion,pleasellthecoversheetandsubmititasafrontpagewithyouranswers.
Youshoulduploadyoursourcecodeandanyotherlesyouused.
SCPDstudentsshouldsubmittheiranswersthroughSCPDandalsouploadthecode.
ThesubmissionmustincludetheanswerstoSection3,thecoversheetandtheusualSCPDrout-ingform(http://scpd.
stanford.
edu/generalInformation/pdf/SCPD_HomeworkRouteForm.
pdf).
CoverSheet:http://cs246.
stanford.
edu/cover.
pdfUploadLink:http://snap.
stanford.
edu/submit/CS246:MiningMassiveDatasets-ProblemSet02Questions1SettingupavirtualmachineDownloadandinstallVirtualBoxonyourmachine:http://virtualbox.
org/wiki/DownloadsDownloadtheClouderaQuickstartVMathttp://www.
cloudera.
com/content/dev-center/en/home/developer-admin-resources/quickstart-vm.
htmlUncompresstheVMarchive.
Itiscompressedwith7-Zip.
Ifneeded,youcandownloadatooltouncompressthearchiveathttp://www.
7-zip.
org/.
StartVirtualBoxandclickImportAppliance.
Clickthefoldericonbesidethelocationeld.
Browsetotheuncompressedarchivefolder,selectthe.
ovfle,andclicktheOpenbutton.
ClicktheContinuebutton.
ClicktheImportbutton.
Yourvirtualmachineshouldnowappearintheleftcolumn.
SelectitandclickonStarttolaunchit.
Usernameandpasswordare"cloudera"and"cloudera".
Optional:Openthenetworkpropertiesforthevirtualmachine.
ClickontheAdapter2tab.
EnabletheadapterandselectHost-onlyAdapter.
Ifyoudothisstep,youwillbeabletoconnecttotherunningvirtualmachinefromthehostOSat192.
168.
56.
101.
VirtualmachineincludesthefollowingsoftwareCentOS6.
2JDK6(1.
6.
032)Hadoop2.
0.
0Eclipse4.
2.
6(Juno)Theloginuseriscloudera,andthepasswordforthataccountiscloudera.
2RunningHadoopjobsGenerallyHadoopcanberuninthreemodes.
1.
Standalone(orlocal)mode:Therearenodaemonsusedinthismode.
HadoopusesthelocallesystemasansubstituteforHDFSlesystem.
Thejobswillrunasifthereis1mapperand1reducer.
CS246:MiningMassiveDatasets-ProblemSet032.
Pseudo-distributedmode:Allthedaemonsrunonasinglemachineandthissettingmimicsthebehaviorofacluster.
AllthedaemonsrunonyourmachinelocallyusingtheHDFSprotocol.
Therecanbemultiplemappersandreducers.
3.
Fully-distributedmode:ThisishowHadooprunsonarealcluster.
InthishomeworkwewillshowyouhowtorunHadoopjobsinStandalonemode(veryusefulfordevelopinganddebugging)andalsoinPseudo-distributedmode(tomimicthebehaviorofaclusterenvironment).
2.
1CreatingaHadoopprojectinEclipse(ThereisapluginforEclipsethatmakesitsimpletocreateanewHadoopprojectandexecuteHadoopjobs,butthepluginisonlywellmaintainedforHadoop1.
0.
4,whichisaratheroldversionofHadoop.
Thereisaprojectathttps://github.
com/winghc/hadoop2x-eclipse-pluginthatisworkingtoupdatethepluginforHadoop2.
0.
Youcantryitoutifyoulike,butyourmilagemayvary.
)Tocreateaproject:1.
Openorcreatethe~/.
m2/settings.
xmlleandmakesureithasthefollowingcon-tents:standardextrarepostruecentralhttp://repo.
maven.
apache.
org/maven2/truetrueclouderaCS246:MiningMassiveDatasets-ProblemSet04https://repository.
cloudera.
com/artifactory/clouderarepostruetrue2.
OpenEclipseandselectFile→New→Project.
.
.
.
3.
ExpandtheMavennode,selectMavenProject,andclicktheNext>button.
4.
Onthenextscreen,clicktheNext>button.
5.
Onthenextscreen,whenthearchetypeshaveloaded,selectmaven-archetype-quickstartandclicktheNext>button.
6.
Onthenextscreen,enteragroupnameintheGroupIdeld,andenteraprojectnameintheArtifactId.
ClicktheFinishbutton.
7.
Inthepackageexplorer,expandtheprojectnodeanddouble-clickthepom.
xmlletoopenit.
8.
Replacethecurrent"dependencies"sectionwiththefollowingcontent:jdk.
toolsjdk.
tools1.
6org.
apache.
hadoophadoophdfs2.
0.
0cdh4.
0.
0org.
apache.
hadoophadoopauth2.
0.
0cdh4.
0.
0CS246:MiningMassiveDatasets-ProblemSet05org.
apache.
hadoophadoopcommon2.
0.
0cdh4.
0.
0org.
apache.
hadoophadoopcore2.
0.
0mr1cdh4.
0.
1junitjunitdep4.
8.
2org.
apache.
hadoophadoophdfsorg.
apache.
hadoophadoopauthorg.
apache.
hadoophadoopcommonorg.
apache.
hadoophadoopcorejunitjunit4.
10testCS246:MiningMassiveDatasets-ProblemSet06org.
apache.
maven.
pluginsmavencompilerplugin2.
11.
61.
69.
Savethele.
10.
Right-clickontheprojectnodeandselectMaven→UpdateProject.
Youcannowcreateclassesinthesrcdirectory.
Afterwritingyourcode,buildtheJARlebyright-clickingontheprojectnodeandselectingRunAs→Maveninstall.
2.
2RunningHadoopjobsinstandalonemodeAftercreatingaproject,addingsourcecode,andbuildingtheJARleasoutlinedabove,theJARlewillbelocatedat/workspace//targetdirectory.
Openaterminalandrunthefollowingcommand:hadoopjar~/workspace//target/-0.
0.
1-SNAPSHOT.
jar\-Dmapped.
task.
tracker=local-Dfs.
defaultFS=localYouwillseealloftheoutputfromthemapandreducetasksintheterminal.
2.
3RunningHadoopjobsinpseudo-distributedmodeOpenaterminalandrunthefollowingcommand:hadoopjar~/workspace//target/-0.
0.
1-SNAPSHOT.
jarToseeallrunningjobs,runthefollowingcommand:hadoopjob-listTokillarunningjob,ndthejob'sIDandthenrunthefollowingcommand:hadoopjob-killCS246:MiningMassiveDatasets-ProblemSet072.
4DebuggingHadoopjobsTodebuganissuewithajob,theeasiestapproachistoaddprintstatementsintothesourceleandrunthejobinstandalonemode.
Theprintstatementswillappearintheterminaloutput.
Whenrunningyourjobinpseudo-distributedmode,theoutputfromthejobisloggedinthetasktracker'slogles,whichcanbeaccessedmosteasilybypointingawebbrowsertoport50030oftheserver.
Fromthejobtrackerwebpage,youcandrilldownintothefailingjob,thefailingtask,thefailedattempt,andnallythelogles.
Notethatthelogsforstdoutandstderrareseparated,whichcanbeusefulwhentryingtoisolatespecicdebuggingprintstatements.
IfyouenabledthesecondnetworkadapterintheVMsetup,youcanpointyourlocalbrowsertohttp://192.
168.
56.
101:50030/toaccessthejobtrackerpage.
Note,though,thatwhenyoufollowlinksthatleadtothetasktrackerwebpage,thelinkspointtolocalhost.
locadomain,whichmeansyourbrowserwillreturnapagenotfounderror.
Sim-plyreplacelocalhost.
locadomainwith192.
168.
56.
101intheURLbarandpressentertoloadthecorrectpage.
2.
5ExampleprojectInthissectionyouwillcreateanewEclipseHadoopproject,compile,andexecuteit.
Theprogramwillcountthefrequencyofallthewordsinagivenlargetextle.
Inyourvirtualmachine,Hadoop,JavaenvironmentandEclipsehavealreadybeenpre-installed.
Editthe~/.
m2/settings.
xmlleasoutlinedabove.
SeeFigure1Figure1:CreateaHadoopProject.
OpenEclipseandcreateanewprojectasoutlinedabove.
SeeFigures2-9.
CS246:MiningMassiveDatasets-ProblemSet08Figure2:CreateaHadoopProject.
Figure3:CreateaHadoopProject.
CS246:MiningMassiveDatasets-ProblemSet09Figure4:CreateaHadoopProject.
Figure5:CreateaHadoopProject.
CS246:MiningMassiveDatasets-ProblemSet010Figure6:CreateaHadoopProject.
Figure7:CreateaHadoopProject.
CS246:MiningMassiveDatasets-ProblemSet011Figure8:CreateaHadoopProject.
CS246:MiningMassiveDatasets-ProblemSet012Figure9:CreateaHadoopProject.
Theprojectwillcontainastubsourceleinthesrc/main/javadirectorythatwewillnotuse.
Instead,createanewclasscalledWordCount.
FromtheFilemenu,selectNew→Class.
SeeFigure10Figure10:Createjavale.
Onthenextscreen,enterthepackagename(e.
g,thegroupIDplustheprojectname)inthePackageeld.
EnterWordCountastheName.
SeeFigure11.
CS246:MiningMassiveDatasets-ProblemSet013Figure11:Createjavale.
IntheSuperclasseld,enterConfiguredandclicktheBrowsebutton.
Fromthepop-upwindowselectCongured—org.
apache.
hadoop.
confandclicktheOKbutton.
SeeFigure12.
CS246:MiningMassiveDatasets-ProblemSet014Figure12:Createjavale.
IntheInterfacessection,clicktheAddbutton.
Fromthepop-upwindowselectTool—org.
apache.
hadoop.
utilandclicktheOKbutton.
SeeFigure13.
CS246:MiningMassiveDatasets-ProblemSet015Figure13:Createjavale.
Checktheboxesforpublicstaticvoidmain(Stringargs[])andInheritedabstractmeth-odsandclicktheFinishbutton.
SeFigure14CS246:MiningMassiveDatasets-ProblemSet016Figure14:CreateWordCount.
java.
YouwillnowhavearoughskeletonofaJavaleasinFigure15.
YoucannowaddcodetothisclasstoimplementyourHadoopjob.
CS246:MiningMassiveDatasets-ProblemSet017Figure15:CreateWordCount.
java.
Ratherthanimplementajobfromscratch,copythecontentsfromhttp://snap.
stanford.
edu/class/cs246-data-2014/WordCount.
javaandpasteitintotheWordCount.
javale.
Becarefultoleavethepackagestatementatthetopintact.
SeeFigure16.
ThecodeinWordCount.
javacalculatesthefrequencyofeachwordinagivendataset.
CS246:MiningMassiveDatasets-ProblemSet018Figure16:CreateWordCount.
java.
Buildtheprojectbyright-clickingtheprojectnodeandselectingRunAs→Maveninstall.
SeeFigure17.
CS246:MiningMassiveDatasets-ProblemSet019Figure17:CreateWordCount.
java.
DownloadtheCompleteWorksofWilliamShakespearefromProjectGutenbergathttp://www.
gutenberg.
org/cache/epub/100/pg100.
txt.
Openaterminalandchangetothedirectorywherethedatasetwasstored.
Runthecommand:hadoopjar~/workspace/wordcount/target/wordcount-0.
0.
1-SNAPSHOT.
jar\edu.
stanford.
cs246.
wordcount.
WordCount-Dmapred.
job.
tracker=local\-Dfs.
defaultFS=localdatasetoutputCS246:MiningMassiveDatasets-ProblemSet020SeeFigure18Figure18:RunWordCountjob.
Ifthejobsucceeds,youwillseeanoutputdirectoryinthecurrentdirectorythatcontainsalecalledpart-00000.
Thepart-00000lecontainstheoutputfromthejob.
SeeFigure19Figure19:RunWordCountjob.
Runthecommand:hadoopfs-lsThecommandwilllistthecontentsofyourhomedirectoryinHDFS,whichshouldbeempty,resultinginnooutput.
Runthecommand:hadoopfs-copyFromLocalpg100.
txttocopythedatasetfolderintoHDFS.
Runthecommand:hadoopfs-lsCS246:MiningMassiveDatasets-ProblemSet021again.
Youshouldseethedatasetdirectorylisted,asinFigure20indicatingthatthedatasetisinHDFS.
Figure20:RunWordCountjob.
Runthecommand:hadoopjar~/workspace/WordCount/target/WordCount-0.
0.
1-SNAPSHOT.
jar\edu.
stanford.
cs246.
wordcount.
WordCountpg100.
txtoutputSeeFigure21.
Ifthejobfails,youwillseeamessageindicatingthatthejobfailed.
Otherwise,youcanassumethejobsucceeded.
Figure21:RunWordCountjob.
Runthecommand:hadoopfs-lsoutputYoushouldseeanoutputleforeachreducer.
Sincetherewasonlyonereducerforthisjob,youshouldonlyseeonepart-*le.
Notethatsometimestheleswillbecalledpart-NNNNN,andsometimesthey'llbecalledpart-r-NNNNN.
SeeFigure22Figure22:RunWordCountjob.
Runthecommand:hadoopfs-catoutput/part\*|headYoushouldseethesameoutputaswhenyouranthejoblocally,asshowninFigure23CS246:MiningMassiveDatasets-ProblemSet022Figure23:RunWordCountjob.
Toviewthejob'slogs,openthebrowserintheVMandpointittohttp://localhost:50030asinFigure24.
Figure24:ViewWordCountjoblogs.
Clickonthelinkforthecompletedjob.
SeeFigure25.
CS246:MiningMassiveDatasets-ProblemSet023Figure25:ViewWordCountjoblogs.
Clickthelinkforthemaptasks.
SeeFigure26.
CS246:MiningMassiveDatasets-ProblemSet024Figure26:ViewWordCountjoblogs.
Clickthelinkfortherstattempt.
SeeFigure27.
CS246:MiningMassiveDatasets-ProblemSet025Figure27:ViewWordCountjoblogs.
Clickthelinkforthefulllogs.
SeeFigure28.
CS246:MiningMassiveDatasets-ProblemSet026Figure28:ViewWordCountjoblogs.
2.
6UsingyourlocalmachinefordevelopmentIfyouenabledthesecondnetworkadapter,youcanuseyourownlocalmachineforde-velopment,includingyourlocalIDE.
Ifordertodothat,you'llneedtoinstallacopyofHadooplocally.
Theeasiestwaytodothatistosimplydownloadthearchivefromhttp://archive.
cloudera.
com/cdh4/cdh/4/hadoop-2.
0.
0-cdh4.
4.
0.
tar.
gzandunpackit.
Intheunpackedarchive,you'llndaetc/hadoop-mapreduce1directory.
Inthatdirectory,openthecore-site.
xmlleandmodifyitasfollows:fs.
default.
namehdfs://192.
168.
56.
101:8020CS246:MiningMassiveDatasets-ProblemSet027Next,openthemapred-site.
xmlleinthesamedirectoryandmodifyitasfollows:mapred.
job.
tracker192.
168.
56.
101:8021Aftermakingthosemodications,updateyourcommandpathtoincludethebin-mapreduce1directoryandsettheHADOOPCONFDIRenvironmentvariabletobethepathtotheetc/hadoop-mapreduce1directory.
YoushouldnowbeabletoexecuteHadoopcommandsfromyourlocalterminaljustasyouwouldfromtheterminalinthevirtualmachine.
YoumayalsowanttosettheHADOOPUSERNAMEenvironmentvariabletoclouderatoletyoumasqueradeastheclouderauser.
WhenyouusetheVMdirectly,you'rerunningastheclouderauser.
FurtherHadooptutorialsYahoo!
HadoopTutorial:http://developer.
yahoo.
com/hadoop/tutorial/ClouderaHadoopTutorial:http://www.
cloudera.
com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial.
htmlHowtoDebugMapReducePrograms:http://wiki.
apache.
org/hadoop/HowToDebugMapReduceProgramsFurtherEclipsetutorialsGeneraEclipsetutorial:http://www.
vogella.
com/articles/Eclipse/article.
html.
TutorialonhowtousetheEclipsedebugger:http://www.
vogella.
com/articles/EclipseDebugging/article.
html.
3Task:WriteyourownHadoopJobNowyouwillwriteyourrstMapReducejobtoaccomplishthefollowingtask:CS246:MiningMassiveDatasets-ProblemSet028WriteaHadoopMapReduceprogramwhichoutputsthenumberofwordsthatstartwitheachletter.
Thismeansthatforeveryletterwewanttocountthetotalnumberofwordsthatstartwiththatletter.
Inyourimplementationignorethelettercase,i.
e.
,considerallwordsaslowercase.
Youcanignoreallnon-alphabeticcharacters.
Runyourprogramoverthesameinputdataasabove.
Whattohand-in:Hand-intheprintoutoftheoutputleanduploadthesourcecode.

inux国外美老牌PhotonVPS月$2.5 ,Linux系统首月半价

PhotonVPS 服务商我们是不是已经很久没有见过?曾经也是相当的火爆的,我们中文习惯称作为饭桶VPS主机商。翻看之前的文章,在2015年之前也有较多商家的活动分享的,这几年由于服务商太多,乃至于有一些老牌的服务商都逐渐淡忘。这不有看到PhotonVPS商家发布促销活动。PhotonVPS 商家七月份推出首月半价Linux系统VPS主机,首月低至2.5美元,有洛杉矶、达拉斯、阿什本机房,除提供普...

星梦云:四川100G高防4H4G10M月付仅60元

星梦云怎么样?星梦云资质齐全,IDC/ISP均有,从星梦云这边租的服务器均可以备案,属于一手资源,高防机柜、大带宽、高防IP业务,一手整C IP段,四川电信,星梦云专注四川高防服务器,成都服务器,雅安服务器。星梦云目前夏日云服务器促销,四川100G高防4H4G10M月付仅60元;西南高防月付特价活动,续费同价,买到就是赚到!点击进入:星梦云官方网站地址1、成都电信年中活动机(成都电信优化线路,封锁...

pacificrack7月美国便宜支持win VPS,$19.99/年,2G内存/1核/50gSSD/1T流量

pacificrack发布了7月最新vps优惠,新款促销便宜vps采用的是魔方管理,也就是PR-M系列。提一下有意思的是这次支持Windows server 2003、2008R2、2012R2、2016、2019、Windows 7、Windows 10,当然啦,常规Linux系统是必不可少的!1Gbps带宽、KVM虚拟、纯SSD raid10、自家QN机房洛杉矶数据中心...支持PayPal、...

localhost为你推荐
followgoogle支持ipad支持ipad支持ipad支持ipad重庆网通重庆联通现在有哪些资费???itunes备份itunes就是备份不了怎么办啊ms17-010win10蒙林北冬虫夏草酒·10年原浆1*6 500ml 176,176是一瓶的价格还是一箱的价格联通版iphone4s苹果4s是联通版,或移动版,或全网通如何知道?win7如何关闭445端口如何判断445端口是否关闭
最好的虚拟主机 动态域名解析 中文域名申请 naning9韩国官网 狗爹 已备案删除域名 域名评估 美国在线代理服务器 中国电信测速网 备案空间 linode支付宝 秒杀品 东莞主机托管 云服务器比较 后门 卡巴斯基官网下载 apnic 电信主机托管 ncp是什么 美国十大啦 更多