CRNet: Cross-Reference Networks for Few-Shot Segmentation

Weide Liu(1), Chi Zhang(1), Guosheng Lin(1), Fayao Liu(2)
(1) Nanyang Technological University, Singapore  (2) A*Star, Singapore
E-mail: weide001@e.ntu.edu.sg, chi007@e.ntu.edu.sg, gslin@ntu.edu.sg

Abstract

Over the past few years, state-of-the-art image segmentation algorithms have been based on deep convolutional neural networks.
To render a deep network with the ability to understand a concept, humans need to collect a large amount of pixel-level annotated data to train the models, which is time-consuming and tedious. Recently, few-shot segmentation has been proposed to solve this problem. Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images. In this paper, we propose a cross-reference network (CRNet) for few-shot segmentation. Unlike previous works which only predict the mask of the query image, our proposed model concurrently makes predictions for both the support image and the query image. With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images, thus helping the few-shot segmentation task. We also develop a mask refinement module to recurrently refine the prediction of the foreground regions. For k-shot learning, we propose to finetune parts of the network to take advantage of multiple labeled support images. Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
1. Introduction

Deep neural networks have been widely applied to visual understanding tasks, e.g., object detection, semantic segmentation, and image captioning, since the huge success in the ImageNet classification challenge [4]. Due to their data-driven nature, large-scale labeled datasets are often required to train deep models. However, collecting labeled data can be notoriously expensive in tasks like semantic segmentation, instance segmentation, and video segmentation. Moreover, data collection is usually done for a set of specific categories, and knowledge learned on previous classes can hardly be transferred to unseen classes directly. Directly finetuning the trained models still requires a large amount of new labeled data.

(Corresponding author: G. Lin, e-mail: gslin@ntu.edu.sg)

Figure 1. Comparison of our proposed CRNet against previous work. Previous work (upper part) unilaterally guides the segmentation of query images with support images, while in our CRNet (lower part) support and query images can guide the segmentation of each other.
Few-shot learning, on the other hand, is proposed to solve this problem. In few-shot learning tasks, models trained on previous tasks are expected to generalize to unseen tasks with only a few labeled training images.

In this paper, we target few-shot image segmentation. Given a novel object category, few-shot segmentation aims to find the foreground regions of this category after seeing only a few labeled examples. Many previous works formulate few-shot segmentation as a guided segmentation task: guidance information is extracted from the labeled support set for foreground prediction in the query image, usually via an asymmetric two-branch network structure. The model is optimized with the ground-truth query mask as supervision.

In our work, we argue that the roles of the query and support sets can be switched in a few-shot segmentation model. Specifically, the support images can guide the prediction of the query set, and conversely, the query image can also help make predictions on the support set.
Inspired by the image co-segmentation literature [7, 12, 1], we propose a symmetric Cross-Reference Network in which two heads concurrently make predictions for both the query image and the support image. The difference between our network design and previous works is shown in Fig. 1. The key component in our network design is the cross-reference module, which generates reinforced feature representations by comparing the co-occurrent features in the two images. The reinforced representations are used for the downstream foreground predictions in the two images. In the meantime, the cross-reference module also predicts the co-occurrent objects in the two images; this sub-task provides an auxiliary loss during training that facilitates the learning of the cross-reference module.
As there exists a huge variance in object appearance, mining foreground regions in images can be a multi-step process. We develop an effective Mask Refinement Module to iteratively refine our predictions. In the initial prediction, the network is expected to locate high-confidence seed regions. Then the confidence map, in the form of a probability map, is saved as a cache in the module and is used for later predictions. We update the cache every time we make a new prediction. After running the mask refinement module for a few steps, our model can better predict the foreground regions. We empirically demonstrate that such a lightweight module can significantly improve the performance.
When it comes to k-shot image segmentation, where more than one support image is provided, previous methods often use a 1-shot model to make predictions with each support image individually and then fuse their features or predicted masks. In our paper, we propose to finetune parts of our network with the labeled support examples. As our network makes predictions for two image inputs at a time, we can use at most k^2 image pairs to finetune our network. An advantage of our finetuning-based method is that it benefits from an increasing number of support images, and thus consistently increases the accuracy. In comparison, fusion-based methods can easily saturate when more support images are provided. In our experiments, we validate our model in the 1-shot, 5-shot, and 10-shot settings.
The main contributions of this paper are as follows:

- We propose a novel cross-reference network that concurrently makes predictions for both the query set and the support set in the few-shot image segmentation task. By mining the co-occurrent features in two images, our proposed network effectively improves the results.
- We develop a mask refinement module with a confidence cache that recurrently refines the predicted results.
- We propose a finetuning scheme for k-shot learning, which turns out to be an effective solution for handling multiple support images.
- Experiments on PASCAL VOC 2012 demonstrate that our method significantly outperforms baseline results and achieves new state-of-the-art performance on the 5-shot segmentation task.
2. Related Work

2.1. Few-shot learning

Few-shot learning aims to learn a model that can be easily transferred to new tasks with limited training data available. Few-shot learning is widely explored in image classification tasks. Previous methods can be roughly divided into two categories based on whether the model needs finetuning at test time. In non-finetuned methods, the parameters learned at training time are kept fixed at the testing stage. For example, [19, 22, 21, 24] are metric-based approaches where an embedding encoder and a distance metric are learned to determine image-pair similarity. These methods have the advantage of fast inference without further parameter adaptation. However, when multiple support images are available, the performance can saturate easily. In finetuning-based methods, the model parameters are adapted to the new tasks before making predictions. For example, [3] demonstrates that by only finetuning the fully connected layer, models learned on training classes can yield state-of-the-art few-shot performance on new classes. In our work, we use a non-finetuned feed-forward model to handle 1-shot learning and adopt model finetuning in the k-shot setting to benefit from multiple labeled support images. The task of few-shot learning is also related to the open-set problem [20], where the goal is to detect data from novel classes.
2.2. Segmentation

Semantic segmentation is a fundamental computer vision task which aims to classify each pixel in an image. State-of-the-art methods formulate image segmentation as a dense prediction task and adopt fully convolutional networks to make predictions [2, 11]. Usually, a pre-trained classification network is used as the backbone by removing the fully connected layers at the end. To make pixel-level dense predictions, encoder-decoder structures [9, 11] are often used to reconstruct high-resolution prediction maps. Typically, an encoder gradually downsamples the feature maps to acquire a large field of view and capture abstract feature representations; the decoder then gradually recovers the fine-grained information. Skip connections are often used to fuse high-level and low-level features for better predictions. Our network also follows the encoder-decoder design: we transfer the guidance information in low-resolution maps and use decoders to recover details.
2.3. Few-shot segmentation

Few-shot segmentation is a natural extension of few-shot classification to the pixel level. Since Shaban et al. [17] proposed this task for the first time, many deep learning-based methods have been proposed. Most previous works formulate few-shot segmentation as a guided segmentation task. For example, in [17], a side branch takes the labeled support image as input and regresses the network parameters of the main branch to make foreground predictions for the query image. [26] shares the same spirit and proposes to fuse the embeddings of the support branch into the query branch with a dense comparison module. Dong et al. [5] draw inspiration from the success of the Prototypical Network [19] in few-shot classification and propose dense prototype learning with Euclidean distance as the metric for segmentation tasks. Similarly, Zhang et al. [27] propose a cosine similarity guidance network to weight features for the foreground predictions in the query branch. Some previous works use recurrent structures to refine the segmentation predictions [6, 26]. All previous methods only use the foreground mask of the query image as the training supervision, while in our network the query set and the support set guide each other and both branches make foreground predictions for training supervision.
2.4. Image co-segmentation

Image co-segmentation is a well-studied task which aims to jointly segment the common objects in paired images. Many approaches have been proposed to solve the object co-segmentation problem. Rother et al. [15] propose to minimize an energy function of a histogram-matching term with an MRF to enforce similar foreground statistics. Rubinstein et al. [16] capture the sparsity and visual variability of the common object from pairs of images with dense correspondences. Joulin et al. [7] solve the common-object problem with an efficient convex quadratic approximation of energy with discriminative clustering. Since the prevalence of deep neural networks, many deep learning-based methods have been proposed. In [12], the model retrieves common object proposals with a Siamese network. Chen et al. [1] adopt channel attention to weight features for the co-segmentation task. Deep learning-based approaches have significantly outperformed non-learning-based methods.
3. Task Definition

Few-shot segmentation aims to find the foreground pixels in test images given only a few pixel-level annotated images. The training and testing of the model are conducted on two datasets with no overlapping categories. At both the training and testing stages, the labeled example images are called the support set, which serves as a meta-training set, and the unlabeled meta-testing image is called the query set. To guarantee good generalization performance at test time, the training and evaluation of the model are accomplished by episodically sampling the support set and the query set.

Given a network R_θ parameterized by θ, in each episode we first sample a target category c from the dataset C. Based on the sampled class, we then sample k+1 labeled images {(x_s^1, y_s^1), (x_s^2, y_s^2), ..., (x_s^k, y_s^k), (x_q, y_q)} that all contain the sampled category c. Among them, the first k labeled images constitute the support set S and the last one is the query set Q. After that, we make predictions on the query image by feeding the support set and the query image into the model: ŷ_q = R_θ(S, x_q). At training time, we learn the model parameters θ by optimizing the cross-entropy loss L(ŷ_q, y_q), and repeat such episodes until convergence.
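The episodic sampling described above can be sketched as follows. This is a minimal illustration: the `DATASET` dictionary and its placeholder image/mask strings are hypothetical stand-ins for a real annotated segmentation dataset, not part of the paper's code.

```python
import random

# Hypothetical dataset: category -> list of (image, mask) pairs,
# represented here by placeholder strings instead of real tensors.
DATASET = {
    "cow":   [("img_cow_%d" % i, "mask_cow_%d" % i) for i in range(10)],
    "bus":   [("img_bus_%d" % i, "mask_bus_%d" % i) for i in range(10)],
    "chair": [("img_chair_%d" % i, "mask_chair_%d" % i) for i in range(10)],
}

def sample_episode(dataset, k, rng=random):
    """Sample one few-shot episode: k labeled support images and one
    query image, all containing the same sampled category c."""
    c = rng.choice(sorted(dataset))        # target category c from C
    pairs = rng.sample(dataset[c], k + 1)  # k support pairs + 1 query pair
    support = pairs[:k]                    # support set S
    query = pairs[k]                       # query set Q
    return c, support, query

c, support, query = sample_episode(DATASET, k=5)
```

At training time, one would compute the cross-entropy loss on the predicted query (and, in CRNet, support) masks of each episode and update θ.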
4. Method

In this section, we introduce the proposed cross-reference network for solving few-shot image segmentation. We first describe our network in the 1-shot case, and then describe our finetuning scheme in the case of k-shot learning. Our network includes four key modules: the Siamese encoder, the cross-reference module, the condition module, and the mask refinement module. The overall architecture is shown in Fig. 2.
4.1. Method overview

Different from previous few-shot segmentation methods [26, 17, 5], which unilaterally guide the segmentation of query images with support images, our proposed CRNet enables support and query images to guide the segmentation of each other. We argue that the relationship between support-query image pairs is vital to few-shot segmentation learning. Experiments in Table 2 validate the effectiveness of our new architecture design. As shown in Figure 2, our model learns to perform few-shot segmentation as follows: for every query-support pair, we encode the image pair into deep features with the Siamese encoder, then apply the cross-reference module to mine co-occurrent object features. To fully utilize the annotated mask, the condition module incorporates the category information of the support-set annotations for foreground mask predictions, and our mask refinement module caches the confidence maps recurrently for the final foreground prediction. In the case of k-shot learning, previous works [27, 26, 17] simply average the results of different 1-shot predictions, while we adopt an optimization-based method that finetunes the model to make use of more support data. Table 4 demonstrates the advantages of our method over previous works.

Figure 2. The pipeline of our network architecture. Our network mainly consists of a Siamese encoder, a cross-reference module, a condition module, and a mask refinement module, and adopts a symmetric design. The Siamese encoder maps the query and support images into feature representations. The cross-reference module mines the co-occurrent features in the two images to generate reinforced representations. The condition module fuses the category-relevant feature vectors into the feature maps to emphasize the target category. The mask refinement module saves the confidence maps of the last prediction into a cache and recurrently refines the predicted masks.
4.2. Siamese encoder

The Siamese encoder is a pair of parameter-shared convolutional neural networks that encode the query image and the support image into feature maps. Unlike the models in [17, 14], we use a shared feature encoder for the support and the query images. By embedding the images into the same space, our cross-reference module can better mine co-occurrent features to locate the foreground regions. To acquire representative feature embeddings, we use skip connections to utilize multiple-layer features. As observed in the CNN feature visualization literature [26, 23], features in lower layers often relate to low-level cues while higher layers relate to semantic cues, so we combine the lower-level and higher-level features before passing them to the following modules.
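The multi-level combination can be sketched as a channel-wise concatenation of two backbone stages kept at the same 1/8 resolution (as Sec. 5.1 does via dilated convolutions). The shapes below are illustrative assumptions, not the actual backbone channel counts.

```python
import numpy as np

def combine_levels(low, high):
    """Concatenate lower-level and higher-level feature maps that share
    the same spatial size (kept equal via dilated convolutions in the
    backbone), producing one multi-level embedding."""
    assert low.shape[1:] == high.shape[1:]
    return np.concatenate([low, high], axis=0)

rng = np.random.default_rng(2)
low = rng.normal(size=(16, 8, 8))    # e.g. a mid-level stage (hypothetical channels)
high = rng.normal(size=(32, 8, 8))   # e.g. the last stage (hypothetical channels)
feat = combine_levels(low, high)     # (48, 8, 8) multi-level feature map
```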
4.3. Cross-Reference Module

The cross-reference module is designed to mine co-occurrent features in two images and generate updated representations. The design of the module is shown in Fig. 3. Given the two input feature maps generated by the Siamese encoder, we first use global average pooling to acquire the global statistics of the two images. Then the two feature vectors are sent to a pair of two-layer fully connected (FC) layers, respectively. The Sigmoid activation function attached after the FC layers transforms the vector values into channel importances in the range [0, 1]. After that, the vectors in the two branches are fused by element-wise multiplication. Intuitively, only the features common to both branches will have a high activation in the fused importance vector. Finally, we use the fused vector to weight the input feature maps to generate reinforced feature representations. Compared to the raw features, the reinforced features focus more on the co-occurrent representations.
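A minimal numpy sketch of this fusion step (global average pooling, a two-layer FC head with Sigmoid, element-wise multiplication, channel reweighting). The random weights and the sharing of one FC head across both branches are simplifying assumptions for illustration; the paper uses a pair of learned FC heads.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_reference(f_s, f_q, w1, w2):
    """f_s, f_q: (C, H, W) support/query feature maps from the encoder.
    w1, w2: weights of a two-layer FC head (shared here for simplicity).
    Returns reinforced maps (G_s, G_q) reweighted by the fused vector."""
    v_s = f_s.mean(axis=(1, 2))                  # global average pooling -> (C,)
    v_q = f_q.mean(axis=(1, 2))
    a_s = sigmoid(w2 @ np.maximum(w1 @ v_s, 0))  # per-branch channel importance in [0, 1]
    a_q = sigmoid(w2 @ np.maximum(w1 @ v_q, 0))
    fused = a_s * a_q                            # high only for co-occurrent channels
    g_s = f_s * fused[:, None, None]             # reweight both feature maps
    g_q = f_q * fused[:, None, None]
    return g_s, g_q

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
f_s, f_q = rng.normal(size=(C, H, W)), rng.normal(size=(C, H, W))
w1, w2 = rng.normal(size=(C, C)), rng.normal(size=(C, C))
g_s, g_q = cross_reference(f_s, f_q, w1, w2)
```

Because the fused vector lies in [0, 1], the module can only suppress channels, never amplify them, which is what pushes the representation toward the co-occurrent content.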
Figure 3. The cross-reference module. Given the input feature maps from the support and the query sets (F_s, F_q), the cross-reference module generates updated feature representations (G_s, G_q) by inspecting the co-occurrent features.

Based on the reinforced feature representations, we add a head to directly predict the co-occurrent objects in the two images at training time. This sub-task aims to facilitate the learning of the co-segmentation module so that it mines better feature representations for the downstream tasks. To generate the predictions of the co-occurrent objects in the two images, the reinforced feature maps in the two branches are sent to a decoder. The decoder is composed of a convolutional layer followed by ASPP [2] layers; finally, a convolutional layer generates a two-channel prediction corresponding to the foreground and background scores.
4.4. Condition Module

To fully utilize the support set annotations, we design a condition module to efficiently incorporate the category information for foreground mask predictions. The condition module takes the reinforced feature representations generated by the cross-reference module and a category-relevant vector as inputs. The category-relevant vector is the fused feature embedding of the target category, obtained by applying foreground average pooling [26] over the category region. As the goal of few-shot segmentation is to find only the foreground mask of the assigned object category, the category-relevant vector serves as a condition to segment the target category. To obtain a category-relevant embedding, previous works opt to filter out the background regions either in the input images [14, 17] or in the feature representations [26, 27]; we choose to do so both at the feature level and in the input image. The category-relevant vector is fused with the reinforced feature maps in the condition module by bilinearly upsampling the vector to the same spatial size as the feature maps and concatenating them. Finally, we add a residual convolution to process the concatenated features. The structure of the condition module can be found in Fig. 4. The condition modules in the support branch and the query branch have the same structure and share all parameters.
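The two core operations of this module, foreground average pooling and the broadcast-and-concatenate fusion, can be sketched as below; the shapes and the support mask are illustrative, and the residual convolution is omitted.

```python
import numpy as np

def foreground_average_pooling(feat, mask):
    """Average a (C, H, W) feature map over the foreground pixels of a
    binary (H, W) mask, yielding one category-relevant vector of size C."""
    fg = mask.astype(feat.dtype)
    return (feat * fg).sum(axis=(1, 2)) / np.maximum(fg.sum(), 1.0)

def condition_fuse(feat, cat_vec):
    """Broadcast the category vector to the spatial size of the feature
    map (the upsampling step) and concatenate along the channel axis."""
    C, H, W = feat.shape
    tiled = np.broadcast_to(cat_vec[:, None, None], (cat_vec.size, H, W))
    return np.concatenate([feat, tiled], axis=0)   # (2C, H, W)

rng = np.random.default_rng(1)
feat = rng.normal(size=(8, 6, 6))
mask = np.zeros((6, 6)); mask[2:5, 2:5] = 1        # hypothetical support mask
vec = foreground_average_pooling(feat, mask)
fused = condition_fuse(feat, vec)
```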
Figure 4. The condition module. Our condition module fuses the category-relevant features into the representations for better predictions of the target category.

4.5. Mask Refinement Module

As is often observed in the weakly supervised semantic segmentation literature [26, 8], directly predicting the object masks can be difficult. A common principle is to first locate seed regions and then refine the results. Based on this principle, we design a mask refinement module to refine the predicted mask step by step. Our motivation is that the probability maps of a single feed-forward prediction reflect where the confident regions of the model prediction are. Based on the confident regions and the image features, we can gradually optimize the mask and find the whole object regions. As shown in Fig. 5, our mask refinement module has two inputs: one is the saved confidence map in the cache, and the other is the concatenation of the outputs from the condition module and the cross-reference module. For the initial prediction, the cache is initialized with a zero mask, and the module makes predictions solely based on the input feature maps. The module cache is updated with the generated probability map every time the module makes a new prediction. We run this module multiple times to generate the final refined mask.
The mask refinement module includes three main blocks: the downsample block, the global convolution block, and the combine block. The downsample block downsamples the feature maps by a factor of 2; the downsampled features are then upsampled to the original size and fused with the features in the opposite branch. The global convolution block [13] aims to capture features with a large field of view while containing few parameters; it includes two groups of 1x7 and 7x1 convolutional kernels. The combine block fuses the feature branch and the cached branch to generate refined feature representations.
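The recurrent cache-update loop can be sketched as follows. The `predict` callable stands in for the learned downsample / global-convolution / combine blocks; the toy predictor below, which moves the cache halfway toward a fixed target map, is a hypothetical stand-in used only to show how the cache evolves from confident seed regions.

```python
import numpy as np

def refine_mask(features, predict, steps=5):
    """Recurrent refinement: the cache starts as a zero mask and is
    overwritten with the newest probability map after every prediction.
    `predict(features, cache)` stands in for the learned blocks and
    returns the next probability map."""
    cache = np.zeros(features.shape[1:])      # zero-initialized confidence cache
    for _ in range(steps):
        cache = predict(features, cache)      # update cache with the new prediction
    return cache

# Toy stand-in predictor: each step halves the gap to a fixed target map,
# mimicking a prediction that grows outward from confident seed regions.
target = np.zeros((4, 4)); target[1:3, 1:3] = 1.0
toy_predict = lambda feats, cache: cache + 0.5 * (target - cache)

out = refine_mask(np.zeros((8, 4, 4)), toy_predict, steps=5)
```

With this toy predictor the foreground confidence after n steps is 1 - 0.5^n, illustrating why a few refinement iterations (5 at test time, per Sec. 5.1) are enough in practice.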
4.6. Finetuning for K-Shot Learning

In the case of k-shot learning, we propose to finetune our network to take advantage of multiple labeled support images. As our network makes predictions for two images at a time, we can use at most k^2 image pairs to finetune our network. At the evaluation stage, we randomly sample an image pair from the labeled support set to finetune our model. We keep the parameters in the Siamese encoder fixed and only finetune the remaining modules. In our experiments, we demonstrate that our finetuning-based method consistently improves the results when more labeled support images are available, while the fusion-based methods in previous works often reach saturated performance as the number of support images increases.
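The pool of finetuning pairs can be enumerated as below. Pairing every support image with every other (including itself) is how a two-headed model obtains up to k^2 labeled training pairs from k support images; the actual sampling and gradient step are omitted.

```python
import itertools

def finetune_pairs(support):
    """With k labeled support images, the two-headed CRNet can be
    finetuned on any ordered pair of them, giving at most k*k pairs."""
    return list(itertools.product(support, repeat=2))

support = ["s%d" % i for i in range(5)]   # k = 5 labeled support images
pairs = finetune_pairs(support)
# A finetuning iteration would sample one pair from `pairs` and update
# only the modules after the (frozen) Siamese encoder.
```

This is why the number of usable training pairs grows quadratically with k, while fusion-based methods only average k independent 1-shot predictions.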
Figure 5. The mask refinement module. The module saves the generated probability map from the last step into a cache and recurrently optimizes the predictions.
5. Experiment

5.1. Implementation Details

In the Siamese encoder, we exploit multi-level features from the ImageNet-pretrained ResNet-50 as the image representations. We use dilated convolutions and keep the feature maps after layer3 and layer4 at a fixed size of 1/8 of the input image, and concatenate them for the final prediction. All the convolutional layers in our proposed modules have a kernel size of 3x3 and generate features of 256 channels, followed by the ReLU activation function. At test time, we recurrently run the mask refinement module 5 times to refine the predicted masks. In the case of k-shot learning, we fix the Siamese encoder and finetune the remaining parameters.
5.2. Dataset and Evaluation Metric

We conduct cross-validation experiments on the PASCAL VOC 2012 dataset to validate our network design. To compare our model with previous works, we adopt the same category divisions and test settings first proposed in [17]. In the cross-validation experiments, the 20 object categories are evenly divided into 4 folds, with three folds as the training classes and one fold as the testing classes. The category division is shown in Table 1. We report the average performance over the 4 testing folds. For the evaluation metric, we use the standard mean Intersection-over-Union (mIoU) of the classes in the testing fold. For more details about the dataset and the evaluation metric, please refer to [17].
6. Ablation study

The goal of the ablation study is to inspect each component in our network design. Our ablation experiments are conducted on the PASCAL VOC dataset.

fold  categories
0     aeroplane, bicycle, bird, boat, bottle
1     bus, car, cat, chair, cow
2     diningtable, dog, horse, motorbike, person
3     pottedplant, sheep, sofa, train, tv/monitor

Table 1. The class division of the PASCAL VOC 2012 dataset proposed in [17].

Condition  Cross-Reference Module  1-shot
yes        -                       36.3
-          yes                     43.3
yes        yes                     49.1

Table 2. Ablation study on the condition module and the cross-reference module. The cross-reference module brings a large performance improvement over the baseline model (condition only).

Multi-Level  Mask Refine  Multi-Scale  1-shot
-            -            -            49.1
-            -            yes          50.3
-            yes          yes          53.4
yes          yes          yes          55.2

Table 3. Ablation experiments on the multi-level features, multi-scale input, and the mask refinement module. Every module brings a performance improvement over the baseline model.
We conduct cross-validation 1-shot experiments and report the average performance over the four splits.

In Table 2, we first investigate the contributions of our two important network components: the condition module and the cross-reference module. As shown, there are significant performance drops if we remove either component from the network. In particular, our proposed cross-reference module has a huge impact on the predictions: our network improves the counterpart model without the cross-reference module by more than 10%.

Figure 6. Qualitative examples on the PASCAL VOC dataset. The first row is the support set and the second row is the query set. The third row is our predicted results and the fourth row is the ground truth. Even when the query images contain objects from multiple classes, our network can still successfully segment the target category indicated by the support mask.

Method           1-shot  5-shot  10-shot
Fusion           49.1    50.2    49.9
Finetune         N/A     57.5    59.1
Finetune+Fusion  N/A     57.6    58.8

Table 4. k-shot experiments. We compare our finetuning-based method with the fusion method. Our method yields consistent performance improvement as the number of support images increases. For the 1-shot case, finetuning results are not available as CRNet needs at least two images to apply our finetuning scheme.
To investigate how much the scale variance of the objects influences the network performance, we adopt a multi-scale test in our network. Specifically, at test time, we resize the support image and the query image to [0.75, 1.25] of the original image size and conduct the inference. The output predicted mask of the resized query image is bilinearly resized to the original image size, and we fuse the predictions under the different image scales.

Method       Backbone   mIoU  IoU
OSLM [17]    VGG16      40.8  61.3
co-fcn [14]  VGG16      41.1  60.9
sg-one [27]  VGG16      46.3  63.1
R-DRCN [18]  VGG16      40.1  60.9
PL [5]       VGG16      -     61.2
A-MCG [6]    ResNet-50  -     61.2
CANet [26]   ResNet-50  55.4  66.2
PGNet [25]   ResNet-50  56.0  69.9
CRNet        VGG16      55.2  66.4
CRNet        ResNet-50  55.7  66.8

Table 5. Comparison with the state-of-the-art methods under the 1-shot setting. Our proposed network achieves state-of-the-art performance under both evaluation metrics.
AsshowninTa-4171MethodBackbonemIoUIoUOSLM[17]VGG1643.
961.
5co-fcn[14]VGG1641.
460.
2sg-one[27]VGG1647.
165.
9R-DFCN[18]VGG1645.
366.
0PL[5]VGG16-62.
3A-MCG[6]ResNet-50-62.
2CANet[26]ResNet-5057.
169.
6PGNet[25]ResNet5058.
570.
5CRNetVGG1658.
571.
0CRNetResNet5058.
871.
5Table6.
Comparisonwiththestate-of-the-artmethodsunderthe5-shotsetting.
Ourproposednetworkoutperformsallpreviousmethodsandachievesnewstate-of-the-artperformanceunderbothevaluationmetrics.
ble3,multi-scaleinputtestbrings1.
2mIoUscoreimprove-mentinthe1-shotsetting.
WealsoinvestigatethechoicesoffeaturesinthenetworkbackboneinTable3.
Wecom-parethemulti-levelfeatureembeddingswiththefeaturessolelyfromthelastlayer.
Ourmodelwithmulti-levelfea-turesprovidesanimprovementof1.
8mIoUscore.
Thisindicatesthattobetterlocatethecommonobjectsintwoimages,middle-levelfeaturesarealsoimportantandhelp-ful.
To further inspect the effectiveness of the mask refinement module, we design a baseline model that removes the cached branch. In this case, the mask refinement block makes predictions solely based on the input features, and we only run the mask refinement module once. As shown in Table 3, our mask refinement module brings a 3.1 mIoU score performance increase over this baseline.
In the k-shot setting, we compare our finetuning-based method with the fusion-based methods widely used in previous works. For the fusion-based method, we make an inference with each of the support images and average their probability maps as the final prediction. The comparison is shown in Table 4. In the 5-shot setting, the finetuning-based method outperforms the 1-shot baseline by 8.4 mIoU score, which is significantly superior to the fusion-based method. When 10 support images are available, our finetuning-based method shows more advantages: its performance continues increasing, while the fusion-based method's performance begins to drop.
6.1. MSCOCO

COCO 2014 [10] is a challenging large-scale dataset which contains 80 object categories. Following [26], we choose 40 classes for training, 20 classes for validation, and 20 classes for testing. As shown in Table 7, the results again validate the designs in our network.

Condition  Cross-Reference Module  Mask-Refine  1-shot  5-shot
yes        -                       -            43.3    44.0
-          yes                     -            38.5    42.7
yes        yes                     -            44.9    45.6
yes        yes                     yes          45.8    47.2

Table 7. Ablation study on the condition module, the cross-reference module, and the mask refinement module on the MSCOCO dataset.
6.2. Comparison with the State-of-the-Art Results

We compare our network with state-of-the-art methods on the PASCAL VOC 2012 dataset. Table 5 shows the performance of different methods in the 1-shot setting. We use IoU to denote the evaluation metric proposed in [14]. The difference between the two metrics is that the IoU metric also incorporates the background into the Intersection-over-Union computation and ignores the image category.

5-Shot Experiments. The comparison of 5-shot segmentation results under the two evaluation metrics is shown in Table 6. Our method achieves new state-of-the-art performance under both evaluation metrics.
7. Conclusion

In this paper, we have presented a novel cross-reference network for few-shot segmentation. Unlike previous work, which unilaterally guides the segmentation of query images with support images, our two-head design concurrently makes predictions in both the query image and the support image to help the network better locate the target category. We develop a mask refinement module with a cache mechanism which can effectively improve the prediction performance. In the k-shot setting, our finetuning-based method can take advantage of more annotated data and significantly improves the performance. Extensive ablation experiments on the PASCAL VOC 2012 dataset validate the effectiveness of our design. Our model achieves state-of-the-art performance on the PASCAL VOC 2012 dataset.
Acknowledgements

This research is supported by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG-RP-2018-003) and the MOE Tier-1 research grants: RG126/17 (S) and RG22/19 (S). This research is also partly supported by the Delta-NTU Corporate Lab with funding support from Delta Electronics Inc. and the National Research Foundation (NRF) Singapore.
References

[1] Hong Chen, Yifei Huang, and Hideki Nakayama. Semantic aware attention based deep object co-segmentation. arXiv preprint arXiv:1810.06859, 2018.
[2] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834-848, 2018.
[3] Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Wang, and Jia-Bin Huang. A closer look at few-shot classification. In International Conference on Learning Representations, 2019.
[4] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248-255, 2009.
[5] Nanqing Dong and Eric Xing. Few-shot semantic segmentation with prototype learning. In BMVC, 2018.
[6] Tao Hu, Pengwan Yang, Chiliang Zhang, Gang Yu, Yadong Mu, and Cees G. M. Snoek. Attention-based multi-context guiding for few-shot semantic segmentation. 2019.
[7] Armand Joulin, Francis Bach, and Jean Ponce. Multi-class cosegmentation. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 542-549. IEEE, 2012.
[8] Alexander Kolesnikov and Christoph H. Lampert. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In European Conference on Computer Vision, pages 695-711. Springer, 2016.
[9] Guosheng Lin, Anton Milan, Chunhua Shen, and Ian D. Reid. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In CVPR, volume 1, page 5, 2017.
[10] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, pages 740-755, 2014.
[11] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431-3440, 2015.
[12] Prerana Mukherjee, Brejesh Lall, and Snehith Lattupally. Object cosegmentation using deep Siamese network. arXiv preprint arXiv:1803.02555, 2018.
[13] Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, and Jian Sun. Large kernel matters - improve semantic segmentation by global convolutional network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4353-4361, 2017.
[14] Kate Rakelly, Evan Shelhamer, Trevor Darrell, Alyosha Efros, and Sergey Levine. Conditional networks for few-shot semantic segmentation. In ICLR Workshop, 2018.
[15] Carsten Rother, Tom Minka, Andrew Blake, and Vladimir Kolmogorov. Cosegmentation of image pairs by histogram matching - incorporating a global constraint into MRFs. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 1, pages 993-1000. IEEE, 2006.
[16] Michael Rubinstein, Armand Joulin, Johannes Kopf, and Ce Liu. Unsupervised joint object discovery and segmentation in internet images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1939-1946, 2013.
[17] Amirreza Shaban, Shray Bansal, Zhen Liu, Irfan Essa, and Byron Boots. One-shot learning for semantic segmentation. arXiv preprint arXiv:1709.03410, 2017.
[18] Mennatullah Siam and Boris Oreshkin. Adaptive masked weight imprinting for few-shot segmentation. arXiv preprint arXiv:1902.11123, 2019.
[19] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NIPS, 2017.
[20] Xin Sun, Zhenning Yang, Chi Zhang, Guohao Peng, and Keck-Voon Ling. Conditional Gaussian distribution learning for open set recognition, 2020.
[21] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pages 3630-3638, 2016.
[22] Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, and Timothy M. Hospedales. Learning to compare: Relation network for few-shot learning. In CVPR, 2018.
[23] Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson. Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579, 2015.
[24] Chi Zhang, Yujun Cai, Guosheng Lin, and Chunhua Shen. DeepEMD: Few-shot image classification with differentiable earth mover's distance and structured classifiers, 2020.
[25] Chi Zhang, Guosheng Lin, Fayao Liu, Jiushuang Guo, Qingyao Wu, and Rui Yao. Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 9587-9595, 2019.
[26] Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, and Chunhua Shen. CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5217-5226, 2019.
[27] Xiaolin Zhang, Yunchao Wei, Yi Yang, and Thomas Huang. SG-One: Similarity guidance network for one-shot semantic segmentation. arXiv preprint arXiv:1810.09091, 2018.
