contributionsgraph

graphsearch  时间:2021-02-11  阅读:()
JournalofMachineLearningResearch21(2020)1-5Submitted3/19;Revised11/19;Published03/20CausalDiscoveryToolbox:UncoveringcausalrelationshipsinPythonDiviyanKalainathandiviyan@fentech.
aiFenTech,TAU,LRI,INRIA,UniversiteParis-Sud20RueRaymondAron,75013Paris,FranceOlivierGoudetolivier.
goudet@univ-angers.
frLERIA,Universited'Angers,2boulevardLavoisier,49045Angers,FranceRitikDuttadutta.
ritik@iitgn.
ac.
inIITGandhinagar,Gandhinagar,Gujarat382355,IndiaEditor:AndreasMuellerAbstractThispaperpresentsanewopensourcePythonframeworkforcausaldiscoveryfromob-servationaldataanddomainbackgroundknowledge,aimedatcausalgraphandcausalmechanismmodeling.
TheCdtpackageimplementsanend-to-endapproach,recover-ingthedirectdependencies(theskeletonofthecausalgraph)andthecausalrelation-shipsbetweenvariables.
Itincludesalgorithmsfromthe'Bnlearn'(Scutari,2018)and'Pcalg'(Kalischetal.
,2018)packages,togetherwithalgorithmsforpairwisecausaldis-coverysuchasANM(Hoyeretal.
,2009).
CdtisavailableundertheMITLicenseathttps://github.
com/FenTechSolutions/CausalDiscoveryToolbox.
Keywords:CausalDiscovery,Graphrecovery,opensource,constraint-basedmethods,score-basedmethods,pairwisecausality,Markovblanket1.
IntroductionCausalmodelingiskeytounderstandphysicalorarticialphenomenaandtoguideinter-ventions.
MostsoftwaresforcausaldiscoveryhavebeendevelopedintheRprogramminglanguage(Kalischetal.
,2018;Scutari,2018),andafewcausaldiscoveryalgorithmsareavailableinPythone.
g.
RCC(Lopez-Pazetal.
,2015),CGNN(Goudetetal.
,2018)andSAM(Kalainathanetal.
,2019),whilePythonsupportsmanycurrentmachinelearningframeworkssuchasPyTorch(Paszkeetal.
,2017).
TheCausalDiscoveryToolbox(Cdt)isanopen-sourcePythonpackageconcernedwithobservationalcausaldiscovery,aimedatlearningboththecausalgraphandtheas-sociatedcausalmechanismsfromsamplesofthejointprobabilitydistributionofthedata.
Cdtincludesmanystate-of-the-artcausalmodelingalgorithms(someofwhichareimportedfromR),thatsupportsGPUhardwareaccelerationandautomatichardwaredetection.
AmaingoalofCdtistoprovidetheuserswithguidancetowardsend-to-endexperiments,.
ThisworkwasdoneduringDiviyanKalainathan'sPhDThesisatUniv.
Paris-Saclayc2020DiviyanKalainathan,OlivierGoudet,RitikDutta.
License:CC-BY4.
0,seehttps://creativecommons.
org/licenses/by/4.
0/.
Attributionrequirementsareprovidedathttp://jmlr.
org/papers/v21/19-187.
html.
Kalainathan,Goudet,Duttabyincludingscoringmetrics,andstandardbenchmarkdatasetssuchasthe"Sachs"dataset(Sachsetal.
,2005).
Comparedtoothercausaldiscoverypackages,Cdtuniespairwiseandscore-basedmulti-variateapproacheswithinasinglepackage,implementinganstep-by-steppipelineapproach(Fig.
1).
Figure1:TheCdtcausalmodelingpackage:GeneralpipelineCdtalsoprovidesanintuitiveapproachforincludingR-basedalgorithms,facilitatingthetaskofextendingthetoolkitwithadditionalRpackages.
Thepackagerevolvesaroundtheusageofnetworkx.
Graphclasses,mainlyforrecovering(un)directedgraphsfromob-servationaldata.
Cdtcurrentlyincludes17algorithmsforgraphskeletonidentication:7methodsbasedonindependencetests,and10methodsaimedatdirectlyrecoveringtheskeletongraph.
Itfurtherincludes20algorithmsaimedatcausaldirectedgraphprediction,including11graphicaland9pairwiseapproaches.
2.
OriginalcontributionsofthepackageThecausalpairwisesettingconsidersapairofvariablesandaimstodeterminethecausalrelationshipbetweenbothvariables.
Thissettingimplicitlyassumesthatbothvari-ablesarealreadyconditionedonothercovariates,orreadjustedwithapropensityscore(RosenbaumandRubin,1983),andthattheremaininglatentcovariateshavelittleornoinuenceandcanbeconsideredas"noise".
Thepairwisesettingisalsorelevanttocompleteapartiallydirectedgraphresultingfromothercausaldiscoverymethods.
Inthe2010s,thepairwisesettingwasinvestigatedbyHoyeretal.
(2009)amongothers,whoproposedtheAdditiveNoiseModel(ANM).
Lateron,Guyon(2013)onCause-Eectpair(CEP)prob-lems;CEPformulatesbivariatecausalidenticationasasupervisedmachinelearningtask,whereaclassieristrainedfromexamples(Ai,Bi,i),wherethevariablepair(Ai,Bi)isrepresentedbysamplesoftheirjointdistributionandlabeliindicatesthetypeofcausalrelationshipbetweenbothvariables(independent,Ai→Bi,Bi→Ai).
Cdtisonethefewpackagestoincludecausalpairwisediscoveryalgorithms.
Thesealgorithms,mostlyimple-mentedusingPythonorMatlabareoftenleftunmaintained.
Therefore,manyalgorithmsthatareknowntobequiteecient(suchasJarfo(Fonollosa,2019),rstandrstinthecause-eectpairschallenges,codedinPython2.
7)areoutdatedandrequireasubstantialamountofworktoxandupdate.
Cdtimplements9pairwisealgorithms,allcodedinPython,5ofthembeingnewimplementations(NCC,GNN,CDS,RECIandabaselinemethodbasedonregressionerror).
Thegraphsetting,extensivelystudiedintheliterature,issupportedbymanypack-ages.
Bayesianapproachesrelyeitheronconditionalindependencetestsnamedconstraint-basedmethods,suchasPCorFCI(Spirtesetal.
,2000;Strobletal.
,2017),oronscore-basedmethods,involvingndingthegraphthatmaximizesalikelihoodscorethrough2CausalDiscoveryToolbox:UncoveringcausalrelationshipsinPythongraphsearchheuristics,likeGES(Chickering,2002)orCAM(B¨uhlmannetal.
,2014).
OtherapproachesleveragetheGenerativeNetworksetting,suchasCGNNorSAM(Goudetetal.
,2018;Kalainathanetal.
,2019).
Graphsettingmethodsoutputeitheradirectedacyclicgraphorapartiallydirectedacyclicgraph.
MostapproachesinthegraphsettingareimportedfromRpackages,withtheexceptionofCGNNandSAM.
3.
ComparisonwithotherpackagesToourbestknowledge,CausalityandPy-CausalaretheonlyalternativestoCdtforcausaldiscoveryinPython.
However,theonlyoverlapwithCdtconcernsthePC-algorithm,commontoPy-CausalandCdt.
AkintoCdt,Py-CausalisawrapperpackagebutaroundtheTetradJavapackage.
Fig.
2comparestheruntimesofthetwoPCimplementationsonsyntheticgraphswithofvaryingsize,connectivity,andnumberofdatapoints,showingaconstantgapinwithrespecttothenumberofdatapointsandconnectivityofthegraph.
Thisgapisduetothecreationofthesubprocessandthedatatransfer,thatarenottakenintoaccountinthePyCausalexecutionruntime.
Thegapwithrespecttothenumberofnodesisduetodierentimplementationsandcomputationalcomplexity.
FurthereortwillbedevotedtoimposingtheeciencyofourPython-NumbaimplementationofPC.
Figure2:RuntimesofimplementationsofPConvariousgraphs4.
ImplementationandutilitiesRintegration.
Assaid,theCdtpackageintegrate10algorithmscodedinRand17codedinPython.
TheCdtpackageintegratesallofthem,usingWrapperfunctionsinPythontoenabletheusertolaunchanyRscriptandtocontrolitsarguments;theRscriptsareexecutedinatemporaryfolderwithasubprocesstoavoidthelimitationsofthePythonGIL.
TheresultsareretrievedthroughoutputlesbackintothemainPythonprocess.
ThewholeprocedureismodularandallowscontributorstoeasilyaddnewRfunctionstothepackage.
Sustainabilityanddeployment.
Inorderforthepackagetobeeasilyextended,foster-ingtheintegrationoffurthercommunitycontributions,specialcareisgiventothequalityoftests.
Specically,aContinuousIntegrationtooladdedtothegitrepository,allowstosequentiallyexecutetestsonnewcommitsandpullrequest:i)Testallfunctionalitiesofthenewversiononthepackageontoydatasets;ii)Builddockerimagesandpushthemtohub.
docker.
com;iii)Pushthenewversiononpypi;iv)Updatethedocumentation3Kalainathan,Goudet,Duttawebsite.
Thisprocedurealsoallowstotesttheproperfunctioningofthepackagewithitsdependencies.
5.
ConclusionandfuturedevelopmentsTheCausalDiscoveryToolbox(Cdt)packageallowsPythonuserstoapplymanycausaldiscoveryorgraphmodelingalgorithmsonobservationaldata.
Itisalreadyusedinresearchprojects,suchas(Yaleetal.
,2018;Kalainathanetal.
,2019).
Astheoutputgraphsarenetworkx.
Graphclasses,theseareeasilyexportableintovariousformatsforvisualizationsoftwares,usinge.
g.
GraphvizorGephi.
Atthepackageimport,testsarerealizedtopinpointthecongurationoftheuser:availabilityofGPUsandRpackagesandnumberofCPUsonthehostmachine.
Thepackagepromotesanend-to-end,step-by-stepapproach:theundirectedgraph(bi-variatedependencies)isrstidentied,beforeapplyingcausaldiscoveryalgorithms;thelatterareconstrainedfromtheundirectedgraph,withsignicantcomputationalgains.
Futureextensionsofthepackageinclude:i)reimplementingtheRalgorithmsinPython-NumbaandreimplementthePytorchalgorithmsinChainertodropallheavydependenciesandtointegrateCdtinthePythoncommunitywithaNumpy-API;ii)developingGPU-compliantimplementationofnewalgorithms;iii)handlinginterventionaldataandtime-seriesdata(e.
g.
forneuroimagingandweatherforecast).
Inthelongerterm,ourpriorityistoprovidetheuserwithteststowhetherthestandardassumptions(e.
g.
causalsuciencyassumption)holdandassesstheriskofapplyingmethodsoutoftheirintendedscope.
ReferencesPeterB¨uhlmann,JonasPeters,JanErnest,etal.
CAM:Causaladditivemodels,high-dimensionalordersearchandpenalizedregression.
TheAnnalsofStatistics,2014.
DavidMaxwellChickering.
Optimalstructureidenticationwithgreedysearch.
Journalofmachinelearningresearch,3(Nov):507–554,2002.
JoseA.
R.
Fonollosa.
Conditionaldistributionvariabilitymeasuresforcausalitydetection.
CauseEectPairsinMachineLearning,2019.
OlivierGoudet,DiviyanKalainathan,PhilippeCaillou,IsabelleGuyon,DavidLopez-Paz,andMicheleSebag.
Learningfunctionalcausalmodelswithgenerativeneuralnetworks.
ExplainableandInterpretableModelsinComputerVisionandMachineLearning,2018.
IsabelleGuyon.
Chalearncauseeectpairschallenge,2013.
URLhttp://www.
causality.
inf.
ethz.
ch/cause-effect.
php.
PatrikO.
Hoyer,DominikJanzing,JorisM.
Mooij,JonasPeters,andBernhardSch¨olkopf.
Nonlinearcausaldiscoverywithadditivenoisemodels.
InNeuralInformationProcessingSystems(NIPS),pages689–696,2009.
DiviyanKalainathan,OlivierGoudet,IsabelleGuyon,DavidLopez-Paz,andMich`eleSebag.
Structuralagnosticmodeling:Adversariallearningofcausalgraphs.
ArXiv,2019.
4CausalDiscoveryToolbox:UncoveringcausalrelationshipsinPythonMarkusKalisch,AlainHauser,etal.
Package'pcalg'.
2018.
URLhttps://cran.
r-project.
org/web/packages/pcalg/index.
html.
DavidLopez-Paz,KrikamolMuandet,BernhardSch¨olkopf,andIlyaOTolstikhin.
Towardsalearningtheoryofcause-eectinference.
InICML,pages1452–1461,2015.
AdamPaszke,SamGross,SoumithChintala,etal.
AutomaticdierentiationinPyTorch.
2017.
URLhttps://pytorch.
org/.
PaulRRosenbaumandDonaldBRubin.
Thecentralroleofthepropensityscoreinobservationalstudiesforcausaleects.
Biometrika,70(1):41–55,1983.
KarenSachs,OmarPerez,DanaPe'er,DouglasALauenburger,andGarryPNolan.
Causalprotein-signalingnetworksderivedfrommultiparametersingle-celldata.
Science,308(5721):523–529,2005.
MarcoScutari.
Package'bnlearn',2018.
URLhttp://www.
bnlearn.
com/.
PeterSpirtes,ClarkNGlymour,andRichardScheines.
Causation,prediction,andsearch.
MITpress,2000.
EricVStrobl,KunZhang,andShyamVisweswaran.
Approximatekernel-basedconditionalindependencetestsforfastnon-parametriccausaldiscovery.
2017.
AndrewYale,SaloniDash,RitikDutta,IsabelleGuyon,AdrienPavao,andKristinBennett.
Privacypreservingsynthetichealthdata.
ESANN,2018.
5

个人网站备案流程及注意事项(内容方向和适用主机商)

如今我们还有在做个人网站吗?随着自媒体和短视频的发展和兴起,包括我们很多WEB2.0产品的延续,当然也包括个人建站市场的低迷和用户关注的不同,有些个人已经不在做网站。但是,由于我们有些朋友出于网站的爱好或者说是有些项目还是基于PC端网站的,还是有网友抱有信心的,比如我们看到有一些老牌个人网站依旧在运行,且还有新网站的出现。今天在这篇文章中谈谈有网友问关于个人网站备案的问题。这个也是前几天有他在选择...

2021年全新Vultr VPS主机开通云服务器和选择机房教程(附IP不通问题)

昨天有分享到"2021年Vultr新用户福利注册账户赠送50美元"文章,居然还有网友曾经没有注册过他家的账户,薅过他们家的羊毛。通过一阵折腾居然能注册到账户,但是对于如何开通云服务器稍微有点不对劲,对于新人来说确实有点疑惑。因为Vultr采用的是预付费充值方式,会在每月的一号扣费,当然我们账户需要存留余额或者我们采用自动扣费支付模式。把笔记中以前的文章推送给网友查看,他居然告诉我界面不同,看的不对...

AlphaVPS(€3.99/月)VPS年付15欧,AMD EYPC+NVMe系列起

AlphaVPS是一家保加利亚本土主机商(DA International Group Ltd),提供VPS主机及独立服务器租用等,数据中心包括美国(洛杉矶/纽约)、德国、英国和保加利亚等,公司办公地点跟他们提供的保加利亚数据中心在一栋楼内,自有硬件,提供IPv4+IPv6,支持PayPal或者信用卡等方式付款。商家提供的大硬盘VPS主机,提供128GB-2TB磁盘,最低年付15欧元起,也可以选择...

graphsearch为你推荐
中证财通中国可持续发展100(ECPI桌面chromeformgraph支持ipad支持ipad国家标准苹果5netbios端口netbios ssn是什么意思?ipad上网新买的ipad怎么用。什么装程序 怎么上网x-routerx-0.4x等于多少?google中国地图求教谷歌中国地图~手机如何使用?
万网域名证书查询 idc评测 burstnet webhostingpad 美国仿牌空间 回程路由 彩虹ip java空间 亚洲小于500m 789电视 linux服务器维护 支持外链的相册 移动服务器托管 网通服务器 dnspod 学生服务器 测速电信 宿迁服务器 阿里云个人邮箱 tracker服务器 更多