Journal of Machine Learning Research 21 (2020) 1-5
Submitted 3/19; Revised 11/19; Published 03/20

Causal Discovery Toolbox: Uncovering causal relationships in Python

Diviyan Kalainathan (diviyan@fentech.ai)
FenTech, TAU, LRI, INRIA, Université Paris-Sud
20 Rue Raymond Aron, 75013 Paris, France

Olivier Goudet (olivier.goudet@univ-angers.fr)
LERIA, Université d'Angers, 2 boulevard Lavoisier, 49045 Angers, France

Ritik Dutta (dutta.ritik@iitgn.ac.in)
IIT Gandhinagar, Gandhinagar, Gujarat 382355, India

Editor: Andreas Mueller

Abstract

This paper presents a new open source Python framework for causal discovery from observational data and domain background knowledge, aimed at causal graph and causal mechanism modeling. The Cdt package implements an end-to-end approach, recovering the direct dependencies (the skeleton of the causal graph) and the causal relationships between variables. It includes algorithms from the 'Bnlearn' (Scutari, 2018) and 'Pcalg' (Kalisch et al., 2018) packages, together with algorithms for pairwise causal discovery such as ANM (Hoyer et al., 2009). Cdt is available under the MIT License at https://github.com/FenTechSolutions/CausalDiscoveryToolbox.
Keywords: Causal Discovery, Graph recovery, open source, constraint-based methods, score-based methods, pairwise causality, Markov blanket

1. Introduction

Causal modeling is key to understanding physical or artificial phenomena and to guiding interventions. Most software for causal discovery has been developed in the R programming language (Kalisch et al., 2018; Scutari, 2018), and only a few causal discovery algorithms are available in Python, e.g. RCC (Lopez-Paz et al., 2015), CGNN (Goudet et al., 2018) and SAM (Kalainathan et al., 2019), even though Python supports many current machine learning frameworks such as PyTorch (Paszke et al., 2017).

The Causal Discovery Toolbox (Cdt) is an open-source Python package concerned with observational causal discovery, aimed at learning both the causal graph and the associated causal mechanisms from samples of the joint probability distribution of the data. Cdt includes many state-of-the-art causal modeling algorithms (some of which are imported from R), and supports GPU hardware acceleration and automatic hardware detection.
A main goal of Cdt is to provide the users with guidance towards end-to-end experiments, by including scoring metrics and standard benchmark datasets such as the "Sachs" dataset (Sachs et al., 2005).

Footnote: This work was done during Diviyan Kalainathan's PhD thesis at Univ. Paris-Saclay.
© 2020 Diviyan Kalainathan, Olivier Goudet, Ritik Dutta. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v21/19-187.html.

Compared to other causal discovery packages, Cdt unifies pairwise and score-based multivariate approaches within a single package, implementing a step-by-step pipeline approach (Fig. 1).

Figure 1: The Cdt causal modeling package: General pipeline

Cdt also provides an intuitive approach for including R-based algorithms, facilitating the task of extending the toolkit with additional R packages. The package revolves around the usage of networkx.Graph classes, mainly for recovering (un)directed graphs from observational data. Cdt currently includes 17 algorithms for graph skeleton identification: 7 methods based on independence tests, and 10 methods aimed at directly recovering the skeleton graph. It further includes 20 algorithms aimed at causal directed graph prediction, including 11 graphical and 9 pairwise approaches.

2. Original contributions of the package

The causal pairwise setting considers a pair of variables and aims to determine the causal relationship between both variables. This setting implicitly assumes that both variables are already conditioned on other covariates, or readjusted with a propensity score (Rosenbaum and Rubin, 1983), and that the remaining latent covariates have little or no influence and can be considered as "noise". The pairwise setting is also relevant to complete a partially directed graph resulting from other causal discovery methods. Since the late 2000s, the pairwise setting has been investigated by Hoyer et al. (2009) among others, who proposed the Additive Noise Model (ANM).
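For reference, the additive noise model for a pair (X, Y) with X → Y is commonly written as below. The symbols f (a nonlinear function) and N (the noise term) are notation introduced here for illustration; they do not appear in the text above.

```latex
Y = f(X) + N, \qquad N \perp\!\!\!\perp X
```

Identification compares this model with the reverse one, X = g(Y) + N', and prefers the direction in which the estimated residual is independent of the input.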
Later on, Guyon (2013) organized a challenge on Cause-Effect Pair (CEP) problems; CEP formulates bivariate causal identification as a supervised machine learning task, where a classifier is trained from examples (A_i, B_i, ℓ_i), where the variable pair (A_i, B_i) is represented by samples of their joint distribution and the label ℓ_i indicates the type of causal relationship between both variables (independent, A_i → B_i, B_i → A_i). Cdt is one of the few packages to include causal pairwise discovery algorithms. These algorithms, mostly implemented in Python or Matlab, are often left unmaintained. Therefore, many algorithms that are known to be quite efficient (such as Jarfo (Fonollosa, 2019), ranked first in the cause-effect pairs challenges and coded in Python 2.7) are outdated and require a substantial amount of work to fix and update. Cdt implements 9 pairwise algorithms, all coded in Python, 5 of them being new implementations (NCC, GNN, CDS, RECI and a baseline method based on regression error).
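To give a flavour of a regression-error criterion for a single pair, the sketch below compares the error of predicting B from A with the error of predicting A from B after rescaling both variables to [0, 1]. It is a minimal illustration under stated assumptions, not the Cdt implementation: the function names, the piecewise-constant (binned) regressor, the number of bins and the toy data are all choices made here.

```python
import random
import statistics


def minmax_scale(values):
    """Rescale a list of numbers to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def binned_regression_error(inputs, targets, n_bins=20):
    """Mean squared error of a piecewise-constant fit of targets on inputs.

    Samples are sorted by input value and split into n_bins groups of
    (nearly) equal size; each group is predicted by its mean target.
    """
    order = sorted(range(len(inputs)), key=lambda i: inputs[i])
    size = len(order) // n_bins
    squared_error = 0.0
    for b in range(n_bins):
        # The last bin absorbs the remainder when the split is uneven.
        idx = order[b * size:] if b == n_bins - 1 else order[b * size:(b + 1) * size]
        mean = statistics.fmean(targets[i] for i in idx)
        squared_error += sum((targets[i] - mean) ** 2 for i in idx)
    return squared_error / len(inputs)


def regression_error_direction(a, b, n_bins=20):
    """Return 1 if the errors favour a -> b, -1 if they favour b -> a, else 0."""
    a, b = minmax_scale(a), minmax_scale(b)
    err_ab = binned_regression_error(a, b, n_bins)  # predict b from a
    err_ba = binned_regression_error(b, a, n_bins)  # predict a from b
    if err_ab < err_ba:
        return 1
    if err_ba < err_ab:
        return -1
    return 0


# Toy pair generated as b = a**3 + noise, so the true direction is a -> b.
rng = random.Random(0)
a = [rng.random() for _ in range(2000)]
b = [x ** 3 + rng.gauss(0.0, 0.05) for x in a]
direction = regression_error_direction(a, b)
```

The rescaling step matters: without a common scale, the two errors are not comparable. Such a criterion is only a heuristic and can fail when the noise is large or the mechanism is close to linear.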

The graph setting, extensively studied in the literature, is supported by many packages. Bayesian approaches rely either on conditional independence tests (constraint-based methods), such as PC or FCI (Spirtes et al., 2000; Strobl et al., 2017), or on score-based methods, which search for the graph that maximizes a likelihood score through graph search heuristics, such as GES (Chickering, 2002) or CAM (Bühlmann et al., 2014). Other approaches leverage the Generative Network setting, such as CGNN or SAM (Goudet et al., 2018; Kalainathan et al., 2019). Graph setting methods output either a directed acyclic graph or a partially directed acyclic graph. Most approaches in the graph setting are imported from R packages, with the exception of CGNN and SAM.

3. Comparison with other packages

To the best of our knowledge, Causality and Py-Causal are the only alternatives to Cdt for causal discovery in Python. However, the only overlap with Cdt concerns the PC algorithm, common to Py-Causal and Cdt. Akin to Cdt, Py-Causal is a wrapper package, but around the Tetrad Java package. Fig. 2 compares the runtimes of the two PC implementations on synthetic graphs of varying size, connectivity, and number of data points, showing a constant gap with respect to the number of data points and the connectivity of the graph. This gap is due to the creation of the subprocess and the data transfer, which are not taken into account in the Py-Causal execution runtime. The gap with respect to the number of nodes is due to different implementations and computational complexity. Further effort will be devoted to improving the efficiency of our Python-Numba implementation of PC.

Figure 2: Runtimes of implementations of PC on various graphs

4. Implementation and utilities

R integration. As mentioned, the Cdt package integrates 10 algorithms coded in R and 17 coded in Python. The package integrates all of them, using Python wrapper functions that enable the user to launch any R script and to control its arguments; the R scripts are executed in a temporary folder with a subprocess to avoid the limitations of the Python GIL. The results are retrieved through output files back into the main Python process. The whole procedure is modular and allows contributors to easily add new R functions to the package.
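The wrapper pattern described above (write inputs to a temporary folder, launch a script in a subprocess, read the result back from an output file) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Cdt wrapper: the helper name `run_external`, the file names and the CSV exchange format are hypothetical, and a small Python child script stands in for an R script so that the example runs without R installed.

```python
import csv
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for an R script: a tiny program that reads a CSV file and writes
# the column means to an output file. With R installed, the command prefix
# could be ("Rscript",) and the script an R file (an assumption of this sketch).
CHILD_SCRIPT = """
import csv, sys
with open(sys.argv[1], newline="") as f:
    rows = [[float(v) for v in row] for row in csv.reader(f)]
means = [sum(col) / len(col) for col in zip(*rows)]
with open(sys.argv[2], "w", newline="") as f:
    csv.writer(f).writerow(means)
"""


def run_external(script_text, rows, command_prefix=(sys.executable,)):
    """Run a script in a temporary folder and read its result back from a file."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        script = tmp / "script.py"
        data_in = tmp / "in.csv"
        data_out = tmp / "out.csv"
        script.write_text(script_text)
        with open(data_in, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        # The child process runs its own interpreter, so it is not bound by
        # the GIL of the calling Python process.
        subprocess.run(
            [*command_prefix, str(script), str(data_in), str(data_out)],
            check=True,
            cwd=tmp,
        )
        with open(data_out, newline="") as f:
            return [float(v) for v in next(csv.reader(f))]


column_means = run_external(CHILD_SCRIPT, [[1.0, 2.0], [3.0, 6.0]])
```

The temporary folder is removed when the `with` block exits, so the result has to be read before leaving it. The process start-up and file transfer add a fixed cost to every call.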

Sustainability and deployment. In order for the package to be easily extended, fostering the integration of further community contributions, special care is given to the quality of tests. Specifically, a Continuous Integration tool added to the git repository sequentially executes the following steps on new commits and pull requests: i) test all functionalities of the new version of the package on toy datasets; ii) build Docker images and push them to hub.docker.com; iii) push the new version to PyPI; iv) update the documentation website. This procedure also tests the proper functioning of the package with its dependencies.

5. Conclusion and future developments

The Causal Discovery Toolbox (Cdt) package allows Python users to apply many causal discovery or graph modeling algorithms on observational data. It is already used in research projects, such as (Yale et al., 2018; Kalainathan et al., 2019). As the output graphs are networkx.Graph classes, they are easily exportable into various formats for visualization software, e.g. Graphviz or Gephi. At package import, tests are run to determine the configuration of the user: availability of GPUs and R packages, and number of CPUs on the host machine.
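An import-time configuration check of this kind might look like the sketch below. It is not the Cdt code: the function name, the dictionary keys and the specific probes (looking for `Rscript` and `nvidia-smi` on the PATH, and checking whether `torch` can be found) are assumptions chosen for illustration.

```python
import importlib.util
import os
import shutil


def detect_configuration():
    """Collect basic facts about the host that a toolbox could check at import."""
    return {
        # Number of logical CPUs; os.cpu_count() may return None.
        "n_cpus": os.cpu_count() or 1,
        # R is considered available if an Rscript executable is on the PATH.
        "r_available": shutil.which("Rscript") is not None,
        # Presence of the nvidia-smi tool is used as a rough proxy for a GPU.
        "gpu_tool_found": shutil.which("nvidia-smi") is not None,
        # Whether the PyTorch package can be located, without importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


config = detect_configuration()
```

These probes are deliberately cheap; a stricter check would start R and query each required package, at the cost of a slower import.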
The package promotes an end-to-end, step-by-step approach: the undirected graph (bivariate dependencies) is first identified, before applying causal discovery algorithms; the latter are constrained by the undirected graph, with significant computational gains.
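The computational gain of constraining the orientation step by a skeleton can be seen in a small sketch: only the pairs connected in the skeleton are scored, instead of all pairs of variables. The function name, the sign convention of the scorer and the toy scorer itself are hypothetical and are not part of Cdt.

```python
from itertools import combinations


def orient_skeleton(skeleton, pair_score):
    """Orient each undirected edge of a skeleton with a pairwise scorer.

    skeleton: iterable of 2-tuples of variable names (undirected edges).
    pair_score: callable (a, b) -> float; a positive value is read as a -> b,
        any other value as b -> a (a convention assumed for this sketch).
    """
    directed = []
    for a, b in skeleton:
        directed.append((a, b) if pair_score(a, b) > 0 else (b, a))
    return directed


variables = ["X", "Y", "Z", "W"]
skeleton = [("X", "Y"), ("Y", "Z")]  # assumed output of a skeleton step

# Toy scorer that records its calls and orients edges alphabetically;
# a real scorer would look at the samples of both variables.
calls = []


def toy_score(a, b):
    calls.append((a, b))
    return 1.0 if a < b else -1.0


edges = orient_skeleton(skeleton, toy_score)
n_all_pairs = len(list(combinations(variables, 2)))
```

Here the scorer is called twice, once per skeleton edge, whereas scoring every pair of the four variables would take six calls; the gap widens quickly for sparse graphs over many variables.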
Future extensions of the package include: i) reimplementing the R algorithms in Python-Numba and reimplementing the PyTorch algorithms in Chainer, to drop all heavy dependencies and to integrate Cdt in the Python community with a NumPy API; ii) developing GPU-compliant implementations of new algorithms; iii) handling interventional data and time-series data (e.g. for neuroimaging and weather forecasting). In the longer term, our priority is to provide the user with tests of whether the standard assumptions (e.g. the causal sufficiency assumption) hold, and to assess the risk of applying methods outside their intended scope.

References

Peter Bühlmann, Jonas Peters, Jan Ernest, et al. CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 2014.

David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov):507–554, 2002.

Jose A. R. Fonollosa. Conditional distribution variability measures for causality detection. Cause Effect Pairs in Machine Learning, 2019.

Olivier Goudet, Diviyan Kalainathan, Philippe Caillou, Isabelle Guyon, David Lopez-Paz, and Michèle Sebag. Learning functional causal models with generative neural networks. Explainable and Interpretable Models in Computer Vision and Machine Learning, 2018.

Isabelle Guyon. Chalearn cause effect pairs challenge, 2013. URL http://www.causality.inf.ethz.ch/cause-effect.php.

Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In Neural Information Processing Systems (NIPS), pages 689–696, 2009.

Diviyan Kalainathan, Olivier Goudet, Isabelle Guyon, David Lopez-Paz, and Michèle Sebag. Structural agnostic modeling: Adversarial learning of causal graphs. ArXiv, 2019.

Markus Kalisch, Alain Hauser, et al. Package 'pcalg'. 2018. URL https://cran.r-project.org/web/packages/pcalg/index.html.

David Lopez-Paz, Krikamol Muandet, Bernhard Schölkopf, and Ilya O Tolstikhin. Towards a learning theory of cause-effect inference. In ICML, pages 1452–1461, 2015.

Adam Paszke, Sam Gross, Soumith Chintala, et al. Automatic differentiation in PyTorch. 2017. URL https://pytorch.org/.

Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.

Karen Sachs, Omar Perez, Dana Pe'er, Douglas A Lauffenburger, and Garry P Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529, 2005.

Marco Scutari. Package 'bnlearn', 2018. URL http://www.bnlearn.com/.

Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, and search. MIT Press, 2000.

Eric V Strobl, Kun Zhang, and Shyam Visweswaran. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. 2017.

Andrew Yale, Saloni Dash, Ritik Dutta, Isabelle Guyon, Adrien Pavao, and Kristin Bennett. Privacy preserving synthetic health data. ESANN, 2018.