JournalofMachineLearningResearch21(2020)1-5Submitted3/19;Revised11/19;Published03/20CausalDiscoveryToolbox:UncoveringcausalrelationshipsinPythonDiviyanKalainathandiviyan@fentech.
aiFenTech,TAU,LRI,INRIA,UniversiteParis-Sud20RueRaymondAron,75013Paris,FranceOlivierGoudetolivier.
goudet@univ-angers.
frLERIA,Universited'Angers,2boulevardLavoisier,49045Angers,FranceRitikDuttadutta.
ritik@iitgn.
ac.
inIITGandhinagar,Gandhinagar,Gujarat382355,IndiaEditor:AndreasMuellerAbstractThispaperpresentsanewopensourcePythonframeworkforcausaldiscoveryfromob-servationaldataanddomainbackgroundknowledge,aimedatcausalgraphandcausalmechanismmodeling.
TheCdtpackageimplementsanend-to-endapproach,recover-ingthedirectdependencies(theskeletonofthecausalgraph)andthecausalrelation-shipsbetweenvariables.
Itincludesalgorithmsfromthe'Bnlearn'(Scutari,2018)and'Pcalg'(Kalischetal.
,2018)packages,togetherwithalgorithmsforpairwisecausaldis-coverysuchasANM(Hoyeretal.
,2009).
CdtisavailableundertheMITLicenseathttps://github.
com/FenTechSolutions/CausalDiscoveryToolbox.
Keywords:CausalDiscovery,Graphrecovery,opensource,constraint-basedmethods,score-basedmethods,pairwisecausality,Markovblanket1.
IntroductionCausalmodelingiskeytounderstandphysicalorarticialphenomenaandtoguideinter-ventions.
MostsoftwaresforcausaldiscoveryhavebeendevelopedintheRprogramminglanguage(Kalischetal.
,2018;Scutari,2018),andafewcausaldiscoveryalgorithmsareavailableinPythone.
g.
RCC(Lopez-Pazetal.
,2015),CGNN(Goudetetal.
,2018)andSAM(Kalainathanetal.
,2019),whilePythonsupportsmanycurrentmachinelearningframeworkssuchasPyTorch(Paszkeetal.
,2017).
TheCausalDiscoveryToolbox(Cdt)isanopen-sourcePythonpackageconcernedwithobservationalcausaldiscovery,aimedatlearningboththecausalgraphandtheas-sociatedcausalmechanismsfromsamplesofthejointprobabilitydistributionofthedata.
Cdtincludesmanystate-of-the-artcausalmodelingalgorithms(someofwhichareimportedfromR),thatsupportsGPUhardwareaccelerationandautomatichardwaredetection.
AmaingoalofCdtistoprovidetheuserswithguidancetowardsend-to-endexperiments,.
ThisworkwasdoneduringDiviyanKalainathan'sPhDThesisatUniv.
Paris-Saclayc2020DiviyanKalainathan,OlivierGoudet,RitikDutta.
License:CC-BY4.
0,seehttps://creativecommons.
org/licenses/by/4.
0/.
Attributionrequirementsareprovidedathttp://jmlr.
org/papers/v21/19-187.
html.
Kalainathan,Goudet,Duttabyincludingscoringmetrics,andstandardbenchmarkdatasetssuchasthe"Sachs"dataset(Sachsetal.
,2005).
Comparedtoothercausaldiscoverypackages,Cdtuniespairwiseandscore-basedmulti-variateapproacheswithinasinglepackage,implementinganstep-by-steppipelineapproach(Fig.
1).
Figure1:TheCdtcausalmodelingpackage:GeneralpipelineCdtalsoprovidesanintuitiveapproachforincludingR-basedalgorithms,facilitatingthetaskofextendingthetoolkitwithadditionalRpackages.
Thepackagerevolvesaroundtheusageofnetworkx.
Graphclasses,mainlyforrecovering(un)directedgraphsfromob-servationaldata.
Cdtcurrentlyincludes17algorithmsforgraphskeletonidentication:7methodsbasedonindependencetests,and10methodsaimedatdirectlyrecoveringtheskeletongraph.
Itfurtherincludes20algorithmsaimedatcausaldirectedgraphprediction,including11graphicaland9pairwiseapproaches.
2.
OriginalcontributionsofthepackageThecausalpairwisesettingconsidersapairofvariablesandaimstodeterminethecausalrelationshipbetweenbothvariables.
Thissettingimplicitlyassumesthatbothvari-ablesarealreadyconditionedonothercovariates,orreadjustedwithapropensityscore(RosenbaumandRubin,1983),andthattheremaininglatentcovariateshavelittleornoinuenceandcanbeconsideredas"noise".
Thepairwisesettingisalsorelevanttocompleteapartiallydirectedgraphresultingfromothercausaldiscoverymethods.
Inthe2010s,thepairwisesettingwasinvestigatedbyHoyeretal.
(2009)amongothers,whoproposedtheAdditiveNoiseModel(ANM).
Lateron,Guyon(2013)onCause-Eectpair(CEP)prob-lems;CEPformulatesbivariatecausalidenticationasasupervisedmachinelearningtask,whereaclassieristrainedfromexamples(Ai,Bi,i),wherethevariablepair(Ai,Bi)isrepresentedbysamplesoftheirjointdistributionandlabeliindicatesthetypeofcausalrelationshipbetweenbothvariables(independent,Ai→Bi,Bi→Ai).
Cdtisonethefewpackagestoincludecausalpairwisediscoveryalgorithms.
Thesealgorithms,mostlyimple-mentedusingPythonorMatlabareoftenleftunmaintained.
Therefore,manyalgorithmsthatareknowntobequiteecient(suchasJarfo(Fonollosa,2019),rstandrstinthecause-eectpairschallenges,codedinPython2.
7)areoutdatedandrequireasubstantialamountofworktoxandupdate.
Cdtimplements9pairwisealgorithms,allcodedinPython,5ofthembeingnewimplementations(NCC,GNN,CDS,RECIandabaselinemethodbasedonregressionerror).
Thegraphsetting,extensivelystudiedintheliterature,issupportedbymanypack-ages.
Bayesianapproachesrelyeitheronconditionalindependencetestsnamedconstraint-basedmethods,suchasPCorFCI(Spirtesetal.
,2000;Strobletal.
,2017),oronscore-basedmethods,involvingndingthegraphthatmaximizesalikelihoodscorethrough2CausalDiscoveryToolbox:UncoveringcausalrelationshipsinPythongraphsearchheuristics,likeGES(Chickering,2002)orCAM(B¨uhlmannetal.
,2014).
OtherapproachesleveragetheGenerativeNetworksetting,suchasCGNNorSAM(Goudetetal.
,2018;Kalainathanetal.
,2019).
Graphsettingmethodsoutputeitheradirectedacyclicgraphorapartiallydirectedacyclicgraph.
MostapproachesinthegraphsettingareimportedfromRpackages,withtheexceptionofCGNNandSAM.
3.
ComparisonwithotherpackagesToourbestknowledge,CausalityandPy-CausalaretheonlyalternativestoCdtforcausaldiscoveryinPython.
However,theonlyoverlapwithCdtconcernsthePC-algorithm,commontoPy-CausalandCdt.
AkintoCdt,Py-CausalisawrapperpackagebutaroundtheTetradJavapackage.
Fig.
2comparestheruntimesofthetwoPCimplementationsonsyntheticgraphswithofvaryingsize,connectivity,andnumberofdatapoints,showingaconstantgapinwithrespecttothenumberofdatapointsandconnectivityofthegraph.
Thisgapisduetothecreationofthesubprocessandthedatatransfer,thatarenottakenintoaccountinthePyCausalexecutionruntime.
Thegapwithrespecttothenumberofnodesisduetodierentimplementationsandcomputationalcomplexity.
FurthereortwillbedevotedtoimposingtheeciencyofourPython-NumbaimplementationofPC.
Figure2:RuntimesofimplementationsofPConvariousgraphs4.
ImplementationandutilitiesRintegration.
Assaid,theCdtpackageintegrate10algorithmscodedinRand17codedinPython.
TheCdtpackageintegratesallofthem,usingWrapperfunctionsinPythontoenabletheusertolaunchanyRscriptandtocontrolitsarguments;theRscriptsareexecutedinatemporaryfolderwithasubprocesstoavoidthelimitationsofthePythonGIL.
TheresultsareretrievedthroughoutputlesbackintothemainPythonprocess.
ThewholeprocedureismodularandallowscontributorstoeasilyaddnewRfunctionstothepackage.
Sustainabilityanddeployment.
Inorderforthepackagetobeeasilyextended,foster-ingtheintegrationoffurthercommunitycontributions,specialcareisgiventothequalityoftests.
Specically,aContinuousIntegrationtooladdedtothegitrepository,allowstosequentiallyexecutetestsonnewcommitsandpullrequest:i)Testallfunctionalitiesofthenewversiononthepackageontoydatasets;ii)Builddockerimagesandpushthemtohub.
docker.
com;iii)Pushthenewversiononpypi;iv)Updatethedocumentation3Kalainathan,Goudet,Duttawebsite.
Thisprocedurealsoallowstotesttheproperfunctioningofthepackagewithitsdependencies.
5.
ConclusionandfuturedevelopmentsTheCausalDiscoveryToolbox(Cdt)packageallowsPythonuserstoapplymanycausaldiscoveryorgraphmodelingalgorithmsonobservationaldata.
Itisalreadyusedinresearchprojects,suchas(Yaleetal.
,2018;Kalainathanetal.
,2019).
Astheoutputgraphsarenetworkx.
Graphclasses,theseareeasilyexportableintovariousformatsforvisualizationsoftwares,usinge.
g.
GraphvizorGephi.
Atthepackageimport,testsarerealizedtopinpointthecongurationoftheuser:availabilityofGPUsandRpackagesandnumberofCPUsonthehostmachine.
Thepackagepromotesanend-to-end,step-by-stepapproach:theundirectedgraph(bi-variatedependencies)isrstidentied,beforeapplyingcausaldiscoveryalgorithms;thelatterareconstrainedfromtheundirectedgraph,withsignicantcomputationalgains.
Futureextensionsofthepackageinclude:i)reimplementingtheRalgorithmsinPython-NumbaandreimplementthePytorchalgorithmsinChainertodropallheavydependenciesandtointegrateCdtinthePythoncommunitywithaNumpy-API;ii)developingGPU-compliantimplementationofnewalgorithms;iii)handlinginterventionaldataandtime-seriesdata(e.
g.
forneuroimagingandweatherforecast).
Inthelongerterm,ourpriorityistoprovidetheuserwithteststowhetherthestandardassumptions(e.
g.
causalsuciencyassumption)holdandassesstheriskofapplyingmethodsoutoftheirintendedscope.
ReferencesPeterB¨uhlmann,JonasPeters,JanErnest,etal.
CAM:Causaladditivemodels,high-dimensionalordersearchandpenalizedregression.
TheAnnalsofStatistics,2014.
DavidMaxwellChickering.
Optimalstructureidenticationwithgreedysearch.
Journalofmachinelearningresearch,3(Nov):507–554,2002.
JoseA.
R.
Fonollosa.
Conditionaldistributionvariabilitymeasuresforcausalitydetection.
CauseEectPairsinMachineLearning,2019.
OlivierGoudet,DiviyanKalainathan,PhilippeCaillou,IsabelleGuyon,DavidLopez-Paz,andMicheleSebag.
Learningfunctionalcausalmodelswithgenerativeneuralnetworks.
ExplainableandInterpretableModelsinComputerVisionandMachineLearning,2018.
IsabelleGuyon.
Chalearncauseeectpairschallenge,2013.
URLhttp://www.
causality.
inf.
ethz.
ch/cause-effect.
php.
PatrikO.
Hoyer,DominikJanzing,JorisM.
Mooij,JonasPeters,andBernhardSch¨olkopf.
Nonlinearcausaldiscoverywithadditivenoisemodels.
InNeuralInformationProcessingSystems(NIPS),pages689–696,2009.
DiviyanKalainathan,OlivierGoudet,IsabelleGuyon,DavidLopez-Paz,andMich`eleSebag.
Structuralagnosticmodeling:Adversariallearningofcausalgraphs.
ArXiv,2019.
4CausalDiscoveryToolbox:UncoveringcausalrelationshipsinPythonMarkusKalisch,AlainHauser,etal.
Package'pcalg'.
2018.
URLhttps://cran.
r-project.
org/web/packages/pcalg/index.
html.
DavidLopez-Paz,KrikamolMuandet,BernhardSch¨olkopf,andIlyaOTolstikhin.
Towardsalearningtheoryofcause-eectinference.
InICML,pages1452–1461,2015.
AdamPaszke,SamGross,SoumithChintala,etal.
AutomaticdierentiationinPyTorch.
2017.
URLhttps://pytorch.
org/.
PaulRRosenbaumandDonaldBRubin.
Thecentralroleofthepropensityscoreinobservationalstudiesforcausaleects.
Biometrika,70(1):41–55,1983.
KarenSachs,OmarPerez,DanaPe'er,DouglasALauenburger,andGarryPNolan.
Causalprotein-signalingnetworksderivedfrommultiparametersingle-celldata.
Science,308(5721):523–529,2005.
MarcoScutari.
Package'bnlearn',2018.
URLhttp://www.
bnlearn.
com/.
PeterSpirtes,ClarkNGlymour,andRichardScheines.
Causation,prediction,andsearch.
MITpress,2000.
EricVStrobl,KunZhang,andShyamVisweswaran.
Approximatekernel-basedconditionalindependencetestsforfastnon-parametriccausaldiscovery.
2017.
AndrewYale,SaloniDash,RitikDutta,IsabelleGuyon,AdrienPavao,andKristinBennett.
Privacypreservingsynthetichealthdata.
ESANN,2018.
5
触摸云触摸云(cmzi.com),国人商家,有IDC/ISP正规资质,主营香港线路VPS、物理机等产品。本次为大家带上的是美国高防2区的套餐。去程普通线路,回程cn2 gia,均衡防御速度与防御,防御值为200G,无视UDP攻击,可选择性是否开启CC防御策略,超过峰值黑洞1-2小时。最低套餐20M起,多数套餐为50M,适合有防御型建站需求使用。美国高防2区 弹性云[大宽带]· 配置:1-16核· ...
Virtono是一家成立于2014年的国外VPS主机商,提供VPS和服务器租用等产品,商家支持PayPal、信用卡、支付宝等国内外付款方式,可选数据中心共7个:罗马尼亚2个,美国3个(圣何塞、达拉斯、迈阿密),英国和德国各1个。目前,商家针对美国圣何塞机房VPS提供75折优惠码,同时,下单后在LET回复订单号还能获得双倍内存的升级。下面以圣何塞为例,分享几款VPS主机配置信息。Cloud VPSC...
7月份已经过去了一半,炎热的夏季已经来临了,主机圈也开始了大量的夏季促销攻势,近期收到一些商家投稿信息,提供欧美或者亚洲地区主机产品,价格优惠,这里做一个汇总,方便大家参考,排名不分先后,以邮件顺序,少部分因为促销具有一定的时效性,价格已经恢复故暂未列出。HostMem部落曾经分享过一次Hostmem的信息,这是一家提供动态云和经典云的国人VPS商家,其中动态云硬件按小时计费,流量按需使用;而经典...
graphsearch为你推荐
FDCphp支持ipad支持ipad支持ipadwin7关闭445端口如何快速关闭445端口win10关闭445端口win10家庭版怎么禁用445端口itunes备份如何用iTunes备份iPhoneiphonewifi苹果手机怎样设置Wi-Fi静态IP?iphonewifi为什么我的苹果手机连不上wifiwin7关闭135端口windows 7如何关闭139端口
directspace 国外idc koss 免费ftp空间 debian7 绍兴高防 双拼域名 谁的qq空间最好看 asp免费空间申请 ntfs格式分区 可外链网盘 免费高速空间 爱奇艺vip免费领取 台湾谷歌 智能dns解析 114dns 国外网页代理 睿云 htaccess windowsserver2012 更多