contributionsgraph

graphsearch  时间:2021-02-11  阅读:()
JournalofMachineLearningResearch21(2020)1-5Submitted3/19;Revised11/19;Published03/20CausalDiscoveryToolbox:UncoveringcausalrelationshipsinPythonDiviyanKalainathandiviyan@fentech.
aiFenTech,TAU,LRI,INRIA,UniversiteParis-Sud20RueRaymondAron,75013Paris,FranceOlivierGoudetolivier.
goudet@univ-angers.
frLERIA,Universited'Angers,2boulevardLavoisier,49045Angers,FranceRitikDuttadutta.
ritik@iitgn.
ac.
inIITGandhinagar,Gandhinagar,Gujarat382355,IndiaEditor:AndreasMuellerAbstractThispaperpresentsanewopensourcePythonframeworkforcausaldiscoveryfromob-servationaldataanddomainbackgroundknowledge,aimedatcausalgraphandcausalmechanismmodeling.
TheCdtpackageimplementsanend-to-endapproach,recover-ingthedirectdependencies(theskeletonofthecausalgraph)andthecausalrelation-shipsbetweenvariables.
Itincludesalgorithmsfromthe'Bnlearn'(Scutari,2018)and'Pcalg'(Kalischetal.
,2018)packages,togetherwithalgorithmsforpairwisecausaldis-coverysuchasANM(Hoyeretal.
,2009).
CdtisavailableundertheMITLicenseathttps://github.
com/FenTechSolutions/CausalDiscoveryToolbox.
Keywords:CausalDiscovery,Graphrecovery,opensource,constraint-basedmethods,score-basedmethods,pairwisecausality,Markovblanket1.
IntroductionCausalmodelingiskeytounderstandphysicalorarticialphenomenaandtoguideinter-ventions.
MostsoftwaresforcausaldiscoveryhavebeendevelopedintheRprogramminglanguage(Kalischetal.
,2018;Scutari,2018),andafewcausaldiscoveryalgorithmsareavailableinPythone.
g.
RCC(Lopez-Pazetal.
,2015),CGNN(Goudetetal.
,2018)andSAM(Kalainathanetal.
,2019),whilePythonsupportsmanycurrentmachinelearningframeworkssuchasPyTorch(Paszkeetal.
,2017).
TheCausalDiscoveryToolbox(Cdt)isanopen-sourcePythonpackageconcernedwithobservationalcausaldiscovery,aimedatlearningboththecausalgraphandtheas-sociatedcausalmechanismsfromsamplesofthejointprobabilitydistributionofthedata.
Cdtincludesmanystate-of-the-artcausalmodelingalgorithms(someofwhichareimportedfromR),thatsupportsGPUhardwareaccelerationandautomatichardwaredetection.
AmaingoalofCdtistoprovidetheuserswithguidancetowardsend-to-endexperiments,.
ThisworkwasdoneduringDiviyanKalainathan'sPhDThesisatUniv.
Paris-Saclayc2020DiviyanKalainathan,OlivierGoudet,RitikDutta.
License:CC-BY4.
0,seehttps://creativecommons.
org/licenses/by/4.
0/.
Attributionrequirementsareprovidedathttp://jmlr.
org/papers/v21/19-187.
html.
Kalainathan,Goudet,Duttabyincludingscoringmetrics,andstandardbenchmarkdatasetssuchasthe"Sachs"dataset(Sachsetal.
,2005).
Comparedtoothercausaldiscoverypackages,Cdtuniespairwiseandscore-basedmulti-variateapproacheswithinasinglepackage,implementinganstep-by-steppipelineapproach(Fig.
1).
Figure1:TheCdtcausalmodelingpackage:GeneralpipelineCdtalsoprovidesanintuitiveapproachforincludingR-basedalgorithms,facilitatingthetaskofextendingthetoolkitwithadditionalRpackages.
Thepackagerevolvesaroundtheusageofnetworkx.
Graphclasses,mainlyforrecovering(un)directedgraphsfromob-servationaldata.
Cdtcurrentlyincludes17algorithmsforgraphskeletonidentication:7methodsbasedonindependencetests,and10methodsaimedatdirectlyrecoveringtheskeletongraph.
Itfurtherincludes20algorithmsaimedatcausaldirectedgraphprediction,including11graphicaland9pairwiseapproaches.
2.
OriginalcontributionsofthepackageThecausalpairwisesettingconsidersapairofvariablesandaimstodeterminethecausalrelationshipbetweenbothvariables.
Thissettingimplicitlyassumesthatbothvari-ablesarealreadyconditionedonothercovariates,orreadjustedwithapropensityscore(RosenbaumandRubin,1983),andthattheremaininglatentcovariateshavelittleornoinuenceandcanbeconsideredas"noise".
Thepairwisesettingisalsorelevanttocompleteapartiallydirectedgraphresultingfromothercausaldiscoverymethods.
Inthe2010s,thepairwisesettingwasinvestigatedbyHoyeretal.
(2009)amongothers,whoproposedtheAdditiveNoiseModel(ANM).
Lateron,Guyon(2013)onCause-Eectpair(CEP)prob-lems;CEPformulatesbivariatecausalidenticationasasupervisedmachinelearningtask,whereaclassieristrainedfromexamples(Ai,Bi,i),wherethevariablepair(Ai,Bi)isrepresentedbysamplesoftheirjointdistributionandlabeliindicatesthetypeofcausalrelationshipbetweenbothvariables(independent,Ai→Bi,Bi→Ai).
Cdtisonethefewpackagestoincludecausalpairwisediscoveryalgorithms.
Thesealgorithms,mostlyimple-mentedusingPythonorMatlabareoftenleftunmaintained.
Therefore,manyalgorithmsthatareknowntobequiteecient(suchasJarfo(Fonollosa,2019),rstandrstinthecause-eectpairschallenges,codedinPython2.
7)areoutdatedandrequireasubstantialamountofworktoxandupdate.
Cdtimplements9pairwisealgorithms,allcodedinPython,5ofthembeingnewimplementations(NCC,GNN,CDS,RECIandabaselinemethodbasedonregressionerror).
Thegraphsetting,extensivelystudiedintheliterature,issupportedbymanypack-ages.
Bayesianapproachesrelyeitheronconditionalindependencetestsnamedconstraint-basedmethods,suchasPCorFCI(Spirtesetal.
,2000;Strobletal.
,2017),oronscore-basedmethods,involvingndingthegraphthatmaximizesalikelihoodscorethrough2CausalDiscoveryToolbox:UncoveringcausalrelationshipsinPythongraphsearchheuristics,likeGES(Chickering,2002)orCAM(B¨uhlmannetal.
,2014).
OtherapproachesleveragetheGenerativeNetworksetting,suchasCGNNorSAM(Goudetetal.
,2018;Kalainathanetal.
,2019).
Graphsettingmethodsoutputeitheradirectedacyclicgraphorapartiallydirectedacyclicgraph.
MostapproachesinthegraphsettingareimportedfromRpackages,withtheexceptionofCGNNandSAM.
3.
ComparisonwithotherpackagesToourbestknowledge,CausalityandPy-CausalaretheonlyalternativestoCdtforcausaldiscoveryinPython.
However,theonlyoverlapwithCdtconcernsthePC-algorithm,commontoPy-CausalandCdt.
AkintoCdt,Py-CausalisawrapperpackagebutaroundtheTetradJavapackage.
Fig.
2comparestheruntimesofthetwoPCimplementationsonsyntheticgraphswithofvaryingsize,connectivity,andnumberofdatapoints,showingaconstantgapinwithrespecttothenumberofdatapointsandconnectivityofthegraph.
Thisgapisduetothecreationofthesubprocessandthedatatransfer,thatarenottakenintoaccountinthePyCausalexecutionruntime.
Thegapwithrespecttothenumberofnodesisduetodierentimplementationsandcomputationalcomplexity.
FurthereortwillbedevotedtoimposingtheeciencyofourPython-NumbaimplementationofPC.
Figure2:RuntimesofimplementationsofPConvariousgraphs4.
ImplementationandutilitiesRintegration.
Assaid,theCdtpackageintegrate10algorithmscodedinRand17codedinPython.
TheCdtpackageintegratesallofthem,usingWrapperfunctionsinPythontoenabletheusertolaunchanyRscriptandtocontrolitsarguments;theRscriptsareexecutedinatemporaryfolderwithasubprocesstoavoidthelimitationsofthePythonGIL.
TheresultsareretrievedthroughoutputlesbackintothemainPythonprocess.
ThewholeprocedureismodularandallowscontributorstoeasilyaddnewRfunctionstothepackage.
Sustainabilityanddeployment.
Inorderforthepackagetobeeasilyextended,foster-ingtheintegrationoffurthercommunitycontributions,specialcareisgiventothequalityoftests.
Specically,aContinuousIntegrationtooladdedtothegitrepository,allowstosequentiallyexecutetestsonnewcommitsandpullrequest:i)Testallfunctionalitiesofthenewversiononthepackageontoydatasets;ii)Builddockerimagesandpushthemtohub.
docker.
com;iii)Pushthenewversiononpypi;iv)Updatethedocumentation3Kalainathan,Goudet,Duttawebsite.
Thisprocedurealsoallowstotesttheproperfunctioningofthepackagewithitsdependencies.
5.
ConclusionandfuturedevelopmentsTheCausalDiscoveryToolbox(Cdt)packageallowsPythonuserstoapplymanycausaldiscoveryorgraphmodelingalgorithmsonobservationaldata.
Itisalreadyusedinresearchprojects,suchas(Yaleetal.
,2018;Kalainathanetal.
,2019).
Astheoutputgraphsarenetworkx.
Graphclasses,theseareeasilyexportableintovariousformatsforvisualizationsoftwares,usinge.
g.
GraphvizorGephi.
Atthepackageimport,testsarerealizedtopinpointthecongurationoftheuser:availabilityofGPUsandRpackagesandnumberofCPUsonthehostmachine.
Thepackagepromotesanend-to-end,step-by-stepapproach:theundirectedgraph(bi-variatedependencies)isrstidentied,beforeapplyingcausaldiscoveryalgorithms;thelatterareconstrainedfromtheundirectedgraph,withsignicantcomputationalgains.
Futureextensionsofthepackageinclude:i)reimplementingtheRalgorithmsinPython-NumbaandreimplementthePytorchalgorithmsinChainertodropallheavydependenciesandtointegrateCdtinthePythoncommunitywithaNumpy-API;ii)developingGPU-compliantimplementationofnewalgorithms;iii)handlinginterventionaldataandtime-seriesdata(e.
g.
forneuroimagingandweatherforecast).
Inthelongerterm,ourpriorityistoprovidetheuserwithteststowhetherthestandardassumptions(e.
g.
causalsuciencyassumption)holdandassesstheriskofapplyingmethodsoutoftheirintendedscope.
ReferencesPeterB¨uhlmann,JonasPeters,JanErnest,etal.
CAM:Causaladditivemodels,high-dimensionalordersearchandpenalizedregression.
TheAnnalsofStatistics,2014.
DavidMaxwellChickering.
Optimalstructureidenticationwithgreedysearch.
Journalofmachinelearningresearch,3(Nov):507–554,2002.
JoseA.
R.
Fonollosa.
Conditionaldistributionvariabilitymeasuresforcausalitydetection.
CauseEectPairsinMachineLearning,2019.
OlivierGoudet,DiviyanKalainathan,PhilippeCaillou,IsabelleGuyon,DavidLopez-Paz,andMicheleSebag.
Learningfunctionalcausalmodelswithgenerativeneuralnetworks.
ExplainableandInterpretableModelsinComputerVisionandMachineLearning,2018.
IsabelleGuyon.
Chalearncauseeectpairschallenge,2013.
URLhttp://www.
causality.
inf.
ethz.
ch/cause-effect.
php.
PatrikO.
Hoyer,DominikJanzing,JorisM.
Mooij,JonasPeters,andBernhardSch¨olkopf.
Nonlinearcausaldiscoverywithadditivenoisemodels.
InNeuralInformationProcessingSystems(NIPS),pages689–696,2009.
DiviyanKalainathan,OlivierGoudet,IsabelleGuyon,DavidLopez-Paz,andMich`eleSebag.
Structuralagnosticmodeling:Adversariallearningofcausalgraphs.
ArXiv,2019.
4CausalDiscoveryToolbox:UncoveringcausalrelationshipsinPythonMarkusKalisch,AlainHauser,etal.
Package'pcalg'.
2018.
URLhttps://cran.
r-project.
org/web/packages/pcalg/index.
html.
DavidLopez-Paz,KrikamolMuandet,BernhardSch¨olkopf,andIlyaOTolstikhin.
Towardsalearningtheoryofcause-eectinference.
InICML,pages1452–1461,2015.
AdamPaszke,SamGross,SoumithChintala,etal.
AutomaticdierentiationinPyTorch.
2017.
URLhttps://pytorch.
org/.
PaulRRosenbaumandDonaldBRubin.
Thecentralroleofthepropensityscoreinobservationalstudiesforcausaleects.
Biometrika,70(1):41–55,1983.
KarenSachs,OmarPerez,DanaPe'er,DouglasALauenburger,andGarryPNolan.
Causalprotein-signalingnetworksderivedfrommultiparametersingle-celldata.
Science,308(5721):523–529,2005.
MarcoScutari.
Package'bnlearn',2018.
URLhttp://www.
bnlearn.
com/.
PeterSpirtes,ClarkNGlymour,andRichardScheines.
Causation,prediction,andsearch.
MITpress,2000.
EricVStrobl,KunZhang,andShyamVisweswaran.
Approximatekernel-basedconditionalindependencetestsforfastnon-parametriccausaldiscovery.
2017.
AndrewYale,SaloniDash,RitikDutta,IsabelleGuyon,AdrienPavao,andKristinBennett.
Privacypreservingsynthetichealthdata.
ESANN,2018.
5

日本CN2独立物理服务器 E3 1230 16G 20M 500元/月 提速啦

提速啦的来历提速啦是 网站 本着“良心 便宜 稳定”的初衷 为小白用户避免被坑 由赣州王成璟网络科技有限公司旗下赣州提速啦网络科技有限公司运营 投资1000万人民币 在美国Cera 香港CTG 香港Cera 国内 杭州 宿迁 浙江 赣州 南昌 大连 辽宁 扬州 等地区建立数据中心 正规持有IDC ISP CDN 云牌照 公司。公司购买产品支持3天内退款 超过3天步退款政策。提速啦的市场定位提速啦主...

易探云服务器怎么过户/转让?云服务器PUSH实操步骤

易探云服务器怎么过户/转让?易探云支持云服务器PUSH功能,该功能可将云服务器过户给指定用户。可带价PUSH,收到PUSH请求的用户在接收云服务器的同时,系统会扣除接收方的款项,同时扣除相关手续费,然后将款项打到发送方的账户下。易探云“PUSH服务器”的这一功能,可以让用户将闲置云服务器转让给更多需要购买的用户!易探云服务器怎么过户/PUSH?1.PUSH双方必须为认证用户:2.买家未接收前,卖家...

域名注册需要哪些条件(新手注册域名考虑的问题)

今天下午遇到一个网友聊到他昨天新注册的一个域名,今天在去使用的时候发现域名居然不见。开始怀疑他昨天是否付款扣费,以及是否有实名认证过,毕竟我们在国内域名注册平台注册域名是需要实名认证的,大概3-5天内如果不验证那是不可以使用的。但是如果注册完毕的域名找不到那也是奇怪。同时我也有怀疑他是不是忘记记错账户。毕竟我们有很多朋友在某个商家注册很多账户,有时候自己都忘记是用哪个账户的。但是我们去找账户也不办...

graphsearch为你推荐
支持ipadpreviouslybitiphone连不上wifi为什么苹果手机连不上wifi微信都发不出去?ipad上网新买的ipad怎么用。什么装程序 怎么上网google中国地图怎样用GOOLE搜中国地图用卫星看的那一种(可以看到城市和房子的)谷歌sb为什么百度一搜SB是谷歌,谷歌一搜SB是百度?chromeframe有用过 Google Chrome Frame 的吗苹果5.1完美越狱苹果iPhone4 iOS5.1完美越狱教程是什么?fastreport2.5空调滤芯pm2.5是什么意思?chrome18谷歌浏览器,你正在用哪个版本呢??
手机网站空间 Hello图床 tk域名 lamp配置 圣诞节促销 空间出租 什么是刀片服务器 网站木马检测工具 老左正传 hinet 免费私人服务器 江苏双线服务器 免费外链相册 外贸空间 百度云空间 ssl加速 大化网 新疆服务器 空间排行榜 ncp是什么 更多