Journal of Machine Learning Research 21 (2020) 1-5
Submitted 3/19; Revised 11/19; Published 03/20

Causal Discovery Toolbox: Uncovering causal relationships in Python

Diviyan Kalainathan (diviyan@fentech.ai)
FenTech, TAU, LRI, INRIA, Université Paris-Sud
20 Rue Raymond Aron, 75013 Paris, France

Olivier Goudet (olivier.goudet@univ-angers.fr)
LERIA, Université d'Angers, 2 boulevard Lavoisier, 49045 Angers, France

Ritik Dutta (dutta.ritik@iitgn.ac.in)
IIT Gandhinagar, Gandhinagar, Gujarat 382355, India

Editor: Andreas Mueller

Abstract

This paper presents a new open source Python framework for causal discovery from observational data and domain background knowledge, aimed at causal graph and causal mechanism modeling. The Cdt package implements an end-to-end approach, recovering the direct dependencies (the skeleton of the causal graph) and the causal relationships between variables. It includes algorithms from the 'Bnlearn' (Scutari, 2018) and 'Pcalg' (Kalisch et al., 2018) packages, together with algorithms for pairwise causal discovery such as ANM (Hoyer et al., 2009). Cdt is available under the MIT License at https://github.com/FenTechSolutions/CausalDiscoveryToolbox.
Keywords: Causal Discovery, Graph recovery, open source, constraint-based methods, score-based methods, pairwise causality, Markov blanket

1. Introduction

Causal modeling is key to understanding physical or artificial phenomena and to guiding interventions. Most software for causal discovery has been developed in the R programming language (Kalisch et al., 2018; Scutari, 2018), and only a few causal discovery algorithms are available in Python, e.g. RCC (Lopez-Paz et al., 2015), CGNN (Goudet et al., 2018) and SAM (Kalainathan et al., 2019), even though Python supports many current machine learning frameworks such as PyTorch (Paszke et al., 2017).

The Causal Discovery Toolbox (Cdt) is an open-source Python package concerned with observational causal discovery, aimed at learning both the causal graph and the associated causal mechanisms from samples of the joint probability distribution of the data. Cdt includes many state-of-the-art causal modeling algorithms (some of which are imported from R), and supports GPU hardware acceleration and automatic hardware detection.
A main goal of Cdt is to provide users with guidance towards end-to-end experiments, by including scoring metrics and standard benchmark datasets such as the "Sachs" dataset (Sachs et al., 2005).

Compared to other causal discovery packages, Cdt unifies pairwise and score-based multivariate approaches within a single package, implementing a step-by-step pipeline approach (Fig. 1).

This work was done during Diviyan Kalainathan's PhD thesis at Univ. Paris-Saclay.

© 2020 Diviyan Kalainathan, Olivier Goudet, Ritik Dutta.
License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v21/19-187.html.

Figure 1: The Cdt causal modeling package: General pipeline

Cdt also provides an intuitive approach for including R-based algorithms, facilitating the task of extending the toolkit with additional R packages. The package revolves around the usage of networkx.Graph classes, mainly for recovering (un)directed graphs from observational data. Cdt currently includes 17 algorithms for graph skeleton identification: 7 methods based on independence tests, and 10 methods aimed at directly recovering the skeleton graph. It further includes 20 algorithms aimed at causal directed graph prediction, including 11 graphical and 9 pairwise approaches.
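
As an illustration of the kind of scoring metric mentioned earlier in this section, the following is a minimal sketch of a structural Hamming distance between a predicted and a reference directed graph. It is written in plain Python, with graphs represented as sets of (cause, effect) tuples rather than networkx objects. The function name `structural_hamming_distance` and the `count_reversal_twice` flag are assumptions made for this example and are not taken from the Cdt API.

```python
def structural_hamming_distance(true_edges, pred_edges, count_reversal_twice=True):
    """Count edge disagreements between two directed graphs.

    Graphs are given as iterables of (cause, effect) tuples.
    A reversed edge counts as 2 errors (one missing, one extra)
    when count_reversal_twice is True, and as 1 error otherwise.
    """
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    missing = true_edges - pred_edges   # in the reference, not predicted
    extra = pred_edges - true_edges     # predicted, not in the reference
    distance = len(missing) + len(extra)
    if not count_reversal_twice:
        # A reversed edge appears once in `missing` and once in `extra`.
        reversed_edges = {(a, b) for (a, b) in missing if (b, a) in extra}
        distance -= len(reversed_edges)
    return distance


# Hypothetical toy graphs, for illustration only.
truth = {("A", "B"), ("B", "C"), ("C", "D")}
prediction = {("A", "B"), ("C", "B"), ("A", "D")}
shd_double = structural_hamming_distance(truth, prediction)
shd_single = structural_hamming_distance(truth, prediction, count_reversal_twice=False)
```

Whether a reversed edge should cost one or two errors is a convention; the flag makes that choice explicit so that scores from different tools can be compared on the same footing.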

2. Original contributions of the package

The causal pairwise setting considers a pair of variables and aims to determine the causal relationship between both variables. This setting implicitly assumes that both variables are already conditioned on other covariates, or readjusted with a propensity score (Rosenbaum and Rubin, 1983), and that the remaining latent covariates have little or no influence and can be considered as "noise". The pairwise setting is also relevant to complete a partially directed graph resulting from other causal discovery methods. In the 2010s, the pairwise setting was investigated by Hoyer et al. (2009) among others, who proposed the Additive Noise Model (ANM). Later on, Guyon (2013) organized challenges on Cause-Effect Pair (CEP) problems; CEP formulates bivariate causal identification as a supervised machine learning task, where a classifier is trained from examples (A_i, B_i, ℓ_i), where the variable pair (A_i, B_i) is represented by samples of their joint distribution and label ℓ_i indicates the type of causal relationship between both variables (independent, A_i → B_i, B_i → A_i).

Cdt is one of the few packages to include causal pairwise discovery algorithms. These algorithms, mostly implemented in Python or Matlab, are often left unmaintained. Therefore, many algorithms that are known to be quite efficient (such as Jarfo (Fonollosa, 2019), ranked first in the cause-effect pairs challenges and coded in Python 2.7) are outdated and require a substantial amount of work to fix and update. Cdt implements 9 pairwise algorithms, all coded in Python, 5 of them being new implementations (NCC, GNN, CDS, RECI and a baseline method based on regression error).
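
To make the pairwise setting concrete, here is a minimal sketch of a regression-error decision rule: both variables are rescaled to [0, 1], each is regressed on the other, and the direction with the smaller mean squared error is taken as causal. The binned-mean regressor, the bin count, and all function names are assumptions chosen to keep the example free of dependencies; this is not the Cdt implementation of RECI or of its regression-error baseline.

```python
import random


def rescale(values):
    """Min-max rescale a list of floats to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def binned_regression_mse(x, y, n_bins=10):
    """MSE of predicting y by the mean of y inside each bin of x."""
    bins = [[] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        index = min(int(xi * n_bins), n_bins - 1)  # x == 1.0 goes to last bin
        bins[index].append(yi)
    squared_error = 0.0
    for members in bins:
        if members:
            mean = sum(members) / len(members)
            squared_error += sum((m - mean) ** 2 for m in members)
    return squared_error / len(y)


def regression_error_direction(a, b, n_bins=10):
    """Return 1 if A -> B is preferred, -1 if B -> A, 0 on a tie."""
    a, b = rescale(a), rescale(b)
    error_ab = binned_regression_mse(a, b, n_bins)  # regress B on A
    error_ba = binned_regression_mse(b, a, n_bins)  # regress A on B
    if error_ab < error_ba:
        return 1
    if error_ba < error_ab:
        return -1
    return 0


# Synthetic pair where A causes B through a cubic mechanism with small noise.
rng = random.Random(0)
cause = [rng.uniform(0.0, 1.0) for _ in range(2000)]
effect = [c ** 3 + rng.gauss(0.0, 0.01) for c in cause]
direction = regression_error_direction(cause, effect)
```

The returned value follows the three-way labeling used in the CEP formulation (A → B, B → A, or undecided), which is why the function returns a signed integer rather than a boolean.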

The graph setting, extensively studied in the literature, is supported by many packages. Bayesian approaches rely either on conditional independence tests, in which case they are called constraint-based methods, such as PC or FCI (Spirtes et al., 2000; Strobl et al., 2017), or on score-based methods, which search for the graph that maximizes a likelihood score through graph search heuristics, like GES (Chickering, 2002) or CAM (Bühlmann et al., 2014). Other approaches leverage the Generative Network setting, such as CGNN or SAM (Goudet et al., 2018; Kalainathan et al., 2019). Graph setting methods output either a directed acyclic graph or a partially directed acyclic graph. Most approaches in the graph setting are imported from R packages, with the exception of CGNN and SAM.
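
The conditional independence tests on which constraint-based methods rely can be illustrated with a partial correlation on three variables. The sketch below is a toy under strong assumptions (linear Gaussian data, a single conditioning variable, a fixed threshold instead of a proper statistical test) and does not reproduce the PC implementations available through Cdt.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation of x and y after linearly removing the effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Chain X -> Z -> Y: X and Y are dependent, but independent given Z.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5000)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

marginal = pearson(x, y)                     # clearly non-zero
conditional = partial_correlation(x, y, z)   # close to zero

# Hypothetical threshold; a real test would use e.g. Fisher's z-transform.
THRESHOLD = 0.1
keep_edge_x_y = abs(conditional) > THRESHOLD
```

In a constraint-based search, a near-zero conditional dependence such as this one is what justifies removing the X-Y edge from the skeleton while keeping X-Z and Z-Y.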

3. Comparison with other packages

To the best of our knowledge, Causality and Py-Causal are the only alternatives to Cdt for causal discovery in Python. However, the only overlap with Cdt concerns the PC algorithm, common to Py-Causal and Cdt. Akin to Cdt, Py-Causal is a wrapper package, but around the Tetrad Java package. Fig. 2 compares the runtimes of the two PC implementations on synthetic graphs of varying size, connectivity, and number of data points, showing a constant gap with respect to the number of data points and the connectivity of the graph. This gap is due to the creation of the subprocess and the data transfer, which are not taken into account in the Py-Causal execution runtime. The gap with respect to the number of nodes is due to different implementations and computational complexity. Further effort will be devoted to improving the efficiency of our Python-Numba implementation of PC.

Figure 2: Runtimes of implementations of PC on various graphs

4. Implementation and utilities

R integration. As mentioned, the Cdt package integrates 10 algorithms coded in R and 17 coded in Python. Wrapper functions in Python enable the user to launch any R script and to control its arguments; the R scripts are executed in a temporary folder with a subprocess to avoid the limitations of the Python GIL. The results are retrieved through output files back into the main Python process. The whole procedure is modular and allows contributors to easily add new R functions to the package.
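
The wrapper mechanism described above (temporary folder, subprocess, results read back from output files) can be sketched as follows. To keep the example runnable without an R installation, the child process is a small Python script launched through `sys.executable`; with R available, the command would instead start with `Rscript`. The helper name `run_external_script` and the one-number-per-line file format are assumptions made for this illustration, not the Cdt wrapper API.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for an R script: reads numbers from argv[1], writes doubles to argv[2].
CHILD_SCRIPT = """
import sys
with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
    for line in src:
        dst.write(str(2 * float(line)) + "\\n")
"""


def run_external_script(values, interpreter=sys.executable, script=CHILD_SCRIPT):
    """Run a script in a temporary folder and read its output file back."""
    with tempfile.TemporaryDirectory() as workdir:
        workdir = Path(workdir)
        script_path = workdir / "script.py"
        input_path = workdir / "input.txt"
        output_path = workdir / "output.txt"

        script_path.write_text(script)
        input_path.write_text("\n".join(str(v) for v in values) + "\n")

        # The child runs in its own process, so it does not hold the parent's GIL.
        subprocess.run(
            [interpreter, str(script_path), str(input_path), str(output_path)],
            cwd=workdir,
            check=True,
        )
        return [float(line) for line in output_path.read_text().splitlines()]


result = run_external_script([1.0, 2.5, -3.0])
```

Exchanging data through files keeps the two processes decoupled, at the cost of the start-up and transfer overhead noted in the runtime comparison of Section 3.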

Sustainability and deployment. In order for the package to be easily extended, fostering the integration of further community contributions, special care is given to the quality of tests. Specifically, a Continuous Integration tool added to the git repository sequentially executes the following steps on new commits and pull requests: i) test all functionalities of the new version of the package on toy datasets; ii) build docker images and push them to hub.docker.com; iii) push the new version to PyPI; iv) update the documentation website. This procedure also tests the proper functioning of the package with its dependencies.

5. Conclusion and future developments

The Causal Discovery Toolbox (Cdt) package allows Python users to apply many causal discovery or graph modeling algorithms on observational data. It is already used in research projects, such as (Yale et al., 2018; Kalainathan et al., 2019). As the output graphs are networkx.Graph classes, these are easily exportable into various formats for visualization software, e.g. Graphviz or Gephi. At package import, tests are run to determine the configuration of the user: availability of GPUs and R packages, and number of CPUs on the host machine.
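
An import-time configuration check of this kind might look like the sketch below. It relies only on the standard library. Treating an importable `torch` module as a proxy for possible GPU support and looking for an `Rscript` executable on the PATH are assumptions for illustration, and the `detect_configuration` name is hypothetical rather than part of Cdt.

```python
import importlib.util
import os
import shutil


def detect_configuration():
    """Collect basic facts about the host without importing heavy packages."""
    return {
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        "n_cpus": os.cpu_count() or 1,
        # Path to the Rscript executable, or None if R is not on the PATH.
        "rscript_path": shutil.which("Rscript"),
        # True if a `torch` module can be found; this does not prove a GPU exists.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


config = detect_configuration()
```

Probing with `find_spec` and `shutil.which` avoids paying the import cost of large frameworks just to report what is available.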

The package promotes an end-to-end, step-by-step approach: the undirected graph (bivariate dependencies) is first identified, before applying causal discovery algorithms; the latter are constrained by the undirected graph, with significant computational gains.
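
The computational gain of this two-step approach comes from restricting the search to pairs already linked in the skeleton. The sketch below illustrates the idea with a hypothetical `pairwise_score` callable (a positive value meaning the first variable causes the second). The function names and the toy scores are assumptions, and graphs are plain Python collections rather than networkx objects.

```python
from itertools import combinations


def orient_skeleton(skeleton_edges, pairwise_score):
    """Orient each undirected skeleton edge with a pairwise causal score.

    skeleton_edges: iterable of 2-tuples of variable names (undirected).
    pairwise_score: callable (a, b) -> float, > 0 favours a -> b.
    Only pairs present in the skeleton are scored.
    """
    directed = set()
    for a, b in skeleton_edges:
        if pairwise_score(a, b) > 0:
            directed.add((a, b))
        else:
            directed.add((b, a))
    return directed


# Toy example with 5 variables: 10 candidate pairs, 3 skeleton edges.
variables = ["A", "B", "C", "D", "E"]
all_pairs = list(combinations(variables, 2))
skeleton = [("A", "B"), ("B", "C"), ("D", "C")]

calls = []


def toy_score(a, b):
    """Hypothetical score: alphabetical order stands in for a real algorithm."""
    calls.append((a, b))
    return 1.0 if a < b else -1.0


graph = orient_skeleton(skeleton, toy_score)
```

Here the scoring function is called 3 times instead of once for each of the 10 possible pairs; on sparse graphs with many variables the saving grows accordingly.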

Future extensions of the package include: i) reimplementing the R algorithms in Python-Numba and the PyTorch algorithms in Chainer, to drop all heavy dependencies and to integrate Cdt in the Python community with a Numpy API; ii) developing GPU-compliant implementations of new algorithms; iii) handling interventional data and time-series data (e.g. for neuroimaging and weather forecasting). In the longer term, our priority is to provide the user with tests to assess whether the standard assumptions (e.g. the causal sufficiency assumption) hold, and to assess the risk of applying methods outside their intended scope.

References

Peter Bühlmann, Jonas Peters, Jan Ernest, et al. CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 2014.

David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov):507–554, 2002.

Jose A. R. Fonollosa. Conditional distribution variability measures for causality detection. Cause Effect Pairs in Machine Learning, 2019.

Olivier Goudet, Diviyan Kalainathan, Philippe Caillou, Isabelle Guyon, David Lopez-Paz, and Michèle Sebag. Learning functional causal models with generative neural networks. Explainable and Interpretable Models in Computer Vision and Machine Learning, 2018.

Isabelle Guyon. Chalearn cause effect pairs challenge, 2013. URL http://www.causality.inf.ethz.ch/cause-effect.php.

Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In Neural Information Processing Systems (NIPS), pages 689–696, 2009.

Diviyan Kalainathan, Olivier Goudet, Isabelle Guyon, David Lopez-Paz, and Michèle Sebag. Structural agnostic modeling: Adversarial learning of causal graphs. ArXiv, 2019.

Markus Kalisch, Alain Hauser, et al. Package 'pcalg'. 2018. URL https://cran.r-project.org/web/packages/pcalg/index.html.

David Lopez-Paz, Krikamol Muandet, Bernhard Schölkopf, and Ilya O Tolstikhin. Towards a learning theory of cause-effect inference. In ICML, pages 1452–1461, 2015.

Adam Paszke, Sam Gross, Soumith Chintala, et al. Automatic differentiation in PyTorch. 2017. URL https://pytorch.org/.

Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.

Karen Sachs, Omar Perez, Dana Pe'er, Douglas A Lauffenburger, and Garry P Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529, 2005.

Marco Scutari. Package 'bnlearn', 2018. URL http://www.bnlearn.com/.

Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, and search. MIT Press, 2000.

Eric V Strobl, Kun Zhang, and Shyam Visweswaran. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. 2017.

Andrew Yale, Saloni Dash, Ritik Dutta, Isabelle Guyon, Adrien Pavao, and Kristin Bennett. Privacy preserving synthetic health data. ESANN, 2018.
