Wikidata: A platform for data integration and dissemination for the life sciences and beyond

Elvira Mitraka1, Andra Waagmeester2, Sebastian Burgstaller-Muehlbacher3, Lynn M. Schriml1, Andrew I. Su3, Benjamin M. Good3

1 University of Maryland School of Medicine, Baltimore, USA
{emitraka,lschriml}@som.umaryland.edu
2 Micelio, Antwerp, Belgium
andra@micelio.be
3 Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, USA
{sburgs,asu,bgood}@scripps.edu

Abstract.
Wikidata is an open, Semantic Web-compatible database that anyone can edit. This 'data commons' provides structured data for Wikipedia articles and other applications. Every article on Wikipedia has a hyperlink to an editable item in this database. This unique connection to the world's largest community of volunteer knowledge editors could help make Wikidata a key hub within the greater Semantic Web. The life sciences, as ever, face crucial challenges in disseminating and integrating knowledge. Our group is addressing these issues by populating Wikidata with the seeds of a foundational semantic network linking genes, drugs and diseases. Using this content, we are enhancing Wikipedia articles to both increase their quality and recruit human editors to expand and improve the underlying data. We encourage the community to join us as we collaboratively create what can become the most used and most central semantic data resource for the life sciences and beyond.

Keywords: Wikidata, Wikipedia, Linked Data, Semantic Web, Crowdsourcing, Knowledge Management

1 Stone Data Soup

In the Stone Soup folktale [1], a group of hungry travelers arrives in a village whose inhabitants are unwilling to share their food. With a kettle of water and a stone, the travelers manage to pique the curiosity of the villagers. That curiosity finally spawns a collaborative effort to make a great soup. This story is nowadays used to express the power of crowdsourcing and collaborative projects [2], such as Wikipedia, where many individuals each make small contributions but collectively produce something larger than the sum of its parts. Wikidata extends this collaborative model to the Web of data [3]. In this article we describe Wikidata and the ways that this open public platform can take a central role in data sharing and management for the life science community.

2 Wikidata and Wikipedia

Wikipedia is among the most visited sites on the Internet. Articles about medical topics were viewed more than 4.88 billion times in 2013, a number on par with http://nih.gov and significantly greater than WebMD [4]. This incredibly important resource, created through volunteer labor, is now tightly coupled to Wikidata, an open, Semantic Web-compatible database that anyone can edit [3]. Wikipedia infoboxes (the tables of data often appearing on the right side of articles) can now render content stored in Wikidata, and each Wikipedia article now has a direct link to the corresponding Wikidata item, thus encouraging the collaborative editing of the data (Fig. 1).

Fig. 1. Wikidata provides a centralized resource for structured data. Applications including, but not limited to, Wikipedia can now read and write to Wikidata.

Infoboxes provide the bridge between machine-readable structured data and the unstructured text that forms the main body of each article. Since 2008, the Gene Wiki project has automatically created and maintained the infoboxes for around 10,000 articles about human genes [5]. Now, this initiative is focused on generating a foundation of biomedical knowledge in Wikidata that will be used to improve infobox content on Wikipedia and help drive new applications. To date, we have loaded Wikidata with items about: 56,451 human and 73,086 mouse genes from NCBI Gene [6], 6,562 concepts in the Disease Ontology [7], and 1,830 FDA-approved drugs. This initial data load generated Wikidata items for these key biomedical concepts, mapped them to Wikipedia articles and linked them to the corresponding identifiers in authoritative public databases. The identifier-level connections to the source databases ensure that Wikidata content can be easily integrated into the existing Web of biomedical data. Moreover, the provenance of all Wikidata claims can be assessed through inspection of the supporting references. The data is kept up to date by periodically running 'bots' that propagate changes from authoritative sources to Wikidata. When conflicts arise from human edits to Wikidata items, these are flagged for manual review. The next phase of the project will stitch these concepts into a richly interconnected semantic network.
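
The bot update cycle described above, with conflicts from human edits set aside for manual review, could look roughly like the following Python sketch. It is not the project's actual bot code: the function `plan_sync`, the `SyncDecision` record, the three-way comparison against the bot's last written value, and the field names and values in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SyncDecision:
    prop: str                        # hypothetical field name, not a real Wikidata property ID
    action: str                      # "unchanged", "update", or "flag_for_review"
    new_value: Optional[str] = None  # value the bot would write when action == "update"


def plan_sync(source: Dict[str, str],
              wikidata: Dict[str, str],
              last_written: Dict[str, str]) -> List[SyncDecision]:
    """Decide, field by field, what a bot should do on its next run.

    source       -- current values in the authoritative database
    wikidata     -- current values on the Wikidata item
    last_written -- values the bot wrote on its previous run
    """
    decisions = []
    for prop, src_value in sorted(source.items()):
        current = wikidata.get(prop)
        previous = last_written.get(prop)
        if current == src_value:
            # Wikidata already agrees with the source.
            decisions.append(SyncDecision(prop, "unchanged"))
        elif current is None or current == previous:
            # Nobody has touched the value since the last bot run,
            # so the change in the source can be propagated safely.
            decisions.append(SyncDecision(prop, "update", src_value))
        else:
            # A human changed the value and it disagrees with the source:
            # do not overwrite, hand it to a curator instead.
            decisions.append(SyncDecision(prop, "flag_for_review"))
    return decisions


# Illustrative values only.
source = {"gene_id": "1017", "symbol": "CDK2", "chromosome": "12"}
wikidata = {"gene_id": "1017", "symbol": "CDK-2", "chromosome": "11"}
last_written = {"gene_id": "1017", "symbol": "CDK2", "chromosome": "11"}

decisions = plan_sync(source, wikidata, last_written)
for d in decisions:
    print(d.prop, d.action, d.new_value)
```

Keeping the bot's previous write as a third input is one simple way to tell a change in the source apart from a human edit on Wikidata; a real bot could instead consult the item's revision history.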

3 Taking a sip of the data soup - Wikidata and the Semantic Web

The first application to use Wikidata extensively is Wikipedia, but this could be the tip of the iceberg. To give a preview of what Wikidata could become, it's useful to briefly examine its closest ancestor, DBpedia. The DBpedia project mines content from Wikipedia by parsing infoboxes, maps this content to its own ontology, and provides access to this data in the form of a large RDF database available both for bulk download and SPARQL query. While it enables interesting queries on its own, its most important function is as a global linking hub for the Semantic Web [8].

In comparison to DBpedia, Wikidata has a number of advantages. First, it can be edited directly and changes are reflected in real time. Second, it does not require any parsing because all data is managed in a database from the outset. Third, it contains large amounts of content that is not present in Wikipedia, such as items for every mouse gene. Finally, its query API supports not only queries along its asserted knowledge graph, but also along references, qualifiers and even edit histories. These additional capabilities, viewed in light of the success of the DBpedia project, portend a vital future for Wikidata in the context of the Semantic Web.
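
To make the role of references and qualifiers concrete, the sketch below walks over a hand-written item in the shape of Wikidata's JSON export and lists, for each statement, its value, the qualifier properties attached to it, and any reference URLs. The helper `summarize_statements` is hypothetical, the sample item is illustrative rather than a copy of a live entry, and the property IDs used (P351 for the Entrez Gene ID, P854 for a reference URL) are assumptions that should be checked against Wikidata before reuse.

```python
from typing import Any, Dict, List

REFERENCE_URL = "P854"  # assumed ID of Wikidata's "reference URL" property


def summarize_statements(entity: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten an entity's statements into rows with their provenance."""
    rows = []
    for prop, statements in entity.get("claims", {}).items():
        for statement in statements:
            snak = statement["mainsnak"]
            # "novalue"/"somevalue" snaks carry no datavalue, hence .get().
            value = snak.get("datavalue", {}).get("value")
            urls = [
                ref_snak["datavalue"]["value"]
                for ref in statement.get("references", [])
                for ref_snak in ref.get("snaks", {}).get(REFERENCE_URL, [])
            ]
            rows.append({
                "property": prop,
                "value": value,
                "qualifier_properties": sorted(statement.get("qualifiers", {})),
                "reference_urls": urls,
            })
    return rows


# Illustrative item with one identifier statement backed by one reference.
sample_entity = {
    "claims": {
        "P351": [{
            "mainsnak": {
                "snaktype": "value",
                "property": "P351",
                "datavalue": {"value": "1017", "type": "string"},
            },
            "references": [{
                "snaks": {
                    "P854": [{
                        "snaktype": "value",
                        "property": "P854",
                        "datavalue": {
                            "value": "https://www.ncbi.nlm.nih.gov/gene/1017",
                            "type": "string",
                        },
                    }]
                }
            }],
        }]
    }
}

rows = summarize_statements(sample_entity)
for row in rows:
    print(row)
```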

Within the biomedical domain, useful queries are already possible as a result of the 'single-pot' nature of Wikidata. For example, it is possible to use Wikidata's SPARQL endpoint (https://query.wikidata.org/) to answer questions such as "what clinically relevant drug-drug interactions are known for the drug methadone (CHEMBL651)" [9]. Importantly, the data used to answer this query came from two groups working completely independently. Our 'drug_bot' bot added the CHEMBL identifiers (as well as many other identifiers) while another bot developed by a team at the Medical University of Vienna added the drug-drug interactions [10]. This happened without any direct coordination between our groups. This kind of serendipitous, automatic, cross-continental data integration is the primary goal of the Semantic Web, but is not yet commonplace.
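
A query of this kind might be assembled and sent from Python as sketched below. This is not the query cited in [9]; it is a minimal illustration that assumes the property IDs P592 (ChEMBL ID) and P769 (significant drug interaction), the endpoint path `https://query.wikidata.org/sparql`, and the hypothetical helper names `build_interaction_query` and `run_query`. The property IDs in particular should be verified on Wikidata before use.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint path of the Wikidata query service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_interaction_query(chembl_id: str) -> str:
    """Return a SPARQL query for the interaction partners of one drug."""
    if not chembl_id.isalnum():
        # Refuse anything that could break out of the quoted literal.
        raise ValueError("ChEMBL identifiers are expected to be alphanumeric")
    return f"""
SELECT ?drug ?drugLabel ?partner ?partnerLabel WHERE {{
  ?drug wdt:P592 "{chembl_id}" .   # assumed: P592 = ChEMBL ID
  ?drug wdt:P769 ?partner .        # assumed: P769 = significant drug interaction
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


def run_query(query: str) -> list:
    """Send the query to the endpoint and return the result bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikidata-sketch/0.1 (illustrative example)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]


query = build_interaction_query("CHEMBL651")
print(query)
```

Calling `run_query(query)` requires network access and returns one binding per interaction partner. The two triple patterns correspond to the two independent contributions described above: one bot supplied the identifier, the other the interactions.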

The key beauty and main challenge of the Semantic Web is its distributed nature. In order for this kind of integration to happen in the absence of a centralized resource like Wikidata, several major hurdles would need to be cleared. First, both teams would need to know enough about the fairly complex stack of semantic technologies to provide their data as RDF through a stable, public SPARQL endpoint. Second, they would have to work with overlapping identifier systems. Third, the would-be consumer of their data would need to discover both of their endpoints and be sophisticated enough with SPARQL to identify and issue the appropriate distributed query. All of this is possible and can work, but it is not easy.

By integrating data in a centralized, single community pot, Wikidata provides a platform that addresses each of these problems. Data providers do not have to set up and maintain their own SPARQL endpoint, a challenge that very few teams have met for any length of time [11]. By virtue of working in the same database, it is far less likely, though not impossible, for independent teams to generate and publish different identifiers, as the first step in working with Wikidata is to query it to see what is already there. Finally, the challenge of finding a relevant endpoint disappears when there is only one. Note that Wikidata can be queried using SPARQL or the Wikidata Query Language [12].

4 Many Cooks...

The fact that Wikidata is one centralized, community resource immediately surfaces the challenges incurred in any collaborative ontology development process. In Wikidata, the 'ontology' corresponds to its collection of linking properties used to describe items. A new property in Wikidata has to be proposed for community discussion and is only created after a consensus regarding the value of the property and its relation to existing properties has been established. For those used to controlling their own data and data models, this process can feel tedious. But this same fundamental process must be undertaken in any attempt at data integration. The fact that it happens up front, when data is first being loaded, should help to keep the data consistent and reduce the downstream identifier and ontological mapping problems that continue to plague bioinformatics.

Imagine the power of combining the structured data in Wikidata, the high accessibility and dedicated community of Wikipedia and the knowledge of the scientific community. Contemplate further that all of this data is freely available and accessible through a stable query interface and robust, read/write API. This makes important, high-quality information easily accessible by anyone and opens up scientific knowledge for public scrutiny. Further, the built-in provenance tracking can provide detailed chains of evidence to support or refute each claim, and all of this can be discussed using the many social tools, such as 'talk pages' for every data item, baked into the MediaWiki infrastructure.

Aside from creating useful ways to disseminate data, this sociotechnical structure provides a framework for the broad community to send feedback to the original data owners. Even at this early stage of the project, this process has already led to improvements in source data. For example, in the Disease Ontology the term 'Ollier disease' had the synonym 'Maffucci syndrome'. Upon importing the Disease Ontology into Wikidata, members of the Wikidata community pointed out that the two terms, though putative synonyms, linked to two different extant Wikidata items. Upon closer review it was determined that these two terms represent two different, albeit closely related, diseases, leading to the creation of a new term in the Disease Ontology. As Wikidata expands, it is to be expected that additional differences in representation between it and other knowledge resources will surface. These will first be triaged by the Wikidata community to check for errors and, if consensus is achieved that there is an error in the original source, this will be relayed to the source for consideration. In this way, the Wikidata community can become the 'many eyes' that make all ontology bugs shallow.

5 ...Can Make a Delicious Soup

We can create a powerful commons of biomedical knowledge by building on established resources and the dedicated community to connect genes, proteins, drugs, diseases, phenotypes and symptoms. Wikipedia will be the first application to use the content in Wikidata, but certainly not the last. The fire is ready and the pot is starting to heat up. Some villagers are already peeking out of their windows ready to join us around the pot, but it will take the effort of the whole community to make a delicious biomedical data soup. We invite you to join us in this effort.

References

1. History of the Stone Soup Story from 1720 to now. Available from: http://www.stonesoup.com/history-of-the-stone-soup-story-from-1720-to-now/.
2. Taylor, J. The Stone Soup of Data. 2007 May 8; Available from: https://km.aifb.kit.edu/ws/ckc2007/StoneSoup-www2007.pdf.
3. Vrandečić, D. and M. Krötzsch, Wikidata: A Free Collaborative Knowledgebase, in Communications of the ACM. 2014, ACM. p. 78-85.
4. Heilman, J.M. and A.G. West, Wikipedia and medicine: quantifying readership, editors, and the significance of natural language. J Med Internet Res, 2015. 17(3): p. e62.
5. Huss, J.W., 3rd, et al., A gene wiki for community annotation of gene function. PLoS Biol, 2008. 6(7): p. e175.
6. Brown, G.R., et al., Gene: a gene-centered information resource at NCBI. Nucleic Acids Res, 2015. 43(Database issue): p. D36-42.
7. Kibbe, W.A., et al., Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res, 2015. 43(Database issue): p. D1071-8.
8. Bizer, C., et al., DBpedia - A crystallization point for the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web, 2009. 7(3): p. 154-165.
9. Get all the drug-drug interactions for Methadone based on its CHEMBL id CHEMBL651. 2015 [cited 2015 Sep. 14]; Available from: https://bitbucket.org/sulab/wikidatasparqlexamples/overview#markdown-header-get-all-the-drug-drug-interactions-for-methadone-based-on-its-chembl-id-chembl651.
10. Pfundner, A., et al., Utilizing the Wikidata system to improve the quality of medical content in Wikipedia in diverse languages: a pilot study. J Med Internet Res, 2015. 17(5): p. e110.
11. Buil-Aranda, C., et al. SPARQL Web-Querying Infrastructure: Ready for Action? in 12th International Semantic Web Conference. 2013. Sydney, Australia.
12. Wikidata Query Editor. [cited 2015]; Available from: https://wdq.wmflabs.org/wdq/.