The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics
Christoph Steinbeck,* Yongquan Han, Stefan Kuhn, Oliver Horlacher, Edgar Luttmann,§ and Egon Willighagen#
Max-Planck-Institute of Chemical Ecology, Jena, Germany, TheraSTrat AG, Allschwil, Switzerland, Institute of Organic Chemistry, University of Paderborn, Germany, and Nijmegen, The Netherlands
Received August 17, 2002
The Chemistry Development Kit (CDK) is a freely available open-source Java library for Structural Chemo- and Bioinformatics. Its architecture and capabilities as well as its development as an open-source project by a team of international collaborators from academic and industrial institutions are described. The CDK provides methods for many common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Application scenarios as well as access information for interested users and potential contributors are given.
1. INTRODUCTION
Whoever pursues the endeavor of creating a larger software package in chemoinformatics or computational chemistry from scratch will soon be confronted with the Sisyphean task of implementing the standard repertoire of chemoinformatical algorithms and components invented during the last 20 or 30 years. The obvious workaround for this problem is to use commercially available chemoinformatics libraries that have been developed by companies such as MDL Information Systems, Inc., Daylight Chemical Information Systems, Inc., Advanced Chemistry Development, and certainly many others. A scientist in an academic environment, however, often feels obliged to openly share his results with the scientific community. Using proprietary components for software development makes it impossible to do so. Generally, scientific software is too often closed source, leaving the user with a black box performing magical operations. Perceived as being counterproductive for the overall scientific progress, this trend fortunately seems to be changing.
Sharing of ideas and results within communities is probably the most central paradigm in science. By publishing his results a scientist allows his colleagues to verify and build upon his results, thereby advancing the particular field as a whole [If I have seen further it is by standing on the shoulders of giants. - Isaac Newton]. One of the motivations for such contributions, besides the pure scientific curiosity, is, of course, the gain of social recognition and reputation among his peers.
In recent years the ideas sketched above have been part of the open-source revolution that took place in the world of software development, most widely recognized through the great success of the free Unix-like operating system GNU/Linux, a collaborative work of many individuals and organizations, including the Free Software Foundation led by Richard Stallman and the Finnish computer science student Linus Torvalds who started the project. According to several essays on this subject, open-source software, for which, by definition, the source code is always freely available to the public,¹ has a number of intriguing benefits. Most importantly, if the community of users is large enough and everyone can look at the sources and change them, it should not take too long until a particular software error is found and fixed. "Given enough eyeballs, all bugs are shallow", as Eric Raymond put it in his widely recognized essay "The Cathedral and the Bazaar",² in which he analyses the mechanisms and principles of the open source movement.
Further, other scientists can easily build on existing results. Credit can still be given in the appropriate form, because open-source software is by no means freeware or in the public domain. Quite the contrary, the package as a whole as well as each piece of source code is labeled with a clear copyright notice, stating the name of the copyright holder and the nature of the license. This copyright notice must not be removed. Additional comments, however, regarding the changes and improvements made by others can, of course, be added. Substantial improvements to an existing piece of code by someone other than the copyright holder will usually lead to something like team formation, including appropriate copyright changes. This is especially important for academic scientists, who need to be able to point out their contributions to a particular field.
Considering the virtues of open-source software on one hand and the scientific tradition on the other hand, we started the CDK project under terms of a liberal open-source license.³ We use SourceForge,⁴ a Web-based open-source development platform, for coordinating the contributions from about 10 developers from about five different countries.
A greater number of people have subscribed to the developers mailing list and either listen silently or contribute by making feature requests or critical comments. SourceForge provides all the tools which are generally considered to be indispensable components for coordinating the contributions from developers and users in larger software projects, such as Web space, mailing lists, bug trackers, software versioning systems, release managers, etc.
This article is intended not only to describe the CDK project in scientific and software-technological terms but also to promote the underlying development model. The authors think that these principles form a paradigm for scientific software development where scientists can truly exploit the benefits of the Internet for a distributed collaboration that would not have been possible in pre-Internet times. We are explicitly not claiming to give a general overview of chemical open source software. This will form an article of its own. However, we will give a synopsis on open source Java software in the following section instead. The interested reader is cordially invited to visit the CDK project pages at http://cdk.sourceforge.net, get in touch with the developers, make use of the CDK package, and ultimately extend its functionality.
* Corresponding author present address: Cologne University Bioinformatics Center (CUBIC), Cologne, Germany. Phone: +49(0)2214707426; fax: +49(0)2214705092; e-mail: c.steinbeck@uni-koeln.de.
Max-Planck-Institute of Chemical Ecology. TheraSTrat AG. § University of Paderborn. # Nijmegen.
J. Chem. Inf. Comput. Sci. 2003, 43, 493-500. 10.1021/ci025584y CCC: $25.00 © 2003 American Chemical Society. Published on Web 02/11/2003. This is an open access article published under a Creative Commons Attribution (CC-BY) License, which permits unrestricted use, distribution and reproduction in any medium, provided the author and source are cited.
2. OPEN SOURCE JAVA SOFTWARE IN CHEMISTRY
A number of libraries written in Java are freely available in binary form, but they do not include access to use and extend the source code.⁵⁻⁷ Libraries for other computer languages have been described in the literature but are, to our knowledge, not available to the public.⁸ To give an overview of the open source activities in chemistry, we analyzed the open source projects registered at SourceForge.⁴ This Web site has about 40 projects registered in the field of molecular chemistry, as found with a search on keywords such as molecule, molecular, chemistry, and chemical. Many projects are inactive: some are only registered but show no activity at all, and some showed activity in the past but never released software in binary form or source code. The number of active projects is about 25-30. Of these projects 14 were found that use the Java programming language. Three of these have been inactive for a long period and do not provide downloads. Two are succeeded by this project,⁹,¹⁰ and four are based on CDK.¹¹⁻¹⁵ Four projects are interesting to note: MolMaster, having a BSD license¹⁶ and including visualization of isosurfaces; jVisualizer, having the GPL license¹⁷ for analyzing NMR couplings; CML, having an Artistic License¹⁸ with tools around the Chemical Markup Language;¹⁹ and JOELib, having the GPL license²⁰ with an extensive file IO library based on OpenBabel²¹ and a library for molecular descriptors. Note that the first two are not really libraries but applications instead. CMLDOM and JOELib, however, are libraries with similar functionality for storing chemical content in memory.
3. THE ORIGIN OF THE CDK
The CDK originated as a support project for a couple of different chemoinformatics software packages, namely a structure editor,¹¹ a Web database for organic compounds and their NMR chemical shifts,¹⁴ a program for computer assisted structure elucidation,²² and a 3D structure viewer and analyzer,¹³ which is still being ported to the CDK. The authors of these programs generally agree on the benefits of the programming language Java, namely its clear object-oriented design, its platform independence, and the fact that it has become an important standard for client- and server-side applications on the Web. Since most of the scientifically interesting applications in chemistry have a computationally demanding kernel, they benefit from a client/server architecture because the server part can then be run on a powerful machine, while a user-friendly (Web-)interface can be used on whatever client machine the user chooses. These demands can be met much more easily if one can still resort to a single programming language for the implementation, and so we consider Java to be the programming language of choice not only for chemoinformatics and computational chemistry but also for scientific applications in general. Concerns are frequently raised with respect to the performance of Java. However, the language structure itself, compared for example with C++, provides no good reason for Java having a generally lower performance than other languages more frequently used in high performance computing. Indeed, great efforts have been made to increase Java runtime performance and so, today, given a proper implementation and using the right runtime environment, server-side Java code does not need to be slower than C++ code with the same scope.
We would like to draw the reader's attention to a whole issue of the IBM Systems Journal dedicated to the subject of high performance computing in Java.²³
4. DEVELOPMENT MODEL
To participate in CDK development, the interested individual needs to register with SourceForge (SF) to receive a free SF account and subscribe to the developers mailing list cdk-devel@lists.sourceforge.net. He or she then contacts one of the project administrators, who then adds the new member to the project's developers list. Besides good Java programming skills, a working knowledge of the Concurrent Versions System (CVS) is needed. CVS is the most widely used system for version management in the Open Source community, which greatly facilitates the coordination of multiple developers working on the same source tree. It is quite common in computer science to write a requirements specification before coding is started. Such a specification describes the intended behavior of the software (classes in this case) and can be used by developers to check the implementation and by users to see how those classes can be used. When the CDK was designed, such a specification was only partly made using Unified Modeling Language (UML) diagrams.²⁴ Currently we use Requests For Comment (RFC) documents for proposing a new specification to which the CDK library must conform. These RFCs, which are a longtime Internet standard for decision making, are discussed on the developers mailing list, after which they are marked as final by majority voting.
5. PROJECT CONVENTIONS
In Java, source code is organized in so-called packages, which often (but not necessarily) follow a naming scheme of something like an inverted Internet address. Putting a class such as Atom into a uniquely named package prevents class name collisions in cases where another library, used together with the CDK, also contains an Atom class with a different function. Since the CDK is part of the OpenScience project,²⁵ the CDK source tree is organized in packages under the org.openscience.cdk root package. Frequently, a new developer is interested in adding a particular functionality to the CDK, for example the capability for isomorphism and automorphism checking. He discusses the implications of his endeavor with the other CDK developers on the mailing list. Taking into account the suggestions, caveats, etc., of his codevelopers, he would then create a new subpackage org.openscience.cdk.isomorphism and add his contribution under this part of the source tree.
An important part of the CDK development effort is Unit Testing, which is based on the idea of writing easily repeatable tests for the smallest units of the software package in question. Whenever a programmer adds a new module with new functionality to the CDK source tree, he is expected to add a test to the org.openscience.cdk.tests package, adhering to a particular naming convention. The unit testing itself is based on the JUnit package,²⁶ which makes it easy to run a fully unattended test for the whole CDK package. This has proven to be of great value for a distributed programming effort like the CDK. Especially if a developer changes something within the CDK core classes, a full JUnit test run of the CDK tests will show him within a few seconds whether his changes broke something or not. Further, each of these little test snippets is an instructive example of how to use a particular CDK module.
Documentation is indispensable for a library. The CDK is documented using the JavaDoc system, an integral part of the Java programming language. Using special tags, the code is documented directly in the source code, from which documentation can be produced automatically in various formats, most importantly as Web pages. We are using source code metrics to constantly measure the amount of documented source code statements, and we try to keep this percentage as high as possible. In addition to the JavaDoc API documentation, the user is guided by a few introductory manuals.
It should also be mentioned that the CDK's software architecture has been independently chosen as the subject of an M.Sc. thesis at the Technion (Israel Institute of Technology),²⁷ focusing on automated methods for code inspection and review. This is a common industrial process by which source code is usually read manually to find errors, potential improvements, dependencies, etc. The thesis focuses on automating formal concept analysis using concept lattices²⁸ for the review of individual Java classes. Concept analysis is a mathematical classification technique, which is used for different problems in software research. This methodology is applied in three stages: (1) understanding the public interface of the class for use as a black box, (2) trying to reason about the design and possible errors in the class based on its lattice, and (3) inspecting actual source code. The first two stages are done without even having the source code: the methods and fields are determined by reverse engineering of the compiled class files. We have already received valuable input from this related project which will help us to resolve design flaws in our library.
6. DESCRIPTION OF THE LIBRARY'S FUNCTIONALITY
6.1. The Core Classes. The classes contained in the root section of the CDK's package hierarchy are all formalized representations of basic chemical concepts such as atoms, bonds, molecules, etc. Figure 1 shows a UML diagram explaining the inheritance hierarchy and the dependencies between the fundamental classes of the CDK. The UML diagrams shown in this article depict the relationship of only the core classes. They are thus edited and show only a subset of their true interclass relationships. They show the central role of the ChemObject class, which is the superclass of all other classes and provides methods for storing even complex properties for any derived CDK object. The first and probably most obvious inheritance chain to be mentioned in the core classes is that of Atom extending AtomType extending Isotope extending Element. This is not only logical from a chemical point of view but also provides the basis for a simple mechanism for the creation of Atoms, AtomTypes, Isotopes, and Elements based on subclasses of a single IsotopeFactory tool class, which will be discussed below.
Figure 1. UML diagram, showing the inheritance hierarchy and the dependencies of the fundamental classes within the CDK.
Placing the Atom in a long chain of inheritance provides central access points to the different levels of information. While the Element, for example, provides access to the symbol or the atomic number, some AtomType can further distinguish between the state of hybridization of an Atom or some other distinction a force field might need.
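The inheritance chain described above can be pictured with a deliberately simplified sketch. Only the class names and the order of the chain are taken from the text; the fields, constructors, and the describe helper are assumptions made for illustration and do not reproduce the CDK source (the common ChemObject superclass is omitted).

```java
// Hypothetical, simplified sketch of the chain Atom -> AtomType -> Isotope -> Element.
// Fields and constructors are illustrative assumptions, not the CDK API.
final class InheritanceSketch {

    static class Element {
        final String symbol;
        final int atomicNumber;

        Element(String symbol, int atomicNumber) {
            this.symbol = symbol;
            this.atomicNumber = atomicNumber;
        }
    }

    static class Isotope extends Element {
        final int massNumber;

        Isotope(String symbol, int atomicNumber, int massNumber) {
            super(symbol, atomicNumber);
            this.massNumber = massNumber;
        }
    }

    static class AtomType extends Isotope {
        final String hybridization; // e.g. a label a force field might need

        AtomType(String symbol, int atomicNumber, int massNumber, String hybridization) {
            super(symbol, atomicNumber, massNumber);
            this.hybridization = hybridization;
        }
    }

    static class Atom extends AtomType {
        double x, y; // 2D coordinates, unset until a layout step assigns them

        Atom(String symbol, int atomicNumber, int massNumber, String hybridization) {
            super(symbol, atomicNumber, massNumber, hybridization);
        }
    }

    // Code written against Element works unchanged for every subclass in the chain.
    static String describe(Element element) {
        return element.symbol + " (Z=" + element.atomicNumber + ")";
    }

    public static void main(String[] args) {
        Atom carbon = new Atom("C", 6, 12, "sp3");
        System.out.println(describe(carbon)); // prints "C (Z=6)"
    }
}
```

The point of the design is visible in describe: element-level information is reachable from an Atom without any conversion, while isotope-level and atom-type-level data live one level further down the chain.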
A further level of abstraction is incorporated by the AtomContainer and the ElectronContainer. The ElectronContainer forms the base for constructs such as Bonds and Orbitals, whereas the AtomContainer is the envisioned storage for Atoms together with their Bonds and is the superclass for Rings, Molecules, and Substructures. To support higher level concepts such as molecular ensembles or reactions, the CDK core is complemented by classes which group molecules into higher order constructs, like SetOfMolecules, ChemSequence, ChemModel, and ChemFile. For clarity, the relationship of ChemObject and the AtomContainer has been moved to an additional UML diagram shown in Figure 2. It shows how Molecules are contained in a SetOfMolecules, which is part of a ChemModel. ChemModels are meant to store the molecular information of the state of a chemical system at a given point in time. To allow for the modeling of changes in time, we introduced the possibility of arranging various ChemModels into a ChemSequence. The ChemFile class is designed as the top level container, which can contain all the concepts stored in a chemical document, among which are one or more ChemSequences.
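The nesting of these containers can be sketched as follows. The class names come from the text, but the list-based fields and the countMolecules helper are assumptions for illustration only; the real classes carry considerably more state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the containment ChemFile > ChemSequence > ChemModel
// > SetOfMolecules > Molecule. Field names are illustrative assumptions.
final class ContainerSketch {

    static class Molecule {
        final String name;
        Molecule(String name) { this.name = name; }
    }

    static class SetOfMolecules {
        final List<Molecule> molecules = new ArrayList<>();
    }

    // One ChemModel: the state of a chemical system at one point in time.
    static class ChemModel {
        final SetOfMolecules setOfMolecules = new SetOfMolecules();
    }

    // A ChemSequence orders ChemModels to describe changes over time.
    static class ChemSequence {
        final List<ChemModel> models = new ArrayList<>();
    }

    // Top-level container, holding one or more ChemSequences.
    static class ChemFile {
        final List<ChemSequence> sequences = new ArrayList<>();
    }

    // Walks the whole hierarchy and counts the molecules stored in all models.
    static int countMolecules(ChemFile file) {
        int count = 0;
        for (ChemSequence sequence : file.sequences) {
            for (ChemModel model : sequence.models) {
                count += model.setOfMolecules.molecules.size();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ChemModel before = new ChemModel();
        before.setOfMolecules.molecules.add(new Molecule("reactant"));
        ChemModel after = new ChemModel();
        after.setOfMolecules.molecules.add(new Molecule("product A"));
        after.setOfMolecules.molecules.add(new Molecule("product B"));

        ChemSequence sequence = new ChemSequence();
        sequence.models.add(before);
        sequence.models.add(after);

        ChemFile file = new ChemFile();
        file.sequences.add(sequence);
        System.out.println(countMolecules(file)); // prints 3
    }
}
```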
The Polymer class extends Molecule and provides convenient access to the Monomers it consists of. The Monomer itself is implemented as an AtomContainer. A subclass of Polymer is the BioPolymer used for representing protein and DNA molecules. The Polymer design allows BioPolymers to treat each amino acid as an AtomContainer.
6.2. 2D Structure Graphical Handling. The ability to display and manipulate 2D drawings of chemical structures is one of the most important features of any chemoinformatics-related program. This includes the capability of generating coordinates for those chemical structures which have, for example, been generated by a structure generator as coordinateless chemical graphs. The details for this latter step are discussed in Section 6.4. The Model-View-Controller paradigm (see for example ref 29) is used in the CDK library design wherever applicable. The classes for 2D structure graphical handling, for example, work on top of a ChemModel whose content they display and manipulate. A Renderer2D class produces a 2D drawing comparable to those produced by the major commercially available products. This view can be customized by altering the standard settings of a Renderer2DModel object. If the pure display is to be complemented by an option to manipulate the drawing, a Controler2D can be added to the setup. Its settings, again, are determined by a Controler2DModel and can be altered, for example, by using setDrawNumbers(true) in order to display atom numbers annotated to the structure. The Controler2D is an adapter to the available input devices, typically mouse and keyboard, and translates input into changes to the underlying models, which again are reflected by changes in the view produced by the Renderer2D. A simple resulting application is shown in Figure 3.
6.3. 3D Structure Handling. To provide high performance 3D graphics, the Java 3D API is used within the CDK. This, however, makes CDK-based 3D applications no longer platform independent. This dependency originates from the Java 3D API relying on OpenGL or DirectX for the sake of higher performance. Sun Microsystems provides Java 3D only for Windows (both OpenGL and DirectX), Solaris, and SGI IRIX, whereas a Linux version is developed by Blackdown³⁰ and available for a variety of architectures. To avoid losing platform independence, the CDK also contains classes for 3D rendering which are not based upon the Java 3D API. Together with the separation of the rendering classes, due to the Model-View-Controller paradigm, this leads to the following four fundamental classes for 3D rendering: Renderer3D, Renderer3DModel, AcceleratedRenderer3D, and AcceleratedRenderer3DModel, the latter two based upon Java 3D.
Figure 2. UML diagram, showing the inheritance hierarchy and the dependencies of the class group based on the AtomContainer concept.
6.4. Structure Diagram Layout. Key fields of chemoinformatics, like virtual combinatorial chemistry, virtual screening, or computer-assisted structure elucidation, frequently handle chemical structures as one-dimensional graphs. These graphs are, for example, products of structure generators which use graph theoretical techniques to exhaustively and irredundantly generate all constitutional isomers which are in agreement with a given molecular formula. In any of these programs, however, there comes the point where, after a selection during a virtual screening, for example, the successful candidate structure(s) need(s) to be presented to a chemist. At this point, a tool is needed that generates 2D or 3D coordinates to produce the kind of depiction a chemist is used to. This process has been termed Structure Diagram Generation.³¹ While 3D model builders such as CORINA³² are on our wish list for the future and have not yet been implemented, the CDK features a 2D structure diagram generator, which has been written from scratch and which can easily be seen as one of the finest and most useful parts of the CDK, since most of its applications require structure diagram generation at several stages.
6.5. Graph Invariants. This package contains a few classes for the computation of graph invariants such as Wiener Indices,³³ Morgan's extended connectivity (EC) indices,³⁴ and others.³⁵ Morgan's EC indices are, for example, used for canonical labeling of compounds. This package is likely to be one of the hot spots for future developments, since many chemoinformatics applications, like (quantitative) structure activity relationship ((Q)SAR) computations, often rely on calculating various combinations of graph invariants of different types.
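As an example of such an invariant, the Wiener index is the sum of the topological distances (numbers of bonds on the shortest path) between all pairs of atoms in the hydrogen-suppressed graph. The following is a minimal sketch of that definition using breadth-first search over adjacency lists; the class name and the array-based graph representation are assumptions for illustration and are not the CDK implementation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Minimal sketch: Wiener index of a connected, hydrogen-suppressed molecular graph.
// adjacency[i] lists the indices of the atoms bonded to atom i (an assumed representation).
final class WienerIndexSketch {

    static int wienerIndex(int[][] adjacency) {
        int n = adjacency.length;
        int sum = 0;
        for (int start = 0; start < n; start++) {
            // Breadth-first search yields shortest path lengths from 'start'.
            int[] distance = new int[n];
            Arrays.fill(distance, -1);
            distance[start] = 0;
            ArrayDeque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                int atom = queue.poll();
                for (int neighbor : adjacency[atom]) {
                    if (distance[neighbor] < 0) {
                        distance[neighbor] = distance[atom] + 1;
                        queue.add(neighbor);
                    }
                }
            }
            // Count each unordered pair once by only looking at higher indices.
            for (int other = start + 1; other < n; other++) {
                if (distance[other] < 0) {
                    throw new IllegalArgumentException("graph is not connected");
                }
                sum += distance[other];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] butane = {{1}, {0, 2}, {1, 3}, {2}};      // C-C-C-C
        int[][] isobutane = {{1}, {0, 2, 3}, {1}, {1}};   // central carbon is atom 1
        System.out.println(wienerIndex(butane));    // prints 10
        System.out.println(wienerIndex(isobutane)); // prints 9
    }
}
```

The two isomers share a molecular formula but differ in their index, which is why invariants of this kind are useful as descriptors of branching.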
6.6. Structure Generators. This package holds some simple structure generators which are used by the SENECA system for computer-assisted structure elucidation.²² The class SingleRandomStructureGenerator can be used to generate a totally random structure from the constitutional space given by a certain molecular formula. Based on this randomly generated structure one can then use RandomGenerator to make small, random moves in constitution space, based on an algorithm suggested by Faulon.³⁶ If such a generator is combined with a target function and simulated annealing protocol, one can effectively search constitution space for structures with certain desired properties, provided that these properties can be reliably back calculated from a given constitutional formula. To be able to build a structure generator for chemical graphs based on evolutionary algorithms (like the well-known genetic algorithm), we also included a CrossOverMachine, which accepts two chemical graphs in the form of AtomContainers and produces two offspring. Genetic algorithms are population based methods which produce new offspring for the next generation by a carefully chosen combination of mutation and crossover procedures, applied to the current population. The CrossOverMachine thus complements the mutation operation used in the RandomGenerator class.
6.7. Ring Searches. John Figueras' fast algorithm for finding the Smallest Set of Smallest Rings (SSSR) has been implemented and is used for example by the structure diagram generation package.³⁷ Especially large condensed ring systems, for which the process of coordinate generation could take up to a minute due to a slow depth first ring perception algorithm in older systems,³⁸ can now be laid out within fractions of a second as shown in Figure 4. Further, this package contains a class for partitioning a given ring system into AtomContainers, one for each ring. In other applications, like aromaticity detection, for example, it is essential to compute the Set of All Rings (SAR). While procedures have been published to produce the SAR from a SSSR, it is computationally more efficient to use specialized algorithms for this purpose.
Figure 3. Renderer2D and Controller2D cooperating in a simple, CDK-based version of JChemPaint. JChemPaint supports internationalization, with this example showing a Dutch interface.
Figure 4. A ring system parsed from a SMILES, analyzed by Figueras' SSSR algorithm and displayed by the MoleculeViewer class. The process takes 300 ms on a 600 MHz Pentium with Windows XP and JDK 1.3.1.
The CDK contains an implementation of a fast and efficient algorithm given by Hanser et al.³⁹
6.8. Aromaticity Detection. There are various definitions of aromaticity and at least as many ways of detecting aromaticity according to these definitions. This package is the intended container for all of them and currently holds an implementation of a HueckelAromaticityDetector class. Based on the SAR detection algorithm by Hanser et al. (see Section 6.7), this class starts with the largest detected ring, counts the number of alternating double or triple bond electrons, and also takes into account free electron pairs of heteroatoms. It then checks whether the ring contains 4n+2 π-electrons, according to the well-known Hückel rule. If it does, the ring, all its atoms, and bonds are marked as aromatic, and the search continues with the remaining rings of equal or smaller size, leaving out those rings that are completely part of an already detected larger aromatic system.
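The electron-count test at the heart of this procedure is small enough to sketch. The snippet below covers only the 4n+2 check and a simple electron tally from ring double bonds and contributing heteroatom lone pairs; the ring ordering, the treatment of triple bonds, and the marking of atoms and bonds described above are left out, and the class and method names are assumptions rather than the CDK API.

```java
// Minimal sketch of the Hückel 4n+2 test; not the CDK HueckelAromaticityDetector.
final class HueckelSketch {

    // Each ring double bond and each contributing lone pair adds two pi electrons.
    static int piElectrons(int ringDoubleBonds, int contributingLonePairs) {
        return 2 * ringDoubleBonds + 2 * contributingLonePairs;
    }

    // True if the count equals 4n+2 for some integer n >= 0 (2, 6, 10, 14, ...).
    static boolean satisfiesHueckelRule(int piElectrons) {
        return piElectrons >= 2 && (piElectrons - 2) % 4 == 0;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesHueckelRule(piElectrons(3, 0))); // benzene: true
        System.out.println(satisfiesHueckelRule(piElectrons(2, 1))); // pyrrole: true
        System.out.println(satisfiesHueckelRule(piElectrons(2, 0))); // cyclobutadiene: false
    }
}
```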
6.9. Isomorphism. Being able to determine if two chemical structures are identical or whether one structure is a subgraph of another structure is one of the most important capabilities of a chemoinformatics library. The Isomorphism subpackage contains a versatile module for Maximum Common Substructure (MCSS) Searches. Since MCSS determination is the most general case of graph matching, it can be used to determine structure identity and to do subgraph matching and maximum common substructure searches.
6.10. File Input/Output. File input and output is generalized in CDK. All file i/o classes implement either ChemObjectReader or ChemObjectWriter. Each file format is represented by two separate classes implementing one of these interfaces. CDK currently supports IO classes for XYZ, MDL molfile,⁴⁰ PDB,⁴¹ and CML.⁴² The latter format was developed by Murray-Rust and Rzepa as the first XML based file format for chemical content. The CDK contains both an input and output class for this format. The CML input reader uses an alternative to Murray-Rust's DOM approach and is based on SAX.⁴³
6.11. Interaction with other Java Libraries. Besides file i/o, CDK supports a second method to exchange data with other programs and libraries. The interface to other libraries makes it possible to combine methods from both libraries, giving access to a larger set of functionality. CDK provides direct conversion of CDK classes to JOELib²⁰ classes. Support for CMLDOM¹⁹ is planned.
6.12. SMILES. The Simplified Molecular Input Line Entry System (SMILES) provides string representations of molecular constitutions.⁴⁴ Due to their compactness and relative simplicity they are now widely used as an interchange format for coordinateless molecular structures. Based on a specification for unique (canonical) SMILES,⁴⁵ it is also possible to perform graph isomorphism checks. The CDK features a generator for canonical SMILES, written to comply with the rules published by the Daylight Inc. founders. While the SMILES generator implements all of the published SMILES standard including chirality, the SMILES parser in the CDK package complies only with the (slightly extended) Super Simplified SMILES specification,⁴⁶ which is sufficient to code most organic structures.
6.13. Fingerprints. Fingerprinting is nowadays an indispensable tool for judging molecular similarity, as a prefilter for isomorphism checking and thus for structure searching in databases. Here as well as in the case of SMILES a separate subpackage for this class of algorithms is justified because there are various ways of computing fingerprints. By allowing the addition of different fingerprinters instead of just having one monolithic org.openscience.cdk.tools.Fingerprinter we give the user the freedom of choosing whatever method yields the best performance for his case. The Fingerprinter class in the CDK produces Daylight-type fingerprints.⁴⁷ It works by running a breadth-first search, starting at each atom in the molecule, thereby producing string representations of paths up to the length of six atoms. For each of these SMILES-like strings, hash codes are computed, using the standard string hashing algorithm provided by the Java language. With these hash codes, a pseudorandom number generator with a default working range of [0-1023] is seeded and the first random number is retrieved. This number indicates a position in a fingerprint bit string of length 1024, which is then set to "1". Based on the entirety of all computed paths from the molecule, a molecular fingerprint is obtained in the form of this bit string.
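The hashing step described above can be sketched with standard Java classes only. The sketch assumes that the path strings have already been enumerated by some other routine (the breadth-first path search is not shown), and the class name, method names, and the mayContain prefilter helper are hypothetical; only the sequence hash code, seeded pseudorandom generator, first number in 0-1023, bit set to 1 follows the text.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

// Minimal sketch of the path-hashing step; path enumeration itself is assumed
// to happen elsewhere and the paths are passed in as plain strings.
final class PathFingerprintSketch {

    static final int SIZE = 1024;

    static BitSet fingerprint(List<String> paths) {
        BitSet bits = new BitSet(SIZE);
        for (String path : paths) {
            // Seed a generator with the string hash and take its first number in [0, 1023].
            int position = new Random(path.hashCode()).nextInt(SIZE);
            bits.set(position);
        }
        return bits;
    }

    // Prefilter: a query can only be a substructure if all of its bits are set in the target.
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        BitSet target = fingerprint(Arrays.asList("C", "CC", "CCO", "CO", "O"));
        BitSet query = fingerprint(Arrays.asList("C", "CC"));
        System.out.println(mayContain(target, query)); // prints true
    }
}
```

Because every path of a substructure also occurs in the full structure, a failed mayContain test rules out a match without running the more expensive isomorphism check; a passed test only says that a match is possible.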
6.14. Tools. The tools package contains utility classes for all those cases that did not justify the creation of a dedicated package. The IsotopeFactory, for example, can return preconfigured instances of Elements and Isotopes for a given element symbol or a given atomic mass. The ConnectivityChecker class tests whether a given chemical graph is connected, i.e., whether there is a bond path between every possible pair of atoms in the graph and, in the case of a nonconnected graph, it can return a Vector with the disjunct pieces of the graph, stored in AtomContainer objects.
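The idea behind such a connectivity test can be illustrated with a generic breadth-first partitioning of a graph into its connected pieces. This is a sketch under stated assumptions: atoms are plain indices, the graph is given as adjacency lists, and the pieces are returned as lists of indices rather than as AtomContainer objects; none of the names below belong to the CDK API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: split a graph into connected components by breadth-first search.
// adjacency[i] lists the atoms bonded to atom i (an assumed representation).
final class ConnectivitySketch {

    static List<List<Integer>> components(int[][] adjacency) {
        boolean[] seen = new boolean[adjacency.length];
        List<List<Integer>> pieces = new ArrayList<>();
        for (int start = 0; start < adjacency.length; start++) {
            if (seen[start]) {
                continue; // already assigned to an earlier piece
            }
            List<Integer> piece = new ArrayList<>();
            ArrayDeque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            seen[start] = true;
            while (!queue.isEmpty()) {
                int atom = queue.poll();
                piece.add(atom);
                for (int neighbor : adjacency[atom]) {
                    if (!seen[neighbor]) {
                        seen[neighbor] = true;
                        queue.add(neighbor);
                    }
                }
            }
            pieces.add(piece);
        }
        return pieces;
    }

    // Connected means that every atom is reachable from every other atom.
    static boolean isConnected(int[][] adjacency) {
        return components(adjacency).size() <= 1;
    }

    public static void main(String[] args) {
        int[][] twoFragments = {{1}, {0}, {3}, {2}}; // bonds 0-1 and 2-3 only
        System.out.println(isConnected(twoFragments)); // prints false
        System.out.println(components(twoFragments));  // prints [[0, 1], [2, 3]]
    }
}
```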
Related to ConnectivityChecker is the PathTools class which, for example, provides methods for finding the shortest path between two given atoms in a molecule. The MFAnalyser class has methods for returning the molecular formula of a given Molecule object and for creating an unbonded AtomContainer object from a given molecular formula string. The HOSECodeGenerator produces HOSE codes⁴⁸ for each atom in a given AtomContainer. By feeding these HOSE codes into the BremserOneSphereHOSECodePredictor class, one can predict expectation ranges for carbon-13 NMR chemical shifts.⁴⁹
7. RESULTS
The CDK is now the basis for a number of software projects. The chemical editor JChemPaint,¹¹ which takes advantage of the CDK and for which the CDK's Model-View-Controller mechanisms have been implemented, is again just a support tool for higher level applications such as the Web database NMRShiftDB for organic compounds and their NMR chemical shifts, or SENECA, a program for computer assisted structure elucidation.²² While allowing the fast assembly of large monolithic applications such as SENECA or NMRShiftDB, the true strength of the CDK lies in its ability to serve as a chemoinformatician's workbench. By just writing a few lines of code, one can quickly test new ideas or modify existing CDK based applications to make them suit other needs. The code snippet in Figure 5 illustrates how one can quickly parse a list of SMILES strings into AtomContainers, produce 2D coordinates, and display the results in a MoleculeListViewer.
8. CONCLUSION
We have presented details of a new open-source Java library facilitating the implementation of software packages in chemoinformatics. The CDK is freely available⁵⁰ under the terms of the GNU Lesser General Public License (LGPL).³ The source code may thus be downloaded and improved or adapted for specific needs. In contrast to the famous GNU General Public License (GPL),⁵¹ the LGPL allows for the use of the CDK in proprietary software packages. While any use of the CDK for proprietary and closed-source projects is thus welcome, we also highly appreciate feedback and any potential backflow. Companies are using the CDK for commercial projects, such as SafeBase, a theragenomics knowledge management system on adverse drug reactions.⁵² At the IBM Germany Development Lab in Böblingen an Extreme Blue internship project group has been started to write a CDK-based open source 2D/3D editor for chemical structures. The company IXELIS, situated in Strasbourg, France, is working on a global semantic information system applied to scientific knowledge and has contributed the MCSS code, which came into existence during their work with the CDK. Further, our chemoinformatics software kit is the basis for other open-source projects, like the SENECA system for computer-assisted structure elucidation²² and NMRShiftDB,¹⁴ a free database of organic chemicals and their NMR data.
Besides its proven usability in research and production quality scientific software, the CDK has also become a valuable tool for teaching chemoinformatics. At least one of the authors (C.S.) is using the software package in lectures to demonstrate many standard chemoinformatics algorithms on the functionality level as well as on the source code level. Due to the inherent modularization of the object oriented language Java, most of the classes and methods are concise and easy to understand. It should be mentioned that we have experienced, albeit on a smaller scale than the large open-source projects, the benefits and the fascination of the principles mentioned in the Introduction. Based on this experience, this article is also intended to promote these ideas and to attract further contributors for our project. The inspiring experience is that as soon as a certain amount of material has accumulated and a certain amount of publicity has been gained, an open-source project becomes something like a self-runner: contributors start adding their own subprojects, and new ideas are integrated which would probably never have come to mind if the CDK had been created by a single organization or even a single individual. Of course, such a development model also has disadvantages. It is probably much more difficult to adhere to certain quality standards, to respond to deadlines (but on the other hand, there rarely are any in such small projects), and to do strategic planning. It has been shown, however, that these problems can be overcome.
ACKNOWLEDGMENT
The authors would like to thank all members of the CDK project for their contributions, corrections, and helpful comments.
REFERENCES AND NOTES
(1) The Open Source Initiative (OSI), http://www.opensource.org (accessed on Aug 2002), 2002.
(2) Raymond, E. S. The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary; O'Reilly and Associates: Sebastopol, CA, 1999.
(3) GNU Lesser General Public License - GNU Project - Free Software Foundation (FSF), http://www.gnu.org/licenses/lgpl.html (accessed on Aug 2002), 2002.
(4) SourceForge.net, http://www.sf.net/ (accessed on Aug 2002), 2002.
(5) Rzepa, H.; Tonge, A. VChemLab: A Virtual Chemistry Laboratory. The Storage, Retrieval, and Display of Chemical Information Using Standard Internet Tools. J. Chem. Inf. Comput. Sci. 1998, 38, 1048-1053.
(6) Csizmadia, F. JChem: Java Applets and Modules Supporting Chemical Database Handling from Web Browsers. J. Chem. Inf. Comput. Sci. 2000, 40, 323-324.
(7) Blauch, D. Java Classes for Managing Chemical Information and Solving Generalized Equilibrium Problems. J. Chem. Inf. Comput. Sci. 2002, 42, 143-146.
(8) Bauerschmidt, S.; Gasteiger, J. Overcoming the limitations of a connection table description: A universal representation of chemical species. J. Chem. Inf. Comput. Sci. 1997, 37, 705-714.
(9) The CompChem libraries, http://compchem.sourceforge.net/ (accessed on Aug 2002), 2002.
(10) The JMDraw Structure Diagram Generation Engine, http://jmdraw.sourceforge.net/ (accessed on Aug 2002), 2002.
(11) Steinbeck, C.; Krause, S.; Willighagen, E. JChemPaint - Using the Collaborative Forces of the Internet to Develop a Free Editor for 2D Chemical Structures. Molecules 2000, 5, 93-98.
(12) The JChemPaint Structure Editor, http://jmdraw.sourceforge.net/ (accessed on Aug 2002), 2002.
Figure 5. A CDK code snippet illustrating the use of SmilesParser and StructureDiagramGenerator is shown followed by its output.
(13) The Jmol 3D Molecular Visualization Software, http://jmol.sourceforge.net/ (accessed on Aug 2002), 2002.
(14) Kuhn, S.; Krause, S.; Steinbeck, C. NMRShiftDB - An Open-Access, Open-Submission, Open-Source Database for Organic Structures and their NMR data. 2002, Manuscript in preparation.
(15) The NMRShiftDB NMR Database, http://www.nmrshiftdb.org/ (accessed on Aug 2002), 2002.
(16) The MolMaster Molecular Visualization Package, http://molmaster.sourceforge.net/ (accessed on Aug 2002), 2002.
(17) The JVisualizer NMR Analysis Package, http://jvisualizer.sourceforge.net/ (accessed on Aug 2002), 2002.
(18) The Chemical Markup Language Supporting Software Pages, http://cml.sourceforge.net/ (accessed on Aug 2002), 2002.
(19) Murray-Rust, P.; Rzepa, H. Chemical Markup, XML, and the Worldwide Web. 2. Information Objects and the CMLDOM. J. Chem. Inf. Comput. Sci. 2001, 41, 1113-1123.
(20) JOELib - a java based computational chemistry package, http://joelib.sourceforge.net/ (accessed on Aug 2002), 2002.
(21) The Open Babel Chemical File Format Conversion Package, http://openbabel.sourceforge.net/ (accessed on Aug 2002), 2002.
(22) Steinbeck, C. SENECA: A Platform-Independent, Distributed and Parallel System for Computer-Assisted Structure Elucidation in Organic Chemistry. J. Chem. Inf. Comput. Sci. 2001, 41, 1500-1507.
(23) IBM Systems Journal - Java Performance, 2000.
(24) Stevens, P.; Pooley, R. Using UML: software engineering with objects and components; Object Technology Series; Addison-Wesley: 1999. Updated edition for UML 1.3: first published 1998 (as Pooley and Stevens).
(25) The Open Science Project, http://www.openscience.org/ (accessed on Aug 2002), 2002.
(26) JUnit, Testing Resources for Extreme Programming, http://www.junit.org/ (accessed on Aug 2002), 2002.
(27) Dekel, U. Personal Communication, 2002.
(28) Ganter, B.; Wille, R. Concept Analysis: Mathematical Foundations; Springer-Verlag: Berlin-Heidelberg, 1999.
(29) Krasner, G.; Pope, S. A Cookbook for using the Model-View-Controller User Interface Paradigm in Smalltalk-80. JOOP 1988, 29-49.
(30) Java-Linux, http://www.blackdown.org/ (accessed on Aug 2002), 2002.
(31) Helson, H. Structure Diagram Generation. Rev. Comput. Chem. 1999, 13, 313-398.
(32) Gasteiger, J.; Rudolph, C.; Sadowski, J. Automatic Generation of 3D-Atomic Coordinates for Organic Molecules. Tetrahedron Comput. Method. 1990, 4, 537-547.
(33) Wiener, H. Correlation of Heat of Isomerization and Difference in Heat of Vaporization of Isomers Among Paraffin Hydrocarbons. J. Am. Chem. Soc. 1947, 69, 17-20.
(34) Morgan, H. L. The Generation of a Unique Machine Description for Chemical Structures - A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5, 107-113.
(35) Hu, C. Y.; Lu, L. On highly discriminating molecular topological index. J. Chem. Inf. Comput. Sci. 1996, 36, 82-90.
(36) Faulon, J.-L. Stochastic Generator of Chemical Structure. 2. Using Simulated Annealing To Search the Space of Constitutional Isomers. J. Chem. Inf. Comput. Sci. 1996, 36, 731-740.
(37) Figueras, J. Ring Perception Using Breadth-First Search. J. Chem. Inf. Comput. Sci. 1996, 36, 986-991.
(38) Bley, K.; Brandt, J.; Dengler, A.; Frank, R.; Ugi, I. Constitutional Formulae generated from Connectivity Information: the Program MDRAW. J. Chem. Res. (M) 1991, 2601-2689.
(39) Hanser, T.; Jauffret, P.; Kaufmann, G. A new algorithm for exhaustive ring perception in a molecular graph. J. Chem. Inf. Comput. Sci. 1996, 36, 1146-1152.
(40) Description of Several Chemical Structure File Formats Used by Computer Programs Developed at Molecular Design Limited, 1992. An updated online version of this document can be found on http://www.mdli.com/downloads/literature/ctfile.pdf.
(41) Protein Data Bank Atomic Coordinate and Bibliographic Entry Format Description, 1985.
(42) Murray-Rust, P.; Rzepa, H. Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles. J. Chem. Inf. Comput. Sci. 1999, 39, 928-942.
(43) Willighagen, E. Processing CML conventions in Java. Internet J. Chem. 2001, 4, 4.
(44) Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
(45) Weininger, D.; Weininger, A.; Weininger, J. SMILES. 2. Algorithm for Generation of Unique SMILES Notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97-101.
(46) SMILES Home Page, http://www.daylight.com/dayhtml/smiles/ (accessed on Aug 2002), 2002.
(47) James, C. A.; Weininger, D.; Delany, J. Daylight Theory Manual, http://www.daylight.com/dayhtml/doc/theory/theory.toc.html (accessed on Aug 2002), 2000.
(48) Bremser, W. HOSE - A Novel Substructure Code. Anal. Chim. Act. 1978, 103, 355-365.
(49) Bremser, W. Expectation Ranges of 13-C NMR Chemical Shifts. Magn. Reson. Chem. 1985, 23, 271-275.
(50) The Chemical Development Kit, http://cdk.sf.net/ (accessed on Aug 2002), 2002.
(51) GNU General Public License - GNU Project - Free Software Foundation (FSF), http://www.gnu.org/licenses/gpl.html (accessed on Aug 2002), 2002.
(52) TheraSTrat - Taking drug safety a step further, http://www.therastrat.com/ (accessed on Aug 2002), 2002.
CI025584Y
J. Chem. Inf. Comput. Sci., Vol. 43, No. 2, 2003