The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics
Christoph Steinbeck,* Yongquan Han, Stefan Kuhn, Oliver Horlacher, Edgar Luttmann,§ and Egon Willighagen#
Max-Planck-Institute of Chemical Ecology, Jena, Germany, TheraSTrat AG, Allschwil, Switzerland, Institute of Organic Chemistry, University of Paderborn, Germany, and Nijmegen, The Netherlands
Received August 17, 2002
The Chemistry Development Kit (CDK) is a freely available open-source Java library for structural chemo- and bioinformatics. Its architecture and capabilities, as well as its development as an open-source project by a team of international collaborators from academic and industrial institutions, are described. The CDK provides methods for many common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Application scenarios as well as access information for interested users and potential contributors are given.
1. INTRODUCTION
Whoever pursues the endeavor of creating a larger software package in chemoinformatics or computational chemistry from scratch will soon be confronted with the Sisyphean task of implementing the standard repertoire of chemoinformatics algorithms and components invented during the last 20 or 30 years. The obvious workaround for this problem is to use commercially available chemoinformatics libraries that have been developed by companies such as MDL Information Systems, Inc., Daylight Chemical Information Systems, Inc., Advanced Chemistry Development, and certainly many others. A scientist in an academic environment, however, often feels obliged to openly share his results with the scientific community. Using proprietary components for software development makes it impossible to do so. Generally, scientific software is too often closed source, leaving the user with a black box performing magical operations. Perceived as counterproductive for overall scientific progress, this trend fortunately seems to be changing.
Sharing of ideas and results within communities is probably the most central paradigm in science. By publishing his results, a scientist allows his colleagues to verify and build upon them, thereby advancing the particular field as a whole ["If I have seen further it is by standing on the shoulders of giants." - Isaac Newton]. One of the motivations for such contributions, besides pure scientific curiosity, is, of course, the gain of social recognition and reputation among his peers.
In recent years the ideas sketched above have been part of the open-source revolution that took place in the world of software development, most widely recognized through the great success of the free Unix-like operating system GNU/Linux, a collaborative work of many individuals and organizations, including the Free Software Foundation led by Richard Stallman and the Finnish computer science student Linus Torvalds, who started the project. According to several essays on this subject, open-source software, for which, by definition, the source code is always freely available to the public [1], has a number of intriguing benefits. Most importantly, if the community of users is large enough and everyone can look at the sources and change them, it should not take too long until a particular software error is found and fixed. "Given enough eyeballs, all bugs are shallow", as Eric Raymond put it in his widely recognized essay "The Cathedral and the Bazaar" [2], in which he analyzes the mechanisms and principles of the open-source movement. Further, other scientists can easily build on existing results. Credit can still be given in the appropriate form, because open-source software is by no means freeware or in the public domain. Quite the contrary, the package as a whole as well as each piece of source code is labeled with a clear copyright notice, stating the name of the copyright holder and the nature of the license. This copyright notice must not be removed. Additional comments regarding the changes and improvements made by others can, of course, be added. Substantial improvements to an existing piece of code by someone other than the copyright holder will usually lead to something like team formation, including appropriate copyright changes. This is especially important for academic scientists, who need to be able to point out their contributions to a particular field.
Considering the virtues of open-source software on the one hand and the scientific tradition on the other, we started the CDK project under the terms of a liberal open-source license [3]. We use SourceForge [4], a Web-based open-source development platform, for coordinating the contributions from about 10 developers from about five different countries. A greater number of people have subscribed to the developers mailing list and either listen silently or contribute by making feature requests or critical comments. SourceForge provides all the tools that are generally considered indispensable for coordinating the contributions from developers and users in larger software projects, such as Web space, mailing lists, bug trackers, software versioning systems, release managers, etc.
This article is meant not only to describe the CDK project in scientific and software-technological terms but also to promote the underlying development model. The authors think that these principles form a paradigm for scientific software development in which scientists can truly exploit the benefits of the Internet for a distributed collaboration that would not have been possible in pre-Internet times.
We are explicitly not claiming to give a general overview of chemical open-source software; that would form an article of its own. We will, however, give a synopsis of open-source Java software in the following section. The interested reader is cordially invited to visit the CDK project pages at http://cdk.sourceforge.net, get in touch with the developers, make use of the CDK package, and ultimately extend its functionality.
* Corresponding author present address: Cologne University Bioinformatics Center (CUBIC), Cologne, Germany. Phone: +49 (0) 221 470 7426; fax: +49 (0) 221 470 5092; e-mail: c.steinbeck@uni-koeln.de.
Max-Planck-Institute of Chemical Ecology. TheraSTrat AG. § University of Paderborn. # Nijmegen.
J. Chem. Inf. Comput. Sci. 2003, 43, 493-500. 10.1021/ci025584y CCC: $25.00. 2003 American Chemical Society. Published on Web 02/11/2003.
This is an open access article published under a Creative Commons Attribution (CC-BY) License, which permits unrestricted use, distribution and reproduction in any medium, provided the author and source are cited.
2. OPEN SOURCE JAVA SOFTWARE IN CHEMISTRY
A number of libraries written in Java are freely available in binary form, but they do not grant access to the source code for use and extension [5-7]. Libraries for other computer languages have been described in the literature but are, to our knowledge, not available to the public [8].
To give an overview of the open-source activities in chemistry, we analyzed the open-source projects registered at SourceForge [4]. This Web site has about 40 projects registered in the field of molecular chemistry, as found with a search on keywords such as molecule, molecular, chemistry, and chemical. Many projects are inactive: some are only registered but show no activity at all, and some showed activity in the past but never released software in binary form or as source code. The number of active projects is about 25-30.
Of these projects, 14 were found to use the Java programming language. Three of these have been inactive for a long period and do not provide downloads. Two are succeeded by this project [9,10], and four are based on the CDK [11-15]. Four projects are interesting to note: MolMaster, which has a BSD license [16] and includes visualization of isosurfaces; jVisualizer, which has the GPL license [17] and analyzes NMR couplings; CML, which has an Artistic License [18] and provides tools around the Chemical Markup Language [19]; and JOELib, which has the GPL license [20] and offers an extensive file I/O library based on OpenBabel [21] and a library for molecular descriptors. Note that the first two are not really libraries but applications. CMLDOM and JOELib, however, are libraries with similar functionality for storing chemical content in memory.
3. THE ORIGIN OF THE CDK
The CDK originated as a support project for a number of different chemoinformatics software packages, namely a structure editor [11], a Web database for organic compounds and their NMR chemical shifts [14], a program for computer-assisted structure elucidation [22], and a 3D structure viewer and analyzer [13], which is still being ported to the CDK. The authors of these programs generally agree on the benefits of the programming language Java, which are as follows: clear object-oriented design, platform independence, and the fact that it has become an important standard for client- and server-side applications on the Web. Since most of the scientifically interesting applications in chemistry have a computationally demanding kernel, they benefit from a client/server architecture because the server part can then be run on a powerful machine, while a user-friendly (Web) interface can be used on whatever client machine the user chooses. These demands can be met much more easily if one can resort to a single programming language for the implementation, and so we consider Java to be the programming language of choice not only for chemoinformatics and computational chemistry but also for scientific applications in general.
Concerns are frequently raised with respect to the performance of Java. However, the language structure itself, compared for example with C++, provides no good reason for Java having a generally lower performance than other languages more frequently used in high-performance computing. Indeed, great efforts have been made to increase Java runtime performance, and so today, given a proper implementation and the right runtime environment, server-side Java code does not need to be slower than C++ code with the same scope. We would like to draw the reader's attention to a whole issue of the IBM Systems Journal dedicated to the subject of high-performance computing in Java [23].
4. DEVELOPMENT MODEL
To participate in CDK development, the interested individual needs to register with SourceForge (SF) to receive a free SF account and subscribe to the developers mailing list cdk-devel@lists.sourceforge.net. He or she then contacts one of the project administrators, who adds the new member to the project's developers list. Besides good Java programming skills, a working knowledge of the Concurrent Versions System (CVS) is needed. CVS is the most widely used system for version management in the open-source community and greatly facilitates the coordination of multiple developers working on the same source tree.
It is quite common in computer science to write a requirements specification before coding is started. Such a specification describes the intended behavior of the software (classes in this case) and can be used by developers to check the implementation and by users to see how those classes can be used. When the CDK was designed, such a specification was only partly made, using Unified Modeling Language (UML) diagrams [24]. Currently we use Request for Comments (RFC) documents for proposing a new specification to which the CDK library must conform. These RFCs, which are a long-standing Internet standard for decision making, are discussed on the developers mailing list and are marked as final after a majority vote.
5. PROJECT CONVENTIONS
In Java, source code is organized in so-called packages, which often (but not necessarily) follow a naming scheme resembling an inverted Internet address. Putting a class such as Atom into a uniquely named package prevents class name collisions in cases where another library, used together with the CDK, also contains an Atom class with a different function. Since the CDK is part of the OpenScience project [25], the CDK source tree is organized in packages under the org.openscience.cdk root package. Frequently, a new developer is interested in adding a particular functionality to the CDK, for example the capability for isomorphism and automorphism checking. He discusses the implications of his endeavor with the other CDK developers on the mailing list. Taking into account the suggestions, caveats, etc., of his codevelopers, he would then create a new subpackage org.openscience.cdk.isomorphism and add his contribution under this part of the source tree.
An important part of the CDK development effort is unit testing, which is based on the idea of writing easily repeatable tests for the smallest units of the software package in question. Whenever a programmer adds a new module with new functionality to the CDK source tree, he is expected to add a test to the org.openscience.cdk.tests package, adhering to a particular naming convention. The unit testing itself is based on the JUnit package [26], which makes it easy to run a fully unattended test for the whole CDK package. This has proven to be of great value for a distributed programming effort such as the CDK. Especially if a developer changes something within the CDK core classes, a full JUnit run of the CDK tests will show him within a few seconds whether his changes broke something or not. Further, each of these little test snippets is an instructive example of how to use a particular CDK module.
Documentation is indispensable for a library. The CDK is documented using the JavaDoc system, an integral part of the Java programming language. Using special tags, the code is documented directly in the source code, from which documentation can be produced automatically in various formats, most importantly as Web pages. We are using source code metrics to constantly measure the amount of documented source code statements, and we try to keep this percentage as high as possible. In addition to the JavaDoc API documentation, the user is guided by a few introductory manuals.
It should also be mentioned that the CDK's software architecture has been independently chosen as the subject of an M.Sc. thesis at the Technion (Israel Institute of Technology) [27], focusing on automated methods for code inspection and review. This is a common industrial process in which source code is usually read manually to find errors, potential improvements, dependencies, etc. The thesis focuses on automating formal concept analysis using concept lattices [28] for the review of individual Java classes. Concept analysis is a mathematical classification technique which is used for different problems in software research. This methodology is applied in three stages: (1) understanding the public interface of the class for use as a black box, (2) trying to reason about the design and possible errors in the class based on its lattice, and (3) inspecting the actual source code. The first two stages are done without even having the source code: the methods and fields are determined by reverse engineering of the compiled class files. We have already received valuable input from this related project, which will help us to resolve design flaws in our library.
6. DESCRIPTION OF THE LIBRARY'S FUNCTIONALITY
6.1. The Core Classes. The classes contained in the root section of the CDK's package hierarchy are all formalized representations of basic chemical concepts such as atoms, bonds, molecules, etc. Figure 1 shows a UML diagram explaining the inheritance hierarchy and the dependencies between the fundamental classes of the CDK. The UML diagrams shown in this article depict the relationships of only the core classes. They are thus edited and show only a subset of the true interclass relationships. They show the central role of the ChemObject class, which is the superclass of all other classes and provides methods for storing even complex properties for any derived CDK object.
Figure 1. UML diagram showing the inheritance hierarchy and the dependencies of the fundamental classes within the CDK.
The first and probably most obvious inheritance chain to be mentioned in the core classes is that of Atom extending AtomType extending Isotope extending Element. This is not only logical from a chemical point of view but also provides the basis for a simple mechanism for the creation of Atoms, AtomTypes, Isotopes, and Elements based on subclasses of a single IsotopeFactory tool class, which will be discussed below. Placing the Atom in a long chain of inheritance provides central access points to the different levels of information. While the Element, for example, provides access to the symbol or the atomic number, an AtomType can further distinguish between the states of hybridization of an Atom or some other distinction a force field might need.
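The inheritance chain described above can be pictured with a minimal sketch. The nested classes below only borrow the names Element, Isotope, AtomType, and Atom from the text; their fields, constructors, and the helper describeElement are assumptions made for illustration and are not the CDK's actual API.

```java
// Hypothetical, simplified stand-ins for the CDK core classes.
// Only the inheritance order is taken from the article; all members are assumed.
final class AtomChainSketch {

    /** Element level: symbol and atomic number. */
    static class Element {
        final String symbol;
        final int atomicNumber;

        Element(String symbol, int atomicNumber) {
            this.symbol = symbol;
            this.atomicNumber = atomicNumber;
        }
    }

    /** Isotope level: adds a mass number to the element data. */
    static class Isotope extends Element {
        final int massNumber;

        Isotope(String symbol, int atomicNumber, int massNumber) {
            super(symbol, atomicNumber);
            this.massNumber = massNumber;
        }
    }

    /** AtomType level: adds a distinction such as the hybridization state. */
    static class AtomType extends Isotope {
        final String hybridization;

        AtomType(String symbol, int atomicNumber, int massNumber, String hybridization) {
            super(symbol, atomicNumber, massNumber);
            this.hybridization = hybridization;
        }
    }

    /** Atom level: a concrete atom in a structure, here with 2D coordinates. */
    static class Atom extends AtomType {
        double x;
        double y;

        Atom(String symbol, int atomicNumber, int massNumber, String hybridization) {
            super(symbol, atomicNumber, massNumber, hybridization);
        }
    }

    /** Code written against Element works unchanged for every subclass. */
    static String describeElement(Element e) {
        return e.symbol + " (Z=" + e.atomicNumber + ")";
    }

    public static void main(String[] args) {
        Atom carbon = new Atom("C", 6, 12, "sp3");
        System.out.println(describeElement(carbon));
        System.out.println("mass number " + carbon.massNumber + ", " + carbon.hybridization);
    }
}
```

Each level of the chain adds one kind of information, so a method that only needs element data can accept any Atom without knowing about isotopes or atom types.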
A further level of abstraction is incorporated by the AtomContainer and the ElectronContainer. The ElectronContainer forms the base for constructs such as Bonds and Orbitals, whereas the AtomContainer is the envisioned storage for Atoms together with their Bonds and is the superclass for Rings, Molecules, and Substructures.
To support higher-level concepts such as molecular ensembles or reactions, the CDK core is complemented by classes which group molecules into higher-order constructs, like SetOfMolecules, ChemSequence, ChemModel, and ChemFile. For clarity, the relationship of ChemObject and the AtomContainer has been moved to an additional UML diagram shown in Figure 2. It shows how Molecules are contained in a SetOfMolecules, which is part of a ChemModel. ChemModels are meant to store the molecular information on the state of a chemical system at a given point in time. To allow for the modeling of changes over time, we introduced the possibility of arranging various ChemModels into a ChemSequence. The ChemFile class is designed as the top-level container, which can contain all the concepts stored in a chemical document, among them one or more ChemSequences.
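The nesting of these containers can be sketched as follows. The class names are taken from the text, but the fields and the helpers example and countMolecules are hypothetical and chosen only to make the containment explicit; the real CDK classes offer a much richer interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the container nesting described in the article:
// ChemFile > ChemSequence > ChemModel > SetOfMolecules > Molecule.
final class ContainerHierarchySketch {

    static class Molecule {
        final String name;
        Molecule(String name) { this.name = name; }
    }

    /** A group of molecules that belong together. */
    static class SetOfMolecules {
        final List<Molecule> molecules = new ArrayList<>();
    }

    /** State of a chemical system at one point in time. */
    static class ChemModel {
        final SetOfMolecules setOfMolecules = new SetOfMolecules();
    }

    /** Ordered series of models, e.g. the steps of a process. */
    static class ChemSequence {
        final List<ChemModel> models = new ArrayList<>();
    }

    /** Top-level container for everything stored in one chemical document. */
    static class ChemFile {
        final List<ChemSequence> sequences = new ArrayList<>();
    }

    /** Walks the whole hierarchy and counts the molecules it contains. */
    static int countMolecules(ChemFile file) {
        int count = 0;
        for (ChemSequence sequence : file.sequences) {
            for (ChemModel model : sequence.models) {
                count += model.setOfMolecules.molecules.size();
            }
        }
        return count;
    }

    /** One sequence with two time steps: two molecules, then one. */
    static ChemFile example() {
        ChemModel before = new ChemModel();
        before.setOfMolecules.molecules.add(new Molecule("ethene"));
        before.setOfMolecules.molecules.add(new Molecule("hydrogen"));

        ChemModel after = new ChemModel();
        after.setOfMolecules.molecules.add(new Molecule("ethane"));

        ChemSequence sequence = new ChemSequence();
        sequence.models.add(before);
        sequence.models.add(after);

        ChemFile file = new ChemFile();
        file.sequences.add(sequence);
        return file;
    }

    public static void main(String[] args) {
        System.out.println(countMolecules(example()) + " molecules in file");
    }
}
```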
The Polymer class extends Molecule and provides convenient access to the Monomers it consists of. The Monomer itself is implemented as an AtomContainer. A subclass of Polymer is the BioPolymer, used for representing protein and DNA molecules. The Polymer design allows BioPolymers to treat each amino acid as an AtomContainer.
6.2. 2D Structure Graphical Handling. The ability to display and manipulate 2D drawings of chemical structures is one of the most important features of any chemoinformatics-related program. This includes the capability of generating coordinates for those chemical structures which have, for example, been generated by a structure generator as coordinate-less chemical graphs. The details of this latter step are discussed in Section 6.4.
The Model-View-Controller paradigm (see for example ref 29) is used in the CDK library design wherever applicable. The classes for 2D structure graphical handling, for example, work on top of a ChemModel whose content they display and manipulate. A Renderer2D class produces a 2D drawing comparable to those produced by the major commercially available products. This view can be customized by altering the standard settings of a Renderer2DModel object. If the pure display is to be complemented by an option to manipulate the drawing, a Controller2D can be added to the setup. Its settings, again, are determined by a Controller2DModel and can be altered, for example, by using setDrawNumbers(true) in order to display atom numbers annotated to the structure. The Controller2D is an adapter to the available input devices, typically mouse and keyboard, and translates input into changes to the underlying models, which in turn are reflected by changes in the view produced by the Renderer2D. A simple resulting application is shown in Figure 3.
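The division of labor between model, view, and controller described above can be illustrated with a small text-only sketch. The classes RendererModel, Renderer, and Controller are hypothetical stand-ins that mimic the roles of the CDK classes named in the text; only the method name setDrawNumbers is borrowed from the article, and the key binding 'n' is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Model-View-Controller sketch; not the CDK's actual classes.
final class MvcSketch {

    /** Model: holds display settings and notifies listeners when they change. */
    static class RendererModel {
        private boolean drawNumbers = false;
        private final List<Runnable> listeners = new ArrayList<>();

        void addListener(Runnable listener) { listeners.add(listener); }

        boolean isDrawNumbers() { return drawNumbers; }

        void setDrawNumbers(boolean drawNumbers) {
            this.drawNumbers = drawNumbers;
            for (Runnable listener : listeners) {
                listener.run();
            }
        }
    }

    /** View: renders a chain of atom symbols as text according to the model. */
    static class Renderer {
        private final RendererModel model;
        private final String[] atoms;
        String lastFrame;

        Renderer(RendererModel model, String... atoms) {
            this.model = model;
            this.atoms = atoms;
            model.addListener(this::repaint); // redraw whenever the model changes
            repaint();
        }

        void repaint() {
            StringBuilder frame = new StringBuilder();
            for (int i = 0; i < atoms.length; i++) {
                frame.append(atoms[i]);
                if (model.isDrawNumbers()) {
                    frame.append(i + 1);
                }
                if (i < atoms.length - 1) {
                    frame.append('-');
                }
            }
            lastFrame = frame.toString();
        }
    }

    /** Controller: translates raw input events into changes of the model. */
    static class Controller {
        private final RendererModel model;

        Controller(RendererModel model) { this.model = model; }

        void keyTyped(char key) {
            if (key == 'n') { // assumed key binding: toggle atom numbering
                model.setDrawNumbers(!model.isDrawNumbers());
            }
        }
    }

    public static void main(String[] args) {
        RendererModel model = new RendererModel();
        Renderer view = new Renderer(model, "C", "C", "O");
        System.out.println(view.lastFrame);
        new Controller(model).keyTyped('n');
        System.out.println(view.lastFrame);
    }
}
```

The controller never touches the view directly: it changes the model, and the view redraws itself because it listens to the model. This is what allows a display-only setup to work without any controller at all.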
6.3. 3D Structure Handling. To provide high-performance 3D graphics, the Java3D API is used within the CDK. This, however, makes CDK-based 3D applications no longer platform independent. This dependency originates from the Java3D API relying on OpenGL or DirectX for the sake of higher performance. Sun Microsystems provides Java3D only for Windows (both OpenGL and DirectX), Solaris, and SGI IRIX, whereas a Linux version is developed by Blackdown [30] and is available for a variety of architectures. To compensate for this loss of platform independence, the CDK also contains classes for 3D rendering which are not based upon the Java3D API. Together with the separation of the rendering classes due to the Model-View-Controller paradigm, this leads to the following four fundamental classes for 3D rendering: Renderer3D, Renderer3DModel, AcceleratedRenderer3D, and AcceleratedRenderer3DModel, the latter two based upon Java3D.
Figure 2. UML diagram showing the inheritance hierarchy and the dependencies of the group of classes based on the AtomContainer concept.
6.4. Structure Diagram Layout. Key fields of chemoinformatics, like virtual combinatorial chemistry, virtual screening, or computer-assisted structure elucidation, frequently handle chemical structures as one-dimensional graphs. These graphs are, for example, products of structure generators which use graph-theoretical techniques to exhaustively and irredundantly generate all constitutional isomers which are in agreement with a given molecular formula. In any of these programs, however, there comes a point where, after a selection during a virtual screening, for example, the successful candidate structure(s) need(s) to be presented to a chemist. At this point, a tool is needed that generates 2D or 3D coordinates to produce the kind of depiction a chemist is used to. This process has been termed Structure Diagram Generation [31]. While 3D model builders such as CORINA [32] are on our wish list for the future and have not yet been implemented, the CDK features a 2D structure diagram generator, which has been written from scratch and which can be regarded as one of the most useful parts of the CDK, since most of its applications require structure diagram generation at several stages.
6.5. Graph Invariants. This package contains a few classes for the computation of graph invariants such as Wiener indices [33], Morgan's extended connectivity (EC) indices [34], and others [35]. Morgan's EC indices are, for example, used for canonical labeling of compounds. This package is likely to be one of the hot spots for future developments, since many chemoinformatics applications, like (quantitative) structure-activity relationship ((Q)SAR) computations, often rely on calculating various combinations of graph invariants of different types.
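As an illustration of what such a graph invariant looks like in code, the following sketch computes the Wiener index, i.e., the sum of the shortest-path distances (counted in bonds) over all unordered pairs of atoms in the hydrogen-suppressed graph. It is a self-contained example, not the CDK implementation: the class name WienerIndexSketch and the adjacency-list input format are assumptions, and a connected graph is assumed.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Minimal Wiener index sketch on a plain adjacency list (assumed input format).
final class WienerIndexSketch {

    /**
     * @param adjacency adjacency[i] lists the indices of the atoms bonded to atom i
     * @return sum of shortest-path lengths over all unordered atom pairs
     */
    static int wienerIndex(int[][] adjacency) {
        int n = adjacency.length;
        int total = 0;
        for (int start = 0; start < n; start++) {
            // Breadth-first search yields shortest distances in an unweighted graph.
            int[] distance = new int[n];
            Arrays.fill(distance, -1);
            distance[start] = 0;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                int v = queue.remove();
                for (int w : adjacency[v]) {
                    if (distance[w] < 0) {
                        distance[w] = distance[v] + 1;
                        queue.add(w);
                    }
                }
            }
            for (int d : distance) {
                if (d > 0) {
                    total += d;
                }
            }
        }
        return total / 2; // every pair was counted once from each end
    }

    public static void main(String[] args) {
        int[][] nButane = {{1}, {0, 2}, {1, 3}, {2}};      // C-C-C-C
        int[][] isobutane = {{1, 2, 3}, {0}, {0}, {0}};    // central C with three neighbors
        System.out.println(wienerIndex(nButane));   // 1+2+3 + 1+2 + 1 = 10
        System.out.println(wienerIndex(isobutane)); // 3*1 + 3*2 = 9
    }
}
```

The two butane isomers share a molecular formula but give different values, which is the property that makes such invariants useful as descriptors.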
6.6. Structure Generators. This package holds some simple structure generators which are used by the SENECA system for computer-assisted structure elucidation [22]. The class SingleRandomStructureGenerator can be used to generate a totally random structure from the constitutional space given by a certain molecular formula. Based on this randomly generated structure, one can then use RandomGenerator to make small, random moves in constitution space, based on an algorithm suggested by Faulon [36]. If such a generator is combined with a target function and a simulated annealing protocol, one can effectively search constitution space for structures with certain desired properties, provided that these properties can be reliably back-calculated from a given constitutional formula.
To be able to build a structure generator for chemical graphs based on evolutionary algorithms (like the well-known genetic algorithm), we also included a CrossOverMachine, which accepts two chemical graphs in the form of AtomContainers and produces two offspring. Genetic algorithms are population-based methods which produce new offspring for the next generation by a carefully chosen combination of mutation and crossover procedures, applied to the current population. The CrossOverMachine thus complements the mutation operation used in the RandomGenerator class.
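The combination of a random-move generator, a target function, and a simulated annealing protocol mentioned above can be sketched generically. In the toy example below the search state is a plain integer rather than a chemical graph, the acceptance rule is the standard Metropolis criterion, and the cooling schedule is geometric; none of these details are stated in the article, and the class AnnealingSketch with its methods is purely hypothetical. In a structure generator, the random move would correspond to a small change of the constitution and the cost to the deviation between back-calculated and desired properties.

```java
import java.util.Random;
import java.util.function.IntToDoubleFunction;
import java.util.function.IntUnaryOperator;

// Generic simulated annealing sketch on integer states (toy stand-in for structures).
final class AnnealingSketch {

    /**
     * Metropolis criterion: improvements are always accepted, deteriorations
     * with probability exp(-delta / temperature).
     */
    static boolean accept(double delta, double temperature, double uniformSample) {
        return delta <= 0.0 || uniformSample < Math.exp(-delta / temperature);
    }

    /** Returns the lowest-cost state visited during the annealing run. */
    static int anneal(int start, IntUnaryOperator randomMove, IntToDoubleFunction cost,
                      double startTemperature, double coolingFactor, int steps, Random rnd) {
        int current = start;
        int best = start;
        double temperature = startTemperature;
        for (int i = 0; i < steps; i++) {
            int candidate = randomMove.applyAsInt(current);
            double delta = cost.applyAsDouble(candidate) - cost.applyAsDouble(current);
            if (accept(delta, temperature, rnd.nextDouble())) {
                current = candidate;
                if (cost.applyAsDouble(current) < cost.applyAsDouble(best)) {
                    best = current;
                }
            }
            temperature *= coolingFactor; // geometric cooling (an assumed schedule)
        }
        return best;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int best = anneal(50,
                x -> x + (rnd.nextBoolean() ? 1 : -1), // small random move
                x -> (x - 7) * (x - 7),                // toy target function, minimum at 7
                10.0, 0.99, 2000, rnd);
        System.out.println("best state found: " + best);
    }
}
```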
6.7. Ring Searches. John Figueras' fast algorithm for finding the Smallest Set of Smallest Rings (SSSR) has been implemented and is used, for example, by the structure diagram generation package [37]. Especially large condensed ring systems, for which the process of coordinate generation could take up to a minute due to a slow depth-first ring perception algorithm in older systems [38], can now be laid out within fractions of a second, as shown in Figure 4. Further, this package contains a class for partitioning a given ring system into AtomContainers, one for each ring.
In other applications, like aromaticity detection, for example, it is essential to compute the Set of All Rings (SAR). While procedures have been published to produce the SAR from an SSSR, it is computationally more efficient to use specialized algorithms for this purpose. The CDK contains an implementation of a fast and efficient algorithm given by Hanser et al. [39]
Figure 3. Renderer2D and Controller2D cooperating in a simple, CDK-based version of JChemPaint. JChemPaint supports internationalization, with this example showing a Dutch interface.
Figure 4. A ring system parsed from a SMILES string, analyzed by Figueras' SSSR algorithm, and displayed by the MoleculeViewer class. The process takes 300 ms on a 600 MHz Pentium with Windows XP and JDK 1.3.1.
6.8. Aromaticity Detection. There are various definitions of aromaticity and at least as many ways of detecting aromaticity according to these definitions. This package is the intended container for all of them and currently holds an implementation of a HueckelAromaticityDetector class. Based on the SAR detection algorithm by Hanser et al. (see Section 6.7), this class starts with the largest detected ring, counts the number of alternating double or triple bond electrons, and also takes into account free electron pairs of heteroatoms. It then checks whether the ring contains 4n + 2 π-electrons, according to the well-known Hückel rule. If so, the ring and all its atoms and bonds are marked as aromatic, and the search continues with the remaining rings of equal or smaller size, leaving out those rings that are completely part of an already detected larger aromatic system.
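The electron-counting step can be made concrete with a small sketch. It assumes, as a simplification that is not taken from the article, that every double or triple bond in the ring and every participating heteroatom lone pair contributes two π-electrons; the class HueckelRuleSketch and its methods are hypothetical and do not reproduce the CDK detector, which additionally has to check bond alternation and ring membership.

```java
// Hypothetical sketch of the 4n + 2 check; the counting scheme is a simplification.
final class HueckelRuleSketch {

    /** Assumed counting scheme: two pi electrons per ring multiple bond or lone pair. */
    static int piElectrons(int doubleBonds, int tripleBonds, int lonePairs) {
        return 2 * (doubleBonds + tripleBonds + lonePairs);
    }

    /** True if the electron count can be written as 4n + 2 with n = 0, 1, 2, ... */
    static boolean satisfiesHueckelRule(int piElectronCount) {
        return piElectronCount >= 2 && (piElectronCount - 2) % 4 == 0;
    }

    public static void main(String[] args) {
        // benzene: three ring double bonds
        System.out.println(satisfiesHueckelRule(piElectrons(3, 0, 0)));
        // cyclobutadiene: two ring double bonds
        System.out.println(satisfiesHueckelRule(piElectrons(2, 0, 0)));
        // pyrrole: two ring double bonds plus the nitrogen lone pair
        System.out.println(satisfiesHueckelRule(piElectrons(2, 0, 1)));
    }
}
```

With this counting, benzene and pyrrole each reach six electrons (n = 1) and pass the check, whereas cyclobutadiene with four electrons does not.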
6.9. Isomorphism. Being able to determine whether two chemical structures are identical or whether one structure is a subgraph of another is one of the most important capabilities of a chemoinformatics library. The Isomorphism subpackage contains a versatile module for Maximum Common Substructure (MCSS) searches. Since MCSS determination is the most general case of graph matching, it can be used to determine structure identity and to do subgraph matching as well as maximum common substructure searches.
6.10. File Input/Output. File input and output is generalized in the CDK. All file I/O classes implement either ChemObjectReader or ChemObjectWriter. Each file format is represented by two separate classes, each implementing one of these interfaces. The CDK currently supports I/O classes for XYZ, MDL molfile [40], PDB [41], and CML [42]. The latter format was developed by Murray-Rust and Rzepa as the first XML-based file format for chemical content. The CDK contains both an input and an output class for this format. The CML input reader uses an alternative to Murray-Rust's DOM approach and is based on SAX [43].
6.11. Interaction with Other Java Libraries. Besides file I/O, the CDK supports a second method to exchange data with other programs and libraries. The interface to other libraries makes it possible to combine methods from both libraries, giving access to a larger set of functionality. The CDK provides direct conversion of CDK classes to JOELib [20] classes. Support for CMLDOM [19] is planned.
6.12. SMILES. The Simplified Molecular Input Line Entry System (SMILES) provides string representations of molecular constitutions [44]. Due to their compactness and relative simplicity, they are now widely used as an interchange format for coordinate-less molecular structures. Based on a specification for unique (canonical) SMILES [45], it is also possible to perform graph isomorphism checks. The CDK features a generator for canonical SMILES, written to comply with the rules published by the Daylight Inc. founders. While the SMILES generator implements all of the published SMILES standard including chirality, the SMILES parser in the CDK package only complies with the (slightly extended) SuperSimplified SMILES specification [46], which is sufficient to code most organic structures.
6.13. Fingerprints. Fingerprinting is nowadays an indispensable tool for judging molecular similarity and serves as a prefilter for isomorphism checking and thus for structure searching in databases. Here, as in the case of SMILES, a dedicated subpackage for this class of algorithms is justified because there are various ways of computing fingerprints. By allowing the addition of different fingerprinters instead of just having one monolithic org.openscience.cdk.tools.Fingerprinter, we give the user the freedom of choosing whichever method yields the best performance for his case.
The Fingerprinter class in the CDK produces Daylight-type fingerprints [47]. It works by running a breadth-first search, starting at each atom in the molecule, thereby producing string representations of paths up to a length of six atoms. For each of these SMILES-like strings, a hash code is computed using the standard string hashing algorithm provided by the Java language. Each hash code is used to seed a pseudorandom number generator with a default working range of [0-1023], and the first random number is retrieved. This number indicates a position in a fingerprint bit string of length 1024, which is then set to "1". Based on the entirety of all computed paths from the molecule, a molecular fingerprint is obtained in the form of this bit string.
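The hashing step described above maps directly onto standard Java classes. The sketch below takes the path strings as given (the breadth-first path enumeration is omitted) and shows how String.hashCode(), java.util.Random, and java.util.BitSet can be combined; the class PathFingerprintSketch and its methods, including the prefilter mayContain, are hypothetical illustrations and not the CDK's Fingerprinter.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Random;

// Hypothetical sketch of the hash-and-set-bit step of a path-based fingerprint.
final class PathFingerprintSketch {

    static final int SIZE = 1024; // fingerprint length named in the article

    /** Seeds a generator with the path's hash code and takes its first number. */
    static int bitPosition(String path) {
        return new Random(path.hashCode()).nextInt(SIZE);
    }

    /** Sets one bit per path; different paths may collide on the same bit. */
    static BitSet fingerprint(Iterable<String> paths) {
        BitSet bits = new BitSet(SIZE);
        for (String path : paths) {
            bits.set(bitPosition(path));
        }
        return bits;
    }

    /**
     * Prefilter for substructure search: if the query sets a bit that the
     * target lacks, the query cannot be a substructure of the target.
     * A result of true still requires a full isomorphism check.
     */
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        // Path strings assumed to come from a path enumeration of ethanol and ethane.
        BitSet ethanol = fingerprint(Arrays.asList("C", "O", "CC", "CO", "CCO"));
        BitSet ethane = fingerprint(Arrays.asList("C", "CC"));
        System.out.println("bits set for ethanol: " + ethanol.cardinality());
        System.out.println("ethanol may contain ethane: " + mayContain(ethanol, ethane));
    }
}
```

Because the same path string always yields the same seed, the bit position is reproducible across runs, which is what makes stored fingerprints comparable.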
6.14. Tools. The tools package contains utility classes for all those cases that did not justify the creation of a dedicated package. The IsotopeFactory, for example, can return preconfigured instances of Elements and Isotopes for a given element symbol or a given atomic mass. The ConnectivityChecker class tests whether a given chemical graph is connected, i.e., whether there is a bond path between every possible pair of atoms in the graph, and, in the case of a nonconnected graph, it can return a Vector with the disjoint pieces of the graph, stored in AtomContainer objects.
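The connectivity test and the partitioning into disjoint pieces can be sketched with a breadth-first search over an adjacency list. This is a generic graph routine under assumed names (ConnectivitySketch, partition, isConnected) and an assumed input format; it returns lists of atom indices rather than the AtomContainer objects the CDK class works with.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Generic connected-components sketch; not the CDK's ConnectivityChecker.
final class ConnectivitySketch {

    /**
     * @param adjacency adjacency[i] lists the indices of the atoms bonded to atom i
     * @return one list of atom indices per disjoint piece of the graph
     */
    static List<List<Integer>> partition(int[][] adjacency) {
        int n = adjacency.length;
        boolean[] seen = new boolean[n];
        List<List<Integer>> pieces = new ArrayList<>();
        for (int start = 0; start < n; start++) {
            if (seen[start]) {
                continue; // already assigned to an earlier piece
            }
            List<Integer> piece = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            seen[start] = true;
            queue.add(start);
            while (!queue.isEmpty()) {
                int v = queue.remove();
                piece.add(v);
                for (int w : adjacency[v]) {
                    if (!seen[w]) {
                        seen[w] = true;
                        queue.add(w);
                    }
                }
            }
            pieces.add(piece);
        }
        return pieces;
    }

    /** A graph is connected if it consists of at most one piece. */
    static boolean isConnected(int[][] adjacency) {
        return partition(adjacency).size() <= 1;
    }

    public static void main(String[] args) {
        // Atoms 0-1-2 form one fragment, atoms 3-4 a second one.
        int[][] twoFragments = {{1}, {0, 2}, {1}, {4}, {3}};
        System.out.println(isConnected(twoFragments));
        System.out.println(partition(twoFragments));
    }
}
```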
Related to ConnectivityChecker is the PathTools class which, for example, provides methods for finding the shortest path between two given atoms in a molecule. The MFAnalyser class has methods for returning the molecular formula of a given Molecule object and for creating an unbonded AtomContainer object from a given molecular formula string. The HOSECodeGenerator produces HOSE codes [48] for each atom in a given AtomContainer. By feeding these HOSE codes into the BremserOneSphereHOSECodePredictor class, one can predict expectation ranges for carbon-13 NMR chemical shifts [49].
7. RESULTS
The CDK is now the basis for a number of software projects. The chemical editor JChemPaint [11], which takes advantage of the CDK and for which the CDK's Model-View-Controller mechanisms were implemented, is in turn just a support tool for higher-level applications such as NMRShiftDB, the Web database for organic compounds and their NMR chemical shifts, or SENECA, a program for computer-assisted structure elucidation [22].
While it allows the fast assembly of large monolithic applications such as SENECA or NMRShiftDB, the true strength of the CDK lies in its ability to serve as a chemoinformatician's workbench. By writing just a few lines of code, one can quickly test new ideas or modify existing CDK-based applications to make them suit other needs. The code snippet in Figure 5 illustrates how one can quickly parse a list of SMILES strings into AtomContainers, produce 2D coordinates, and display the results in a MoleculeListViewer.
8. CONCLUSION
We have presented details of a new open-source Java library facilitating the implementation of software packages in chemoinformatics. The CDK is freely available [50] under the terms of the GNU Lesser General Public License (LGPL) [3]. The source code may thus be downloaded and improved or adapted for specific needs. In contrast to the famous GNU General Public License (GPL) [51], the LGPL allows for the use of the CDK in proprietary software packages. While any use of the CDK for proprietary and closed-source projects is thus welcome, we also highly appreciate feedback and any potential backflow.
Companies are using the CDK for commercial projects, such as SafeBase, a theragenomics knowledge management system on adverse drug reactions [52]. At the IBM Germany Development Lab in Böblingen, an Extreme Blue internship project group has been started to write a CDK-based open-source 2D/3D editor for chemical structures. The company IXELIS, situated in Strasbourg, France, is working on a global semantic information system applied to scientific knowledge and has contributed the MCSS code, which came into existence during their work with the CDK. Further, our chemoinformatics software kit is the basis for other open-source projects, like the SENECA system for computer-assisted structure elucidation [22] and NMRShiftDB [14], a free database of organic chemicals and their NMR data.
Besides its proven usability in research and production-quality scientific software, the CDK has also become a valuable tool for teaching chemoinformatics. At least one of the authors (C.S.) is using the software package in lectures to demonstrate many standard chemoinformatics algorithms on the functionality level as well as on the source code level. Due to the inherent modularization of the object-oriented language Java, most of the classes and methods are concise and easy to understand.
It should be mentioned that we have experienced, albeit on a smaller scale than the large open-source projects, the benefits and the fascination of the principles mentioned in the Introduction. Based on this experience, this article is also meant to promote these ideas and to attract further contributors to our project. The inspiring experience is that as soon as a certain amount of material has accumulated and a certain amount of publicity has been gained, an open-source project becomes self-sustaining: contributors start adding their own subprojects, and new ideas are integrated which would probably never have come to mind if the CDK had been created by a single organization or even a single individual. Of course, such a development model also has disadvantages. It is probably much more difficult to adhere to certain quality standards, to meet deadlines (although there rarely are any in such small projects), and to do strategic planning. It has been shown, however, that these problems can be overcome.
ACKNOWLEDGMENT
The authors would like to thank all members of the CDK project for their contributions, corrections, and helpful comments.
REFERENCESANDNOTES(1)TheOpenSourceInitiative(OSI),http://www.
opensource.
org(accessedonAug2002),2002.
(2)Raymond,E.
S.
TheCathedralandtheBazaar:MusingsonLinuxandOpenSourcebyanAccidentalReVolutionary;O'ReillyandAssociates:Sebastopol,CA,1999.
(3)GNULesserGeneralPublicLicense-GNUProject-FreeSoftwareFoundation(FSF),http://www.
gnu.
org/licenses/lgpl.
html(accessedonAug2002),2002.
(4)SourceForge.
net,http://www.
sf.
net/(accessedonAug2002),2002.
(5)Rzepa,H.
;Tonge,A.
VChemLab:AVirtualChemistryLaboratory.
TheStorage,Retrieval,andDisplayofChemicalInformationUsingStandardInternetTools.
J.
Chem.
Inf.
Comput.
Sci.
1998,38,1048-1053.
(6)Csizmadia,F.
JChem:JavaAppletsandModulesSupportingChemicalDatabaseHandlingfromWebBrowsers.
J.
Chem.
Inf.
Comput.
Sci.
2000,40,323-324.
(7) Blauch, D. Java Classes for Managing Chemical Information and Solving Generalized Equilibrium Problems. J. Chem. Inf. Comput. Sci. 2002, 42, 143-146.
(8) Bauerschmidt, S.; Gasteiger, J. Overcoming the limitations of a connection table description: A universal representation of chemical species. J. Chem. Inf. Comput. Sci. 1997, 37, 705-714.
(9) The CompChem libraries, http://compchem.sourceforge.net/ (accessed on Aug 2002), 2002.
(10) The JMDraw Structure Diagram Generation Engine, http://jmdraw.sourceforge.net/ (accessed on Aug 2002), 2002.
(11) Steinbeck, C.; Krause, S.; Willighagen, E. JChemPaint - Using the Collaborative Forces of the Internet to Develop a Free Editor for 2D Chemical Structures. Molecules 2000, 5, 93-98.
(12) The JChemPaint Structure Editor, http://jmdraw.sourceforge.net/ (accessed on Aug 2002), 2002.
Figure 5. A CDK code snippet illustrating the use of SmilesParser and StructureDiagramGenerator is shown, followed by its output.
THE CHEMISTRY DEVELOPMENT KIT, J. Chem. Inf. Comput. Sci., Vol. 43, No. 2, 2003, 499
(13) The Jmol 3D Molecular Visualization Software, http://jmol.sourceforge.net/ (accessed on Aug 2002), 2002.
(14) Kuhn, S.; Krause, S.; Steinbeck, C. NMRShiftDB - An Open-Access, Open-Submission, Open-Source Database for Organic Structures and their NMR data. 2002, Manuscript in preparation.
(15) The NMRShiftDB NMR Database, http://www.nmrshiftdb.org/ (accessed on Aug 2002), 2002.
(16) The MolMaster Molecular Visualization Package, http://molmaster.sourceforge.net/ (accessed on Aug 2002), 2002.
(17) The JVisualizer NMR Analysis Package, http://jvisualizer.sourceforge.net/ (accessed on Aug 2002), 2002.
(18) The Chemical Markup Language Supporting Software Pages, http://cml.sourceforge.net/ (accessed on Aug 2002), 2002.
(19) Murray-Rust, P.; Rzepa, H. Chemical Markup, XML, and the Worldwide Web. 2. Information Objects and the CMLDOM. J. Chem. Inf. Comput. Sci. 2001, 41, 1113-1123.
(20) JOELib - a java based computational chemistry package, http://joelib.sourceforge.net/ (accessed on Aug 2002), 2002.
(21) The OpenBabel Chemical File Format Conversion Package, http://openbabel.sourceforge.net/ (accessed on Aug 2002), 2002.
(22) Steinbeck, C. SENECA: A Platform-Independent, Distributed and Parallel System for Computer-Assisted Structure Elucidation in Organic Chemistry. J. Chem. Inf. Comput. Sci. 2001, 41, 1500-1507.
(23) IBM Systems Journal - Java Performance, 2000.
(24) Stevens, P.; Pooley, R. Using UML: software engineering with objects and components; Object Technology Series; Addison-Wesley: 1999. Updated edition for UML 1.3: first published 1998 (as Pooley and Stevens).
(25) The Open Science Project, http://www.openscience.org/ (accessed on Aug 2002), 2002.
(26) JUnit, Testing Resources for Extreme Programming, http://www.junit.org/ (accessed on Aug 2002), 2002.
(27) Dekel, U. Personal Communication, 2002.
(28) Ganter, B.; Wille, R. Concept Analysis: Mathematical Foundations; Springer-Verlag: Berlin-Heidelberg, 1999.
(29) Krasner, G.; Pope, S. A Cookbook for using the Model-View-Controller User Interface Paradigm in Smalltalk-80. JOOP 1988, 29-49.
(30) Java-Linux, http://www.blackdown.org/ (accessed on Aug 2002), 2002.
(31) Helson, H. Structure Diagram Generation. Rev. Comput. Chem. 1999, 13, 313-398.
(32) Gasteiger, J.; Rudolph, C.; Sadowski, J. Automatic Generation of 3D-Atomic Coordinates for Organic Molecules. Tetrahedron Comput. Method. 1990, 4, 537-547.
(33) Wiener, H. Correlation of Heat of Isomerization and Difference in Heat of Vaporization of Isomers Among Paraffin Hydrocarbons. J. Am. Chem. Soc. 1947, 69, 17-20.
(34) Morgan, H. L. The Generation of a Unique Machine Description for Chemical Structures - A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5, 107-113.
(35) Hu, C. Y.; Lu, L. On highly discriminating molecular topological index. J. Chem. Inf. Comput. Sci. 1996, 36, 82-90.
(36) Faulon, J.-L. Stochastic Generator of Chemical Structure. 2. Using Simulated Annealing To Search the Space of Constitutional Isomers. J. Chem. Inf. Comput. Sci. 1996, 36, 731-740.
(37) Figueras, J. Ring Perception Using Breadth-First Search. J. Chem. Inf. Comput. Sci. 1996, 36, 986-991.
(38) Bley, K.; Brandt, J.; Dengler, A.; Frank, R.; Ugi, I. Constitutional Formulae generated from Connectivity Information: the Program MDRAW. J. Chem. Res. (M) 1991, 2601-2689.
(39) Hanser, T.; Jauffret, P.; Kaufmann, G. A new algorithm for exhaustive ring perception in a molecular graph. J. Chem. Inf. Comput. Sci. 1996, 36, 1146-1152.
(40) Description of Several Chemical Structure File Formats Used by Computer Programs Developed at Molecular Design Limited, 1992. An updated online version of this document can be found on http://www.mdli.com/downloads/literature/ctfile.pdf.
(41) Protein Data Bank Atomic Coordinate and Bibliographic Entry Format Description, 1985.
(42) Murray-Rust, P.; Rzepa, H. Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles. J. Chem. Inf. Comput. Sci. 1999, 39, 928-942.
(43) Willighagen, E. Processing CML conventions in Java. Internet J. Chem. 2001, 4, 4.
(44) Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
(45) Weininger, D.; Weininger, A.; Weininger, J. SMILES. 2. Algorithm for Generation of Unique SMILES Notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97-101.
(46) SMILES Home Page, http://www.daylight.com/dayhtml/smiles/ (accessed on Aug 2002), 2002.
(47) James, C. A.; Weininger, D.; Delany, J. Daylight Theory Manual, http://www.daylight.com/dayhtml/doc/theory/theory.toc.html (accessed on Aug 2002), 2000.
(48) Bremser, W. HOSE - A Novel Substructure Code. Anal. Chim. Acta 1978, 103, 355-365.
(49) Bremser, W. Expectation Ranges of 13-C NMR Chemical Shifts. Magn. Reson. Chem. 1985, 23, 271-275.
(50) The Chemical Development Kit, http://cdk.sf.net/ (accessed on Aug 2002), 2002.
(51) GNU General Public License - GNU Project - Free Software Foundation (FSF), http://www.gnu.org/licenses/gpl.html (accessed on Aug 2002), 2002.
(52) TheraSTrat - Taking drug safety a step further, http://www.therastrat.com/ (accessed on Aug 2002), 2002.
