ORIGINAL RESEARCH

Global versus local QSPR models for persistent organic pollutants: balancing between predictivity and economy

Tomasz Puzyn, Agnieszka Gajewicz, Aleksandra Rybacka, Maciej Haranczyk

Received: 7 January 2011 / Accepted: 12 February 2011 / Published online: 9 March 2011
© The Author(s) 2011. This article is published with open access at Springerlink.com

Abstract

Experimentally determined data on the key physicochemical parameters for halogenated congeners of persistent organic pollutants (POPs) are available only for a limited number of compounds. In the absence of experimental data, a range of computational methods can be applied to characterize those species. One of the techniques widely used in this context is the quantitative structure–property relationships (QSPR) approach. There are two ways to develop QSPR models: using a more complex global model or fitting a simple local model that covers a specific class of chemically related compounds. The aim of the study was to investigate whether local models have significantly better explanatory and predictive ability than global models with wider applicability domains. Based on the obtained results, we concluded that whenever global models fulfill all quality recommendations of the OECD, they should be applied in practice as the more efficient option, instead of the more time-consuming procedure of modeling the particular groups of POPs one by one. In contrast, local models are suited to solving specific problems (i.e., those related to only one group of POPs) when high-quality experimental data are available for a sufficient number of training and validation compounds.
Keywords: Global models · Local models · QSPR · Persistent organic pollutants

Electronic supplementary material: The online version of this article (doi:10.1007/s11224-011-9764-5) contains supplementary material, which is available to authorized users.

T. Puzyn (✉), A. Gajewicz, A. Rybacka
Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Sobieskiego 18, 80-952 Gdansk, Poland
e-mail: t.puzyn@qsar.eu.org

M. Haranczyk
Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Mail Stop 50F-1650, Berkeley, CA 94720, USA

Struct Chem (2011) 22:873–884, DOI 10.1007/s11224-011-9764-5

Introduction

The occurrence of polyhalogenated persistent organic pollutants (POPs), such as Cl/Br-substituted benzenes (CBz/BBz), biphenyls (PCBs/PBBs), diphenyl ethers (PCDEs/PBDEs), dibenzofurans (PCDFs/PBDFs), dibenzo-p-dioxins (PCDDs/PBDDs), and naphthalenes (PCNs/PBNs), in air, water, soil, and sediments has been identified as a serious environmental threat [1]. Large amounts of POPs come from various anthropogenic sources, including intentionally synthesized liquids used in transformers and capacitors, plasticisers, and flame retardants, as well as thermal recycling of waste, domestic heating, etc. Substantial volumes of these compounds are also released by giant fires, such as the recent fire associated with the oil spill at the Deepwater Horizon platform in the Gulf of Mexico [2]. Regardless of their source, exposure to POPs can cause a vast range of acute and chronic health effects, including mutagenic, carcinogenic, and metabolic ones. In addition, as persistent and lipophilic substances, POPs can bioaccumulate in the body and biomagnify in natural ecosystems [3].

Hence, there is an urgent need to determine the physicochemical properties required to perform a comprehensive risk assessment for all POPs. Unfortunately, the number of all possible congeners (similar compounds that are based on the same carbon skeleton but differ in the number of chlorine/bromine atoms and in the substitution pattern) is extensive. In total, there are 1436 structurally different congeners of polychlorinated and polybrominated benzenes, biphenyls, dibenzo-p-dioxins, dibenzofurans, diphenyl ethers, and naphthalenes (Fig. 1). The number of possible mixed chloro- and bromo-substituted congeners is at least one order of magnitude larger [4]. For such a large number of compounds, empirical measurement of the physicochemical properties is impossible because of the high costs and time limitations of the analytical procedures. Therefore, alternative methods for the physicochemical characterization of POPs are required.
A very promising group of such methods is the quantitative structure–property relationships (QSPR) approach. QSPR is based on the assumption that each physicochemical property in a group of compounds can be expressed as a mathematical function of their chemical structure, represented by a set of so-called molecular descriptors. Thus, based on experimental data available only for some representatives of the group, it is possible to interpolate the missing data for the remaining compounds from the calculated molecular descriptors and a suitable mathematical model [5–7]. Two QSPR modeling strategies have been described in the literature, namely local and global models. Local models are restricted to one specific class of chemically related compounds (e.g., PCBs), whereas global models are developed for a large number of structurally similar groups of compounds (e.g., PCBs, PCNs, PCDDs, PCDFs, etc.). It is widely accepted that local models have better predictive ability than global models [8]. However, global models seem very attractive from an economic point of view, because such a modeling strategy saves resources by predicting new data for a larger number of compounds at a time. The argument against global modeling is that this strategy may lead to mechanistic oversimplifications and/or higher errors in the predicted data [9].

Therefore, there are two fundamental questions related to the topic. First: How significant are the differences in the results obtained using local and global QSPRs? Second, consequently: Is the reduction of the model's domain (to only one group of POPs) really necessary to improve the predictive power of a QSPR model? Our study aimed to answer both questions.

Fig. 1 Chemical structures of the parent molecules of benzenes, biphenyls, dibenzo-p-dioxins, dibenzofurans, diphenyl ethers, and naphthalenes used to construct chlorine-substituted congeners
Materials and methods

Global and local QSPRs

To find the answers, we initially selected one physicochemical property and one congeneric group of POPs, namely water solubility at 25 °C and polychlorinated naphthalenes (PCNs). Then, we performed a detailed comparison between the predictions of the local and global QSPRs for this group. Solubility was selected because it is a property important in estimating both environmental transport and toxicokinetics after entering the body [3]. The group of PCNs (containing 75 congeners) was selected for the case study because the parent molecule (naphthalene) is structurally the simplest polycyclic aromatic hydrocarbon. Moreover, polychlorinated naphthalenes were, historically, the first intentionally synthesized POPs (between the 1910s and 1980s) [10]. The global model was developed jointly for PCNs and 11 other groups of halogenated POPs, namely CBzs, BBzs, PCBs, PBBs, PCDDs, PBDDs, PCDFs, PBDFs, PCDEs, PBDEs, and PBNs (1,436 compounds in total). We hypothesized that the water solubility values obtained from a local QSPR model should not differ substantially from those predicted with a global QSPR model for POPs, owing to the similarity of the carbon skeletons, the levels of halogenation, and the substitution patterns of the studied compounds. To verify whether the hypothesis and conclusions can be extended to other physicochemical properties and groups of POPs, we additionally performed a cross comparison between a few local and global QSPRs collected from the literature.
Development of the global QSPR model

Development of a high-quality QSPR model with good predictive ability requires reliable experimental data on the one hand and appropriate molecular descriptors on the other. The procedure we followed when constructing the global model included five steps:

Step 1: Experimental data collection and splitting the compounds for which data are available into a training set (T) and a validation set (V)

The crucial condition that must be met to obtain a plausible QSPR model is the homogeneity and high quality of the experimental data, because the quality of the data significantly influences the modeling results. One cannot expect the data predicted with the model to be better than the original data used to develop it. In practice, this means that the experimental data should be obtained in a systematic way, according to the same standardized protocol [11]. This stage minimizes the risk of obtaining highly uncertain, extrapolated results from the QSPR modeling.

For the purpose of developing a global QSPR model that quantitatively describes the relationship between the molecular structure of the halogenated POPs and water solubility (log S), we collected experimental water solubility data originally determined at 25 °C. The solubility values for polychlorinated biphenyls (PCBs) were taken from [12, 13], for polychlorinated dibenzo-p-dioxins (PCDDs) from [14], for polychlorinated dibenzofurans (PCDFs) from [14], for polychlorinated/polybrominated diphenyl ethers (PCDEs/PBDEs) from [15, 16], for polychlorinated naphthalenes (PCNs) from [17], and for polychlorinated benzenes (CBz) from [18]. Experimental data were available for 121 halogenated congeners of POPs in total. Logarithmic values of the solubility varied between -2.58 and -10.83 [mol/dm3] (for more details, please refer to the electronic Supplementary material).

Next, the 121 congeners were sorted in order of decreasing water solubility. Then, every fourth compound was moved to the so-called validation set (an additional set for further external validation of the model), while the remaining compounds formed the training set (for developing the model). The application of this "three-to-one" splitting algorithm ensured that both the training and the validation set contained compounds evenly distributed within the range of water solubility [19]. The splitting procedure led to a training set and a validation set consisting of 91 (75%) and 30 (25%) compounds, respectively.
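The "three-to-one" split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the placeholder compound list are hypothetical, and the real input would be the 121 experimental log S values from the Supplementary material.

```python
def three_to_one_split(compounds):
    """Sort (name, logS) pairs by decreasing logS; every fourth goes to validation."""
    ranked = sorted(compounds, key=lambda c: c[1], reverse=True)
    training, validation = [], []
    for position, compound in enumerate(ranked, start=1):
        # Positions 4, 8, 12, ... are moved to the validation set.
        (validation if position % 4 == 0 else training).append(compound)
    return training, validation


# Hypothetical placeholder data: 121 evenly spaced values between -2.58 and -10.83.
compounds = [(f"cmpd-{i}", -2.58 - i * (10.83 - 2.58) / 120) for i in range(121)]
training, validation = three_to_one_split(compounds)
print(len(training), len(validation))  # 91 30
```

Because the compounds are ranked before splitting, both sets span the whole solubility range, which is the property the splitting algorithm is meant to guarantee.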
Step 2: Calculating molecular descriptors

Simultaneously, we combinatorially generated the molecular structures of all chloro- and bromo-substituted congeners (1436 compounds) with the ConGENER [20] software package, which is based on our earlier work on the characterization of combinatorially generated libraries of tautomers [21]. We used those structures as inputs for quantum-mechanical calculations, which included two stages: (i) optimization of the molecular geometry with respect to the energy gradient and (ii) calculation of the descriptors based on the optimized geometry. The calculations were performed at the semi-empirical level of theory with the PM6 method [22] in the MOPAC 2009 software package [23].

We calculated the following 26 molecular descriptors: the number of atoms in the molecule (nAT), the number of halogen substituents (nX), the molecular weight (MW), the standard heat of formation (HOF), the electronic energy (EE), the core–core repulsion energy (Core), the total energy (TE), the total energy of the corresponding cation (TE+), the standard heat of formation in a solution represented by the Conductor-like Screening Model, COSMO (HOFc), the total energy in a solution represented by COSMO (TEc), the vertical ionization potential (IP), the energy of the highest occupied molecular orbital (HOMO), the energy of the lowest unoccupied molecular orbital (LUMO), the X component of the dipole moment (Dx), the Y component of the dipole moment (Dy), the Z component of the dipole moment (Dz), the total dipole moment (Dtot), the solvent accessible surface (SAS), the molecular volume (MV), the lowest negative Mulliken partial charge on the molecule (Q-), the highest positive partial charge on the molecule (Q+), the average polarizability derived from the heat of formation (Ahof), the average polarizability derived from the dipole moment (Ad), Mulliken's electronegativity (EN), Parr and Pople's absolute hardness (Hard), and the Schuurmann MO shift alpha (Shift).
Step 3: Calibration and internal validation of the QSPR model

Having both high-quality experimental data and molecular descriptors, we developed the QSPR model following the gold standards and recommendations of the Organization for Economic Cooperation and Development (OECD) [24]. According to the five OECD recommendations, an ideal QSPR model should be associated with: (i) a defined endpoint; (ii) an unambiguous algorithm; (iii) a defined applicability domain; (iv) appropriate measures of goodness-of-fit, robustness and predictivity; (v) a mechanistic interpretation, if possible.

We employed Partial Least Squares regression combined with a genetic algorithm (GA-PLS) as the chemometric method of modeling. PLS is based on a linear transformation from a large number of original descriptors to a small number of new orthogonal variables, so-called "latent vectors" (LVs), which are linear combinations of the original descriptors [25]. To select the optimal combination of molecular descriptors for the final QSPR model, we employed Holland's genetic algorithm (GA) [26]. The algorithm minimizes the prediction error by searching for the optimal combination of descriptors. The name "genetic" comes from the fact that this mathematical procedure follows the rules of the Darwinian theory of evolution. In this case, however, the rules are applied to "populations" and "generations" of mathematical solutions (i.e., combinations of descriptors), not to populations and generations of living organisms. The algorithm is controlled by a set of steering parameters. In our study, we specified the following: population size, 124; percentage of initial terms, 40%; maximum number of generations, 100; percentage of convergence, 50%; mutation rate, 0.005; cross-over type, double; number of repetitions, 7. GA-PLS calculations were performed with MATLAB 7.6 [27] and PLS Toolbox 5.2 [28].
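For reproducibility, the steering parameters can be kept in a single record, as in the Python sketch below. The class and field names are hypothetical and are not options of PLS Toolbox or any other package; only the values come from the text. The last line additionally assumes that the percentage of initial terms refers to the pool of 26 descriptors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GASettings:
    """Reported GA steering parameters (field names are illustrative only)."""
    population_size: int = 124
    initial_terms_pct: float = 40.0
    max_generations: int = 100
    convergence_pct: float = 50.0
    mutation_rate: float = 0.005
    crossover: str = "double"
    repetitions: int = 7


settings = GASettings()
n_descriptors = 26  # size of the descriptor pool calculated in Step 2
# Assumption: each initial solution contains about this many descriptors.
approx_initial_terms = round(settings.initial_terms_pct / 100 * n_descriptors)
```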
An integral part of QSPR modeling is to describe appropriately the borders of the optimum prediction space of the model. This space, the so-called applicability domain (AD), is defined by the nature of the compounds included in the training set. We verified the applicability domain by means of the Williams plot, which is a plot of the leverage values versus the cross-validated standardized residuals [29, 30]. The leverage value h_i for the ith compound is calculated as follows [31] (Eq. 1):

$$h_i = x_i^{T}\left(X^{T}X\right)^{-1}x_i \qquad (1)$$

where x_i is the vector of descriptors calculated for the considered ith compound and X is the matrix of descriptors calculated for the whole training set. A value of h_i greater than the critical one (h*) means that the structure of the compound differs significantly from the training set and, in consequence, the compound falls outside the optimum prediction space of the model [32]. The warning value h* is calculated according to the formula (Eq. 2):

$$h^{*} = \frac{3\,(p+1)}{n} \qquad (2)$$

where p is the number of variables used in the model and n is the number of training compounds. However, the fact that h_i > h* does not always indicate that the ith training compound is an outlier. It has been shown that training compounds with high leverages and small residuals (differences between the observed and predicted values) stabilize the model and make it more precise. Such points are called "good leverages." Only compounds with high leverages and residuals beyond ±3 standard deviation units (so-called "bad leverages") destabilize the model [33].
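Equations 1 and 2 can be illustrated with a dependency-free Python sketch. The function names and the small matrix X are hypothetical; instead of forming the inverse explicitly, the sketch solves the linear system (X^T X) z = x_i by Gauss-Jordan elimination, which gives the same leverage.

```python
def solve(a, b):
    """Solve a @ z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def leverages(X, queries):
    """h = x^T (X^T X)^-1 x for each query vector x (Eq. 1)."""
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, x))) for x in queries]


def warning_leverage(p, n):
    """h* = 3(p + 1)/n (Eq. 2)."""
    return 3 * (p + 1) / n


# Hypothetical training matrix: 4 compounds described by 2 variables.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
h = leverages(X, X)
h_star = warning_leverage(p=2, n=len(X))
outside_domain = [hi > h_star for hi in h]
```

A useful sanity check is that the leverages of the training compounds sum to the number of variables p. In a Williams plot, each h value is plotted against the standardized residual of the same compound.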
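To demonstrate the robustness of the model and to reduce the probability of overfitting, we performed an internal validation [29, 34]. For this purpose, we employed the leave-one-out cross-validation (CV-LOO) algorithm, in which the same compounds are used alternately for training and validation [30]. Goodness-of-fit (i.e., how well the model fits the data) was measured by the determination coefficient in the training set (R²) and the root mean square error of calibration (RMSEC) (Eqs. 3 and 4), whereas robustness was quantified by the CV-LOO determination coefficient (Q²CV), the absolute average relative deviation (AARD), and the root mean square error of cross-validation (RMSECV) (Eqs. 5–7) [30].

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i^{obs} - y_i^{pred}\right)^{2}}{\sum_{i=1}^{n}\left(y_i^{obs} - \bar{y}^{obs}\right)^{2}} \qquad (3)$$

$$RMSE_{C} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i^{obs} - y_i^{pred}\right)^{2}}{n}} \qquad (4)$$

$$Q^{2}_{CV} = 1 - \frac{\sum_{i=1}^{n}\left(y_i^{obs} - y_i^{predcv}\right)^{2}}{\sum_{i=1}^{n}\left(y_i^{obs} - \bar{y}^{obs}\right)^{2}} \qquad (5)$$

$$AARD = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i^{obs} - y_i^{pred}}{y_i^{obs}}\right| \qquad (6)$$

$$RMSE_{CV} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i^{obs} - y_i^{predcv}\right)^{2}}{n}} \qquad (7)$$

where $y_i^{obs}$ is the experimental (observed) value of the property for the ith compound, $y_i^{pred}$ the predicted value for the ith compound, $y_i^{predcv}$ the predicted value for the temporarily excluded (cross-validated) ith compound, $\bar{y}^{obs}$ the mean experimental value of the property in the training set, and n the number of compounds in the training set.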
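The following Python sketch shows how Eqs. 3–7 fit together in a leave-one-out loop. To stay short and dependency-free it uses a one-variable ordinary least-squares line as a stand-in for the PLS model, so it illustrates the validation statistics rather than GA-PLS itself; the function names and the six data points are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares y = a + b*x (a stand-in for the PLS model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return lambda v: a + b * v


def determination(y_obs, y_hat):
    """1 - SS_res/SS_tot: R2 with fitted values (Eq. 3), Q2_CV with LOO values (Eq. 5)."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_hat))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1 - ss_res / ss_tot


def rmse(y_obs, y_hat):
    """RMSEC with fitted values (Eq. 4), RMSECV with LOO values (Eq. 7)."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_hat)) / len(y_obs))


def aard(y_obs, y_hat):
    """Absolute average relative deviation in percent (Eq. 6)."""
    return 100 / len(y_obs) * sum(abs((o - p) / o) for o, p in zip(y_obs, y_hat))


def loo_predictions(x, y):
    """Predict each compound from a model refitted without that compound."""
    preds = []
    for i in range(len(x)):
        model = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(model(x[i]))
    return preds


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # hypothetical descriptor values
y = [-3.1, -3.9, -5.2, -5.8, -7.1, -7.9]    # hypothetical log S values
model = fit_line(x, y)
y_fit = [model(v) for v in x]
y_cv = loo_predictions(x, y)
r2, q2_cv = determination(y, y_fit), determination(y, y_cv)
rmse_c, rmse_cv = rmse(y, y_fit), rmse(y, y_cv)
```

For a least-squares fit, the cross-validated error can never be smaller than the calibration error, so Q²CV ≤ R² and RMSECV ≥ RMSEC. A large gap between the two pairs is the usual sign of overfitting.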
Step 4: External validation of the developed QSPR model

To confirm the model's predictive power, we carried out an external validation based on compounds that had not previously been involved in the model's optimization and/or calibration [30]. We used the external validation coefficient (Q²Ext) and the root mean square error of prediction (RMSEP) (Eqs. 8 and 9) as measures of the external predictivity.

$$Q^{2}_{Ext} = 1 - \frac{\sum_{j=1}^{k}\left(y_j^{obs} - y_j^{pred}\right)^{2}}{\sum_{j=1}^{k}\left(y_j^{obs} - \bar{y}^{obs}\right)^{2}} \qquad (8)$$

$$RMSE_{P} = \sqrt{\frac{\sum_{j=1}^{k}\left(y_j^{obs} - y_j^{pred}\right)^{2}}{k}} \qquad (9)$$

where $y_j^{obs}$ is the experimental (observed) value of the property for the jth compound, $y_j^{pred}$ the predicted value for the jth compound, $\bar{y}^{obs}$ the mean experimental value of the property in the validation set, and k the number of compounds in the validation set.
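A short Python sketch of Eqs. 8 and 9 is given below. The function names and the three validation values are hypothetical. Following the definition above, the mean in the denominator is taken over the validation set.

```python
from math import sqrt


def q2_ext(y_obs, y_pred):
    """External validation coefficient (Eq. 8), using the validation-set mean."""
    mean_val = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_val) ** 2 for o in y_obs)
    return 1 - ss_res / ss_tot


def rmsep(y_obs, y_pred):
    """Root mean square error of prediction (Eq. 9)."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


# Hypothetical validation compounds: observed and predicted log S.
y_val_obs = [-4.0, -6.0, -8.0]
y_val_pred = [-4.2, -5.9, -8.3]
print(round(q2_ext(y_val_obs, y_val_pred), 4))  # 0.9825
print(round(rmsep(y_val_obs, y_val_pred), 3))   # 0.216
```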
Step 5: Applying the model to predict the endpoint values for new compounds

When the QSPR model fulfills all the validation criteria, it can be applied to predict the property (i.e., water solubility) of those new compounds for which experimental data are not available.
Methodology of comparing local and global QSPR models

Particular local and global models were compared with each other with respect to two aspects: economy and quality. The number of training compounds and the applicability domain of the model represented the economic aspect, whereas the measures of goodness-of-fit, robustness, and predictivity represented the qualitative aspect. In addition, we employed Student's t test to verify whether the average residuals from the predictions with local and global QSPR models differ significantly (p < 0.05).
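The test statistic can be sketched in Python as follows. The text does not state which variant of Student's t test was used; this sketch assumes a paired test, because both models predict the same compounds. The function name and the residual values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(res_a, res_b):
    """Paired t statistic and degrees of freedom for two residual series."""
    diffs = [a - b for a, b in zip(res_a, res_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # stdev is the sample (n - 1) form
    return t, n - 1


# Hypothetical absolute residuals (log units) for the same five compounds.
res_global = [0.30, 0.45, 0.25, 0.50, 0.40]
res_local = [0.10, 0.20, 0.15, 0.25, 0.10]
t_value, dof = paired_t(res_global, res_local)
# Compare |t_value| with the tabulated two-sided critical value for dof;
# for 14 degrees of freedom at the 5% level this is about 2.145.
```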
Results and discussion

Comparing global and local QSPR models of water solubility

As mentioned, we first compared two QSPR models of water solubility (log S) developed by our group. The first model was developed within this study, whereas the second QSPR was taken from one of our previous contributions.

Global QSPR model of water solubility

Applying the five-step QSPR procedure, including the GA-PLS method, we obtained a statistically significant (p < 0.05) global model capable of predicting the values of log S for 1436 halogenated POPs. The model utilized three latent vectors (LVs) that together explain 95% (57% + 17% + 21%) of the total variance in the molecular descriptors and 93% (90% + 2% + 1%) of the variance in the modeled endpoint (log S). Although the GA-PLS method uses orthogonal latent vectors for regression, it is also possible to derive "quasi-regression" coefficients for the original descriptors (Eq. 10), keeping in mind that these coefficients cannot be interpreted individually because they are not independent [25].

$$\log S = -0.287\,\mathrm{nAT} - 0.293\,\mathrm{nX} + 0.191\,\mathrm{LUMO} - 0.320\,\mathrm{SAS} + 0.085\,\mathrm{Q^{+}} + 0.126\,\mathrm{Shift} \qquad (10)$$

The global QSPR was characterized by satisfactory goodness-of-fit, robustness, and external predictive performance (the statistical measures are summarized in Table 1). The correlation between the experimental and predicted values of log S is presented in Fig. 2a.
The model can be interpreted intuitively according to the physicochemical theory of dissolution. The theory divides the whole process into six stages, namely: (i) breaking up solute–solute intermolecular bonds; (ii) breaking up solvent–solvent intermolecular bonds; (iii) formation of a cavity in the solvent phase large enough to accommodate the solute molecule; (iv) vaporization of the solute into the cavity; (v) forming solute–solvent intermolecular bonds; and (vi) reforming solvent–solvent bonds with solvent restructuring. Thus, since the formation of a cavity appropriate for highly halogenated, large molecules requires more energy, the solubility of larger congeners is lower than that of less halogenated and smaller congeners. This factor is represented in the model equation (Eq. 10) by three descriptors, SAS, nAT, and nX, which make a negative contribution to the solubility (i.e., the solubility increases when the solvent accessible surface, the number of atoms, and the number of halogen substituents decrease). Similarly, the descriptors related to electrostatic interactions (e.g., forming hydrogen bonds) between the solvent and solute and to chemical reactivity, namely LUMO, Q+, and Shift, contribute positively to the solubility, because the process of forming solute–solvent intermolecular bonds facilitates dissolution.
Local QSPR model of water solubility

The local model, originally calibrated only for the group of 75 polychlorinated naphthalenes, was adapted from our previous paper [35]. It was based on eight theoretical molecular descriptors, calculated exclusively from the chemical structures at the Density Functional Theory (DFT) level with the B3LYP functional and the 6-311G(d,p) basis set. A combination of those eight descriptors formed one latent vector, which was then used as the independent variable to construct a one-variable GA-PLS model. The model explained 93% of the structural variance (variance in the descriptors) and 96% of the variance in log S. This one-variable model can alternatively be expressed in the quasi-regression form (Eq. 11):

$$\log S = \pm 0.109\,\mathrm{nClp1} \pm 0.123\,\mathrm{HOMO} \pm 0.131\,\mathrm{Hard} \pm 0.129\,\mathrm{Et} \pm 0.131\,\mathrm{SASw} \pm 0.132\,\mathrm{SAVw} \pm 0.131\,\mathrm{DEw} \pm 0.129\,\mathrm{TNEw} \qquad (11)$$

[the signs of the individual coefficients are not legible in this copy; see the original paper [35]]

where nClp1 is the number of chlorine atoms in the first aromatic ring, HOMO the energy of the highest occupied molecular orbital, Hard the molecular hardness, Et the total energy of the molecule, SASw the solvent accessible molecular surface area in water, SAVw the solvent accessible molecular volume in water, DEw the dispersion energy in water, and TNEw the total non-electrostatic energy of solvation.
High values of R², Q²CV, and Q²Ext, as well as low values of the root mean square errors RMSEC, RMSECV, and RMSEP (Table 1), confirmed that the model was well fitted and robust and had good predictive ability. The existence of a strong linear correlation between the observed and predicted values of log S is shown graphically in Fig. 2b. Details of the local QSPR's development can be found in the original paper [35]. It should be mentioned, however, that the interpretation of the descriptors used is very similar to that for the global model. According to our previous contribution [35], the descriptors refer to the cavitation process (SASw and Et) as well as to the dispersive (DEw and TNEw) and electrostatic (nClp1, HOMO, and Hard) interactions.

Table 1 Comparison of statistical parameters between local and global GA-PLS models of log S

| Feature | Measure | Local QSPR model | Global QSPR model |
|---|---|---|---|
| Goodness-of-fit | R² | 0.96 | 0.93 |
| Goodness-of-fit | RMSEC | 0.28 | 0.46 |
| Robustness | Q²CV | 0.96 | 0.93 |
| Robustness | RMSECV | 0.35 | 0.48 |
| Predictivity | Q²Ext | 0.96 | 0.95 |
| Predictivity | RMSEP | 0.20 | 0.39 |

Fig. 2 The experimentally determined values of log S versus the values of log S predicted by the global (a) and local (b) QSPR models
Results of the comparison

A comparison of two QSPR models usually starts with an evaluation of their statistical characteristics. Without doubt, the measures of goodness-of-fit, robustness, and predictivity (Table 1) favor the local QSPR. Higher determination coefficients (R², Q²CV, and Q²Ext) and up to two times lower root mean square errors for both the training and the validation sets, in comparison with the global model, showed that the local model was more accurate and performed better in exploring the relationships between the structure and water solubility of POPs. This conclusion is also supported by the analysis of two plots (Fig. 3) presenting the residuals calculated for chloronaphthalenes from the predictions with the global and local QSPRs. Note that the residuals were calculated only for the 15 PCNs for which experimentally determined water solubility data were available. For the local model, which covered a narrow calibration domain (consisting of very similar chloronaphthalene congeners only), the prediction errors were considerably lower than those of the global model with its wider domain (all POPs). Using Student's t test, we confirmed that the average residuals (for the 15 PCNs) of the two models differed significantly (t = 4.40, p = 0.0006).

Therefore, from the qualitative point of view, the local model should be recommended as the more accurate and precise one. However, the performance of the evaluated global model for POPs was still fairly good in comparison with other, more general QSPRs. For instance, Delaney [36] compiled statistics for 10 recently published QSPR models of water solubility, calibrated on training sets containing between 150 and 2874 compounds. The models' predictivity was then tested on the same 21 compounds having a common chemical structure. The author found that the standard errors of prediction for those 21 chemicals varied between 0.55 and 0.91 logarithmic units. Given that the highest residual observed for our "worse" global model for POPs was about one logarithmic unit, it may be concluded that our global model predicts water solubility up to three times better than the general models reviewed by Delaney.
From the economic point of view, an optimal QSPR model should have two features: (i) it should be based on as small a number of training/validation compounds as possible, so that no extensive experimental work is needed, and, simultaneously, (ii) it should allow predictions within as wide an applicability domain as possible. The minimal number of compounds required for developing a QSPR model is defined by the ratio between the number of training compounds and the number of descriptors. According to the criterion proposed by Topliss and Costello [37], this ratio should be at least 5:1. The local model, which utilized one variable (latent vector), was calibrated on 10 training compounds, whereas the global model, which utilized three latent vectors, was calibrated on 91 training compounds. Thus, both models met the criterion, since the ratios were 10:1 for the local model and about 30:1 for the global model.
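The arithmetic behind this check is simple enough to show directly. In the Python sketch below the function names are hypothetical, while the compound and variable counts are those reported above.

```python
def compounds_per_variable(n_training, n_variables):
    """Number of training compounds available per model variable."""
    return n_training / n_variables


def meets_topliss_costello(n_training, n_variables, minimum=5.0):
    """True when at least `minimum` training compounds per variable are available."""
    return compounds_per_variable(n_training, n_variables) >= minimum


local_ratio = compounds_per_variable(10, 1)    # 10.0
global_ratio = compounds_per_variable(91, 3)   # about 30.3
print(meets_topliss_costello(10, 1), meets_topliss_costello(91, 3))  # True True
```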
There are no formal requirements regarding the number of validation compounds, but different authors give recommendations based on their experience. For instance, Gramatica [30] recommends having at least five compounds to perform an appropriate external validation. Both models fulfilled this recommendation, since the number of validation compounds was 5 for the local and 30 for the global QSPR. However, in our experience, when the validation set is small (about 10 compounds or fewer), the results of external validation may be less reliable, because in such a case the validation statistics (Q²Ext and RMSEP) depend strongly on the splitting algorithm. Indeed, they can change significantly when one validation compound is replaced with another [38]. Therefore, the external validation of our global model of log S seems to be more reliable than that of the local one.

Fig. 3 Residual values (in log units) calculated for 15 chloronaphthalene congeners based on the predictions with the global (a) and local (b) QSPR models
The applicability domains of both models were verified by using Williams plots (Fig. 4). The global model was calibrated and validated on congeners of CBzs (10 training and 2 validation compounds), PCDEs (25 training and 6 validation compounds), PBDEs (6 training and 3 validation compounds), PCBs (24 training and 10 validation compounds), PCDDs (11 training and 4 validation compounds), PCDFs (6 training and 2 validation compounds), and PCNs (9 training and 2 validation compounds). The water solubility of all validation compounds was predicted with residuals within the critical threshold values (0 ± 3 standard deviations). This means that the model can be successfully applied for predicting the values of log S for all seven groups of POPs listed above. Interestingly, three compounds from the training set (Fig. 4a) had leverage values higher than the critical one (h* = 0.14). These compounds are perchlorinated benzene (CBz-12), perchlorinated naphthalene (PCN-75), and perchlorinated biphenyl (PCB-209). At the same time, however, their residuals were low. This suggests that the model is well stabilized by the existence of so-called "good leverage points." In addition, the model is probably capable of making reliable predictions for compounds that do not differ substantially from the training set but are formally situated outside the applicability domain. This last conclusion, however, should be confirmed by further testing with an additional validation set of compounds that have high leverage values. Similarly, low residuals and leverage values for all 10 training and 5 validation compounds (Fig. 4b) confirmed that the local model can be applied to make satisfactory predictions of water solubility within the group of chloronaphthalenes.
Fig. 4 Williams plots describing the applicability domains of the global (a) and local (b) QSPR models. Dotted lines represent the residual threshold (0 ± 3 standard deviation units) and the critical leverage value (h*), respectively

The last aspect that should be taken into account when comparing both models is the selection of molecular descriptors employed in each case. One may be surprised that we compared two models using descriptors calculated at different levels of theory (i.e., the global model was developed on the basis of molecular descriptors from semiempirical PM6 calculations, whereas the local model used DFT descriptors). However, we previously demonstrated [39] that the differences in the numerical values of molecular descriptors for POPs calculated with the two methods can be neglected. We showed that QSPR models employing descriptors calculated with recent semiempirical methods (PM6 and RM1) were of similar accuracy to models using descriptors from DFT (B3LYP functional with the 6-311G(d,p) basis set). This level of accuracy was out of reach for models employing earlier semiempirical methods (e.g., PM3 and AM1).

Moreover, it may be unclear why, when both model equations (Eqs. 10 and 11) for the same property (log S) are put side by side, the selected descriptors are different (e.g., LUMO in Eq. 10 and HOMO in Eq. 11). To clarify these apparent contradictions, one needs to refer to the theory of dissolution (described in section "Global QSPR model of water solubility") and to keep in mind the following three important issues.
First, the quantum-mechanical descriptors that we used are intercorrelated. Thus, they form groups of descriptors related to the same "global" property (latent vectors) and therefore have very similar meanings. In consequence, one descriptor from a particular group (latent vector) can be replaced with another one from the same group without changing the global interpretation of the model. For instance, in a group of chlorinated congeners, both the total energy and the solvent accessible surface area depend mainly on the number of chlorine substituents present in molecules with the same carbon skeleton. Therefore, in this context, both descriptors have very similar meanings. For that reason, we decided to use the PLS method of modeling instead of the much simpler and more intuitively interpretable multiple linear regression (MLR).

Second, the molecular descriptors for both the local and the global model were selected with the genetic algorithm. The algorithm is, in fact, an automatic probability-based procedure, blind to mechanistic interpretation. As a result, when the algorithm has a choice between two strongly correlated descriptors related to the same "global" property (see above), it may select either of them purely by chance.

Third, a local model developed for only one congeneric group (i.e., polychlorinated naphthalenes) is much more sensitive to the number of substituents (chlorine atoms) and the substitution pattern than a global model calibrated for more groups, in which the main differences between particular compounds are related to their carbon skeletons (i.e., the number of aromatic rings, the presence of heteroatoms, etc.). Hence, one should not expect exactly the same model equations for the global and local models compared in our study.
In the context of the dissolution mechanism, three structural features ("global" properties) of POP congeners seem to be very important: (i) the size of the parent molecule (carbon skeleton), (ii) the type and number of substituents present in the molecule, and (iii) the substitution pattern. The first "global" property is obviously related to the cavitation process. We observed that solubility decreases with increasing size of the molecule. The type and number of substituents are, of course, also strongly related to the size and, consequently, to the cavitation stage. Generally, molecules based on the same skeleton and substituted with the same number of bromine atoms are less soluble than their chlorinated analogues, owing to the larger radius of bromine substituents in comparison with chlorine atoms. Similarly, solubility decreases with an increasing number of halogen substituents (e.g., monochloronaphthalenes are more soluble than dichloronaphthalenes). The descriptors related to the factors influencing the cavitation process (the size of a molecule and the type and number of substituents) are nAT, nX, and SAS in Eq. 10, as well as Et, SASw, SAVw, DEw, and TNEw in Eq. 11.

The substitution pattern is the main factor determining differences in solubility between congeners containing the same number of substituents of the same type. Differences in the distribution of the substituents (over the same carbon skeleton) determine differences in the polarity of particular congeners. For example, 1,2,3,4-tetrachloronaphthalene is more polar than 2,3,6,7-tetrachloronaphthalene. Consequently, electrostatic interactions with water as a solvent are stronger in the first case, and the first congener in this pair is more soluble. Interestingly, as we demonstrated in many previous contributions [35, 40–42], descriptors such as HOMO and LUMO depend strongly on the substitution pattern. Thus, in this study they should not be interpreted as describing the redox properties of the molecules (according to the well-known Koopmans' theorem), but rather their substitution patterns. Other descriptors related to the substitution pattern are Q+ and Shift in Eq. 10, as well as nClp1 and Hard in Eq. 11. Therefore, the mechanistic interpretation of the global and local QSPR models is very similar.
In summary, from the economic point of view, both models are acceptable, since they require a relatively small amount of experimental data. In fact, both are based on data taken from the literature, so no extensive empirical work was necessary. However, the use of the global QSPR would be more profitable, because it allows predictions for those groups of POPs for which the amount of experimental data is insufficient to develop appropriate local models. For example, experimentally determined water solubility data are available for only eight congeners of PCDFs, which is evidently too few for calibrating and validating a local model. Moreover, the time and, in consequence, the costs of obtaining the predicted values of log S can be significantly reduced by employing the global modeling scheme.
ComparingotherglobalandlocalQSPRmodelsInaddition,toextendtheinvestigationsontheotherphys/chemproperties,weperformedsimilarpairwisecompari-sonsfortheother,previouslypublishedQSPRmodels.
Weusedtwoofourpreviousglobalmodelsdevelopedforpredictingn-octanol/waterpartitioncoefcient(logKOW)[40]andsubcooledliquidvaporpressure(logPL)[42],StructChem(2011)22:873–884881123respectively,foragroupof1436POPs,includingchloro-andbromo-analoguesofdibenzo-p-dioxins,dibenzofurans,biphenyls,naphthalenes,diphenylethers,andbenzenes.
The models were compared with two corresponding local models published by other groups. The first one was designed to predict log KOW for 209 PCBs [43], whereas the second one predicts the values of log PL for 210 congeners of PCDDs and PCDFs [44]. Interestingly, the conclusions from both comparisons (based on predicting log KOW and log PL) are even more optimistic than those for log S. The statistical measures of goodness-of-fit and robustness were very similar in pairs for corresponding global and local models (Table 2).
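For readers less familiar with the measures compared in Table 2, the following Python sketch shows how goodness-of-fit (R^2, RMSEC) and robustness (Q^2_CV, RMSECV) are commonly computed. It rests on assumptions: the formulas are the usual textbook definitions rather than ones restated in this section, the cross-validation is leave-one-out, the data are hypothetical, and a one-descriptor least-squares line replaces the GA-PLS models.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq / len(observed))


def explained_variance(observed, predicted):
    """1 - SS_res / SS_tot; gives R^2 for fitted values, Q^2 for CV predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def leave_one_out(x, y):
    """Predict each compound from a model calibrated without it."""
    predictions = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(a + b * x[i])
    return predictions


# Hypothetical descriptor values and property values for six compounds.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [-4.9, -5.6, -6.1, -6.8, -7.2, -8.1]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
cv_pred = leave_one_out(x, y)

r2, rmsec = explained_variance(y, fitted), rmse(y, fitted)
q2_cv, rmsecv = explained_variance(y, cv_pred), rmse(y, cv_pred)

print(f"R2 = {r2:.3f}, RMSEC = {rmsec:.3f}")
print(f"Q2_CV = {q2_cv:.3f}, RMSECV = {rmsecv:.3f}")
```

The external measures in Table 2 (Q^2_Ext, RMSEP) follow the same pattern, with predictions made for validation compounds that were never used in calibration.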
Moreover, the differences between the residuals (experimentally measured minus predicted values) obtained with the two modeling methods (i.e., local and global) were not statistically significant (p > 0.05) (Table 3), which was consistent with our assumption. Considering that (i) both global models have been developed for a much wider applicability domain (1436 compounds, of which about 85% are not covered by the corresponding local model) and (ii) they practically did not differ from their local counterparts in quality, we concluded that employing global QSPRs would be much more efficient than developing particular local ones.
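A residual comparison of this kind can be sketched as follows. Several points are assumptions: the text does not state whether the Student t test was paired or unpaired, so a paired test on the same compounds is used here; the residuals are hypothetical; and the p-value is obtained by numerically integrating the t density so that only the Python standard library is needed.

```python
import math
from statistics import mean, stdev


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central)


def paired_t_test(res_a, res_b):
    """Paired t test on two residual series for the same compounds."""
    diffs = [a - b for a, b in zip(res_a, res_b)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_value, two_sided_p(t_value, n - 1)


# Hypothetical residuals (observed minus predicted) for the same eight compounds.
res_local = [0.10, -0.20, 0.05, 0.30, -0.15, 0.00, 0.25, -0.10]
res_global = [0.15, -0.10, -0.05, 0.35, -0.25, 0.10, 0.20, -0.05]

t_stat, p_value = paired_t_test(res_local, res_global)
verdict = "not significant" if p_value > 0.05 else "significant"
print(f"t = {t_stat:.2f}, p = {p_value:.3f} ({verdict} at the 0.05 level)")
```

With a library such as SciPy available, the hand-written integration could be replaced by a ready-made test function; the standard-library version is used here only to keep the example self-contained.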

Conclusions

We have verified the efficiency of two modeling strategies. The first one assumes the reduction of the model's domain and the development of a QSPR based on a small number of structurally similar compounds (local QSPR). According to the second one, the model is calibrated with a wider and more structurally diversified training set (global QSPR), even if this leads to a small decrease in the model's predictivity. Based on the obtained results, we recommend that whenever global models fulfill all quality criteria proposed by the Organisation for Economic Co-operation and Development (OECD), they should be applied in practice without the necessity of developing a series of local QSPRs. Such a recommendation is reasonable for three reasons. First, the global models allow for simultaneous predictions of physicochemical properties for as many as hundreds of compounds. This feature is very important from the economic point of view, considering that the number of new chemicals synthesized and/or identified in the environmental compartments is growing exponentially. Second, the global modeling approach may be the only possibility of modeling when the number of chemicals from one specific class of chemically related compounds is insufficient to calibrate and appropriately validate a local QSPR model. Third, as demonstrated, the performance (predictive ability) of global models is not always worse than that of local ones.

Table 2 Statistical parameters of local and global models of log PL and log KOW

Model    Feature          Measure  Local QSPR model  Global QSPR model
log PL   Goodness-of-fit  R^2      0.99 (a)          0.97
                          RMSEC    n/a               0.21
         Robustness       Q^2_CV   0.96 (a)          0.97
                          RMSECV   n/a               0.22
         Predictivity     Q^2_Ext  n/a               0.97
                          RMSEP    n/a               0.22
log KOW  Goodness-of-fit  R^2      0.91 (b)          0.92
                          RMSEC    0.23 (b)          0.32
         Robustness       Q^2_CV   0.91 (b)          0.92
                          RMSECV   0.23 (b)          0.32
         Predictivity     Q^2_Ext  n/a               0.92
                          RMSEP    n/a               0.30

(a) Taken from [43]; (b) taken from [44]; n/a: no data provided in the original paper

Table 3 Comparison between the residuals derived from the predictions of log PL and log KOW with local and global GA-PLS models by the Student t test

Model    Statistics      Value
log PL   Test statistic  2.01
         p value         0.051
log KOW  Test statistic  1.81
         p value         0.072

Acknowledgments The authors thank the editors for rapidly considering our submission and the anonymous reviewer for valuable comments, which helped to improve the scientific quality of this contribution. T. P. thanks the Foundation for Polish Science for granting him a fellowship and a research grant in the frame of the HOMING Program supported by the Norwegian Financial Mechanism and the EEA Financial Mechanism in Poland. This work was supported by the Polish Ministry of Science and Higher Education, Grant No. DS/8430-4-0171-11. This research was supported in part (to M. H.) by the U.S. Department of Energy under contract DE-AC02-05CH11231.

Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

1. Yang G, Zhang X, Wang Z, Liu H, Ju X (2006) Estimation of the aqueous solubility (-lgSw) of all polychlorinated dibenzo-furans (PCDF) and polychlorinated dibenzo-p-dioxins (PCDD) congeners by density functional theory. J Mol Struct Theochem 766:25–33
2. Rotkin-Ellman M, Navarro KM, Solomon GM (2010) Gulf oil spill air quality monitoring: lessons learned to improve emergency response. Environ Sci Technol 44:8365–8366
3. UNEP (2001) Stockholm convention on persistent organic pollutants. United Nations Environment Programme, Geneva
4. Haranczyk M, Puzyn T, Ng EG (2010) On enumeration of congeners of common persistent organic pollutants. Environ Pollut 9:2786–2789
5. Schultz TW, Cronin MTD, Walker JD, Aptula AO (2003) Quantitative structure–activity relationships (QSARs) in toxicology: a historical perspective. J Mol Struct Theochem 622:1–22
6. Castro EA, Toropova AP, Toropov AA, Mukhamedjanova DV (2005) QSPR modeling of Gibbs free energy of organic compounds by weighting of nearest neighboring codes. Struct Chem 16:305–324
7. Golmohammadi H, Dashtbozorgi Z (2010) Quantitative structure–property relationship studies of gas-to-wet butyl acetate partition coefficient of some organic compounds using genetic algorithm and artificial neural network. Struct Chem 21:1241–1252
8. Öberg T, Liu T (2008) Global and local PLS regression models to predict vapor pressure. QSAR Comb Sci 27:273–279
9. Lei B, Ma Y, Li J, Liu X, Yo X, Gramatica P (2010) Prediction of the adsorption capability onto activated carbon of a large data set of chemicals by local lazy regression method. Atmos Environ 44:2954–2960
10. Hayward D (1998) Identification of bioaccumulating polychlorinated naphthalenes and their toxicological significance. Environ Res 76:1–18
11. Dearden JC, Cronin MTD, Kaiser KLE (2009) How not to develop a quantitative structure–activity or structure–property relationship (QSAR/QSPR). SAR QSAR Environ Res 20:241–266
12. Dunnivant FM, Elzerman AW (1988) Aqueous solubility and Henry's law constant for PCB congeners for evaluation of quantitative structure-property relationships (QSPRs). Chemosphere 17:525–541
13. Miller MM, Ghodbane S, Wasik SP, Tewari YB, Martire DE (1984) Aqueous solubilities, octanol/water partition coefficients, and entropies of melting of chlorinated benzenes and biphenyls. J Chem Eng Data 29:184–190
14. Govers HAJ, Krop HB (1998) Partition constants of some chlorinated dibenzofurans, and dibenzo-p-dioxins. Chemosphere 37:2139–2152
15. Ruelle P, Kesselring UW (1997) Aqueous solubility prediction of environmentally important chemicals from the mobile order thermodynamics. Chemosphere 34:275–298
16. Tittlemier SA, Halldorson T, Stern GA, Tomy GT (2002) Vapor pressures, aqueous solubilities, and Henry's law constants of some brominated flame retardants. Environ Toxicol Chem 21:1804–1810
17. Opperhuizen A, Velde EW, Gobas FAP, Liem DAK, Steen JMD (1985) Relationship between bioconcentration in fish and steric factors of hydrophobic chemicals. Chemosphere 14:1871–1896
18. Doucette WJ, Andren AW (1988) Aqueous solubility of selected biphenyl, furan, and dioxin congeners. Chemosphere 17:243–252
19. Hewitt M, Cronin MTD, Madden JC, Rowe PH, Johnson C, Obi A, Enoch SJ (2007) Consensus QSAR models: do the benefits outweigh the complexity? J Chem Inf Model 47:1460–1468
20. Haranczyk M, Puzyn T, Sadowski P (2008) ConGENER – a tool for modeling of the congeneric sets of environmental pollutants. QSAR Comb Sci 27:826–833
21. Haranczyk M, Gutowski M (2007) Quantum mechanical energy-based screening of combinatorially generated library of tautomers. TauTGen: a tautomer generator program. J Chem Inf Model 47:686–694
22. Stewart JJP (2007) Optimization of parameters for semiempirical methods V: modification of NDDO approximations and application to 70 elements. J Mol Model 13:1173–1213
23. Stewart JJP (2009) In: Chemistry SC (ed) MOPAC2009 http://openmopac.net/MOPAC2009.html. Accessed 14 April 2010
24. OECD (2004) OECD principles for the validation, for regulatory purposes, of (quantitative) structure–activity relationship models. In: 37th joint meeting of the chemicals committee and working party on chemicals, pesticides and biotechnology. Organisation for Economic Co-Operation and Development, Paris
25. Wold S, Sjöström M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemometr Intell Lab Syst 58:109–130
26. Holland J (1992) Adaptation in natural and artificial systems. MIT Press, Michigan, MI
27. MATLAB (2008) MATLAB 7.6.0.324. Mathworks
28. PLS_Toolbox (2009) PLS_Toolbox 5.2. Eigenvector Research Inc., Wenatchee, WA
29. Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb Sci 22:69–77
30. Gramatica P (2007) Principles of QSAR models validation: internal and external. QSAR Comb Sci 26:694–701
31. Atkinson AC (1985) Plots, transformations, and regression. An introduction to graphical methods of diagnostic regression analysis. Oxford Statistical Science Series, Oxford
32. Netzeva TI, Worth AP, Aldenberg T, Benigni R, Cronin MTD, Gramatica P, Jaworska J, Kahn S, Klopman G, Marchant CA, Myatt G, Nikolova-Jeliazkova N, Patlewicz GY, Perkins R, Roberts DW, Schultz TW, Stanton DT, van de Sandt JJM, Tong W, Veith G, Yang C (2005) Current status of methods for defining the applicability domain of (quantitative structure–activity relationships). The report and recommendations of ECVAM workshop 52. Altern Lab Anim 33:155–173
33. Jaworska J, Nikolova-Jeliazkova N, Aldenberg T (2005) QSAR applicability domain estimation by projection of the training set in descriptor space: a review. Altern Lab Anim 33:445–459
34. OECD (2007) Guidance document on the validation of (quantitative) structure–activity relationships (QSAR) models. Organisation for Economic Co-Operation and Development, Paris
35. Puzyn T, Mostrag A, Falandysz J, Kholod Y, Leszczynski J (2009) Predicting water solubility of congeners: chloronaphthalenes – a case study. J Hazard Mater 170:1014–1022
36. Delaney JS (2005) Predicting aqueous solubility from structure. Drug Discov Today 10:289–295
37. Topliss JG, Costello RJ (1972) Chance correlations in structure–activity studies using multiple regression analysis. J Med Chem 15:1066–1068
38. Puzyn T, Mostrag-Szlichtyng A, Gajewicz A, Skrzynski M, Worth AP (2011) Investigating the influence of data splitting on the predictive ability of QSAR/QSPR models. Struct Chem
39. Puzyn T, Suzuki N, Haranczyk M, Rak J (2008) Calculation of quantum-mechanical descriptors for QSPR at the DFT level: is it necessary? J Chem Inf Model 48:1174–1180
40. Puzyn T, Suzuki N, Haranczyk M (2008) How do the partitioning properties of polyhalogenated POPs change when chlorine is replaced with bromine? Environ Sci Technol 42:5189–5195
41. Puzyn T, Mostrag A, Suzuki N, Falandysz J (2008) QSPR-based estimation of the atmospheric persistence for chloronaphthalene congeners. Atmos Environ 42:6627–6636
42. Gajewicz A, Haranczyk M, Puzyn T (2010) Predicting logarithmic values of the subcooled liquid vapor pressure of halogenated persistent organic pollutants with QSPR: how different are chlorinated and brominated congeners? Atmos Environ 44:1428–1436
43. Padmanabhan J, Parthasarathi R, Subramanian V, Chattaraj PK (2006) QSPR models for polychlorinated biphenyls: n-octanol/water partition coefficient. Bioorg Med Chem 14:1021–1028
44. Yang P, Chen J, Chen S, Yuan X, Scharmm KW, Kettrup A (2003) QSPR models for physicochemical properties of polychlorinated diphenyl ethers. Sci Total Environ 305:65–76