subgroupivybridge

ivybridge  时间:2021-03-28  阅读:()
FastSoftwareImplementationofBinaryEllipticCurveCryptographyManuelBluhm1andShayGueron2,31RuhrUniversityBochum,Germany2UniversityofHaifa,Israel3IntelCorporation,IsraelDevelopmentCenter,IsraelNovember10,2013Abstract.
ThispaperpresentsanecientandsidechannelprotectedsoftwareimplementationofpointmultiplicationforthestandardNISTandSECGbinaryellipticcurves.
TheenhancedperformanceisachievedbyimprovingtheL`opez-Dahab/Montgomerymethodatthealgorithmiclevel,andbyleveragingIntel'sAVXarchitectureandthepclmulqdqpro-cessorinstructionatthecodinglevel.
Thefastcarry-lessmultiplicationisfurtherusedtospeedupthereductiononthenewestHaswellplatforms.
FortheveNISTcurvesoverGF(2m)withm∈{163,233,283,409,571},theresultingpointmultiplicationimplementationisabout6to12timesfasterthanthatofOpenSSL-1.
0.
1e,enhancingtheECDHEandECDSAalgorithmssignicantly.
Keywords:ECC,binaryellipticcurves,pointmultiplication,pclmulqdq,AVX,ECDH,ECDSA,sidechannelprotection,NIST,L`opez-Dahab.
1IntroductionAllSSL/TLScommunicationsstartwithaclient-serverkeyestablishment,basedonapublickeycryptosystem.
TheclassicalprotocolusestheRSAalgorithmforencryption/decryptionofasessionkey,butitsuersfromthefollowingdraw-back:thecondentialityofthesessionsdependsdirectlyonthesecurityoftheserver'sprivateRSAkey.
Especiallymotivatedbyrecentdisclosures,thecurrenttrendsaretomovetoprotocolsproviding(perfect)forwardsecrecy,suchastheauthenticatedDie-HellmanEphemeral(DHE)keyexchange.
OneapproachistouseEllipticCurveCryptography(ECC)toecientlyperformthiskindofkeyexchange(ECDHE).
Similarly,ECCbasedsignatureschemes(ECDSA)becomeattractivebecauseoftheirrelativelyshortkeys.
Thus,softwareimplementationsofECCforthehighendserverplatformsbecomeatargetforoptimization.
ProtocolssuchasTLS1.
xsupportellipticcurvesoverbothprimeandbi-naryelds,andtheimportantcryptographiclibrariesincludeimplementationsforbothtypesofECC.
Forinstance,theOpenSSLlibrarythatunderliesthemodsslmoduleofApache,implementstheECDHE-ECDSA,ECDHE-RSAand1.
INTRODUCTIONotherkeyagreementprotocolsoverbinaryandprimeelds.
Itisinterestingtonotethatfromthehardwareimplementationcost,protocolsusingbinarycurvesarecheapertoimplementthanprimeeldcurves,whichcanbesignicantforlow-endclientdevices.
However,onthesoftwareimplementationside,binarycurveshavebeenlessecientthantheirprimecurvescounterpartsonhighendprocessors.
Thiswasmainlycausedbythefastintegermultiplicationunitscomparedtoslowcarry-lessmultiplication.
InOpenSSL-1.
0.
1eforexample,theprimecurvesecp256k1outperformesit'sbinarycounterpartsect283k1bysev-eralmagnitudes.
Thispaperfocusesonsoftwareoptimizationofpointmultiplicationoverbinaryellipticcurves,leveragingtheAVXinstructionsetandthenewlyop-timizedcarry-lessmultipliersonhighendx86-64platforms.
Targetingperfor-manceandsecuritybyfast,sidechannelprotectedcode,webuiltalibraryfortheunderlyingbinaryeldarithmeticforthecommoneldsspeciedbyNISTandSECG[17],coveringsecuritylevelsfrom80bitsupto256bits.
Table1showsthebinaryellipticcurvesoverGF(2m)(bothrandomandKoblitzcurves)thatareoptimizedinthispaper.
Itfurtherdisplaystheireldrepresentationandtheequivalentbitlevelsecurity,and(inparentheses)thecorrespondingRSAkeylength.
Ontheleftcolumn,"1"indicatesthattherespectivecurveisstan-dardizedbyNIST(theNationalInstituteofStandardsandTechnology);"2"indicatesacurvestandardizedbySECG(StandardsforEcientCryptographyGroup);"3"indicatesthatthecurveispartoftheWirelessTransportLayerSecurity(WTLS).
CurveidentierTypemBitlevelsecurity(Equiv.
RSAkeylength)FieldpolynomialforreductioninGF(2m)sect163k11,2,3Koblitz16380(1024)x163+x7+x6+x3+1sect163r12Randomsect163r21,2Randomsect193r12Random19396(1536)x193+x15+1sect193r22Randomsect233k11,2,3Koblitz233112(2240)x233+x74+1sect233r11,2,3Randomsect239k12Koblitz239115(2304)x239+x158+1sect283k11,2Koblitz283128(3456)x283+x12+x7+x5+1sect283k11,2Randomsect409k11,2Koblitz409192(7680)x409+x87+1sect409r11,2Randomsect571k11,2Koblitz571256(15360)x571+x10+x5+x2+1sect571r11,2RandomTable1:Targetedbinaryellipticcurvesandtheirsecuritylevel22.
THEIMPROVED2PALGORITHM2Theimproved2PalgorithmGivenanellipticcurveEwith|E|=|GE|·|HE|=n·h(here:h∈{2,4}),theessentialoperationofellipticcurvecryptosystemsismultiplyingapointP∈GEonthecurve,byapositiveintegerk∈[1,n],toobtainanewpointQ=k·P=P+P+.
.
.
+Pk-times=k1i=0P.
Montgomery[15]introducedafastapproachforthispointmultiplication.
Itusesthefactthatthesumoftwopointswhosedierenceisknown,canbeeval-uatedwithaprocedurethatinvolvesonlythex-coordinateofthesepointswhenperformingonepointdoubling,andonepointadditionperiteration.
LopezandDahab(LDhereafter)improvedthisalgorithmforbinarycurvesoverGF(2m),byeliminatingcostlyinversionoperationsintheeld(whengiveninpolynomialrepresentation).
Intheirso-called2P-Algorithm[14],thepointP=(x,y)isrepresentedintheLD-projectiveformP=(X/Z,Y/Z2).
Thex-coordinateofthesumP0+P1andof2PiisthefractionX/Z,where2Pi:P0+P1:X=X4i+b·Z4iX=x·Z+(Z0·X1)·(Z1·X0)Z=X2i·Z2iZ=(Z0·X1)+(Z1·X0)Thecomputationalcostoftheresultingalgorithmisdominatedbytheperfor-manceofthepointaddition(Madd)andpointdoubling(Mdouble),whichareexe-cutedlog2(k)timesinapointmultiplicationkP.
Here,wespeedupthepointmultiplicationbyreducingtheamountofcomputationsinMaddandMdouble.
2.
1ImprovedMdoubleByre-orderingtheowoftheMdouble,werstmultiplyZ2withtheprecom-putedvaluec=√b,thenaddtheresulttoX2,squarethisagainandobtaintheresultX←(X2+c·Z2)2=X4+b·Z4.
Thissavesonesquaringandonereductionoperationperrun.
Algorithm2showstheimprovedMdoubleow.
ForKoblitzcurves,wehaveb=1andthereforec=b2m1=1,soanothermultiplicationandreductioncanbesaved,asshowninAlgorithm3.
2.
2ImprovedMaddTheMaddfunctioncanbeimprovedbydeferringreductionoperations(wecallitlazyreduction,after[21]).
Algorithm4showstheimprovedMaddow,wherewedeferthereductionafterSteps4and8,andreducetheresultonlyinStep9.
Therationaleisthatdoublesizedadditionischeaperthanareduction(theactualgaindependsonthespeciceld).
33.
SIDECHANNELPROTECTEDPOINTMULTIPLICATION2.
3EstimatingtheimpactoftheimprovedMdoubleandMaddTable2comparesthenumberoftheeldoperationsinAlgorithms2,3,and4,comparedtotheiroriginalform.
Forsimplicity,wecountthereductionasaeldoperationhere.
FieldOperationsMdoubleKoblitzMdoubleRandomMdoubleOriginalMaddImprovedMaddOriginal#MUL1log2(k)2log2(k)2log2(k)4log2(k)4log2(k)#RED4log2(k)5log2(k)6log2(k)4log2(k)5log2(k)#SQR3log2(k)3log2(k)4log2(k)log2(k)log2(k)#ADDlog2(k)log2(k)log2(k)3log2(k)2log2(k)Table2:Thenumberofeldoperationsforpointmultiplication,withtheoriginalandtheimprovedMdoubleandMaddalgorithms.
3SidechannelprotectedpointmultiplicationThissectiondiscussesthepotentialthreatandcountermeasuresofsoftwaresidechannelattacks,whereanadversaryattemptstocollectinformationaboutthesecretkeysthatarebeingused.
Today,itisknownthatsoftwareimplementationsofcryptographicprimitivesneedtotakesuchsidechannelattacksintoaccount,andaddappropriatecountermeasures.
3.
1ProtectionrequirementsAswithothercryptographicschemes,implementationsofECChavebeenshowntobesusceptibletosoftwaresidechannelattacks.
Eventheapplicabilityofre-motetimingattacksonaECCimplementationhasbeendemonstrated[7].
WeconcentratehereonprotectingthesoftwareimplementationoftheLD/Mont-gomerypointmultiplication,withoutcoveringthehigherimplementationpartsofthevariousprotocolslikeECDHEorECDSA.
Timingattacksexploittheoccurrenceofvariantexecutiontimesfordierentinputs.
Thiskindofattackcanbemountedwheneveracorrelationbetweenthekeyandtheexecutiontimeisfound.
Further,wetakecachebasedtimingattacksintoaccount,wheretheadversaryisassumedtobeabletorunanunprivilegedprocesstomeasureinstructionanddatacachelatencies.
Thus,keydependentbranchesandtableaccessesshouldbeavoidedormasked.
ToavoidtimingattacksintheLD/Montgomerypointmultiplicationonemustensurethat1.
allinputs(k,P)tothepointmultiplicationfunctionarevalid,meaningIntegerkisintheinterval[1,n]withn=|P|khasconstantbitlength(forthesamegroup)Pisavalidpointonthecurve.
2.
thepointmultiplicationanditssub-functionsareexecutedinconstanttime,3.
nokeydependentcodebranchesordataaccessesexist.
43.
SIDECHANNELPROTECTEDPOINTMULTIPLICATION3.
2ProtectionstrategyThelargevarietyofattacksmakesitverydiculttoprotectanimplementationagainstallpossiblesidechannelthreats.
Inthefollowing,wewillshowthemech-anismstopreventthepreviouslymentionedthreatsforserverimplementations.
ConstantkeylengthisarststeptoassertaconstantrunningtimeofthepointmultiplicationkP.
Although,theinputvalidationofthescalarkandthepointPshouldbeperformedpriortothepointmultiplication,weprovideabitlengthxforallvalidk∈[1,n]withnorderofthelargesubgroupGEofE.
Atrst,wealwaysaddntok,sothatk=k+nandkP=kP.
Thebitlengthofkisthenlog2(n)≤log2(k)≤log2(n)+1,sincetheresultofanadditionoftwointegersnandk≤nisnotlongerthanlog2(n)+1bits.
Tomakethebitlengthxedforallcasesofk,onecanrepeattheadditionofn,ifandonlyifthebitatpositionlog2(n)+1isNOTset.
Toensurethattheimplementationdoesnotincludeanybranches,oneshouldcreateamaskinasimilarwayassuggestedinAlgorithm1andthenaddthemaskedvalue,whichiseithertheordernoranequivalentsizednumberwithzerovalue.
Thereafter,thebitlengthofkisxedtolog2(n).
Onehastotakeintoaccountthatkneedstobeavalidinputwithk∈[1,n]inordertoassertthexedbitlength,whichisexpectedtobeevaluatedoutsidethepointmultiplication.
Additionally,thiscountermeasureeventuallyaddsvulnerabilitytoCarry-basedattacks[8],attackingnotthepointmultiplicationbutthisparticularcountermeasureitself.
However,consideringourattackermodelthisisnoreasonablethreat.
Constantmemoryaccesspatternandtheeliminationofkeydependentbranchesarethemaincountermeasureagainstcachebasedsidechannelattacksinourimplementation.
Sincecacheattacksaremostlikelywithourattackermodel,wewanttoassurenotonlyaxedmemoryaccesspatternbutalsoavoidanykeydependentbranch.
Therefore,wesuggestalgorithm1toveilthedataaccess.
Lettx1,tx2,tz1andtz2bexedsizetemporaryvariablesforaGF(2m)andkithecurrentbitatpositioni.
Table3showsthetransitionofthecoordinatesinvolvedinthedouble/addprocessinsidetheLD/Montgomerypointmultipli-cation'skeyevaluationloop.
Fortheproposeddataveilingmethod,amaskingwordt0iscreatedwitheitherallbitssettozero,orallbitssettoone,dependingonthevalueofki.
Thiswordisthenusedtocreatethemaskforeldelementsinstep2,whichinturnissucces-sivelyappliedwithANDandNANDoperationstothetwopossiblevalues,satisfyingtherulesinTable3.
Thistransitionisself-inverseandneedstobere-appliedaftertheexecutionofMaddandMdouble.
ki=0ki=1tx1x2x1tx2x1x2tz1z2z1tz2z1z2Table3:X/ZTransformation54.
EFFICIENTIMPLEMENTATIONOFBINARYFIELDARITHMETICSAlgorithm1:ProposeddataveilingmethodInput:Keybitki,ECcoordinatesx1,x2,z1,z2∈GF(2m)Output:Temporarycoordinatestx1,tx2,tz1,tz2∈GF(2m)1t0←(0x00.
.
.
0Wbitwordki);2for(j=0;j63,wemustmultiplyeachwordmorethanonce,particularlys=k64timesperwordandwethereforerequiret·smultiplicationsoverall.
Inordertosparetheadditionalreductionstep,asdescribedabove,themulti-plicationsareexecutedfromthelefttotherightandtheresultsareimmediatelyaddedtotheremainder.
Duetothefactthatthebitsofc(x)areresultsfromeithermultiplicationorsquaringandarestoredintoaseriesofwordswheretheborderbetweencL(x)andcH(x)atpositionm1isrightinthemiddleofa64bitblock,eachmultiplicationresultmustbealignedinordertobeaddedtotheremainder.
Therefore,wecaneitherinitiallyshiftthepolynomialr(x)totheleftbyl=mmod64bits(ifk+l<64stillholds),orwecanshifttheinputwordslbitstotheright,or64lbitstotheleft.
Thereby,theoperandneedstobesavedtemporarilyasthereductionisperformedin-place,requiringmoretemporaryvariables.
Shiftingtheoperandespeciallymakessenseifthedatain95.
RESULTScH(x)overlapsintoanadditionalregister,thusrequiringt+1multiplications.
ForexamplethereductioninGF(2163),thebitsofcH(x)=325i=163cixiarespreadoverfour64bitwords.
Forourtargetcurves,theoperandshiftingre-quirestomucheortscomparedtoasinglemultiplicationonHaswell,whereasontheSandy/IvyBridgethisshiftingpayso.
Obviously,theproposedtech-niquemakesmoresensethegreaterthehammingweightofr(x)is,andifallthebitsinr(x)areinone64bitword(s=1).
Fortunately,thisisthecasefortheNISTcurvesoverGF(2163),GF(2283)andGF(2571).
Asamatteroffact,theperformanceofthepclmulqdqreductionstronglydependsontheunderlyingarchitecture.
Thisreductionschemestartstopayowhenthemultiplierperformstwomultiplicationsfasterthantheaccordingshiftsandadditions.
Letthereductionpolynomialbepentanomialwithnospe-cialpropertiesasfastshiftsorotherspecialproperties.
Forthe"shift"stepoftheshift&addreduction,therequiredamountofoperationsforeach128bitwordis4*2SHIFT's(psllq/psrlq)plus3*2XOR's.
Requiringthesameamountofoperationsforthe"add"step,themultipliermustbeabletoperform2mul-tiplicationsfasterthanthisinordertobeattheshift&addimplementationforpentanomialreductionpolynomials.
OntheSandy/IvyBridge,thiscannotbeperformedfasterthan10cycles(usuallymore),whereasone64bitmultiplica-tionhasalatencyof14cycles.
Thisexplainswhythisreductionschemedoesnotpayoonthesearchitectures.
OnHaswellhowever,withalatencyof7andathroughputofonly2cycles,thiscanbeperformedin9cycles.
Inmanycases,onecanmakeadvantageofthesmallthroughputbyfetchingmultiplemultiplicationsatoncetogainevenbetterperformance.
5ResultsThissectionpresentstheresultsforthebinaryeldarithmeticandshowstheimpactoftheimprovedLD/MontgomerypointmultiplicationandtheresultingperformancegainforbothECDHandECDSAoftheOpenSSLimplementa-tion.
Thecorrespondingtablescanbefoundintheappendix.
Sinceourimple-mentationissuitableforseveralarchitectures,weprovidedataforaHaswellCorei7-4770CPUat3.
40GHz(HSW)andaCorei5-3210MIvyBridge(IVB),runningwith2.
50GHz.
TestsonaSandyBridgeprocessorhaveproducedverysimilarrelativeresultsasontheIvyBridge.
Toreducerandomness,wefollowedtheguidelinesfromECRYPTBenchmarkingofCryptographicSystems(eBACS)[20].
Thecodewascompiledwiththelatestgccinversion4.
8.
1.
AstoolsformeasurementweusedtheOpenSSLbuilt-inspeedtoolaswellasacyclecounterusingtheRDTSCinstruction,similarto[10],withslightlydierentrepetitionfactors.
TheOpenSSLspeedutilitycanbeusedtoreproduceourresultsafterapplyingthepatchandindicatesthepatch'simpacttotheoverallperformanceofOpenSSLECDHandECDSAoperations.
105.
RESULTSBinaryeldarithmeticcostsareshowninTable5.
TheresultsindicatethebenetsofthefastpclmulqdqonHaswellandtheperformancedierenceofsquaringandmultiplication.
ReductioncostscanbefoundinTable6onthedierentarchitectures.
Itshowstheperformancedierenceoftrinomialsandpentanomialsaswellastheperformancelossduetothe64bitmodeinGF(2239).
Furthermore,weseethemul&addreductionoutperformingtheshift&addreductionwithfactor1.
25inGF(2163)andabout1.
62inGF(2283)and1.
61inGF(2571)onHaswell.
However,theshift&addimplementationdoesnotuseAVX2features,whichcouldeventuallyallowimprovements.
Thecomputationalcostsforsidechannelprotectionbyhidingthecoordinatesinordertoassureaxedmemoryaccesspatternperroundarealsonotneclectable.
AsingledataveilingrequirestwelvelogicaloperationsatthecostofanXOR.
Additionally,onerequiresfourtemporaryvariablesinordertohidethedataprocessingforallfourcoordinatesinvolvedinthepointmultiplication.
Thishasofcoursesomeperformanceimpacttothepointmultiplicationprocess,asthisperformanceoverheadappliesforeachloopexecution.
Thepointmultiplication'simportancehasbeenpointedoutseveraltimesbynow.
Table7showstheresultsfork·Pincycles.
Wendthattheamountofcycleshasbeensubstantiallydiminished,uptoafactorofalmost12.
Also,weobservetheimpactoftheimprovedcodeowfortheKoblitzratherthanrandomcurves,whichisnoticeableinbothabsolutenumbersandtherelativespeedup.
Hereby,wehavetomentionthatthenumbersfortheunpatchedOpenSSL-1.
0.
1earenotconstant,sinceitisnotaconstanttimeimplementation,especiallythebitlengthofkisnotxed.
Therefore,thespeedupfactorisalsonotconsistentandwilldierfromruntorun.
EllipticcurveDie-HellmanoperationsarepresentedinTable11andFigure2,measuredwiththeOpenSSLspeedtoolandthereforethecurveiden-tiershavebeenchangedtonistkforKoblitzandnistbforrandomcurves.
SincetheECDHisbasi-callyapointmultiplication,thespeedupisveryhigh.
Primecurvesarecurrentlymorefrequentlyusedinserverimplementations.
AsdemonstratedinTable8,ourimprovementstothepointmultiplicationofthebinarycurvesresultinabigperformancelead.
Fig.
2:ECDH-operationspersecond.
OpenSSL(withenable-ecnistp64gcc128ag)versusproposedimplementationontheHaswellprocessor116.
COMPARISONTOOTHERWORKSEllipticcurvesignatureandvericationalgorithmshaveconsiderablymoreoverheadthantheECDH.
However,asshowninTables9and10,weperceiveanaveragespeedupoffactor4.
9(sign)and6.
2(verify)onIvyBridgeaswellas4.
5and6.
3onHaswell,respectively.
SincethebitlengthfortheECDSAinOpenSSLhasbeenxedbeforethepointmultiplicationinresponsetoatimingattack[7],theresultingscalarkisnotintheinterval[1,n]anymoreandiseitheroneortwobitslonger(butconstantforaspeciceld)thantheoriginalscalar(dependingontheformofthecorrespondingcurveordern),whichslightlyincreasestheamountofcyclesforthepatchedversion.
6ComparisontootherworksMostworksabouttheimplementationofbinaryellipticcurveswiththeuseofvectorinstructionsareaimingtobreakspeedrecordsandthusoftenexploital-gebraicproperties.
ThisworkaimsforprovisionofagenericpointmultiplicationforthewellknownNIST/SECGcurvesoverthebinaryeld,withoutanycom-promisesinregardtosecurity.
Thisworkdoesnotuseprecomputationschemes(suchaswNAF,τNAF),pointhalving,specialeldswithbenecialproperties(suchasFq2),orotheralgebraicproperties(e.
g.
,Forbeniusmorphism).
InTable4weshowresultsforimplementationsofrandompointmultiplica-tionforbinaryellipticcurvesinNIST/SECelds.
Allnumbersaregivenin103cyclesonasinglecore.
WorkSCP1CPUMethod2233228324092571Aranhaetal.
[2]noCorei7-8604-TNAF-386-1656Taverneetal.
[19]noCorei7(SNB)5-τNAF,τ&add068-264-Aranhaetal.
[3]noCorei7(SNB)5-τNAF,τ283/2-099--ThisworkyesCorei5(IVB)LD-Montgomery1282175111095KoblitzyesCorei7(HSW)LD-Montgomery81118286566Aranhaetal.
[2]noCorei7-860LD-Montgomery-793-4440Taverneetal.
[19]noCorei7(SNB)4-ωNAF180-738-ThisworkyesCorei5(IVB)LD-Montgomery1482545991271RandomyesCorei7(HSW)LD-Montgomery91135325650Table4:UnknownPointMultiplicationoverNISTCurves(Koblitz)Thisworkyieldsverygoodresultsforrandomcurvesandwinsagainstallcur-rentstate-of-the-artimplementationsovertheNISTelds,evenwiththecostlysidechannelcountermeasures.
SinceallpreviousworkshavebeenimplementedontheSandy/IvyBridge,wecomparewiththeIvyBridgeimplementationhere.
FortherandomcurveovertheGF(2233)NISTeld,ourimplementationisaboutfactor1.
22(1.
23inGF(2409))fasterthan[19]andevenfactor3(GF(2283))and3.
5(GF(2571))fasterthanreportedin[2]forasinglecore.
AlthoughourIvyBridgeimplementationforKoblitzcurvesbeatsthenumberspresentedin[2]by1.
77xand1.
51x,[19]achieveresultswhicharefactor1.
53faster,whilst[3]isevenastwiceasfast.
1SideChannelProtection127.
CONCLUSIONForfurthercomparison,wegivethelatestnumbersofprotected(asclaimedbytheauthors)pointmultiplicationimplementationsatthe128bitsecuritylevel,usingcurveswithspecialalgebraicproperties.
Bosetal.
[6]report117,000cyclesforaprotectedpointmultiplicationonaKummersurfacewiththeMont-gomeryladderand[16]reports115,000cyclesonaGLScurveusing2-GLV(double&add)onasinglecore.
Atthesamesecuritylevel,thisworkachievesapointmultiplicationin118,000cycleswiththeOpenSSLpatch[5]ontheHaswellprocessor,withoutusingspecialelds,exploitingalgebraicpropertiesorprecom-putationtechniques.
Atthe256bitsecuritylevel,thisworkachievesaECpointmultiplicationinabout566,000(Koblitz)and650,000(random)cycles.
7ConclusionInthiswork,weproposedafast,constanttimeimplementationofthepointmultiplicationonbinaryellipticcurvesatasecuritylevelof80bitsandgreater,standardizedbyNISTandSECG.
Weanalyzedmajorimplementationthreatsforserverenvironmentsandse-curedtheimplementationbyapplyingcountermeasuresagainstcommonanddangeroussidechannelthreatssuchascacheandremotetimingattacks.
Overthelastyears,severalpublicationshavepointedoutthatthesesidechannelat-tacksaremorethanatheoreticalbutaseriousmenace.
Toaddressthisissue,weprovideaconstanttimeimplementationoftheellipticcurvepointmultipli-cationandbinaryeldarithmetic,withoutkeydependentbranchesandaxedmemoryaccesspattern.
Additionally,weintroducedanimprovedversionforthewellknownLD/-Montgomerymultiplicationwithenhancementsonthearithmeticlevel,providedapatchfortheOpenSSLlibraryandshoweditssignicantperformanceimpact.
Thisimprovementisavailableonall64bitarchitecturesimplementingatleastSSE4(better:AVX)andthepclmulqdqinstruction.
TheimplementationhasbeencontributedtoOpenSSLaspatchforversion1.
0.
1eandisavailablefordownload[5].
Wefurthershowedthatthisimpactisnotlimitedtothepointmultiplica-tionitselfbutimprovestheserversperformanceregardingECDHoperationsandsignaturebasedcryptography.
ThisfurtherleadstoamajorperformanceimprovementforserversandclientsconductingTLShandshakes.
Thisisanes-sentialenhancementespeciallyforserverenvironments,whichareoftenrequiredtoconductabignumberofhandshakessimultaneously.
Also,ourresultsmightindicateadierentviewtothebinaryECConserverarchitectures.
13References1.
D.
Aranha,A.
Faz-Hernndez,J.
Lpez,andF.
Rodrguez-Henrquez,Fasterimplementationofscalarmultiplicationonkoblitzcurves,inProgressinCryptologyLATINCRYPT2012,A.
HeviaandG.
Neven,eds.
,vol.
7533ofLectureNotesinComputerScience,SpringerBerlinHeidelberg,2012,pp.
177–193.
2.
D.
Aranha,J.
Lpez,andD.
Hankerson,Ecientsoftwareimplementationofbinaryeldarithmeticusingvectorinstructionsets,inProgressinCryptologyLATINCRYPT2010,M.
AbdallaandP.
Barreto,eds.
,vol.
6212ofLectureNotesinComputerScience,SpringerBerlinHeidelberg,2010,pp.
144–161.
3.
D.
F.
Aranha,A.
Faz-Hernndez,J.
Lpez,andF.
Rodrguez-Henrquez,Fasterimplementationofscalarmultiplicationonkoblitzcurves.
CryptologyePrintArchive,Report2012/519,2012.
http://eprint.
iacr.
org/.
4.
D.
F.
Aranha,J.
Lopez,andD.
Hankerson,EcientSoftwareImplementationofBinaryFieldArithmeticUsingVectorInstructionSets,inTheFirstInterna-tionalConferenceonCryptologyandInformationSecurity(LATINCRYPT2010),M.
AbdallaandP.
S.
L.
M.
Barreto,eds.
,vol.
6212ofLNCS,2010,pp.
144–161.
5.
M.
BluhmandS.
Gueron,Afastvectorizedimplementationofbinaryellip-ticcurvesonx86-64processors,2013.
http://rt.
openssl.
org/Ticket/Display.
htmlid=3117.
6.
J.
W.
Bos,C.
Costello,H.
Hisil,andK.
Lauter,Fastcryptographyingenus2.
CryptologyePrintArchive,Report2012/670,2012.
http://eprint.
iacr.
org/.
7.
B.
B.
BrumleyandN.
Tuveri,Remotetimingattacksarestillpractical,inProceedingsofthe16thEuropeanconferenceonResearchincomputersecurity,ESORICS'11,Berlin,Heidelberg,2011,Springer-Verlag,pp.
355–371.
8.
P.
-A.
Fouque,D.
Ral,F.
Valette,andM.
Drissi,Thecarryleakageontherandomizedexponentcountermeasure,inCryptographicHardwareandEmbeddedSystemsCHES2008,E.
OswaldandP.
Rohatgi,eds.
,vol.
5154ofLectureNotesinComputerScience,SpringerBerlinHeidelberg,2008,pp.
198–213.
9.
S.
GueronandM.
Kounavis,Intelcarry-lessmultiplicationinstructionanditsusageforcomputingthegcmmode,April2008.
http://software.
intel.
com/sites/default/files/article/165685/clmul-wp-rev-2.
01-2012-09-21.
pdf.
10.
S.
GueronandV.
Krasnov,Parallelizingmessageschedulestoacceleratethecomputationsofhashfunctions,JournalofCryptographicEngineering,2(2012),pp.
241–253.
11.
T.
ItohandS.
Tsujii,Afastalgorithmforcomputingmultiplicativeinversesingf(2m)usingnormalbases,Inf.
Comput.
,78(1988),pp.
171–177.
12.
M.
Knezevic,K.
Sakiyama,J.
Fan,andI.
Verbauwhede,Modularreductioningf(2n)withoutpre-computationalphase,inWAIFI,2008,pp.
77–87.
13.
A.
O.
KrzysztofJankowski,PierreLaurent,Intelpolynomialmultiplicationinstructionanditsusageforellipticcurvecryptography,2012.
14.
J.
LpezandR.
Dahab,Fastmultiplicationonellipticcurvesovergf(2m)withoutprecomputation,inCryptographicHardwareandEmbeddedSystems,.
KoandC.
Paar,eds.
,vol.
1717ofLectureNotesinComputerScience,SpringerBerlinHeidelberg,1999,pp.
316–327.
15.
P.
L.
Montgomery,SpeedingthePollardandEllipticCurveMethodsofFactor-ization,MathematicsofComputation,48(1987),pp.
243–264.
16.
T.
Oliveira,J.
Lpez,D.
F.
Aranha,andF.
Rodrguez-Henrquez,Twoisthefastestprime.
CryptologyePrintArchive,Report2013/131,2013.
http://eprint.
iacr.
org/.
17.
StandardsforEfficientCryptographyGroup,SEC2:RecommendedEl-lipticCurveDomainParameters,inStandardsforEcientCryptography,2000.
18.
C.
SuandH.
Fan,Impactofintel'snewinstructionsetsonsoftwareimplemen-tationofgf(2)[x]multiplication,Inf.
Process.
Lett.
,112(2012),pp.
497–502.
19.
J.
Taverne,A.
Faz-Hernndez,D.
Aranha,F.
Rodrguez-Henrquez,D.
Han-kerson,andJ.
Lpez,Speedingscalarmultiplicationoverbinaryellipticcurvesusingthenewcarry-lessmultiplicationinstruction,JournalofCryptographicEn-gineering,1(2011),pp.
187–199.
20.
E.
VAMPIRE,Ecryptbenchmarkingofcryptographicsystems.
http://bench.
cr.
yp.
to/supercop.
html,2013.
21.
D.
WeberandT.
Denny,Thesolutionofmccurley'sdiscretelogchallenge,inAdvancesinCryptologyCRYPTO'98,H.
Krawczyk,ed.
,vol.
1462ofLectureNotesinComputerScience,SpringerBerlinHeidelberg,1998,pp.
458–471.
A.
1SpeedresultsFieldSquaringMultiplicationInversionHSWIVBHSWIVBHSWIVBGF(2163)212134714,2714,811GF(2193)171937984,1224,788GF(2233)182138876,0746,956GF(2239)49446710015,34914,777GF(2283)27305312111,6889,677GF(2409)28319722715,18215,648GF(2571)616815833537,11842,595Table5:CyclesforarithmeticoperationsinGF(2m)(withoutaddition)FieldShift&Add(Haswell)Mul&Add(Haswell)Shift&Add(IvyBridge)Mul&Add(IvyBridge)GF(2163)15121831GF(2193)11-13-GF(2233)12-15-GF(2239)42-41-GF(2283)21132557GF(2409)15-20-GF(2571)37234881Table6:ComparisonofthetworeductionschemesonHaswellandIvyBridgeBinaryCurveOpenSSL-1.
0.
1eOpenSSLpatchSpeedupfactorIvyBridgeHaswellIvyBridgeHaswellIvyBridgeHaswellsect163k1sect163r1sect163r2sect193r1sect193r2sect233k1sect233r1sect239k1sect283k1sect283r1sect409k1sect409r1sect571k1sect571r1624,949643,236646,374801,932818,985993,1951,030,0431,027,5411,892,0162,019,1343,315,8953,655,2797,659,2888,432,393543,097563,928567,854572,307567,929681,347712,864718,156125,87111,341,3661,984,8792,142,0644,764,8705,222,55966,37876,53477,238121,127121,163128,474148,230185,629217,192254,163510,787599,0021,094,7651,270,66645,64249,86950,12773,29673,44281,13391,372144,816118,120134,571286,090325,279566,257659,5289.
428.
408.
376.
626.
767.
736.
955.
548.
717.
946.
496.
107.
006.
6411.
9011.
3111.
337.
817.
738.
407.
804.
9610.
669.
976.
946.
598.
417.
92Table7:PointmultiplicationresultsincomparisontoOpenSSL-1.
0.
1eincyclesCurveIdentierPatchedOpenSSL-1.
0.
1eSpeedupfactorIvyBridgeHaswellIvyBridgeHaswellsecp160r1nistk163nistb163nistp224nistk233nistb233nistp256nistk283nistb283nistp384nistk409nistb409nistp521nistk571nistb5714444.
327837.
024043.
57573.
114946.
413057.
03891.
59026.
57754.
41051.
93879.
53319.
6971.
01822.
31565.
07391.
567212.
461667.
811993.
839102.
235246.
46489.
027586.
524320.
71848.
511611.
210238.
11682.
85941.
85158.
8-6.
265.
41-1.
971.
72-2.
321.
99-3.
693.
16-1.
881.
61-9.
098.
34-3.
262.
94-4.
253.
75-6.
285.
54-3.
533.
07Table8:ComparisonofECDHoperationspersecondforbinaryandprimeel-lipticcurvesonIvyBridgeandHaswellBinaryCurveOpenSSL-1.
0.
1ePatchedOpenSSLSpeedupFactorSignVerifySignVerifySignVerifyIvyBridgesect163k1sect163r1sect163r2sect193r1sect193r2sect233k1sect233r1sect239k1sect283k1sect283r1sect409k1sect409r1sect571k1sect571r1929,0811,667,026955,1871,743,232959,1191,734,0781,004,3621,806,7731,007,3171,816,4761,161,8522,125,3981,197,6832,226,9271,194,3242,210,1812,097,2574,006,0912,227,0014,241,6173,505,2426,801,7553,776,6327,322,0938,018,93015,635,2288,745,81317,133,137181,832215,389190,778238,232193,299236,150248,191335,012248,683334,215262,615362,315289,039405,330321,347482,566366,470557,568404,091628,606702,7671,181,176793,0731,359,8161,365,8812,461,4851,542,3002,816,4025.
117.
745.
017.
324.
967.
344.
055.
394.
055.
444.
425.
874.
145.
493.
724.
585.
727.
185.
516.
754.
995.
764.
765.
385.
876.
355.
676.
08Haswellsect163k1sect163r1sect163r2sect193r1sect193r2sect233k1sect233r1sect239k1sect283k1sect283r1sect409k1sect409r1sect571k1sect571r1680,0851,215,819698,4301,256,375702,5851,246,757730,8941,302,676732,1721,306,010841,0661,530,004875,7371,585,333869,9891,582,7031,440,7712,724,1991,517,9522,882,3402,201,7474,230,8562,353,7684,524,2175,091,1939,885,9735,478,25810,727,383157,761173,166160,901180,300163,610182,937198,160234,577197,911236,197213,628264,756223,396285,478278,374402,386266,409355,636283,443386,729480,983738,817524,003822,021833,5671,384,070914,4641,552,6404.
317.
024.
346.
974.
296.
823.
695.
553.
705.
533.
945.
783.
925.
553.
133.
935.
417.
665.
367.
454.
585.
734.
495.
506.
117.
145.
996.
91Table9:ECDSAsignandverifycyclesfortheNISTcurvesonIvyBridgeandHaswellBinaryCurveOpenSSL-1.
0.
1eOpenSSLpatchSpeedupfactorIvyBridgeHaswellIvyBridgeHaswellIvyBridgeHaswellSIGNnistk163nistk233nistk283nistk409nistk571nistb163nistb233nistb283nistb409nistb5713,749.
91,881.
71,267.
5542.
2257.
63,766.
51,893.
11,265.
7539.
3257.
26,465.
33,259.
22,204.
7977.
0466.
46,487.
33,279.
22,196.
4976.
3466.
617,721.
810,359.
06,688.
93,140.
91,556.
016,203.
59,386.
55,962.
32,763.
41,354.
836,872.
622,998.
416,884.
98,150.
04,424.
135,110.
021,468.
815,602.
77,423.
13,977.
04.
735.
515.
285.
796.
044.
304.
964.
715.
125.
275.
707.
067.
668.
349.
495.
416.
557.
107.
608.
52VERIFYnistk163nistk233nistk283nistk409nistk571nistb163nistb233nistb283nistb409nistb5711,578.
61,211.
6639.
3361.
9159.
91,514.
51,150.
4594.
2344.
2145.
73,159.
52,419.
81,355.
7839.
1368.
33,043.
92,348.
01,283.
5786.
9341.
011,688.
16,439.
43,951.
11,757.
1834.
610,453.
85,711.
93,445.
51,522.
4724.
926,508.
415,557.
111,003.
24,845.
02,533.
624,904.
814,095.
69,888.
54,361.
92,251.
67.
405.
316.
184.
865.
226.
904.
975.
804.
424.
988.
396.
438.
125.
776.
888.
186.
007.
705.
546.
60Table10:OpenSSLECDSAsignandvericationoperationspersecondfortheNISTcurvesonIvyBridgeandHaswellBinaryCurveOpenSSL-1.
0.
1eOpenSSLpatchSpeedupfactorIvyBridgeHaswellIvyBridgeHaswellIvyBridgeHaswellnistk163nistk233nistk283nistk409nistk571nistb163nistb233nistb283nistb409nistb5714,129.
02,628.
41,338.
5836.
3406.
53,217.
62,443.
51,244.
3723.
4368.
36,586.
95,121.
92,825.
71,745.
8763.
26,382.
54,881.
92,651.
61,640.
3693.
834,739.
017,013.
49,017.
63,884.
01,824.
124,237.
213,114.
37,711.
63,382.
31,943.
567,029.
639,441.
327,718.
511,634.
25,930.
960,729.
635,230.
424,456.
410,228.
65,172.
18.
416.
476.
744.
644.
497.
535.
376.
204.
685.
2810.
187.
709.
816.
667.
779.
527.
229.
226.
247.
45Table11:OpenSSLECDHoperationspersecondovertheNISTcurvesA.
2New2P-AlgorithmsubfunctionsAlgorithm2:PointDoublingalgorithmforrandomcurvesInput:c∈GF(2m)wherec2=b,x-coordinateX/ZforapointPOutput:x-coordinateX/Zforthepoint2P1t←c;2X←X2modf(x);3Z←Z2modf(x);4t←Z*tmodf(x);5Z←Z*Xmodf(x);6X←X+t;7X←X2modf(x);8returnX,ZAlgorithm3:PointDoublingalgorithmforKoblitzcurvesInput:x-coordinateX/ZforapointPOutput:x-coordinateX/Zforthepoint2P1X←X2modf(x);2Z←Z2modf(x);3t←X+Z;4Z←Z*Xmodf(x);5X←t2modf(x);6returnX,ZAlgorithm4:AlgorithmforPointAdditionInput:x-coordinatesX/ZofthepointP(x,y),P0(X0,Z0),P1(X1,Z1)Output:x-coordinateX1/Z1forthepointP0+P11t0←X1;2X0←X0*Z0modf(x);3Z0←Z0*t0modf(x);4t1←X0*Z0;//mult.
w/oreduction,resultdoublesized5Z0←Z0+X0;6Z0←Z20modf(x);7t0←x;8t2←Z0*t0;//mult.
w/oreduction,resultdoublesized9t2←t2+t1;//doublesizedaddition10X0←t2modf(x);//finalreduction11returnX0,Z0

触碰云高性价20.8元/月,香港云服务器,美国cn2/香港cn2线路,4核4G15M仅115.2元/月起

触碰云怎么样?触碰云是一家成立于2019年的商家。触碰云主营香港/美国 VPS服务器、独立服务器以及免备案CDN。采用的是kvm虚拟构架,硬盘Raid10,Cn2线路,去程电信CN2、移动联通直连,回程三网CN2。最低1核1G带宽1M仅20.8元/月,不过这里推荐香港4核4G15M,香港cn2 gia线路云服务器,仅115.2元/月起,性价比还是不错的。点击进入:触碰云官方网站地址触碰云优惠码:优...

spinservers($179/月),1Gbps不限流量服务器,双E5-2630Lv3/64GB/1.6T SSD/圣何塞机房

中秋节快到了,spinservers针对中国用户准备了几款圣何塞机房特别独立服务器,大家知道这家服务器都是高配,这次推出的机器除了配置高以外,默认1Gbps不限制流量,解除了常规机器10TB/月的流量限制,价格每月179美元起,机器自动化上架,一般30分钟内,有基本自助管理功能,带IPMI,支持安装Windows或者Linux操作系统。配置一 $179/月CPU:Dual Intel Xeon E...

hypervmart:英国/荷兰vps,2核/3GB内存/25GB NVMe空间/不限流量/1Gbps端口/Hyper-V,$10.97/季

hypervmart怎么样?hypervmart是一家国外主机商,成立于2011年,提供虚拟主机、VPS等,vps基于Hyper-V 2012 R2,宣称不超售,支持linux和windows,有荷兰和英国2个数据中心,特色是1Gbps带宽、不限流量。现在配置提高,价格不变,性价比提高了很多。(数据中心不太清楚,按以前的记录,应该是欧洲),支持Paypal付款。点击进入:hypervmart官方网...

ivybridge为你推荐
摩根币摩根币是传销吗75ff.com开机出现www.ami.com是什么?怎么解决啊比肩工场大运比肩主事,运行长生地是什么意思?rawtools闪迪32Gsd卡,无法格式化,显示只有30M,并且是raw格式。如何恢复?同一ip网站同IP的网站互相链接会被K吗?haole018.com为啥进WWWhaole001)COM怎么提示域名出错?囡道是haole001换地了吗789se.comwuwu8.com这个站长是谁?haole10.com空人电影网改网址了?www.10yyy.cn是空人电影网么www.5any.com我想去重庆上大学www.hyyan.comdota屠夫怎么玩?从初期到后期的装备是什么?
天津虚拟主机 东莞虚拟主机 美国vps推荐 华为云服务 liquidweb 外贸主机 好玩的桌面 服务器干什么用的 美国独立日 跟踪路由命令 免费邮件服务器 net空间 金主 windows2008 godaddy退款 美国西雅图独立 nano 赵荣博客 g6950 ddos攻击工具 更多