Studies on Modular Arithmetic Hardware Algorithms for Public-key Cryptography Marcelo Emilio Kaihara Graduate School of Information Science Nagoya University January 2006 Dedicated to myfather. iii Abstract Public-key cryptography plays an important role in digital communication and storage systems. Processing public-key cryptosystems requires huge amount of computation, and, there is therefore, a great demand for developing dedicated hardware to speed up the computations. In this thesis, we focus on modular arithmetic hardware algorithms for public-key cryptosystem since these two operations are the computationally most intensiveparts in encryptionand decryptionprocesses. After reviewing major algorithms for computing modular multiplication and divi- sion in Chapter 2, we present in Chapter 3, a mixed radix-4=2 algorithm for modular multiplication/division suitable for VLSI implementation. The hardware algorithm is based on the Montgomery multiplication algorithmfor modular multiplication and the Extended Binary GCD algorithm for modular division. These two algorithms are com- bined into the proposed algorithm in order to share hardware components. The new algorithm carries out both calculations using simple operations such as shifts, addi- tions and subtractions. The radix-2 signed-digit representation is used to avoid carry propagation in all additions and subtractions. A modular multiplier/divider based on the algorithm performs an n-bit modular multiplication/division in O(n) clock cycles where the length of the clock cycle is constant and independent of n. A modular multi- plier/divider based on this hardware algorithm has a linear array structure with a bit- v slice feature and can be implemented with much smaller hardware than that necessary to implementboth multiplier anddividerseparately. Chapter4presentsahardwarealgorithmformodularmultiplication/divisionbased on the extended Euclidean algorithm. This hardware algorithm performs modular di- vision, Montgomery multiplication, and ordinary modular multiplication. In order to calculateMontgomerymultiplication,weproposeanewcomputationmethodthatcon- sistsofprocessingthemultiplierfromthemostsigni(cid:2)cantdigit(cid:2)rst. Theordinarymod- ular multiplication is based on the interleaved modular multiplication algorithm. Each ofthesethreeoperationsiscarriedoutthroughtheiterationofsimpleoperationssuchas shifts and additions/subtractions. In order to avoid carry propagation in all additions andsubtractions,theradix-2signed-digitrepresentationisemployed. Amodularmulti- plier/dividerbased on thealgorithmhas a linear array structure with a bit-slicefeature and carries out n-bit modular multiplication/division in O(n) clock cycles, where the length of the clock cycle is constant and independent of n. This multiplier/divider can be implementedusinga hardware amount onlyslightly largerthanthat of themodular divider. Chapter5presentsanewfastmethodforcalculatingmodularmultiplicationnamed Bipartite Modular Multiplication. The calculation is performed using a new represen- tation of residue classes modulo M that enables the splitting of the multiplier into two parts. These two parts are then processed separately, in parallel, potentially doubling thecalculationspeed. Theupperpartandthelowerpartofthemultiplierareprocessed usingtheinterleavedmodularmultiplicationalgorithmandtheMontgomeryalgorithm respectively. Conversions back and forth between the original integer set and the new residuesystemcanbeperformedatspeedsuptotwicethatoftheMontgomerymethod withouttheneedforprecomputedconstants. Thisnewmethodissuitableforbothhard- ware implementation;and softwareimplementationin a multiprocessorenvironment. Afasthardwarealgorithmforcalculatingmodularmultiplicationbasedonthismethod is presented at the end of this chapter. In this hardware algorithm, the addition of the partialproductstotheintermediateaccumulatedproductispipelinedinordertoreduce thecriticalpathdelay. Aradix-4versionofthehardwarealgorithmisthengivenandits hardware implementation isdiscussed. Finally, in Chapter 6 we conclude that taking advantage of similarities and symme- tries is a good technique for reducing hardware requirement and for speeding up the calculations. Contents Acknowledgements xiii 1 Introduction 1 2 Preliminaries 7 2.1 Modularreduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 ModularMultiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2.1 Ordinary ModularMultiplication . . . . . . . . . . . . . . . . . . . 7 2.2.2 Montgomery MultiplicationAlgorithm . . . . . . . . . . . . . . . . 9 2.3 ModularDivision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3.1 Extended Euclidean AlgorithmforModular Division . . . . . . . . 12 2.3.2 Extended Binary GCDAlgorithmfor Modular Division . . . . . . . 14 2.4 BinarySigned-DigitNumber Representation . . . . . . . . . . . . . . . . . 16 3 A Hardware Algorithmfor Modular Multiplication/Division Based ontheBinary GCD Algorithm 17 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.2 Similarity Between the Extended Binary GCD Algorithm and the Mont- gomeryMultiplicationAlgorithm . . . . . . . . . . . . . . . . . . . . . . . . 18 ix 3.3 HardwareAlgorithmsforModularMultiplicationandDivisionBasedon theExtendedBinary GCDAlgorithm . . . . . . . . . . . . . . . . . . . . . . 21 3.3.1 AnAcceleratedHardwareAlgorithmforModularDivisionBased on theBinaryGCD Algorithm . . . . . . . . . . . . . . . . . . . . . 21 3.3.2 A Hardware Algorithmfor MontgomeryModular Multiplication . 27 3.4 TheCombinedHardwareAlgorithmforModularMultiplication/Division BasedontheExtended Binary GCDAlgorithm . . . . . . . . . . . . . . . . 32 3.5 Hardware Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.5.1 A Descriptionof theHardware . . . . . . . . . . . . . . . . . . . . . 35 3.5.2 Hardware Implementationand Evaluation . . . . . . . . . . . . . . 37 3.6 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.6.1 Applications toModular Exponentiation . . . . . . . . . . . . . . . 38 3.6.2 Applications toCryptography . . . . . . . . . . . . . . . . . . . . . 39 3.7 ConcludingRemarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4 A Hardware Algorithmfor Modular Multiplication/Division Based ontheExtendedEuclidean Algorithm 41 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.2.1 SimilarityBetweentheExtendedEuclideanAlgorithmandtheIn- terleaved Modular MultiplicationAlgorithm . . . . . . . . . . . . . 42 4.2.2 The Extended Euclidean Algorithm and the Montgomery Multi- plication Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 4.2.3 BasicOperations inSD2 System . . . . . . . . . . . . . . . . . . . . 44 4.3 Hardware Algorithmsfor ModularMultiplication andDivision BasedontheExtended Euclidean Algorithm . . . . . . . . . . . . . . . . . 45 4.3.1 An Improved Hardware Algorithm for Modular Division Based on theExtended EuclideanAlgorithm . . . . . . . . . . . . . . . . . 46 4.3.2 A NewAlgorithmfor Computing MontgomeryMultiplication . . 52
Description: