Bit-Planes: Dense Subpixel Alignment of Binary Descriptors HatemAlismail BrettBrowning SimonLucey RoboticsInstitute RoboticsInstitute RoboticsInstitute CarnegieMellonUniversity CarnegieMellonUniversity CarnegieMellonUniversity [email protected] [email protected] [email protected] 6 1 Abstract plate. Therelationship betweenappearanceand geometric 0 2 displacementisseldomlinear,sothelinearizationprocessis Binarydescriptorshavebeeninstrumentalintherecent typicallyrepeateduntilconvergence.Duetoitsimportance, n evolution of computationally efficient sparse image align- numerousextensionsandvariationsupontheLKalgorithm a J mentalgorithms. Increasingly,however,thevisioncommu- havesubsequentlybeenexploredinliterature[2]. 1 nityisinterestedindenseimagealignmentmethods,which AttheheartoftheLKalgorithmisthenotionthatanap- 3 aremoresuitableforestimatingcorrespondencesfromhigh proximatelinearrelationshipbetweenpixelappearanceand frameratecamerasastheydonotrelyonexhaustivesearch. geometric displacement can be reliably established. Pixel ] V However,classicdensealignmentapproachesaresensitive intensities are not deterministically differentiable with re- to illumination change. In this paper, we propose an easy spect to geometric displacement. Instead, the linear rela- C to implement and low complexity dense binary descriptor, tionship is established stochastically through spatial finite . s whichwerefertoasbit-planes,thatcanbeseamlesslyinte- differences whose outputs we refer to as image gradients. c [ gratedwithinamulti-channelLucas&Kanadeframework. The notion of estimating stochastic gradients on image in- This novel approach combines the robustness of binary tensitieshasalongandrichhistorydatingbacktosomeof 1 descriptors with the speed and accuracy of dense align- the most seminal works of computer vision [29]. Further, v ment methods. The approach is demonstrated on a tem- it has been well documented that pixel intensities within 7 0 platetrackingproblemachievingstate-of-the-artrobustness naturalimagesarestronglycorrelatedoversmallspatialar- 3 andfasterthanreal-timeperformanceonconsumerlaptops easfurthervalidatingtheassumedapproximatelinearrela- 0 (400+ fps on a single core Intel i7) and hand-held mobile tionship between pixel intensities and geometric displace- 0 devices(100+fpsonaniPadAir2). ment[34]. Pixelintensities,however,haveaproblemwhen . 2 applied to most practical image alignment tasks. Specif- 0 ically, they violate the brightness constancy assumption, 6 1.Introduction which states that pixel intensities describing a scene shall 1 remain constant under geometric distortion. Our proposed v: Binary descriptors such as BRIEF [10] & BRISK [24] dense bit-planes descriptor offers a solution to this short- i arepowerfultoolsforsolvingsparseimagealignmentprob- X comingusingacomputationallyefficientstrategy. lemsduetotheirdiscriminativepower,robustnesstoillumi- r nationchange,andlowcomplexity[21,16,20,39]. Match- Contributions: In this work we explore the validity of a ingbinarydescriptorsistypicallyperformedbyexhaustive a descriptor constancy assumption using photometrically search [7, 23] using the Hamming distance. Exhaustive invariant descriptors. In particular, we explore the effec- search,however,isinefficientwhendensecorrespondences tivenessofoneofthesimplestandmostefficientbinaryde- arerequiredinreal-time[19,30]. scriptorsLBP[32]—ortheCensusTransform[42]—for Aclassicalwayofspeedingupthetaskofimagealign- robustandefficientdensecorrespondenceestimationprob- ment is to linearize pixel intensities of an image with re- lems.Theconceptoflinearizingfeaturedescriptorswithre- specttogeometricdisplacement.Themostnotableexample spect to geometric displacement within the LK framework ofthisstrategycanbefoundintheseminalworkofLucas is a relatively new and emerging topic [8, 1]. Hitherto, & Kanade [27]. The Lucas & Kanade (LK) algorithm at- most of the previously employed descriptors have a con- tempts to establish an approximate linear relationship be- siderablecomputationalfootprintsuchasHOG[12],dense tween appearance and geometric displacement. Efficient SIFT [6, 26], and even SURF [5] making them unsuitable linearsolverscanthenbeemployedforfindingthebestgeo- forpracticaluseinmanyvisionapplicationsrequiringdense metricalignmentoftheimagewithrespecttoaknowntem- correspondencesinreal-timefromhighframeratedata. In 1 thispaperwemakethefollowingcontributions: where∂I1/∂x(cid:48) isestimatedstochasticallythroughx-andy- • Weproposethebit-planesdescriptor,anadaptationof finitedifferencefilters,while∂x(cid:48)/∂θisobtaineddeterminis- the LBP descriptor, that can be used within the LK ticallyusingtheclosed-formofthewarpingfunction. The framework. Specifically, we propose a multi-channel original formulation of LK is applicable to a wide variety LK adaptation that allows us to minimize the Ham- of problems. For special warps that satisfy a group re- ming distance using standard least squares optimiza- quirement, however, a more efficient variation is Baker & tion. Matthews’InverseCompositionalalgorithm(IC)[2]which • The suitability of our bit-planes descriptor for lin- wewilluseintheexperimentalportionofthispaper. earization is explored as a function of geometric dis- Photometric variation: The classical formulation of LK placement. We demonstrate that even though the relies on the brightness constancy assumption [27], which densebit-planesdescriptorisinherentlydiscontinuous is seldom satisfied in real life applications. Techniques to it shares the same critical properties enjoyed by pixel address illumination change include: (i) estimating illumi- intensities, which make them suitable for gradient- nation parameters alongside the motion parameters (either basedoptimization. jointly [4] or in an alternating fashion [38]), (ii) using in- • Unlike classical dense descriptors such as HOG and trinsicallyrobustsimilaritymeasures,suchasMutualInfor- dense SIFT, we demonstrate the efficiency of our bit- mation[14,13],orthenormalizedcorrelation[22],and(iii) planes descriptor on planar target tracking achieving preprocessing the images to obtain a representation that is speedsinexcessof400fpsonalaptop,andinexcess morerobusttointensityvariations[28,1,40]. of120fpsonmobiledevices.Furthermore,wedemon- On the one hand, estimating illumination is sensitive to stratefasterandmorerobusttemplatetrackingincom- themodelingassumptionsandincreasesthedimensionality parison to RANSAC-based algorithms on sparse fea- of the state and vector, thereby increasing the complexity tures, especially with low- and ambiguously textured of the optimization. On the other hand, optimizing robust objects. similarity metrics requires general purpose optimizers that cannot exploit the special structure of least squares prob- 2.TheLucas&KanadeAlgorithm lems. Preprocessing the image does not typically require re- InthissectionwebrieflyreviewtheLKalgorithminor- strictiveassumptions,anddoesnotaffectthedimensionality der to introduce notation. Let I : R2 → R be the tem- 0 ofthestatevector. Traditionally,preprocessinganimageis plate/referenceimage. Aftercameramotionwithparameter donebyconvolvingwithfilters,orothersimpleoperations vector θ ∈ Rp, we obtain an input/moving image I . We 1 suchaswhiteningthesignal[17,36]. Denselysampledfea- desire to estimate the parameters of motion such that the ture descriptors are another form of preprocessing, which followingquantityisminimized: weadoptinthiswork. Inparticular,weproposetheuseof (cid:88) a dense bit-planes descriptor. During evaluation, we show E(x;θ)= (cid:107)I (x)−I (x(cid:48)(θ))(cid:107)2, (1) 0 1 2 thatourapproachexceedstherobustnessofalgorithmsthat x∈Ω0 explicitly model illumination as well as methods that rely on robust cost metrics. Furthermore, our method is more where Ω is a subset of pixels in the template image, θ is 0 efficient, and simpler to implement. Central to our work an initial estimate of parameters and x(cid:48)(θ) describes the is the multi-channel formulation of LK, which we review transformedpixelcoordinatesgiventhemotionparameters, next. commonlyknownasthewarpingfunction. Byperforming afirst-orderTaylorexpansionofEq.(1)inthevicinityofθ, Multi-channel LK: In this section we present a general- taking the derivative with respect to the parameter update, ization of the LK algorithm to accommodate the applica- andequatingittozero,wearriveatthenormalequations: tionofmulti-channeldescriptors. Herein,weshallreferto thisgeneralizationasthemulti-channelLKalgorithm. Let J(x;θ)(cid:62)J(x;θ)∆θ =J(x;θ)(cid:62)e(x;θ), (2) φ : R2 → Rd bethed-channelrepresentationofthetem- 0 plate/referenceimage. Employingasimilarnotationtothe where J(x;θ) is the matrix of partial derivatives of the classicalLKalgorithm,aftercameramotionwithparameter warpedimageintensitieswithrespecttothemotionparam- vector θ ∈ Rp, we obtain an input/moving d-channel rep- etersevaluatedatthecurrentestimateofparametersθ,and resentation φ . To align descriptors using LK we seek to 1 e(x;θ) = I (x)−I (x(cid:48)(θ)). Usingthechainruleweob- minimize 0 1 tain (cid:88) E (x;θ)= (cid:107)φ (x)−φ (x(cid:48)(θ))(cid:107)2. (4) φ 0 1 J(x;θ)= ∂I1(x) = ∂I ∂x(cid:48). (3) x∈Ω0 ∂θ ∂x(cid:48) ∂θ To linearize Eq. (4) we must obtain an estimate of the Ja- cobian Jφ(x;θ) = ∂φ/∂θ ∈ Rd×p. Let the value of the j-th channel, as illustrated in Fig. 1, of the multi-channel representation be described as φj(x), where φ(x) = [φ1(x),...,φd(x)](cid:62). The sought Jacobian for each chan- nelinEq.(4)canbeobtainedusingthechainrule ∂φj(x) ∂φj ∂x(cid:48) 1 = 1 (5) ∂θ ∂x(cid:48) ∂θ Figure1.AnexampleoftheLBPdescriptorevaluatedona3×3 neighborhood,whichresultsina8−channelbit-planedescriptor. for j = 1,...,d where ∂φj1/∂x(cid:48) is estimated stochastically UnliketheclassicLBPdescriptor,thebit-planedescriptorcanbe throughx-andy-finitedifferencefiltersonφj1,and∂x(cid:48)/∂θ employed within a multi-channel LK framework using a sum of is obtained deterministically from the warp function. The squareddifferences(SSD)costmeasure. multi-channeld×pJacobianmatrixcanthenbeformedas ∂φ11(x)/∂θ 8 12 200 8<42 12<42 200<42 1 1 0 J (x;θ)= ∂φ1(x) = .. . (6) 56 42 55 56<42 55<42 0 0 φ ∂θ . ∂φd1(x)/∂θ 128 16 11 128<42 16<42 11<42 0 1 1 Using this multi-channel linearization all extensions and (a) (b) (c) variations of the LK algorithm can be extended to differ- Figure2.ThecanonicalLBPdescriptorisobtainedbyperforming ent types of multi-channel descriptors. Recent work has pixelcomparisonsinafixedorderandconvertingthebinarystring demonstrated the utility of multi-channel LK [1, 8] using toadecimalvalue. InFig.2athecenterpixelisina3×3neigh- classical dense descriptors such as dense SIFT and HOG. borhoodishighlighted,andcomparedtoitsneighborsasshownin Anovelcomponentofthispaperisthederivationofalow- Fig.2b.Finally,thedescriptorisobtainedbycombiningtheresults complexity dense binary descriptor that can be seamlessly ofeachcomparisoninFig.2cintoasinglescalardescriptor. appliedwithinthemulti-channelLKframework. comparison/binarytest,andthebracketnotationrepresents 3.DenseBinaryDescriptors the indicator function. We refer to the LBP descriptor de- Local Binary Patterns (LBP) [32] were among the first scribedinEquationEq.(7)assingle-channelsinceitsout- binary descriptors proposed in vision. An almost iden- putisascalarateverypixelpositionxwithintheimage. A tical binary representation was independently developed visual depiction of the single-channel LBP descriptor esti- by Zabih & Woodfill under the name: Census Transform mationprocesscanbefoundinFig.2. (CT) [42], which is still commonly applied in stereo and Bit-planes descriptor: When matching LBP descriptors optical flow research [39, 20, 31, 35]. LBP is based on it is common practice to employ the Hamming distance. thepredicateofpixelcomparisonsinasmallneighborhood Hamming distance is useful, because it matches LBP de- as illustrated in Fig. 2. By definition, the LBP descrip- scriptorsinafashionthatisinvarianttotheorderingofpixel torisinvarianttomonotonicilluminationchange,whichis comparisonswithinthe3×3neighborhood. Otherdistance desirable in practical image alignment applications. Re- metrics such as sum or squared distances (SSD) lack this cently, binary descriptor research has progressed signifi- desirablepropertyandaredependentontheorderingspec- cantly with the development of several high performance ified by {∆x }8 . This becomes problematic when em- descriptorssuchasORB[10]andBRISK[24]amongoth- i i=1 ploying dense binary descriptors within the multi-channel ers[10,37,3,25]. LKframeworkduetoitsinherentdependenceontheSSD. LBPdescriptor: WhenextractingaLBPdescriptorabout To make dense binary descriptors compatible with LK apixelpositionxoneobtains, weproposethebit-planesdescriptorgivenby φ(x)=(cid:88)8 2i−1(cid:2)I(x)(cid:46)(cid:47)I(x+∆xi)(cid:3), (7) φ(x)=I(x)(cid:46)(cid:47)I(..x+∆x1) . (8) i=1 . I(x)(cid:46)(cid:47)I(x+∆x ) 8 where {∆x }8 is the set of the eight relative coordinate i i=1 displacementspossiblewithina3×3neighborhoodaround For each pixel coordinate x in the image, this descriptor the center pixel location x. Other neighborhood sizes and producesan8-channelvectorofbinaryvalues{0,1}. No- samplinglocationscanbeused,butwefounda3×3region tably, usingtheSSDwiththemulti-channelrepresentation toperformbest. Theoperator(cid:46)(cid:47)∈ {>,≥,<,≤}isapixel in Eq. (8) between two bit-planes descriptor is equivalent to the Hamming distance between single-channel LBP de- scriptors. Specifically, the ordering of the pixel compar- isons within the 3×3 neighborhood of the bit-planes de- scriptorhasnoeffectontheSSD.Ananalysisonthechoice of operator (cid:46)(cid:47)∈ {>,≥,<,≤} is explored in the experi- mentssectionofthispaper. 4.LinearizingBit-Planes Inorderforourproposedbit-planesdescriptortobeef- (b)rawintensity fectivewithinamulti-channelLKframeworkwefirstneed (a)bit-planes toestablishthatthereexistsanapproximatelinearrelation- Figure 3. Cost surface of our bit-planes descriptor Fig. 3a com- putedoverasubsetofnaturalimages[41]incomparisontoSSD ship between the multi-channel bit-planes descriptor and overrawintensityFig.3b.BothcostsurfacesaresuitableforLK. geometric displacements. Inspecting a visualization of the bit-planesdescriptorinFig.1,onecouldbedoubtfulabout theexistenceofsuchrelationship. Specifically,eachchan- nelofthebit-planesdescriptorishighlydiscontinuous(due toitsbinarynature). Inaddition,estimatingstochasticgra- dientsperchannelofthebit-planesdescriptorseemsstrange astheycantakeononlythreepossibilities:{−1,0,+1}. SSD cost surface: However, the news is not all gloomy. InFig.3weseetheSSDcostsurfacebetweenapatchwithin a natural image and shifted versions of itself in the x- and y- directions. This was repeated over a subset of natural (a)Bit-planes. (b)RawIntensity. Figure 4. Assessment of the linearization properties of our pro- images with the aggregate result being depicted in Fig. 3. posed bit-planes descriptor in terms of the signal-to-noise-ratio Sub-pixelshiftsareentertainedhereusingbi-linearinterpo- (SNR)asafunctionoftranslationaldisplacementinx-andy-di- lation. In Fig. 3b one sees the cost surface for raw pixel rections. Onenotesthateventhoughrawpixelsaresuperior,bit- intensities,andasexpected,weseeaquasi-convexcostsur- planesofferasufficientapproximationtobeusedwithinamulti- face surrounding the origin. This quasi-convex surface is channel LK framework. This thesis is the central focus of our importantwithrespecttotheeffectivenessoftheLKalgo- paper. rithm—astheLKobjectivereliesonagracefulreduction of the SSD cost as a function of geometric displacement The signal-to-noise-ratio (SNR) of the approximation can from ground-truth. In fact the LK algorithm can be inter- thenbedefinedas pretedasattemptingtohallucinateaconvexquadraticrep- resentation of this SSD cost surface. Interestingly, when (cid:88) SNR(∆θ)=10·[log (cid:107)R(x(0))(cid:107)2−log(cid:15)(∆θ)] (11) inspecting Fig. 3a we see a similar quasi-convex cost sur- 2 x∈Ω face,whichindicatesthatbit-planeshavesimilarproperties torawpixelintensitieswhenitcomestotheuseofSSDas In Fig. 4 we depict the SNR of the linearized objective as ameasureofdissimilarity. a function of increasing translational shifts from the true minimaforbothrawintensities,andthebinarychannelsin Linearpredictionsofbit-planes: Consideratranslational displacement warp ∆θ ∈ R2 where we attempt to lin- bit-planes. The experiments were carried out in a similar manner through the use of a subset of natural images and earlypredictanimagerepresentationR(rawpixels,orbit- aggregated to form the results in Fig. 4. As expected, the planes)inthex-andy-directions, SNRwhenusingbinaryfeaturesislowerthenusingrawin- ∂R(0) tensities. However,itseemsthat—atleastqualitatively— R(x(0))+ ∆θ ≈R(x(∆θ)) . (9) ∂θ bit-planesgradientestimatesprovideagoodlocallinearap- proximation of the objective. Hence, further justifying the We employ the neutral notation R to represent either raw useofthebit-planesdescriptorwithintheLKframework. pixels I or bit-planes φ. We can define the error of this linearapproximationtobe: 5.Experiments (cid:15)(∆θ)= (cid:88)(cid:107)R(x(0))+ ∂R(0)∆θ−R(x(∆θ))(cid:107)2 In this section we shall attempt to answer a number of ∂θ 2 importantquestionsregardingthevalidityofourdensebit- x∈Ω (10) planesdescriptorforrobustandefficientimagealignment. Pre-computing descriptors: An obvious question to ask 3.4 ·10−2 whenconsideringtheapplicationofmulti-channeldescrip- S tors, such as bit-planes, within the LK framework is: M 2.4 R whether we can pre-compute the descriptors before warp- nt ing? Specifically,duetotheiterativenatureoftheLKalgo- Poi 1.3 rithmitbecomescomputationallyexpensivetore-compute 0.3 FA FA-1 FC FC-1 IC IC-1 dense descriptors after each image warp. If one can pre- computethedescriptorbeforewarpingsubstantialefficien- Figure 5. Recomputing descriptors after image warping shows ciescanbeintegratedintoanyLK-basedimagealignment. consistentlybetterperformancethanwarpingfeatureimageswhen We attempted to answer this question in Fig. 5 where testedwithseveralLKvariants. FA:ForwardAddition,FC:For- weevaluatedanumberofwell-knownLKvariants[2]: for- ward Compositional, IC: Inverse Compositional. The suffix ‘-1‘ ward additive (FA), forward compositional (FC), and in- indicatesrecomputedfeaturesonwarpedimages. verse compositional (IC) for the task of image alignment onnaturalimages.Randomwarpinitializationsandappear- 1 S ancevariationoftheform M R (cid:18)αI (θ (x))+β(cid:19)1+γ nt 0.5 I (x)=255 0 a (12) oi 1 255 P 0 FA FA-1 FC FC-1 IC IC-1 wereincluded,whereθ (·)arethe6DOFparametersofan a affinewarp,αandβaremultiplicativeandadditivelighting Figure6.Recomputingdescriptorsafterimagewarpingisequiv- change terms, and |γ| < 1 is used for gamma correction. alent to warping feature channels given a linear relationship be- We can see that warping feature channels is less accurate tweenthewarpparametersandintensities(translationwarps).The than re-computing the descriptor on the warped image as reason for reduced IC performance is due to higher noise at the showninFig.5. template. Thedegreetowhichwarpingthefeaturechannelsvs.re- computing them affects accuracy depends on the applica- 1200 BitPlanes(18.44) tion and the type of warp. For simple warps such as 2D 1000 >(31.31) translation, the relationship between intensity deformation 800 ≥(35.47) s aimsaatifnugncmtiuolnti-ocfhwananrpelpLaKrambyetwerasrpisinlignethaer.feHaetunrcee,cahpapnrnoexls- ount 600 <≤((3334..2139)) C 400 isequivalenttore-computingthefeaturesonwarpedimages 200 asshownin Fig.6. However, formore complicatedwarps where deformation of image intensities is nonlinearly de- 0−140 −100 −60 −20 20 60 100 140 pendent on the warp parameters we expect a pronounced Error differenceinalignmentaccuracy. Thisisbecausethevalue Figure7.Histogramoffinalintensityerrorswhenusingourpro- of each descriptor channel might significantly differ after posed multi-channel bit-planes vs. classical single-channel LBP a nonlinear warp. Overall, it is possible to approximate descriptorswithdifferentcomparisonoperators(cid:46)(cid:47)∈ {>,≥,<,≤ themulti-channelobjectiveinEq.(4)withwarpingfeature },theRMSisshowninparenthesis. channelsdependingonthetypeandaccuracyrequirements oftheapplicationathand. Inourexperiments,wechoseto recompute descriptors after every iteration of image warp- neighborhoodordering. InFig.8weshowtheeffectofdif- ing. feringbinarycomparisonoperators(cid:46)(cid:47)∈{>,≥,<,≤}com- pared to our proposed bit-plane descriptor. Our bit-planes LBP within LK: Employing bit-planes requiresthe align- descriptorisunaffectedbytheordering. Inourexperiments mentofeightseparatechannelsasopposedtoasinglechan- wenoticedindistinguishabledifferencesinperformancebe- nel when working with raw intensities. In Fig. 8 we dis- tweenbinarycomparisonoperatorswhenemployingthebit- cussed the problems of using a LBP descriptor within the planesdescriptor. Asaresult,wechosetousethe>opera- LK framework. In particular, the representation is inher- torfortherestofourexperiments. ently sensitive to the ordering of pixel comparisons when usingaSSDmeasureofdissimilarity. UsingLBPdescrip- Real-time template tracking: We evaluate the perfor- tors within a LK framework as been reported to perform mance of bit-planes for a template tracking problems us- well [39, 20] given small displacements. However, under ingthebenchmarkdatasetcollectedbyGauglitzetal.[18]. moderate displacements the use of the LBP descriptor in An example of the dataset is shown in Fig. 9. Our plane LK introduces biases due to choices of the binary test and trackerestimatesan8DOFhomographyusingtheICalgo- Templatearea 75×57 150×115 300×230 640×460 Intensity 650 360 140 45 Bit-planes 460 170 90 35 Table 2. Plane template tracking runtime in frames per second (FPS)[email protected]. (a)Templateatt=0 (b)Bit-planesresultatt=50 a variety of template tracking methods summarized in Ta- ble 1. The algorithms are: the enhance correlation coeffi- cientECC[15],whichservesasanexampleofanintrinsi- callyrobustcostfunctionthatisinvariantuptoanaffineil- luminationchange. TheDualInverseCompositional(DIC) algorithm[4],whichseversasanexampleofalgorithmsthat (c)(cid:46)(cid:47) := > (d)(cid:46)(cid:47) := ≥ (e)(cid:46)(cid:47) := < (f)(cid:46)(cid:47) := ≤ Figure8.TrackingdriftwhenusingLBPvs.bit-planes. Thebot- attempt to estimate illumination parameters. We use two tom row shows the result of template tracking using LPB. The variationsoftheDIC:(i)thegain+biasmodelongrayscale imagemagnifiedforbettervisualization(comparewithFig.8b). imagesdenotedbyDIC-1,and(ii)usingafullaffinelight- AquantitativeanalysisisshowninFig.7.Bestviewedincolor. ing model the makes use of RGB image data denoted by DIC-2.Wealsocomparetheperformanceagainstarecently published descriptor-based method [11] called Descriptor Algorithm #parameters #channels Fields DF. Finally, we include baseline results from raw BP(ours) 8 8 intensity BF, improved LK with the Gradient Constraint ECC[15] 8 1 GC[9],andalignmentwiththeGradientMagnitudeGM. DIC-1[4] 10 1 We report two quantities in the evaluation. First, is the DIC-2[4] 20 3 percentageofsuccessfullytrackedframes. Aframeissuc- DF[11] 8 5 cessfullytrackediftheoverlapbetweentheestimateandthe GC[9] 8 3 GM 8 2 groundtruthisgreaterthan90%. Theoverlapiscomputed LK 8 1 as o = (A∩B)/(A∪B), where A is the warped image Table 1. Algorithms compared in this work. The number of pa- giveneachalgorithm’sestimate,andBisthewarpedimage rameters indicates the DOF of the state vector, which is 8 for a giventhegroundtruth. Second,sincewearealsointerested homography in addition to any photometric parameters. We use insubpixelaccuracyweshowthemeanpercentageofover- theauthors’codeforECCandDIC. lapacrossallframesgivenbym=1/n(cid:80)ni=1oi,wherenis thenumberofframesineachsequence. Real-time results: Results are compared for three types rithm [2]. The template is extracted from the first frame ofgeometricandphotometricvariations. Firstisanoutof ineachsequenceandiskeptfixedthroughoutthesequence planerotation,whichinducesperspectivechangeasshown as we are interested in tracking robustness overtime. To inFig.9b. Second,isdynamiclightingchangewherethe improveconvergenceweusea3-levelpyramidandinitial- imageisstationarybutailluminatedwithnonlinearlyvary- izethetrackerforsubsequentframesusingthemostrecent inglightsource.Finallyastaticlightingchange,wherethe estimate. We use Gauss-Newton as the optimization algo- transitionphaseofchangeinlightingisomitted. rithm, without robust weighting, with a maximum of 100 Our evaluation results are shown in Table 3 and in iterations. Tracking terminates early if the relative change Fig.10. Basedonourexperimentation,thetopperforming intheestimatedparametersbetweeniterationsdropsbelow 1×10−6,ortherelativechangeinthecostfunctionreduc- methodsaretheonesthatemployadescriptorconstancyas- tion drops below 1×10−5. For small motion, the tracker sumption,namely: bit-planesandDF. However,bit-planes ismoreefficientanditperformedsignificantlybetterforthe typically converges in less than 10 iterations. Our imple- ‘outofplanerotation’,whichinducesperspectivechangein mentation runs faster than real time as shown in Table 2. the image. In fact, all tested algorithms, with the excep- The efficiency is achieved by utilizing SIMD instructions tion of bit-planes, performed poorly with this data. Algo- on the CPU that allows us to process 16 pixels at once (or rithms that use a robust function (ECC) and the ones that 32pixelswithmodernAVXinstructions). Additionally,the attempttoestimateillumination(DIC)performedwell,but operationsrequiredtocomputetheLBPdescriptorarelim- fellbehindincomparisontodescriptorconstancyandgra- ited to bit shifts, ORs and ANDs, all of which can be per- dientconstraint. formedwithlowlatency. We compare the performance of our algorithm against Results on mobile devices: We further evaluate the work br bu mi pa su wd OutofPlaneRotation BP 100.0(99.38) 100.0(99.51) 87.50(99.38) 97.92(99.26) 79.17(99.57) 93.75(99.30) ECC 25.00(96.16) 33.33(95.85) 25.00(95.99) 33.33(96.65) 20.83(95.52) 18.75(95.14) DIC-1 25.00(96.20) 33.33(95.83) 25.00(95.98) 33.33(96.73) 20.83(95.95) 18.75(95.46) DIC-2 25.00(96.22) 35.42(95.56) 25.00(95.51) 35.42(96.42) 25.00(96.22) 18.75(95.06) DF 91.67(99.51) 93.75(99.44) 79.17(99.70) 85.42(99.75) 70.83(99.60) 83.33(99.51) GC 100.0(99.24) 95.83(99.66) 87.50(99.52) 93.75(99.51) 62.50(98.88) 91.67(99.34) GM 62.50(99.86) 83.33(99.62) 77.08(99.72) 77.08(99.81) 58.33(99.71) 62.50(99.66) LK 93.75(99.68) 91.67(99.70) 83.33(99.32) 91.67(99.63) 37.50(97.64) 66.67(99.63) DynamicLightingChange BP 100.0(98.97) 100.0(99.08) 100.0(99.13) 100.0(98.91) 100.0(98.98) 100.0(99.02) ECC 16.33(98.03) 19.39(99.00) 100.0(98.64) 100.0(98.69) 100.0(97.30) 67.35(98.55) DIC-1 100.0(98.40) 100.0(99.04) 100.0(98.77) 100.0(98.60) 86.87(96.02) 20.41(95.36) DIC-2 100.0(98.39) 100.0(98.85) 100.0(98.61) 100.0(98.58) 85.86(96.42) 26.53(97.73) DF 100.0(99.30) 100.0(99.08) 100.0(98.35) 100.0(98.87) 20.41(99.36) 68.37(99.02) GC 17.35(99.87) 100.0(99.50) 22.45(99.84) 18.37(99.88) 12.24(99.72) 17.35(99.84) GM 17.35(98.99) 19.39(99.23) 23.47(99.10) 19.39(99.08) 0.00(0.00) 0.00(0.00) LK 13.27(99.34) 31.63(98.26) 18.37(98.82) 18.37(99.32) 12.24(99.16) 16.33(98.96) Staticlightingchange BP 100.0(99.76) 100.0(99.85) 100.0(99.61) 100.0(99.85) 100.0(99.63) 100.0(99.76) ECC 100.0(97.33) 100.0(97.67) 100.0(97.75) 100.0(97.41) 100.0(96.79) 100.0(97.55) DIC-1 100.0(97.70) 100.0(97.77) 100.0(97.80) 100.0(97.20) 98.72(96.58) 89.74(96.19) DIC-2 100.0(97.58) 79.49(97.59) 100.0(97.07) 100.0(97.13) 89.74(95.75) 79.49(96.38) DF 100.0(99.68) 100.0(99.51) 76.92(99.71) 100.0(99.77) 74.36(99.70) 100.0(99.83) GC 74.36(99.73) 74.36(99.84) 48.72(99.97) 74.36(99.76) 48.72(99.74) 51.28(99.88) GM 48.72(99.88) 74.36(99.75) 74.36(99.66) 74.36(99.81) 48.72(99.76) 48.72(99.83) LK 48.72(99.80) 74.36(99.67) 48.72(99.95) 48.72(99.93) 48.72(99.40) 48.72(99.94) Table3.Templatetrackingevaluation[18]. Weshowthepercentageofsuccessfullytrackedframes. Inparenthesisweshowtheaverage percentageofoverlapforallsuccessfullytrackedframes. Theavailabletexturesare:br(bricks),bu(building),mi(mission),pa(paris), su(sunset),andwd(wood). (a)Suddenlightingchangeandambiguoustexture. (b)Suddenlightingchangeandperspectivedistortionwithmediumtexture. (c)Suddenlightingchangeandmotionblurwithhightexture. Figure11.Highframeratedataat120HzcapturedusinganiPhone5s. Datasetcontainsdifferenttexturesundersuddenlightingchange, lowlighting,andmotionblur.Alldataandcodewillbepubliclyavailablefortheresearchcommunity. texture in Fig. 11b as well as higher amount of texture in Fig.11c. ThefirstimageinFig.11showstheselectedtem- plate, which we hold fixed throughout tracking. The total numberofframesfromthevideoscombinedis6447. Wecomparetheperformanceofdensetrackingusingbit- planeswiththeRANSAC-basedtrackingbydetectionusing (a)Lightingchange. twotypesofbinarydescriptors,ORB[33]andBRISK[24]. Intermsofefficiency,eventhoughourmobiledeviceimple- mentation does not make use of NEON instructions or the GPU,weoutperformopencv3’soptimizedimplementations ofORBandBRISKbyasubstantialmargin. Moreimportantly,ourapproachismorerobust. Feature- based tracking failed approximately on 15% of the frames duetoeither: (i)inabilitytodetectfeaturesunderlowlight, and(ii)RANSACfailureduetoimprecisecorrespondences (b)Out-of-planerotation. Figure9.TrackingresultsusingtheBricksdataset[18]. Thetop undermotionblur. rowofeachfigureshowstheperformanceofbit-planes,whilethe Perhapsmoreinterestingly,bit-planesisabletoperform bottomrowshowsclassicalintensity-basedLK. well and improve efficiency by reducing image resolution. In fact, tracking speed more than doubles when reducing 1 thetemplatesizebyhalf. However,thisisnotthecasewith sparsefeaturesasmemoryoverheaddependsonthenumber n 0.8 o of extracted keypoints, which we kept fixed at 512. It is cti a 0.6 possibletoimprovethetrackingspeedofORBandBRISK fr s by reducing the number of extracted keypoints. However, s e 0.4 c loweringthenumberofkeypointsmustbedonecarefullyas uc BP ECC DIC-1 S 0.2 DIC-2 DF GC not to compromise the robustness of the system. We note LK that the ability to work with lower resolution is important 0 0.5 0.6 0.7 0.8 0.9 1 on mobile devices, not only for efficiency considerations, Overlapfraction butalsoforpowerconsumption. Figure10.Fractionofsuccessfullytrackedframesasfunctionof Finally,wenotethatwhiledensebit-planestrackingpro- theoverlapareagiventhegroundtruth.Bit-planesandDFperform ducesfasterandmoreaccurateresults,itsmainlimitationis better than other methods. However, in Table 3 we see that bit- theinabilitytorecoverifthetemplateislostduetoocclu- planes’performanceisbetterwithchallengingsequences. sions or significant drift. In such cases, track by detection canbeofimmensevaluetore-initializeLK-basedmethods ifneeded. template iPadAir2 iPhone5s size BP ORB BRISK BP ORB BRISK 6.Conclusions 70×55 123 N/A N/A 50 N/A N/A 150×115 48 15 15 22 13 13 In this work, we proposed a multi-channel representa- 311×230 17 12 14 10 8 11 tion that enables nonlinear gradient-based optimization al- Table4.TemplatetrackingrunningtimeonARMarchitectureus- ingasingleCPUcoreinframespersecond(FPS).Thebottleneck gorithmstoworkwithbinaryfeatures. Wearriveatthesur- forbit-planesisimageresizingandwarping,whichcouldbealle- prisingresultthatbinarydataissuitableforgradient-based viatedusingtheGPU. Resultsareaveragedoverthreevideosof optimization, as the local approximation of gradients per challengingdatatotalling6446frames. channeliswell-approximatedwithaquadraticform. We used the multi-channel representation in a Lucas & Kanade (LK) image alignment framework with our pro- on high frame rate data (Slo-mo) using two smart mobile posedbit-planesdescriptors,whichgreatlyimprovesthero- devices: the iPad Air 2 and the iPhone 5s. In addition bustnesstoarbitraryilluminationvariationswithoutsignif- tocompression artifacts, wemade thedatamore challeng- icantlyincreasingcomputationaldemands. Inaddition,we ing by turning off the lights multiples times to cause sud- obtained a precise subpixel localization of binary descrip- den lighting change and low illumination. The videos are torsatspeedsfasterthanrealtime. recorded with unsteady hands causing further motion blur. Inthecontextofbinaryfeatures,leastsquaresminimiza- AnexampleofthevideosisshowninFig.11featuringan tion over the multi-channel representation is equivalent to ambiguously textured object in Fig. 11a, normal levels of minimizing the Hamming distance. Hence, we are able to minimizetheHammingdistanceinastandardleastsquares [15] G. D. Evangelidis and E. Z. Psarakis. Parametric image optimizationframework. alignmentusingenhancedcorrelationcoefficientmaximiza- tion. PAMI,30(10),2008. 6 References [16] J.Figat,T.Kornuta,andW.Kasprzak. PerformanceEvalu- ationofBinaryDescriptorsofLocalFeatures. InComputer [1] E. Antonakos, J. Alabort-i Medina, G. Tzimiropoulos, and VisionandGraphics,pages187–194.Springer,2014. 1 S.Zafeiriou. Feature-BasedLucas-KanadeandActiveAp- [17] D. Fleet and Y. Weiss. Optical Flow Estimation. In pearanceModels. ImageProcessing,IEEETransactionson, N.Paragios,Y.Chen,andO.Faugeras,editors,Handbookof 24(9):2617–2632,Sept2015. 1,2,3 Mathematical Models in Computer Vision, pages 237–257. [2] S.BakerandI.Matthews.Lucas-kanade20yearson:Auni- SpringerUS,2006. 2 fyingframework. InternationalJournalofComputerVision, [18] S.Gauglitz,T.Höllerer,andM.Turk. EvaluationofInterest 56(3):221–255,2004. 1,2,5 Point Detectors and Feature Descriptors for Visual Track- [3] V. Balntas, L. Tang, and K. Mikolajczyk. BOLD - Bi- ing. InternationalJournalofComputerVision, 94(3):335– naryOnlineLearnedDescriptorForEfficientImageMatch- 360,2011. 5,7,8 ing. TheIEEEConferenceonComputerVisionandPattern [19] K.GraumanandR.Fergus. Learningbinaryhashcodesfor Recognition(CVPR),June2015. 3 large-scaleimagesearch. InMachinelearningforcomputer [4] A.Bartoli. Groupwisegeometricandphotometricdirectim- vision,pages49–87.Springer,2013. 1 ageregistration. IEEETrans.onPatternAnalysisandMa- [20] D.Hafner,O.Demetz,andJ.Weickert. WhyIstheCensus chineIntelligence,30(12):2098–2108,2008. 2,6 Transform Good for Robust Optic Flow Computation? In [5] H.Bay,A.Ess,T.Tuytelaars,andL.V.Gool. Speeded-Up Scale Space and Variational Methods in Computer Vision, RobustFeatures(SURF). ComputerVisionandImageUn- volume7893.2013. 1,3,5 derstanding, 110(3):346–359, 2008. SimilarityMatching [21] J.Heinly,E.Dunn,andJ.-M.Frahm.Comparativeevaluation inComputerVisionandMultimedia. 1 ofbinaryfeatures. InComputerVision–ECCV2012,pages [6] A.Bosch,A.Zisserman,andX.Muoz. ImageClassification 759–773.Springer,2012. 1 usingRandomForestsandFerns.InComputerVision,2007. [22] M.IraniandP.Anandan. Robustmulti-sensorimagealign- ICCV2007.IEEE11thInternationalConferenceon, 2007. ment.InComputerVision,1998.SixthInternationalConfer- 1 enceon,pages959–966,Jan1998. 2 [7] E. Bostanci. Is Hamming distance only way for match- [23] S.Kaneko,I.Murase,andS.Igarashi. Robustimageregis- ing binary image feature descriptors? Electronics Letters, trationbyincrementsigncorrelation. PatternRecognition, 50(11):806–808,May2014. 1 35(10):2223–2234,2002. 1 [8] H. Bristow and S. Lucey. In Defense of Gradient-Based [24] S. Leutenegger, M. Chli, and R. Siegwart. BRISK: Bi- AlignmentonDenselySampledSparseFeatures. InDense nary Robust invariant scalable keypoints. In Computer Vi- correspondencesincomputervision.Springer,2014. 1,3 sion(ICCV),2011IEEEInternationalConferenceon,pages [9] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High 2548–2555,Nov2011. 1,3,8 Accuracy Optical Flow Estimation Based on a Theory for [25] G.LeviandT.Hassner. LATCH:LearnedArrangementsof Warping. InECCV,volume3024ofLectureNotesinCom- ThreePatchCodes. CoRR,abs/1501.03719,2015. 3 puterScience,pages25–36.2004. 6 [26] C.Liu,J.Yuen,andA.Torralba. SIFTFlow: DenseCorre- [10] M.Calonder,V.Lepetit,C.Strecha,andP.Fua. BRIEF:Bi- spondenceacrossScenesandItsApplications. IEEETrans. naryRobustIndependentElementaryFeatures. InK.Dani- PatternAnal.Mach.Intell.,33(5):978–994,2011. 1 ilidis,P.Maragos,andN.Paragios,editors,ComputerVision [27] B.D.LucasandT.Kanade. AnIterativeImageRegistration –ECCV2010, volume6314ofLectureNotesinComputer TechniquewithanApplicationtoStereoVision(DARPA).In Science,pages778–792.2010. 1,3 Proc.ofthe1981DARPAImageUnderstandingWorkshop, [11] A.CrivellaroandV.Lepetit. Robust3DTrackingwithDe- pages121–130,April1981. 1,2 scriptorFields. InConferenceonComputerVisionandPat- [28] S. Lucey, R. Navarathna, A. B. Ashraf, and S. Sridharan. ternRecognition(CVPR),2014. 6 FourierLucas-KanadeAlgorithm. PatternAnalysisandMa- [12] N.DalalandB.Triggs. Histogramsoforientedgradientsfor chineIntelligence,IEEETransactionson,35(6),2013. 2 humandetection. InComputerVisionandPatternRecogni- [29] D.Marr.Vision:AComputationalInvestigationintotheHu- tion,2005.CVPR2005.IEEEComputerSocietyConference man Representation and Processing of Visual Information. on,volume1,pages886–893vol.1,June2005. 1 HenryHoltandCo.,Inc.,NewYork,NY,USA,1982. 1 [13] A.DameandE.Marchand. Accuratereal-timetrackingus- [30] M.MujaandD.G.Lowe. Fastmatchingofbinaryfeatures. ing mutual information. In Mixed and Augmented Reality In Computer and Robot Vision (CRV), 2012 Ninth Confer- (ISMAR),20109thIEEEInternationalSymposiumon,pages enceon,pages404–410.IEEE,2012. 1 47–56,Oct2010. 2 [31] T.Müller,C.Rabe,J.Rannacher,U.Franke,andR.Mester. [14] N.DowsonandR.Bowden. MutualInformationforLucas- Illumination-RobustDenseOpticalFlowUsingCensusSig- Kanade Tracking (MILK): An Inverse Compositional For- natures. In Pattern Recognition, volume 6835 of Lecture mulation. PAMI,30(1):180–185,Jan2008. 2 NotesinComputerScience,pages236–245.2011. 3 [32] T. Ojala, M. Pietikäinen, and D. Harwood. A comparative [38] G. Tzimiropoulos, J. Alabort-i Medina, S. Zafeiriou, and study of texture measures with classification based on fea- M.Pantic.GenericActiveAppearanceModelsRevisited.In tureddistributions. PatternRecognition,29:51–59,1996. 1, K.Lee,Y.Matsushita,J.Rehg,andZ.Hu,editors,Computer 3 Vision–ACCV2012,pages650–663.2013. 2 [33] E.Rublee,V.Rabaud,K.Konolige,andG.Bradski. ORB: [39] C. Vogel, S. Roth, and K. Schindler. An Evaluation of AnefficientalternativetoSIFTorSURF. InComputerVi- DataCostsforOpticalFlow. InJ.Weickert, M.Hein, and sion(ICCV),2011IEEEInternationalConferenceon,pages B. Schiele, editors, Pattern Recognition, Lecture Notes in 2564–2571,Nov2011. 8 ComputerScience.2013. 1,3,5 [34] E.P.SimoncelliandB.A.Olshausen. NaturalImageStatis- [40] A. Wedel, T. Pock, C. Zach, H. Bischof, and D. Cremers. tics and Neural Representation. Annual Review of Neuro- AnImprovedAlgorithmforTV-L1OpticalFlow.InStatisti- science,24:1193–1216,2001. 1 calandGeometricalApproachestoVisualMotionAnalysis, [35] F. Stein. Efficient Computation of Optical Flow Using the volume5604ofLectureNotesinComputerScience.2009.2 CensusTransform. InPatternRecognition,volume3175of [41] J.Xiao,J.Hays,K.Ehinger,A.Oliva,A.Torralba,etal.Sun LectureNotesinComputerScience,pages79–86.2004. 3 database: Large-scalescenerecognitionfromabbeytozoo. [36] D.Sun, S.Roth, andM.Black. Secretsofopticalflowes- In Computer vision and pattern recognition (CVPR), 2010 timationandtheirprinciples. InComputerVisionandPat- IEEEconferenceon,pages3485–3492.IEEE,2010. 4 ternRecognition(CVPR),2010IEEEConferenceon,pages [42] R.ZabihandJ.Woodfill. Non-parametriclocaltransforms 2432–2439,June2010. 2 forcomputingvisualcorrespondence. InComputerVision- [37] V.L.T.Trzcinski,M.ChristoudiasandP.Fua. BoostingBi- ECCV’94,pages151–158.Springer,1994. 1,3 naryKeypointDescriptors. InComputerVisionandPattern Recognition,2013. 3