COUNTING LYNDON FACTORS AMYGLEN,JAMIESIMPSON,ANDW.F.SMYTH ABSTRACT. In this paper, we determine the maximum number of distinct Lyndon factors that a word of lengthncancontain.WealsoderiveformulasfortheexpectedtotalnumberofLyndonfactorsinawordof lengthnonanalphabetofsizeσ,aswellastheexpectednumberofdistinctLyndonfactorsinsuchaword. 7 TheminimumnumberofdistinctLyndonfactorsinawordoflengthnis1andtheminimumtotalnumberis 1 n,withbothboundsbeingachievedbyxnwherexisaletter.Amoreinterestingquestiontoaskiswhatisthe 0 minimumnumberofdistinctLyndonfactorsinaLyndonwordoflengthn?Inthisdirection,itisknown[13]thatan 2 optimallowerboundforthenumberofdistinctLyndonfactorsinaLyndonwordoflengthnis log (n)+1 , n ⌈ φ ⌉ whereφdenotesthegoldenratio(1+√5)/2.Moreover,thislowerboundisattainedbytheso-calledfinite a J Fibonacci Lyndon words, which are preciselythe Lyndon factors of the well-known infiniteFibonacci word f —aspecialexampleofainfiniteSturmianword.Saari[13]conjecturedthatifwisLyndonwordoflengthn, 4 n=6,containingtheleastnumberofdistinctLyndonfactorsoverallLyndonwordsofthesamelength,then ] w6isaChristoffelword(i.e.,aLyndonfactorofaninfiniteSturmianword).Wegiveacounterexampletothis O conjecture.Furthermore,wegeneraliseSaari’sresultonthenumberofdistinctLyndonfactorsofaFibonacci C Lyndon word by determining the number of distinct Lyndonfactors of a givenChristoffel word.We end withtwoopenproblems. . h t a m [ 1. INTRODUCTION 1 v 8 ThispaperisconcernedwithcountingLyndonwordsoccurringinagivenwordoflengthn. 2 9 First, let us recall some terminology and notation from combinatorics on words (see, e.g., [10, 11]). A 0 wordisa(possiblyempty)finiteorinfinitesequenceofsymbols,calledletters, drawnfromagivenfinite 0 set Σ, called an alphabet, ofsize σ = Σ . A finite word w := x x x with each x Σ is said tohave 1. length n, written w = n. The empty| w|ord is the unique word1 o2f·l·e·ngnth 0, denotedi ∈by ε. The set of all 0 | | finite wordsover Σ (including theemptyword)is denotedby Σ ,and for each integer n 2, thesetof 7 ∗ ≥ 1 allwordsoflengthnoverΣisdenotedbyΣn. : v A finitewordz is saidtobeafactor ofagiven finiteword w ifthereexistwordsu, v suchthat w = uzv. i X If u = ε, then z is said to be a prefix of w, and if v = ε, then z is said to be a suffix of w. If both u and v r arenon-empty,wesaythatzisaproper factorofw.Aprefix(respectively,suffix)ofwthatisnotequalto a w itselfissaidtobeaproper prefix (respectively,proper suffix)ofw.Afactorofan infinitewordisafinite wordthatoccurswithinit. A non-empty word x that is both a proper prefix and a proper suffix of a finite word w is said to be a border of w. We say that a word which has only an empty border is borderless. If, for some word x, w = xx x (k times for some integer k 1), we write w = xk, and w is called the k-th power of x. A ··· ≥ non-empty finite word is said to be primitive if it is not a power of a shorter word. Two finite words u, v are said to be conjugate if there exist words x,y such that u = xy and v = yx. Accordingly, conjugate words are cyclic shifts of one another, and thus conjugacy is an equivalence relation. A primitive word of length n has exactly n distinct conjugates.For example, theprimitive word abacaba of length7 has 7 distinctconjugates;namely,itselfandthesixwordsbacabaa, acabaab, cabaaba, abaabac, baabaca, aabacab. Thesetofallconjugatesofafinitewordwiscalledtheconjugacyclassofw. Date:January4,2017. 2000MathematicsSubjectClassification. 68R15. Keywordsandphrases. Lyndonword;Sturmianword;Fibonacciword;Christoffelword. 1 2 AMYGLEN,JAMIESIMPSON,ANDW.F.SMYTH InthispaperweconsideronlywordsonanorderedalphabetΣ = a ,a ,...,a wherea < a < < 1 2 σ 1 2 { } ··· a .ThistotalorderonΣnaturallyinducesalexicographical order(i.e.,analphabeticalorder)onthesetof σ allfinitewordsoverΣ.ALyndonwordoverΣisanon-emptyprimitivewordthatisthelexicographically least word in its conjugacy class, i.e., w Σ or w < vu for all non-empty words u,v such that w = uv (e.g.,see[10]). Equivalently, anon-empty∈finiteword w over Σ is Lyndonifand onlyif w Σ or w < v ∈ forallpropersuffixesvofw[7].Note,inparticular,thatthereisauniqueLyndonwordintheconjugacy classofanygivenprimitiveword.Forexample,aabacab istheuniqueLyndonconjugateoftheprimitive wordabacaba. LyndonwordsarenamedafterR.C.Lyndon[12],whointroducedthemin1954underthe nameof“standardlexicographicsequences”.Suchwordsarewellknowntobeborderless[7]. WebegininSection2bycomputingD(σ,n),themaximumnumberofdistinctLyndonfactorsinaword of length n on an alphabet Σ of size σ. In Section 3 we compute ET(σ,n), the expected total number of Lyndon factors (that is, counted according to their multiplicity) in a word of length n over Σ, while Section4computesED(σ,n),theexpectednumberofdistinctLyndonfactorsinwordoflengthnoverΣ. Section 5 considersdistinctLyndonfactors in a Lyndonword oflength n; in particular, we generalisea resultofSaari[13]onthenumberofdistinctLyndonfactorsofaFibonacciLyndonwordbydeterminingthe numberofdistinctLyndonfactorsofagivenChristoffelword(i.e.,aLyndonfactorofaninfiniteSturmian word—tobedefinedlater).Lastly,inSection6,westatesomeopenproblems. 2. THE MAXIMUM NUMBER OF DISTINCT LYNDON FACTORS IN A WORD Let D(σ,n) be the maximum number of distinct Lyndon factors in a word of length n on the alphabet Σ = a ,a ,...,a . We want to find a word that achieves D(σ,n), given σ and n. It is clear that a 1 2 σ { } necessaryconditionforattainingthemaximumisthatwtakestheformak1ak2 ...akσ.Thiswordcontains 1 2 σ (n+1) factors of lengths 1,2,...,n, of which each is a Lyndon word except those of the form ak, k > 1. 2 i Thenumberofpowersofeach a is(ki+1),including a itself.ThetotalnumberofLyndonfactorsin w is i 2 i therefore n+1 σ k +1 (1) ∑ i +σ, 2 − 2 (cid:18) (cid:19) i=1(cid:18) (cid:19) where the final σ counts the single letters a . We claim that the summation is minimised when the k i i differbyatmostone.Supposetothecontrarythatk = k +sforsomei, jands 2.Itiseasilychecked j i ≥ that k +1+s k +1 k +s k +2 i + i > i + i 2 2 2 2 (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) for s 2. Thus the summation term will be minimised when each k equals either n/σ or n/σ . If i n = m≥σ+ p, where0 < p < σ, then n/σ = m and n/σ = m+1.If p = 0thenea⌊ch k ⌋equa⌈ls m.⌉We i ⌊ ⌋ ⌈ ⌉ thereforehavethefollowingresult. Theorem1. Ifn = mσ+p,where0 p < σ,then ≤ n+1 m+1 m+2 (2) D(σ,n) = (σ p) p +σ 2 − − 2 − 2 (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) andthemaximumisattainedusing w = am...am am+1 ...am+1. 1 n p n p+1 n − − Corollary2. Ifn = mσthen σ D(σ,n) = m2+σ. 2 (cid:18) (cid:19) COUNTINGLYNDONFACTORS 3 Proof. Ifn = mσ,Theorem1gives n+1 m+1 D(σ,n) = σ +σ 2 − 2 (cid:18) (cid:19) (cid:18) (cid:19) σ = (m(mσ+1) (m+1)m)+σ 2 − σ = m2+σ asrequired. 2 (cid:18) (cid:19) (cid:3) Thefollowingtableshowsvaluesof D(σ,n)forlowvaluesofn. n D(2,n) D(5,n) D(10,n) 1 2 5 10 2 3 6 11 3 4 8 13 4 6 11 16 5 8 15 20 6 11 19 25 7 14 24 31 8 18 30 38 9 22 37 46 10 27 45 55 15 58 95 110 20 102 165 190 25 158 255 290 30 227 365 415 TABLE 1. ThemaximumnumberofdistinctLyndonfactorsthatcanappearinwordsoflengthn. 3. THE EXPECTED TOTAL NUMBER OF LYNDON FACTORS IN A WORD We now wish to calculate the total number M(σ,n) of Lyndon factors (that is, counted according to multiplicity) appearing in all words in Σn. Consider a Lyndon word L of length m n and a position ≤ i, 1 i n m+1, in wordsof length n. Wordscontaining L starting at position i have the form xLy ≤ ≤ − where xy is any word on Σ with length n m. Thustherewill be σn m wordsin Σn which contain L in − − thisposition.Thiswillbethesameforanyofthen m+1possiblevaluesofisointheσn wordsin Σn − therewillbe(n m+1)σn m appearancesofL.ThisisthesameforallLyndonwordsofthislength.The − − numberofsuchLyndonwordsis1/mofthenumberofprimitivewordsofthislength,sinceexactlyone conjugateofeach primitive wordisLyndon.Thenumberofprimitive wordsoflengthn ([10],equation (1.3.7))is m ∑µ σd d d|m (cid:16) (cid:17) whereµistheMöbiusfunction.TogetthetotalnumberofLyndonfactorsappearinginΣn,wesumover possiblevaluesofm: n n m+1 m (3) M(σ,n) = ∑ − σn m ∑µ σd. − m d m=1 d|m (cid:16) (cid:17) Dividing by σn givestheexpectedtotalnumber ET(σ,n) := M(σ,n)/σn ofLyndonfactorsin awordof lengthnonthealphabetΣ.Table2belowshowsvaluesforσ = 2,5andlowvaluesofn. 4 AMYGLEN,JAMIESIMPSON,ANDW.F.SMYTH n M(2,n) ET(2,n) M(5,n) ET(5,n) 1 2 1.00 5 1.00 2 9 2.25 60 2.40 3 30 3.75 515 4.12 4 87 5.43 3800 6.08 5 234 7.31 25749 8.24 6 597 9.32 165070 10.56 7 1470 11.48 1018135 13.03 8 3522 13.76 6103350 15.62 9 8264 16.14 35797125 18.33 10 19067 18.62 206363748 21.13 TABLE 2. Valuesofthetotalnumber M(σ,n)ofLyndonfactorsappearinginallwordsoflength nandtheexpectedtotalnumberET(σ,n)ofLyndonfactorsinawordoflengthnonanalphabet ofsizeσforσ =2,5andn =1,2,...,10. 4. THE EXPECTED NUMBER OF DISTINCT LYNDON FACTORS IN A WORD Weusethenotationfromabove,with[n]beingtheset 1,2,...,n .Mostofthefollowinganalysiscounts { } thenumberofwordsinΣn thatcontainatleastonefactorequaltoaspecificLyndonword L.Attheend we sum over all possible L. Let S be a non-empty set of positions in a word w and let P(L,S,w) = 1 if w contains factors equal to L at each positionin w beginningat apositionin theset S,and 0 otherwise. Notethatwmaycontainotherfactorsequalto L.Weclaimthat n 1ifwcontainsatleastonefactorequalto L, (4) ∑( 1)s+1 ∑ P(L,S,w) = s=1 − S [n],S=s (0otherwise. ⊆ | | If w contains no factor equal to L then P(L,S,w) equals 0 for all S so the “otherwise” part of the claim holds. Suppose w contains copies of L beginning at positions in T = i ,i ,...,i and nowhere else. 1 2 t { } Then P(L,S,w) equals 1 if and only if S is any non-empty subset of T, so the left hand side of (4) be- comes t ∑( 1)s+1 S T : S = s − |{ ⊆ | | }| s=1 t t = ∑( 1)s+1 − s s=1 (cid:18) (cid:19) t t = ∑( 1)s+1 +1. − s s=0 (cid:18) (cid:19) Thisequals1sincethefinalsumisthebinomialexpansionof(1 1)t.ThenumberofwordsinΣn which − containatleastonefactorequalto Listherefore n ∑ ∑( 1)s+1 ∑ P(L,S,w) − w Σns=1 S [n],S=s ∈ ⊆ | | n (5) = ∑( 1)s+1 ∑ ∑ P(L,S,w). − s=1 S [n],S=sw An ⊆ | | ∈ We now evaluate ∑ P(L,S,w).This is counting the words in Σn which have factors L beginning at w Σn positions i S. It cle∈arly equals 0 if s L > n since then there is no room in w for s factors L (recalling ∈ | | COUNTINGLYNDONFACTORS 5 that L is Lyndon,thereforeborderless,and therefore cannot intersecta copy of itself). We also needthe membersofStobeseparatedbyatleast L .ThenumberofsuchsetsSis | | n s L +s − | | . s (cid:18) (cid:19) OnceSischosenthereareσn s L waysofchoosingthelettersinwwhicharenotinthespecifiedfactors − | | L.Thus n s L +s ∑ P(L,S,w) = − | | σn s L . − | | s w Σn (cid:18) (cid:19) ∈ Substituting in (5) we see that the number of words in Σn which contain at least one occurrence of L is ⌊n∑/|L|⌋( 1)s+1 n−s|L|+s σn−s|L|. − s s=1 (cid:18) (cid:19) To get the expected number ED(σ,n) of distinct Lyndon factors in a word of length n, we sum this over all L withlengthat most n,usingthesame techniqueas in theprevioussection,and divideby σn. Replacing L withmwegetthefollowing: | | (6) ED(σ,n) = ∑n 1 ∑µ m σd⌊n∑/m⌋( 1)s+1 n−sm+s σ sm. − m d − s m=1 d|m (cid:16) (cid:17) s=1 (cid:18) (cid:19) ThefollowingtableshowsvaluesofED(σ,n)forlowvaluesofnandseveralvaluesofσ. n σ =2 5 10 20 1 1.00 1.00 1.00 1.00 2 1.75 2.20 2.35 2.42 3 2.50 3.56 3.94 4.14 4 3.25 5.02 5.69 6.05 5 4.06 6.55 7.57 8.12 6 4.91 8.16 9.54 10.31 7 5.81 9.82 11.59 12.61 8 6.77 11.54 13.70 14.99 9 7.77 13.31 15.88 17.45 10 8.83 15.13 18.11 19.97 15 14.77 24.93 29.90 33.36 20 21.67 35.76 42.58 47.70 25 29.35 47.43 56.02 62.73 30 37.70 59.82 70.11 78.33 TABLE 3. The expected number ED(σ,n) of distinct Lyndon factors in a word of length n for alphabetsofsizeσ =2,5,10,20. 5. DISTINCT LYNDON FACTORS IN A LYNDON WORD Minimising thenumber ofLyndonfactors overwordsof length n is notvery interesting:theminimum number of distinct Lyndon factors is 1 and the minimum total number is n. Both bounds are achieved by xn wherexisaletter.AmoreinterestingquestionhasbeenstudiedbySaari[13]:whatistheminimum numberofdistinctLyndonfactorsinaLyndonwordoflength n?Heprovedthatanoptimallowerboundfor thenumberofdistinctLyndonfactorsinaLyndonwordoflengthnis log (n)+1 ⌈ φ ⌉ 6 AMYGLEN,JAMIESIMPSON,ANDW.F.SMYTH where φ denotes the golden ratio (1+√5)/2. Moreover, this lower bound is attained by the so-called finiteFibonacciLyndonwords, whicharepreciselytheLyndonfactorsofthewell-knowninfiniteFibonacci word f —aspecialexampleofacharacteristic Sturmianword. Following the notation and terminology in [11, Ch. 2], an infinite word s over a,b is Sturmian if and { } only if there exists an irrational α (0,1), and a real number ρ, such that s is one of the following two ∈ infinitewords: s , s : N a,b α,ρ ′α,ρ −→ { } definedby aif (n+1)α+ρ nα+ρ = 0, s [n] = ⌊ ⌋−⌊ ⌋ α,ρ (botherwise; (n 0) ≥ aif (n+1)α+ρ nα+ρ = 0, s [n] = ⌈ ⌉−⌈ ⌉ ′α,ρ (botherwise. Theirrationalαiscalledtheslopeofsandρistheintercept.Ifρ = 0,wehave s = ac and s = bc α,0 α ′α,0 α where c is called the characteristic Sturmian word of slope α. Sturmian words of the same slope have α the same set of factors [11, Prop. 2.1.18], so when studying the factors of Sturmian words, it suffices to consideronlythecharacteristicones. The infiniteFibonacci word f is the characteristic Sturmian word ofslope α = (3 √5)/2. Itcan be con- − structedasthelimitofaninfinitesequenceofso-calledfiniteFibonacciwords f ,definedby: n n 1 { } ≥ f = b, f = a, f = f f for n 1. 1 0 n n 1 n 2 − − − ≥ That is, f = ab, f = aba, f = abaab, f = abaababa, f = abaababaabaab, etc. (where f is a prefix of 1 2 3 4 5 n f foreachn 1),andwehave n+1 ≥ f = lim f = abaababaabaab n n ∞ ··· → Note. The length of the n-th finite Fibonacci word f is the n-th Fibonacci number F , defined by: F = n n 1 1,F = 1,F = F +F forn 1. − 0 n n 1 n 2 − − ≥ Moregenerally,anycharacteristicSturmianwordcanbeconstructedasthelimitofaninfinitesequence of finite words. To this end, we recall that every irrational α (0,1) has a unique simple continued ∈ fractionexpansion: 1 α = [0;a ,a ,a ,...] = 1 2 3 1 a + 1 1 a + 2 a + 3 ··· whereeach a isapositiveinteger.Then-thconvergentofαisdefinedby i p n = [0;a ,a ,...,a ] foralln 1, 1 2 n q ≥ n wherethesequences p and q aregivenby n n 0 n n 0 { } ≥ { } ≥ p = 0, p = 1, p = a p +p , n 2 0 1 n n n 1 n 2 q = 1, q = a , q = a q − +q − , n ≥ 2 0 1 1 n n n 1 n 2 − − ≥ Supposeα = [0;1+d ,d ,d ,...],withd 0andallotherd > 0.Tothedirectivesequence(d ,d ,d ,...), 1 2 3 1 n 1 2 3 ≥ weassociateasequence s ofwordsdefinedby n n 1 { } ≥− s = b, s = a, s = sdn s for n 1. −1 0 n n−1 n−2 ≥ COUNTINGLYNDONFACTORS 7 Suchasequenceofwordsiscalledastandardsequence,andwehave s = q foralln 0. n n | | ≥ Notethatabisasuffixofs andbaisasuffixofs foralln 1. 2n 1 2n − ≥ Standardsequencesare relatedtocharacteristicSturmianwordsinthefollowingway.Observethat,for any n 0, sn is a prefix of sn+1, which gives obvious meaning to limn ∞sn as an infinite word.In fact, ≥ → onecanprove[8,3]thateachs isaprefixofc ,andwehave n α c = lims . α n n ∞ → The following lemma collects togethersomepropertiesofthestandard words s .Notethatfrom nowon n < whenreferringtoLyndonwordsoverthealphabet a,b weassumethenaturalordera b. { } Lemma3. Letc bethecharacteristic Sturmianwordofslopeα = [0;1+d ,d ,d ,...]withc = lim s where α 1 2 3 α n n ∞ thewordss aredefinedasabove. → n ◮ Foralln 1,s isaprimitive word[6]. n ≥ ◮ Foralln 1,thereexistuniquelydeterminedpalindromes u ,v , p suchthat n n n ≥ p ab ifnisodd, n s = u v = n n n (pnba ifniseven, where u = q 2and v = q q +2.[6] n n 1 n n n 1 | | − − | | − − ◮ Forall n 1, the reversal of s is the (q 2)-ndconjugate of s , andhence the conjugacy class of s is n n n n ≥ − closedunderreversal. [9,Prop.2.9(4)] ◮ TheLyndonfactorsofc oflengthatleast2arepreciselytheLyndonconjugatesofthe(primitive)standard α wordss foralln 1.[2,5] n ≥ Thefollowinglemmaisageneralisationof[13,Lemma8]. Lemma 4. Let cα be the characteristic Sturmian word of slope α = [0;1+d1,d2,d3,...] with cα = limn ∞sn → wheres = p xywithxy ab,ba . TheLyndonconjugateofs isthewordap bforalln 1.Moreover,every n n n n ∈ { } ≥ Lyndonfactorofc thatisshorterthan ap biseitheraprefixorasuffixofap b. α n n Proof. First we show that, for all n 1, the Lyndon conjugate of s is the word ap b. If s = p ba, n n n n ≥ then ap b is clearly a conjugate of s and it is Lyndon[2, 5]. On the other hand, if s = p ab, then bp a n n n n n is a clearly a conjugate of s , and since the conjugacy class of s is closed under reversal and p is a n n n palindrome(byLemma3),itfollowsthatap bisaconjugateofs anditisLyndon[2,5]. n n < Toprovethesecondclaim, itsufficestoshowthatifk n,thentheLyndonconjugateofs isaprefixor k suffix of ap b (since, by Lemma 3, theLyndonfactors of c of lengthat least 2 are precisely theLyndon n α conjugatesofthe(primitive)standardwordsinc ).Theclaimistruefork = 1andk = 0sinces = b α 1 and s0 = a. It is also true for k = 1 because s1 = ad1b is the Lyndon conjug−ate of itself, and is a−prefix of ap b if d 1 and a suffix of ap b if d = 0. Now suppose that k 2. Then k < n implies that s n 1 n 1 k ≥ ≥ is a prefix of p . Furthermore, since p is a palindrome, the reversal of s is a suffix of p . Therefore if n n k n s = p ba, then its Lyndon conjugate ap b is a prefix of ap b; otherwise, if s = p ab, then its Lyndon k k k n k k (cid:3) conjugateap bisasuffixofap b. k n TheLyndonfactorsof(characteristic)Sturmianwordsoflengthatleast2(i.e.,theLyndonconjugatesof standard words)over a,b are precisely the so-called Christoffel words beginning with the letter a (see, { } e.g., the nice survey [1]). Christoffel words take the form aPal(v)b and bPal(v)a where v a,b and ∗ ∈ { } Pal isiteratedpalindromic closure, definedby: Pal(ε) = ε and Pal(wx) = (Pal(w)x)+ foranyfinitewordwandletterx, 8 AMYGLEN,JAMIESIMPSON,ANDW.F.SMYTH where u+ denotestheshortestpalindromebeginningwith u (called thepalindromic closure ofu). Forex- ample, Pal(aba) = abaabawheretheunderlinedlettersindicatethepointsatwhichpalindromicclosure isapplied. < < Let p,qbeco-primeintegerswith0 p q.Therational p/qhastwodistinctsimplecontinuedfraction expansions: p/q = [0;1+d ,d ,...,d ,1] = [0;1+d ,d ,...,d +1] 1 2 n 1 2 n whered 0andallotherd 1.Theso-calledChristoffelwordofslope p/qbeginningwiththeletterais 1 i ≥ ≥ theuniqueSturmianLyndonwordover a,b oflengthqcontaining p occurrencesoftheletterb,given { } by: a ifnisodd, aPal(v)b with v = ad1bd2ad3 xdn where x = ··· (b ifniseven. For example, the Christoffel word of slope 2/5 = [0;2,1,1] = [0;2,2] beginning with the letter a is aPal(ab)b = aabab. Remark 5. Note,inparticular, thattheChristoffel wordofslope p/q = [0;1+d ,d ,...,d ,1] beginningwith 1 2 n the letter a is precisely the Lyndon conjugate of the standard word s = s s where s = b, s = a, and n+1 n n 1 1 0 s = sdi s for1 i n(see,e.g.,[1]). − − i i−1 i−2 ≤ ≤ Example 6. For all n 1, the Fibonacci Lyndon word of length F (i.e., the Lyndon conjugate of the finite n ≥ Fibonacciword f )istheChristoffelwordofslope F /F = [0;2,1,1,...,1]beginningwith a. n n 2 n − (n 1)1s − | {z } Saari [13, Thm. 1] proved that if w is a Lyndonword with w F for some n 1, then w contains at n | | ≥ ≥ leastn+2distinctLyndonfactors,withequalityifandonlyifwistheFibonacciLyndonwordoflength F . For example, aPal(ab)a = aabab is the Fibonacci Lyndon word of length F = 5 and contains the n 3 minimumnumber(3+2 = 5)ofdistinctLyndonfactorsoverallLyndonwordsofthesamelength.Saari alsomadethefollowingconjecture. Conjecture 7. [13] If w is a Lyndon word of length n, n = 6, containing the least number of distinct Lyndon 6 factorsoverallLyndonwordsofthesamelength,thenwisaChristoffel word. The number 6 is excluded because the following words all contain 7 distinct Lyndon factors, which is theminimumforlength6words,andonlythefirstandlastareChristoffel: aaaaab, aaabab, aabbab, ababbb, ababac, abacac, acbacc, abbbbb. However the conjecture is not true. The following Lyndon word has length 28 and contains 10 distinct Lyndon factors — the minimum number of distinct Lyndon factors in a Lyndonword of this length — butitisnotChristoffel(comparetheprefixoflength5tothesuffixoflength5): aabaababaabaabababaabaababab Theminimum of10 distinctLyndonfactorsfor aLyndonwordoflength28is also attainedby theLyn- don word aabaababaababaababaababaabab which is the Christoffel word aPal(abab4)b of slope 11/28 = [0;2,1,1,4,1]. We now generalise Saari’s result on the number ofdistinct Lyndonfactors in a Fibonacci Lyndonword by determiningthenumberofdistinctLyndonfactorsin agiven Christoffelword.Let (w) denotethe L numberofdistinctLyndonfactorsinawordw. Theorem 8. If w is a Sturmian Lyndon word on a,b with a < b, i.e., a Christoffel word (beginning with the letter a) of slope p/q = [0;1+d ,d ,...,d ,{1] fo}r some co-prime integers p, q with 0 < p < q, then 1 2 n (w) = d +d + +d +3. 1 2 n L ··· COUNTINGLYNDONFACTORS 9 Proof. The word w is the Lyndon conjugate of the standard word s = s s with s = b, s = a, n+1 n n 1 1 0 and s = sdi s for 1 i n (see Remark 5). By Lemma4, the Lyndonconju−gatesof s− for 1 i n areeitiherpi−re1fii−xe2sorsu≤ffixe≤sofw.Moreover,foreachi with1 i n,thestandardwordi s con≤tain≤s d i i ≤ ≤ distinct Lyndonfactors oflengths sm s for m = 1,2,...,d .ByLemma3, thesearetheonlyLyndon factors oftheLyndonword w besid| ei−s1itis−e2lf|and thetwolettersi a and b. Hence (w) = (d +d + + 1 2 d )+3. L ···(cid:3) n The above result is a generalisation of [13, Lemma 9], which reworded (with the indexing of Fibonacci words and numbers shifted back by 2) states that if w is the Fibonacci Lyndon word of length F for n 2 − somen 3,i.e.,theChristoffelwordofslope ≥ F /F = [0;2, 1,1,...,1] n 4 n 2 − − (n 3)1s − beginningwiththelettera,then (w) = (n 3)+3 = n|. {z } L − Examples: ◮ The Christoffel word of slope 2/5 = [0;2,1,1] beginning with the letter a, namely aPal(ab)b = aabab,istheFibonacciLyndonwordoflengthF = 5thatcontainstheminimumnumber(5 = 2+3) 3 ofdistinctLyndonfactorsforitslength. ◮ The Christoffel word of slope 1/6 = [0;5,1] beginning with the letter a, namely aPal(a4)b = aaaaab, is a Sturmian Lyndon word of length 6 containing the minimum number (7 = 4+3) of distinctLyndonfactorsforitslength. ◮ TheChristoffelwordofslope3/8 = [0;2,1,1,1]beginningwiththelettera,namelyaPal(aba)b = aabaabab, is theFibonacci Lyndonwordof length F = 8 containing the minimum number (6 = 4 3+3)ofdistinctLyndonfactorsforitslength. 6. OPEN PROBLEMS OpenProblem1: OnemightsuspectthatanygivenChristoffelwordcontainstheminimumnum- berofdistinctLyndonfactorsoverallLyndonwordsofthesamelength.However,thisisnottrue. For instance, the Christoffel word of slope 5/11 = [0;2,4,1] beginning with the letter a, namely aPal(ab4)b = aababababab,contains8(= 5+3)distinctLyndonfactors,buttheminimumnumber ofdistinctLyndonfactorsofaLyndonwordoflength11isactually7.Thisminimumisattained bytheChristoffelwordaPal(a2ba)b = aaabaaabaab ofslope3/11 = [0;3,1,1,1]. IsittruethattheminimumnumberofdistinctLyndonfactorsoverallLyndonwordsofthesame lengthisattainedbyatleastoneChristoffelwordofthatlength? OpenProblem2: Tables 2 and 3, showing values for ET(σ,n) and ED(σ,n), raise the question of whethertheremay exist asymptoticformulas for thesequantities,simpler than the exact values displayedinequations(2)and(6),respectively. REFERENCES [1] J.Berstel,Sturmianandepisturmianwords(asurveyofsomerecentresults),CAI2007,LectureNotesinComputerScience, vol.4728,2007,pp.23–47. [2] J.Berstel,A.deLuca,Sturmianwords,Lyndonwordsandtrees,Theoret.Comput.Sci.178(1997)171–203. [3] T.C.Brown,Descriptionsofthecharacteristicsequenceofanirrational,Canad.Math.Bull.36(1993)15–21. [4] K.T.Chen,R.H.Fox,R.C.Lyndon,Freedifferentialcalculus,IV.Thequotientgroupsofthelowercentralseries,Ann.Math. 68(1958)81–95. [5] A.deLuca,Sturmianwords:structure,combinatoricsandtheirarithmetics,Theoret.Comput.Sci.183(1997)45–82. [6] A.deLuca,F.Mignosi,SomecombinatorialpropertiesofSturmianwords,Theoret.Comput.Sci.136(1994)361–385. [7] J.P.Duval,Factorizingwordsoveranorderedalphabet,J.Algs.4(1983)363–381. 10 AMYGLEN,JAMIESIMPSON,ANDW.F.SMYTH [8] A.S. Fraenkel, M. Mushkin, U. Tassa, Determination of [nθ] by its sequence of differences,Canad. Math. Bull.21 (1978) 441–446. [9] A.Glen,OnSturmianandepisturmianwords,andrelatedtopics,PhDthesis,TheUniversityofAdelaide,Australia,April 2006,http://hdl.handle.net/2440/37765. [10] M.Lothaire,CombinatoricsonWords,EncyclopediaofMathematicsanditsApplications,vol.17,CambridgeUniversityPress,UK, 1997. [11] M.Lothaire,AlgebraicCombinatoricsonWords,EncyclopediaofMathematicsanditsApplications,vol.90,CambridgeUniversity Press,UK,2002. [12] R.C.Lyndon,OnBurnside’sproblem, Trans.Amer.Math.Soc.77(1954)202–215. [13] K.Saari,LyndonwordsandFibonaccinumbers,J.Combin.TheorySer.A121(2014)34–44. AMYGLEN SCHOOLOFENGINEERING&INFORMATIONTECHNOLOGY MURDOCHUNIVERSITY 90SOUTHSTREET MURDOCH,WA6150AUSTRALIA E-mailaddress:[email protected] JAMIESIMPSON DEPARTMENTOFMATHEMATICSANDSTATISTICS CURTINUNIVERSITY BENTLEY,WA6102AUSTRALIA E-mailaddress:[email protected] W.F.SMYTH DEPARTMENTOFCOMPUTINGANDSOFTWARE MCMASTERUNIVERSITY HAMILTON,ONTARIOL8S4K1CANADA E-mailaddress:[email protected]