Schur-Siegel-Smyth-Serre-Smith

If \(\alpha\) is an algebraic number, the normalized trace of \(\alpha\) is defined to be

\( \displaystyle{T(\alpha):=\frac{\mathrm{Tr}(\alpha)}{[\mathbf{Q}(\alpha):\mathbf{Q}]}.}\)

If \(\alpha\) is an algebraic integer that is totally positive, then the normalized trace is at least one. This follows from the AM-GM inequality, since the normalized trace is at least the \(n\)th root of the norm, and the norm of a non-zero algebraic integer is at least one. But it turns out that one can do better, as long as one excludes the special case \(\alpha = 1\). One reason you might suspect this to be true is as follows. The AM-GM inequality is strict only when all the terms are equal. Hence the normalized trace will be close to one only when many of the conjugates of \(\alpha\) are themselves close together. But the conjugates of algebraic integers have a tendency to repel one another, since the product of their differences (essentially the discriminant) is also a non-zero integer. In an Annals paper from 1945, Siegel (building on a previous inequality of Schur) proved the following:
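As a quick sanity check of this inequality chain (a float computation of my own, not part of the argument), one can compare the normalized trace with the \(n\)th root of the norm for the conjugate pair \((3 \pm \sqrt{5})/2\) that shows up below:

```python
import math

# Conjugates of phi^2 = (3 + sqrt(5))/2, a totally positive algebraic integer.
conj = [(3 + math.sqrt(5)) / 2, (3 - math.sqrt(5)) / 2]

trace = sum(conj) / len(conj)              # normalized trace
norm = math.prod(conj) ** (1 / len(conj))  # n-th root of the norm

# AM-GM: normalized trace >= n-th root of the norm >= 1.
print(trace, norm)  # 1.5, and (numerically) 1.0 since phi^2 is a unit
```

Note that here the \(n\)th root of the norm is \(1\) (a unit), so AM-GM alone gives no information beyond \(T(\alpha) \ge 1\); the trace is nevertheless \(3/2\).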

Theorem [Siegel]: There are only finitely many totally positive algebraic integers \(\alpha\) with \(T(\alpha) < \lambda\) for \(\lambda = 1.7336105 \ldots\)

Siegel was also able to find that the only such integers with normalized trace at most \(3/2\) are \(1\) and \((3 \pm \sqrt{5})/2 = \phi^{\pm 2}\), where \(\phi\) is the golden ratio. (We will also prove this below.) On the other hand (generalizing these examples), one has

\(\displaystyle{T\left((\zeta_p + \zeta^{-1}_p)^2\right) = 2 \left(1 - \frac{1}{p-1} \right),}\)

and hence the optimal value of \(\lambda\) is at most \(2\). Sometime later, Smyth had a very nice idea to extend the result of Siegel. (An early paper with these ideas can be found here.) Consider a collection of polynomials \(P_i(x)\) with integral coefficients, and suppose that

\(Q(x) = -\lambda + x - \sum a_i \log |P_i(x)| \ge 0\)

for all real positive \(x\) where \(Q(x)\) is well-defined, and where the coefficients \(a_i\) are also real and non-negative. Now take the sum of \(Q(x)\) as \(x\) ranges over all conjugates of \(\alpha\). The key point is that the sum of \(\log |P_i(\sigma \alpha)|\) is the log of the absolute value of the norm of \(P_i(\alpha)\). Assuming that \(\alpha\) is not a root of any \(P_i(x)\), this norm is a non-zero integer, so its absolute value is at least one, the log of the norm is non-negative, and the contribution to the sum (since \(-a_i\) is negative) is zero or negative. On the other hand, after we divide by the degree, the sum of \(\lambda\) is just \(\lambda\) and the sum of \(\sigma \alpha\) is the normalized trace. Hence one deduces that \(T(\alpha) \ge \lambda\) unless \(\alpha\) is actually a root of one of the polynomials \(P_i(x)\). So the strategy is first to find a bunch of polynomials whose roots have small normalized traces, and then to see whether, for a constant \(\lambda\) as close to \(2\) as possible, one can construct such a function \(Q(x)\) which is always non-negative.
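As a numerical sanity check of the cyclotomic trace formula above (my own float computation, not part of the argument): the conjugates of \((\zeta_p + \zeta_p^{-1})^2\) are \((2\cos(2\pi k/p))^2\) for \(1 \le k \le (p-1)/2\), and their average matches \(2(1 - 1/(p-1))\).

```python
import math

def normalized_trace_of_square(p):
    # conjugates of (zeta_p + zeta_p^{-1})^2 are (2 cos(2 pi k / p))^2
    # for k = 1, ..., (p-1)/2; the normalized trace is their average
    n = (p - 1) // 2
    return sum((2 * math.cos(2 * math.pi * k / p)) ** 2
               for k in range(1, n + 1)) / n

for p in [5, 7, 101]:
    print(round(normalized_trace_of_square(p), 9),
          round(2 * (1 - 1 / (p - 1)), 9))  # the two columns agree
```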

One can make this very explicit. Suppose that

\(\displaystyle{Q(x) = -\lambda + x - \frac{43}{50} \cdot \log |x| - \frac{18}{25} \cdot \log |x-1| - \frac{7}{50} \cdot \log|x-2|.}\)

Calculus Exercise: Show that, with \(\lambda = 1.488753\ldots\), one has \(Q(x) \ge 0\) for all \(x\) where it is defined. Deduce that the only totally positive algebraic integer with \(T(\alpha) \le \lambda\) is \(\alpha = 1\). The graph is as follows:

[Figure: the graph of \(Q(x)\), a positive function]
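If you would rather outsource the calculus, here is a brute-force numerical check (a sketch of my own; the grid and the tolerance are arbitrary choices, not part of the exercise):

```python
import math

lam = 1.488753  # the constant from the exercise (truncated)

def Q(x):
    # the auxiliary function with coefficients 43/50, 18/25, 7/50
    return (-lam + x
            - (43 / 50) * math.log(abs(x))
            - (18 / 25) * math.log(abs(x - 1))
            - (7 / 50) * math.log(abs(x - 2)))

# Grid search for the minimum over 0 < x < 8, skipping the integer grid
# points so we avoid the singularities at x = 0, 1, 2 (where Q -> +infinity,
# so nothing is lost); for large x the linear term dominates.
xs = (i / 10000 for i in range(1, 80001) if i % 10000 != 0)
m = min(Q(x) for x in xs)
print(m)  # numerically ~0; the minimum is attained near x = 0.38
```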

One can improve this by increasing \(\lambda\) and modifying the coefficients slightly, but note that we can’t possibly modify this with the given polynomials to get \(\lambda > 3/2\), because \(T(\phi^2) = 3/2\). Somewhat surprisingly, we can massage the coefficients to reprove the theorem of Siegel and push this bound to \(3/2\). Namely, take

\(\displaystyle{Q(x) = -\frac{3}{2} + x - a \log |x| - (2a-1) \log |x-1| - (1-a) \log|x-2|,}\)

and note that the derivative satisfies

\(Q'(x)x(x-1)(x-2) = (x^2-3x+1)(x-2a).\)

Hence the minimum occurs either at \(x=2a\) or at the conjugates of \(\phi^2\), where \(\phi\) is the golden ratio. Since \(\phi^2-1 = \phi\) and \(\phi^2-2 = \phi^{-1}\), one finds that

\(Q(\phi^2) = -\frac{3}{2} + \phi^2 + (2-5 a) \log \phi,\)

and so choosing \(a\) so that this vanishes, we get

\(\displaystyle{a = \frac{2}{5} + \frac{1}{2 \sqrt{5} \log \phi} = 0.864674\ldots} \)

and then we find that \(Q(x) \ge 0\) for all \(x\) where it is defined, with equality at \(\phi^2\) and \(\phi^{-2}\). So this reproves Siegel’s theorem by elementary calculus. Of course we can strictly improve upon this result by including the polynomial \(x^2 - 3x + 1\), for example replacing \(Q(x)\) by
\(\displaystyle{P(x) = Q(x) - \frac{1}{15} \cdot \log |x^2 - 3x + 1| + \left(\frac{3}{2} - \lambda\right)}\)

where \(\lambda = 1.5444\ldots \) is now strictly greater than \(3/2\). By choosing enough polynomials and optimizing the coefficients by hook or by crook, Smyth beat Siegel’s value of \(\lambda\) (even with an explicit list of exceptions), although he did not push \(\lambda\) all the way to \(2\). This left open the following problem: is \(2\) the first limit point? That is, does Siegel’s theorem hold for any \(\lambda < 2\)? This was already asked by Siegel, and it became known as the Schur-Siegel-Smyth trace problem. Some time later, Serre made a very interesting observation about Smyth's argument. (Serre's original remarks were in a letter which was hard to track down, but a more recent exposition of these ideas is contained in this Bourbaki seminar.) He more or less proved that Smyth’s idea could never show that \(2\) was the first limit point. Serre basically observed that there exists a compactly supported measure \(\mu\) on the positive real line such that

\(\int \log |P(x)| d \mu \ge 0\)

for every polynomial \(P(x)\) with integer coefficients, and yet with

\(\int x d \mu = \lambda < 2\)

for some \(\lambda \sim 1.89\ldots \). Since Smyth’s method only used the positivity of these integrals as an ingredient, this means the optimal inequality one could obtain by these methods is bounded above by Serre’s \(\lambda\). On the other hand, Serre’s result certainly doesn’t imply that the first limit point of normalized traces of totally positive algebraic integers is less than \(2\). A polynomial with roots chosen uniformly from \(\mu\) will have normalized trace close to \(\lambda\), but it is not at all clear that one can deform the polynomial to have integral coefficients and still have roots that are all positive and real.

I for one felt that Serre’s construction pointed to a limitation of Smyth’s method. Take the example of \(Q(x)\) we considered above. We were able to prove the result for \(\lambda = 3/2\) by virtue of the fact that \(Q(x)=0\) at these points. But that required the fact that the three quantities:

\(\phi^2, \phi^2 -1 = \phi, \phi^2- 2 = \phi^{-1}\)

were all units, and so of norm one. As one feeds more and more polynomials into Smyth’s method, the inequalities are sharp only when \(P_i(\alpha)\) is a unit for every one of the polynomials \(P_i\). But maybe there are arithmetic reasons why non-Chebyshev polynomials (suitably shifted and normalized) must be far from being a unit when evaluated at \((\zeta + \zeta^{-1})^2\) for a root of unity \(\zeta\).

However, it turns out my intuition was completely wrong! Alex Smith has just proved that, for a measure \(\mu\) on (say) a compact subset of \(\mathbf{R}\) with countably many components and capacity greater than one, if Serre’s (necessary) inequality

\( \int \log |Q(x)| d \mu \ge 0\)

holds for every integer polynomial \(Q(x)\), then you can indeed find a sequence of polynomials with integer coefficients whose associated atomic measures converge weakly to \(\mu\). In particular, this shows that Serre’s example actually proves that the maximal \(\lambda\) in the Schur-Siegel-Smyth problem is strictly less than \(2\), and indeed is probably equal to something around \(1.81\) or so. Remarkable! I generally feel that my number theory intuition is pretty good, so I am always really excited when I am proved wrong, and this result is no exception.

Exercise for the reader: One minor consequence of Smith’s argument is that for any constant \(\varepsilon > 0\), there exist non-Chebyshev polynomials \(P(x) \in \mathbf{Z}[x]\) such that, for primes \(p\) and primitive \(p\)-th roots of unity \(\zeta\), one has

\( \displaystyle{\log \left| N_{\mathbf{Q}(\zeta)/\mathbf{Q}} P(\zeta + \zeta^{-1}) \right|} < \varepsilon [\mathbf{Q}(\zeta):\mathbf{Q}]\)

for all sufficiently large primes \(p\). Here by non-Chebyshev I mean to rule out “trivial” examples that one should think of as coming from circular units, for example with \(P(\zeta + \zeta^{-1}) = \zeta^k + \zeta^{-k}\) for some fixed \(k\). Is there any other immediate construction of such polynomials? For that matter, what are the best known bounds for the (normalized) norm of an element in \(\mathbf{Z}[\zeta]\) which is not equal to \(1\), ruling out elements in the group generated by units and Galois conjugates of \(1-\zeta\)? I guess one expects the class number \(h^{+}\) of the totally real subfield to be quite small, perhaps even \(1\) infinitely often. Then, assuming GRH, there should exist primes which split completely of norm at most some bounded power of \(\log |\Delta_K|\), which gives an element of very small norm (bounded by some power of \([\mathbf{Q}(\zeta):\mathbf{Q}]\)). However, this both uses many conjectures and doesn’t come from a fixed polynomial. In the opposite direction, the most trivial example is to take the element \(2\), which has normalized norm \(2\), but I wonder if there is an easy improvement on that bound. There is an entire circle of questions here that seems interesting but may well have easy answers.
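To illustrate the “trivial” circular-unit examples numerically (my own quick float computation; the choice \(p = 499\) is arbitrary): the real embeddings of \(\zeta + \zeta^{-1}\) are \(2\cos(2\pi k/p)\), so normalized log-norms over \(\mathbf{Q}(\zeta_p)^{+}\) can be computed directly.

```python
import math

def normalized_log_norm(p, f):
    # (1/[K:Q]) * log|Norm of f(zeta + zeta^{-1})| for K = Q(zeta_p)^+,
    # computed from the real embeddings 2 cos(2 pi k / p), k = 1..(p-1)/2
    n = (p - 1) // 2
    return sum(math.log(abs(f(2 * math.cos(2 * math.pi * k / p))))
               for k in range(1, n + 1)) / n

p = 499
unit_norm = normalized_log_norm(p, lambda x: x)   # zeta + zeta^{-1}: a circular unit
two_norm = normalized_log_norm(p, lambda x: 2.0)  # the element 2
print(unit_norm, two_norm)  # ~0 and log(2) = 0.6931...
```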

Posted in Mathematics | 5 Comments

Polymath Proposal: 4-folds of Mumford’s type

Let \(A/K\) be an abelian variety of dimension \(g\) over a number field. If \(g \not\equiv 0 \bmod 4\) and \(\mathrm{End}(A/\mathbf{C}) = \mathbf{Z}\), then Serre proved that the Galois representations associated to \(A\) have open image in \(\mathrm{GSp}_{2g}(\mathbf{Z}_p)\). The result is not true, however, when \(g=4\), as first noted by Mumford (in this paper).

The goal of this polymath project is to find an “explicit” example of such a Mumford \(4\)-fold over \(\mathbf{Q}\). There are a number of things I have in mind for what “explicit” might mean (this is, after all, supposed to be a polymath project so I’m not supposed to know how to do everything). But here is one way: associated to \(A\) is a compatible family of Galois representations

\(\rho_p: G_{\mathbf{Q}} \rightarrow \mathrm{GSp}_8(\mathbf{Z}_p)\)

such that, for some integer \(N\), the Galois representations \(\rho_p\) are unramified outside \(Np\), and for all other primes \(q\) the characteristic polynomial of \(\rho_p(\mathrm{Frob}_q)\) is equal to

\(Q_q(T) \in \mathbf{Z}[T]\)

for some polynomial which does not depend on \(p\). Then for example one could hope to give a list of the polynomials \(Q_q(T)\) for a collection of primes \(q\).

Here is the strategy to find such Galois representations.

We start by choosing a totally real cubic field, which for reasons to possibly be explained later should perhaps be \(F = \mathbf{Q}(\zeta_7)^{+}\). (One reason: it is the Galois cubic field of smallest possible discriminant.)

Step I: Find a Hilbert modular form over \(F\) of weight \((1,1,2)\) with coefficients in \(F\).

The idea here will be to follow the strategy employed by Moy-Specter (following Schaeffer) to compute a Hilbert modular form of weight \((1,3)\) over the field \(\mathbf{Q}(\sqrt{5})\). Namely, let \(W\) denote the space of Hilbert modular forms of weight \((2,2,3)\) of some fixed level. Now divide by some suitable Eisenstein series of weight \((1,1,1)\) to get a space \(V\) of meromorphic forms of weight \((1,1,2)\). This will contain the (possibly zero) space \(U\) of holomorphic forms of weight \((1,1,2)\). The holomorphic forms will be preserved under the action of the Hecke operators, whereas \(V\) in general will not be. Hence one can start computing the intersection of \(V\) with its Hecke translates, which will also contain \(U\). Either you eventually get zero, or you (most likely) end up with an eigenform which you can hope to prove is holomorphic by proving its square is holomorphic.
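To be concrete about the linear algebra in this step: the operation one iterates is “intersect a subspace with its Hecke translate.” Here is a toy sketch in exact rational arithmetic (the \(3\)-dimensional example and the “Hecke operator” \(T\) are entirely made up for illustration; they are not actual spaces of modular forms):

```python
from fractions import Fraction

def rref(M):
    # reduced row echelon form over Q; returns (rows, pivot columns)
    M = [row[:] for row in M]
    pivots, r = [], 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M[:r], pivots

def nullspace(M):
    # basis of the right kernel of M, one vector per free column
    R, piv = rref(M)
    n = len(M[0])
    basis = []
    for f in (c for c in range(n) if c not in piv):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, c in enumerate(piv):
            v[c] = -R[i][f]
        basis.append(v)
    return basis

def intersect(A, B):
    # intersection of the row spaces of A and B: solve x.A = y.B
    n = len(A[0])
    C = [[A[i][k] for i in range(len(A))] + [-B[j][k] for j in range(len(B))]
         for k in range(n)]
    sols = []
    for v in nullspace(C):
        w = [sum(v[i] * A[i][k] for i in range(len(A))) for k in range(n)]
        if any(w):
            sols.append(w)
    return sols

# Toy example: V is 2-dimensional and not T-stable, but V ∩ T(V) is,
# and plays the role of the candidate space U above.
T = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
V = [[Fraction(1), Fraction(0), Fraction(0)],
     [Fraction(0), Fraction(1), Fraction(0)]]
TV = [[sum(v[i] * T[i][k] for i in range(3)) for k in range(3)] for v in V]
U = intersect(V, TV)
print(U)  # spanned by (1, 0, 0)
```

In practice the vectors would be truncated \(q\)-expansions and one would iterate until the dimension stabilizes.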

Some Issues: The way that Moy-Specter compute the analogue of \(W\) is to use Dembélé’s programs to compute the Hecke eigensystems of that weight, and then use the fact that \(q\)-expansions are determined by the Hecke eigenvalues for Hilbert modular forms (suitably interpreted; one has to compute spaces of old forms of lower level, etc.). The same idea should certainly work, but note that we are working here in non-paritious weight (that is, not all weights are congruent modulo \(2\)). My memory is that the current programs on the contrary assume that the weight is paritious. This would have to be fixed! Perhaps this is an opportunity for someone to code up Dembélé’s algorithms in Sage?

Step II: Suppose one finds such a form \(\pi\). Note that I am also insisting that the coefficient field be as small as possible, namely the field \(F\) itself. Even though \(\pi\) is of non-paritious weight, there are still associated Galois representations (some relevant references are this paper of Patrikis and also this paper of Dembélé, Loeffler, and Pacetti). More precisely, there are nice projective Galois representations, and these lift to actual representations, but they will not be Hodge-Tate; rather, up to twist (making the determinant have finite order, for example), they will have Hodge-Tate weights \([0,0]\), \([0,0]\), and \([-1/2,1/2]\). But now consider the tensor induction (twisted by a half!) of this representation from \(G_F\) to \(G_{\mathbf{Q}}\); that is, for \(\sigma \in \mathrm{Gal}(F/\mathbf{Q})\), the representation

\(\varrho:=\rho(\pi) \otimes \rho^{\sigma}(\pi) \otimes \rho^{\sigma^2}(\pi)(1/2).\)

Now these representations will be crystalline with Hodge-Tate weights \([0,0,0,0,1,1,1,1]\). Moreover, they will be symplectic, have cyclotomic similitude character, and (this is where the assumption on the coefficients of \(\pi\) comes in) will also have Frobenius traces in \(\mathbf{Q}\). OK, I literally have not checked any of those statements at all, but it kind of feels like it has to be true so that’s what I’m going with. The point of insisting that the coefficient field of \(\pi\) be just \(F\) is to make the coefficients of this new system lie in \(\mathbf{Q}\). But this means (at least conjecturally) that these Galois representations have to come exactly from an abelian variety of Mumford’s type, because the Galois representations tell you that the Mumford-Tate group has Lie algebra \((\mathfrak{sl}_2)^3\).
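One tiny piece of this that is easy to check is the multiset of Hodge-Tate weights: each weight of the tensor induction is a sum \(a+b+c+1/2\) over the weights of the three factors. A few lines of Python confirm the bookkeeping (only the arithmetic, none of the representation theory):

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
w1 = [Fraction(0), Fraction(0)]   # weights of rho(pi), up to twist
w2 = [Fraction(0), Fraction(0)]   # weights of rho^sigma(pi)
w3 = [-half, half]                # weights of rho^{sigma^2}(pi)

# Hodge-Tate weights of rho ⊗ rho^sigma ⊗ rho^{sigma^2} (1/2):
weights = sorted(a + b + c + half for a, b, c in product(w1, w2, w3))
print([int(w) for w in weights])  # [0, 0, 0, 0, 1, 1, 1, 1]
```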

Step III: Find this family in a different way. One issue with the construction above is that the Galois representations are not obviously motivic (or even satisfy purity!), so they certainly don’t provably come from an abelian variety. But it might be easier to find the actual variety once one knows its exact level. I’m not quite sure what I mean by “find” here — it’s an open question as to whether these Mumford 4-folds are Jacobians so I’m not entirely sure what one should be looking for.

Step IV: Bonus: prove that these \(4\)-folds have \(L\)-functions with meromorphic continuations (at least for \(H^1\) but it’s worth checking the other degrees as well) using triple product \(L\)-functions.

Some Further Remarks: There are a number of relevant papers by Rutger Noot that one should be aware of (a particularly relevant example: this one). There are restrictions on the possible level structures that can arise for Hilbert modular forms of this weight (in particular, they can’t be Steinberg at some place), so make sure not to waste time computing at those levels. This is related to the fact that the corresponding Shimura variety is compact. The actual associated Shimura variety is isomorphic to \(\mathbf{P}^1\) over the complex numbers; there’s some discussion in section 5.4 of Elkies’ paper. These Shimura curves naturally have models over the reflex field, which is \(F\) in this case, but actually they can sometimes be defined over even smaller fields, such as \(\mathbf{Q}\). Now I confess I am confused by a number of points, in increasing order:

  1. What is the exact relationship between the model of this Shimura curve over \(\mathbf{Q}\) and the moduli problem? This is an issue both with understanding the moduli problem but also (because of the stackiness issues) differences between fields of moduli and fields of definition.
  2. Does this Shimura curve have points over \(\mathbf{R}\)? I think so. If I understand Shimura’s paper here, I think the answer is yes.
  3. Does this Shimura curve have points over \(\mathbf{Q}\)? I think so! Assuming it has points over \(\mathbf{R}\) you only need to check all other finite primes, and the one that is most worrying is \(p=7\) but you don’t really even need to check that one either if the others all work.
  4. Assuming it is \(\mathbf{P}^1_{\mathbf{Q}}\), does that help at all? At the very least it provides succor that lots of \(A\) should exist over \(\mathbf{Q}\), but it’s not so clear how to go from a point to an equation. (Consider the easier case of Shimura curves corresponding to fake elliptic curves, for example.) Given a complex point, can one at least reconstruct some complex invariants of \(A\) such as its periods? Probably understanding this Shimura curve and its relationship with the moduli problem (over different fields) as concretely as possible would be a “second track” in this problem. (Presumably an advantage of a polymath project is that you can attack it from several angles at once.)

Thoughts welcome!

Posted in Mathematics | 2 Comments

59,281

The target audience of this blog (especially the mathematics) is usually professional mathematicians in the Langlands program. I do sometimes, however, have posts suitable for a broader mathematical audience. Very rarely though do I have anything (possibly) interesting to say to a popular audience. In my recent talk in the Number Theory Web Seminar, I spoke about some math that I’ve discussed with Soundararajan (and which will possibly be written up some day) about the “average” digit of \(1/p\) in its decimal expansion, in particular discussing the distribution of primes for which the average digit of \(1/p\) is less than, equal to, or greater than \(4.5\) respectively. An easy argument using Cebotarev shows that the density of primes for which the average is exactly \(4.5\) is \(2/3\). More subtle, however, is that there are more primes for which the average is less than \(4.5\) than greater than \(4.5\); still, assuming GRH, the (upper and lower) density of primes for which the average is greater than \(4.5\) is positive (the actual percentages of primes with digit average less than, equal to, and greater than \(4.5\) are approximately 28%, 67%, and 5% respectively).

I think the talk went well, and one reason I suspect is that it was self-contained. Moreover, quite a lot of the setup was completely elementary; although it certainly did move towards deeper topics (Kummer’s Conjecture, work of Patterson and Heath-Brown on equidistribution of Gauss sums, and work of Granville–Soundararajan on the distribution of L-values), it was a result that could more or less be appreciated by an undergraduate.

I decided that this was the time — if ever — that I should make a video post. I decided to make a “numberphile” style video — complete with brown paper and a title consisting of a single number — by taking my talk and significantly scaling back the mathematical content. My first attempt was, to put it mildly, a bit of a disaster. First of all, the aspects of making a video that I know nothing about (lighting, audio, glare, video, editing) were unsurprisingly a complete mess and a distraction from the actual mathematics. Second, my resident expert felt that it was still a bit too long, a bit too much like a recording of some lecture, and lacking a hook. So I cut down the script and made a second even more elementary version. This version (unfortunately) no longer has me writing on physical brown paper, but it might at least reach a bare minimum audio/video quality.

Just in case you want to skip the video and go straight to the challenge problem, here it is:

Guess/Conjecture: Let \(p \ne 2,5\) be prime, and let \(C(p)\) denote the average of the digits of \(1/p\) in its decimal expansion. (Since the digits repeat this makes sense.) Then the maximum of \(C(p)\) for all primes is achieved by \(p = 59281\), with:

\(\displaystyle{C(59281) = \frac{486}{95} = 5.11 \ldots} \)

\(\displaystyle{
\begin{aligned}
\frac{1}{59281} = & \ 0.\overline{000016868811254870869249843963495892444459438943337662994} \\
& \ \overline{88875018977412661729727906074458932879} \end{aligned}}\)
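One can check the value \(C(59281) = 486/95\) directly with a few lines of long division (any language works; here is a quick Python sketch of my own):

```python
from fractions import Fraction

def digit_stats(p):
    # one full period of the decimal expansion of 1/p via long division
    # (valid when gcd(p, 10) = 1; the remainder returns to 1 after one period)
    digits, r = [], 1
    while True:
        r *= 10
        digits.append(r // p)
        r %= p
        if r == 1:
            break
    return sum(digits), len(digits)

s, n = digit_stats(59281)
print(s, n, Fraction(s, n))  # digit sum 486 over period 95, i.e. C(p) = 486/95
```

For comparison, `digit_stats(7)` returns `(27, 6)`, recovering the average \(4.5\) for \(1/7 = 0.\overline{142857}\).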

A rough heuristic why this should be true: if the period of \(p\) is sufficiently large, then, if the digits are sufficiently random, the probability that the average deviates that much from \(4.5\) becomes exponentially small. Since there are not that many primes with small period, this leads to the heuristic that all but finitely many primes should have \(C(p)\) very close to \(4.5\). Moreover, it suggests finding them as factors of \(10^n - 1\) for small values of \(n\). (\(59281\) is a factor of \(10^{95} - 1\).) Making the above idea more precise suggests that it is highly unlikely to find a counterexample with period more than \(400\) or so. Now pari/gp can’t factor most numbers of this form even for small \(n\), but there is a second competing heuristic. If \(p\) is too large and still has small period, then because \(1/p\) starts out with a bunch of zeros, this suppresses the digit average. So any big prime factors of \(10^{n} - 1\) that pari/gp doesn’t find probably won’t be counterexamples anyway. Note this secondary effect also explains why \(C(p)\) can be significantly less than \(4.5\) — if \(p = (10^q - 1)/9\) is prime, for example, then \(C(p) = 9/q\). Since one expects infinitely many primes of this form (\(q = 2, 19, 23, 317, 1031, \ldots\)) one expects that \(C(p)\) can be arbitrarily small.

That said, I certainly have not done any significant computation on this question — possibly pari/gp is not finding \(10\) digit factors of \(10^n - 1\) for odd \(n < 400\) — it was just an idle question I added to the end of my talk for fun. Hence:

  1. I offer a beer to the person who finds the first counterexample.
  2. I offer a bottle of fine Australian wine to the first person who proves the result. Proofs assuming GRH, for example, are certainly acceptable.

Probably the first thing to try (in order to look for a counterexample) would be to test all primes \(p < 10^{10}\) (say) which are factors of \(10^n - 1\) for some odd \(n < 1000\) or so.

Edit 08/25/21: Update from the youtube link: Matthew Bolan has carried out the computation I suggested above, using in addition information about the factorization of \(10^n - 1\) for small \(n\) given at the Cunningham Project (Jonathan Webster told me about this link). The current records for the primes \(p\) with the six highest values of \(C(p)\) are given in the following table. (I had already found the four smallest of these primes in my initial search.) After this computation, it looks like my beer is pretty safe!

Prime | \(C(p)\) | Period
\(59281\) | \(5.115789474\) | \(95\)
\(307627\) | \(4.898734177\) | \(79\)
\(9269802917\) | \(4.866028708\) | \(209\)
\(53\) | \(4.846153846\) | \(13\)
\(173\) | \(4.813953488\) | \(43\)
\(561470969\) | \(4.803108808\) | \(193\)
Posted in Mathematics | 14 Comments

Divisors near sqrt(n)

Analytic Number Theory Alert! An even more idle question than normal (that’s because it comes from twitter). Alex Kontorovich noted with pleasure the following pictorial representation of the integers from a Veritasium youtube video, where a prime number \(n\) is represented by a \(1 \times n\) rectangle and all other numbers are represented as \(a \times b\) rectangles (of area \(n\)) for some \(a > 1\).

This leads to the natural followup questions. How much horizontal space does it take to graph the first \(X\) integers this way if one either:

  1. Plots the integers \(n\) as \(a \times b\) with \(a \le b\) as big as possible?
  2. Plots the integers \(n\) as \(a \times b\) with \(a = 1\) if \(n\) is prime, and otherwise with \(a\) as small as possible, that is, the smallest divisor of \(n\) greater than \(1\)?

(From the graph, it appears that the second algorithm is the one actually used.)

In both cases, there is a trivial upper bound \( \ll X^{3/2}\). On the other hand, simply by considering products of two primes in the interval \([X^{1/2}/C,X^{1/2}]\) for some constant \(C > 1\), you get at least a constant times [corrected] \((X^{1/2}/\log X)^2\) integers less than \(X\) with \(a \gg X^{1/2}\), and hence a lower bound (in both cases) of \(\gg X^{3/2}/(\log X)^2\). But presumably neither of these bounds is best possible. What then are the precise asymptotics? This seems like the type of question Kevin Ford might be able to answer. Actually, this might be a question that Kevin Ford already knows how to answer. I summon his spirit from the whispers of the internet to come and answer this for me. But if that doesn’t work, anyone else should feel free to give an answer or make a guess.
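Before guessing asymptotics, one can compute both totals by brute force for small \(X\) (a quick script of my own; the cutoff \(X = 10^4\) is arbitrary):

```python
def widths(X):
    # spf[n] = smallest prime factor of n, by a standard sieve
    spf = list(range(X + 1))
    for i in range(2, int(X ** 0.5) + 1):
        if spf[i] == i:  # i is prime
            for j in range(i * i, X + 1, i):
                if spf[j] == j:
                    spf[j] = i
    total1 = total2 = 0
    for n in range(2, X + 1):
        # scheme 2: width 1 for primes, else the smallest prime factor
        total2 += 1 if spf[n] == n else spf[n]
        # scheme 1: width = largest divisor a of n with a <= sqrt(n)
        total1 += max(d for d in range(1, int(n ** 0.5) + 1) if n % d == 0)
    return total1, total2

t1, t2 = widths(10000)
print(t1, t2)  # both totals are well below X^{3/2} = 10^6
```

Note scheme 1 always uses at least as much horizontal space as scheme 2, since for composite \(n\) the smallest prime factor is itself a divisor \(\le \sqrt{n}\).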

Update: Friend of the blog Boaty McBoatface emails me to say:

I think the second one is quite easy.

I think you just want to compute

\(\displaystyle{\sum_{p < X^{1/2}} p F(X/p, p)}\)

where \(F(y,p)\) is the number of integers \(\le y\) with all prime factors \(\ge p\). (This has a name, the Buchstab function).

Here the \(X/p\) should be \(\lfloor X/p \rfloor\) but this is of little consequence.

Using the trivial bound \(F(y,p) \le y\) shows that essentially all the contribution is from \(p > X^{1/2 - \varepsilon}\), and in this range a number \(\le X/p\) has all its prime factors \(\ge p\) if and only if it is in fact a prime \(\ge p\). So in fact you want to compute

\(\displaystyle{\sum_{p \le X^{1/2}} p \pi(X/p)}.\)

There are various ways to do this more or less carefully, but by splitting into ranges \(cX^{1/2} < p < (c + 1/N) X^{1/2}\), summing over \(c = 0,1,2,\ldots, N-1\) and then letting \(N \rightarrow \infty\) I think one gets

\(\displaystyle{ \frac{4X^{3/2}}{(\log X)^{2}} \int^1_0 (1 - c^2) dc \sim \frac{8}{3} \cdot \frac{X^{3/2}}{(\log X)^{2}}}.\)

The first question is definitely much harder and, as you guess, feels pretty close to the kind of stuff Ford and Tenenbaum do in their work.

Posted in Mathematics | 5 Comments

Potential Automorphy for GL(n)

Fresh on the arXiv, a nice new paper by Lie Qian proving potential automorphy results for ordinary Galois representations

\(\rho: G_F \rightarrow \mathrm{GL}_n(\mathbf{Q}_p)\)

of regular weight \([0,1,\ldots,n-1]\) for arbitrary CM fields \(F\). The key step in light of the 10-author paper is to construct suitable auxiliary compatible families of Galois representations for which:

  1. The mod-\(p\) representation coincides with the one coming from \(\rho\),
  2. The compatible family can itself be shown to be potentially automorphic.

The main result then follows by an application of the p-q switch. Something similar was done by Harris–Shepherd-Barron–Taylor in the self-dual case. They ultimately found the motives inside the Dwork family. Perhaps surprisingly, Qian also finds his motives in the same Dwork family, except now taken from a part of the cohomology which is not self-dual!

This result doesn’t *quite* have immediate implications for the potential modularity of compatible families: If you take a (generically irreducible) compatible family with Hodge-Tate weights \([0,1,\ldots,n-1]\) then one certainly expects (with some assumption on the monodromy group) that the representations are generically ordinary, but this is a notorious open problem even in the analogous case of modular forms of high weight. One way to try to avoid this would be by proving analogous results for non-ordinary representations. But then you run into genuine difficulties trying to find such arbitrary residual representations inside the Dwork family over extensions unramified at \(p\). This difficulty also arises in the self-dual situation, and the ultimate fix in BLGGT was to bypass such questions by applying Khare-Wintenberger lifting style results. However, such lifting results can’t immediately be adapted to the \(l_0 > 0\) situation under discussion here.

On the other hand, I guess one should be OK for very small \(n\): If \(M\) is (say) a rank three motive over \(\mathbf{Q}\) with HT weights \([0,1,2]\), determinant \(\varepsilon^3\), and coefficients in some CM quadratic field \(E\) (you have to allow coefficients since otherwise the motive is automatically self-dual, see here), then one is probably in good shape. For example, the characteristic polynomials of Frobenius are Weil numbers \(\alpha,\beta,\gamma\) of absolute value \(p\) and will have (as noted in the blog post linked to in the previous sentence) the shape

\(X^3 - a_p X^2 + \overline{a_p} p X - p^3,\)

and now for primes \(p\) which split in \(E\), the corresponding v-adic representation will be ordinary for at least one of the \(v|p\) unless \(a_p\) is divisible by \(p\), which by purity forces \(a_p \in \{-3p,-2p,-p,0,p,2p,3p\}\). From the usual arguments, one sees that there is at least one ordinary \(v\) for almost all split primes \(p\). The rest of the Taylor-Wiles hypotheses should also be generically satisfied assuming the monodromy of \(M\) is \(\mathrm{GL}(3)\), potential modularity in any other case surely being more or less easy to handle directly. Hence Qian’s result proves such motives are potentially automorphic. A funny thing about this game is that actually finding examples of non-self dual motives is very difficult, but in this case, van Geemen and Top studied a family of such motives \(S_t\) occurring inside \(H^2\) of the surface

\(z^2 = xy(x^2 - 1)(y^2 - 1)(x^2 - y^2 + t x y)\)

for varying \(t\) (they note that this family was first considered by Ash and Grayson. Also apologies for changing the notation slightly from the paper, but I prefer to denote the parameter of the base by \(t\)). They then compare their particular motive when \(t=2\) to an explicit non-self dual form for \(\mathrm{GL}(3)/\mathbf{Q}\) of level \(128\). I’m sure by this time (after HLTT and Scholze) someone has verified using the Faltings–Serre method that \(S_2\) is automorphic, but now by Qian’s result we know that the \(S_t\) are potentially automorphic for all \(t\).

Posted in Mathematics | 1 Comment

Don’t cite my paper!

The process of publishing a paper is an extremely long one, and it is not atypical to take several years from the first submission to the paper finally being accepted. The one part of the process that happens extremely quickly, however, is the moment when the journal sends you the galley proofs of the paper and then gives you 48 hours to make any final minor corrections. Despite the journal having taken up to several years to referee the paper, these messages often come with breathless warnings that failure to respond within the time window puts your paper in danger of not being published at all. I remember in 2019 being given the (relatively generous) span of two weeks to look over the (96 page) galley proofs of my Duke paper with David Geraghty. Except that two week period happened to coincide with the holidays (the requested return date was December 24), and overlapped with a period where I was particularly busy. Moreover, I was also about to go on a trip to Australia (which I ultimately had to cancel because of the bushfires). I told them that I should be able to get around to going through the paper by February. The journal responded by telling me, and I quote, that “While we appreciate the fact that this is a long article and that it will take some time to review, a delay of two months to handle this seems excessive to us.” In response, I gently mentioned that the journal had taken 864 days to accept my paper followed by a further 147 days before they produced the galley proofs, and that this seemed a little excessive to me, but that I would do what I could within my own time constraints. They followed up with a note that they looked forward to receiving my answers in February.

But at least the copy-editing done by Duke on this occasion made a few genuine improvements and did not detract from the paper. What is surprising is when a for-profit journal makes the paper categorically worse by adding errors and not even telling the author about it. And surprise surprise, it always seems to be the for-profit journals that do this. (My very best copy-editing experiences have been with MSP and with the AMS journals.) If you are anything like me, when you try to read over your own paper for typos, your brain is very good at seeing what it expects rather than what is actually there (the standard example is not picking up on “the the” in the middle of a sentence). In particular, the actual probability of finding any error in a 100+ page paper within a 48 hour window by looking at the galley proofs is vanishingly small. The two worst experiences I have had in this regard were at Springer journals. A bare minimum requirement for a galley proof where changes have been made to the original paper is that the journal should tell you what changes they actually made. What they should really do is send a diff file comparing your original .tex to their new version. What they should not do is make subtle changes that are impossible to pick up on a quick reading and that change the mathematical meaning of the paper, all without telling the author. To take an example, I just found out that in my Inventiones paper with David Geraghty, every one of the 47 occurrences of “[[” and “]]” was replaced by single brackets “[” and “]”. The rings \(\mathbf{Z}_p[[X]]\) and \(\mathbf{Z}_p[X]\) are very very different — one is a complete local ring, the other is not. So now in the published paper we patch modules over the ring \(S_{\infty}\), which is now a polynomial ring; modular forms over a field \(K\) have \(q\)-expansions in \(K[q]\) and hence are polynomials; and so on. I think that every one of those 47 occurrences introduced an error of this magnitude.
But at the same time, picking this up is close to impossible when looking at the paper because the mind naturally “corrects” to what it should be, especially if you know how the Taylor-Wiles method works already (which, if you are one of the authors of this paper, is certainly the case). What’s particularly annoying is the stupidity of this process — unilaterally making the change, not telling the author, and then giving them (in this case) 48 hours to look at the paper.

So what is to be done? The good news is that I am in the luxurious position that citations (or lack thereof) make no difference to me. (Hat tip to both the University of Chicago and the NSF for not being obsessed by such metrics.) So clearly the solution is that anyone who wants to cite my paper should cite the latest version on the arXiv rather than the published version. If a journal complains that they want to cite the published version, simply point out that the published version is riddled with misprints and thus should not be cited. You have my blessing to do this!

Update Aug 2: I did decide to email the journal and ask if they could republish the online version. The mathematicians involved were perhaps not surprisingly very apologetic and upset as well, and have put pressure on Springer to fix the problem. We will see what actually happens.

Posted in Rant | 18 Comments

The Arbeitsgemeinschaft has returned!

An update on this post: the Arbeitsgemeinschaft on derived Galois deformation rings and the cohomology of arithmetic groups will now be taking place the week of April 5th. Here is some practical information if you are curious.

Is there somewhere I can watch the lectures even though I am not a participant? No, the workshop is invitation only.

Is there somewhere I can watch the lectures as a virtual participant? I assume so, but I don’t know the exact details. I predict you will find out at the same time I do.

Is anyone attending in person? I believe so, but I’m not sure how many; I’m guessing they will be mostly coming from Germany. I think that Gerd Faltings and a few graduate students from Bonn will be there in person, for example.

What is the schedule of lectures? Am I going to have to wake up at 3:00AM to watch them from the USA? Ah, on this point I do have some useful information. The schedule of lectures is as follows, all times are local German afternoon time. Please bear in mind that German Daylight Savings time begins this weekend, so a talk at 3:00PM Oberwolfach time will be at 8:00AM in Chicago, 9:00AM on the East Coast, and 6:00AM on the West Coast.

Monday:

3-3:45 A1
4:15-5 A2
5:15-6 A3
8-8:45 A4

Tuesday:

3-3:45 B1
4:30-5:15 B2
8-8:45 C1

Wednesday:

4:15-5 C2
5:15-6 B3
8-8:45 B4

Thursday:

3-3:45 C3
4:15-5 C4
5:15-6 D1
8-8:45 D2

Friday:

3-3:45 D3
4:15-5 D4

Posted in Mathematics | 2 Comments

Test Your Intuition: p-adic local Langlands edition

Taking a page from Gil Kalai, here is a question to test your intuition about 2-dimensional crystalline deformation rings.

Fix a representation:

\(\rho: G_{\mathbf{Q}_p} \rightarrow \mathrm{GL}_2(\overline{\mathbf{F}}_p)\)

After twisting, let me assume that this representation has a crystalline lift of weight \([0,k]\) for some \(1 \le k \le p\). Let \(R\) denote the universal framed local deformation ring with fixed determinant. Now consider positive integers \(n \equiv k \bmod p-1\), and let \(R_n\) denote the Kisin crystalline deformation ring, also with fixed determinant. Global considerations suggest that for \(n \equiv m \equiv k \bmod p-1\) and \(n \ge m\), there should be a surjection \(R_n/p \rightarrow R_m/p\), and quite possibly one even knows this to be true. Global considerations also suggest that any representation can be seen in high enough weight, which leads to the following problem:

Question: How large does \(n\) have to be to see the entire tangent space of the unrestricted local deformation ring \(R\)? That is, how large does \(n\) have to be for the map

\(R/(p,\mathfrak{m}^2) \rightarrow R_n/(p,\mathfrak{m}^2)\)

to be an isomorphism? Naturally, one can also ask the same question with \(\mathfrak{m}^2\) replaced by \(\mathfrak{m}^r\) for any \(r \ge 2\).

The first question came up in a discussion with my student Chengyang. I made a guess, and then we proceeded (during our meeting) to do a test computation in Magma, where my prediction utterly failed, but in retrospect my computation itself may have been dodgy, so now I’m doubly confused.

Matt remarked that this question is not entirely unrelated in spirit to the Breuil-Mézard conjecture. Instead of counting multiplicities of geometric cycles, one is measuring the Hilbert-Samuel function and its “convergence” to that of the free module. Also, if you know everything about \(\mathrm{GL}_2(\mathbf{Q}_p)\) and \(2\)-dimensional Galois representations then you should be able to answer this question too.

Of course I could have re-done the initial computation for this blog post, but I think at least some readers are happier when I ask questions for which I don’t know the answer…

Posted in Mathematics | 12 Comments

Fermat Challenge

A challenge inspired by a question of Doron Zeilberger. Do there exist arbitrarily large integers \(n\) with the following property:

  1. There exists an ordered field \(F\) such that \(x^n + y^n = z^n\) has solutions in \(F\) with \(xyz \ne 0\).
  2. The only solutions in \(F\) to \(x^m + y^m = z^m\) for \(3 \le m < n\) satisfy \(xyz = 0\).

To give a somewhat looser phrasing, you might try to prove Fermat over \(\mathbf{Q}\) by an inductive argument that relies only on the positivity of squares together with the fact that Fermat was classically known for some small values of \(n\). This question asks whether you can rule out such a proof.

This might be tricky. Quite possibly taking \(F = \mathbf{Q}(2^{1/n}) \subset \mathbf{R}\) will work for infinitely many integers \(n\), but this is not obvious. Indeed, since any ordered field \(F\) will always contain \(\mathbf{Q}\), any proof that arbitrarily large \(n\) with the properties above exist will also prove Fermat over \(\mathbf{Q}\). That said, there might be simple constructions of such \(F\) assuming Fermat is true over \(\mathbf{Q}\), which we fortunately know to be the case.

Posted in Mathematics | 6 Comments

Ramanujan Machine Redux

I had no intention to discuss the Ramanujan Machine again, but over the past few days there has been a flurry of (attempted) trollish comments on that post, so after taking a brief look at the latest version, I thought I would offer you my updates. (I promise for the last time.)

Probably the nicest thing I have to say about the updated paper is that it is better than the original. My complaints about the tone of the paper remain the same, but I don’t think it is necessary for me to revisit them here.

Concerning the intellectual merit, I think it is worth making the following remarks. First, I am only addressing the contributions to mathematics. Second, what counts as a new conjecture is not really as obvious as it sounds. Since continued fractions are somewhat recherché, it might be more helpful to give an analogy with infinite series. Suppose I claimed it was a new result that

\( \displaystyle{ 2G = \sum_{n=0}^{\infty} a_n = 1 + \frac{1}{2} + \frac{5}{36} + \frac{5}{72} + \frac{269}{3600} - \frac{1219}{705600} + \ldots } \)

where for \(n \ge 4\) one has

\(2 n^2 a_n = n^2 a_{n-1} – 2 (n-2)^2 a_{n-2} + (n-2)^2 a_{n-3}.\)

How can you evaluate this claim? Quite probably this is the first time this result has been written down, and you will not find it anywhere in the literature. But it turns out that

\( \displaystyle{ \left(\sum_{n=0}^{\infty} \frac{x^n}{2^n} \right) \times \left(\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)^2} \right)
= \sum_{n=0}^{\infty} a_n x^n}\)

and letting \(x=1\) recovers the identity above and immediately explains how to prove it. To a mathematician, it is clear that the proof explains not only why the original identity is true, but also why it is not at all interesting. It arises as more or less a formal manipulation of a definition, with a few minor things thrown in like the sum of a geometric series and facts about which functions satisfy certain types of ordinary differential equations. The point is that the identities produced by the Ramanujan Machine have all been of this type. That is, upon further scrutiny, they have not yet revealed any new mathematical insights, even if any particular example, depending on what you know, may be more or less tricky to compute.
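To make the point concrete, here is a minimal numerical sketch (my own illustration, not from any paper): multiplying truncations of the two series and summing the resulting Cauchy product coefficients at \(x=1\) recovers \(2G\), using only the known decimal value of Catalan's constant.

```python
# Sketch: the Cauchy product of sum x^n/2^n and sum (-1)^n x^(2n+1)/(2n+1)^2
# has coefficients a_n whose sum at x = 1 is 2G.
G = 0.915965594177219015  # Catalan's constant (known value, hard-coded)

N = 1000
b = [0.5 ** n for n in range(N + 1)]          # geometric series coefficients
c = [0.0] * (N + 1)                           # Catalan series (odd terms only)
for k in range((N + 1) // 2):
    c[2 * k + 1] = (-1) ** k / (2 * k + 1) ** 2

# Cauchy product coefficients; note the product series starts at the x^1 term.
a = [sum(b[j] * c[n - j] for j in range(n + 1)) for n in range(N + 1)]

total = sum(a)
print(total)  # ≈ 2G ≈ 1.831931...
print(a[3])   # b_2 c_1 + b_0 c_3 = 1/4 - 1/9 = 5/36, matching the series above
```

The whole "identity" is two lines of bookkeeping, which is exactly the point.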

What then about the improved irrationality measures for the Catalan constant? I think that is a polite way of describing a failed attempt to prove that Catalan’s constant is irrational. It’s something that would be only marginally publishable in a mathematics journal even with a proof. Bounds on the irrationality measure that are strong enough to imply irrationality have genuine implications for the arithmetic of the relevant numbers, but these results do not.

What then about the new continued fractions developed over the last year — maybe these are now deeper? Here you have to remember that continued fractions, especially of the kind considered in this paper, are more or less equivalent to questions about certain types of ordinary differential equations and their related periods. (But importantly, not conversely: most of these interesting ODEs have nothing to do with continued fractions since they are associated with recurrences of length greater than two.) For your sake, dear reader, I voluntarily chose to give up an hour or two of my life and took a closer look at one of their “new conjectures.” I deliberately chose one that they specifically highlighted in their paper, namely:

The conjecture in question is a continued fraction expansion of \(2/(2G-1)\), where \(G\) is Catalan’s constant \(L(2,\chi_4)\). As you might find unsurprising, once you start to unravel what is going on you find that, just as in the example above, the mystery of these numbers goes away. This example can be generalized in a number of ways without much change to the argument. Let \(p_0=1\) and \(q_0 = 0\), and otherwise let

\(\displaystyle{\frac{p_n}{q_n} = \frac{3}{1}, \frac{33}{13}, \frac{765}{313}, \frac{30105}{12453}, \frac{1790775}{743403}, \ldots} \)

denote the (non-reduced) continued fraction convergents. If

\( \displaystyle{ P(z) = \sum \frac{4^n p_n z^n}{n!^2} = 1 + 12z + 132 z^2 + \ldots,
\quad Q(z) = \sum \frac{4^n q_n z^n}{n!^2} = 4z + 52 z^2 + \ldots} \)

then, completely formally, \(DP(z) = 0\) where

\( \displaystyle{ D = z(8z-1)(4z-1) \frac{d^2}{dz^2} + (160 z^2 – 40 z + 1) \frac{d}{dz} + 12(8z – 1)}\)

and \(DQ(z) = 4\). If \(K\) and \(E\) denote the complete elliptic integrals, one observes that \(P(z)\) is nothing but a hypergeometric function, which can be expressed in terms of \(K\) and \(E\).

But now one is more or less done! The argument is easily finished with a little help from Mathematica. Another solution to \(DF(z) = 0\) is of course

\( \displaystyle{ R(z) = \frac{ 2 E((1-8z)^2) -2 K((1-8z)^2) }{(1 – 8z)^2} = \log(z) + 2 + \ldots } \)

and knowing both homogeneous solutions allows one to write \(Q(z) = u(z) P(z) + v(z) R(z)\) and then easily compute that

\(\displaystyle{ \lim_{n \rightarrow \infty} \frac{p_n}{q_n}
= \lim_{z \rightarrow 1/8} \frac{P(z)}{Q(z)} = \frac{2}{-1 + 2G},}\)

as desired. For those playing at home, note that a convenient choice of \(u(z)\) and \(v(z)\) can be given by

\( \displaystyle{ v(z) = \int \frac{ E(16 z(1-4z))}{\pi} \, dz = 4 z - 8 z^2 + \ldots }\)
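For those who want to replicate this at home, here is a short sketch (mine, not from the paper): the three-term recurrence below was reverse-engineered from the convergents listed above — it is also precisely the recurrence that the operator \(D\) imposes on the coefficients of \(P\) and \(Q\) — and it reproduces the listed values and converges to \(2/(2G-1)\).

```python
from fractions import Fraction

# Sketch (assumption: this recurrence is fitted to the listed convergents,
# not copied from the Ramanujan Machine paper).  With
#   b_n = 3n^2 + n - 1,   a_n = -2(n-1)^3 (n+1),
# and p_0 = 1, q_0 = 0, p_1 = 3, q_1 = 1, the standard three-term recurrence
#   p_n = b_n p_{n-1} + a_n p_{n-2}   (and the same for q_n)
# reproduces the convergents 3/1, 33/13, 765/313, ...
G = 0.915965594177219015  # Catalan's constant

p, q = [1, 3], [0, 1]
for n in range(2, 1000):
    b_n = 3 * n * n + n - 1
    a_n = -2 * (n - 1) ** 3 * (n + 1)
    p.append(b_n * p[-1] + a_n * p[-2])
    q.append(b_n * q[-1] + a_n * q[-2])

print(p[:6])  # [1, 3, 33, 765, 30105, 1790775]
print(q[:6])  # [0, 1, 13, 313, 12453, 743403]
print(float(Fraction(p[-1], q[-1])))  # ≈ 2/(2G - 1) ≈ 2.404045
```

The exact integer arithmetic (and `Fraction` for the final ratio) avoids any floating-point overflow, since the \(p_n\) grow roughly like \(3^n (n!)^2\).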

Posted in Mathematics, Rant | 8 Comments