PRIMES
August 2013 Draft
Barry Mazur

William Stein

Contents
I

The Riemann Hypothesis

8

1 Thoughts about numbers

9

2 What are prime numbers?

12

3 “Named” Prime Numbers

17

4 Sieves

19

5 Questions about primes

22

6 Further questions about primes

25

7 How many primes are there?

29

8 Prime numbers viewed from a distance

33

9 Pure and applied mathematics

35

10 A probabilistic “ﬁrst” guess

37

11 What is a “good approximation”?

41

12 What is Riemann’s Hypothesis?

43

13 The Prime Number Theorem

45

14 The staircase of primes

49

15 Tinkering with the staircase of primes

51

16 Computer music ﬁles and prime numbers

54

17 Spectra and Trigonometric Sums

60

18 The spectrum and the staircase of primes

62

2

CONTENTS

3

19 To our readers of Part I

64

II

65

Distributions

20 Slopes of graphs that have no slopes

66

21 Distributions

73

22 Fourier transforms: second visit

79

23 Fourier transform of delta

82

24 Trigonometric series

84

III

86

The Riemann Spectrum of the Prime Numbers

25 A sneak preview

87

26 On losing no information

93

27 Going from the primes to the Riemann Spectrum

96

28 Going from the Riemann Spectrum to the primes

101

IV

Back to Riemann

104

29 Building π(X) knowing the Spectrum

105

30 As Riemann envisioned it

112

31 Companions to the zeta function

119

32 Glossary

123

Foreword
The Riemann Hypothesis is one of the great unsolved problems of mathematics
and the reward of $1,000,000 of Clay Mathematics Institute prize money awaits
the person who solves it. But—with or without money—its resolution is crucial
for our understanding of the nature of numbers.
There are at least four full-length books recently published, written for a general
audience, that have the Riemann Hypothesis as their main topic. A reader of
these books will get a fairly rich picture of the personalities engaged in the
pursuit, and of related mathematical and historical issues. [1]
This is not the mission of the book that you now hold in your hands. We
aim—instead—to explain, in as direct a manner as possible and with the least
mathematical background required, what this problem is all about and why it
is so important. For even before anyone proves this hypothesis to be true (or
false!), just getting familiar with it and with some of the the ideas behind it, is
exciting. Moreover, this hypothesis is of crucial importance in a wide range of
mathematical ﬁelds; for example, it is a conﬁdence-booster for computational
mathematics: even if the Riemann Hypothesis is never proved, its truth gives
us an excellent sense of how long certain computer programs will take to run,
which, in some cases, gives us the assurance we need to initiate a computation
that might take weeks or even months to complete.
———
To be done: Add a photo of Sarnak
———–

Here is how the Princeton mathematician Peter Sarnak described the broad
impact the Riemann Hypothesis has had:
“The Riemann hypothesis is the central problem and it implies many,
many things. One thing that makes it rather unusual in mathematics
today is that there must be over ﬁve hundred papers—somebody
should go and count—which start ‘Assume the Riemann hypothesis,’
and the conclusion is fantastic. And those [conclusions] would then
4

CONTENTS

5

become theorems ... With this one solution you would have proven
ﬁve hundred theorems or more at once.” [2]

Our book is profusely illustrated, containing over 125 ﬁgures, diagrams, and pictures that accompany the text [3]. There are comparatively fewer mathematical
equations in Part I.
So, what is the Riemann Hypothesis? Below is a ﬁrst description of what it is
about. The task of our book is to develop the following boxed paragraph into
a fuller explanation and to convince you of the importance and beauty of the
mathematics it represents. We will be oﬀering, throughout our book, a number
of diﬀerent—but equivalent—ways of precisely formulating this hypothesis (we
display these in boxes). When we say that two mathematical statements are
“equivalent” we mean that, given the present state of mathematical knowledge,
we can prove that if either one of those statements is true, then the other is
true. The endnotes will guide the reader to the mathematical literature that
establishes these equivalences.

What sort of Hypothesis is the Riemann Hypothesis?
Consider the seemingly innocuous series of questions:
• How many prime numbers (2, 3, 5, 7, 11, 13, 17, . . .) are
there less than 100?
• How many less than 10,000?
• How many less than 1,000,000?
More generally, how many primes are there less than any
given number X?
Riemann proposed, a century and half ago, a strikingly simple-todescribe “very good approximation” to the number of primes less than
a given number X. We now see that if we could prove this Hypothesis of
Riemann we would have the key to a wealth of powerful mathematics.
Mathematicians are eager to ﬁnd that key.

6

CONTENTS

Figure 1: Raoul Bott (1923–2005)
A famous mathematician, Raoul Bott, once said—giving advice to some young
mathematicians—that whenever one reads a mathematics book or article, or
goes to a math lecture, one should aim to come home with something very
speciﬁc (it can be small, but should be speciﬁc) that has application to a wider
class of mathematical problem than was the focus of the text or lecture. If
we were to suggest some possible speciﬁc items to come home with, after read
our book, three key phrases – prime numbers, square-root accurate, and
spectrum – would head the list. As for words of encouragement to think hard
about the ﬁrst of these, i.e., prime numbers, we can do no better than to quote
a paragraph of Don Zagier’s classic 12-page exposition, The First 50 Million
Prime Numbers:

Figure 2: Don Zagier
“There are two facts about the distribution of prime numbers of
which I hope to convince you so overwhelmingly that they will be
permanently engraved in your hearts. The ﬁrst is that, [they are]
the most arbitrary and ornery objects studied by mathematicians:
they grow like weeds among the natural numbers, seeming to obey
no other law than that of chance, and nobody can predict where the
next one will sprout. The second fact is even more astonishing, for
it states just the opposite: that the prime numbers exhibit stunning
regularity, that there are laws governing their behavior, and that
they obey these laws with almost military precision.”

CONTENTS

7

Part I of our book is devoted to conveying the essence of the Riemann Hypothesis and explaining why it is so intensely pursued. It requires a minimum of
mathematical background, and does not, for example, use calculus, although it
would be helpful to know—or to learn on the run—the meaning of the concept
of function. Given its mission, Part I is meant to be complete, in that it has a
beginning, middle, and end.
Part II does require calculus, and is meant as a general preparation for the type
of Fourier analysis that will occur in the later chapters. The notion of spectrum
is key.
Part III is for readers who wish to see, more vividly, the phenomenology that
links the placement of prime numbers and (what we call there) the Riemann
Spectrum.
Part IV requires some complex analysis, and returns to Riemann’s original viewpoint. In particular it relates the “Riemann Spectrum” that we discuss in Part
III to the nontrivial zeroes of the Riemann’s zeta-function.
The end-notes are meant to provide more technical commentary, and link the
text to references.

Part I

The Riemann Hypothesis

8

Chapter 1

Thoughts about numbers:
ancient, medieval, and
modern
If we are to believe the ancient Greek philosopher Aristotle the early Pythagoreans thought that the principles governing Number are “the principles of all
things,” the concept of Number being more basic than earth, air, ﬁre, or water,
which were according to ancient tradition the four building blocks of matter.
To think about number is to get close to the architecture of “what is.”
So, how far along are we in our thoughts about numbers?

Figure 1.1: Ren´ Descartes
e
The French philosopher and mathematician Ren´ Descartes, almost four cene
turies ago, expressed the hope that there soon would be “almost nothing more
to discover in geometry.” Contemporary physicists dream of a ﬁnal theory [4].
But despite its venerability and its great power and beauty, the pure mathematics of numbers may still be in the infancy of its development, with depths
to be explored as endless as the human soul, and never a ﬁnal theory.
9

10

CHAPTER 1. THOUGHTS ABOUT NUMBERS

Figure 1.2: Don Quixote and “his” Dulcinea del Toboso
Numbers are obstreperous things. Don Quixote encountered this when he requested that the “bachelor” compose a poem to his lady Dulcinea del Toboso,
the ﬁrst letters of each line spelling out her name. The “bachelor” found
“a great diﬃculty in their composition because the number of letters
in her name was 17, and if he made four Castilian stanzas of four
octosyllabic lines each, there would be one letter too many, and if
he made the stanzas of ﬁve octosyllabic lines each, the ones called
d´cimas or redondillas, there would be three letters too few...” [5]
e
“It must ﬁt in, however, you do it,” pleaded Quixote, not willing to grant the
imperviousness of the number 17 to division.
Seventeen is indeed a prime number: there is no way of factoring it as the product of smaller numbers, and this accounts—people tell us—for its occurrence in
some phenomena of nature, as when the 17-year cicadas all emerged to celebrate
a “reunion” of some sort in our ﬁelds and valleys.

Figure 1.3: Cicadas emerge every 17 years
Prime numbers, despite their primary position in our modern understanding of
number, were not speciﬁcally doted over in the ancient literature before Euclid,
at least not in the literature that has been preserved. Primes are mentioned as

11
a class of numbers in the writings of Philolaus (a predecessor of Plato); they are
not mentioned speciﬁcally in the Platonic dialogues, which is surprising given
the intense interest Plato had in mathematical developments; and they make
an occasional appearance in the writings of Aristotle, which is not surprising,
given Aristotle’s emphasis on the distinction between the composite and the
incomposite. “The incomposite is prior to the composite,” writes Aristotle in
Book 13 of the Metaphysics.

Figure 1.4: Euclid
The notion of prime number occurs in Euclid’s Elements and play a role there
as the extraordinary mathematical concept, central to any deep understanding
of arithmetic phenomena, that it is now understood to be.
There is an extraordinary wealth of established truths about whole numbers;
these truths provoke sheer awe for the beautiful complexity of prime numbers.
But each of the important new discoveries we make give rise to a further richness
of questions, educated guesses, heuristics, expectations, and unsolved problems.

Chapter 2

What are prime numbers?
Primes as atoms.
To begin from the beginning, think of the operation of
multiplication as a bond that ties numbers together: the equation 2 × 3 = 6
invites us to imagine the number 6 as (a molecule, if you wish) built out of
its smaller constituents 2 and 3. Reversing the procedure, if we start with a
whole number, say 6 again, we may try to factor it (that is, express it as a
product of smaller whole numbers) and, of course, we would eventually, if not
immediately, come up with 6 = 2×3 and discover that 2 and 3 factor no further;
the numbers 2 and 3, then, are the indecomposable entities (atoms, if you wish)
that comprise our number.

6
3

2

Figure 2.1: The number 6 = 2 × 3
By deﬁnition, a prime number (colloquially, a prime) is a whole number,
bigger than 1, that cannot be factored into a product of two smaller whole
numbers. So, 2 and 3 are the ﬁrst two prime numbers. The next number along
the line, 4, is not prime, for 4 = 2 × 2; the number after that, 5, is. Primes are,
multiplicatively speaking, the building blocks from which all numbers can be
made. A fundamental theorem of arithmetic tells us that any number (bigger
than 1) can be factored as a product of primes, and the factorization is unique
except for rearranging the order of the primes.
For example, if you try to factor the number 12 as a product of smaller numbers—
ignoring the order of the factors—there are two ways to begin to do this:
12 = 2 × 6

and

12 = 3 × 4

But neither of these ways is a full factorization of 12, for both 6 and 4 are
12

13
not prime, so can be themselves factored, and in each case after changing the
ordering of the factors we arrive at:

12 = 2 × 2 × 3.

12

12
6

2

3

4

3

2

2

2

Figure 2.2: Factorizations of 12

If you try to factor the number 300, there are many ways to begin:

300 = 30 × 10

or

300 = 6 × 50

and there are various other starting possibilities. But if you continue the factorization (“climbing down” any one of the possible “factoring trees”) to the
bottom, where every factor is a prime number as in Figure 2.3, you always end
up with the same collection of prime numbers [6]:

300 = 22 × 3 × 52 .

300
3
2

300
100

10
5

10
2

20
10

2
5

2

15
5

3

5

Figure 2.3: Factor trees that illustrates the factorization of 300 as a product of
primes.

14

CHAPTER 2. WHAT ARE PRIME NUMBERS?

6469693230
690
30
3
2

9376367

23

13

721259

10

1309

5

119
17

551
11

19

29

7

Figure 2.4: Factorization tree for the product of the primes up to 29.
The Riemann Hypothesis probes the question: how intimately can we know
prime numbers, those atoms of multiplication? Prime numbers are an important
part of our daily lives. For example, anytime we visit a website and purchase
something online, prime numbers having hundreds of decimal digits are used
to keep our bank transactions private. This ubiquitous use to which these
giant primes are put depends upon a very simple principle: it is much easier to
multiply numbers together than to factor them. If you had to factor, say, the
number 391 you might scratch your head for a few minutes before discovering
that 391 is 17 × 23. But if you had to multiply 17 by 23 you would do it
straightaway. Oﬀer two primes, say, P and Q each with a few hundred digits,
to your computing machine and ask it to multiply them together: you will get
their product N = P × Q with its hundreds of digits in about a microsecond.
But present that number N to any current desktop computer, and ask it to
factor N , and the computer will (almost certainly) fail to do the task. [7] [8]
The safety of much encryption depends upon this guaranteed failure! [9]
If we were latter-day number-phenomenologists we might revel in the discovery
and proof that
p = 243,112,609 − 1 = 3164702693 . . . . . . (millions of digits) . . . . . . 6697152511
is a prime number, this number having 12,978,189 digits! This prime, which
was discovered on August 23, 2008 by the GIMPS project (see [10]), is the ﬁrst
prime ever found with more than ten million digits.
Now 243,112,609 − 1 is quite a hefty number! Suppose someone came up to you
saying “surely p = 243,112,609 − 1 is the largest prime number!” (which it is not)
how might you convince that person that he or she is wrong? [11]
Here is a neat—and, we hope, convincing—strategy to show there are prime
numbers even larger than p = 243,112,609 − 1. Imagine forming the following

15
humungous number: let M be the product of all prime numbers up to and
including p = 243,11,2609 − 1. Now go one further than M by taking the next
number N = M + 1.
OK, even though this number N is wildly large, it is either a prime number
itself—which would mean that there would indeed be a prime number larger
than p = 243,112,609 − 1, namely N ; or in any event it is surely divisible by some
prime number, call it P .
Here, now, is a way of seeing that this P is bigger than p: Since every prime
number smaller than or equal to p divides M , these prime numbers cannot
divide N = M + 1 (since they divide M evenly, if you tried to divide N = M + 1
by any of them you would get a remainder of 1). So, since P does divide N it
must not be any of the smaller prime numbers: P is therefore a prime number
bigger than p = 243,112,609 − 1.
This strategy, by the way, is not very new: it is, in fact, well over two thousand
years old, since it already occurred in Euclid’s Elements. The Greeks did know
that there are inﬁnitely many prime numbers and they showed it via the same
method as we showed that our p = 243,112,609 −1 is not the largest prime number.
Here is the argument again, given very succinctly: Given primes p1 , . . . , pm , let
n = p1 p2 · · · pm + 1. Then n is divisible by some prime not equal to any pi , so
there are more than m primes.
You can think of this strategy as a simple game that you can play. Start with
any ﬁnite bag of prime numbers (say the bag that only contains one prime, the
prime 2). Now each “move” of the game consists of multiplying together all the
primes you have in your bag to get a number M , then adding 1 to M to get the
even larger number N = M +1, then factoring N into prime number factors, and
then including all those new prime numbers in your bag. Euclid’s proof gives
us that we will—with each move of this game—be ﬁnding more prime numbers:
the bag will increase. After, say, a million moves our bag will be guaranteed to
contain more than a million prime numbers.
For example, starting the game with your bag containing only one prime number
2, here is how your bag grows with after successive moves of the game:
{2}
{2, 3}
{2, 3, 7}
{2, 3, 7, 43}
{2, 3, 7, 43, 13, 139}
{2, 3, 7, 43, 13, 139, 3263443}
{2, 3, 7, 43, 13, 139, 3263443, 547, 607, 1033, 31051}
{2, 3, 7, 43, 13, 139, 3263443, 547, 607, 1033, 31051, 29881, 67003,
9119521, 6212157481}
etc. [12]
Though there are inﬁnitely many primes, actually ﬁnding them is a major chal-

16

CHAPTER 2. WHAT ARE PRIME NUMBERS?

lenge. In the 1990s, the Electronic Frontier Foundation http://www.eff.org/
awards/coop oﬀered a $100,000 cash reward to the ﬁrst group to ﬁnd a prime
with at least 10,000,000 decimal digits (the record prime p above won this prize
[13]), and oﬀers another $150,000 cash prize to the ﬁrst group to ﬁnd a prime
with at least 100,000,000 decimal digits.
The number p = 243,112,609 − 1 is the largest prime we know, where by “know”
we mean that we know it so explicitly that we can compute things about it. For
example, the last two digits of p are both 1 and the sum of the digits of p is
58,416,637. Of course p is not the largest prime number since there are inﬁnitely
many primes, e.g., the next prime q after p is a prime. But there is no known
way to eﬃciently compute anything interesting about q. For example, what is
the last digit of q in its decimal expansion?

Chapter 3

“Named” Prime Numbers
Prime numbers come in all sorts of shapes, some more convenient to deal with
than others. For example, the number we have been talking about,
p = 243,112,609 − 1,
is given to us, by its very notation, in a striking form; i.e., one less than a power
of 2. It is no accident that the largest “currently known” prime number has
such a form. This is because there are special techniques we can draw on to
show primality of a number, if it is one less than a power of 2 and—of course—if
it also happens to be prime. The primes of that form have a name, Mersenne
Primes, as do the primes that are one more than a power of 2, those being called
Fermat Primes. [14]
Here are two exercises that you might try to do, if this is your ﬁrst encounter
with primes that diﬀer from a power of 2 by 1:
1. Show that if a number of the form M = 2n −1 is prime, then the exponent
n is also prime. [Hint: This is equivalent to proving that if n is composite,
then 2n − 1 is also composite.] For example: 22 − 1 = 3, 23 − 1 = 7 are
primes, but 24 − 1 = 15 is not. So Mersenne primes are numbers that are
• of the form
2prime

number

− 1,

and
• are themselves prime numbers.
2. Show that if a number of the form F = 2n + 1 is prime, then the exponent
n is a power of two. For example: 22 + 1 = 5 is prime, but 23 + 1 = 9 is
not. So Fermat primes are numbers that are
17

18

CHAPTER 3. “NAMED” PRIME NUMBERS
• of the form
2power

of two

+ 1,

and
• are themselves prime numbers.
Not all numbers of the form 2prime number − 1 or of the form 2power of two + 1 are
prime. We currently know only ﬁnite many primes of either of these forms. How
we have come to know what we know is an interesting tale. See, for example,
http://www.mersenne.org/.

Chapter 4

Sieves
Eratosthenes, the mathematician from Cyrene (and later, librarian at Alexandria) explained how to sift the prime numbers from the series of all numbers:
in the sequence of numbers,

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26,

for example, start by circling the 2 and crossing out all the other multiples of 2.
Next, go back to the beginning of our sequence of numbers and circle the ﬁrst
number that is neither circled nor crossed out (that would be, of course, the
3), then cross out all the other multiples of 3. This gives the pattern: go back
again to the beginning of our sequence of numbers and circle the ﬁrst number
that is neither circled nor crossed out; then cross out all of its other multiples.
Repeat this pattern until all the numbers in our sequence are either circled, or
crossed out, the circled ones being the primes.
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

In Figures 4.1–4.4 we use the primes 2, 3, 5, and ﬁnally 7 to sieve out the primes
up to 100, where instead of crossing out multiples we grey them out, and instead
of circling primes we color their box red.
19

20

CHAPTER 4. SIEVES

11
21
31
41
51
61
71
81
91

2
12
22
32
42
52
62
72
82
92

3
13
23
33
43
53
63
73
83
93

4
14
24
34
44
54
64
74
84
94

5
15
25
35
45
55
65
75
85
95

6
16
26
36
46
56
66
76
86
96

7
17
27
37
47
57
67
77
87
97

8
18
28
38
48
58
68
78
88
98

9
19
29
39
49
59
69
79
89
99

10
20
30
40
50
60
70
80
90
100

Figure 4.1: Using the prime 2 to sieve for primes up to 100
Since all the even numbers greater than two are eliminated as being composite
numbers and not primes they appear as gray in Figure 4.1, but none of the odd
numbers are eliminated so they still appear in white boxes.
2
11
21
31
41
51
61
71
81
91

3
13
23
33
43
53
63
73
83
93

5
15
25
35
45
55
65
75
85
95

7
17
27
37
47
57
67
77
87
97

9
19
29
39
49
59
69
79
89
99

Figure 4.2: Using the primes 2 and 3 to sieve for primes up to 100
2
11
31
41
61
71
91

3
13
23
43
53
73
83

5
25
35
55
65
85
95

7
17
37
47
67
77

19
29
49
59
79
89

97

Figure 4.3: Using the primes 2,3,5 to sieve for primes up to 100
Looking at Figure 4.3, we see that for all but six numbers up to 100 we have
(after sieving by 2,3, and 5) determined which are primes and which composite.

21
2
11
31
41
61
71
91

3
13
23
43
53
73
83

5

7
17
37
47
67
77

19
29
49
59
79
89

97

Figure 4.4: Using the primes 2,3,5,7 to sieve for primes up to 100
Finally, as in Figure 4.4, sieving by 2,3,5, and 7 we have determined which are
primes for all but two numbers. [15]

Chapter 5

Questions about primes
that any person might ask
We become quickly stymied when we ask quite elementary questions about the
spacing of the inﬁnite series of prime numbers.
For example, are there inﬁnitely many pairs of primes whose diﬀerence is 2?
The sequence of primes seems to be rich in such pairs
5 − 3 = 2,

7 − 5 = 2,

13 − 11 = 2,

19 − 17 = 2,

and we know that there are loads more such pairs (see [16]) but the answer
to our question, are there inﬁnitely many?, is not known. Nevertheless there
is very exciting recent work in this direction, speciﬁcally, Yitang Zhang proved
that there are inﬁnitely many pairs of primes that diﬀer by no more than 7×107 .
See the wikipedia entry http://en.wikipedia.org/wiki/Yitang_Zhang. Are
there inﬁnitely many pairs of primes whose diﬀerence is 4? Answer: equally
unknown. Is every even number greater than 2 a sum of two primes? Answer:
unknown. Are there inﬁnitely many primes which are 1 more than a perfect
square? Answer: unknown.
Remember the Mersenne prime p = 243,112,609 − 1? and how we showed that
there is a prime P larger than that? Suppose, though, someone asked us whether
there was a Mersenne Prime larger than this p: that is, is there a prime number
of the form
2some

prime number

−1

bigger than p = 243,112,609 −1? Answer: We don’t know. It is possible that there
are inﬁnitely many Mersenne primes but we’re far from being able to answer
such questions.
22

23

Figure 5.1: Mersenne
Is there some neat formula giving the next prime? More speciﬁcally, If I give you
a number N , say N = one million, and ask you for the ﬁrst number after N that
is prime, is there a method that answers that question without, in some form or
other, running through each of the successive odd numbers after N rejecting the
nonprimes until the ﬁrst prime is encountered? Answer: unknown.
One can think of many ways of “getting at” some understanding of the placement of prime numbers among all number. Up to this point we have been mainly
just counting them, trying to answer the question “how many primes are there
up to X?” and we have begun to get some feel for the numbers behind this
question, and especially for the current “best guesses” about estimates.
What is wonderful about this subject is that people attracted to it cannot resist
asking questions that lead to interesting, and sometimes surprising numerical
experiments. Moreover, given our current state of knowledge, many of the
questions that come to mind are still unapproachable: we don’t yet know enough
about numbers to answer them. But asking interesting questions about the
mathematics that you are studying is a high art, and is probably a necessary
skill to acquire, in order to get the most enjoyment—and understanding—from
mathematics. So, we oﬀer this challenge to you:
Come up with with your own question about primes that
• is interesting to you,
• is not a question whose answer is known to you,
• is not a question that you’ve seen before; or at least not exactly,
• is a question about which you can begin to make numerical investigations.

24

CHAPTER 5. QUESTIONS ABOUT PRIMES

If you are having trouble coming up with a question, read on for more examples
that provide further motivation.

Chapter 6

Further questions about
primes
Let us, for variety, dice the question diﬀerently by concentrating on the gaps
between one prime and the next, rather than the tally of all primes. Of course,
it is no fun at all to try to guess how many pairs of primes p, q there are with
gap q − p equal to a ﬁxed odd number, since the diﬀerence of two odd numbers
is even. The fun, though, begins in earnest if you ask for pairs of primes with
diﬀerence equal to 2 (these being called twin primes) for it has long been guessed
that there are inﬁnitely many such pairs of primes, but no one has been able to
prove this yet.
The largest known twin primes are
65516468355 · 2333333 ± 1
They have 100355 digits and were found by Kaiser and Klahnin. [17]
Similarly, it interesting to consider primes p and q with diﬀerence 4, or 8, or—in
fact—any even number 2k. That is, people have guessed that there are inﬁnitely
many pairs of primes with diﬀerence 4, with diﬀerence 6, etc. but none of these
guesses have yet been proved.
So, deﬁne
Gapk (X)
to be the number of pairs of successive primes (p, q) with q < X that have “gap
k” (i.e., such that their diﬀerence q − p is k). Here p is a prime, q > p is a prime,
and there are no primes between p and q. For example, Gap2 (10) = 2, since the
pairs (3, 5) and (5, 7) are the pairs less than 10 with gap 2. See Table 6.1 for
various values of Gapk (X) and Figure 6.1 for the distribution of prime gaps for
X = 107 .
25

26

CHAPTER 6. FURTHER QUESTIONS ABOUT PRIMES

Table 6.1: Values of Gapk (X)
X
10
102
103
104
105
106
107

Gap2 (X)
2
8
35
205
1224
8169
58980

Gap4 (X)
1
8
40
202
1215
8143
58621

Gap6 (X)
0
7
44
299
1940
13549
99987

Gap8 (X)
0
1
15
101
773
5569
42352

1e5
8e4
6e4
4e4
2e4
10

20

30

40

50

Figure 6.1: Frequency histogram showing the distribution of prime gaps of size
≤ 50 for all primes up to 107 . Six is the most popular gap in this data. The
vertical axis labels such as “6e4” mean 6 · 104 = 60,000.
Gap 6

200
Gap 2
Gap 4

150
100

Gap 8

50
1000

2000

3000

4000

5000

6000

7000

Figure 6.2: Plots of Gapk (X) for k = 2, 4, 6, 8. Which wins?
Here is yet another question that deals with the spacing of prime numbers that
we do not know the answer to:
Racing Gap 2, Gap 4, Gap 6, and Gap 8 against each other:
Challenge: As X tends to inﬁnity which of Gap2 (X), Gap4 (X), Gap6 (X),
or Gap8 (X) do you think will grow faster? How much would you
bet on the truth of your guess? [18]

27
Here is a curious question that you can easily begin to check out for small
numbers. We know, of course, that the even numbers and the odd numbers are
nicely and simply distributed: after every odd number comes an even number,
after every even, an odd, there are an equal number of odd number as even
numbers less than any given odd number, and there may be nothing else of
interest to say about the matter. Things change considerably, though, if we
focus our concentration on multiplicatively even numbers and multiplicatively
odd numbers.
A multiplicatively even number is one that can be expressed as a product
of an even number of primes; and a multiplicatively odd number is one that
can be expressed as a product of an odd number of primes. So, any prime is
multiplicatively odd, the number 4 = 2 · 2 is multiplicatively even, and so is
6 = 2 · 3, 9 = 3 · 3, and 10 = 2 · 5; but 12 = 2 · 2 · 4 is multiplicatively odd. Below
we list the numbers up to 25, and underline and bold the multiplicatively odd
numbers.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Table 6.2 gives some data:

Table 6.2: Count of multiplicatively odd and multiplicatively even positive numbers ≤ X
X
m. odd
m. even

1
0
1

2
1
1

3
2
1

4
2
2

5
3
2

6
3
3

7
4
3

8
5
3

9
5
4

10
5
5

11
6
5

12
7
5

13
8
5

14
8
6

15
8
7

16
8
8

Now looking at this data, a natural, and simple, question to ask about the
concept of multiplicative oddness and evenness is:
Is there some X ≥ 2 for which there are more multiplicatively even numbers less
than or equal to X than multiplicatively odd ones?
Each plot in Figure 6.3 gives the number of multiplicatively odd numbers between 2 and X minus the number of multiplicatively even numbers between 2
and X, for X equal to 10, 100, 1000, 10000, 100000, and 1000000. The above
question asks whether these graphs would, for suﬃciently large X, ever cross
the X-axis.

28

CHAPTER 6. FURTHER QUESTIONS ABOUT PRIMES
3
2

0
-2

1

-4
-6

0

-8
-10

-1

2

4

6

8

10

12

14

16

20

0
-5

60

80

100

2000

4000

6000

8000

10000

2e5

4e5

6e5

8e5

1e6

-20

-10

40

0

-40
-60

-15

-80

-20

-100

-25

-120

0

200

400

600

800

1000

0
0

0

-200
-100

-400
-600

-200

-800
-300
-400

-1000
-1200
0

2e4

4e4

6e4

8e4

1e5

0

Figure 6.3: Racing Multiplicatively Even and Odd Numbers.
A negative response to this question—i.e., a proof that any plot as drawn in
Figure 6.3 never crosses the X-axis—would imply the Riemann Hypothesis! In
contrast to the list of previous questions, the answer to this question is known
(see [19]): alas, there is such an X. In 1960, Lehman showed that for X =
906, 400, 000 there are 708 more multiplicatively even numbers up to X than
multiplicatively odd numbers (Tanaka found in 1980 that the smallest X such
that there are more multiplicative even than odd numbers is X = 906, 150, 257).
These are questions that have been asked about primes (and we could give
bushels more as is done in [20]), questions expressible in simple vocabulary, that
we can’t answer today. We have been studying numbers for over two millenia
and yet we are indeed in the infancy of our understanding.
So we’ll continue our discussion by returning to the simplest counting question
about prime numbers.

Chapter 7

How many primes are
there?

2
29
43
71
113
127
169

3
17
31
59
73
101

5
19

7

23
37

47
61

79

89
103

107

131
157

149
163

173

191

11
53
67
109
137
151
179
193

13
41
83
97
139
167
181

Figure 7.1: Sieving Primes up to 200

Slow as we are to understand primes, at the very least we can try to count them.
You can see that there are 10 primes less than 30, so you might encapsulate this
by saying that the chances that a number less than 30 is prime is 1 in 3. This
frequency does not persist, though; here is some more data: There are 25 primes
less than 100 (so 1 in 4 numbers up to 100 are prime), there are 168 primes less
than a thousand (so we might say that among the numbers less than a thousand
the chances that one of them is prime is roughly 1 in 6).
29

30

CHAPTER 7. HOW MANY PRIMES ARE THERE?

0.6
0.5
0.4
0.3
0.2
0.1
20

40

60

80

100

Figure 7.2: Graph of the proportion of primes up to X for each integer X ≤ 100
There are 78,498 primes less than a million (so we might say that the chances
that a random choice among the ﬁrst million numbers is prime have dropped to
roughly 1 in 13).
0.6

0.6

0.5

0.5

0.4

0.4

0.3

0.3

0.2

0.2

0.1

0.1
200

400

600

800

1000

2000

4000

6000

8000

10000

Figure 7.3: Proportion of primes for X up to 1,000 (left) and 10,000 (right)
There are 455,052,512 primes less than ten billion; i.e., 10,000,000,000 (so we
might say that the chances are down to roughly 1 in 22).
Primes, then, seem to be thinning out. We return to the sifting process we
carried out earlier, and take a look at a few graphs, to get a sense of why that
might be so. There are a 100 numbers less than or equal to 100, a thousand
numbers less than or equal to 1000, etc.: the shaded graph in Figure 7.4 that
looks like a regular staircase, each step the same length as each riser, climbing
up at, so to speak, a 45 degree angle, counts all numbers up to and including X.
Following Eratosthenes, we have sifted those numbers, to pan for primes. Our
ﬁrst move was to throw out roughly half the numbers (the even ones!) after the
number 2. The cross-hatched bar graph in this ﬁgure which is, with one hiccup,
a regular staircase climbing at a smaller angle, each step twice the lengths of each
riser, illustrates the numbers that are left after one pass through Eratosthenes’
sieve, which includes, of course, all the primes. So, the chances that a number

31
bigger than 2 is prime is at most 1 in 2. Our second move was to throw out a
good bunch of numbers bigger than 3. So, the chances that a number bigger
than 3 is prime is going to be even less. And so it goes: with each move in
our sieving process we are winnowing the ﬁeld more extensively, reducing the
chances that the later numbers are prime.
All Numbers

100
80
60

Sieve by 2

40
20

Primes
20

40

60

80

100

Figure 7.4: Sieving by removing multiples of 2 up to 100
All Numbers

1000
800
600
400
200

Primes
200

400

600

800

1000

Figure 7.5: Sieving for primes up to 1000
The red curve in these ﬁgures actually counts the primes: it is the beguilingly
irregular staircase of primes. Its height above any number X on the horizontal
line records the number of primes less than or equal to X, the accumulation
of primes up to X. Refer to this number as π(X). So π(2) = 1, π(3) = 2,
π(30) = 10; of course, could plot a few more values of π(X), like π(ten billion) =
455, 052, 512.
Let us accompany Eratosthenes for a few further steps in his sieving process.
Figure 7.6 contains a graph of all whole numbers up to 100 after we have removed
the even numbers greater than 2, and the multiples of 3 greater than 3 itself.
Sieve by 2, 3

35
30
25
20
15
10
5
20

40

60

80

100

Figure 7.6: Sieving out multiples of 2 and 3.
From this graph you can see that if you go “out a way” the likelihood that

32

CHAPTER 7. HOW MANY PRIMES ARE THERE?

a number is a prime is less that 1 in 3. Figure 7.7 contains a graph of what
Eratosthenes sieve looks like up to 100 after sifting 2, 3, 5, and 7.
Sieve by 2, 3, 5, 7

25
20
15
10
5
20

40

60

80

100

Figure 7.7: Sieving out multiples of 2, 3, 5, and 7.
This data may begin to suggest to you that as you go further and further out
on the number line the percentage of prime numbers among all whole numbers
tends towards 0% (it does).
To get a sense of how the primes accumulate, we will take a look at the staircase
of primes for X = 25 and X = 100 in Figures 7.8 and 7.9.
8
6
4
2
0

5

10

15

20

25

Figure 7.8: Staircase of primes up to 25
25
20
15
10
5
20

40

60

80

Figure 7.9: Staircase of primes up to 100

100

Chapter 8

Prime numbers viewed
from a distance
The striking thing about these ﬁgures is that as the numbers get large enough,
the jagged accumulation of primes, those quintessentially discrete entities, becomes smoother and smoother, to the eye. How strange and wonderful to watch,
as our viewpoint zooms out to larger ranges of numbers, the accumulation of
primes taking on such a smooth and elegant shape.
1200

150
1000
800

100

600
400

50

200

200

400

600

800

1000

2000

4000

6000

8000

10000

Figure 8.1: Staircases of primes up to 1,000 and 10,000

8000
6000
4000
2000
2e4

4e4

6e4

8e4

1e5

Figure 8.2: Primes up to 100,000. Note that the axis label “6e4” means 6 × 104 .
33

34

CHAPTER 8. PRIME NUMBERS VIEWED FROM A DISTANCE

But don’t be fooled by the seemingly smooth shape of the curve in the last
ﬁgure above: it is just as faithful a reproduction of the staircase of primes as
the typographer’s art can render, for there are thousands of tiny steps and risers
in this curve, all hidden by the thickness of the print of the drawn curve in the
ﬁgure. It is already something of a miracle that we can approximately describe
the build-up of primes, somehow, using a smooth curve. But what smooth curve?
That last question is not rhetorical. If I draw a curve with chalk on the blackboard, this can signify a myriad of smooth (mathematical) curves all encompassed within the thickness of the chalk-line, all–if you wish–reasonable approximations of one another. So, there are many smooth curves that ﬁt the
chalk-curve. With this warning, but very much fortiﬁed by the data of Figure 8.2, let us ask: what is a smooth curve that is a reasonable approximation
to the staircase of primes?

Chapter 9

Pure and applied
mathematics
Mathematicians seems to agree that, loosely speaking, there are two types of
mathematics: pure and applied. Usually—when we judge whether a piece of
mathematics is pure or applied—this distinction turns on whether or not the
math has application to the “outside world,” i.e., that world where bridges are
built, where economic models are fashioned, where computers churn away on
the Internet (for only then do we unabashedly call it applied math), or whether
the piece of mathematics will ﬁnd an important place within the context of
mathematical theory (and then we label it pure). Of course, there is a great
overlap (as we will see later, Fourier analysis plays a major role both in data
compression and in pure mathematics).
Moreover, many questions in mathematics are “hustlers” in the sense that, at
ﬁrst view, what is being requested is that some simple task be done (e.g., the
question raised in this book, to ﬁnd a smooth curve that is a reasonable approximation to the staircase of primes). And only as things develop is it discovered
that there are payoﬀs in many unexpected directions, some of these payoﬀs being genuinely applied (i.e., to the practical world), some of these payoﬀs being
pure (allowing us to strike behind the mask of the mere appearance of the mathematical situation, and get at the hidden fundamentals that actually govern the
phenomena), and some of these payoﬀs defying such simple classiﬁcation, insofar as they provide powerful techniques in other branches of mathematics. The
Riemann Hypothesis—even in its current unsolved state—has already shown
itself to have all three types of payoﬀ.
The particular issue before us is, in our opinion, twofold, both applied, and pure:
can we curve-ﬁt the “staircase of primes” by a well approximating smooth curve
given by a simple analytic formula? The story behind this alone is marvelous,
has a cornucopia of applications, and we will be telling it below. But our
35

36

CHAPTER 9. PURE AND APPLIED MATHEMATICS

curiosity here is driven by a question that is pure, and less amenable to precise
formulation: are there mathematical concepts at the root of, and more basic
than (and “prior to,” to borrow Aristotle’s use of the phrase,) prime numbers–
concepts that account for the apparent complexity of the nature of primes?

Chapter 10

A probabilistic “ﬁrst” guess

Figure 10.1: Gauss
The search for such approximating curves began, in fact, two centuries ago when
Carl Friedrich Gauss deﬁned a certain beautiful curve that, experimentally,
seemed to be an exceptionally good ﬁt for the staircase of primes.
50
40
30
20
10
0

50

100

150

200

250

Figure 10.2: Plot of π(X) and Gauss’s smooth curve G(X)
37

38

CHAPTER 10. A PROBABILISTIC “FIRST” GUESS

Let us denote Gauss’s curve G(X); it has an elegant simple formula comprehensible to anyone who has had a tiny bit of calculus. If you make believe that the
chances that a number X is a prime is inversely proportional to the number of
digits of X you might well hit upon Gauss’s curve. That is,

G(X)

X
.
the number of digits of X

is roughly proportional to

But to describe Gauss’s guess precisely we need to discuss the natural logarithm
“log(X)” which is an elegant smooth function of real numbers X that is roughly
proportional to the number of digits of the whole number part of X.
4
3
2
1
20

40

60

80

100

-1

Figure 10.3: Plot of the natural logarithm log(X)
Euler’s famous constant e = 2.71828182 . . ., which is the limit of the sequence
1+

1
2

2

, 1+

1
3

3

, 1+

1
4

4

,...,

is used in the deﬁnition of log:
A = log(X) is the number A for which eA = X.
Before electronic calculators, logarithms were frequently used to speed up calculations, since logarithms translate diﬃcult multiplication problems into easier
addition problems which can be done mechanically. Such calculations use that
the logarithm of a product is the sum of the logarithms of the factors; that is,
log(X · Y ) = log(X) + log(Y ).

Figure 10.4: A slide rule computes 2X by using that log(2X) = log(2) + log(X)

39
In Figure 10.4 the numbers printed (on each of the slidable pieces of the rule) are
spaced according to their logarithms, so that when one slides the rule arranging
it so that the printed number X on one piece lines up with the printed number
1 on the other, we get that for every number Y printed on the ﬁrst piece, the
printed number on the other piece that is aligned with it is the product XY ; in
eﬀect the “slide” adds log(X) to log(Y ) giving log(XY ).

Figure 10.5: A Letter of Gauss
In 1791, when Gauss was 14 years old, he received a book that contained logarithms of numbers up to 7 digits and a table of primes up to 10,009. Years
later, in a letter written in 1849 (see Figure 10.5), Gauss claimed that as early
as 1792 or 1793 he had already observed that the density of prime numbers over
intervals of numbers of a given rough magnitude X seemed to average 1/ log(X).
Very very roughly speaking, this means that the number of primes up to X is
approximately X divided by twice the number of digits of X. For example, the
number of primes less than 99 should be roughly
99
= 24.75 ≈ 25,
2×2
which is pretty amazing, since the correct number of primes up to 99 is 25. The
number of primes up to 999 should be roughly
999
= 166.5 ≈ 167,
2×3

40

CHAPTER 10. A PROBABILISTIC “FIRST” GUESS

which is again close, since there are 168 primes up to 1000. The number of
primes up to 999,999 should be roughly
999999
= 83333.25 ≈ 83,333,
2×6
which is close to the correct count of 78,498.
Gauss guessed that the expected number of primes up to X is approximated by
the area under the graph of 1/ log(X) from 2 to X (see Figure 10.6). The area
under 1/ log(X) up to X = 999,999 is 78,626.43 . . ., which is remarkably close
to the correct count 78,498 of the primes up to 999,999.

1/log(x)
1

1

1

Area ~ 11.977468

Area ~ 176.564494

Area ~ 29.080978

0.5

0

1/log(x)

1/log(x)

0.5

0.5

5

10

15

20

10 Primes

25

30

0

20

40

60

80

100

25 Primes

200

400

600

800

1000

168 Primes

Figure 10.6: The expected tally of the number of primes ≤ X is approximated
by the area underneath the graph of 1/ log(X) from 1 to X.
Gauss was an inveterate computer: he wrote in his 1849 letter that there are
216,745 prime numbers less than three million (This is wrong: the actual number
of these primes is 216,816). Gauss’s curve G(X) predicted that there would be
216,970 primes—a miss, Gauss thought, by
225 = 216,970 − 216,745
(but actually he was closer than he thought: the prediction of the curve G(X)
missed by a mere 154 = 216,970 − 216,816) Gauss’s computation brings up two
queries: will this spectacular “good ﬁt” continue for arbitrarily large numbers?
and, the (evidently prior) question: what counts as a good ﬁt?

Chapter 11

What is a “good
approximation”?
If you are trying to estimate a number, say, around ten thousand, and you get it
right to within a hundred, let us celebrate this kind of accuracy by saying that
√
you have made an approximation with square-root error ( 10,000 = 100). Of
course, we should really use the more clumsy phrase “an approximation with at
worst square-root error.” Sometimes we’ll simply refer to such approximations
as good approximations. If you are trying to estimate a number in the millions,
and you get it right to within a thousand, let’s √
agree that—again—you have
made an approximation with square-root error ( 1,000,000 = 1,000). Again,
for short, call this a good approximation. So, when Gauss thought his curve
missed by 226 in estimating the number of primes less than three million, it was
well within the margin we have given for a “good approximation.” [21]
More generally, if you are trying to estimate a number that has D digits and
you get it almost right, but with an error that has no more than, roughly, half
that many digits, let us say, again, that you have made an approximation with
square-root error or synomymously, a good approximation.
This rough account almost suﬃces for what we will be discussing below, but to
be more precise, the speciﬁc gauge of accuracy that will be important to us is
not for a mere single estimate of a single error term,
Error term = Exact Value - Our “good approximation”
but rather for inﬁnite sequences of estimates of error terms. Generally, if you
are interested in a numerical quantity q(X) that depends on the real number
parameter X (e.g., q(X) could be π(X), “the number of primes < X”) and if
you have an explicit candidate “approximation,” qapprox (X), to this quantity, let
us say that qapprox (X) is essentially a square-root accurate approximation to q(X) if for any given exponent greater than 0.5 (you choose it: 0.501,
0.5001, 0.50001, . . . for example) and for large enough X—where the phrase
41

42

CHAPTER 11. WHAT IS A “GOOD APPROXIMATION”?

“large enough” depends on your choice of exponent—the error term—i.e., the
diﬀerence between qapprox (X) and the true quantity, q(X), is, in absolute value,
less than qapprox (X) raised to that exponent (e.g. < X 0.501 , < X 0.5001 , etc.).
Readers who know calculus and wish to have a technical formulation of this deﬁnition of good approximation might turn to the endnote 2 for a precise statement.
If you found the above confusing, don’t worry: again, a square-root accurate
approximation is one in which at least roughly half the digits are correct.
Remark 11.1. To get a feel for how basic the notion of approximation to data
being square root close to the true values of the data is—and how it represents
the “gold standard” of accuracy for approximations, consider this fable.
Imagine that the devil had the idea of saddling a large committee of people with
the task of ﬁnding values of π(X) for various large numbers X. This he did in the
following manner, having already worked out which numbers are prime himself.
Since the devil is, as everyone knows, into the details, he has made no mistakes:
his work is entirely correct. He gives each committee member a copy of the list
of all prime numbers between 1 and one of the large numbers X in which he was
interested. Now each committee member would count the number of primes by
doing nothing more than considering each number, in turn, on their list and
tallying them up, much like a canvasser counting votes. But since they are
human, they will indeed be making mistakes, say 0.001% of the time. Assume
further that it is just as likely for them to make the mistake of undercounting
or overcounting. If many people are engaged in such pursuit, some of them
might over-count π(X); some of them might under-count it. The average error
√
(over-counted or undercounted) would be proportional to X.

Chapter 12

What is Riemann’s
Hypothesis? (ﬁrst
formulation)
Recall from Chapter 10 that a rough approximation to π(X), the number of
primes < X, is given by the function X/ log(X); and Gauss’s guess for an
approximation to π(X) was in terms of the area of the region from 2 to X
under the graph of 1/ log(X), a quantity sometimes referred to as Li(X). “Li”
(pronounced Li) is short for logarithmic integral, because the area of the region
t
from 2 to X under 1/ log(X) is the integral 2 1/ log(X)dt.
Figure 12.1 contains a graph of the three functions Li(X), π(X), and X/ log X
for X ≤ 200. But data, no matter how impressive, may be deceiving (as we
learned in Chapter 6). If you think that the three graphs never cross for all large
values of X, and that we have the simple relationship X/ log(X) < π(X) <
Li(X) for large X, turn to this endnote. [22]
50
40
30
20
10
50

100

150

200

Figure 12.1: Plots of Li(X) (top), π(X) (in the middle), and X/ log(X) (bottom).
43

44

CHAPTER 12. WHAT IS RIEMANN’S HYPOTHESIS?

Let X = 1023 . Then (see [23])
π(X) = 1925320391606803968923,
Li(X) = 1925320391614054155138.780 . . . ,
X/(log(X) − 1) = 1924577459166813514799.7932241 . . . .
Li(X) − π(X) =
√
X log(X) =

7250186215.78002929687500 . . . ,
16747250820487.142114662460299 . . . .

Since Li(X) seems to start out impressively close in value to our π(X) (at least
in this range) it is natural to hope that in full generality it is essentially a square
root approximation to π(X). This gives us our ﬁrst formulation of Riemann’s
Hypothesis:

The Riemann Hypothesis (ﬁrst formulation)
For any real number X the number of prime numbers less than X is
approximately Li(X) and this approximation is essentially square root
accurate. See [24]

Chapter 13

The Prime Number
Theorem
Take a look at Figure 12.1 again. All three functions, X/ log(X), Li(X) and
π(X) are “going to inﬁnity with X” (this means that for any real number R, if
X is taken to be suﬃciently large, the values of these functions at X will exceed
R).
Are these functions “going to inﬁnity” at the same rate?
To answer such a question, we have to know what we mean by going to inﬁnity
at the same rate. So, here’s a deﬁnition. Two functions, A(X) and B(X), that
each go to inﬁnity will be said to go to inﬁnity at the same rate if their
ratio
A(X)/B(X)
tends to 1 as X goes to inﬁnity.
If for example two functions, A(X) and B(X) that take positive whole number
values, have the same number of digits for large X and if, for any number you
give us, say: a million (or a billion, or a trillion) and if X is large enough, then
the “leftmost” million (or billion, or trillion) digits of A(X) and B(X) are the
same, then A(X) and B(X) go to inﬁnity at the same rate.
While we’re deﬁning things, let us say that two functions, A(X) and B(X), that
each go to inﬁnity go to inﬁnity at similar rates if there are two positive
constants c and C such that for X suﬃciently large the ratio
A(X)/B(X)
is greater than c and smaller than C.
45

46

CHAPTER 13. THE PRIME NUMBER THEOREM

30000

1

25000

0.8

20000

B(X)

15000

0.6

A(X)/B(X)

0.4

10000

A(X)

5000
20

40

60

80

0.2
20

100

40

60

80

100

Figure 13.1: The polynomials A(X) = 2X 2 + 3X − 5 (bottom) and B(X) =
3X 2 − 2X + 1 (top) go to inﬁnity at similar rates.
For example, two polynomials in X with positive leading coeﬃcient go to inﬁnity
at the same rate if and only if they have the same degrees and the same leading
coeﬃcient; they go to inﬁnity at similar rates if they have the same degree. See
Figure 13.1.
10000

6

A(X)

8000

5
4

6000

B(X)

4000

3
2

2000

1
20

40

60

80

100

0

A(X)/B(X)
20

40

60

80

100

Figure 13.2: The polynomials A(X) = X 2 + 3X − 5 (top) and B(X) = X 2 −
2X + 1 (bottom) go to inﬁnity at the same rate.
Now a theorem from elementary calculus tells us that the ratio of Li(X) to
X/ log(X) tends to 1 as X gets larger and larger. That is—using the deﬁnition
we’ve just introduced— Li(X) and X/ log(X) go to inﬁnity at the same rate
(see [25])
Recall (on page 10 above) that if X = 1023 , the top eleven digits of π(X) and
Li(X) are the same: 19253203916. Well, that’s a good start. Can we guarantee
that for X large enough, the “top” million (or billion, or trillion) digits of π(X)
and Li(X) are the same? I.e., that these two functions go to inﬁnity at the same
rate?
The Riemann Hypothesis, as we have just formulated it, would tell us that the
diﬀerence between Li(X) and π(X) is pretty small in comparison with the size of
X. This information would imply (but would be much more precise information

47
than) the statement that the ratio Li(X)/π(X) tends to 1, i.e., that Li(X) and
π(X) go to inﬁnity at the same rate.
This last statement gives, of course, a far less precise relationship between Li(X)
and π(X) than the Riemann Hypothesis (once it is proved!) would give us. The
advantage, though, of the less precise statement is that it is currently known
to be true, and—in fact—has been known for over a century. It goes under the
name of
The Prime Number Theorem:
rate.

Li(X) and π(X) go to inﬁnity at the same

Since Li(X) and X/ log(X) go to inﬁnity at the same rate, we could equally
well have expressed the “same” theorem by saying:
The Prime Number Theorem:
same rate.

X/ log(X) and π(X) go to inﬁnity at the

This fact is a very hard-won piece of mathematics! It was proved in 1896
independently by Hadamard and de la Vall´e Poussin.
e
A milestone in the history leading up to the proof of Prime Number Theorem is the earlier work of Chebyshev (see http://en.wikipedia.org/wiki/
Chebyshev_function) showing that (to use the terminology we introduced)
X/ log(X) and π(X) go to inﬁnity at similar rates.
The elusive Riemann Hypothesis, however, is much deeper than the Prime Number Theorem, and takes its origin from some awe-inspiring, diﬃcult to interpret, lines in Bernhard Riemann’s magniﬁcent 8-page paper, “On the number
of primes less than a given magnitude,” published in 1859 (see [26]).

Figure 13.3: From Riemann’s 1859 Manuscript

Riemann’s hypothesis, as it is currently interpreted, turns up as relevant, as a
key, again and again in diﬀerent parts of the subject: if you accept it as hypothesis you have an immensely powerful tool at your disposal: a mathematical
magnifying glass that sharpens our focus on number theory. But it also has a
wonderful protean quality—there are many ways of formulating it, any of these
formulations being provably equivalent to any of the others.

48

CHAPTER 13. THE PRIME NUMBER THEOREM

Figure 13.4: Bernhard Riemann (1826–1866)
This Riemann Hypothesis remains unproved to this day, and therefore is “only
a hypothesis,” as Osiander said of Copernicus’s theory, but one for which we
have overwhelming theoretical and numerical evidence in its support. It is the
kind of conjecture that contemporary Dutch mathematician Frans Oort might
label a suﬀusing conjecture in that it has unusually broad implications: many
many results are now known to follow, if the conjecture, familiarly known as
RH, is true. A proof of RH would, therefore, fall into the applied category, given
our discussion above in Chapter 9. But however you classify RH, it is a central
concern in mathematics to ﬁnd its proof (or, a counter-example!). RH is one of
the weightiest statements in all of mathematics.

Chapter 14

The information contained
in the staircase of primes

We have borrowed the phrase “staircase of primes” from the popular book The
Music of Primes by Marcus du Sautoi, for we feel that it captures the sense that
there is a deeply hidden architecture to the graphs that compile the number of
primes (up to N ) and also because—in a bit—we will be tinkering with this
carpentry. Before we do so, though, let us review in Figure 14.1 what this
staircase looks like, for diﬀerent ranges.
49

50

CHAPTER 14. THE STAIRCASE OF PRIMES
25
8

20
6

15

4

10

2

5
5

15

10

20

20

40

60

80

100

2000

25

4000

6000

8000

10000

1200

150
1000
800

100

600
400

50

200

200

400

600

800

1000

8000
6000
4000
2000
2e4

4e4

6e4

8e4

1e5

Figure 14.1: The Staircase of Primes
The mystery of this staircase is that the information contained within it is—in
eﬀect—the full story of where the primes are placed. This story seems to elude
any simple description. Can we “tinker with” this staircase without destroying
this valuable information?

Chapter 15

Tinkering with the
carpentry of the staircase of
primes
For starters, notice that all the risers of this staircase (Figure 14.1 above) have
unit length. That is, they contain no numerical information except for their
placement on the x-axis. So, we could distort our staircase by changing (in any
way we please) the height of each riser; and as long as we haven’t brought new
risers into—or old risers out of—existence, and have not modiﬁed their position
over the x-axis, we have retained all the information of our original staircase.
A more drastic-sounding thing we could do is to judiciously add new steps to
our staircase. At present, we have a step at each prime number p, and no step
anywhere else. Suppose we built a staircase with a new step not only at x = p
for p each prime number but also at x = 1 and x = pn where pn runs through
all powers of prime numbers as well. Such a staircase would have, indeed, many
more steps than our original staircase had, but, nevertheless, would retain much
of the quality of the old staircase: namely it contains within it the full story of
the placement of primes and their powers.
A ﬁnal thing we can do is to perform a distortion of the x-axis (elongating or
shortening it, as we wish) in any speciﬁc way, as long as we can perform the
inverse process, and “undistort” it if we wish. Clearly such an operation may
have mangled the staircase, but hasn’t destroyed information irretrievably.
We shall perform all three of these kinds of operations eventually, and will see
some great surprises as a result. But for now, we will perform distortions only
of the ﬁrst two types. We are about to build a new staircase that retains the
precious information we need, but is constructed according to the following
architectural plan.
51

52

CHAPTER 15. TINKERING WITH THE STAIRCASE OF PRIMES
• We ﬁrst build a staircase that has a new step precisely at x = 1, and
x = pn for every prime power pn with n ≥ 1. That is, there will be a new
step at x = 1, 2, 3, 4, 5, 7, 8, 9, 11, . . .
• Our staircase starts on the ground at x = 0 and the height of the riser of
the step at x = 1 will be log(2π). The length of the riser of the step at
x = pn will not be 1 (as was the length of all risers in the old staircase of
primes) but rather: the step at x = pn will have the height of its riser equal
to log p. So for the ﬁrst few steps listed in the previous item, the risers
will be of length log(2π), log 2, log 3, log 2, log 5, log 7, log 2, log 3, log 11, . . .
Since log(p) > 1, these vertical dimensions lead to a steeper ascent but no
great loss of information.
Although we are not quite done with our architectural work, Figure 15.1
shows what our staircase looks like, so far.

8
80

6

60

4

40

2

20

2

4

6

8

20 40 60 80 100

Figure 15.1: The newly constructed staircase that counts prime powers
Notice that this new staircase looks, from afar, as if it were nicely approximated
by the 45 degree straight line, i.e., by the simple function X. In fact, we have—
by this new architecture—a second equivalent way of formulating Riemann’s
hypothesis. For this, let ψ(X) denote the function of X whose graph is depicted
in Figure 15.1 (see [27]).

The Riemann Hypothesis (second formulation)
This new staircase is essentially square root close to the 45 degree
straight line; i.e., the function ψ(X) is essentially square root close to
the function f (X) = X.

53

1000
800
600
400
200
200 400 600 8001000
Figure 15.2: The newly constructed staircase is close to the 45 degree line.
Do not worry if you do not understand why our ﬁrst and second formulations of Riemann’s Hypothesis are equivalent. Our aim, in oﬀering the second
formulation—a way of phrasing Riemann’s guess that mathematicians know to
be equivalent to the ﬁrst one—is to celebrate the variety of equivalent ways we
have to express Riemann’s proposed answers to the question “How many primes
are there?” and to point out that some formulations would reveal a startling
simplicity—not immediately apparent—to the behavior of prime numbers, no
matter how erratic primes initially appear to us to be. After all, what could be
simpler than a 45 degree straight line?

Chapter 16

What do computer music
ﬁles, data compression, and
prime numbers have to do
with each other?
Sounds of all sorts—and in particular the sounds of music—travel as vibrations
of air molecules at roughly 768 miles an hour. These vibrations—ﬂuctuations of
pressure—are often represented, or “pictured,” by a graph where the horizontal
axis corresponds to time, and the vertical axis corresponds to pressure at that
time. The very purest of sounds—a single sustained note—would look something
like this (called a “sine wave”) when pictured (see Figure 16.1), so that if you
ﬁxed your position somewhere and measured air pressure due to this sound at
that position, the peaks correspond to the times when the varying air pressure
is maximal or minimal and the zeroes to the times when it is normal pressure.
1

0.5

-6

-4

-2

2

4

6

-0.5

-1

Figure 16.1: Graph of a Sine Wave
You’ll notice that there are two features to the graph in Figure 16.1.
54

55
1. The height of the peaks of this sine wave: This height is referred to as the
amplitude and corresponds to the loudness of the sound.

2. The number of peaks per second: This number is referred to as the frequency and corresponds to the pitch of the sound.

Of course, music is rarely—perhaps never—just given by a single pure sustained
note and nothing else. A next most simple example of a sound would be a simple
chord (say a C and an E played together on some electronic instrument that
could approximate pure notes). Its graph would be just the sum of the graphs
of each of the pure notes (see Figures 16.2 and 16.3).
1

0.5

5

10

15

-0.5

-1

Figure 16.2: Graph of Two Sine Waves with Diﬀerent Frequencies

2
1.5
1
0.5
5

10

15

-0.5
-1
-1.5

Figure 16.3: Graph of Sum of the Two Sine Waves with Diﬀerent Frequencies

So the picture of the changing frequencies of this chord would be already a
pretty complicated conﬁguration. What we have described in these graphs are
two sine waves (our C and our E) when they are played in phase (meaning they
start at the same time) but we could also “delay” the onset of the E note and
play them with some diﬀerent phase relationship, for example, as illustrated in
Figures 16.4 and 16.5.

56

CHAPTER 16. COMPUTER MUSIC FILES AND PRIME NUMBERS
1

Phase

0.5

5

10

15

-0.5

-1

Figure 16.4: Graph of two “sine” waves with diﬀerent phase.
1.5
1
0.5
5

10

15

-0.5
-1
-1.5

Figure 16.5: Graph of the sum of the two “sine” waves with diﬀerent frequency
and phase.
So, all you need to reconstruct the chord graphed above is to know ﬁve numbers:
• the two frequencies—the collection of frequencies that make up the sound
is called the spectrum of the sound,
• the two amplitudes of each of these two frequencies,
• the phase between them.
Now suppose you came across such a sound as pictured in Figure 16.5 and
wanted to “record it.” Well, one way would be to sample the amplitude of the
sound at many diﬀerent times, as for example in Figure 16.6.
1.5
1
0.5
5

10

15

-0.5
-1
-1.5

Figure 16.6: Graph of sampling of a sound wave
Then, ﬁll in the rest of the points to obtain Figure 16.7.

57
1.5
1
0.5
5

10

15

-0.5
-1
-1.5

Figure 16.7: Graph obtained from Figure 16.6 by ﬁlling in the rest of the points
But this sampling would take an enormous amount of storage space, at least
compared to storing ﬁve numbers, as explained above! Current audio compact
discs do their sampling 44,100 times a second to get a reasonable quality of
sound.
Another way is to simply record the ﬁve numbers: the spectrum, amplitudes,
and phase. Surprisingly, this seems to be roughly the way our ear processes
such a sound when we hear it [28].
Even in this simplest of examples (our pure chord: the pure note C played
simultaneously with pure note E) the eﬃciency of data compression that is the
immediate bonus of analyzing the picture of the chords as built just with the
ﬁve numbers giving spectrum, amplitudes, and phase is staggering.

Figure 16.8: Jean Baptiste Joseph Fourier (1768–1830)
This type of analysis, in general, is called Fourier Analysis and is one of the
glorious chapters of mathematics. One way of picturing spectrum and amplitudes
of a sound is by a bar graph which might be called the spectral picture of the
sound, the horizontal axis depicting frequency and the vertical one depicting
amplitude: the height of a bar at any frequency is proportional to the amplitude
of that frequency “in” the sound.
So our CE chord would have the spectral picture in Figure 16.9.

58

CHAPTER 16. COMPUTER MUSIC FILES AND PRIME NUMBERS
1.5
1
0.5
5

15

10

-0.5
-1

C

D

E

Figure 16.9: Spectral Picture of a CE chord

This spectral picture ignores the phase but is nevertheless a very good portrait
of the sound. The spectral picture of a graph gets us to think of that graph as
“built up by the superposition of a bunch of pure waves,” and if the graph is
complicated enough we may very well need inﬁnitely many pure waves to build
it up! Fourier analysis is a mathematical theory that allows us to start with any
graph—we are thinking here of graphs that picture sounds, but any graph will
do—and actually compute its spectral picture (and even keep track of phases).
The operation that starts with a graph and goes to its spectral picture that
records the frequencies, amplitudes, and phases of the pure sine waves that, together, compose the graph is called the Fourier transform and nowadays there
are very fast procedures for getting accurate Fourier transforms (meaning accurate spectral pictures including information about phases) by computer [29].
The theory behind this operation (Fourier transform giving us a spectral analysis of a graph) is quite beautiful, but equally impressive is how—given the power
of modern computation—you can immediately perform this operation for yourself to get a sense of how diﬀerent wave-sounds can be constructed from the
superposition of pure tones.
The sawtooth wave in Figure 16.10 has a spectral picture, its Fourier transform,
given in Figure 16.11:
1
0.8
0.6
0.4
0.2
1

2

4

3

Figure 16.10: Graph of Sawtooth Wave
1
0.8
0.6
0.4
0.2
0

5

10

15

Figure 16.11: The Spectrum of the Sawtooth Wave Has a Spike of Height 1/k
at each integer k

59
2
1
5

10

15

-1
-2

Figure 16.12: A Complicated Sound Wave
Suppose you have a complicated sound wave, say as in Figure 16.12, and you
want to record it. Standard audio CD’s record their data by intensive sampling
as we mentioned. In contrast, current mp3 audio compression technology uses
Fourier transforms plus sophisticated algorithms based on knowledge of which
frequencies the human ear can hear. With this, mp3 technology manages to get
a compression factor of 8–12 with little perceived loss in quality, so that you can
ﬁt your entire music collection on your iPod, instead of just your favorite 10
CD’s.

Chapter 17

Spectra and Trigonometric
Sums
As we saw in Chapter 16, a pure tone can be represented by a periodic sine
wave—a function of time f (t)— the equation of which might be:
f (t) = a · cos(b + θt).

2
1
-15

-10

-5

5

10

15

-1
-2

Figure 17.1: Plot of the periodic sine wave f (t) = 2 · cos(1 + t/2)

The θ determines the frequency of the periodic wave, the larger θ is the higher
the “pitch.” The coeﬃcient a determines the envelope of size of the periodic
wave, and we call it the amplitude of the periodic wave.
Sometimes we encounter functions F (t) that are not pure tones, but that can
be expressed as (or we might say “decomposed into”) a ﬁnite sum of pure tones,
for example three of them:

F (t) = a1 · cos(b1 + θ1 t) + a2 · cos(b2 + θ2 t) + a3 · cos(b3 + θ3 t)
60

61
10
8
6
4
2
-15

-10

-5

-2
-4

5

10

15

Figure 17.2: Plot of the sum 5 cos (−t − 2) + 2 cos (t/2 + 1) + 3 cos (2t + 4)
We’ll refer to such functions F (t) as ﬁnite trigonometric sums, because —well—
they are. In this example, there are three frequencies involved—i.e., θ1 , θ2 , θ3 —
and we’ll say that the spectrum of F (t) is the set of these frequencies, i.e.,
The spectrum of F (t) = {θ1 , θ2 , θ3 }.
More generally we might consider a sum of any ﬁnite number of pure cosine
waves—or in a moment we’ll also see some inﬁnite ones as well. Again, for
these more general trigonometric sums, their spectrum will denote the set of
frequencies that compose them.

Chapter 18

The spectrum and the
staircase of primes
25
20
15
10
5
20

40

60

80

100

Figure 18.1: The Staircase of Primes
In view of the amazing data-compression virtues of Fourier analysis, it isn’t
unnatural to ask these questions:
• Is there a way of using Fourier analysis to better understand the complicated picture of the staircase of primes?
• Does this staircase of primes (or, perhaps, some tinkered version of the
staircase that contains the same basic information) have a spectrum?
• If such a spectrum exists, can we compute it conveniently, just as we have
done for the saw-tooth wave above, or for the major third CE chord?
• Assuming the spectrum exists, and is computable, will our understanding of this spectrum allow us to reproduce all the pertinent information
about the placement of primes among all whole numbers, elegantly and
faithfully?
62

63
• And here is a most important question: will that spectrum show us order
and organization lurking within the staircase that we would otherwise be
blind to?
Strangely enough, it is towards questions like these that Riemann’s Hypothesis
takes us. We began with the simple question about primes: how to count them,
and are led to ask for profound, and hidden, regularities in structure.

Chapter 19

To our readers of Part I
The statement of the Riemann Hypothesis—admittedly as elusive as before—
has, at least, been expressed elegantly and more simply, given our new staircase
that approximates (conjecturally with essential square root accuracy) a 45 degree
straight line.
We have oﬀered two equivalent formulations of the Riemann Hypothesis, both
having to do with the manner in which the prime numbers are situated among
all whole numbers.
In doing this, we hope that we have convinced you that—in the words of Don
Zagier—primes seem to obey no other law than that of chance and yet exhibit
stunning regularity. This is the end of Part I of our book, and is largely the
end of our main mission, to explain—in elementary terms—what is Riemann’s
Hypothesis?
For readers who have at some point studied Diﬀerential Calculus, in Part II we
shall discuss Fourier analysis, a fundamental tool that will be used in Part III
where we show how Riemann’s hypothesis provides a key to some deeper structure of the prime numbers, and to the nature of the laws that they obey. We
will —if not explain–at least hint at how the above series of questions have been
answered so far, and how the Riemann Hypothesis oﬀers a surprise for the last
question in this series.

64

Part II

Distributions

65

Chapter 20

How Calculus manages to
ﬁnd the slopes of graphs
that have no slopes
Diﬀerential Calculus, initially the creation of Newton and/or Leibniz in the
1680s, acquaints us with slopes of graphs of functions of a real variable. So, to
discuss this we should say a word about what a function is, and what its graph
is.

Figure 20.1: Isaac Newton and Gottfried Leibniz created Calculus
A function (let us refer to it in this discussion as f ) is often described as a kind
of machine that for any speciﬁc input numerical value a will give, as output, a
well-deﬁned numerical value.
This “output number” is denoted f (a) and is called the “value” of the function
f at a. For example, the machine that adds 1 to any number can be thought of
as the function f whose value at any a is given by the equation f (a) = a + 1.
Often we choose a letter—say, X—to stand for a “general number” and we
66

67
denote the function f by the symbol f (X) so that this symbolization allows to
“substitute for X any speciﬁc number a” to get its value f (a).
The graph of a function provides a vivid visual representation of the function in
the Euclidean plane where over every point a on the x-axis you plot a point above
it of “height” equal to the value of the function at a, i.e., f (a). In Cartesian
coordinates, then, you are plotting points (a, f (a)) in the plane where a runs
through all real numbers.
8
6
4
2
0
-2
-4

-4

-2

0

4

2

6

8

Figure 20.2: Graph of the function f (a) = a + 1
In this book we will very often be talking about “graphs” when we are also
speciﬁcally interested in the functions—of which they are the graphs. We will
use these words almost synonymously since we like to adopt a very visual attitude towards the behavior of the functions that interest us.
What is the slope of the tangent line?

2
1.5
1

Here it is!

0.5
0

How to compute the slope? This is Calculus.

-0.5
1

2

3

4

5

6

7

8

Figure 20.3: Calculus
Figure 20.3 illustrates a function (blue), the slope at a point (black straight
line), and the derivative (red) of the function; the red derivative is the function
whose value at a point is the slope of the blue function at that point. Diﬀerential
Calculus explains to us how to calculate slopes of graphs, and ﬁnally, shows us
the power that we then have to answer problems we could not answer if we
couldn’t compute those slopes.
Usually, in elementary Calculus classes we are called upon to compute slopes
only of smooth graphs, i.e., graphs that actually have slopes at each of their
points, such as in the illustration just above. What could Calculus possibly do

68

CHAPTER 20. SLOPES OF GRAPHS THAT HAVE NO SLOPES

if confronted with a graph that has jumps, such as in Figure 20.4:

f (x) =

1
2

x≤3
x > 3.

(Note that for purely aesthetic reasons, we draw a vertical line at the point
where the jump occurs, though technically that vertical line is not part of the
graph of the function.)
2

1.5

1

0.5

1

2

3

4

5

6

Figure 20.4: The graph of the function f (x) above that jumps—it is 1 up to 3
and then 2 after that point.

The most comfortable way to deal with the graph of such a function is to just
approximate it by a diﬀerentiable function as in Figure 20.5. Note that this
function is not smooth, since the derivative is continuous but not diﬀerentiable;
in our discussion all we will need is that our approximating function is once
diﬀerentiable.
2
1.8
1.6
1.4
1.2
1
0

1

2

3

4

5

6

Figure 20.5: A picture of a smooth graph approximating the graph that is 1 up
to some point x and then 2 after that point, the smooth graph being ﬂat mostly.

Then take the derivative of that smooth function. Of course, this is just an
approximation, so we might try to make a better approximation, which we do
in each successive graph starting with Figure 20.6 below.

69
2

1.5

1

0.5

1

2

4

3

5

6

Figure 20.6: A picture of the derivative of a smooth approximation to a function
that jumps.

Note that—as you would expect—in the range where the initial function is
constant, its derivative is zero. In the subsequent ﬁgures, our initial function
will be nonconstant for smaller and smaller intervals about the origin. Note also
that, in our series of pictures below, we will be successively rescaling the y-axis;
all our initial functions have the value 1 for “large” negative numbers and the
value 2 for large positive numbers.

3.5
3
2.5
2
1.5
1
0.5
1

2

3

4

5

6

Figure 20.7: Second picture of the derivative of a smooth approximation to a
function that jumps.

14
12
10
8
6
4
2
1

2

3

4

5

6

Figure 20.8: Third picture of the derivative of a smooth approximation to a
function that jumps.

70

CHAPTER 20. SLOPES OF GRAPHS THAT HAVE NO SLOPES
70
60
50
40
30
20
10
1

2

3

4

5

6

Figure 20.9: Fourth picture of the derivative of a smooth approximation to a
function that jumps.
Notice what is happening: as the approximation gets better and better, the
derivative will be zero mostly, with a blip at the point of discontinuity, and the
blip will get higher and higher. In each of these pictures, for any interval of real
numbers [a, b] the total area under the red graph over that interval is equal to
the height of the blue graph at x = b
minus
the height of the blue graph at x = a.
This is a manifestation of one of the fundamental facts of life of Calculus relating
a function to its derivative:
Given any real-valued function F (x)—that has a derivative—for any
interval of real numbers [a, b] the total area under the graph of the
derivative of F (x) over that interval is equal to F (b) − F (a).
What happens if we take the series of ﬁgures 20.6–20.9, etc. to the limit? This
is quite curious:
• the series of red graphs: these are getting thinner and thinner and
higher and higher: can we make any sense of what the red graph might
mean in the limit (even though the only picture of it that we have at
present makes it inﬁnitely thin and inﬁnitely high)?
• the series of blue graphs: these are happily looking more and more
like the tame Figure 20.4.
Each of our red graphs is the derivative of the corresponding blue graph. Wouldn’t
it be tempting to think of the limit of the red graphs—whatever we might construe this to be—as standing for the derivative of the limit of the blue graphs,
i.e., of the graph in Figure 20.4?
Well, the temptation is so great that, in fact, mathematicians and physicists of
the early twentieth century struggled to give a meaning to things like the limit of
the red graphs—such things were initially called generalized functions which

71
might be considered the derivative of the limit of the blue graphs, i.e., of the
graph of Figure 20.4.
Of course, to achieve progress in mathematics, all the concepts that play a role
in the theory have to be unambiguously deﬁned, and it took a while before
generalized functions such as the limit of our series of red graphs had been
rigorously introduced.
But many of the great moments in the development of mathematics occur when
mathematicians—requiring some concept not yet formalized—work with the
concept tentatively, dismissing—if need be—mental tortures, in hopes that the
experience they acquire by working with the concept will eventually help to put
that concept on sure footing. For example, early mathematicians (Newton, Leibniz)—in replacing approximate speeds by instantaneous velocities by passing to
limits—had to wait a while before later mathematicians (e.g., Weierstrass) gave
a rigorous foundation for what they were doing.

Figure 20.10: Karl Weierstrass (1815–1897) and Laurent Schwartz (1915–2002)
Karl Weierstrass, who worked during the latter part of the nineteenth century,
was known as the “father of modern analysis.” He oversaw one of the glorious
moments of rigorization of concepts that were long in use, but never before
systematically organized. He, and other analysts of the time, were interested
in providing a rigorous language to talk about functions and more speciﬁcally
continuous functions and smooth (i.e., diﬀerentiable) functions. They wished
to have a ﬁrm understanding of limits (i.e., of sequences of numbers, or of
functions).
For Weierstrass and his companions, even though the functions they worked with
needn’t be smooth, or continuous, at the very least, the functions they studied
had well-deﬁned—and usually ﬁnite—values. But our “limit of red graphs” is
not so easily formalized as the concepts that occupied the eﬀorts of Weierstrass.
Happily however, this general process of approximating discontinuous functions

72

CHAPTER 20. SLOPES OF GRAPHS THAT HAVE NO SLOPES

more and more exactly by smooth functions, and taking their derivatives to get
the blip-functions as we have just seen in the red graphs above was eventually
given a mathematically rigorous foundation; notably, by the French mathematician, Laurent Schwartz who provide a beautiful theory that we will not go into
here, that made perfect sense of “generalized functions” such as our limit of
the series of red graphs, and that allows mathematicians to work with these
concepts with ease. These “generalized functions” are called distributions in
Schwartz’s theory [30].

Chapter 21

Distributions: sharpening
our approximating
functions even if we have to
let them shoot out to
inﬁnity

The curious limit of the red graphs of the previous section, which you might
be tempted to think of as a “blip-function” that vanishes for t nonzero and is
somehow “inﬁnite” (whatever that means) at 0 is an example of a generalized
function (in the sense of the earlier mathematicians) or a distribution in the
sense of Laurent Schwartz.

This particular limit of the red graphs also goes by another name (it is oﬃcially
called a Dirac δ-function, the adjective “Dirac” being in honor of the physicist
who ﬁrst worked with this concept, the “δ” being the symbol he assigned to
these objects). The noun “function” should be in quotation marks for, properly
speaking, the Dirac δ-function is not—as we have explained above—a bona ﬁde
function but rather a distribution.
73

74

CHAPTER 21. DISTRIBUTIONS

Figure 21.1: Paul Adrien Maurice Dirac (1902–1984)
Now may be a good time to summarize what the major diﬀerence is between
honest functions and generalized functions or distributions.
An honest (by which we mean integrable) function of a real variable f (t) possesses two “features.”
• It has values. That is, at any real number t, e.g., t = 2 or t = 0 or t = π
etc., our function has a deﬁnite real number value (f (2) or f (0) or f (π)
etc.) and if we know all those values we know the function.
• It has areas under its graph. If we are given any interval of real
numbers, say the interval between a and b, we can talk unambiguously
about the area “under” the graph of the function f (t) over the interval
between a and b. That is, in the terminology of Integral Calculus, we can
talk of the integral of f (t) from a to b. And in the notation of Calculus,
this—thanks to Leibniz—is elegantly denoted
b

f (t)dt.
a

1

∞

0.8
0.6

f(x)dx

0.4

−∞
-10

0.2
-5

5
∞

10

Figure 21.2: This ﬁgure illustrates −∞ f (x)dx, which is the signed area between
the graph of f (x) and the x-axis, where area below the x-axis (yellow) counts
negative, and area above (grey) is positive.
In contrast, a generalized function or distribution

75
• may not have “deﬁnite values” at all real numbers if it is not an
honest function, nevertheless:
• It has well-deﬁned areas under portions of its “graph.” If we are
given any interval of real numbers, say the (open) interval between a and
b, we can still talk unambiguously about the area “under” the graph of the
generalized function D(t) over the interval between a and b and we will
denote this–extending what we do in ordinary calculus—by the symbol
b

D(t)dt.
a

This description is important to bear in mind and it gives us a handy way of
thinking about “generalized functions” (i.e., distributions) as opposed to functions: when we consider an (integrable) function of a real variable, f (t), we may
invoke its value at every real number and for every interval [a, b] we may consider
b
the quantity a f (t)dt. BUT when we are given a generalized function D(t) we
only have at our disposal the latter quantities. In fact, a generalized function of
a real variable D(t) is (formally) nothing more than a rule that assigns to any
b
ﬁnite interval [a, b] (a ≤ b) a quantity that we might denote a D(t)dt and that
behaves as if it were the integral of a function and in particular—for three real
numbers a ≤ b ≤ c we have the additivity relation
c

b

D(t)dt =

c

D(t)dt +

a

D(t)dt.

a

b

SO, any honest function integrable over ﬁnite intervals clearly is a distribution
(forget about its values!) but . . . there are many more generalized functions,
and including them in our sights gives us a very important tool.
It is natural to talk, as well, of Cauchy sequences, and limits, of distributions.
We’ll say that such a sequence D1 (t), D2 (t), D3 (t), . . . is a Cauchy sequence
if for every interval [a, b] the quantities
b

b

D1 (t)dt,
a

b

D2 (t)dt,

D3 (t)dt, . . .

a

a

form a Cauchy sequence of real numbers. Now, any Cauchy sequence of distributions converges to a limiting distribution D(t) which is deﬁned by the rule
that for every interval [a, b],
b

b

D(t)dt = lim
a

i→∞

Di (t)dt.
a

If, by the way, you have an inﬁnite sequence—say—of honest, continuous, functions that converges uniformly to a limit (which will again be a continuous

76

CHAPTER 21. DISTRIBUTIONS

function) then that sequence certainly converges—in the above sense—to the
same limit when these functions are viewed as generalized functions. BUT, there
are many important occasions where your sequence of honest continuous functions doesn’t have that convergence property and yet when they are viewed as
generalized functions they do converge to some generalized function as a limit.
We will see this soon when we get back to the “sequence of the red graphs.”
This sequence does converge (in the above sense) to the Dirac δ-function when
these red graphs are thought of as a sequence of generalized functions.
The integral notation for distribution is very useful, and allows us the ﬂexibility
to deﬁne, for nice enough—and honest—functions c(t) useful expressions such
as
b

c(t)D(t).
a

Figure 21.3: The Dirac δ-“function” (actually distribution), where we draw a
vertical arrow to illustrate the delta function with support at a given point.
For example, for the Dirac δ-function we have been discussing (i.e., the limit of
the red graphs of the previous section) is an honest function away from t = 0
and —in fact—is the “trivial function” zero away from the origin. And at 0, we
may say that it has the “value” inﬁnity, in honor of it being the limit of blip
functions getting taller and taller at 0 but the feature that pins it down as a
distribution is given by its behavior relative to the second feature above, the
area of its graph over the open interval between a and b:
• If both a and b have the same sign (i.e., if the origin is not in the open
interval spanned by a and b) then the “area under the graph of our Dirac
δ-function” is 0.
• If a is negative and b is positive then the “area under the graph of our
Dirac δ-function” is 1—in notation
b

δ = 1.
a

We sometimes summarize the fact that these areas vanish so long as the origin
is not included in the interval we are considering by saying that the support
of this δ-function is “at the origin.”

77
Once you’re happy with this Dirac δ-function, you’ll also be happy with a Dirac
δ-function—call it δx —with support concentrated at any speciﬁc real number
x gotten by “translating” the one we’ve been talking about appropriately; δx
vanishes for t = x and intuitively speaking, has an inﬁnite blip at t = x.
So, the original delta-function we were discussing, i.e., δ(t) would be denoted
δ0 (t).
A question: If you’ve never seen distributions before, but know the Riemann
b
integral, can you guess at what the deﬁnition of a c(t)D(t) is, and can you
formulate hypotheses on c(t) that would allow you to endow this expression
with a deﬁnite meaning?
A second question: if you have not seen distributions before, and have answered the ﬁrst question above, let c(t) be an honest function for which your
deﬁnition of
b

c(t)D(t)
a

applies. Now let x be a real number. Can you use your deﬁnition to compute
+∞

c(t)δx (t)?
−∞

The answer, by the way, is:
later sections!

+∞
−∞

c(t)δx (t) = c(x). This will be useful in the

The theory of distributions gives a partial answer to the following funny question:
How in the world can you “take the derivative” of a function F (t)
that doesn’t have a derivative?
The short answer to this question is that this derivative F (t) which doesn’t
exist as a function may exist as a distribution. What then is the integral of that
distribution? Well, it is given by the original function!
b

F (t)dt = F (b) − F (a).
a

Let us practice this with simple staircase functions. For example, what is the
derivative—in the sense of the theory of distributions—of the function in Figure 21.4. Answer δ0 + 2δ1 .

78

CHAPTER 21. DISTRIBUTIONS
3
2.5
2
1.5
1
0.5
-1

-0.5

0.5

1

1.5

2

Figure 21.4: The staircase function that is 0 for t ≤ 0, 1 for 0 < t ≤ 1 and 3 for
1 < t ≤ 2 has derivative δ0 + 2δ1 .
We’ll be dealing with much more complicated staircase functions in the next
section, but the general principles discussed here will nicely apply there [31].

Chapter 22

Fourier transforms: second
visit
In Chapter 16 above we wrote:
The operation that starts with a graph and goes to its spectral picture that records the frequencies, amplitudes, and phases of the pure
sine waves that, together, compose the graph is called the Fourier
transform.
Now let’s take a closer look at this operation Fourier transform.
We will focus our discussion on an even function f (t) of a real variable t.
“Even” means that its graph is symmetric about the y-axis; that is, f (−t) =
f (t). See Figure 22.1.
2.5
2
1.5
1
0.5
-4

-3

-2

-1

1

2

3

4

Figure 22.1: The graph of an even function is symmetrical about the y-axis.
When we get to apply this discussion to the staircase of primes π(t) or the
tinkered staircase of primes ψ(t) both of which being deﬁned only for positive
values of t, then we would “lose little information” in our quest to understand
them if we simply “symmetrized their graphs” by deﬁning their values on negative numbers −t via the formulas π(−t) = π(t) and ψ(−t) = ψ(t) thereby
turning each of them into even functions.
79

80

CHAPTER 22. FOURIER TRANSFORMS: SECOND VISIT
14
12
10
8
6
4
2
-40

-20

20

40

Figure 22.2: Even extension of the staircase of primes.
The idea behind the Fourier transform is to express f (t) as made up out of sine
and cosine wave functions. Since we have agreed to consider only even functions, we can dispense with the sine waves—they won’t appear in our Fourier
analysis—and ask how to reconstruct f (t) as a sum (with coeﬃcients) of cosine functions (if only ﬁnitely many frequencies occur in the spectrum of our
function) or more generally, as an integral if the spectrum is more elaborate.
For this work, we need a little machine that tells us, for each real number θ
whether or not θ is in the spectrum of f (t), and if so, what the amplitude is
of the cosine function cos(θt) that occurs in the Fourier expansion of f (t)—this
amplitude answers the awkwardly phrased question: how much cos(θt) “occurs
ˆ
in” f (t)? We will denote this amplitude by f (θ), and refer to it as the Fourier
transform of f (t). The spectrum, then, of f (t) is the set of all frequencies θ
where the amplitude is nonzero.

ˆ (θ)
f

f(t)

ˆ
Figure 22.3: The Fourier Transform Machine, which transforms f (t) into f (θ)
+∞

Now in certain easy circumstances—speciﬁcally, if −∞ |f (t)|dt (exists, and)
is ﬁnite—the Integral Calculus provides us with an easy construction of that
machine (see Figure 22.3); namely:
+∞

ˆ
f (θ) =

f (t) cos(−θt)dt.
−∞

This concise machine manages to “pick out” just the part of f (t) that has
frequency θ!! It provides for us the analysis part of the Fourier analysis of our
function f (t).
But there is a synthesis part to our work as well, for we can reconstruct f (t)
from its Fourier transform, by a process intriguingly similar to the analysis part;

81
namely: if

+∞
−∞

ˆ
|f (θ)|dθ (exists, and) is ﬁnite, we retrieve f (t) by the integral

f (t) =

1
2π
+∞

+∞

ˆ
f (θ) cos(θt)dθ.
−∞

We are not so lucky to have −∞ |f (t)|dt ﬁnite when we try our hand at a
Fourier analysis of the staircase of primes, but we’ll work around this!

Chapter 23

What is the Fourier
transform of a delta
function?
Consider the δ-function that we denoted δ(t) (or δ0 (t)). This is also the “generalized function” that we thought of as the “limit of the red graphs” in Chapter 21
above. Even though δ(t) is a distribution and not a bona ﬁde function, it is
symmetric about the origin, and also
+∞

|δ(t)|dt
−∞

exists, and is ﬁnite (its value is, in fact, 1). All this means that, appropriately
understood, the discussion of the previous section applies, and we can feed
this delta-function into our Fourier Transform Machine (Figure 22.3) to see
what frequencies and amplitudes arise in our attempt to express—whatever
this means!—the delta-function as a sum, or an integral, of cosine functions.
ˆ
So what is the Fourier transform, δ0 (θ), of the delta-function?
Well, the general formula would give us:
+∞

ˆ
δ0 (θ) =

cos(−θt)δ0 (t)dt
−∞

and as we mentioned in section 18, for any nice function c(t) we have that the
integral of the product of c(t) by the distribution δx (t) is given by the value of
the function c(t) at t = x. SO:
+∞

ˆ
δ0 (θ) =

cos(−θt)δ0 (t)dt = cos(0) = 1.
−∞

82

83
In other words, the Fourier transform of δ0 (t) is the constant function
ˆ
δ0 (θ) = 1.
One can think of this colloquially as saying that the delta-function is a perfect
example of white noise in that every frequency occurs in its Fourier analysis and
they all occur in equal amounts.
To generalize this computation let us consider for any real number x the symmetrized delta-function with support at x and −x, given by
dx (t) = (δx (t) + δ−x (t))/2

−x

x

Figure 23.1: The sum (δx (t) + δ−x (t))/2, where we draw vertical arrows to
illustrate the Dirac delta functions.
What is the Fourier transform of this dx (t)? The answer is given by making the
same computation as we’ve just made:

+∞
1
ˆ
dx (θ) =
cos(−θt)δx (t)dt +
2
−∞
1
cos(−θx) + cos(+θx)
=
2
= cos(xθ)

+∞

cos(−θt)δ−x (t)dt
−∞

To summarize this in ridiculous (!) colloquial terms: for any frequency θ the
amount of cos(θt) you need to build up the generalized function (δx (t)+δ−x (t))/2
is cos(xθ).
So far, so good, but remember that the theory of the Fourier transform has—
like much of mathematics—two parts: an analysis part and a synthesis part.
We’ve just performed the analysis part of the theory for these symmetrized
delta functions (δx (t) + δ−x (t))/2.
Can we synthesize them—i.e., build them up again—from their Fourier transforms?
We’ll leave this, at least for now, as a question for you.

Chapter 24

Trigonometric series
Given our interest in the ideas of Fourier, it is not surprising that we’ll want to
deal with things like
∞

ak cos(sk · θ)

F (θ) =
k=1

where the sk are real numbers tending (strictly monotonically) to inﬁnity. These
we’ll just call trigonometric series without asking whether they converge in
any sense for all values of θ, or even for any value of θ. The sk ’s that occur
in such a trigonometric series we will call the spectral values or for short,
the spectrum of the series, and the ak ’s the (corresponding) amplitudes. We
repeat that we impose no convergence requirements at all. But we also think of
these things as providing “cutoﬀ” ﬁnite trigonometric sums, which we think of
as functions of two variables, θ and C (the “cutoﬀ”) where
ak cos(sk · θ).

F (θ, C) :=
sk ≤C

These functions F (θ, C) are ﬁnite trigonometric series and therefore “honest
functions” having ﬁnite values everywhere.

24.1

Spike values

Deﬁnition 24.1. Say that a trigonometric serties F (θ) has a spike at the real
number θ = τ ∈ R if the set of absolute values |F (τ, C)| as C ranges through
positive number cutoﬀs is unbounded. A real number τ ∈ R is, in contrast, a
non-spike if those values admit a ﬁnite upper bound.
One main mission in this book will be furthered just by paying attention to the
spike values of certain (inﬁnite) trigonometric series. The trigonometric sums we
84

24.2. TRIGONOMETRIC SERIES AS FOURIER TRANSFORMS85
study are going to be particularly nice. Under the assumption of the Riemann
Hypothesis, they will have the property that the set of their spike values are
particularly interesting discrete subsets of the real line.

24.2

Trigonometric Series as Fourier Transforms

Recall, as in Chapter 23, that for any real number x, we considered the symmetrized delta-function with support at x and −x, given by
dx (t) = (δx (t) + δ−x (t))/2

−x

x

Figure 24.1: The sum (δx (t) + δ−x (t))/2, where we draw vertical arrows to
illustrate the Dirac delta functions.
and noted that the Fourier transform of this dx (t) is
ˆ
dx (θ) = cos(xθ).
It follows, of course, that a cutoﬀ ﬁnite trigonometric series, F (θ, C) associated
to an inﬁnite trigonometric series
∞

ak cos(sk · θ)

F (θ) =
k=1

is the Fourier transform of the distribution
D(t, C) :=

ak dsk (t).
sk ≤C

Given the discreteness of the set of spectral values sk (k = 1, 2, . . . ) and given
the eﬃcacy of the theory of distributions, we can perfectly well consider the
inﬁnite sum
∞

D(t) :=

ak dsk (t),
k=1

viewed as distribution, and playing the role of the ‘inverse Fourier transform’ of
our trigonometric series F (t).

Part III

The Riemann Spectrum of
the Prime Numbers

86

Chapter 25

A sneak preview
To get a sense of what we’re in for, let us consider two inﬁnite trigonometric
sums—that seem to be related one to the other in that the frequencies of the
terms in the one trigonometric sum give the spike values of the other, and vice
versa: the frequencies of the other give the spike values of the one: a kind
of duality as in the theory of Fourier transforms. We show this duality by
exhibiting the graphs of more and more accurate ﬁnite approximations (cutoﬀs)
of these inﬁnite sums. More speciﬁcally,
1. The ﬁrst inﬁnite trigonometric sum is a sum of pure cosine waves with
frequencies given by logarithms of powers of primes and with amplitudes
that will be described below. The graphs of longer and longer ﬁnite truncations of these trigonometric sums, as you will see, have “higher and
higher peaks” concentrated more and more accurately at a certain inﬁnite discrete set of real numbers that what we will be referring to as the
Riemann spectrum indicated in our pictures below (Figures 25.2-25.5)
by the series of vertical red lines.
2. In contrast, the second inﬁnite trigonometric sum is a sum of pure cosine
waves with frequencies given by what we have dubbed above the Riemann
spectrum and with amplitudes all equal to 1. These graphs will have
“higher and higher peaks” concentrated more and more accurately at the
logarithms of powers of primes indicated in our pictures below (see
Figure 25.6) by the series of vertical blue lines.
That the series of blue lines (i.e., the logarithms of powers of primes) in our
pictures below determines—via the the trigonometric sums we describe—the
series of red lines (i.e., what we are calling the spectrum) and conversely is a
consequence of the Riemann Hypothesis.
87

88

CHAPTER 25.

A SNEAK PREVIEW

1. Getting the Riemann Spectrum as the spike values of a trigonometric series with frequencies equal to (logs of ) powers of the
primes:
To get warmed up, let’s plot the positive values of the following sum of
(co-)sine waves:

f (t) = −

log(3)
log(2)
cos(t log(2)) − 1/2 cos(t log(3))
1/2
2
3
log(2)
log(5)
− 1/2 cos(t log(4)) − 1/2 cos(t log(5))
4
5

1.5
1
0.5
20

40

60

80

100

Figure 25.1: Plot of f (t)
Look at the peaks of this graph. There is nothing very impressive about
them, you might think; but wait, for this is just a very “early ” piece of
an expression that consists of a sum1 of inﬁnitely many (co-)sine waves:
−
pn

log(p)
cos(t log(pn ))
pn/2

the summation being over all powers pn of all prime numbers p.
Let us cut this inﬁnite sum oﬀ taking only ﬁnitely many terms, by choosing
various “cutoﬀ values” C and forming the ﬁnite sums
−
pn ≤C

log(p)
cos(t log(pn ))
pn/2

and plotting their positive values. Figures 25.2-25.5 show what we get for
a few values of C.
In each of the graphs, we have indicated by red vertical arrows the real
numbers that give the values of the Riemann spectrum that we will be
discussing. These numbers at the red vertical arrows in the graphs above,
θ1 , θ2 , θ3 , . . .
1 Here

we make use of the Greek symbol
as a shorthand way of expressing a sum of
many terms. We are not requesting this sum to converge.

89
are spike values—as described in Chapter 24—of the inﬁnite trigonometric
series
log(p)
cos(t log(pn )).
−
pn/2
n <C
p
They constitute what we are calling the Riemann spectrum and are key
to the staircase of primes [32].
• The sum with pn ≤ C = 5
Here is the function f (t) we displayed above; it consists in the sum
of the ﬁrst four terms of our inﬁnite sum, and doesn’t yet show very
much “structure”:
1.5
1
0.5
0

20

Figure 25.2: Plot of −
spectrum of the primes

40
log(p)
pn ≤5 pn/2

60

80

100

cos(t log(pn )) with arrows pointing to the

• The sum with pn ≤ C = 20
Something, (don’t you agree?) is already beginning to happen here:
3
2.5
2
1.5
1
0.5
0

20

Figure 25.3: Plot of −
spectrum of the primes

40
log(p)
pn ≤20 pn/2

60

80

100

cos(t log(pn )) with arrows pointing to the

• The sum with pn ≤ C = 50
Note that the high peaks seem to be lining up more accurately with
the vertical red lines. Note also that the y-axis has been rescaled.

90

CHAPTER 25.

A SNEAK PREVIEW

4
3
2
1
0

20

Figure 25.4: Plot of −
spectrum of the primes

40
log(p)
pn ≤50 pn/2

60

80

100

cos(t log(pn )) with arrows pointing to the

• The sum with pn ≤ C = 500
Here, the peaks are even sharper, and note that again they are higher;
that is, we have rescaled the y-axis.
8
7
6
5
4
3
2
1
0

20

Figure 25.5: Plot of −
spectrum of the primes

40
log(p)
pn ≤500 pn/2

60

80

100

cos(t log(pn )) with arrows pointing to the

We will pay attention to:
• how this “plays out” as we take the sums of longer and longer pieces
of the inﬁnite sum of cosine waves above, given by larger and larger
cutoﬀs C,
• how this spectrum of red lines more closely matches the high peaks
of the graphs of the positive values of these ﬁnite sums,
• how these peaks are climbing higher and higher,
• what relationship this has to the Fourier analysis of the staircase of
primes,
• and, equally importantly, what these mysterious red lines signify.
2. Towards (logs of ) powers of the primes, starting from the Riemann Spectrum:

91
Here we will be making use of the series of numbers
θ1 , θ2 , θ3 , . . .
comprising what we called the spectrum. We consider the inﬁnite trigonometric series
1 + cos(θ1 t) + cos(θ2 t) + cos(θ3 t) + . . .
or, using the

notation,
1+

cos(θt)
θ

where the summation is over the spectrum, θ = θ1 , θ2 , θ3 , . . . . Again we
will consider ﬁnite cutoﬀs C of this inﬁnite trigonometric sum,
ˆ
Φ≤C (θ) = 1 +

cos(θt)
θ ≤C

and plot their graphs, over various ranges.
1.5
1
0.5
0

3
2.5
2
1.5
1
0.5
0
4
3
2
1
0
8
7
6
5
4
3
2
1
0

20

40

60

80

100

20

40

60

80

100

20

40

60

80

100

20

40

60

80

100

ˆ
Figure 25.6: Plots of Φ≤C (θ) for C = 5 (top), 20, 50, and 500 (bottom).
This passage—thanks to the Riemann Hypothesis— from spectrum to prime
powers and back again via consideration of the “high peaks” in the graphs of the
appropriate trigonometric sums provides a kind of visual duality emphasizing,

92

CHAPTER 25.

A SNEAK PREVIEW

for us, that the information inherent in the wild spacing of prime powers, is
somehow “packaged” in the series of mysterious numbers we have called the
Riemann Spectrum, and reciprocally, the information given in that series of
mysterious numbers is obtainable from the sequence of prime powers.

Chapter 26

On losing no information
To manage to repackage the “same” data in various ways—where each way
brings out some features that would be kept in the shadows if the data were
packaged in some diﬀerent way—is a high art, in mathematics. In a sense
every mathematical equation does this, for the “equal sign” in the middle of
the equation tells us that even though the two sides of the equation may seem
diﬀerent, or have diﬀerent shapes, they are nonetheless “the same data.” For
example, the equation

log(XY ) = log(X) + log(Y )

which we encountered earlier in Chapter 10, is just two ways of looking at
the same thing, yet it was the basis for much manual calculation for several
centuries.
Now, the problem we have been concentrating on, in this book, has been—in
eﬀect—to understand the pattern, if we can call it that, given by the placement
of prime numbers among the natural line-up of all whole numbers.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38

Figure 26.1: Prime Numbers up to 37
There are, of course, many ways for us to present this basic pattern. Our initial
strategy was to focus attention on the staircase of primes which gives us a vivid
portrait, if you wish, of the order of appearance of primes among all numbers.
93

94

CHAPTER 26. ON LOSING NO INFORMATION
12
10
8
6
4
2
5

10

15

20

25

30

35

Figure 26.2: Prime Numbers up to 37
As we have already hinted in the previous sections, however, there are various
ways open to us to tinker with—and signiﬁcantly modify—our staircase without
losing the essential information it contains. Of course, there is always the danger
of modifying things in such a way that “retrieval” of the original data becomes
diﬃcult. Moreover, we had better remember every change we have made if we
are to have any hope of retrieving the original data!
With this in mind, let us go back to Chapter 14 (discussing the staircase
of primes) and Chapter 15, where we tinkered with the original staircase of
primes—alias: the graph of π(X)—to get ψ(X) whose risers look—from afar—
as if they approximated the 45 degree staircase.
At this point we’ll do some further carpentry on ψ(X) without destroying the
valuable information it contains. We will be replacing ψ(X) by a generalized
function, i.e., a distribution, which we denote Φ(t) that has support at all positive integral multiples of logs of prime numbers, and is zero on the complement
of that discrete set. Recall that by deﬁnition, a discrete subset S of real numbers is the support of a function, or of a distribution, if the function vanishes
on the complement of S and doesn’t vanish on the complement of any proper
subset of S.
Given the mission of our book, it may be less important for us to elaborate on
the construction of Φ(t) than it is to (a) note that Φ(t) contains all the valuable
information that ψ(X) has and (b) pay close attention to the spike values of
the trigonometric series that is the Fourier transform of Φ(t).
For the deﬁnition of the distribution Φ(t) see the end-note [33].
A distribution that has a discrete set of real numbers as its support—as Φ(t)
does, we sometimes like to call spike distributions since the pictures of functions approximating them tend to look like a series of spikes.
We have then before us a spike distribution with support at integral multiples of
logarithms of prime numbers, and this generalized function retains the essential
information about the placement of prime numbers among all whole numbers,
and will be playing a major role in our story: knowledge of the placement of the
“blips” constituting this distribution (its support), being at integral multiples of
logs of prime numbers, would allow us to reconstruct the position of the prime
numbers among all numbers. Of course there are many other ways to package
this vital information, so we must explain our motivation for subjecting our

95
poor initial staircase to the particular series of brutal acts of distortion that we
described, that end up with the distribution Φ(t).

Chapter 27

Going from the primes to
the Riemann Spectrum
We’ve discussed the nature of the Fourier transform of (symmetrized) δ-functions
in Chapter 23. In particular, recall the “spike function”
dx (t) = (δx (t) + δ−x (t))/2
that has support at the points x and −x. We mentioned that its Fourier transˆ
form, dx (θ), is equal to cos(xθ) (and gave some hints about why this may be
true).
Our next goal is to work with the much more interesting “spike function”
Φ(t) = e−t/2 Ψ (t),
which was one of the generalized function that we engineered in Chapter 26, and
that has support at all nonnegative integral multiples of logarithms of prime
numbers.
As with any function—or generalized function—deﬁned for non-negative values
of t, we can “symmetrize it” (about the t-axis) which means that we can deﬁne
it on negative real numbers by the equation
Φ(−t) = Φ(t).
Let us make that convention, thereby turning Φ(t) into an even generalized
function.
96

97

log(16)

log(5)

log(2)

log(2)

log(5)

log(16)

Figure 27.1: Φ(t) is a sum of Dirac delta functions at the logarithms of prime
powers pn weighted by p−n/2 log(p) (and log(2π) at 0). The more blue the
arrow, the smaller the weight.

We may want to think of Φ(t) as a limit of the following sequence of distributions,

Φ(t) =

lim Φ≤C (t)

C→∞

where Φ≤C (t) is the ﬁnite linear combinations of (symmetrized) δ-functions
dx (t):

Φ≤C (t)

:=

p−n/2 log(p)dn log(p) (t).

2
prime powers pn ≤C

Since the Fourier transform of dx (t) is cos(tθ), the Fourier transform of each
dn log(p) (t) is cos(n log(p)θ) so the Fourier transform of Φ≤C (t) is:

ˆ
Φ≤C (θ) := 2

p−n/2 log(p) cos(n log(p)θ).
prime powers

pn <C

We have come then—again—to the expressions graphed at the beginning of
Part III, in Chapter 25.

98CHAPTER 27. GOING FROM THE PRIMES TO THE RIEMANN SPECTRUM
1.5
1
0.5
0

3
2.5
2
1.5
1
0.5
0
4
3
2
1
0
8
7
6
5
4
3
2
1
0

20

40

60

80

100

20

40

60

80

100

20

40

60

80

100

20

40

60

80

100

ˆ
Figure 27.2: Plots of − 1 Φ≤C (θ) for C = 5 (top), 20, 50, and 500 (bottom).
2
We already see hints of the Riemann Spectrum: The θ-coordinates of
the peaks of these graphs seem to be vaguely clustered about a discrete set of
positive real numbers. In fact, this is already a hint of what happens to the
ˆ
graphs of these functions Φ≤C (θ) as we allow the cut-oﬀ C to tend to inﬁnity.
But to make it more than a hint, and to “see” a certain inﬁnite set of positive
real numbers
θ1 , θ2 , θ3 , . . .
ˆ
emerge, we are going to subject our functions Φ≤C (θ) to a few simple cleansing
operations.
Once we deﬁne them, we will be calling the set
θ1 , θ2 , θ3 , . . .
the Riemann spectrum of the primes and if the Riemann Hypothesis holds,
these numbers would be the key to the placement of primes on the number line.
By tabulating these peaks (see [34] for a precise deﬁnition) we can compute—at

99
least very approximately—. . .
θ1

=

14.134725 . . .

θ2

=

21.022039 . . .

θ3

=

25.010857 . . .

θ4

=

30.424876 . . .

θ5

=

32.935061 . . .

θ6

=

37.586178 . . .

Riemann deﬁned this sequence of numbers in his 1859 article in a manner somewhat diﬀerent from the treatment we have given (in that article these θi appear
as the “imaginary parts of the nontrivial zeroes of his ζ-function;” we will discuss
this brieﬂy in Chapter 30 below). Riemann wrote:
“One now ﬁnds indeed approximately this number of real roots
within these limits, and it is very probable that all roots are real.
Certainly one would wish for a stricter proof here; I have meanwhile
temporarily put aside the search for this after some futile attempts,
as it appears unnecessary for the next objective of my investigation.”
Nowadays, these mysterious numbers, these spectral lines for the staircase of
primes are known to great accuracy. Here is the smallest one, θ1 , given with
over 1,000 digits of its decimal expansion:

14.134725141734693790457251983562470270784257115699243175685567460149
9634298092567649490103931715610127792029715487974367661426914698822545
8250536323944713778041338123720597054962195586586020055556672583601077
3700205410982661507542780517442591306254481978651072304938725629738321
5774203952157256748093321400349904680343462673144209203773854871413783
1735639699536542811307968053149168852906782082298049264338666734623320
0787587617920056048680543568014444246510655975686659032286865105448594
4432062407272703209427452221304874872092412385141835146054279015244783
3835425453344004487936806761697300819000731393854983736215013045167269
6838920039176285123212854220523969133425832275335164060169763527563758
9695376749203361272092599917304270756830879511844534891800863008264831
2516911271068291052375961797743181517071354531677549515382893784903647
4709727019948485532209253574357909226125247736595518016975233461213977
3160053541259267474557258778014726098308089786007125320875093959979666
60675378381214891908864977277554420656532052405

and if, by any chance, you wish to peruse the ﬁrst 2,001,052 of these θi ’s
calculated to an accuracy within 3 · 10−9 , consult Andrew Odlyzko’s tables:
http://www.dtc.umn.edu/~odlyzko/zeta_tables

100CHAPTER 27. GOING FROM THE PRIMES TO THE RIEMANN SPECTRUM
Since people have already computed at least 1029.9 billion θ’s (as of Feb. 18,
2005) and have never found one with multiplicity > 1; it is generally expected
that the multiplicity of all the theta’s in the Riemann Spectrum is 1. But,
independent of that expectation, our convention in what follows will be that we
count each of the elements in the Riemann Spectrum repeated as many times
as their multiplicity. So, if it so happens that θn occurs with multiplicity two,
we view the Riemann spectrum as being the series of numbers
θ1 , θ2 , . . . , θn−1 , θn , θn , θn+1 , . . .

Chapter 28

Going from the Riemann
Spectrum to the primes
To justify the name Riemann spectrum of primes we will investigate graphically
whether in an analogous manner we can use this spectrum to get information
about the placement of prime numbers. We might ask, for example, if there is a
trigonometric function with frequencies given by this collection of real numbers,
θ1 , θ2 , θ3 , . . .
that somehow pinpoints the prime powers, just as our functions
ˆ
Φ(θ)≤C = 2

p−m/2 log(p) cos(n log(p)θ)
prime powers pn ≤C

for large C pinpoint the spectrum (as discussed in the previous two chapters).
To start the return game, consider this sequence of trigonometric functions that
have (zero and) the θi as spectrum
cos(θi · x).

GC (x) := 1 +
i<C

As we’ll see presently it is best to view these functions on a logarithmic scale so
we will make the substitution of variables x = log(s) and write
cos(θi · log(s)).

HC (s) := GC (log(s)) = 1 +
i<C

Now take a look at the graphs of our functions HC (s) for various choices of C
and ranges of s.
101

102CHAPTER 28. GOING FROM THE RIEMANN SPECTRUM TO THE PRIMES
To get a cleaner picture let us subject our functions HC (s) to Cesaro smoothing
as in the previous chapter. Deﬁne:

1
˜
HC (s) :=
C

C

Hc (θ),
c=1

and let’s take another look.
[35]

150

100

50

0

2

3

4

5
5

7

8

9

10

11

13

15

16 17

19

20

23

25
25

27

29

30

1000

Figure 28.1: Illustration of − i=1 cos(log(s)θi ), where θ1 ∼ 14.13, . . . are the
ﬁrst 1000 contributions to the Riemann spectrum. The red dots are at the prime
powers pn , whose size is proportional to log(p).

27

29

1000

31

32

Figure 28.2: Illustration of − i=1 cos(log(s)θi ) in the neighborhood of a twin
prime. Notice how the two primes 29 and 31 are separated out by the Fourier
series, and how the prime powers 33 and 25 also appear.

103

1013

1019

1021

1024

Figure 28.3: Fourier series from 1, 000 to 1, 030 using 15,000 of the numbers θi .
Note the twin primes 1019 and 1021 and that 1024 = 210 .

Part IV

Back to Riemann

104

Chapter 29

How to build π(X) knowing
the Spectrum (Riemann’s
way)
We have been dealing in Part III of our book with Φ(t) a distribution that—
we said—contains all the essential information about the placement of primes
among numbers. We have given a clean restatement of Riemann’s hypothesis,
the third restatement so far, in term of this Φ(t). But Φ(t) was the eﬀect of a
series of recalibrations and reconﬁgurings of the original untampered-with staircase of primes. A test of whether we have strayed from our original problem—to
understand this staircase—would be whether we can return to the original staircase, and “reconstruct it” so to speak, solely from the information of Φ(t)—or
equivalently, assuming the Riemann Hypothesis as formulated in Chapter 15—
can we construct the staircase of primes π(X) solely from knowledge of the
sequence of real numbers θ1 , θ2 , θ3 , . . . ?
The answer to this is yes (given the Riemann Hypothesis), and is discussed very
beautifully by Bernhard Riemann himself in his famous 1859 article cited above.
Bernhard Riemann used the spectrum of the prime numbers to provide an exact
analytic formula that analyzes and/or synthesizes the staircase of primes. This
formula is motivated by Fourier’s analysis of functions as constituted out of
sines. Riemann started with a speciﬁc smooth function, which we will refer to
as R(X), a function that Riemann oﬀered, just as Gauss oﬀered his Li(X), as a
candidate smooth function approximating the staircase of primes. Recall from
X
Chapter 12 that Gauss’s guess is Li(X) = 2 dt/log(t). Riemann’s guess for
a better approximation to π(X) is obtained from Gauss’s using the Moebius
105

106

CHAPTER 29. BUILDING π(X) KNOWING THE SPECTRUM

function µ(n), which is deﬁned by

1

µ(n) = −1


0

if n is a square-free positive integer with an even number of distinct prime factors
if n is a square-free positive integer with an odd number of distinct prime factors
if n is not square-free.

See Figure 29.1 for a plot of the Moebius function.
1
0.5
10

-0.5

20

30

40

50

-1

Figure 29.1: The blue dots plot the values of the Moebius function µ(n), which
is only deﬁned at integers.
Riemann’s guess is
∞

R(X) =

1
µ(n)
Li(X n ),
n
n=1

where µ(n) is the Moebius function introduced above.

Figure 29.2: Riemann deﬁning R(X) in his manuscript
In Chapter 13 we encountered the Prime Number Theorem, which asserts that
X/ log(X) and Li(X) are both approximation for π(X), in the sense that both go
to inﬁnity at the same rate. Our ﬁrst formulation of the Riemann Hypothesis
(see page 44) was that Li(X) is an essentially square root accurate approximation of π(X). Figures 29.3–29.4 illustrate that Riemann’s function R(X)
appears to be an even better approximation to π(X) than anything we have
seen before.

107

25

150

20

100

15
10

50

5
0

20

40

60

80

200

100

400

600

800

Figure 29.3: Comparisons of Li(X) (top), π(X) (middle), and R(X) (bottom,
computed using 100 terms)
1340
1320
1300
1280
1260
1240
10000

10200

10400

10600

10800

11000

Figure 29.4: Closeup comparison of Li(X) (top), π(X) (middle), and R(X)
(bottom, computed using 100 terms)
Think of Riemann’s smooth curve R(X) as the fundamental approximation to
π(X). Riemann oﬀered much more than just a (conjecturally) better approximation to π(X) in his wonderful 1859 article. He found a way to construct what
looks like a Fourier series, but with sin(X) replaced by R(X) and spectrum the
θi , which conjecturally exactly equals π(X). He gave an inﬁnite sequence of
improved guesses,
R(X) = R0 (X),

R1 (X),

R2 (X),

R3 (X),

...

and he hypothesized that one and all of them were all essentially square root
approximations to π(X), and that the sequence of these better and better approximations converge to give an exact formula for π(X).
Thus not only did Riemann provide a “fundamental” (that is, a smooth curve
that is an astoundingly close to π(X)) but he viewed this as just a starting point,
for he gave the recipe for providing an inﬁnite sequence of corrective terms—
call them Riemann’s harmonics; we will denote the ﬁrst of these “harmonics”
C1 (X), the second C2 (X), etc. Riemann gets his ﬁrst corrected curve, R1 (X),
from R(X) by adding this ﬁrst harmonic to the fundamental,
R1 (X) = R(X) + C1 (X),
he gets the second by correcting R1 (X) by adding the second harmonic
R2 (X) = R1 (X) + C2 (X),

1000

108

CHAPTER 29. BUILDING π(X) KNOWING THE SPECTRUM

and so on
R3 (X) = R2 (X) + C3 (X),
and in the limit provides us with an exact ﬁt.

Figure 29.5: Riemann analytic formula for π(X).
The Riemann Hypothesis, if true, would tell us that these correction terms
C1 (X), C2 (X), C3 (X), . . . are all square-root small, and all the successively corrected smooth curves
R(X), R1 (X), R2 (X), R3 (X), . . .
are good approximations to π(X). Moreover,
∞

π(X) = R(X) +

Ck (X).
k=1

The elegance of Riemann’s treatment of this problem is that the corrective terms
Ck (X) are all modelled on the fundamental R(X) and are completely described
if you know the sequence of real numbers θ1 , θ2 , θ3 , . . . of the last section.
Here in the book will be where we refer explicitly to complex numbers for the
ﬁrst time!! √ deﬁnition of Riemann’s Ck (X) requires complex numbers a + bi,
The
where i = −1, and requires extending the deﬁnition of the function Li(X) to
make sense when given complex numbers as input. Assuming the Riemann
Hypothesis, the Riemann correction terms Ck (X) are then
1

Ck (X) = −R(X 2 +iθk ),
where θ1 = 14.134725 . . . , θ2 = 21.022039 . . . , etc., are the spectrum of the
prime numbers [36].
Riemann provided an extraordinary recipe that allows us to work out the harmonics,
C1 (X), C2 (X), C3 (X), . . .
without our having to consult, or compute with, the actual staircase of primes.
As with Fourier’s modus operandi where both fundamental and all harmonics
are modeled on the sine wave, but appropriately calibrated, Riemann fashioned
his higher harmonics, modeling them all on a single function, namely his initial
guess R(X).

109
The convergence of Rk (X) to π(X) is strikingly illustrated in the plots in Figures 29.6–29.11 of Rk for various values of k.

25
20
15
10
5

20

40

60

80

100

Figure 29.6: The function R1 approximating the staircase of primes up to 100

25
20
15
10
5
0

20

40

60

80

100

Figure 29.7: The function R10 approximating the staircase of primes up to 100

110

CHAPTER 29. BUILDING π(X) KNOWING THE SPECTRUM
25
20
15
10
5
0

20

40

60

80

100

Figure 29.8: The function R25 approximating the staircase of primes up to 100
25
20
15
10
5
0

20

40

60

80

100

Figure 29.9: The function R50 approximating the staircase of primes up to 100

80
60
40
20
100

200

300

400

500

Figure 29.10: The function R50 approximating the staircase of primes up to 500

111
84
82
80
78
76
74
72
70
350

360

370

380

390

400

Figure 29.11: The function Li(X) (top, green), the function R50 (X) (in blue),
and the staircase of primes on the interval from 350 to 400.

Chapter 30

As Riemann envisioned it,
complex analysis relates the
staircase of primes to its
Riemann Spectrum
In the previous chapters we have described—using Riemann’s Hypothesis—how
to obtain the spectrum
θ1 , θ2 , θ3 , . . .
from the staircase of primes, and hinted at how to go back. Roughly speaking, we
were performing “Fourier transformations” to make this transit. But Riemann,
on the very ﬁrst page of his 1859 memoir, construes the relationship we have
been discussing, between spectrum and staircase, in an even more profound way.
To talk about this extraordinary insight of Riemann, one would need to do two
things that might seem remote from our topic, given our discussion so far.
• We will discuss a key idea that Leonhard Euler had (circa 1740).
• To follow the evolution of this idea in the hands of Riemann, we would
have to assume familiarity with basic complex analysis.
We will say only a few words here about this, in hopes of giving at least a shred
of a hint of how marvelous Riemann’s idea is. We will be drawing, at this point,
on some further mathematical background. For readers who wish to pursue the
themes we discuss, here is a list of sources, that are our favorites among those
meant to be read by a somewhat broader audience than people very advanced
in the subject. We list them in order of “required background.”
112

113
1. John Derbyshire’s Prime Obsession: Bernhard Riemann and the Greatest
Unsolved Problem in Mathematics. We have already mentioned this book
in our introduction, but feel that it is so good, that it is worth a second
mention here.
2. The Wikipedia entry for Riemann’s Zeta Function (http://en.wikipedia.
org/wiki/Riemann_zeta_function) It is diﬃcult to summarize who wrote
this, but we feel that it is a gift to the community in its clarity. Thanks
authors!
3. Enrico Bombieri’s article [37]. To comprehend all ten pages of this excellent and fairly thorough account may require signiﬁcant background, but
try your hand at it; no matter where you stop, you will have seen good
things in what you have read.

Leonhardt Euler’s idea ( 1740):
As readers of Jacob Bernoulli’s Ars Conjectandi (or of the works of John Wallis)
know, there was already a rich mathematical theory of ﬁnite sums of (nonnegative k-th powers) of consecutive integers. This sum,
Sk (N ) = 1k + 2k + 3k + · · · + N k ,
is a polynomial in N of degree k + 1 with no constant term, a leading term equal
1
to k+1 N k+1 , and a famous linear term. The coeﬃcient of the linear term of the
polynomial Sk (N ) is the Bernoulli number Bk :
n(n − 1)
n2
1
=
−
· n,
2
2
2
1
n3
+ ··· −
· n,
S2 (n) = 12 + 22 + 32 + · · · + (n − 1)2 =
3
6
n4
S3 (n) = 13 + 23 + 33 + · · · + (n − 1)3 =
+ · · · − 0 · n,
4
n5
1
S4 (n) = 14 + 24 + 34 + · · · + (n − 1)4 =
+ ··· −
· n,
5
30
S1 (n) = 1 + 2 + 3 + · · · + (n − 1) =

etc. For odd integers k > 1 this linear term vanishes. For even integers 2k the
2k
Bernoulli number B2k is the rational number given by the coeﬃcient of x in
2k!
the power series expansion

x
x
= 1− +
ex − 1
2

∞

(−1)k+1 B2k
k=1

x2k
.
2k!

114

CHAPTER 30. AS RIEMANN ENVISIONED IT

So

1
1
1
1
,
B4 =
,
B6 =
,
B8 =
,
6
30
42
30
and to convince you that the numerator of these numbers is not always 1, here
are a few more:
B2 =

B10 =

5
,
66

B12 =

691
,
2730

B14 =

7
.
6

If you turn attention to sums of negative k-th powers of consecutive integers,
then when k = −1,
S−1 (N ) =

1
1 1 1
+ + + ··· +
1 2 3
N

tends to inﬁnity like log(N ), but for k < −1 we are facing the sum of reciprocals
of powers (of exponent > 1) of consecutive whole numbers, and Sk (N ) converges.
This is the ﬁrst appearance of the zeta function ζ(s) for arguments s = 2, 3, 4, . . .
So let us denote these limits by notation that has been standard, after Riemann:
ζ(s) :=

1
1
1
+ s + s + ···
1s
2
3

The striking reformulation that Euler discovered was the expression of this inﬁnite sum as an inﬁnite product of factors associated to the prime numbers:
|f rac1ns = prodp

ζ(s) =
n

prime

1
,
1 − p−s

where the inﬁnite sum on the left and the inﬁnite product on the right both
converge (and are equal) if s > 1. He also evaluated these sums at even positive
integers, where—surprise—the Bernoulli numbers come in again; and they and
π play a combined, fascinating role:

1
1
+ 2 + · · · = π 2 /6 1.65 . . .
2
1
2
1
1
ζ(4) = 4 + 4 + · · · = π 4 /90 1.0823 . . .
1
2

ζ(2) =

and, in general,
ζ(2n) =

1
1
+ 2n + · · ·
2n
1
2

=

(−1)n+1 B2n π 2n ·

22n−1
.
2n!

A side note to Euler’s formulas comes from the fact (only known much later)
that no power of π is rational: do you see how to use this to give a proof that

115
there are inﬁnitely many primes, combining Euler’s inﬁnite product expansion
displayed above with the formula for ζ(2), or with the formula for ζ(4), or, in
fact, for the formulas for ζ(2n) for any n you choose?
Pafnuty Lvovich Chebyshev’s ( 1845): The second moment in the history
of evolution of this function ζ(s) is when Chebyshev used the same formula as
above in the extended range where s is allowed now to be a real variable—not
just an integer— greater than 1. Making use of this extension of the range of
deﬁnition of Euler’s sum of reciprocals of powers of consecutive whole numbers,
Chebyshev could prove that for large x the ratio of π(x) and x/ log(x) is bounded
above and below by two explicitly given constants. He also proved that there
exists a prime number in the inteval bounded by n and 2n for any positive
integer n (this was called Bertrand’s postulate).

Riemann’s idea (1859): It is in the third step of the evolution of ζ(s) that
something quite surprising happens. Riemann extended the range of Chebyshev’s sum of reciprocals of positive real powers of consecutive whole numbers
allowing the argument s to range over the entire complex plane s (avoiding
s = 1). Now this is a more mysterious extension of Euler’s function, and it is
deeper in two ways:
• The formula

1
1
1
+ s + s + ···
1s
2
3
does converge when the real part of the exponent s is greater than 1 (i.e.,
this allows us to use the same formula, as Chebyshev had done, for the
right upper half plane in the complex plane determined by the condition
s = x + iy with x > 1 but not beyond this). You can’t simply use the
same formula for the extension.
ζ(s) :=

• So you must face the fact that if you wish to “extend” a function beyond
the natural range in which its deﬁning formula makes sense, there may be
many ways of doing it.
To appreciate the second point, the theory of a complex variable is essential.
The uniqueness (but not yet the existence) of Riemann’s extension of ζ(s) to the
entire complex plane is guaranteed by the phenomenon referred to as analytic
continuation. If you are given a function on any inﬁnite subset X of the complex
plane that contains a limit point, and if you are looking for a function on the
entire complex plane1 that is diﬀerentiable in the sense of complex analysis,
there may be no functions at all that have that property, but if there is one,
that function is unique. But Riemann succeeded: he was indeed able to extend
Euler’s function to the entire complex plane except for the point s = 1, thereby
deﬁning what we now call Riemann’s zeta function.
1

or to any connected open subset that contains X

116

CHAPTER 30. AS RIEMANN ENVISIONED IT

Those ubiquitous Bernoulli numbers, by the way, reappear yet again as values
of this extended zeta function at negative integers:
ζ(−n) = −Bn+1 /n + 1
so since the Bernoulli numbers indexed by odd integers > 1 all vanish, the
extended zeta function ζ(s) actually vanishes at all even integers.
The even integers −2, −4, −6, . . . are often called the trivial zeroes of the
Riemann zeta function. There are indeed other zeroes of the zeta function, and
those other zeroes could—in no way—be dubbed “trivial,” as we shall shortly
see.

It is time to consider these facts:
1. Riemann’s zeta function codes the placement of prime powers
among all numbers The key here is to take logarithm and then the
derivative of ζ(s) (this boils down to forming dζ (s)/ζ(s)). Assuming that
ds
the real part of s is > 1, taking the logarithm of ζ(s) —using Euler’s
inﬁnite product formulation—gives us
− log 1 − p−s ,

log ζ(s) =
p prime

and we can do this term-by-term, since the real part of s is > 1. Then
taking the derivative gives us:

∞

dζ
Λ(n)n−s
(s)/ζ(s) = −
ds
n=1
where
Λ(n) := log p
k

if n = p for p a prime number and k > 0; and
Λ(n) := 0
if n is not a power of a prime number. In particular, Λ(n) “records” the
placement of prime powers.

2. You know lots about a function if you know its zeroes and poles
This is certainly true, for example for polynomials, or even rational functions: if someone told you, for example, that a certain function f (s) vanishes to order 1 at 0 and at ∞; and that it has a double pole at s = 2 and

117
at all other points has ﬁnite nonzero values, then you can immediate say
that this mystery function is a nonzero constant times s/(s − 2)2 .
Knowing the zeroes and poles (in the complex plane) alone of the Riemann
zeta function doesn’t entirely pin it down—you have to know more about
its behavior at inﬁnity since—for example, multiplying a function by ez
doesn’t change the structure of its zeroes and poles in the ﬁnite plane.
but a complete understanding of the zeroes and poles of ζ(s) will give all
the information you need to pin down the placement of primes among all
numbers.
So here is the score:
• As for poles, ζ(s) has only one pole. It is at s = 1 and is of order 1
( a “simple pole”).
• As for zeroes, we have already mentioned the trivial zeroes (at negative even integers), but ζ(s) also has inﬁnitely many nontrivial zeroes.
These nontrivial zeroes are known to lie in the vertical strip
0 < real part of s < 1.
And here is yet another equivalent statement of Riemann’s Hypothesis—this
being the formulation closest to the one given in his 1859 memoir:

The Riemann Hypothesis (fourth formulation)
All the nontrivial zeroes of ζ(s) lie on the vertical line in the complex
1
plane given by real part of s = 2 , and these zeroes are none other than
1
1
1
2 + iθ1 , 2 + iθ2 , 2 + iθ3 , . . . , where θ1 , θ2 , θ3 , . . . comprise the spectrum
of primes we talked about in the earlier chapters.

The zeta function, then, is the vise, that so elegantly clamps together information about the placement of primes and their spectrum!
That a simple geometric property of these zeroes (lying on a line!) is directly
equivalent to such profound (and more diﬃcult to express) regularities among
prime numbers suggests that these zeroes and the parade of Riemann’s corrections governed by them–when we truly comprehend their message–may have
lots more to teach us, may eventually allow us a more powerful understanding
of arithmetic. This inﬁnite collection of complex numbers, i.e., the nontrivial
zeroes of the Riemann zeta function, plays a role with respect to π(X) rather
like the role the spectrum of the Hydrogen atom, plays in Fourier’s theory. Are
the primes themselves no more than an epiphenomenon, behind which there
lies, still veiled from us–a yet-to-be-discovered, yet-to-be-hypothesized, profound
conceptual key to their perplexing orneriness. Are the many innocently posed,
yet unanswered, phenomenological questions about numbers–such as in the ones

118

CHAPTER 30. AS RIEMANN ENVISIONED IT

listed earlier– waiting for our discovery of this deeper level of arithmetic? Or
for layers deeper still? Are we, in fact, just at the beginning?
These are not completely idle thoughts, for a tantalizing analogy relates the
number theory we have been discussing to an already established branch of
mathematics–due, largely, to the work of Alexander Grothendieck, and Pierre
Deligne–where the corresponding analogue of Riemann’s hypothesis has indeed
been proved. . .

Chapter 31

Companions to the zeta
function
Our book, so far, has been exclusively about Riemann’s ζ(s) and its zeroes. We
have been discussing how the (placement of) the zeroes of ζ(s) in the complex
plane contains the information needed to understand the (placement of) the
primes in the set of all whole numbers; and conversely.
It would be wrong—we think—if we don’t even mention that ζ(s) ﬁts into a
broad family of similar functions that connect to other problems in number
theory.
For example—instead of the ordinary integers—consider the Gaussian integers.
This is the collection of numbers
{a + bi}
√

where i = −1 and a, b are ordinary integers. We can add and multiply two
such numbers and get another of the same form. The only “units” among the
Gaussian integers (i.e., numbers whose inverse is again a Gaussian integer) are
the four numbers ±1, ±i and if we multiply any Gaussian integer a + bi by any
of these four units, we get the collection {a + bi, −a − bi, −b + ai, b − ai}. We
measure the size of Gaussian integer by the square of its distance to the origin,
i.e.,
|a + bi|2 = a2 + b2 .
This size function is called the norm of the Gaussian integer a + bi and can also
be thought of as the product of a + bi and its “conjugate” a − bi. note that the
norm is a nice multiplicative function on the set of Gaussian integers, in that
the norm of a product of two Gaussian integers is the product of the norms of
each of them.
We have a natural notion of prime Gaussian integer, i.e., one with a > 0
and b ≥ 0 that cannot be factored as the product of two Gaussian integers
119

120

CHAPTER 31. COMPANIONS TO THE ZETA FUNCTION

of smaller size. Given what we have just discussed, can you prove that if a
Gaussian integer is a prime Gaussian integer, then it size must either be an
ordinary prime number, or the square of an ordinary prime number?

Here is a plot of a large number of Gaussian primes as they display themselves
amongst complex numbers:

———
To be done: Possibly a picture of the primes in the Gaussian integers. Here is an
example:
http://en.wikipedia.org/wiki/File:Gauss-primes-768x768.png
———–

6
5
4
3
2
1
2

4

6

8

10

12

Figure 31.1: Staircase of Gaussian primes of norm up to 14

121

25
20
15
10
5
20

40

60

80

Figure 31.2: Staircase of Gaussian primes of norm up to 100

150

100

50

200

400

600

800

1000

Figure 31.3: Staircase of Gaussian primes of norm up to 1000

122

CHAPTER 31. COMPANIONS TO THE ZETA FUNCTION

1200
1000
800
600
400
200
2000

4000

6000

8000

10000

Figure 31.4: Staircase of Gaussian primes of norm up to 10000
The natural question to ask, then, is: how are the Gaussian prime numbers
distributed? Can one provide as close an estimate to their distribution and
structure, as one has for ordinary primes? The answer, here is yes: there is a
companion theory, with an analogue to the Riemann zeta function playing a role
similar to the prototype ζ(s). And, it seems as if its “nontrivial zeroes” behave
similarly: as far as things have been computed, they all have the property that
their real part is equal to 1 . That is, we have companion Riemann Hypothesis.
2
This is just the beginning of a much larger story related to what has been
come to be called the “Grand Riemann Hypotheses” and connects to analogous
problems, some of them actually solved, that give some measure of evidence for
the truth of these hypotheses. For example, for any system of polynomials in
a ﬁxed number of variables (with integer coeﬃcients, say) and for each prime
number p there are “zeta-type” functions that contain all the information needed
to count the number of simultaneous solutions in ﬁnite ﬁelds of characteristic
p. That such counts can be well-approximated with a neatly small error term
is related to the placement of the zeroes of these “zeta-type” functions. There
is then an analogus “Riemann Hypothesis” that prescribes precise conditions
on the real parts of their zeroes—this prescription being called the “Riemann
Hypothesis for function ﬁelds.” Now the beauty of this analogous hypothesis is
that it has, in fact, been proved!
Is this yet another reason to believe the Grand Riemann Hypothesis?

Chapter 32

Glossary
Here we give all the connections with the standard literature and conventional
terminology that we restrained ourselves from giving in the text itself.
For the moment the list of entries is the following but it will expand.
π(X) = π0 (X), Q(X) = ψ0 (X), log, exp, δ, distributions, RSA cryptography,
Mersenne prime, Li(x), random walk, spectrum, harmonic, fundamental, frequency, phase, amplitude, band-pass, complex numbers, complex plane, Riemann Zeta function, zeroes of zeta.
Also mention Brian Conrey’s Notices article on RH as “among the best 12
pages of RH survey material that there is—at least for an audience of general
mathematicians.” And mention mention Sarnak and and Bombieri articles at
the CMI website on RH.

123

Endnotes
[1] See, e.g., The Music of the Primes by Marcus du Sautoy.
[2] See page 222 of The Riemann hypothesis: the greatest unsolved problem in
mathematics by Karl Sabbagh (2002).
[3] All of the ﬁgures were created using the free Sage mathematical software
[38]. Complete Sage code is available at http://wstein.org/rh that can
be used to recreate every diagram in this book. More adventurous readers
can to experiment with the parameters for the ranges of data illustrated,
so as to get an even more vivid sense of how the numbers “behave.”.
[4] See Weinberg’s book Dreams of a Final Theory: The Search for the Fundamental Laws of Nature, by Steven Weinberg (New York: Pantheon Books,
1992).
[5] See Chapter IV of the Second Part of the Ingenius Gentleman Don Quixote
of La Mancha.
[6] See Section 1.1 of Stein’s book Elementary Number Theory: Primes, Congruences, and Secrets (2008) for a proof of the “fundamental theorem
of arithmetic”, which is the statement that every positive whole number
factors uniquely as a product of primes. This book is freely available at
http://wstein.org/ent/.
[7] How not to factor the numerator of a Bernoulli number:
As mentioned in Chapter 30, the coeﬃcient Bk of the linear term of the
polynomial
Sk (n) = 1k + 2k + 3k + · · · + (n − 1)k
is (up to sign) the k-th Bernoulli number. These numbers are rational
numbers and, putting them in lowest terms, their numerators play a role
in certain problems, and their denominators in others. (This is an amazing
story, which we can’t go into here!)
One of us (Barry Mazur) in the recent article How can we construct abelian
Galois extensions of basic number ﬁelds? (see http://www.ams.org/
journals/bull/2011-48-02/S0273-0979-2011-01326-X/) found himself
124

ENDNOTES

125

dealing (for various reasons) with the fraction −B200 /400, where B200 is the
two-hundredth Bernoulli number. The numerator of this fraction is quite
large: it is—hold your breath—
389 · 691 · 5370056528687

times this 204 − digit number :

N := 3452690329392158031464109281736967404068448156842396721012
9920642145194459192569415445652760676623601087497272415557
0842527652727868776362959519620872735612200601036506871681
124610986596878180738901486527
and he incorrectly asserted that it was prime. Happily, Bartosz Naskr¸cki
e
spotted this error: our 204-digit N is not prime.
How did he know this? By using the most basic test in the repertoire of
tests that we have available to check to see whether a number is prime:
we’ll call it the “Fermat 2-test.” We’ll ﬁrst give a general explanation of
this type of test before we show how N fails the Fermat 2-test.
The starting idea behind this test is the famous result known as Fermat’s
Little Theorem where the “little” is meant to alliteratively distinguish it
from you-know-what.
Theorem 32.1 (Fermat’s Little Theorem). If p is a prime number, and a
is any number relatively prime to p then ap−1 − 1 is divisible by p.
A good exercise is to try to prove this, and a one-word hint that might lead
you to one of the many proofs of it is induction.1
Now we are going to use this as a criterion, by—in eﬀect—restating it in
what logicians would call its contrapositive:
Theorem 32.2 (The Fermat a-test). If M is a positive integer, and a is
any number relatively prime to M such that aM −1 − 1 is not divisible by
M , then M is not a prime.
Well, Naskr¸cki computed 2N −1 − 1 (for the 204-digit N above) and saw
e
that it is not divisible by N .2 Ergo, our N fails the Fermat 2-test so is not
prime.
But then, given that it is so “easy” to see that N is not prime, a natural
question to ask is: what, in fact, is its prime factorization? This—it turns
out—isn’t so easy to determine; Naskr¸cki devoted 24 hours of computer
e
1 Here’s

the proof:
(N + 1)p ≡ N p + 1 ≡ (N + 1),

where the ﬁrst equality is the binomial theorem and the second equality is induction.
2 and has a residue of 333458110059595302515396973928279031739460667738197064561672528599692566100005682
927273357926209571597827398131150054514508640724258354848985651127636929707992693354028195076056916221
73717318335512037458 after division by N .

126

ENDNOTES
time setting standard factorization algorithms on the task, and that was
not suﬃcient time to resolve the issue. The factorization of the numerators
of general Bernoulli numbers is the subject of a very interesting web site of
Samuel Wagstaﬀ (http://homes.cerias.purdue.edu/~ssw/bernoulli).
Linked to this web page one ﬁnds (http://homes.cerias.purdue.edu/
~ssw/bernoulli/composite) which gives a list of composite numbers
whose factorizations have resisted all attempts to date. The two-hundredth
Bernoulli number is 12th on the list.
The page http://en.wikipedia.org/wiki/Integer_factorization_
records lists record challenge factorization, and one challenge that was
completed in 2009 involves a diﬃcult-to-factor number with 232 digits; its
factorization was completed by a large team of researchers and took around
2000 years of CPU time. This convinced us that with suﬃcient motivation
it would be possible to factor N , and so we asked some leaders in the ﬁeld
to try. They succeeded!
Factorisation of B200
by Bill Hart on 4 Aug 05, 2012 at 07:24pm
We are happy to announce the factorization of the numerator of the
200th Bernoulli number:
N = 389 · 691 · 5370056528687 · c204
c204 = p90 · p115
p90 = 149474329044343594528784250333645983079497454292
= 838248852612270757617561057674257880592603
p115 = 230988849487852221315416645031371036732923661613
= 619208811597595398791184043153272314198502348476
= 2629703896050377709
The factorization of the 204-digit composite was made possible with
the help of many people:
• William Stein and Barry Mazur challenged us to factor this
number
• Sam Wagstaﬀ maintains a table of factorizations of numerators
of Bernoulli numbers at http://homes.cerias.purdue.edu/
~ssw/bernoulli/bnum. According to this table, the 200th
Bernoulli number is the 2nd smallest index with unfactored numerator (the ﬁrst being the 188th Bernoulli number)
• Cyril Bouvier tried to factor the c204 by ECM up to 60-digit
level, using the TALC cluster at Inria Nancy - Grand Est
• yoyo@home tried to factor the c204 by ECM up to 65-digit
level, using the help of many volunteers of the distributed computing platform http://www.rechenkraft.net/yoyo/. after
ECM was unsuccessful, we decided to factor the c204 by GNFS

ENDNOTES

127

• many people at mersenneforum helped for the polynomial
selection. The best polynomial was found by Shi Bai, using his implementation of Kleinjung’s algorithm in CADONFS: http://www.mersenneforum.org/showthread.php?
p=298264#post298264 sieving was performed by many
volunteers NFS@home, thanks to Greg Childers. See
http://escatter11.fullerton.edu/nfs for more details
NFS@home This factorization showed that such a distributed
eﬀort might be feasible for a new record GNFS factorization,
in particular for the polynomial selection. This was the largest
GNFS factorization performed by NFS@home to date, the second largest being 21040 + 1 at 183.7 digits.
• two independent runs of the ﬁltering and linear algebra were
done: one by Greg Childers with msieve (http://www.boo.
net/~jasonp/qs.html) using a 48-core cluster made available by Bill Hart, one by Emmanuel Thom and Paul Zimmermann with CADO-NFS (http://cado-nfs.gforge.inria.
fr/), using the Grid 5000 platform.
• the ﬁrst linear algebra run to complete was the one with CADONFS, thus we decided to stop the other run.
Bill Hart
Verify the above in Sage as follows:
sage: p90 = 1494743290443435945287842503336459830794974542928382\
48852612270757617561057674257880592603
sage: p115 = 230988849487852221315416645031371036732923661613619\
2088115975953987911840431532723141985023484762629703896050377709
sage: c204 = p90 * p115
sage: 389 * 691 * 5370056528687 * c204 == -numerator(bernoulli(200))
True
sage: is prime(p90), is prime(p115), is prime(c204)
(True, True, False) .
[8] Given an integer n, there are many algorithms available for trying to write
n as a product of prime numbers. First we can apply trial division, where
we simply divide n by each prime 2, 3, 5, 7, 11, 13, . . . in turn, and see what
small prime factors we ﬁnd (up to a few digits). After using this method
to eliminate as many primes as we have patience to eliminate, we typically
next turn to a technique called Lenstra’s elliptic curve method, which allows
us to check n for divisibility by bigger primes (e.g., around 10–15 digits).
Once we’ve exhausted our patience using the elliptic curve method, we
would next hit our number with something called the quadratic sieve, which
works well for factoring numbers of the form n = pq, with p and q primes
of roughly equal size, and n having less than 100 digits (say, though the

128

ENDNOTES
100 depends greatly on the implementation). All of the above algorithms—
and then some—are implemented in Sage, and used by default when you
type factor(n) into Sage. Try typing factor(some number, verbose=8)
to see for yourself.
If the quadratic sieve fails, a ﬁnal recourse is to run the number ﬁeld sieve
algorithm, possibly on a supercomputer. To give a sense of how powerful (or powerless, depending on perspective!) the number ﬁeld sieve is, a
record-setting factorization of a general number using this algorithm is the
factorization of a 232 digit number called RSA-768:
n = 12301866845301177551304949583849627207728535695953347921973224521
517264005072636575187452021997864693899564749427740638459251925573263
034537315482685079170261221429134616704292143116022212404792747377940
80665351419597459856902143413
which factors as pq, where
p = 334780716989568987860441698482126908177047949837137685689124313889
82883793878002287614711652531743087737814467999489
and
q = 367460436667995904282446337996279526322791581643430876426760322838
15739666511279233373417143396810270092798736308917.
I encourage you to try to factor n in Sage, and see that it fails. Sage does not
currently (as of 2011) include an implementation of the number ﬁeld sieve
algorithm, though there are some free implementations currently available
(see http://www.boo.net/~jasonp/qs.html).

[9] Nobody has ever published a proof that there is no fast way to factor
integers. This is an article of “faith” among some cryptographers.
[10] The GIMPS project website is http://www.mersenne.org/.
[11] We can use Sage (at http://sagemath.org) to quickly compute the “hefty
number” p = 243,112,609 − 1. Simply type p = 2^43112609 - 1 to instantly
compute p. In what sense have we computed p? Internally, p is now stored
in base 2 in the computer’s memory; given the special form of p it is not
surprising that it took little time to compute. Much more challenging is
to compute all the base 10 digits of p, which takes a few seconds: d =
str(p). Now type d[-50:] to see the last 50 digits of p. To compute the
sum 58416637 of the digits of p type sum(p.digits()).
[12] The sequence of prime numbers we ﬁnd by this procedure is discussed in
more detail with references in the Online Encyclopedia of Integer Sequences
http://oeis.org/A126263.
[13] See http://www.eff.org/press/archives/2009/10/14-0. Also the 46th
Mersenne prime was declared by Time Magazine to be one of the top 50 best
“inventions” of 2008: http://www.time.com/time/specials/packages/
article/0,28804,1852747_1854195_1854157,00.html.

ENDNOTES

129

[14] In contrast to the situation with factorization, testing integers of this size
(e.g., the primes p and q) for primality is relatively easy. There are fast
algorithms that can tell whether or not any random thousand digit number
is prime in a fraction of second. Try for yourself using the Sage command
is prime. For example, if p and q are as in endnote 8, then is prime(p)
and is prime(q) quickly output True and is prime(p*q) outputs False.
However, if you type factor(p*q, verbose=8) you can watch as Sage tries
forever and fails to factor pq.
[15] In Sage, the function prime range enumerates primes in a range, e.g.,
prime range(50) outputs the primes up to 50 and prime range(50,100)
outputs the primes between 50 and 100. Typing prime range(10^8)
in Sage enumerates the primes up to a hundred million in around
a second. You can also enumerate primes up to a billion by typing
v=prime range(10^9), but this will use a large amount of memory, so
be careful not to crash your computer if you try this. Notice that there are
π(109 ) = 50,847,534 primes up to a billion by then typing len(v). You can
also compute π(109 ) directly, without enumerating all primes, using the
command prime pi(10^9). This is much faster since it uses some clever
counting tricks to ﬁnd the number of primes without actually listing them
all.
In Chapter 15 we tinkered with the staircase of primes by ﬁrst counting both primes and prime powers. It turns out that there are comparatively few prime powers that are not prime. Up to 108 , only 1,405
of the 5,762,860 prime powers are not themselves primes. To see this,
ﬁrst enter a = prime pi(10^8); pp = len(prime powers(10^8)). Typing (a, pp, pp-a) then outputs the triple (5761455, 5762860, 1405).
[16] For example, according to http://oeis.org/A007508 there
10,304,185,697,298 such pairs less than 10,000,000,000,000,000.

are

[17] See http://primes.utm.edu/largest.html#twin for the top ten largest
known twin primes.
[18] Hardy and Littlewood give a nice conjectural answer to such questions
about gaps between primes. See Problem A8 of Guy’s book Unsolved Problems in Number Theory (2004). Note that Guy’s book discusses counting
the number Pk (X) of pairs of primes up to X that diﬀer by a ﬁxed even
number k; we have Pk (X) ≥ Gapk (X), since for Pk (X) there is no requirement that the pairs of primes be consecutive.
[19] For more details, see P. Borwein, Sign changes in sums of the Liouville
Function and the nice short paper of Norbert Wiener Notes on Polya’s and
Turan’s hypothesis concerning Liouville’s factor (page 765 of volume II of
Wiener’s Collected Works); see also: G. Polya Vershiedene Bemerkungen
zur Zahlentheorie Jahresbericht de Deutschen Mathematiker-Vereinigung,
28 (1919) 31–40.

130

ENDNOTES

[20] Richard Guy’s book Unsolved Problems in Number Theory (2004).
[21] If f (x) and g(x) are real-valued functions of a real variable x such that
for any > 0 both of them take their values between x1− and x1+ for x
suﬃciently large, then say that f (x) and g(x) are good approximations
of one another if, for any positive the absolute values of their diﬀerence
1
is less than x 2 + for x suﬃciently large. The functions Li(X) and R(X) of
are good approximations of one another.
[22] Very, very brieﬂy tell people about Littlewood’s Theorem, Skewes Number,
etc.
[23] This computation of π(1023 ) was done on a single computer in 2007 by
Oliveira e Silva, and is the largest value of π(X) ever computed. See http:
//www.ieeta.pt/~tos/primes.html for more details.
√
[24] In fact, | Li(X) − π(X)| ≤ X log(X) for all X ≥ 2.01. See Section 1.4.1 of
Crandall-Pomerance’s book Prime numbers, a computational perspective.
[25] Give a proof of this statement in an endnote.
[26] See http://www.maths.tcd.ie/pub/HistMath/People/Riemann/Zeta/
for for the original German version and an English translation.
[27] We have
ψ(X) =

log p
pn ≤X

where the summation is over prime powers pn that are ≤ X.
[28] We recommend to our readers that they download Dave Benson’s marvelous
book Music: A Mathematical Oﬀering from http://www.maths.abdn.ac.
uk/~bensondj/html/music.pdf. This is free, and gives a beautiful account
of the superb mechanism of hearing, and of the mathematics of music.
[29] Discuss some good readable article on the Fast Fourier Transform algorithm; there are probably many such articles.
[30] See http://en.wikipedia.org/wiki/Distribution_%28mathematics%
29 for more about distributions. Also, Schwartz’s explains on page 238 of
his autobiography: “Why did we choose the name distribution? Because,
if µ is a measure, i.e., a particular kind of distribution, it can be considered
as a distribution of electric charges in the universe. Distributions give
more general types of electric charges, for example dipoles and magnetic
distributions. If we consider the dipole placed at the point a having
magnetic moment M , we easily see that it is deﬁned by the distribution
−DM δ(a) . These objects occur in physics. Deny’s thesis, which he defended
shortly after, introduced electric distributions of ﬁnite energy, the only
ones which really occur in practice; these objects really are distributions,
and do not correspond to measures. Thus, distributions have two very

ENDNOTES

131

diﬀerent aspects: they are a generalization of the notion of function, and
a generalization of the notion of distribution of electric charges in space.
[...] Both these interpretations of distributions are currently used.”.
[31] From Wikipedia:
“Generalized functions” were introduced by Sergei Sobolev in
1935. They were independently introduced in the late 1940s by
Laurent Schwartz, who developed a comprehensive theory of distributions.
.
[32] If the Riemann Hypothesis holds they are precisely the imaginary parts of
the “nontrivial” zeroes of the Riemann zeta-function.
[33] The construction of Φ(t) from φ(X):
Succinctly, for positive real numbers t,
Φ(t) : = e−t/2 Ψ (t),
where Ψ(t) = φ(et ), and Ψ is the derivative of Ψ(t) viewed as distribution.
We extend this function to all real arguments t by requiring Φ(t) to be an
even function of t, i.e., Φ(−t) = Φ(t). But, to review this in a more leisurely
pace,
1. Distort the X-axis of our staircase by replacing the variable X by et
to get the function
Ψ(t) := ψ(et ).
No harm is done by this for we can retrieve our original ψ(X) as
ψ(X) = Ψ(log(X)).
Our distorted staircase has risers at (0 and) all positive integral multiples of logs of prime numbers.
2. Now we’ll do something that might seem a bit more brutal: take the
derivative of this distorted staircase Ψ(t). This derivative Ψ (t) is a
generalized function with support at all nonnegative integral multiples
of logs of prime numbers.

132

ENDNOTES

log(1)

log(2)

log(3)

log(5)

log(8)

log(13) log(19)

Figure 32.1: Ψ (t) is a sum of Dirac delta functions at the logarithms of prime
powers pn weighted by log(p) (and log(2π) at 0). The more red the arrow, the
larger the weight.
3. Now—for normalization purposes—multiply Ψ (t) by the function
e−t/2 which has no eﬀect whatsoever on the support.

200
150
100
50
50 100 150 200
Figure 32.2: Illustration of the staircase ψ(x) constructed in Chapter 15 that
counts weighted prime powers.
So what happens when we take the derivative—in the sense of
distributions—of a complicated staircase? For example, see Figure 32.2.
Well, we would have blip-functions (alias: Dirac δ-functions) at each point
of discontinuity of Ψ(x); that is, at x = any power of a prime.
Let us denote by Φ(t) the generalized function that resulted from the above
carpentry:
Φ(t) = e−t/2 Ψ (t)

ENDNOTES

133

.
[34] The cleansing operations: Here is a long list of things we are going to
ˆ
do to the Φ≤C (θ)’s to make them reveal their secrets. These operations are
given in the order in which we will apply them.
ˆ
1. Add to Φ≤C (θ) a certain nice smooth function3 of C and θ that we
will denote by TC (θ). This function divided by C 1/2 is bounded in the
variable C and goes to zero if θ tends to inﬁnity.
So, after this operation we will be dealing with
ˆ
AC (θ) := Φ≤C (θ) + TC (θ)

2. Now let us smooth out AC (θ) by replacing it with the function that
computes its “average” up to C. Now consider
˜
AC (θ) :=

C
c=1

Ac (θ)
.
C

This type of operation on functions is called Cesaro smoothing. Applying Cesaro smoothing to a function UC (θ) gives a new function
1
˜
UC (θ) :=
C

C

Uc (θ)
c=1

with useful properties, at times:
• If, for a given value of θ,
lim UC (θ)

C→∞

actually exists, then the limit also exists for its Cesaro
smoothing, and the limits are equal:
˜
lim UC (θ) = lim UC (θ).

C→∞

C→∞

• But sometimes the ﬁrst limit doesn’t exist and yet the
second one does, and moreover,
• if the ﬁrst limit exists, sometimes the second one converges faster than the ﬁrst.
3 The function is given by the following formula: for C reasonably large and for positive
real θ put
C 1/2
TC (θ) := 1
· cos(θ · log(C)) + 2θ sin(θ · log(C)) .
+ θ2
4

134

ENDNOTES
˜
3. Now we normalize AC (θ) by dividing by log(C), (supposing that C >
1) to get a function we’ll call:
˜
BC (θ) := AC (θ)/ log(C).
If we stop here, this is what the graphs look like:

1
0.8
0.6
0.4
0.2
0

20

40

60

80

100

1
0.8
0.6
0.4
0.2
0

20

40

60

80

100

1
0.8
0.6
0.4
0.2
0

20

40

60

80

100

1
0.8
0.6
0.4
0.2
0

20

40

60

80

100

Figure 32.3: Plots of −BC (θ) for C = 5, 20 (top), 50 and 100 (bottom).

4. But we can also perform a second Cesaro smoothing to get
1
˜
BC (θ) :=
C

C

Bc (θ)
c=2

and this is what the graphs look like now
———
To be done: Should we just drop the second Cesaro smoothing?
———–

ENDNOTES

135

1
0.8
0.6
0.4
0.2
0

20

40

60

80

100

1
0.8
0.6
0.4
0.2
0

20

40

60

80

100

1
0.8
0.6
0.4
0.2
0

20

40

60

80

100

1
0.8
0.6
0.4
0.2
0

20

40

60

80

100

˜
Figure 32.4: Plots of −BC (θ) for C = 5, 20 (top), 50 and 100 (bottom).

Here is a—perhaps surprising—theorem that tells us what sort of
˜
strange functions BC (θ) and BC (θ) are.
Theorem: Suppose the Riemann Hypothesis is true.
(a) The limit
B(θ) := lim BC (θ)
C→∞

˜
exists for every positive real number θ; therefore B(θ) :=
˜
limC→∞ BC (θ) also exists and these limits are equal.
(b) The values of −B(θ) are either zero or positive integers.
˜
(c) The set of arguments θ for which −B(θ) = −B(θ) > 0 is an
inﬁnite discrete set of positive real numbers,
θ1 , θ2 , θ3 , . . .
This collection of real numbers we will now call the Reimann Spec˜
trum of the primes. The value of −B(θ) is called the multiplicity
of θ.
.
[35] The theoretical story behind the phenomena that we have seen graphically in this chapter is yet another manifestation of Riemann’s explicit

136

ENDNOTES
formula. Modern references for this are [Iwaniec-Kowalski] and [Davenport: Multiplicative Number Theory]. (see also the bibliography
in these books.) Many ways of seeing the explicit relationship are given in
Chapter 5 of [I-K]. For example, consider Exercise 5 on page 109.
ˆ
φ(ρ) = −
ρ

Λ(n)φ(n) + I(φ),
n≥1

where
• φ is any smooth complex-valued function on the reals with compact
support,
ˆ
• φ is its Mellin transform:
∞

ˆ
φ(s) :=

φ(x)xs−1 dx,
0

• the last term on the right, I(φ), is just
∞

(1 −

I(φ) :=
1

1
)φ(x)dx
x3 − x

(coming from the pole at s = 1 and the “trivial zeroes”).
• The more serious summation on the left hand side of the equation is
over the nontrivial zeroes ρ, noting that if ρ is a nontrivial zero so is
ρ.
¯
Of course, this ‘explicit formulation’ is not immediately applicable to the
ˆ
graphs we are constructing since we cannot naively take φ to be a function
forcing the left hand side to be GC (x).
See also Exercise 7 on page 112, which discusses the sums
1

x−
|θ|≤C

x 2 +iθ − 1
.
1
2 + iθ

.
[36] People who know that these correction terms are index by the nontrivial
zeroes of the Riemann zeta-function may well ask how we propose to order
them if RH is false; the following prescription will do: order them in terms
of (the absolute value of) their imaginary part, and in the unlikely situation
that there is more than one zero with the same imaginary part, order zeroes
of the same imaginary part by their real parts, going from right to left.
[37] Bombieri, The Riemann Hypothesis, available at http://www.claymath.
org/millennium/Riemann_Hypothesis/riemann.pdf.
[38] The computations for this book were done using Sage, which is a free open
source mathematical software package (http://www.sagemath.org). The
code for the book is available at https://github.com/williamstein/rh.

