In this note, we will use Python to explore the following problem: How many primes are there whose average digit is 1? In other words, how many \(k\)-digit prime numbers exist whose sum of digits is \(k\)?

Here is a direct approach. We generate all prime numbers up to some limit and calculate the sum of digits for each prime. We use the `primerange`

function from the SymPy package to generate the prime numbers.

from sympy import primerange
def digitsum(n):
s = 0
while n > 0:
s += (n % 10)
n //= 10
return s
def primes_avg1(N):
for n in primerange(2, N):
if digitsum(n) == len(str(n)):
yield n

Listed below are the prime numbers less than a million whose average digit is 1. Note that there are no examples with 3 or 6 digits. In fact, there is never an example with \(3k\) digits, because a number is divisible by 3 if and only if the sum of its digits is divisible by 3.

print list(primes_avg1(10**6))
[11, 1021, 1201, 2011, 3001,
10103, 10211, 10301, 11003,
12011, 12101, 13001, 20021,
20201, 21011, 21101, 30011]

This method is inefficient for large numbers because prime numbers are relatively common, even when the number of digits is large. According to the prime number theorem, the number of prime numbers less than \(N\) is approximately \(N/\ln(N)\). The exact number of primes having \(k\) digits or fewer has been computed up to \(k = 26\). (See http://en.wikipedia.org/wiki/Prime-counting_function)

But the number of \(k\)-digit numbers with average digit 1 increases more slowly as a function of \(k\) than the number of \(k\)-digit prime numbers. We can estimate this number as follows. Any positive integer can be represented by a pattern of stars and bars, where the bars separate the stars into groups. For example, the pattern shown below represents the number 4102 because there are 4 stars in the first group, 1 star in the second group, 0 stars in the third group, and 2 stars in the fourth group.

★ ★ ★ ★ | ★ | | ★ ★

If a number has \(k\) digits that sum to \(k\), then it can be represented by a certain arrangement of \(k\) stars and \((k-1)\) bars. The total number of permutations of \(k\) stars and \((k-1)\) bars is

\[\binom{2k-1}{k-1} = \frac{(2k-1)!}{(k-1)!\,k!}\]

which is less than \(2^{2k-1}\), since the sum of the numbers in row \((2k-1)\) of Pascal’s triangle is \(2^{2k-1}\). Therefore, the number of \(k\) digits with sum of digits \(k\) is less than \(2^{2k-1}\). Although this quantity grows exponentially with \(k\), it grows much more slowly than the number of \(k\)-digit primes.

The exact count of \(k\)-digit numbers with digit sum \(k\) is given by the following summation, which can be obtained by means of the inclusion-exclusion principle.

\[\sum_{j\ge0} (-1)^j \binom{k}{j} \binom{2k-10j-2}{k-1} .\]

**Wait, where are you going with this?**

Since \(k\)-digit primes are common, while \(k\)-digit numbers with digit sum \(k\) are rare, we realize that filtering a list of primes was a mistake. It is much better to generate a list of numbers having average digit 1, and test each of these numbers for primality. Fortunately, this list can be generated using an efficient recursive algorithm.

def gen(k, s):
"""k-digit numbers
with digit sum s"""
if k == 1:
if s < 10:
yield s
else:
for a in range(min(s, 10)):
for m in gen(k-1, s-a):
yield 10*m + a

As a further refinement, we can use the fact that the last digit of a prime greater than 5 is always 1, 3, 7, or 9. Here is our Python code for generating \(k\)-digit prime numbers with sum of digits \(s\).

from sympy.ntheory import primetest
def primes(k, s):
for a in 1, 3, 7, 9:
if s <= a:
break
for m in gen(k-1, s-a):
n = 10*m + a
if primetest.isprime(n):
yield n

We can use this to function to count the number of \(k\)-digit primes with digit sum \(k\). Since we are using generators instead of lists, our code does not use a lot of memory. However, it is time consuming to test a large number for primality, so a probabilistic test is used.

for k in range(2, 18):
if k % 3 > 0:
print k, sum(1
for p in primes(k, k))

The output is shown below. The code uses the Miller-Rabin primality test, which is guaranteed to be correct for \(n < 10^{16}\) but has a small chance of error \((<10^{-28})\) for large inputs. Consequently, the last value in the table should be viewed as highly probable, but not certain.

Digits |
Count |

2 |
1 |

4 |
4 |

5 |
12 |

7 |
95 |

8 |
212 |

10 |
2395 |

11 |
10657 |

13 |
126068 |

14 |
375941 |

16 |
4943357 |

17 |
20513691 |