A few months ago I discovered "self-describing" numbers, like for example 12143133, which includes one two, one four, three ones and three threes, abbreviated: 1 2, 1 4, 3 1s and 3 3s, put together 12143133 again.

I wanted to figure out the pattern behind them. But first some additional restrictions:

Because you can arbitrarily rearrange the number pairs (31331412, 12143331, 14123331, …), I excluded rearrangements.

Because you can choose arbitrary numbers where you say "one <?>" (12153133, 12163133, 12173133, …), I chose to only include one set of arbitrary digits per set of digits that actually "matter". Completely excluding those would limit the entire set of self-describing numbers to only the number 22 (proving that is hard, see followup posts).

Some numbers also partially describe themselves, like 1421, which does include one four and two ones, but doesn't actually tell you how many twos it contains. I excluded those as well.

Some numbers describe themselves redundantly, like 4444, which does include four fours and four fours, but the second pair doesn't tell you anything new, so I excluded those as well.

And a clarification: The base doesn't actually matter at all. You can pretend that "12143133" was written in base 5, base 8, base 10, base 16, base 42069, …, it always gives you a self-describing number.

So then I started writing a program that generates these numbers. First brute force checking every number (1111, 1112, 1113, 1114, …), then with more and more optimisations. The brute force version didn't even finish going through the 12-digit numbers in an entire night, but after a LOT of optimisations it took less than 0.15 seconds. These optimisations already came from a lot of interesting mathematical considerations. Then I rewrote a big part of the program in a new class to make it even faster, but before I could re-integrate that into the program, I had to pause because I didn't have much free time anymore.

At that point I had already come up with a way to generate arbitrarily long self-describing numbers, but I had no idea whether there would be any more that didn't follow this pattern.

Today I wanted to continue and mainly just clean up the program, comment a few things and so on, so that I could attach it in applications for programming jobs. But after not even 30 minutes of thinking about the program, I came up with a MUCH better algorithm that essentially skips all the number trying and generates a very small amount of numbers to check. For example the latest version of the program tried 341907 12-digit numbers, this new one would have had to try only 10. I could have thrown away most of the program and rewritten it for a performance gain with an estimated factor of a few million again (for bigger numbers). But then I optimised this idea even further, for many hours today, until I had it down to only 2 12-digit numbers to check and for example only 4 20-digit numbers. And then I thought about it even more and eventually I proved to myself that the pattern I already knew for weeks actually contained ALL self-describing numbers (at least with a length of 14 or more).

Sooo… I now have no program to write anymore. I started with a better algorithm and optimised this algorithm so far that I ended up with practically nothing. This is amazing from a mathematical standpoint, it's really great to know the exact pattern to generate these numbers, but I also don't have a program anymore with which I can brag about my programming and algorithm optimisations skills. I basically did my job too well. :D

About the proof that all self-describing numbers follow this pattern: That's pretty complicated to write down. I'll do it at some point, but not in this post, it will be a followup post at a later date. But the pattern itself is pretty easy to understand, I don't think I even need to explain it:

```
22
12143133
14212332
1415223133
(pattern starts here)
15161723243241
1416171823253251
141517181923263261
…
```

Edit: I wrote down the proof now, split into three posts:

Part 1: https://projectatmos.space/profile/1p8WCZnqqG6N3ZOsJxBgUTo/p1SJCxFjA6yCA52Hz

Part 2: https://projectatmos.space/profile/1p8WCZnqqG6N3ZOsJxBgUTo/p1zcNRQIrjGHqXHN0

Part 3: https://projectatmos.space/profile/1p8WCZnqqG6N3ZOsJxBgUTo/p1mZOwFRzbxYBjidY

Oh damn, that's amazing. Have you studied number theory? I'm sure some mathematician has thought of this already but the fact that you figured this all out by yourself is hella cool.

Nope, I've not studied. But a few months ago I finished my apprenticeship as mathematical-technical software developer. And I always had an interest in Maths, especially mathematically optimising programs.

BTW, Atmos bug found: If the triple backticks are not in their own lines, they do work, but also remove information from the first line and appear themselves in the last line.

(edited 1 year ago)

No, that's just how markdown works

The text on the same line is the programming language name, which is in theory used for syntax highlighting

e.g.

```php

echo 'Hello world';

```

if you post that and inspect element you'll see it adds the class "language-php" to the <code> tag. it's unused though.

Just so this is written down somewhere; Here are some of the optimisations I made BEFORE I came up with the "new approach" that ended up reducing the program to nothing but a mathematical proof:

• The obvious one: Multithreading. The main loop went over the length of numbers (4, 6, 8, …), then it created as many new threads as there were numbers to try for the second digit (not the first digit, see next point). This was not a completely equal separation of calculation time, but the bigger the number was, the more it approximated 100% CPU usage, because for example a 100 digit number would have used 50 threads, so the CPU would have been used 100% until the first 42 threads were completely done. The numbers that went through the first few basic validation steps got added to a "synchronizedList" of the main class, which was acceptable, because the basic validations already excluded way over 99% of all numbers, so the threads pretty much never had to pause to wait for write access to the list. This list was then checked further once all threads were done. Even though this part at the end was already pretty fast, I still optimised it more, for example by implementing a variation of insertion sort for the first time in my life.

• The first digit always had to be 1 (for 4 or more digits). Back then I hadn't thought about the entire "describers first" thing, so I scribbled down a very complicated proof that a 1 was always first.

• The second digit could never be more than length/2. Also in this case I had considered way more cases than necessary, but using the proof linked in this post, you can clearly see that it would have needed more describer 1s than describers are available if any digit was bigger than length/2.

• The describers had to at least be as high as the previous describer and at most the length minus the sum of all previous describers, divided by the number of remaining describers. That's a complicated way of saying that the describers are sorted in ascending order, so they had some maximum possible value at each position, which can be figured out pretty early and easily, thereby discarding huge amounts of numbers to check.

• The last two digits could be generated without trying many alternatives, I only had two different cases to consider. The describer was just the length minus the sum of all other describers and the described digit… honestly, I don't even understand it myself anymore, this code is pretty messy, reason in the next bullet point.

(edited 6 months ago)

• I optimised every little bit of the code. I researched a lot about which alternative ways of programming had better or worse performance in Java, for example I used arrays instead of lists for almost everything, I didn't use any method calls apart from the initial thread start (because of the tiny overhead it causes), I even refactored the entire program multiple times, especially to save the garbage collector as much work as possible (for example by reusing variables wherever possible), because at 100% CPU usage every bit of garbage collector work is less CPU time for the program itself. On one day I measured how much a few hours of doing these tiny optimisations combined sped up the program, it was a factor of about 2. The downside of programming like this was just that the program was pretty hard to read. Not because of code style, but because what the program actually did was incredibly complex. I would definitely not recommend going into such extremes for performance in any project where performance isn't everything that counts. Or anything where other people need to read the code. That's also why I never showed the code itself to anyone, it's already confusing myself, even with about every tenth line being a comment.

• I used a special way of looping over the numbers (from the third digit on), which already included multiple big optimisations in just the loop head. Again, I honestly don't fully understand it anymore today, I programmed this months ago:

I'm pretty sure that

`maxNumber`

had something to do with the length.• I planned to replace the ~10 line number generator with a new one that was 130 lines long and included way more optimisations. This would have sped up the program by another massive factor once again, to a total speedup relative to the original version in the billions or trillions (also depends on the length of the number, it might have even changed it from an exponential or quadratic runtime behaviour to a logarithmic one). But before I could integrate it into the main program I had to pause for multiple weeks and after my pause I came up with the way faster algorithm (that I then optimised into nothingness) within 30 minutes of thinking about the program again.

The amount of numbers that were even checked is also interesting. The original brute-force version that only worked with base 10 needed 10^6, 10^8, 10^10, … tries for the numbers with 6, 8, 10, … digits. The last actually fully programmed version (which worked for arbitrary bases) needed 24, 536, 17179, 574170, 14314415, 556353114, 843787044 tries for 6, 8, 10, 12, 14, 16, 18 digits. Its number generator was a tiny bit slower, because it was more complex, but this version also discarded invalid numbers WAY faster, so it took even less time per checked number. The "new" generator that I had programmed in a new class would have had to check only 4, 23, 66, 156, 442, 1151, 3331 numbers with 6, 8, 10, 12, 14, 16, 18 digits. And the latest version, the mathematical concept: 0, 2, 1, 0, 1, 1, 1.

But now an actual Atmos bug, a pretty bad one: Attempting to submit a comment with over 3000 characters fails silently. Luckily I copied the entire thing to the clipboard before clicking "send".

The character limit is supposed to be there. Not preserving your message is the bug

(edited 8 months ago)

I recently wrote a program that generates these numbers according to the ideas specified here, to actually have this "runtime optimisation" project done:

The

`print`

method prints anything to console that you throw at it, in an expected format. You could imagine it being replaced with`System.out.println(deepToString([…]))`

for every argument, it acts very similarly to that in this context.(edited 8 months ago)

Two other remarks:

· It's a bit misleading to look at these as digits of a number, it's more like numbers in a list. So "number bases" don't actually apply here, because it doesn't matter where in the list you put the pairs of numbers and nothing ever ticks over to the next "digit".

· All self-describing numbers are the (cycling) end point for a sequence of numbers where each describes the last one. For example, if you start at

`1`

and always describe the previous number, you get this:The last 4 entries there are just

`14212332`

reordered.If you reorder the pairs at every step, you get this:

Maybe I should also one day look at which starting points go to which self-describing number.

BTW, this post is the one that triggered me to look at these numbers this much: https://www.reddit.com/r/Showerthoughts/comments/al66ow/the_number_14233221_describes_itself_it_has_one/

I had already heard about it before, but after I read that, I started this project.

The restrictions that I put on the numbers myself, like ascending describers, only one set of 1-described digits, etc. are not really that arbitrary, they simplify the problem without removing information. You can use the proof and then afterwards remove the limitations to get more self-describing numbers by just exchanging the digits described by 1 and/or reordering pairs. That still gives you a pretty easily predictable pattern for all possible self-describing numbers with the looser definition.

Another alternative problem I have considered recently: If only the describers were to be counted, that would lower the count of every digit by 1, which would have cascading effects for the rest of the number, so you would get an entirely different sequence. I feel like the pattern would be similar and the proof for that as well, but I don't know yet. That's one of the great things about Maths, there's always more to ask and to discover!

Maybe a better sorting would be to sort by the described digit. That would not give you the lowest possible number for each combination and it would have been much harder to come up with that proof, but it would follow more naturally from starting with 1 and always describing the previous number.