### Pattern recognition

The question is: how do you recognize patterns?

Let's approach it systematically.

What is a pattern? If you look at a string like [AAAAAAA...], it is clear that there is a simple pattern here: A is always followed by an A. Expressing this as a pattern, however, is a bit problematic. Which pattern should we choose? [AA]? Or [AAA]? Or [A]?
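One way to sidestep the ambiguity is to always pick the shortest unit that reproduces the string. The following is a minimal sketch of that convention, not one of the algorithms I actually tried; the function name `smallest_period` is made up for illustration, and it assumes the input is a clean, noise-free repetition (possibly cut off in the middle of the last copy).

```python
def smallest_period(s: str) -> str:
    """Return the shortest prefix whose repetition reproduces s.

    The last copy may be truncated, so "ABCAB" still yields "ABC".
    """
    n = len(s)
    for p in range(1, n + 1):
        # s has period p if every symbol equals the one p positions earlier.
        if all(s[i] == s[i - p] for i in range(p, n)):
            return s[:p]
    return s  # only reached for the empty string


print(smallest_period("AAAAAAA"))       # A
print(smallest_period("ABABABAB"))      # AB
print(smallest_period("ABCABCABCABC"))  # ABC
```

Under this convention [AAAAAAA...] gets the pattern [A], simply because nothing shorter exists. It is a tie-breaking rule, not an argument that [AA] or [AAA] would be wrong.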

Then another pattern: [ABABABAB...]. This is simpler. The pattern is [AB]. Translate [AB] to [X] and you have the output [XXXX...], which is the same as the problematic pattern above.

Then you can go on to [ABCABCABCABC...], where the pattern is [ABC].

Then what about [ABABXYABXYABXYXYABABXYXY], with two patterns ([AB] and [XY]) repeating in a seemingly random order? Here we should learn the [AB] and [XY] patterns, translate them to [T] and [U], output [TTUTUTUUTTUU...], and let the next level find the higher-order patterns.
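The two steps here (learn the low-level patterns, then translate) can be sketched as follows. This is only an illustration under strong assumptions: patterns are exactly two symbols long, the most frequent adjacent pairs are taken to be the patterns, and the names `learn_pairs` and `translate` are hypothetical helpers, not anything from an existing library.

```python
from collections import Counter


def learn_pairs(s: str, k: int) -> list[str]:
    """Return the k most frequent adjacent symbol pairs in s (a crude heuristic)."""
    counts = Counter(s[i:i + 2] for i in range(len(s) - 1))
    return [pair for pair, _ in counts.most_common(k)]


def translate(s: str, codebook: dict[str, str]) -> str:
    """Replace each known pattern with its symbol, scanning left to right."""
    # Try longer patterns first so overlapping candidates resolve predictably.
    patterns = sorted(codebook, key=len, reverse=True)
    out = []
    i = 0
    while i < len(s):
        for p in patterns:
            if s.startswith(p, i):
                out.append(codebook[p])
                i += len(p)
                break
        else:
            out.append(s[i])  # unknown symbol passes through unchanged
            i += 1
    return "".join(out)


text = "ABABXYABXYABXYXYABABXYXY"
pairs = learn_pairs(text, 2)                 # the two most frequent pairs
codebook = dict(zip(sorted(pairs), "TU"))    # assign T and U in sorted order
print(translate(text, codebook))             # TTUTUTUUTTUU
```

The output string can then be fed to the next level, which sees only [T] and [U]. Counting adjacent pairs happens to work on this input because [AB] and [XY] clearly outnumber the accidental pairs such as [BX] or [YA]; there is no guarantee it does so in general.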

Then there is the problem of noise. What if you have [ABABABXABABXABABXABAB] with a regular [AB] sometimes interrupted by an X?
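To make the noise case concrete, here is a small sketch that assumes the regular pattern is already known and simply separates matches from interruptions. The helper name `split_noise` is hypothetical, and the hard part, deciding that [AB] is the pattern and X is the noise in the first place, is deliberately left out.

```python
def split_noise(s: str, pattern: str) -> tuple[int, list[tuple[int, str]]]:
    """Greedily match `pattern`; record every symbol that does not start a match.

    Returns the number of matches and a list of (index, symbol) noise entries.
    """
    matches = 0
    noise = []
    i = 0
    while i < len(s):
        if s.startswith(pattern, i):
            matches += 1
            i += len(pattern)
        else:
            noise.append((i, s[i]))  # treat the stray symbol as noise and skip it
            i += 1
    return matches, noise


matches, noise = split_noise("ABABABXABABXABABXABAB", "AB")
print(matches)  # 9
print(noise)    # [(6, 'X'), (11, 'X'), (16, 'X')]
```

Skipping one symbol at a time is the simplest possible recovery rule. It would misfire if the noise symbol were itself an A or a B, which is part of what makes the problem hard.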

Based on these cases, I started to experiment with different algorithms. Some of them work well in simple cases, others work well in longer contexts, and others perform equally poorly on any input pattern.

It seems there is still a long way to go...

I'll come back to keep you informed.
