- In the discussion thread there seems to be a somewhat contentious argument about what a Markov Model is, and whether LLMs are one, RNNs are one, and so on.
A Markov Model is anything that has state, emits tokens based only on its current state, and undergoes state transitions. The token emissions and state transitions are usually probabilistic -- a statistical/probabilistic analogue of a state machine. The deterministic state machine is a special case where the transition probabilities are degenerate (concentrated at a unique point).
For a Markov Model to be a non-vacuous, non-vapid discussion point, however, one needs to specify very precisely the relationships allowed between state and tokens/observations: whether the state is hidden or visible, discrete or continuous, fixed or variable context length, causal or non-causal ...
The simplest such model is one where the state is a specified, computable function of the last k observations. One such simple function is the identity function -- the state is then just the last k tokens. This is called a k-th order Markov chain and is a restriction of the bigger class -- Markov Models. (A minimal sketch of this case appears at the end of this comment.)
One can make the state a specified, computable function of the k previous states and the k most recent tokens/observations (this is essentially what RNNs do).
The functions may be specified only up to a class of computable functions, finite or infinite in size. They may be stochastic, in the sense that they define only the state transition probabilities.
You can make the context length a computable function of the k most recent observations (so the contexts can be of varying length), but you have to ensure that the contexts are always full for this model to be well defined.
Context length can be a computable function of both the ℓ most recent states and the k most recent observations.
Crazy ones emit more than one token based on current state.
On and on.
Not all Markov Models are learnable.
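To make the simplest case above concrete -- a k-th order Markov chain whose state is just the last k tokens -- here is a minimal sketch in Python (the corpus file and the choice of k are placeholders):

    import random
    from collections import defaultdict

    def train(tokens, k):
        # The state is simply the tuple of the last k tokens; we record
        # every token that was observed to follow each state.
        table = defaultdict(list)
        for i in range(len(tokens) - k):
            table[tuple(tokens[i:i + k])].append(tokens[i + k])
        return table

    def generate(table, k, length):
        state = random.choice(list(table.keys()))
        out = list(state)
        for _ in range(length):
            followers = table.get(tuple(out[-k:]))
            if not followers:  # dead end: this state only appeared at the corpus end
                break
            out.append(random.choice(followers))
        return " ".join(out)

    tokens = open("corpus.txt").read().split()   # word-level tokens
    print(generate(train(tokens, k=2), k=2, length=50))

Sampling uniformly from the recorded followers reproduces the empirical transition probabilities, which is all a Markov chain of this kind needs.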
- Cool article, it got me to play around with Markov models, too! I first did a Markov model over plain characters.
> Itheve whe oiv v f vidleared ods alat akn atr. s m w bl po ar 20
Using pairs of consecutive characters (order-2 Markov model) helps, but not much:
> I hateregratics.pyth fwd-i-sed wor is wors.py < smach. I worgene arkov ment by compt the fecompultiny of 5, ithe dons
Triplets (order 3) are a bit better:
> I Fed tooks of the say, I just train. All can beconsist answer efferessiblementate
> how examples, on 13 Debian is the more M-x: Execute testeration
LLMs usually do some sort of tokenization step prior to learning parameters. So I decided to try out order-1 Markov models over text tokenized with byte pair encoding (BPE).
Trained on TFA I got this:
> I Fed by the used few 200,000 words. All comments were executabove. This value large portive comment then onstring takended to enciece of base for the see marked fewer words in the...
Then I bumped up the order to 2
> I Fed 24 Years of My Blog Posts to a Markov Model
> By Susam Pal on 13 Dec 2025
>
> Yesterday I shared a little program calle...
It just reproduced the entire article verbatim. This makes sense: run to completion, BPE merges every pair of tokens that repeats, so no bigram occurs twice in the final token sequence, which makes the order-2 Markov transitions fully deterministic.
I've heard that in NLP applications, it's very common to run BPE only up to a certain number of different tokens, so I tried that out next.
Before limiting, BPE was producing 894 distinct tokens. Even a slight limit (800) stops the order-2 model from being deterministic (rough sketch below).
> I Fed 24 years of My Blog Postly coherent. We need to be careful about not increasing the order too much. In fact, if we increase the order of the model to 5, the generated text becomes very dry and factual
It's hard to judge how coherent the text is versus the author's trigram approach, because the text I'm using to train my model has incoherent phrases in it anyway.
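For anyone who wants to try the capped-vocabulary setup, a sketch of this kind of pipeline looks roughly like the following; it uses the Hugging Face tokenizers package for the BPE step (which may differ from what I actually used), and blog.txt stands in for the corpus:

    # BPE with a capped vocabulary feeding an order-2 Markov chain over token ids.
    import random
    from collections import defaultdict
    from tokenizers import Tokenizer, models, pre_tokenizers, trainers

    tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(vocab_size=800, special_tokens=["[UNK]"])
    tokenizer.train(["blog.txt"], trainer)

    ids = tokenizer.encode(open("blog.txt").read()).ids

    # Order-2 transition table: (previous two ids) -> observed next ids.
    table = defaultdict(list)
    for a, b, c in zip(ids, ids[1:], ids[2:]):
        table[(a, b)].append(c)

    out = list(random.choice(list(table.keys())))
    for _ in range(200):
        followers = table.get((out[-2], out[-1]))
        if not followers:
            break
        out.append(random.choice(followers))

    print(tokenizer.decode(out))

With an uncapped vocabulary every (a, b) key ends up with exactly one follower, which is why the uncapped order-2 model just replays the article.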
Anyways, Markov models are a lot of fun!
- I did something similar many years ago. I fed about half a million words (two decades of mostly fantasy and science fiction writing) into a Markov model that could generate text using a “gram slider” ranging from 2-grams to 5-grams.
I used it as a kind of “dream well” whenever I wanted to draw some muse from the same deep spring. It felt like a spiritual successor to what I used to do as a kid: flipping to a random page in an old 1950s Funk & Wagnalls dictionary and using whatever I found there as a writing seed.
- I can’t believe no one’s mentioned the Harry Potter fanfic written by a Markov Chain. If you’re familiar with HP, I highly recommend reading Harry Potter and the Portrait of What Looked Like a Large Pile of Ash.
Here’s a link: https://botnik.org/content/harry-potter.html
- I think this is more correctly described as a trigram model than a Markov model. If it would naturally expand to 4-grams when they were available, etc., the text would look more coherent.
IIRC there was some research on "infini-gram", a very large n-gram model that allegedly got performance close to LLMs in some domains, a couple of years back.
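The "expand to longer n-grams when they are available" idea can be sketched as plain backoff sampling: keep a follower table for every order up to some maximum and always use the longest context actually observed. This is purely illustrative and not how infini-gram itself is implemented; corpus.txt and max_order=4 are placeholders:

    import random
    from collections import defaultdict

    def train(tokens, max_order):
        # One follower table per context length, from 1 up to max_order.
        tables = {k: defaultdict(list) for k in range(1, max_order + 1)}
        for k in range(1, max_order + 1):
            for i in range(len(tokens) - k):
                tables[k][tuple(tokens[i:i + k])].append(tokens[i + k])
        return tables

    def next_token(tables, history, max_order):
        # Prefer the longest context that was actually observed, backing off otherwise.
        for k in range(min(max_order, len(history)), 0, -1):
            followers = tables[k].get(tuple(history[-k:]))
            if followers:
                return random.choice(followers)
        return None

    tokens = open("corpus.txt").read().split()
    tables = train(tokens, max_order=4)
    out = tokens[:2]
    for _ in range(60):
        token = next_token(tables, out, max_order=4)
        if token is None:
            break
        out.append(token)
    print(" ".join(out))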
- The one author that I think we have a good chance of recreating would be Barbara Cartland. She wrote 700+ romance novels, all pretty much the same. It should be possible to generate another of her novels given that large a corpus.
- I recall a Markov chain bot on IRC in the mid 2000s. I didn't see anything better until gpt came along!
- Megahal/Hailo (cpanm -n hailo for Perl users) can still be fun too.
Usage:

    hailo -t corpus.txt -b brain.brn

where "corpus.txt" should be a file with one sentence per line. Easy to do under sed/awk/perl.

This spawns the chatbot with your trained brain:

    hailo -b brain.brn

By default Hailo chooses the easy engine. If you want something more "realistic", pick the advanced one mentioned at 'perldoc hailo' with the -e flag.
- First of all: Thank you for giving.
Giving 24 years of your experience, thoughts and lifetime to us.
This is special in these times of wondering, baiting and consuming only.
- You could literally buy this at Egghead Software for $3 from the bargain bin in 1992. I know, because I did. I fed it 5 years' worth of my juvenile rants and laughed at how pompous I sounded through a blender.
https://archive.org/details/Babble_1020
A fairly prescient example of how long ago 4 years was:
https://forum.winworldpc.com/discussion/12953/software-spotl...
- Quick test for Perl users (so anyone there with a Unix-like system). Run these as a NON-root user:

    cpanm -n local::lib
    cpanm -n Hailo
    ~/perl5/bin/hailo -E Scored -t corpus.txt -b brain.brn
    ~/perl5/bin/hailo -b brain.brn

As corpus.txt you can use, for instance, a book from Gutenberg cleaned up with a Perl/sed command. I forgot to put the '-E' flag in my previous comments, so here it is; it selects a more 'complex' engine, so the text output looks less like gibberish.
- Here's a quick custom Markov page you can have fun with (all client-side): https://aperocky.com/markov/
npm package of the markov model if you just want to play with it on localhost/somewhere else: https://github.com/Aperocky/weighted-markov-generator
- Really fascinating how you can get such intriguing output from such a simple system. Prompted me to give it a whirl with the content on my own site.
- When the order argument is cranked up to 4, it looks to the average LLMvangelist like it is thinking.
- I love the design of the website more than the Markov model. Good Job!
- So, are current LLMs better because artificial neural networks are better predictors than Markov models, or because of the scale of the training data? Just putting it out there..
- "I Fed 24 Years of My Blog Posts to a Markov Model" you're not the first who did it. Already dozens of LLMs did it.
- In 2020, a friend and I did this with our mutual WhatsApp chat history.
Except we fine-tuned GPT-2 instead. (As was the fashion at the time!)
We used this one, I think https://github.com/minimaxir/gpt-2-simple
I think it took 2-3 hours on my friend's Nvidia something.
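For reference, the basic workflow in that repo's README looks roughly like this (the model size, step count and file name are illustrative; chat.txt stands in for the exported chat history):

    # Rough gpt-2-simple fine-tuning workflow, per the minimaxir/gpt-2-simple README.
    import gpt_2_simple as gpt2

    gpt2.download_gpt2(model_name="124M")   # fetch the small pretrained model

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess,
                  dataset="chat.txt",       # placeholder for the exported WhatsApp history
                  model_name="124M",
                  steps=1000)               # a few hours on a consumer GPU

    gpt2.generate(sess)                     # sample from the fine-tuned model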
The result was absolutely hilarious. It was halfway between a markov chain and what you'd expect from a very small LLM these days. Completely absurd nonsense, yet eerily coherent.
Also, it picked up enough of our personality and speech patterns to shine a very low resolution mirror on our souls...
###
Andy: So here's how you get a girlfriend:
1. Start making silly faces
2. Hold out your hand for guys to swipe
3. Walk past them
4. Ask them if they can take their shirt off
5. Get them to take their shirt off
6. Keep walking until they drop their shirt
Andy: Can I state explicitly this is the optimal strategy
- > Also, these days, one hardly needs a Markov model to generate gibberish; social media provides an ample supply.
- I just realized, one of the things that people might start doing is making a gamma model of their personality. It won't even approach who they were as a person, but it will give their descendants (or bored researchers) a 60% approximation of who they were and their views. (60% is pulled from nowhere to justify my gamma designation, since there isn't a good scale for personality-mirror quality for LLMs as far as I'm aware.)
- Now I wonder how this would compare against feeding the same corpus into a GPT-style transformer of a similar order of magnitude in parameter count.
- This is a nice Markov text generator: https://cdn.cs50.net/ai/2023/x/lectures/6/src6/markov/#
- It laughed and gave him a kiss
- When I was in college my friends and I did something similar with all of Donald Trump’s tweets as a funny hackathon project for PennApps. The site isn’t up anymore (RIP free heroku hosting) but the code is still up on GitHub: https://github.com/ikhatri/trumpitter
- Damn interesting!
- Should call it Trump Speech Generator. Loads of gibberish.
- I usually have these kinds of hypothetical technical discussions with ChatGPT (I can share them if you like), asking it exactly this: aren't LLMs just huge Markov chains?! And now I see your project... funny.