My project was a Random Text Generator. The whole idea was essentially based on Markov chains, which are named after Andrei Markov, the mathematician who produced the first results for these processes. Dr. Sabieh Anwar says mathematicians aren’t humans, they are superhumans. So, even though Markov chains are related to some fundamental physical concepts such as Brownian motion and the Ergodic hypothesis, Markov’s motivation was purely mathematical, placing them squarely in probability theory. Thus, to me the subject looks intensely complicated.
Formally, a Markov chain is defined as a “discrete random process with the Markov property that goes on forever.” By now, I suppose we all know very well what discrete means. A discrete random process, then, is one in which the system changes between ‘discrete’ states ‘randomly’. Having the Markov property means that, given the present state, the future states do not depend on the past states: the current state carries ‘all’ the information necessary for the evolution of the process. And what happens next (the future states) is reached through a purely probabilistic process.
A good example I found to explain this is the comparison between board games, say snakes and ladders, and card games, say blackjack. In snakes and ladders, the future state depends only on the current state of the board and the number the die will give. It doesn’t matter ‘how’ the current state of the board was reached, and it doesn’t affect what will happen next in the game. So the process is a Markov chain. In blackjack, however, ‘how’ the current state was reached matters a lot. The cards serve as a memory: a player who remembers which cards have been shown and which are still in the deck gains an advantage over other players, so the game is not independent of its past states.
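The snakes-and-ladders intuition can be sketched in a few lines of Python. The board below is a made-up miniature (the jump positions are my own invention, not a real layout); the point is that the `step` function needs only the current square, never the history of the game:

```python
import random

# Hypothetical miniature board: ladders send you up, snakes send you down.
JUMPS = {3: 11, 6: 17, 9: 2, 18: 7}
LAST = 20  # winning square

def step(position):
    """One turn: the next position depends only on the current square
    and the die roll -- not on how the player got there (Markov property)."""
    roll = random.randint(1, 6)
    new = position + roll
    if new > LAST:
        return position          # overshoot: stay where you are
    return JUMPS.get(new, new)   # climb a ladder or slide down a snake

# Play one game (bounded, just to keep the demo finite).
pos, turns = 0, 0
while pos != LAST and turns < 1000:
    pos = step(pos)
    turns += 1
print(f"finished at square {pos} after {turns} rolls")
```

Nothing about the player’s past appears anywhere in `step` — that is exactly what makes the game a Markov chain, and what blackjack violates.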
Markov chains have uses in many disciplines, including Physics, Chemistry, Mathematical Biology and Economics, as well as gambling, music and more. In Physics, Markovian systems appear extensively in Thermodynamics and Statistical Mechanics. And for anyone who took the Biochemistry or Mathematical Biology course this semester, the classical model of enzymatic activity, Michaelis-Menten kinetics, is a Markov chain. But what is even more amazing, and slightly confusing to me, is that Markov chains can also be used to simulate brain function, such as that of the mammalian neocortex, which is involved in higher brain functions like sensory perception, motor commands, conscious thought and language.
But the most interesting thing you can do with Markov chains is to make a Markov text generator. The text generator produces superficially ‘real-looking’ text based on a sample text. It works like this: to begin with, it picks a word randomly from the original sample text, suppose it’s “the”. The program knows which words occur after “the” in the original text and how many times each of them does, so each candidate word has a different probability. The next word produced depends on these probabilities, and the process keeps occurring for every word produced. But this is only a first-order Markov chain, meaning that word (n) depends only on word (n-1). An order-2 Markov chain would have word (n) depending on word (n-1) ‘and’ word (n-2). The new text is thus produced by a completely probabilistic process and is absolute gibberish, but it looks amazingly like the original sample text. The higher the order of the Markov chain, the greater the similarity between the new text and the sample. I heard that some MIT students produced a research paper using many other research papers as their sample texts and it actually got accepted in one of the journals.
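A minimal sketch of such an order-1 generator in Python (the sample sentence and function names are my own, purely for illustration):

```python
import random

def build_chain(text):
    """Map each word to the list of words that follow it in the sample.
    Repeats in the list encode the frequencies the generator samples from."""
    words = text.split()
    chain = {}
    for current, nxt in zip(words, words[1:]):
        chain.setdefault(current, []).append(nxt)
    return chain

def generate(chain, start, length):
    """Walk the chain: each new word depends only on the word before it."""
    word = start
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:                 # dead end: word only appears last
            break
        word = random.choice(followers)   # probability proportional to frequency
        out.append(word)
    return " ".join(out)

sample = "the cat sat on the mat and the cat ran"
chain = build_chain(sample)
print(generate(chain, "the", 8))
```

An order-2 version would key the dictionary on pairs of words instead of single words, which is why higher orders reproduce the sample more faithfully.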
My personal favorite was an Oscar Wilde text. The result was hilarious! It really gives a new dimension to ‘playing with words’.
While doing the research for my project, I found out that it was closely related to the ‘Infinite Monkey Theorem’. “The infinite monkey theorem revolves around the idea that a monkey hitting random keys on a typewriter for an infinite amount of time will almost surely type a given text, usually defined as the complete works of William Shakespeare”.
This theorem had incredible and far-reaching consequences. The physicist Arthur Eddington wrote in The Nature of the Physical World (1928):
If I let my fingers wander idly over the keys of a typewriter it might happen that my screed made an intelligible sentence. If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favourable than the chance of the molecules returning to one half of the vessel.
Eddington wanted us to consider the great improbability of a large but finite number of monkeys working for a large but finite amount of time producing a great body of work, and to compare it with the even greater improbability of certain physical events (here, a spontaneous decrease in entropy). What is significant is that anything even less probable than the monkeys succeeding is, in effect, impossible.
Another argument involves Evolution. Reverend John F. MacArthur claimed:
The genetic mutations necessary to produce a tapeworm from an amoeba are as unlikely as a monkey typing Hamlet's soliloquy, and hence the odds against the evolution of all life are impossible to overcome.
But the argument has also been used in favour of Evolution, so there’s really no telling.
The Argentine writer Jorge Luis Borges wrote in his essay The Total Library (1939) (Also mentioning the same idea in his short story The Library of Babel later):
Everything would be in its blind volumes. Everything: the detailed history of the future, Aeschylus' The Egyptians, the exact number of times that the waters of the Ganges have reflected the flight of a falcon, the secret and true nature of Rome, the encyclopedia Novalis would have constructed, my dreams and half-dreams at dawn on August 14, 1934, the proof of Pierre Fermat's theorem, the unwritten chapters of Edwin Drood, those same chapters translated into the language spoken by the Garamantes, the paradoxes Berkeley invented concerning Time but didn't publish, Urizen's books of iron, the premature epiphanies of Stephen Dedalus, which would be meaningless before a cycle of a thousand years, the Gnostic Gospel of Basilides, the song the sirens sang, the complete catalog of the Library, the proof of the inaccuracy of that catalog. Everything: but for every sensible line or accurate fact there would be millions of meaningless cacophonies, verbal farragoes, and babblings. Everything: but all the generations of mankind could pass before the dizzying shelves—shelves that obliterate the day and on which chaos lies—ever reward them with a tolerable page.
But everything about the theorem is so subtle, and the concepts of infinity, probability and time so far beyond average human experience and practical comprehension, that you can’t really encompass the whole meaning of it. I hope they explain this in our Probability course next semester. Fat chance, though.