Deception Detection In Non Verbals And Linguistics

Unmasking The JonBenet Ransom Note With Stylometry Software

The tragic and bizarre murder of JonBenet Ramsey is 20 years old. The ransom note from this case is analysed using the latest stylometric software to determine the authorship.

The program Jstylo has Writeprints as its backbone, which "automatically extracts thousands of multilingual, structural, and semantic features to determine who is creating 'anonymous' content online. Writeprint can look at a posting on an online bulletin board, for example, and compare it with writings found elsewhere on the Internet.

By analyzing these certain features, it can determine with more than 95 percent accuracy if the author has produced other content in the past." (University of Arizona)

The software uses "cutting-edge technology and novel new approaches to track their moves online, providing an invaluable tool in the global war on terror". (University of Arizona)

Over the years dozens of handwriting studies have been done, but this long, rambling, strange and bizarre ransom note was designed to disguise handwriting (the letter a, for example, changes 6 times in its construction), so logically there would never be a match that would stand up in court.

Drexel University released authorship attribution software called Jstylo, which piqued my interest in this murder case and the ransom note.

There are 375 words in the ransom note. Forsyth and Holmes show that a minimum of 250 words is required to attribute a document to an anonymous author.
R. S. Forsyth and D. I. Holmes, “Feature-finding for text classification,” Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.

This made it viable to test the ransom note against writing from Patsy and John Ramsey.

The software has been shown to be effective, with accuracy rates of around 80% in identifying anonymous users on hack forums, rising to 93%-97% in identifying a target document from among 50 authors (Abbasi and Chen). Rates drop to around 90% for 100 authors.

It's also been used to identify programming source code authorship.

I downloaded their software, Jstylo.

This is superb for a few reasons: it has embedded in it WEKA, an incredible data mining suite from the University of Waikato, NZ, and also WRITEPRINTS, the gold standard forensic stylometric characteristic generator for author identification, with an automated interface.

Combined together, over 800 variables are created by Writeprints Limited for each piece of text, which is then analysed by Weka for a linguistic "fingerprint" amongst all the test samples you give it.

Stylometry is the statistical analysis of writing style to identify authorship. This style involves many "invisible" words such as articles, function words, adverbs and pronouns, which become unique to us as we develop our writing style. It is not just a frequency count of obvious words. The hidden, unconscious aspect of this makes it ideal for computer analysis. (James Pennebaker, The Secret Life Of Pronouns)
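To make the "invisible words" idea concrete, here is a minimal sketch in Python. It is not how Writeprints works internally; it only shows the simplest kind of feature a stylometry tool might compute. The word list and the sample sentence are my own illustrative assumptions, and real tools use far longer lists and many other feature types.

```python
import re
from collections import Counter

# Illustrative, hand-picked function words (an assumption for this sketch).
FUNCTION_WORDS = ["the", "a", "an", "and", "of", "to", "in", "you", "we", "i", "it", "not"]

def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}

# Hypothetical sample sentence, not taken from any case document.
profile = function_word_profile("The cat sat on the mat and the dog slept.")
print(profile["the"], profile["and"])
```

Two writers describing the same event will usually share content words but differ in a profile like this one, which is why these small words are useful for attribution.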

There was always suspicion on the parents because of their strange behaviour. There is interesting video interview footage on the internet, where they ask Patsy if she would take a lie detector test and she says she would whilst simultaneously shaking her head in a no motion, a classic incongruity between what was said and done (see former FBI agent Joe Navarro's book What Every Body Is Saying for more on this nonverbal cue).

Deceptive people use language differently from innocent people (see ten Brinke and Porter, Psychology, Crime & Law, 2015). Another interesting study on language changes in deception relates to Dutch professor Diederik Stapel, who reported false data in 25 of his academic papers. The study compared his 25 fraudulent papers with his 25+ legitimate papers. Academic Fraud Study

The outcome: "This research supports recent findings that language cues vary systematically with deception, and that deception can be revealed in fraudulent scientific discourse."

For this post, I will only look at the stylometric aspect of the ransom note. A future post will look at the linguistics of this case.

Above: page 1 of the two and a half page ransom note.

I located 5 notes written by Patsy Ramsey, including 1995 and 1996 Xmas notes. I haven't had much luck locating anything sizable written by John Ramsey, however.

But I needed a placebo--lots of random notes and emails to test against.

Many universities are using the Enron Email Corpus from Carnegie Mellon--

The email servers were seized during the Enron fraud trials where a dozen executives went to jail. After the court case, the emails were acquired by a university and have been made available for various political/social studies. It is the largest email corpus (1.5 million emails), which shows day to day life in a large corporation.

The emails make a perfect training set and have been used as such in various studies, including models that can identify male and female writing with 80% probability.

Schein and Caver show that attribution accuracy is greatly affected by topic. I've tried to avoid this by using the Enron dataset, where topics vary greatly.

The reason the Enron corpus is being used by the University of British Columbia and others for language and social engineering studies is that Enron was in effect a small city -- it was a vast corporate structure that had thousands of daily emails on all subjects, from business, to small talk to flirting to deception.

I downloaded the Enron corpus and randomly selected about 60 emails and added the Ramsey letters mentioned above.
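For anyone repeating this step, a seeded random draw keeps the selection free of conscious choice and lets someone else reproduce it. This is only a sketch: the file names are hypothetical placeholders, and the sample size of 60 simply mirrors the number mentioned above.

```python
import random

def sample_emails(email_paths, k=60, seed=0):
    """Pick k emails at random without replacement, reproducibly."""
    rng = random.Random(seed)        # fixed seed, so the same draw can be repeated
    k = min(k, len(email_paths))     # never ask for more than exist
    return rng.sample(list(email_paths), k)

# Hypothetical file names standing in for an unpacked corpus.
paths = [f"email_{i}.txt" for i in range(500)]
picked = sample_emails(paths)
print(len(picked))
```

Changing the seed gives a different set of emails, which is a simple way to rerun the comparison against fresh background text.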

All this was put into Jstylo, the authorship attribution software.

The ransom note was put on the Test side; the 80-odd emails and texts went on the Training side.

With all the emails and the ransom note loaded in, I went to the data mining section and selected the algorithm with the least error after cross validation. The algorithm looks for similarity between the writing samples.

The next step was to run the "trained" model (trained on Enron and Patsy writing) on the test writing (ransom note) and look for the closest match. In effect I am asking it--which text does this ransom note look most like?
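The "which text does this look most like?" question can be sketched in a few lines of Python. To be clear, this is a stand-in and not what Jstylo or Weka do internally: it swaps the trained classifier for a plain cosine similarity over character trigram counts, and the author names and texts are made-up placeholders.

```python
import math
import re
from collections import Counter

def char_trigram_vector(text):
    """Count overlapping character trigrams, a common stylometric feature."""
    cleaned = re.sub(r"\s+", " ", text.lower())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def closest_author(training, test_text):
    """Closed-set attribution: always returns the best-scoring known author."""
    test_vec = char_trigram_vector(test_text)
    scores = {}
    for author, docs in training.items():
        merged = Counter()
        for doc in docs:
            merged.update(char_trigram_vector(doc))
        scores[author] = cosine(merged, test_vec)
    return max(scores, key=scores.get), scores

# Placeholder training data, purely for illustration.
training = {
    "author_a": ["the quick brown fox jumps over the lazy dog"],
    "author_b": ["zzz qqq xxx zzz qqq xxx"],
}
best, scores = closest_author(training, "the lazy dog jumps")
print(best)
```

Note that the function always names somebody, even when every score is poor. That matches the closed-system behaviour of picking the closest match among the candidates supplied.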

Writeprints creates 800 variables per document, creating a "sliding window" as it analyses a broad range of text characteristics.

Result--Patsy Ramsey at 75%.

I ran it again with different emails and text, and then different data mining algorithms, same result.

There is some good advice on which classifier to pick by Edwin Chan.


Patsy died of cancer in 2006, and that is probably why Police Chief Mark Beckner said they don't expect to make any arrests in the future, even though the case is still open.

Police Chief Mark Beckner did participate in an interview on Reddit, and one of the questions that always stuck out to me was this:

Q: “When Patsy wrote out the sample ransom note for handwriting comparison, it is interesting that she wrote “$118,000” out fully in words (as if trying to be different from the note).
Who writes out long numbers in words? Does this seem contrived to you?”
Beckner: “The handwriting experts noted several strange observations.”

Update 1: Sept 2016

It has been pointed out to me by two people, DocG and also Eve Berger (no relation) from LinkedIn, that John Ramsey was also reported as having used the notorious "and hence" in an interview. I did find a transcript of this interview with both John and Patsy talking to student journalists, including an incredible part where Patsy says, "...Even If We Are Guilty.....".

Shades of O.J Simpson and "If I Did It.....".
That's worth a look all on its own, which I'll do in the next day or two.

John + Patsy Transcript

But how unusual is "and hence"? Well, using the Google Ngram Viewer, which searches books from 1800 to 2008, here's a graph I made:

Very uncommon, it would seem.
I will look into the linguistics using the interview material soon, referencing some of the recent automated deception detection methods.

Update 2:

I've had a few questions about Jstylo.

Let's get something out of the way: DocG asked me to use his text to test, which turned out to be speech, a NO-NO. The results didn't work because it should be speech to speech, text to text. I told him this when I found out, and said it wasn't valid. He couldn't accept it because of the Sunk Cost Fallacy. He loved the outcome because he thought he had found a weak link.

DocG said he uses "instinct", "intuition" and "social research experience". I told him I was only interested in EMPIRICAL results against his "intuition", so we agree to disagree.

1 -- Firstly, Jstylo is a closed system.
This means that the suspect must be among the text samples you are analysing. The software will pick the closest match.

2 -- Speech with speech and text with text. People use language differently when they talk compared to how they write. Different parts of the brain are used for speech and writing. If you want to identify speech, use all speech as your input. If you want to identify written text, all your inputs should be text.

The Pennebaker text analysis software LIWC has frequency averages over many thousands of samples for blogs, speech, newspapers etc. This program shows the dramatic and consistent differences between speech and written text; see below for average frequencies.

3 -- Generally, the more text samples that you have from your target, the better. The recommended amount of text for identifying a target document is 550 words, but Forsyth and Holmes show that 250 words is a minimum. For the various authors to test against, about 5000+ words is recommended.

I have been reading a study where reviewers on YELP are linked (identified) even though the reviews average only 149 words in length:
I don't have more details on this.

4 -- There seems to be a way to create an open system with Jstylo, where if it doesn't identify an author, it won't just point to the closest match, but will come up with unknown.
I don't have more details on this.

5 -- Jstylo is not a black box; it is an automated GUI or interface combining established open source software: JGAAP, Writeprints and Weka. Writeprints uncovers writing characteristics. Input features can be added or removed, and the spreadsheet can be exported showing the most significant variables.

6 -- News, Academic papers and Security Conferences using Jstylo around the world:

7 -- All software works on this principle:
garbage in = garbage out
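Points 1 and 4 above differ by a single decision rule, which this short Python sketch tries to show. It is not Jstylo's actual mechanism, which I don't have details on. The scores are assumed to be similarity values between 0 and 1 from any attribution method, and the 0.5 threshold is an arbitrary illustrative choice.

```python
def closed_set_pick(scores):
    """Closed system: always name the highest-scoring candidate."""
    return max(scores, key=scores.get)

def open_set_pick(scores, threshold=0.5):
    """Open system: name the top candidate only if its score clears the threshold."""
    if not scores:
        return "unknown"
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unknown"

# Hypothetical scores in which no candidate is a convincing match.
weak_scores = {"author_a": 0.21, "author_b": 0.18, "author_c": 0.12}
print(closed_set_pick(weak_scores), open_set_pick(weak_scores))
```

With weak scores the closed system still points at author_a, while the open version reports unknown. This is why, in a closed system, the real author needs to be among the samples.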

Ransom Note Contradictions

The writer of the ransom note probably did not commit the murder, although they were part of the cover up. The note is a contradictory and naive attempt to use psychological misdirection to point the investigators in another direction.

First it's a "faction" (a small dissenting group within a larger group??), then there's a suggestion it may be someone at John Ramsey's workplace who is aware of his exact Christmas bonus. There are numerous movie quotes in an effort to appear more criminal, and a psychological attempt to issue a secondary threat of not releasing the body for "proper" burial because the writer knew the child was already dead.

The numerous contradictions involve telling a sleeping person to be well rested, not realising a kidnapper doesn't deliver a victim, crossing out deliver then using the word pickup.

There is also the issue of a kidnapper calling between 8.00-10.00am with delivery instructions, yet banking hours start at 9.00am, and the option of withdrawing the money earlier for an earlier delivery/pickup phone call from the kidnapper!

The CBS show established the murder weapon was the flashlight. The expert forensic pathologist was able to show that a 10 year old child could create the exact injury (same hole dimensions too) on a human skull with pigskin using the flashlight. The flashlight belonged to the Ramsey household, yet had been wiped of prints, as well as the batteries. The motivation to wipe the batteries clean becomes clear if you think about guilty knowledge.

Pathologist Dr. Werner Spitz said that the child was brain dead from the blow to the skull, so the intricate garrote was theatrical misdirection to shift attention away.

The Ramseys themselves ignored nearly all the instructions in the note: they phoned the police, they invited friends over, John sent his friend to the bank, they had no concern when the telephone call deadline passed without incident, and so on.

Guilty knowledge relies not on lying but recognition of information you shouldn't know with resultant anomalous behaviour.

911 Call

The 911 call also stood out in using the strange phrase, "We have a kidnapping..."
Many 911 calls are used to set up an alibi.
This one is no exception, IMO.

Check out FBI research on guilty and innocent 911 calls and their checklist.

Porter and ten Brinke 2015 note that females give off more guilty verbal cues than males, and that is certainly the case here with Patsy giving more red flag cues over the course of the investigation, particularly in her video interviews and her statements. Automated software using verbal and written analysis also confirms this.

Update 2 Sept 2016:
2nd Jstylo Run

I have been studying and testing more of the Jstylo software capabilities over the last week. I've decided to run it again over different training samples instead of Enron.

Drexel University provide different problem sets, and there is one with a couple of dozen authors, each with 4 or 5 pieces of text to test against.

I used 2 of the top classifiers here, Weka's SMO and Random Forest with 300 trees, on a shortened version of Writeprints called Writeprints Limited.

I included 2 of Patsy's known texts, and John Ramsey's written speech from when he was running for office in Michigan.

Using different classifiers and different training authors from my first test, I got the same results with Patsy leading the pack in both classifiers and John Ramsey barely moving the needle. I removed each of the four texts from Patsy one at a time and retested, and each text made a difference -- each written text from Patsy contributed something to the classification. These are not probabilities, but ranking results.
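The remove-one-text-and-retest check can be sketched as follows. This is an illustration of the procedure only: the scoring function is a toy vocabulary-overlap measure I made up for the example, not the SMO or Random Forest ranking, and the texts are placeholders.

```python
def vocab(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())

def overlap_score(author_texts, questioned):
    """Toy score: fraction of the questioned text's vocabulary seen in the author's texts."""
    target = vocab(questioned)
    seen = set()
    for text in author_texts:
        seen |= vocab(text)
    return len(target & seen) / len(target) if target else 0.0

def leave_one_out(author_texts, questioned):
    """Return the full score and how much it drops when each text is removed in turn."""
    full = overlap_score(author_texts, questioned)
    drops = []
    for i in range(len(author_texts)):
        reduced = author_texts[:i] + author_texts[i + 1:]
        drops.append(full - overlap_score(reduced, questioned))
    return full, drops

# Placeholder texts, purely for illustration.
full, drops = leave_one_out(["alpha beta", "gamma delta", "alpha epsilon"], "alpha gamma zeta")
print(full, drops)
```

A drop of zero means the removed text added nothing that the others did not already supply; a positive drop means that text contributed to the match.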

Patsy has linguistic fingerprints on the ransom note. Even a visual examination shows she uses exact whole sentence structures, not just the words "and hence".

The first sentence is from the ransom note, the second is from her Christmas note to friends. The word delivery was crossed out and pickup was added when the author realised that a kidnapper would not deliver the kidnap victim back, but would phone to say where the victim could be found.

The complete sentence structure is identical, on each side of "and hence". It is part of her "linguistic fingerprint", besides all the invisible characteristics that get picked up by the Writeprints software.

Different software, different analysis--
Different Ransom Notes Comparisons Using Linguistic Inquiry and Word Count software

Also known as LIWC, this software from psychologist James Pennebaker from the University of Texas has been well validated and used in many studies, over 6000 on Google Scholar, to date.

According to Tausczik and Pennebaker:
"LIWC is a transparent text analysis program that counts words in psychologically meaningful categories. Empirical results using LIWC demonstrate its ability to detect meaning in a wide variety of experimental settings, including to show attentional focus, emotionality, social relationships, thinking styles, and individual differences."

LIWC has been used in various studies, from assessing depression to deception detection (Newman, Pennebaker).

Of interest to me is the Gender analysis, again from Tausczik and Pennebaker:

"Sex differences in language use show that women use more social words and references to others, and men use more complex language. A meta-analysis of the texts from many studies shows that the largest language differences between males and females are in the complexity of the language used and the degree of social references (Newman, Groom, Handelman, & Pennebaker, 2008). Males had higher use of large words, articles, and prepositions. Females had higher use of social words, and pronouns, including first-person singular and third-person pronouns."

I located 2 more actual ransom notes, the longest ones I could find. These are the Barbara Mackle kidnapping and the Leopold and Loeb kidnapping. All the kidnappers were caught and convicted and were men.

LIWC was run on all the ransom notes, as well as on 4 of Patsy's own notes to give an overall average.

As per Pennebaker above, the Mackle and Leopold notes have no I pronoun and lower We, He and She pronouns. Women use fewer articles, and again the Mackle and Leopold notes have more articles. Women use more social words, and the JonBenet note has very high social language.
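The kind of count behind these comparisons is simple to sketch: the percentage of words in a text that fall into each category. The tiny word lists below are my own stand-ins, not the LIWC dictionary (which is licensed and much larger), and the sample sentence is invented.

```python
import re

# Tiny illustrative categories (assumptions, not LIWC's real word lists).
CATEGORIES = {
    "i": {"i", "me", "my", "mine"},
    "we": {"we", "us", "our", "ours"},
    "article": {"a", "an", "the"},
}

def category_percentages(text):
    """Return the percentage of tokens in `text` that belong to each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(token in words for token in tokens) / total
        for name, words in CATEGORIES.items()
    }

# Invented sample sentence, not from any ransom note.
result = category_percentages("We left the keys on a table and I took my coat")
print(result)
```

Running the same function over each note and comparing the percentages side by side gives the sort of table used for the pronoun and article comparison, although on short texts a few words either way can shift a percentage noticeably.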

What is very interesting here is that the anxiety of the letter writer is revealed in writing, and even though the JonBenet note was written in the house and would have taken about half an hour to write (21 minutes just to copy it, as the CBS show noted), there was NO anxiety. Yet there was anxiety in the other pre-written notes!

Also, the JonBenet note scores very low on the authenticity measure, and there were more tentative words (not shown, but also a female indicator).

3rd Supporting Software Analysis

Whissell's Dictionary of Affect is a very useful measure of pleasantness, not what the words mean but a sentiment rating of the overall pleasantness of the text.

I have found a direct correlation between pleasantness and deception, and a study at Columbia University confirms this, but increased social language increases pleasantness too:

The JonBenet note is above average in pleasantness and social language and higher than both other ransom notes, showing it more likely to be written by a female as per Pennebaker above.

As FBI profiler Roger Depue wrote in his book, the ransom note was essentially nonsensical, obviously staged, and was "feminine" with terms such as "gentlemen watching over", and telling sleeping people to "be well rested".


  1. I'm very interested in trying to replicate your results, but I'm not sure where to begin. The links you provided are very confusing. I was hoping to find a version of Jstylo online but so far no luck. I'm not a statistician and am not interested in learning to use the R program, so I'm wondering if there's a more straightforward method of making this sort of comparison. Thank you.

  2. Hi DocG, thanks for your comments. I checked the download link of Jstylo and it works. Click the green button on Github that says Clone or Download. This will download a zip file to desktop.

    The program requires you have Java on your computer, which you probably have already. Drexel University developed the software, and you can find video clips on the internet of them showing its use.

    This has nothing to do with R, where you have to learn a programming language. You also need writing from Patsy Ramsey, I suggest you get all her Xmas notes. If you can't find by searching, send me your email address and I will attach. Look at her police statements too, as well as John Ramsey, all on the net. You also need a lot of general emails/text files to compare.

    I use some Enron emails as per University of British Columbia, because you can get random samples without consciously selecting what you want to use. If you have SPECIFIC questions, I am happy to help. Cheers, Tom

    1. Thanks so much for this advice and also for your very interesting post. You may not realize it, but the Xmas messages were composed jointly by Patsy and John. And my suspicion is that the "and hence" most likely originates with him, especially since he used that phrase in one of his interviews. I've never found any document or interview with Patsy where she used it -- and her style strikes me as much more informal than John's, which as far as I can tell, is much more like that of the ransom note. So your result took me by surprise.

      Does the software enable you to pinpoint specific aspects of Patsy's style that contributed to the result, or is it just a black box? In any case, I think you've hit on something that could be very important in solving this case, and I'm wondering if you've taken your results to law enforcement in Boulder, or the DA.

      What I think is needed at this point is an independent study, conducted perhaps by the people at Drexel who developed this system. Do you know them? In the meantime I really want to look into this myself and will get back to you with my findings.

      You might want to take a look at my blog, "Solving the JonBenet Ramsey Case," where I explain my reasons for suspecting John, not Patsy:

      You are welcome to email me if you like, at

    2. Hi I am currently conducting a research project on this case and wondered if somebody could send me the xmas letters from patsy. I'm trying to find some texts from john but there appears to be nothing. can anyone help at all? my email is

    3. Hi Sarah, I sent you an email as well as some attachments to gladden your heart! As mentioned, I am about to update the blog, with very interesting research I have done over the last 4 months, and that has never been done on this case ( nothing on the net, anyhow). Stay tuned, updates to follow tomorrow.

  3. Hi DocG. I will send you an email. Great blog, by the way. Tom

  4. I would love for you to take some of the purported letters that Edward Wayne Edwards wrote (they are readily available on the internet) and run them with the Enron ones, and Patsy's too. I would love to see what happens. There are some people who are absolutely convinced that Edward Wayne Edwards is guilty of this crime as well as many other very high profile crimes.

  5. Thanks for your comment. I will look into it. It seems he has written a book:

    If I come up with anything I will post it.

    1. Yes he's written a book... I'm reading it right now. With what we know about him and what is suspected of him in regards to whom he has killed, it's a rather creepy read.


© ElasticTruth
