**MOST STATIONS FAIL ON SOME METRICS, BOURKE FAILS ON EVERY SINGLE ANALYTIC.**

Bourke is the poster child for data fabrication. It is crying out, "Audit Me, Audit Me!" And it comes amidst stiff competition. On a point-by-point basis, there is more wrong with Bourke than with any other station.

Firstly, there are ample sequences ranging from highly unlikely to impossible. SAS JMP software rates the probability of seeing each sequence at random as a *rarity score*. A rarity of 20, for example, equates to the probability of a coin landing heads 20 times in a row. The software accounts for the number of unique values, the sample size and so on.
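To make the coin analogy concrete, here is a minimal sketch that converts a rarity score into odds. It assumes the score works exactly like the coin example above (each point halves the probability); the function name is hypothetical and JMP's internal calculation may differ.

```python
def rarity_to_odds(rarity):
    """Return N such that the sequence is a '1 in N' event.

    Assumption: a rarity of r means the same probability as r heads
    in a row with a fair coin, i.e. p = 2 ** -r.
    """
    return 2 ** rarity


for score in (20, 30, 35):
    print(f"rarity {score}: 1 in {rarity_to_odds(score):,}")
```

Under that assumption, a rarity of 20 is roughly a 1 in a million event, and a rarity of 35 is roughly 1 in 34 billion.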

Each of these 2 sequences has a rarity score of 35, an extremely remote chance of occurring at random:

The next sequence has a 30 rarity score:

Extremely remote, so you can be pretty sure these temperatures have been fabricated.

One more to give you an idea. These sequences vary from 20 to 30 in rarity score:

There are more sequences that show "Raw" temperatures are anything but raw, but I have kept it to a minimum for this post.

The BOM tell us that *"absolute raw"* **doesn't exist** for *"2/3 of the network"*, and what we call raw is a "composite" of 2 or more stations along with *"known network changes."* In other words, Raw is a modeled number, a computer generated and even fabricated number.

We can tell how fabricated it is by running fraud analytic software over it. The adjusted "homogenised" data is an adjustment on top of this "composite" variable. The end result is data that can be compared to known observational data. Fabricated/tampered data doesn't look the same as natural or observational data, and is picked up by software as such.

As mentioned in my last post, histograms are data analysis 101, and using them with BOM data exposes strange engineered sequences: 1 high-frequency (repeated) temp, then 4 low-frequency temps, then 1 high, then 5 low, and so on as the sequence repeats. My last post covered this in detail.

I have received a couple of emails suggesting this may be because in 1976 Australia went metric, changing temperature observations from Fahrenheit to Celsius. I decided to test this by looking at stations from 2000 onwards and comparing this to 1910-1999 data.

In Bourke's case, there was no major difference:

The histogram is a basic statistical analysis tool and shows how frequently values occur across the distribution. Here, however, the data is not binned. These histograms show each temperature by how often it occurs. The more a temperature is repeated, the higher its spike.
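An unbinned histogram is nothing more than a count of each distinct temperature. The following is a minimal sketch of that idea, assuming readings recorded to one decimal place; the function name and the sample values are made up for illustration.

```python
from collections import Counter


def value_frequencies(temps, decimals=1):
    """Count how often each distinct temperature occurs (no binning)."""
    counts = Counter(round(t, decimals) for t in temps)
    return sorted(counts.items())


# Hypothetical readings, for illustration only.
sample = [20.1, 20.1, 20.2, 20.1, 20.3]
for value, count in value_frequencies(sample):
    print(f"{value:5.1f} {'#' * count}")
```

Plotting the counts against the values gives the spikes and gaps described above.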

These histograms are showing specific patterns that are "unnatural", i.e. man-made. There are systematic highs and lows, all with the same spacing. There are more details in my last post; this is only to show that going metric didn't change much for Bourke.

As the temperature sequences I opened with show, the most common way data is fabricated/tampered is by duplicating or repeating numbers.

So how do we detect that?

Besides using the SAS JMP pattern explorer software to pick up unlikely sequences, there are 2 other ways that both measure whether a number is repeated too much:

(1) Benford's Law applies to most natural and observational data if it spans several orders of magnitude and doesn't have a lower/upper limit. Temperature does have a lower and upper limit: we can only observe temperatures in a range of say -50C to +50C, give or take a few degrees. But as Malcolm Sambridge at the Australian National University in Canberra has shown us, the temperature anomalies the climate industry uses (simply an average offset) perfectly follow Benford's Law.

So turning temps into anomalies allows us to apply Benford's Law to tell us which numbers are overused or underused, i.e. the frequency of repeated values.
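As a rough illustration of the idea, the sketch below computes the Benford expected share for each first digit, log10(1 + 1/d), and compares it with the observed first digits using a chi-square statistic. The chi-square comparison and the function names are assumptions chosen for illustration, not necessarily the test used for the plots in this post.

```python
import math
from collections import Counter


def first_digit(x):
    """First significant digit of a non-zero number, e.g. -0.034 -> 3."""
    return int(f"{abs(x):e}"[0])


def benford_expected(d):
    """Benford's Law probability that the first digit equals d."""
    return math.log10(1 + 1 / d)


def benford_chi_square(values):
    """Chi-square distance between observed first digits and Benford."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * benford_expected(d)
        chi2 += (counts.get(d, 0) - expected) ** 2 / expected
    return chi2


def anomalies(temps):
    """Anomaly = temperature minus the mean (a simple average offset)."""
    mean = sum(temps) / len(temps)
    return [t - mean for t in temps]
```

With 8 degrees of freedom, a chi-square above about 15.5 would reject conformance at the 5% level; `benford_chi_square(anomalies(temps))` is the intended usage.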

(2) Number Bunching, a forensic tool in R developed by data detective Uri Simonsohn. This also allows us to see the frequency of repeated values.

Let's start with number bunching. This uses bootstrapping with replacement to determine whether we get more repeated temperatures than we expect.
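To give a feel for how such a simulation works, here is a minimal Python sketch. It is not the R tool itself. The statistic is the average frequency (how often, on average, each observation's value appears in the sample). The null distribution used here, resampling the whole-degree part and the decimal part independently with replacement, is an assumption made for illustration; the actual tool may build its expected distribution differently.

```python
import random
from collections import Counter


def average_frequency(values):
    """Average, over observations, of how often each one's value occurs."""
    counts = Counter(values)
    return sum(c * c for c in counts.values()) / len(values)


def bunching_z(temps, n_sims=2000, seed=0):
    """Return (observed, expected, z) for the average-frequency statistic.

    Assumed null: whole-degree and decimal parts are independent, so each
    is bootstrapped (with replacement) separately and then recombined.
    """
    rng = random.Random(seed)
    tenths = [round(t * 10) for t in temps]  # work in tenths of a degree
    whole = [t // 10 for t in tenths]
    frac = [t % 10 for t in tenths]
    observed = average_frequency(tenths)

    sims = []
    for _ in range(n_sims):
        w = rng.choices(whole, k=len(tenths))
        f = rng.choices(frac, k=len(tenths))
        sims.append(average_frequency([a * 10 + b for a, b in zip(w, f)]))

    mean = sum(sims) / n_sims
    sd = (sum((s - mean) ** 2 for s in sims) / (n_sims - 1)) ** 0.5
    return observed, mean, (observed - mean) / sd
```

The z value is the number of standard errors between the observed average frequency (the red line in the plots) and the simulated distribution (the blue histogram).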

**Number Bunching Simulations.**

Placebo Check

Body Temperature samples

An analysis from a study using body temperatures to try to determine the "average" body temp with a sample of 130 people. Number bunching simulation on this sample:

This is exactly what we expect: the expected average frequencies (blue distribution) and the observed average frequency (red line) are close together and in agreement, with a variation of only 1.02 standard errors about the mean. We would expect to see this level of repeated temperatures occurring in 1 in 3 cases.

Test on Walgett:

Only -0.95 standard errors, similar to above. This is a normal variation of expected average repeats vs. observed average repeats, which is what we would normally expect.

Bourke Simulations

We'll look at Bourke from 2010-2015. This is a recent enough block of years to overcome complaints about going metric, dirty data etc.

The blue histogram shows us the distribution of the average expected repeats from the simulation.

This shows expected average frequencies of temperatures vs. observed average frequencies. The red line (observed) falls outside the distribution: it is outside the histogram by 3.17 standard errors about the mean.

We would expect to see this level of repeats in fewer than 1 in 370 times.
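The "1 in N" figures can be sanity-checked from the number of standard errors. The sketch below assumes the simulated distribution is approximately normal and uses a two-sided probability; both are assumptions, and the function name is hypothetical.

```python
import math


def one_in_n(std_errors):
    """Two-sided normal tail probability, expressed as '1 in N'."""
    p = math.erfc(abs(std_errors) / math.sqrt(2))
    return 1 / p


for z in (1.02, 3.0, 3.17):
    print(f"{z} standard errors: about 1 in {one_in_n(z):,.0f}")
```

Under these assumptions, 3 standard errors corresponds to about 1 in 370, so 3.17 standard errors is rarer than that.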

Now things only get worse:

Over 21 standard errors, the observed repeated temps are FAR more than what we would expect -- we would expect to see this level of repeats in fewer than 1 in many hundreds of millions of samples!

1990-1995 is similar. Virtually impossible that this occurs naturally.

Same below for 6 year blocks:

All the number repeat simulations show extreme levels of *average observed repeated temperatures* in every decade that cannot be due to chance (many hundreds of millions to 1), i.e. the data has been fabricated and tampered with on a large scale.

**Benford's Law And Bourke.**

Consistent with the Number Bunching simulations, Benford's Law applied to the first 2 digits shows that 1980 is worse than 2010, i.e. more tampered with.

My last blog went into Benford's Law in detail, so I won't repeat it all here. I want to show just what is relevant to Bourke.

The spikes above the red line show heavy overuse of specific digits: 25-30, 35-45, 89-90 and so on. Certain numbers are repeated so much that the p value is too small to calculate, so the null hypothesis that the numbers conform to Benford's Law is rejected.

An overall look at Bourke using all the data from 1910 to 2019 shows the same thing:

There is a lack of conformance to Benford's Law at an extreme level with p values too small to be computed. This would be an automatic red flag for a forensic audit.

Once again, Benford's Law indicates overuse of numbers, i.e. excessive repeated temperatures, as do the Number Bunching simulations and the histograms at the beginning, with their high-frequency repeated temps (spikes) and low-frequency repeated temps (gaps).

All this confirms that temperatures have been tampered with by artificially increasing/reducing the frequency with which they occur.

**Imputation/Infilling Missing Temperatures.**

When variables are missing, imputation or infilling is sometimes done. This should be a careful procedure in pre processing which includes outliers being removed. Here we have a situation where outliers are created and inserted.

First, let's look at the missing data pattern for Bourke:

1 is the missing data flag. What we are interested in looking at is the second line where MinRaw is missing but Minv2.1 is not, indicating that it has been imputed/infilled. Only 526 values were imputed with many more left blank. So the question is -- why were those specific values imputed and not the rest?
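A missing data pattern table of this kind can be rebuilt with a simple tally. This is a minimal sketch that assumes each day is a dict keyed by the column names `MinRaw` and `Minv2.1`, with `None` standing for a missing value; the function name and the sample rows are made up for illustration.

```python
from collections import Counter


def missing_pattern(rows, columns=("MinRaw", "Minv2.1")):
    """Count rows by missing-data pattern; 1 = missing, 0 = present."""
    pattern = Counter()
    for row in rows:
        key = tuple(1 if row.get(col) is None else 0 for col in columns)
        pattern[key] += 1
    return pattern


# Hypothetical rows, for illustration only.
rows = [
    {"MinRaw": 12.3, "Minv2.1": 12.0},   # both present
    {"MinRaw": None, "Minv2.1": 11.5},   # raw missing, adjusted present
    {"MinRaw": None, "Minv2.1": None},   # both missing
]
pattern = missing_pattern(rows)
print("imputed/infilled days:", pattern[(1, 0)])
```

The `(1, 0)` pattern is the one of interest: MinRaw missing while Minv2.1 has a value.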

I selected these 526 imputed/infilled values in JMP and highlighted them against all the rest of the data. They were put into 2007 and 2010, including some outliers. Outliers should be removed; here we have had some inserted. Notice too that the 526 values by themselves have a strong upward trend. These are fabricated values; proper imputation is not being done because outliers are being created. Only 2 years and 1 outlier around 1910 were infilled, while over 700 were left empty. So this is strategic infilling, used in places where needed.

**Rounding/Truncating, The Disappearing Decimal Digits.**

There are blocks of years common to specific stations where there are NO decimal values. All the values were rounded (or more likely truncated).

Some stations have this phenomenon, and of course Bourke has it too. You would think that rounding was an issue in the earlier part of last century. Nope, it's fairly recent, indicating this is probably strategic too. Truncating rather than rounding shifts each value by about 0.5C on average (downward for positive temperatures).

This shows how many days of the year contain no decimal values. 1999-2001 contain nearly NO decimal values, while in 2002 about 2/3 of days have no decimal. These are the exact same years as in half a dozen other stations, so this was done deliberately. These data examples highlight the lack of integrity BOM have in working with data.
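The yearly share of whole-number readings is easy to compute. Here is a minimal sketch, assuming the data is available as (year, temperature) pairs recorded to one decimal place, with `None` for missing days; the function name and sample values are hypothetical.

```python
from collections import defaultdict


def whole_number_share(records):
    """Fraction of non-missing readings per year with no decimal part."""
    totals = defaultdict(int)
    whole = defaultdict(int)
    for year, temp in records:
        if temp is None:
            continue
        totals[year] += 1
        # Compare in tenths of a degree to avoid floating-point noise.
        if round(temp * 10) % 10 == 0:
            whole[year] += 1
    return {year: whole[year] / totals[year] for year in sorted(totals)}


# Hypothetical readings, for illustration only.
records = [(1999, 20.0), (1999, 21.0), (2005, 20.3), (2005, 21.0)]
print(whole_number_share(records))
```

If decimals were recorded normally, only around 1 reading in 10 would be a whole number, so a share near 1.0 for a year stands out immediately.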

**Bourke Adjustments And Violin Plots.**

Violin plots are a fantastic way to get an overview of the adjustments done on top of 'raw'.

They are similar to box plots but show more information.

The 'violin' is the distribution of the adjustments. I subtracted Minv2.1-Minraw to get a number called Mindiff which is the difference between Minv2.1 and Raw. This is the adjustment put on top of raw, either cooling or warming.

The 'violin' represents each adjustment for each month. If the violin is fat and long, it has a wide distribution with many temperatures; the more they are repeated, the fatter the violin. The long strands are single numbers and, if they are long, are mostly outliers.

All the adjustments are on the left-hand Y axis; below, the adjustments go from nearly +4C to -4C.

But look at the interesting bit-- where only *one month or two extend far enough to get big adjustments all by themselves.*

This means that May and August extend so far down that they reach nearly -3.5C to -4C, all by themselves! So all the big negative cooling adjustments in all the years from 1910 to now went into 2 months!

And look at the top--April extends by itself showing that most of the large + positive adjustments went into April.

So how did this go in reality with actual data?

April below shows that nearly all adjustments over 1.5C went there. April Minimum was being warmed up with a vengeance.

Above we see that nearly ALL big negative adjustments done to Bourke from 1910 onwards (over 2.7C cooling adj) went into 2 months, May and August.

These 2 months were being excessively cooled.
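The monthly tallies behind these plots could be reproduced along the following lines. This is a minimal sketch, assuming the data is available as (month, MinRaw, Minv2.1) tuples; the thresholds of +1.5C and -2.7C are the ones quoted above, while the function name and sample values are made up for illustration.

```python
from collections import Counter


def large_adjustments_by_month(records, warm_above=1.5, cool_below=-2.7):
    """Count large warming and cooling adjustments per calendar month.

    Mindiff = Minv2.1 - MinRaw, i.e. the adjustment applied on top of raw.
    """
    warm, cool = Counter(), Counter()
    for month, min_raw, min_adj in records:
        if min_raw is None or min_adj is None:
            continue  # no adjustment can be computed
        mindiff = min_adj - min_raw
        if mindiff > warm_above:
            warm[month] += 1
        elif mindiff < cool_below:
            cool[month] += 1
    return warm, cool


# Hypothetical rows, for illustration only.
records = [(4, 10.0, 12.0), (5, 10.0, 7.0), (1, 10.0, 10.5), (8, None, 9.0)]
warm, cool = large_adjustments_by_month(records)
print("large warming by month:", dict(warm))
print("large cooling by month:", dict(cool))
```

If large adjustments were spread evenly across the year, the counts would be similar for all 12 months.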

So the BOM told us that the station could have been moved, vegetation grew up around the station, the thermometer drifted etc., so they *carefully* did adjustments. In reality, it didn't matter what happened, because for the last 90 years May and August were cooled with the biggest negative adjustments, and April was warmed with the biggest positive adjustments, irrespective of what was going on.

**Time Tracking ALL Adjustments With Bayesian Model**

The University of Edinburgh kindly sent me their software to time track all adjustments done to Bourke using a Bayesian model. It tracks the first digit value of all the adjustments year by year and updates Benford's Law probabilities with Bayes' Theorem.

This means we get a complete overview from 1910 to 2019 to see which digits are overly repeated.
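To illustrate the general idea of updating Benford probabilities year by year, here is a minimal sketch. It is not the Edinburgh software, whose internals are not described here. It assumes a Dirichlet prior centred on Benford's Law that is updated with each year's first-digit counts; the prior strength, function name and sample counts are all hypothetical.

```python
import math

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def track_first_digits(yearly_counts, prior_strength=100.0):
    """Return [(year, posterior mean share per digit), ...].

    Assumption: Dirichlet prior with pseudo-counts proportional to
    Benford's Law, updated cumulatively with each year's digit counts.
    """
    alpha = {d: prior_strength * BENFORD[d] for d in range(1, 10)}
    history = []
    for year, counts in yearly_counts:
        for d in range(1, 10):
            alpha[d] += counts.get(d, 0)
        total = sum(alpha.values())
        history.append((year, {d: alpha[d] / total for d in range(1, 10)}))
    return history


# Hypothetical counts, for illustration only: digit 2 heavily overused.
history = track_first_digits([(1911, {1: 50, 2: 250}), (1912, {1: 40, 2: 260})])
for year, shares in history:
    print(year, f"share of 1s: {shares[1]:.3f} (Benford: {BENFORD[1]:.3f})")
```

A posterior share for digit 1 that sits persistently below 0.301 would indicate a shortfall of 1s relative to Benford's Law.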

Looking at Minv2.1 1911-2019, all 40 000 days (missing values in 1910):

The first digit is tracked and when its value is 1 (which happens 30% of the time and thus is the most important), we can see EXACTLY when this happens.

The value 1 has always been used too little. It gradually came into more use in the 1970s and then became OVER used in the late 90s to 2000s. It then dropped off again in the 2010s. Because there is a shortfall of value 1 in the first digit, some other numbers need to be excessive, and 2s, 3s, 4s and 5s are all travelling upwards (after dipping when 1 was over used), indicating overuse around 2015.

This means that numbers 2-5 are being used excessively, with a large shortfall in 1 values. *In other words, the 1 values in the first digit have been replaced with 2-5, increasing temperatures.*

**Result:**

Bourke fails the number bunching simulation for 2000-2015 where some stations such as Walgett pass. It fails conformance completely with Benford's Law too.

It fails as observational data just by looking at the histogram, even for the recent 2000-2019 years, showing it is engineered. It has imputed/infilled numbers that create outliers. Nearly all the biggest adjustments in the last 90 years went into 2 months (May and August) when cooling, and April when warming. It contains duplicated fabricated sequences you can easily see, but the worst abuse of data integrity is what you can't see, and that's what all the software picked up.

Bourke is a worthy winner.