Deception Detection In Non Verbals, Linguistics And Data.

BOM Sydney Climate Data Audit Using Benford's Law And Statistical Analysis.




Summary:
We test the daily climate data for Sydney from 1910-2018, roughly 40 000 days each for the Max and Min temperature time series, for conformance to Benford's Law of the first digit and first two digits, as it is commonly used for fraud detection and data integrity checks. We find the data fails the Benford's Law test criteria by chi-square and Kolmogorov-Smirnov tests, indicating tampering, even in the raw data. We also use the Bayesian time-varying model from the University of Edinburgh to test first-digit homogeneity differences over time and pinpoint the years involved.

Then, using pattern exploration software, we find large clumps of data, over two and a half months' worth, that have been "copy/pasted" into other years, as well as multiple smaller "above chance" sequences that match across different years. These patterns exist in the raw data as well.

Trailing digit analysis confirms data tampering and extends the University of Portland analysis of Tasmanian climate data, which showed likely tampering, to Sydney data showing the same thing.

Focusing on the most repeated temperatures in a time series, a new technique from fraud analytics based on repeated numbers, or "number bunching", is used to identify cases where repeated temperatures occur more often than expected.


Prelude:
In the computer industry, we used to say Garbage In, Garbage Out. It expressed the idea that flawed or incorrect input data will always produce faulty output. It's been claimed that 90% of the world's data has been created in the last two years (Horton, 2015), making it even more critical to check data integrity.

The Australian Government announced in 2016 that it has committed $2.55 billion for carbon reduction and a further $1 billion to help developing countries reduce their carbon dioxide emissions. (Link)

The premise behind the spending is that worldwide temperatures have risen to dangerous levels and that the rise is caused by man-made emissions. The most cited dataset used to prove this is HadCRUT4 from the Met Office Hadley Centre UK, and before 2017 this data had never had an independent audit.

John McLean's PhD thesis (McLean, John D. (2017), An audit of uncertainties in the HadCRUT4 temperature anomaly dataset plus the investigation of three other contemporary climate issues, James Cook University) showed comprehensively how error-ridden and unreliable this dataset actually is. (Link)



In Australia, the Bureau of Meteorology (BOM) created and maintains the Australian Climate Observations Reference Network-Surface Air Temperature (ACORN-SAT), which "provides the best possible dataset for analyses of variability and change of temperature in Australia."

This dataset has also never had an independent audit despite claims that "The Bureau's ACORN-SAT dataset and methods have been thoroughly peer-reviewed and found to be world-leading." (Link)


The review panel in 2011 assessed the data analysis methodology, and compared the temperature trends to "several global datasets", finding they "exhibited essentially the same long term climate variability". This "strengthened the panel's view" that the dataset was "robust". (Link)


Benford's Law

It has been shown that temperature anomalies conform to Benford's Law, as do a large number of natural phenomena and man-made data sets. (Benford's Law in the Natural Sciences, M. Sambridge et al., 2010)

Benford's law has been widely applied to many varied data sets for statistical fraud and data integrity analysis, yet surprisingly has never been used to analyse climate data.

Some examples of Benford's Law in use: Hill, 1995a; Nigrini, 1996; Leemis, Schmeiser and Evans, 2000; Bolton and Hand, 2002; Applying Benford's Law to Detect Fraudulent Practices in the Banking Industry, Theoharry Grammatikos and Nikolaos I. Papanikolaou, 2015; Benford's Law in Time Series Analysis of Seismic Clusters, Gianluca Sottili, 2015; Benford's Law as an Instrument for Fraud Detection in Surveys Using the Data of the Socio-Economic Panel, Jörg-Peter Schräpler, 2010; Using Benford's Law to Investigate Natural Hazard Dataset Homogeneity, Renaud Joannes-Boyau et al., 2015; Identifying Falsified Clinical Data, Joanne Lee and George Judge, 2008; self-reported toxic emissions data (de Marchi and Hamilton, 2006); numerical analysis (Berger and Hill, 2007); scientific fraud detection (Diekmann, 2007); quality of survey data (Judge and Schechter, 2009); election fraud analysis (Mebane, 2011).

Benford's Law states that, for many naturally occurring datasets such as the lengths of rivers, the distances travelled by hurricanes or street addresses, and also man-made data such as tax returns and invoices, the leading digit 1 will occur with a probability of 30.1%, making this a very useful tool for accounting forensics.


To conform to Benford's Law, the leading digit takes the value 1 about 30.1% of the time, the value 2 about 17.6% of the time, and so on, see the table below. So the probability that a street address starts with a 1 or a 2, covering nearly half the population, is 47.7%. Essentially this means that in the universe there are more ones than twos, more twos than threes, and so on.
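These expected frequencies follow directly from the log rule, P(d) = log10(1 + 1/d). A minimal R sketch (illustrative only, not the audit code itself):

# Benford's Law: probability of leading digit d is log10(1 + 1/d)
benford_first <- data.frame(digit = 1:9,
                            expected = log10(1 + 1 / (1:9)))
round(benford_first$expected, 3)
# 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046

# Probability the first digit is a 1 or a 2 (the street address example above)
sum(benford_first$expected[1:2])   # about 0.477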


Data conformance to Benford's Law can be visually checked by plotting the actual versus the expected frequencies, and statistically confirmed with a chi-square test comparing expected frequencies with actual; the Kolmogorov-Smirnov test was used as a back-up confirmation. These tests were validated for this application using Monte Carlo simulations. (Two Digit Testing for Benford's Law, Dieter W. Joenssen, 2013)
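As a rough sketch of what such a conformance check looks like in R, assuming "d" is a vector of leading digits (1-9) already extracted from the anomalies (the extraction itself is sketched further below); the real analysis used dedicated Benford routines:

# Chi-square test of observed first-digit counts against Benford proportions
benford_p <- log10(1 + 1 / (1:9))
obs <- table(factor(d, levels = 1:9))
chisq.test(obs, p = benford_p)

# Kolmogorov-Smirnov style back-up: largest gap between empirical and Benford CDFs
max(abs(cumsum(as.numeric(obs) / sum(obs)) - cumsum(benford_p)))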

Scammer Bernie Madoff's financial returns are a great example and can be found here. (Link)
This website has calculated the Benford curve for one digit and first two digit probabilities from Madoff's financial returns:





The first graph shows the leading digit did not have enough ones, and that there were too many twos, threes, fours and fives.

Using the first digit and the second digit together adds more power to a Benford's Law test. (Two Digit Testing for Benford's Law, Dieter W. Joenssen, University of Technology Ilmenau, Germany, 2013)

The second graph, which uses the first and second leading digits in the analysis, shows even more clearly the increase in power and how non-conforming Madoff's financials were.

But for Benford's Law to apply, the data must cover multiple orders of magnitude and must not be constrained by an upper or lower limit. (S. Miller, Benford's Law: Theory and Applications, 2015)

Surface temperatures won't work with Benford's Law because they are constrained - they may range from -30 to +50 C, for example. You won't find 99 C surface temps (unless you count the errors in the HadCRUT4 data set), so the digits are constrained and therefore don't conform to Benford's Law.

However, temperature anomalies DO conform to Benford's Law. Malcolm Sambridge of the Australian National University in Canberra showed this. (Benford's law in the natural sciences, M. Sambridge, H. Tkalčić and A. Jackson, 2010)

What Are Temperature Anomalies?
The National Oceanic and Atmospheric Administration describes a temperature anomaly as:

"A temperature anomaly is the difference from an average, or baseline, temperature.
A positive anomaly indicates the observed temperature was warmer than the baseline,
while a negative anomaly indicates the observed temperature was cooler than the baseline."

This means that temperatures above a determined "average block of years" are classified as warmer, and temperatures below this average as cooler. The "average" acts as a pivot point, with above- and below-average anomalies clearly displayed in the Met Office plot below.
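For concreteness, this is roughly how a daily anomaly series is built, sketched in R; the data frame "temps" with columns "date" and "tmax", and the 1961-1990 baseline period, are illustrative assumptions rather than BOM's exact methodology:

library(dplyr)

# Day-of-year climatology over an assumed 1961-1990 baseline period
baseline <- temps %>%
  filter(between(as.integer(format(date, "%Y")), 1961, 1990)) %>%
  group_by(doy = format(date, "%m-%d")) %>%
  summarise(clim = mean(tmax, na.rm = TRUE))

# Anomaly = observed temperature minus the baseline for that calendar day
anomalies <- temps %>%
  mutate(doy = format(date, "%m-%d")) %>%
  left_join(baseline, by = "doy") %>%
  mutate(anom = tmax - clim)   # positive = warmer than baseline, negative = cooler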


The reason temperature anomalies are used is that they make it easy to compare and blend neighbouring stations into a spatial grid. Climatologists claim anomalies are more accurate than temperatures. From the NOAA website:

“Anomalies more accurately describe climate variability over larger areas than absolute temperatures do, and they give a frame of reference that allows more meaningful comparisons between locations and more accurate calculations of temperature trends.”

In fact, anomalies are in most cases less accurate than temperatures in spatial grids. (New Systematic Errors In Anomalies Of Global Mean Temperature Time Series, Michael Limburg, Germany, 2019)

Anomalies are widely used in climate analysis and do conform to Benford's Law, which gives us a very useful and powerful tool for auditing climate data.


----------------------------------------------------------------------------------------------------------------------------


Benford's Law Analysis:
Data Integrity Audit Of BOM Climate Data Using Benford's Law
And Statistical Pattern Exploration using R and JMP

The Bureau of Meteorology provides Raw and Adjusted data here. Raw data "is quality controlled for basic data errors". Adjusted data "has been developed specifically to account for various changes in the network over time, including changes in coverage of stations and observational practices."




The "adjustments" by BOM are called homogeneity adjustments and are meant to account for various "errors", although it has been shown that half of the global warming signal is due to this homogenisation procedure. (Investigation of methods for hydroclimatic data homogenization, E. Steirou and D. Koutsoyiannis, 2012)



Sydney Daily Max And Min Temperature Time Series
The daily temperature time series for Sydney max and min temperatures extends from 1910-2018, nearly 40 000 days.

 Temperature Anomalies are created for each temperature time series as per BOM methodology using R code. The Benford's Law analysis and conformance tests are also done using R code.
---------------------------------------------------------------------------------------------------------------------------
NOTE: The Minimum and Maximum adjusted data are called Minv2 and Maxv2 respectively, and Min Raw and Max Raw are the minimum and maximum raw daily data from 1910-2018 as supplied by BOM.
--------------------------------------------------------------------------------------------------------------------------
Benford's Law NOTE: Temp anomalies are used for the Benford's Law first digit and first two digits tests. In the first digit test, only the leading digit is used after the -, + and leading zeros are stripped away. In other words, for Benford's Law purposes leading zeros are thrown away, as are the - and + signs. Only digits 1-9 are used.

In the Benford's Law two digit test, only the leading two-digit values (10-99) are used after stripping out the - or + sign and leading zeros.
----------------------------------------------------------------------------------------------------------------------------
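A small sketch of the digit extraction described in the note above, done numerically to avoid string handling; the rounding to one decimal place reflects the 1/10 of a degree precision of the data:

# Leading-digit extraction: drop the sign, zeros and decimal point, keep 1-9 only
first_digit <- function(x) {
  x <- round(abs(x), 1)                    # data is recorded to 0.1 of a degree
  x <- x[x != 0]                           # zero anomalies have no leading digit
  floor(x / 10 ^ floor(log10(x)))          # e.g. -0.7 -> 7, 3.4 -> 3, 12.5 -> 1
}

# First two significant digits (10-99) for the two digit test
first_two_digits <- function(x) {
  x <- round(abs(x), 1)
  x <- x[x != 0]
  floor(x / 10 ^ (floor(log10(x)) - 1))    # e.g. -0.7 -> 70, 3.4 -> 34, 12.5 -> 12
}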

Below: All Sydney days for Maxv2 data with the first digit Benford's Law test, expected (dotted red line) versus actual frequency.

Above: The first (leading) digit for the complete Daily Sydney Maximum Adjusted Temps (maxv2) from 1910-2018, nearly 40 000 days. The red dotted line is the expected, the bars are the actual.

It shows a weakly conforming curve to Benford's Law over the full data set, but with too few ones and too many threes and fours overall. This curve fails the chi-square test with a very small p value but is rated only "weak" non-conformance by the Nigrini MAD index. To gain more power, the first two digits are used in the next Benford test below.

Below: Maxv2 with the first two digit Benford's Law test, expected and actual frequencies.


Above: This gives a clearer picture of why the data fails conformance. The first two digits test is more complete and more powerful. The data set also fails the conformance tests with two digits. You can clearly see that some digits are used too much and some too little.
These are the values of the first two digits flagged by the software for the biggest deviations from expected:

digits   absolute.diff
  17     317.3937215
  38     255.1850009
  37     204.2152006
  10     203.807979
  27     172.6250801
  82     172.5622119
  42     170.4305132
  22     165.9444006
  85     159.0889232
  19     151.2663636

There are far too many 17s, 38s, 37s, 42s and 85s. Looking at the curve, you can see a systematic increase in "blocks" of numbers. There are too few 10s as well. The numbers seem to come in blocks of twos and threes, either too many or too few. Overall, as seen in both graphs, the mid-range and larger numbers are overused. The Maxv2 data does not conform to Benford's distribution.
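The kind of ranking shown in the table above can be reproduced in a few lines, assuming "d2" holds the first-two-digit values (10-99) extracted as sketched earlier:

# Absolute deviation of observed two-digit counts from the Benford expectation,
# sorted to flag the worst offenders
benford2_p <- log10(1 + 1 / (10:99))
obs2 <- as.numeric(table(factor(d2, levels = 10:99)))
exp2 <- benford2_p * length(d2)

dev <- data.frame(digits = 10:99, absolute.diff = abs(obs2 - exp2))
head(dev[order(-dev$absolute.diff), ], 10)   # the ten biggest deviations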




Minv2
Below: The Daily Minimum Temperatures Adjusted (Minv2)
Above: The minimum adjusted temps (Minv2) for the first digit fail the chi-square conformance test with a small p value below our 0.005 cutoff. It is worse than the maximum temperatures graph for the single and double digit tests using the complete data.

There are too many 1s, 2s and 3s, with 4s-9s being scarce. This shows that the digits 4-9 are underused and 1-3 are overused in this dataset.



Lower digits occur more frequently than expected in Minv2, consistent with an upward warming trend.




Below: The Daily Minimum Temperatures Adjusted (Minv2), first two digits Benford's test.
Above: This Minv2 graph for the two digit test is more dramatic -- it clearly shows how Peter has been robbed to pay Paul: the higher numbers from about 40-90 have been reduced in frequency, and the lower numbers around 15-38 have been increased in frequency.

The deviation from Benford's Law here is striking; the data has a very large bias, and this is with a large sample of nearly 40 000 days. The effect has the potential to be even more extreme when looking at specific months.

----------------------------------------------------------------------------------------------------------------------------

Let's separate the Maxv2 data into positive and negative anomalies (above/below average) before the + and - signs are stripped away for the Benford's test. This will show whether conformance changes between the above-average and below-average Maxv2 groups.
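A sketch of that split, reusing the helpers above; "anoms" is the assumed vector of Maxv2 anomalies:

# Test above-average and below-average days separately against the two-digit Benford curve
pos <- anoms[anoms > 0]   # above-average anomalies
neg <- anoms[anoms < 0]   # below-average anomalies

benford2_p <- log10(1 + 1 / (10:99))
chisq.test(table(factor(first_two_digits(pos), levels = 10:99)), p = benford2_p)
chisq.test(table(factor(first_two_digits(neg), levels = 10:99)), p = benford2_p)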

Below: 
Sydney Maxv2 Data, ONLY Positive Temp Anomalies Tested.
Looking at only the positive Maxv2 anomalies, i.e. when temperature anomalies are above average, there is an even greater lack of conformance to Benford's Law.

Particular numbers have been increased and decreased with regularity; there is nothing "natural" in this number distribution. This appears to be data tampering in the resulting above-average temp anomalies.

Higher numbers have more dramatically increased in frequency in the Maxv2 data.
Above: ONLY POSITIVE temp anomalies for Maxv2.
You can clearly see the spikes where numbers appear too often and the gaps where they are too sparse.

 

What About Above-Average Minimum Temps?
The biases are even more evident in the Sydney Minv2 temps. The higher numbers are reduced and the frequency of the lower numbers increased; the biases are more extreme in the Minv2 dataset.
The resulting above-average temperatures in the Minv2 data appear to have been tampered with quite dramatically.


Below: ONLY POSITIVE temp anomalies for Minv2.

Results Of Min Max Temp Anomalies + Benford's Law
Neither the Maxv2 nor the Minv2 temperature anomaly data conforms to Benford's Law. There are very large deviations from the expected Benford curve, particularly when looking at only the positive anomalies for Minv2 and Maxv2.

BOM claims that the homogeneity adjustments made to the Maxv2 and Minv2 data sets "remove" biases from non-climatic effects; in fact the opposite happens - very large biases are added, because normal observational data with occasional corrections/adjustments would not look like this in data known to conform to Benford's Law. With nearly 40 000 observations in the sample, the "adjustments" have to be very large to produce this picture.

In any financial setting, this data would be flagged for a forensic audit; it suggests tampering.



But What About RAW Data?

The BOM says the raw temperature data is "unadjusted" and only subject to "pre-processing" and "quality control". (Link) This consists of:

"To identify possible errors, weather observations received by the Bureau of Meteorology are run through a series of automated tests which include:

  • ‘common sense’ checks (e.g. wind direction must be between 0 and 360 degrees)
  • climatology checks (e.g. is this observation plausible at this time of year for this site?)
  • consistency with nearby sites (e.g. is this observation vastly different from nearby sites?)
  • consistency over time (e.g. is a sudden or brief temperature spike realistic?)"
To test this, we will use the raw maximum and minimum temperature anomalies.
Let's start with the Maximum Raw data:

We can see below that the Benford two digit test on the Maximum Raw temp anomalies reveals extremely biased data, about as "unnatural" a distribution as you can get, with periodic spikes and dips. There is a man-made fingerprint in the regularity. This RAW data fails the chi-square test for Benford conformance.

Below: The Maximum Raw temperature anomalies with the two digit Benford test.





Below: The Minimum Raw temperature anomalies with the two digit Benford test. Again, biased data and definitely not raw observational data with minor preprocessing. Very cooked. Too many 15s-47s, too few 10s-13s and too few of the higher values such as 59, 69, 79 and 89.



Results Of Raw Temperature Anomalies Analyses And Benford's Law 2 Digit Test
The systematic tampering with particular digits forms periodic patterns.
The RAW Min and Max data is not raw, it is cooked. It is very cooked.

The raw data fails the chi-square test with tiny p values, the Nigrini MAD index and the Kolmogorov-Smirnov test for Benford's Law conformance. 



Comparison With Other Climate Data Sets
"Berkeley Earth is a source of reliable, independent, non-governmental,
and unbiased scientific data and analysis of the highest quality."  (Link)

Berkeley Earth has released a worldwide daily global temperature anomaly data set with over 50 000 temperatures. It's not the Sydney daily data, it's global, but we can still make a quick comparison to see whether the deviations run in the same direction.

Their global analysis is below, and the same biases appear: the low numbers have been reduced in frequency and the same high numbers increased. What makes this stand out is how carefully the data has been manipulated above and below the expected frequency curve. The BOM data appears much more heavy-handed.
  
Below: Berkeley Earth global temp anomalies, first two digits.
Above: The Benford two digit test shows increased frequencies of digits 40-90 and reduced frequencies of digits 10-35 in the Berkeley Earth global anomalies.



Below:
Plotting increased frequencies by year against anomaly size.
Increasing the frequency of particular numbers increases their effect on the average -- in this case, pushing the trend upwards.






Below: NASA GISS Global Temp Anomalies.
Looking at the NASA GISS yearly world temperature anomalies below: only a first digit analysis can be done because the dataset is small, using averaged yearly anomalies -- averages that are themselves averaged. The worst of the lot. For entertainment purposes only.




Results Of Comparison
Although we weren't comparing exactly the same thing (Sydney-specific versus global), the data from the other climate temperature providers shows the same biases in the same direction.




Difference Between RAW Data And Adjusted Data?
Extra temperature adjustments are added on top of raw in the BOM adjusted data sets.
These are the adjusted data sets called Maxv2 and Minv2.
The adjustments are made using "homogeneity" software, creating the "adjusted" data sets. This is supposed to remove biases but instead adds biases, as shown above with the Benford tests.

What has the homogeneity software done?
BOM claims the adjustments are small, and says the adjustments are not needed to see the warming trends -- which we know is true, because the Benford analysis above shows the Raw data is already cooked: the biases increase the frequency of large numbers and reduce the small ones in the Max data, and the opposite happens in the Min data. Natural numbers follow Benford's Law; the BOM ACORN data set does not.

Let's look at the exact temperature differences between raw and adjusted.
This shows the result of the adjustments done to raw.
This is simply done by:
1: maxv2 - max raw
2: minv2 - min raw

The outcome is that any time we get a positive number, the adjusted temp is warmer than raw, and when we get a negative number, it has been cooled compared to raw.
This lets us see what warming/cooling the BOM is adding on top of the "raw".
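In R this is a one-liner per series; the column names here ("maxv2", "max_raw" and so on, aligned by date in a data frame "syd") are assumptions for illustration:

# Adjustment added on top of raw, per day:
# positive = adjusted warmer than raw, negative = adjusted cooled relative to raw
syd$max_adj_delta <- syd$maxv2 - syd$max_raw
syd$min_adj_delta <- syd$minv2 - syd$min_raw

summary(syd$max_adj_delta)              # overall size of the Max adjustments
range(syd$min_adj_delta, na.rm = TRUE)  # most extreme Min adjustments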


Above:
The blue graph represents the Maxv2 adjustments that warm raw.
The orange graph shows the Minv2 adjustments that warm raw.
This is the extra warming done by software on top of Raw. The adjustments are regularly updated and tweaked by BOM as the "science changes" and "network changes" are detected.

To plot the curves, average temperature values were used on the left vertical axis. The actual values of how much the temps were modified are shown further below.

In the blue curve we see that 1910-1920 had the most warming added to the adjusted data. The actual data below tells us that the temps were increased by around 3.5 C on top of Raw. Around 1920-1940 it shot up again, then dropped again around 1980, and so forth. The orange curve tells a similar story for the Minimum temps data.

The adjustments went to zero at the end of the time series, but we know that raw data has warming factored in already from the Benford analysis.
 
The below graph shows data points with actual temp degrees of added warming.


Above:
This graph shows the difference between Maxv2 and raw, and Minv2 and raw, but with actual data points, no averaging. There are about 30 cases where Maxv2 temperatures are increased by 3 to 3.5 C on top of raw.
 
There is an outlier -- look at the blue data point down near year 2000 on the horizontal axis.
It's nearly -8 C; in fact the temperature was cooled by 7.6 C at that point.


Which months get most of the warming from the Raw to the Adjusted data set?
In Maxv2, the biggest warming over raw, 3 C or more in the above graphs, occurs in January, February, October and June. Looking at the sheer number of times warming has been applied to each month, January, February, July and November stand out.

In the Minv2 data set, the months that get most of the warming by temperature are January, February and December. The months warmed most by number of adjustments are September, October and November.

To investigate BOM's different treatment of different months, I have separated out all the days of January, then February, and so on.

There are 3380 days in January from 1910-2018 so that will be our sample for Jan. All the months have over 3000 days.

Monte Carlo simulations confirm the validity of using the chi-square and Kolmogorov-Smirnov tests to check Benford's Law at sample sizes over 2500 when the first two digits are used. This means our sample size of over 3000 days is large enough for a two digit test. (Two Digit Testing for Benford's Law, Dieter W. Joenssen, 2013)
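The idea behind that Monte Carlo validation can be sketched quickly: generate samples that genuinely follow the two-digit Benford distribution at a month-sized sample size and check that the chi-square test rejects them at roughly its nominal 5% rate (a simplified illustration, not Joenssen's simulation design):

# Monte Carlo sanity check of the two-digit chi-square test at n = 3000
set.seed(1)
benford2_p <- log10(1 + 1 / (10:99))

p_values <- replicate(1000, {
  d2  <- sample(10:99, size = 3000, replace = TRUE, prob = benford2_p)
  obs <- table(factor(d2, levels = 10:99))
  chisq.test(obs, p = benford2_p)$p.value
})

mean(p_values < 0.05)   # should sit near 0.05, the nominal false positive rate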


Specific Months Using Benford's Law.
Below: JANUARY Maxv2 Temp Anomalies - First 2 digits Benford's Law




Below: FEBRUARY Maxv2 Temp Anomalies - First 2 digits Benford's Law




Below: JANUARY Minv2 Temp Anomalies - First 2 digits Benford's Law


Below: FEBRUARY Minv2 Temp Anomalies - First 2 digits Benford's Law






Benford's Law Results For Individual Months:
All of the individual months comprehensively fail the chi-square and Kolmogorov-Smirnov tests for conformance to Benford's Law. The p value is 2.2e-16 in most cases, a tiny number. All the months exhibit very large biases; the lack of conformance to Benford's Law is extreme. These results would red-flag any financial data set for a forensic audit. This signals very large data tampering.


*********************************************************************************

University Of Edinburgh Bayesian R Code
Tracking Data Conformance To Benford's Law Over Time.
Running the model below on our daily temperature anomaly data sets from 1910-2018 tracks Benford's Law conformance of the first digit over time. This tells us at what point (which years) the data was modified, and by how much.

Miguel de Carvalho has shown this approach to be more accurate than empirical methods of evaluation because it accounts for the discretisation effect. (Link)

Miguel and Junho kindly tweaked the software and sent me the R code to run on the BOM data sets and create their superb time-varying graphs. This shows you exactly when a change was made to the data.
They used it to track the homogeneity of a data set of distances travelled by hurricanes over the years, and showed that the data in recent years was less homogeneous!
Their paper and link are below.

Miguel de Carvalho and Junho Lee from the University of Edinburgh have created a state-of-the-art Bayesian time-varying model "that tracks periods at which conformance to Benford's Law is lower. Our methods are motivated by recent attempts to assess how the quality and homogeneity of large datasets may change over time by using the First-Digit Rule."
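Their model is fully Bayesian and is not reproduced here, but the underlying idea -- tracking first-digit proportions year by year against the First-Digit Rule -- can be approximated with a naive empirical version (the very version their smooth SSD is designed to improve on). This sketch assumes a data frame "df" with columns "year" and "anom", plus the first_digit() helper from earlier:

# Naive year-by-year first-digit proportions and empirical SSD
benford_p <- log10(1 + 1 / (1:9))

by_year <- sapply(split(df$anom, df$year), function(a) {
  d <- first_digit(a)
  as.numeric(table(factor(d, levels = 1:9))) / length(d)
})

ssd <- colSums((by_year - benford_p)^2)   # sum of squared deviations per year
plot(as.numeric(names(ssd)), ssd, type = "l",
     xlab = "Year", ylab = "Empirical SSD (naive, unsmoothed)")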





As a first run, I ran the model over the Berkeley Earth global temperature anomalies. This is the 50 000-sample data set from 1880-2018 referenced above.

The software used the first digit of the temperature anomalies and tracked conformance to Benford's Law by year.


The time-varying output was graphed as follows:


The outputs show the posterior mean for the leading digit of the temperature anomalies taking the values 1 to 9. The leading digit 1 has the biggest effect, with its probability going up at around 110 years (with the data covering 1880-2018, this is about 1990), rising well past the dotted line, which is the expected value for leading digit = 1. Digit 1 sat under the expected dotted line for most of the record, moving up and down, but 1990 was the critical point of a large increase.

With digit = 2 there is a small decrease at about 1890, then a levelling off where the probability is roughly as expected, and then it also dives at about year 110, which again equals 1990. This means the value 2 is underused. The story is similar for values 3, 4 and 5.

It is difficult to see on this plot, but digits 7, 8 and 9 were overused from the 100-year mark (1980) with a gradual decline.

The plots are difficult to see when shrunk, so for the Sydney model I have used the raw numbers output by the model and plotted those in JMP.



The above output shows the net effect of non-conformance to Benford's Law for the leading digit. The smooth SSD (smooth sum of squared deviations) statistic assesses overall conformance with the First-Digit Rule across the nine digits in each year, "which avoids overestimation of the misfit due to a discretization effect, whereas a naive empirical SSD as in [...] can be shown to be biased." (Link)

This clearly shows that at around the 115-year mark (1880 + 115 = 1995) there is a large upward trend, an increasing lack of homogeneity shown by lack of conformance to Benford's Law. In other words, certain digits have been used excessively and others too sparsely in the leading digits of the Berkeley Earth daily global anomalies. The trend increases dramatically at 2008, suggesting much more data tampering in the later years.



Sydney JUNE Maxv2 Bayesian Tracking 1910-2018
The time-tracking Benford's Law conformance model was run over all the Sydney Maxv2 daily temperature anomalies from 1910 to 2018 for June. June was one of the months that seemed to get extra warming attention from the BOM, as shown in the difference between raw and adjusted above, so it was worth checking overall conformance.

The actual output from the model is a bit hard to see exactly when posted on this blog, so I used the raw numbers that are output by the model to graph it in JMP in large format.

To recap -- the first digit of each temp anomaly was checked for the values 1-9 and tracked over the years for conformance to the first-digit rule from Benford's Law. This shows the conformance behaviour over time for each leading digit value.
 
Above: This is the leading digit with value 1. This has the largest effect; the orange line is the number of times we expect to see the value 1, and the blue is the actual variation from that. We see that 1s were overused until about 1940, underused in the 1950s, increased in the 1980s, and then shot up in the late 1990s with high usage. The upward trend is similar to the Berkeley Earth global data above.

Above: The first digit is now equal to 2. Its use was excessive around 1910, declined in the 1920s, was overused again in the 1980s, and has been reducing in the last 5 years or so.

Above: Leading digit = 3. Use declined greatly from the 1960s, although there was a levelling out in the 1990s before dropping back to around the expected level.

Above: Leading digit = 4. Almost cyclical in use, and in decline in recent years.

Above: Leading digit = 5 shows complete underuse throughout the years, with an increase in the 1990s but still below expected.

Above: Leading digit = 6 shows underuse and then a sharp increase from the 1950s to the 1980s. It has been underused from the early 90s.

Above: Leading digit = 7 shows excessive use in the 1920s, then gradually declining use.

Above: Leading digit = 8. The magical date of 1980, where so much happens in the climate world, comes into play again, with excessive use in the 1980s and 90s.

Above: Leading digit = 9 generally shows underuse over the years.

The net result of the posterior probabilities for all the digits is the SSD curve. The overall lack of conformance to Benford's Law is higher than for Berkeley Earth, as can be seen on the left-hand vertical axis. The lack of conformance is relatively flat, with slight cyclic variations around 1910, the 1930s and the 1950s, then gradually increases from the 1970s, with an accelerated increase in the last 5 years. That final acceleration signals the worst lack of conformance, suggesting Benford's Law conformance has been getting worse in the last 5 years or so.



Summary: The use of leading digit 1 increases dramatically from the 2000s, warming the negative anomalies in the June data set. The overall lack of conformance to Benford's first-digit rule is worse than for the Berkeley Earth global data set, as shown by the left-axis values of the SSD graph.


----------------------------------------------------------------------------------------------------------------------------




Statistical Analysis Of BOM Data Sets Without Benford's Law:
Pattern Exploration, Trailing Digits And Repeated Numbers.

Leaving Benford's Law behind, there are other tools to help analyse data quality and fraud.
Replication problems have been increasing in scientific studies, with data fabrication on the rise.
Retractionwatch.com lists hundreds of studies that have been retracted, many for data fabrication.

Uri Simonsohn at datacolada.org is a "data detective" who has been responsible for getting several big-name professors to retract their studies and resign from their posts over data fabrication. His website statistically tests and attempts to replicate studies, and has caused many retractions.

The pharmaceutical industry is also actively involved in replicating studies.



The University of Portland did an analysis of the trailing digits in Tasmanian climate data taken from the "Proxy Temperature Reconstruction" data in "Global Surface Temperatures Over the Past Two Millennia" (Phil D. Jones, Michael E. Mann), the infamous "climategate" dataset.

They found:




Trailing Digit Analysis With Sydney BOM Daily Data Sets
Unlike the leading digit, which is logarithmically distributed in most data (Durtschi et al., 2004), the trailing digit is typically uniformly distributed (Preece, 1981).


The 3rd digit of a number has a nearly uniform distribution, and the 4th digit is even closer to uniform. The Sydney ACORN data is rounded to 1/10 of a degree, so the 3rd digit will be analysed.

NOTE: The BOM thermometers, including the electronic ones, have a tolerance of 0.5 of a degree. This tolerance is looser than the WMO guideline of 0.2 of a degree.
(The Australian Climate Observations Reference Network - Surface Air Temperature (ACORN-SAT) Data-set, Report of the Independent Peer Review Panel, 4 September 2011)


The trailing digit R code from Jean Ensminger and Jetson Leder-Luis's World Bank audit is used here to test various months from the Sydney minimum and maximum temperature data sets, both raw and adjusted. (Measuring Strategic Data Manipulation: Evidence from a World Bank Project)
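The gist of a trailing-digit check is short enough to sketch here (this is a simplified stand-in, not the Ensminger and Leder-Luis code); "temps" is assumed to be a vector of temperatures recorded to 0.1 of a degree:

# Trailing (tenths) digit uniformity check
trailing_digit <- function(x) round(abs(x) * 10) %% 10   # e.g. 23.4 -> 4

td  <- trailing_digit(temps)
obs <- table(factor(td, levels = 0:9))

chisq.test(obs, p = rep(0.1, 10))   # null hypothesis: all ten trailing digits equally likely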

Only October will be graphed, or the analysis would become too long.



Above: All the days for October, about 3300 of them from 1910-2018, from the Sydney Max Raw data set -- unadjusted data from the BOM. We are looking at the 3rd digit of the raw temperatures (not anomalies), because this test does not depend on Benford's Law and can therefore be used directly on temperature data.

It produces a chi-square p value of 9.4e-82, a tiny number, so the null hypothesis that the distribution is uniform is firmly rejected.



Next, October Maxv2, the adjusted data set.


Above: Sydney October days 1910-2018 using the Maxv2 adjusted data set. This also fails the uniform distribution test. The digit 5 has increased dramatically compared to Raw.



Next, October Min Raw Data.


Above: Sydney Min Raw data from 1910-2018, the roughly 3300 October days. This data is supposed to be unadjusted but fails the chi-square test for a uniform distribution. The digit 5 again has too low a probability in the 3rd position.


Next: October Minv2.


Above: The Sydney October Minv2 adjusted dataset fails to comply with a uniform distribution as well, with an equally low p value compared to the raw data.


----------------------------------------------------------------------------------------------------------------------------
Note: The digit 5 is often scarce, and this occurs in other months too. It could be indicative of a double-rounding error, where the majority of temperature readings were taken in Fahrenheit and rounded to 1/10 of a degree, then later converted to Celsius and rounded to 1/10 of a degree again.

"Statistical methods, especially those concerned with assessing distributional changes or temperature extremes on daily time-scales, are sensitive to rounding, double-rounding, and precision or unit changes. Application of precision-decoding to the GHCND database shows that 63% of all temperature observations are misaligned due to unit conversion and double-rounding, and that many time series
contain substantial changes in precision over time." (Decoding the precision of historical temperature observations, Andrew Rhines et al) 
-----------------------------------------------------------------------------------------------------------------------------
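The double-rounding hypothesis is easy to demonstrate with a toy simulation: take Fahrenheit readings at 0.1 F precision, convert to Celsius, round again to 0.1 C, and look at which tenths digits survive. In this toy example the digits 0 and 5 come out underrepresented; it is offered as an illustration of the mechanism, not proof that this is what happened to the BOM data:

# Toy demonstration of double rounding: 0.1 F grid -> Celsius -> re-rounded to 0.1 C
f       <- seq(32, 110, by = 0.1)           # Fahrenheit readings at 0.1 F precision
c_twice <- round((f - 32) * 5 / 9, 1)       # converted and rounded again to 0.1 C

table(round(c_twice * 10) %% 10)            # tenths digits 0 and 5 appear about half as often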
 

Result Of Trailing Digits Analysis For Sydney Daily
Min Raw, Minv2, Max Raw and Maxv2 Data Sets

All months have some problem with trailing digits not conforming to a uniform distribution. The Min temperature data for the winter months is worst, closely followed by the Max temperatures in December, January and February.

Lack of uniformity in trailing digits is a classic marker of data tampering. (Uri Simonsohn, http://datacolada.org/74)


----------------------------------------------------------------------------------------------------------------------------






Pattern Exploration Of Sydney Daily Min Max Data Sets--
Looking for Duplication and Repeated Sequences-
Beyond Chance.


If sequences from the temperature data sets are duplicated over different years, or multiple days have duplicated temperatures beyond what can be expected from chance, we have found potential data integrity issues and possible tampering. 

We will be using a specialised software module in JMP to find duplicated values and sequences repeated beyond chance. The software calculates the probability of an event happening by chance, considering the data set size, the number of unique values and the repetitions within the data set.
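JMP's rarity calculation is not reproduced here, but a rough back-of-envelope version of the same idea is easy to sketch: the chance that two independent days share the same 0.1 C value is the sum of squared value probabilities, and an exact k-day run match compounds that k times. Day-to-day autocorrelation makes real matches somewhat more likely, so treat this only as a rough guide; "temps" is the assumed vector of daily temperatures for the month being checked:

# Rough probability that a specific k-day window is duplicated exactly elsewhere,
# treating days as independent draws from the observed temperature distribution
p_vals     <- table(temps) / length(temps)
p_same_day <- sum(p_vals^2)      # chance two random days share the same 0.1 C value

k <- 15
p_same_day^k                     # chance a given 15-day window matches another one exactly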


Above: Daily Min Raw Data Set For Sydney for December, about 3300 days.
Straight away we find a problem, a big one. The software flags that 15 days of temps are exactly duplicated, to 1/10 C, in another year.

It looks like a copy/paste somewhere in the 40 000-day time series, the sheer number of days probably being the reason this hasn't been picked up before.

The software gives the probability of this occurring by chance as zero.


Looking at the December Min Raw data set, we can see that an exact sequence has been duplicated in the following year. Recall, this is RAW, unadjusted data with just basic data quality checks and preprocessing! The identical sequence also exists in the Minv2 data set.


But things get worse for July daily temps for Sydney 1910-2018.


Above: Both Min Raw and Minv2 for July have 31 days, a complete month, "copy-pasted" into another year. The probability of this happening by chance is again zero.


Above: A snapshot of a full month being copy-pasted into another year in both the Sydney Minv2 and Min Raw data. Again, the Raw data is supposed to be relatively untouched according to BOM, yet this copied sequence gets carried over to the adjusted Minv2 set.


But there's more:
June Minv2 and Min Raw also have a full month of 30 days copy-pasted into another year.


Above: Sydney June Daily Temps, Minv2 + Min Raw duplicated 30 day sequence.


Above: A duplicated 30 day sequence for June Minv2 and Min Raw.


There are also linear relationships between the datasets, suggesting linear regression was used to go from raw to adjusted. For example, a constant of 0.6 and a slope of 1 exists between Minv2 and Min Raw in January --


But in some sequences between raw and adjusted, the constant is 0.2 with slope 1, then 0.3 with slope 1, then 0.4 with slope 1, and so on in a regular pattern.


Above: Direct linear relationships between minimum and maximum adjusted daily temperatures in March.




There are also shorter duplicated sequences that are still fairly rare.
The sequence below, in the Maxv2 June data, has a rarity of 16 -- equivalent to 16 heads in a row, or more than a 1 in 65 500 chance of occurring by chance.


In Minv2 September below, the number of unique temperatures and the size of the dataset give a rarity of 15.3 for the sequence shown, which equates to a 1 in 40 300 chance of that event happening by chance.
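The rarity scores quoted here behave like base-2 logarithms of the inverse probability (16 "heads in a row" is one chance in 2^16, about 65 500); that reading is inferred from the figures above rather than from JMP documentation. Converting a score to an approximate probability is then one line:

rarity_to_prob <- function(r) 2 ^ (-r)   # rarity score read as log2 of 1/probability

1 / rarity_to_prob(16)     # about 65 500
1 / rarity_to_prob(15.3)   # about 40 300
1 / rarity_to_prob(16.5)   # about 92 700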



Looking at the complete Sydney Maxv2 dataset of 40 000 days for sequences duplicated ACROSS MONTHS, two extreme cases pop up with rarity scores of 16.5, which equals roughly a 1 in 92 000 chance:





Above: A shorter yet still improbable sequence in the March Maxv2 dailies. Only sequences with less than about a 1 in 40 000 probability of being chance are shown here; there are many more short sequences in the BOM data that are still unusual. For comparison, two cities in the Netherlands (De Kooy and Amsterdam) and two regions in the U.S. (the NW and SW regions from NOAA) were checked against the Sydney sequences; none came close to this large number of rare events.


Results Of Pattern Exploration:
Sydney has sequences copied between months and years that have zero probability of being a chance occurrence. The large number of shorter duplicated series is also improbable.

There are multiple linear relationships between raw and adjusted data, suggesting linear regression adjustments between raw and adjusted.

Generally speaking, the country data sets (not yet posted) are even worse than the Sydney data. Charleville has 2 months copied, Port Macquarie has large sequences copied, and Cairns has January 1950 copied into December 1950. This exists in the Raw data and is carried over into the adjusted data.

The data has been tampered with. Missing data cannot be an explanation for copy/pasting sequences because:

1 - Data is imputed via neural nets etc. In the climate industry, data is imputed via neighbouring stations with close correlation.

2 - Nearly all BOM data has some missing temps; some data sets have years of empty spaces. There are over 200 of them in these data sets. Why would there be an attempt to conceal one month of missing temps?

Temperature records are reported to 1/10 of a degree C.
Copy-pasting months into different years, or worse, into different months, as has happened in other data sets, is data tampering. This should not happen with time series data. See below:

-------------------------------------------------------------------------------------------------------------------------

Weather Data: Cleaning and Enhancement, Auguste C. Boissonnade; Lawrence J. Heitkemper and David Whitehead, Risk Management Solutions; Earth Satellite Corporation

"CLEANING OF WEATHER DATA
Weather data cleaning consists of two processes: the replacement of missing values
and the replacement of erroneous values. These processes should be performed
simultaneously to obtain the best result.
The replacement of one missing daily value is fairly easy. However, the problem
becomes much more complicated if there are blocks of daily missing values. Such
cases are not uncommon, particularly several decades ago. The problem of data
cleaning then becomes a problem of replacing values by interpolations between
observations across several stations (spatial interpolation) and interpolations
between observations over time (temporal interpolation)."

----------------------------------------------------------------------------------------------------------------------------
The Best For Last........

An Analysis Of Repeating Numbers In Climate Data.


Uri Simonsohn is, amongst other things, a "data detective" who specialises in statistical analysis of published studies. He attempts to replicate these studies and tests the data for tampering and fabrication.
He has produced a very useful tool for forensic data analysis.

The R code is available from him to do what he calls a "number bunching" test -- this tests for repeated numbers that occur more often than expected for a particular data set.

I have used this code to test the bunching of repeated temperatures in the Sydney daily Min and Max temperature time series.



Problem:


Above: This example uses all the days of March in the Sydney daily Min Raw and Minv2 temp time series. There is a massive increase in repeated numbers from Min Raw to Minv2!

Looking at the bottom picture first: the most repeated temp in this series was 17.8, repeated 88 times. The next most repeated temp was 18.3, at 86 times, and so on.

The first picture in this example shows Minv2, the adjusted minimum temps for March.
Notice what happens to the repeats. They increase a lot.

Increasing number repetition is a common way of manipulating data.

Let's look at the December Max and Min temps. December is one of the suspect months with a high level of tampering, from Benford's Law to repeated number sequences.

At this point we are looking for repeated numbers. To get a quick view, let's graph the repeated numbers in the Min and Max December time series.

Above: Repeated temps in December, Minv2 in blue and Min Raw in orange.
The most repeated temps have the longest spikes.

How many times they repeat is on the left vertical axis; the bottom axis is the actual temps. Min Raw (orange) has the single highest peak, but Minv2 (blue) has more high spikes overall. Minv2 also appears to the eye to be more "bunchy" -- more spaces and blocks or groupings.
But how much bunching is normal and how much is suspicious?

This is where the number bunching software helps us. A statistic is created (similar to entropy) from the average frequency of each distinct number (repeated temp), then 5000-10000 bootstraps are run, and a graph of the results is output showing the observed repeated numbers against the expected repeated numbers for this sample. See the website for more details. (Link)
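Simonsohn's own R code is linked above; a heavily simplified version of the idea looks like the sketch below. The null model here -- whole degrees and tenths digits drawn independently from their own observed distributions -- is one reasonable way to generate "expected" data and is not necessarily the resampling scheme his code uses. "temps" is the assumed vector of the month's temperatures, recorded to 0.1 C:

# Simplified number-bunching check: observed average frequency of each value
# versus a bootstrapped expectation that breaks any excess bunching on exact values
avg_frequency <- function(x) { tab <- table(x); sum(tab^2) / length(x) }

whole  <- floor(temps)
tenths <- round((temps - whole) * 10)

observed <- avg_frequency(temps)
boot <- replicate(5000, {
  sim <- sample(whole, replace = TRUE) + sample(tenths, replace = TRUE) / 10
  avg_frequency(sim)
})

(observed - mean(boot)) / sd(boot)   # standard errors above the bootstrapped expectation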



Above: This is Min Raw in orange and Minv2 in blue for December. The number bunching analysis for repeated numbers will now be run on this data to assess the bunching of repeats.

Number Bunching Results.
Above: Results of number bunching analysis for Max raw Sydney temps.
This shows the expected average frequencies against the observed average frequencies. The red line is the observed average frequency for the Max Raw data. The red line is within the distribution, 2.02 standard errors from the mean, about a 1 in 20 occurrence. This is well within expectation.

Above: Maxv2 -- the expected and observed average frequencies are separated by a massive 27.9 standard errors. We are seeing far too many observed repeated numbers compared with what is expected for this sample. We would expect to see this much bunching less than 1 time in 100 million.


Above: The Min Raw data tells a similar story; there are too many repeated numbers. The observed repeats sit 7.6 standard errors out, more than a 1 in a million occurrence.


Above: Minv2 -- the observed average repeated numbers (red line) are so far outside expectation, 41.5 standard errors, that we would never expect to see this; the probabilities become too tiny for any meaningful computation. The data has an extremely high rate of bunching -- an extremely high number of repeated temps.


June below:

Above: The June Max Raw data has a standard error of nearly 12, a very high level of bunching that we would virtually never expect to see.


Above: The Maxv2 adjusted data for June... and is it adjusted! It was bad in Raw; it is a whopper in the adjusted Maxv2 data. The standard error of 49 is massive; the chance of seeing this in this sample is nil. A high level of manipulation in the repeated numbers (temps).

 
October below:

Above: The October Max Raw data has observed average repeated numbers 4.8 standard errors past the expected mean, highly unusual but not beyond expectation; more than a 1 in 150 000 event.



Above: The October Maxv2 adjusted data set has far too many observed repeats against the expected repeated temps, over 24 standard errors out. The probability is too tiny to calculate; we would not expect to ever see this.


Results:
The frequency of repeated temperatures, called "number bunching" in this software analysis, tests how likely it is that the data has been tampered with. A much more extreme outcome exists here than in the study Uri Simonsohn highlights on his website, where he supplies the R code for this test. That suspect study was retracted for suspected fabrication. The BOM data is extremely suspicious.
--------------------------------------------------------------------------------------------------------------------------

Wrapping Up:
The first step, and the biggest one, taking up most of the analysis time, is data cleaning, preprocessing and integrity checks. If the data has no integrity at input, it is not worth pursuing.

There are many questions to be answered on the data integrity of not only the Sydney Min Max temperature time series, but many/all other cities and towns. 

Preliminary work shows even worse results for the smaller towns compared to the Sydney data.
More posts will follow documenting more of the BOM temperature time series that are used for data modelling and projections. The garbage in, garbage out scenario means no credibility can be given to climate modelling using this data.

Looking at other climate data providers such as Berkeley Earth shows similar problems. The ocean temperature anomalies from Berkeley Earth will be looked at in the future, but preliminary work shows they have no use whatsoever. The ocean surface temp anomalies are so far from conforming to Benford's Law that it is clear they are only "guesstimates" (interpolations, they call it). Any meaningful modelling output from these anomalies is doomed.

At the very least, a government forensic audit should be performed on the BOM climate data.
It is extremely suspicious and would have been flagged for an audit in any financial database.


-----------------------------------------------------------------------------------------------------------------------------

Increased Uncertainty Besides Dirty Data
Errors That Increase Uncertainty Even More

1 - Double-rounding errors exist in most climate data and have mostly not been corrected.
(Decoding the precision of historical temperature observations, Andrew Rhines, Martin P. Tingley, Karen A. McKinnon, Peter Huybers)

2 - Errors in using anomalies. (New Systematic Errors in Anomalies of Global Mean Temperature Time Series, Michael Limburg, 2014)

3 - Uncertainty. Autocorrelated time series do not follow Gaussian error propagation. The Darwin 30 temp average has an uncertainty of plus or minus 0.4 C, putting any warming within the boundaries of error.
(Can we trust time series of historical climate data? About some oddities in applying standard error propagation laws to climatological measurements, Michael Limburg (EIKE), Porto Conference)

4 - Flaw Of Averages. Using averages means that on average you are wrong. Particularly when you use averages of averages.

5 - BOM thermometer (including electronic) tolerances are 0.5 C, looser than the WMO suggested spec of 0.2 C.

6 - Errors from inadequate spatial sampling. "While the Panel is broadly satisfied with the ACORN-SAT network coverage, it is concerned that network coverage in some of the more remote areas of Australia is sparse." (Report of the Independent Peer Review Panel, 4 September 2011)
This relates to: "Global and hemispheric temperature trends: uncertainties related to inadequate spatial sampling" (Thomas R. Karl, Richard W. Knight, John R. Christy, 1993).

7 - Confidence intervals for time averages in the presence of long-range correlations, a case study on Earth surface temperature anomalies (M. Massah and H. Kantz, Max Planck Institute for the Physics of Complex Systems, Dresden, Germany): "Time averages, a standard tool in the analysis of environmental data, suffer severely from long-range correlations." Uncertainties are larger than expected, again.

More analysis to follow in other blogs.

Professional Blog Designs by pipdig