When Is An Average Meaningful?

As a continuation of this series on azimuths [1], I offer the final question. In the original post, I mentioned that the entire purpose of generating the rose diagrams was to visualize my data. I was exploring the averaging of angles and asking “How does one do that?” and, more importantly, “Does the average mean anything?”

It turns out that the second question is no easier to answer with visualization.

An average, smack in the middle of the data

I’m pretty proud of my rose diagram solution. I love programming because you can create something useful that didn’t exist before. I took my new rose diagrams and went to town averaging my data.

So, I had my azimuths averaged, and I could now verify the result using my new rose diagrams. I could easily look at a diagram with a big green line pointing in the direction of the average azimuth, and a circular histogram showing me the data that was used to calculate that average. Assuming a normal distribution, my green line should be smack in the middle of that group of bars in my histogram.

But what if the green line is not smack in the middle of the data? More precisely, what if the green line is in the middle of the data, but the data is so widely spread that being in the middle is somewhat meaningless?

To wit:

azimuths = [47.51, 186.39, 120.05, 165.1, 170.03, 199.38, 23.47, 212.34, 232.77, 295.25,
275.93, 282.61, 359.75, 14.04, 211.7, 73.8, 117.8, 191.01, 341.67, 59.69,
314.74, -32768, 211.89, 279.3, 43.39, 309.28, 27.71, 116.83, 245.99, 165.39,
301.76, 8.89, 79.17, 62.21, -32768, 323.4, 79.17, 228.46, -32768, 120.16, 7.05,
13.5, 301.6, 106.54, 101.53, 84.4, 255.79, 95.12, 44.47, 83.84, 22.04, 184.51,
 -32768, -32768, 285.7]

Filtering out the null values (-32768) leaves 50 measurements, and averaging them yields 46.1°. Average it and check it yourself. It works out.
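For anyone who does want to check it, here is a minimal sketch of one common way to average compass bearings: turn each angle into a unit vector, add the vectors up, and take the direction of the sum. The helper names `drop_nulls` and `circular_mean` are made up for illustration, and the code that produced the 46.1° figure isn't shown here, so treat this as an assumption about the method, not a record of it.

```python
import math

NULL_VALUE = -32768  # sentinel for missing readings, as in the list above


def drop_nulls(values, null=NULL_VALUE):
    """Return only the real measurements."""
    return [v for v in values if v != null]


def circular_mean(degrees):
    """Average bearings by summing unit vectors; result is in [0, 360]."""
    s = sum(math.sin(math.radians(d)) for d in degrees)
    c = sum(math.cos(math.radians(d)) for d in degrees)
    return math.degrees(math.atan2(s, c)) % 360


# Two bearings either side of north, plus one null reading.
sample = [350.0, -32768, 10.0]
print(circular_mean(drop_nulls(sample)))  # close to north (0 or 360), not the arithmetic 180
```

The point of the vector approach is the wraparound: a plain arithmetic mean of 350° and 10° points due south, which is exactly wrong.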

A questionable mean

But if we look at the rose diagram, things are not so simple.

The diagram shows lines on the r axis [2], with dark gray lines at 100% and 50% and light gray lines at 25% and 75%. Looking at the list of data, we see that 4 values go into the 10° block (values from 5° to 15° are included in that block). Likewise, we can see that 80° and 120° both have 4 values, that 20° and 40° have 2 values each, that 300° has 3 values, and that 340° has 1 value.
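The block counts can be checked without the diagram. This is a rough sketch, not the code behind the rose diagrams: `block_for` and `block_counts` are hypothetical names, and it assumes each block is half-open (5° up to but not including 15° lands in the 10° block), which the text doesn't actually pin down.

```python
import math
from collections import Counter


def block_for(azimuth, width=10):
    """Label of the block an azimuth falls in, e.g. 5 <= a < 15 maps to 10."""
    return int(math.floor((azimuth + width / 2) / width) * width) % 360


def block_counts(azimuths, width=10):
    """Count how many azimuths land in each block."""
    return Counter(block_for(a, width) for a in azimuths)


# The four values from the data that sit in the 10° block, plus one just shy of north.
counts = block_counts([14.04, 8.89, 7.05, 13.5, 359.75])
print(counts[10], counts[0])  # 4 1
```

The `% 360` matters for values like 359.75°, which belong in the block centered on north and not in a separate 360° block.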

Using a chart like this, we can do things like count the number of data points within 90° of our mean, that is, in the half of the compass centered on it. Since our mean is 46.1°, we are looking for values from 316.1° clockwise through north to 136.1°. We find that 27 values were measured in that half of the compass. We can also count up the number of values measured in the other half of the circle, and we come up with 23.

This is where I start to get worried. Think about it: 46%, nearly half of the values measured, are, in essence, pointing in the opposite direction from our mean.
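The half-compass count is easy to reproduce from the list. The sketch below takes the 46.1° mean as given and only does the counting; `angular_distance` is a hypothetical helper name, and treating a value exactly 90° away as "near" is an arbitrary choice that doesn't affect this data.

```python
azimuths = [47.51, 186.39, 120.05, 165.1, 170.03, 199.38, 23.47, 212.34, 232.77, 295.25,
            275.93, 282.61, 359.75, 14.04, 211.7, 73.8, 117.8, 191.01, 341.67, 59.69,
            314.74, -32768, 211.89, 279.3, 43.39, 309.28, 27.71, 116.83, 245.99, 165.39,
            301.76, 8.89, 79.17, 62.21, -32768, 323.4, 79.17, 228.46, -32768, 120.16, 7.05,
            13.5, 301.6, 106.54, 101.53, 84.4, 255.79, 95.12, 44.47, 83.84, 22.04, 184.51,
            -32768, -32768, 285.7]
NULL_VALUE = -32768
MEAN = 46.1  # the average quoted above, taken as given


def angular_distance(a, b):
    """Smallest angle between two bearings, in degrees (0 to 180)."""
    return abs((a - b + 180) % 360 - 180)


valid = [a for a in azimuths if a != NULL_VALUE]
near = sum(1 for a in valid if angular_distance(a, MEAN) <= 90)
far = len(valid) - near
print(near, far, far / len(valid))  # 27 23 0.46
```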

So, is the average (no pun intended) meaningful? If we have a measurement that can realistically contain values anywhere on the compass, is it appropriate to take an average? Given the first example rose diagram, it seems as though it is. That diagram shows an average that’s, as I want it to be, smack in the middle of the data. What’s the scoop?

The scoop, of course, is statistical characterization. I’m sure some clever, angle-aware standard deviation could be calculated [3] and used with standard error or some other characterization. Doing this, we could set a threshold beyond which the data is thrown out, or attach a confidence interval to the data. Keeping it simple, the ordinary standard deviation for our questionable mean above weighs in at a whopping 2.76 radians (that’s 158°, folks!), so perhaps even that characterization might be useful.
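One candidate for an angle-aware spread, sketched here as an assumption and not as the method behind the 2.76 figure, is the circular standard deviation from directional statistics. It is built on the mean resultant length R, the length of the averaged unit vectors (1 when every bearing agrees, near 0 when they cancel out), and is usually defined as sqrt(-2 ln R) in radians. The function names below are hypothetical.

```python
import math


def mean_resultant_length(degrees):
    """Length R of the averaged unit vectors: 1 = tight cluster, near 0 = scattered."""
    n = len(degrees)
    s = sum(math.sin(math.radians(d)) for d in degrees) / n
    c = sum(math.cos(math.radians(d)) for d in degrees) / n
    return min(1.0, math.hypot(s, c))  # clamp tiny floating-point overshoot


def circular_std(degrees):
    """Circular standard deviation sqrt(-2 ln R), returned in degrees."""
    r = mean_resultant_length(degrees)
    if r == 0.0:
        return math.inf  # the vectors cancel exactly: no preferred direction
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


print(circular_std([40.0, 45.0, 50.0]))         # small: a tight cluster
print(circular_std([0.0, 90.0, 180.0, 270.0]))  # huge: no real direction
print(math.degrees(2.76))                       # roughly 158, the conversion quoted above
```

A threshold on R or on this spread would be one way to flag averages like the questionable one, though where to put that threshold is a judgment call this sketch doesn't settle.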

In the end, we decided to include all the data, whether meaningful or not. This project is a contract from the Army Corps, so we’re supplying them with the data plus the rose diagrams, to let them make their own decisions. This brings our work on this to a close, but not my interest.

I’m certainly going to think twice when someone talks about “average wind direction” again. That might be a meaningful number, or it might not. Only looking carefully at the data would tell you.

  1. which, incidentally, was meant to be one post, but which keeps coming back, like a zombie looking for a tasty serving of brains much bigger than my own.
  2. The r axis is the radius, and is the equivalent of a y axis on a Cartesian (x, y) chart.
  3. I’m not convinced the normal standard deviation is appropriate, since the mean was calculated in polar space.
