## Fridge temperature filtering

Friday, November 21st, 2014 at 1:50 pm

Sensor readings generally have to be processed before you can use them. Patrick’s explanation of how he filtered the CoffeeMon signal (by picking the maximum value in each time window) suggested that something fishy is going on, and that it would be a mistake to treat the readings as mere noise.

Here’s a zoomed-in section of my fridge temperature as it rises by about 0.75 degrees an hour, or 12 steps of the 1/16th of a degree quantum that my Dallas OneWire DS18B20 digital temperature sensor reads at its maximum 12 bits of resolution.

The readings don’t jump between more than two levels when the temperature is stable, whereas you’d expect more random hopping if this were plain signal noise.

Indeed, applying a crude Gaussian filter doesn’t seem to do much good. This (in green) is the best I got by convolving with a kernel 32 readings wide (equating to about 25 seconds):

```python
import math

filteredcont = [ ]   # cont = [ (time, value) ]
k = [math.exp(-n*n/150)  for n in range(-16, 17)]
sk = sum(k)    # (the 1/150 constant chosen for a small tail beyond 16 units)
k = [x/sk  for x in k]
for i in range(16, len(cont) - 16):
    xk = sum(x[1]*kx  for x, kx in zip(cont[i-16:i+17], k))
    filteredcont.append((cont[i][0], xk))
```

The filtered version still has steps, but with rough slopes at the level changes. This filter is expensive to compute, and no better than the trivially implementable alpha-beta filter, which smoothed it like so:

```python
a, b = 0.04, 0.00005  # gain values picked by experimentation
dt = 0.5              # sample interval in seconds
vk, dvk = cont[0][1], 0
cont3 = [ ]
for t, v in cont:
    vk += dvk * dt          # advance the value by the current velocity
    verr = v - vk           # measurement error
    vk += a * verr          # pull the value towards the measurement
    dvk += (b * verr) / dt  # pull the velocity in the direction of the error
    cont3.append((t, vk))
```

With such a slowly varying signal, the noise seems to encounter different zones of variability.

Take three temperatures 1/16th of a degree apart: 3.4375, 3.5 and 3.5625. When the fridge temperature is anywhere between 3.48 and 3.52 degrees, the sensor reads a rock-steady 3.5 degrees; but when it’s in the boundary zone, say between 3.52 and 3.5425 degrees, it picks at random either 3.5 or 3.5625 (with probability weighted by which end of the interval it is closer to), before locking on to a steady 3.5625 as the temperature rises into that stable zone.

[I am now doubtful that this observation of stickiness is true. It is, however, an emergent anomaly that makes it impossible to filter these sequences into a straight line when the underlying temperature change is shallow and exactly linear.]
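The hypothesised behaviour is easy to simulate, doubts and all. In this sketch `STEP` is the 12-bit quantum, while `BAND` (the half-width of the “rock steady” zone) and the function name are my inventions, not anything the DS18B20 datasheet describes:

```python
import random

STEP = 1.0 / 16   # DS18B20 quantum at 12-bit resolution
BAND = 0.02       # hypothetical half-width of the "rock steady" zone

def sticky_read(true_temp):
    """One reading under the hypothesised model: locked to the nearest
    code when within BAND of it, otherwise a weighted coin flip between
    the codes on either side."""
    lower = (true_temp // STEP) * STEP
    upper = lower + STEP
    if true_temp - lower <= BAND:
        return lower                  # stable zone around the lower code
    if upper - true_temp <= BAND:
        return upper                  # stable zone around the upper code
    p_upper = (true_temp - lower) / STEP  # closer to upper -> more likely
    return upper if random.random() < p_upper else lower
```

Feeding a slow linear ramp through this model reproduces the steady runs interrupted by bimodal hopping in the traces above.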

As mentioned, the 1/16th of a degree Celsius readings correspond to 12 bits of resolution, which according to the documentation takes 750ms per reading. At 1/8th of a degree precision (11 bits) you get the answer in 375ms; 10 bits takes 187.5ms; and 9 bits (half a degree) takes 93.75ms. Each extra bit of precision doubles the conversion time.
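That trade-off can be tabulated directly from the datasheet figures (750ms maximum conversion time at 12 bits, halving per bit dropped):

```python
# DS18B20 resolution trade-off: the quantum halves and the conversion
# time doubles with each extra bit.
for bits in range(9, 13):
    quantum = 1.0 / (1 << (bits - 8))    # degrees C per step
    t_conv = 750.0 / (1 << (12 - bits))  # datasheet max conversion time, ms
    print(f"{bits} bits: {quantum:g} degC steps, {t_conv:g} ms")
```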

> Maxim Integrated uses its unique manufacturing capabilities to provide factory-calibrated digital temperature sensors with accuracy as high as ±0.5°C…

> [You can successfully compensate for the error offset of the device, which is a simple second-order curve and is] of a repeatable nature for bandgap-based sensors.

This will not work for dual-oscillator-based thermal measurement circuits.

The silicon bandgap temperature sensor is a peculiar circuit that produces an almost exactly linear voltage with temperature. The compensation proposed above requires you to measure and model the second-order response curve of the device yourself, since they couldn’t be bothered to do it in the factory.
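A sketch of what that compensation could look like. The coefficients here are made up for illustration; in practice you would fit them by comparing the device against a reference thermometer at a few known temperatures:

```python
# Hypothetical second-order error curve fitted offline against a
# reference: error(t) = a*t*t + b*t + c (coefficients invented, not
# from any datasheet).
a, b, c = -0.0002, 0.015, -0.18

def compensate(measured):
    """Subtract the modelled repeatable offset from a raw reading."""
    return measured - (a * measured * measured + b * measured + c)
```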

To convert that voltage into a number you need an analog-to-digital converter, of which there are at least ten implementations.

Because the time required to read the temperature doubles with each extra bit of precision, I don’t think it’s using the successive approximation method, which takes a constant extra time per bit. Instead, it must be a Wilkinson ADC, where the voltage is converted to an oscillation and the device counts the number of oscillations within a fixed time window. It’s like a series of sparrows hitting your front door at exactly one-second intervals. If you open your door for exactly 15.5 seconds, then half the time you would expect 15 sparrows in your living room, and the other half of the time 16 sparrows. You will never get 14 or 17 sparrows if the sparrows are evenly spaced. More consistent measurements could be made if you opened your door immediately after the last sparrow had hit, instead of simply at random.
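The sparrow analogy is easy to simulate (a sketch; the names are mine):

```python
import random

def count_sparrows(window=15.5, period=1.0):
    """Open the door at a random phase and count the evenly spaced
    sparrows (one per `period` seconds) that arrive within `window`."""
    t = random.uniform(0.0, period)  # phase of the first arrival
    count = 0
    while t < window:
        count += 1
        t += period
    return count

counts = {count_sparrows() for _ in range(1000)}
# with evenly spaced arrivals the count only ever straddles two values
```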

Let’s take a break from all this hard theory, and look at the readings I got from the DoESLiverpool fridge:

That’s a beautiful 30 minute cycle between 5 and 9 degrees, exponential curves with about half the time spent running and burning electricity. (Vertical lines are hourly intervals, and horizontal lines are degree intervals from zero.)

We have various fridge opening events, usually in the form of double spikes in the light measurement, because you open it to take the milk out for your tea, and then open it again to put the milk back in.

The bimodal hopping at 1/16ths of a degree occurs all the way along.

I decided to port my damn-efficient convex-hull-based run-time line-fitting Arduino code into Python so I could plot the results.

Below is the fitting of a curve with 67265 sample points to a series of 123 straight line intervals at a precision of 0.3 degrees, which is deliberately loose to show how it works.

This is the same again, but at a precision of 0.09 degrees, resulting in 525 subsamples (less than 1% of the points) that represent the curve.

This is about the threshold. If you go down to 0.06 degrees it takes 10756 points, because the fit starts following all the little up-and-down steps.

Line fitting of this kind could be applicable to any time-series data, so you aren’t tempted to undersample to reduce the data at the cost of losing temporal resolution. If you think about it, your standard time-series data

[ (t0, v0), (t1, v1), … (tn, vn) ]

is recorded with the time intervals (t_i − t_{i−1}) virtually identical, which means the time values contain almost no information (apart from the odd glitch).
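Put another way, a uniformly sampled series could be stored as just a start time, a step and the bare values, with nothing lost. A minimal sketch (the names are mine):

```python
def pack(cont):
    """Store a uniformly sampled series as (t0, dt, values) instead of
    repeating the (almost) constant time step in every record."""
    t0, dt = cont[0][0], cont[1][0] - cont[0][0]
    return t0, dt, [v for t, v in cont]

def unpack(t0, dt, values):
    """Reconstruct the (time, value) pairs from the packed form."""
    return [(t0 + i * dt, v) for i, v in enumerate(values)]
```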

The only unsatisfying thing about it is that it can only refer to points that are in the data. If you had something simple like a staircase curve:

[(0,0), (0,1), (1,1), (1,2), (2,2), (2,3), …]

there isn’t any subsampling of this set that will fit quite as perfectly as the line y = x + 0.5.

It might be possible to get a better line fit by considering the simple linear regression within the section windows, and simply trigger a new point whenever it goes out of tolerance.
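As a sanity check of that idea, an ordinary least-squares fit of the staircase points above lands exactly on the line y = x + 0.5, which no subsample of the points can do:

```python
# Least squares on the staircase points, using the same running sums
# (n, sum t, sum t^2, sum t*v, sum v) the segment fitter accumulates.
pts = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)]
n = len(pts)
st  = sum(t for t, v in pts)
st2 = sum(t * t for t, v in pts)
stv = sum(t * v for t, v in pts)
sv  = sum(v for t, v in pts)
m = (stv * n - st * sv) / (st2 * n - st * st)
c = (sv - m * st) / n
print(m, c)   # 1.0 0.5 -- exactly the line y = x + 0.5
```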

The computational advantage here is that we can find the best-fit line from the cumulative sums of t, t², t·v, v and v², which means it could be realistic to encode into the Arduino.

Here’s how we can plot the regression across several segments:

```python
def lf(cont):
    """Least-squares line through cont, returned as its two endpoints."""
    n = len(cont)
    st = sum(t  for t, v in cont)
    st2 = sum(t*t  for t, v in cont)
    stv = sum(t*v  for t, v in cont)
    sv = sum(v  for t, v in cont)
    sv2 = sum(v*v  for t, v in cont)   # not needed for the line itself
    m = (stv*n - st*sv)/(st2*n - st*st)
    c = (sv - m*st)/n
    return [(cont[0][0], cont[0][0]*m+c), (cont[-1][0], cont[-1][0]*m+c)]

for i in range(0, len(cont)-500, 500):
    sendactivity("contours", contours=[lf(cont[i:i+500])])
```
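The out-of-tolerance trigger could be sketched like this. It is only a sketch: it refits over a buffered window rather than updating the cumulative sums incrementally, and `fit_line` and `segment_fit` are names I have made up:

```python
def fit_line(pts):
    """Least-squares slope and intercept from the running sums."""
    n = len(pts)
    st = sum(t for t, v in pts)
    sv = sum(v for t, v in pts)
    st2 = sum(t*t for t, v in pts)
    stv = sum(t*v for t, v in pts)
    m = (stv*n - st*sv) / (st2*n - st*st)
    return m, (sv - m*st)/n

def segment_fit(cont, tol=0.09):
    """Start a new segment whenever the least-squares line over the
    current window misses one of its points by more than tol."""
    segments, window = [], []
    for p in cont:
        window.append(p)
        if len(window) < 3:
            continue
        m, c = fit_line(window)
        if max(abs(v - (m*t + c)) for t, v in window) > tol:
            m, c = fit_line(window[:-1])   # close the segment without p
            t0, t1 = window[0][0], window[-2][0]
            segments.append([(t0, m*t0 + c), (t1, m*t1 + c)])
            window = window[-2:]           # keep the join point
    if len(window) >= 2:
        m, c = fit_line(window)
        t0, t1 = window[0][0], window[-1][0]
        segments.append([(t0, m*t0 + c), (t1, m*t1 + c)])
    return segments
```

On a ramp that flattens out (say v = min(t, 10)) this produces two segments meeting at the knee.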

I made a slight tweak to the algorithm so it fits these least squares lines as it goes along, but my implementation is subject to over-shooting.

So that’s a bust.

The next plan is to look one node back and see if it can be adjusted for a better fit of the lines on either side of the sequence.

Before that, I wired up a TMP36 sensor to a proper ADS1115 analog to digital converter, and got this trace in yellow above the Dallas sensor readings (scaled and translated to fit the picture):

Applying the maximum reading filter on a sample window of data (where the samples can be so much more frequent than the Dallas readings), produces this exceptionally clean curve:

This leads me to conclude that these fancy Dallas sensors are a waste of time. If I am trying to do some basic research I should start with as clean a set of data as possible to make my algorithms work. Then, when the theory is functioning, we can talk about making it work on crappier instruments where the signal is harder to pick out.

Update: taking the max value over a 2-second window (sequences of 10 samples at 1/5-second intervals) is giving superlative results around the fridge-opening events.
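A sketch of what that filter might look like (`window_max` is my name for it; `samples` is assumed to be the usual list of (time, value) pairs):

```python
def window_max(samples, n=10):
    """Max-value filter: keep the peak reading in each block of n
    samples (10 samples at 1/5 s gives one output every 2 s), stamped
    with the time at the start of the block."""
    out = []
    for i in range(0, len(samples) - n + 1, n):
        block = samples[i:i + n]
        out.append((block[0][0], max(v for t, v in block)))
    return out
```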

I’m going to put fridge monitoring aside and play with my other toys for a bit.