Wednesday, April 19th, 2017 at 11:41 pm - - Flightlogger

I’m not going to learn too much from myself, so I’ve got the raw data from Tim’s flight as measured by his Kobo/bluefly IGC logger.

[image: timgps]

As you can see, he ranged a good deal further than I did from Mam Tor, reaching an altitude of 1500m, and he reported scoring glide angles of 10 to 1 on his glider much-fancier-than-mine-of-which-I-am-now-jealous.

Let’s see if we can corroborate this from his data with the power of pandas and matplotlib.

pIGCtim = fd.LoadIGC("Tim-1492470000.igc")   # fd is my flight-data module
pQ = pIGCtim.resample("1S").interpolate(method="time")  # fill uneven records at one per second

dstep = 30  # rolling windowed measurements in seconds
vario = pQ.alt.diff(dstep)/dstep  # climb rate in m/s averaged over 30 seconds
groundvelocity = numpy.sqrt(pQ.x.diff(dstep)**2 + pQ.y.diff(dstep)**2)/dstep  # m/s

Here is vario:
[image: timsvario]

Here is ground velocity:
[image: timsgroundvel]

The glide slope (normally called the glide angle, even though we don't quote it as an angle) is the ground velocity divided by the descent rate. When the descent rate is close to zero you get stupid numbers, so we need to throw those out and only quote the figure for the parts where there is reasonably direct gliding flight.
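As a first pass, something like this will do, building on the series above (the 0.5m/s sink threshold is just a number I've picked to exclude thermalling and near-level flight):

sinkrate = -vario                         # positive when descending
gliding = sinkrate > 0.5                  # only count steady descent
glideratio = (groundvelocity/sinkrate)[gliding]
glideratio.plot()
print("median glide ratio while gliding:", glideratio.median())

If Tim's 10 to 1 claim holds up, it should show as a cluster of values around 10 in the straight glides.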

(more…)

Friday, April 14th, 2017 at 4:32 pm - - Machining

Building on the weighted servo motor (having spent the first 20 minutes of the day trying to find where I left my code), I've tried to program it to oscillate along a sine wave.

It’s not easy, of course.
[image: voltssinwave]

How did I get this?

From the fixed voltage experiments I computed the function from the voltage and velocity to the acceleration for motor1 as follows:

cv0, cm0, cv1, cm1, cvm, cvc = 53.202, -1036.634, 56.027, -974.375, -7.497, -1521.571
acceleration1 = (velocity1 - (min(volts1-cv0, 0)*cm0 + max(volts1-cv1, 0)*cm1))*cvm + cvc

Therefore, given the velocity, if you want a particular acceleration, you need the following voltage:

tq = velocity1 - (acceleration1 - cvc)/cvm
if tq < 0:
    volts1 = tq/cm1 + cv1
else:
    volts1 = tq/cm0 + cv0

The above was the result of trying to make the motor oscillate like a pendulum by setting the acceleration to a negative number times the displacement:

acceleration1 = -acc1fac*(position1 - position1c)
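Putting that together with the voltage inversion above, one pass of the control loop comes out roughly like this (a sketch using the names from the snippets above; setmotorvolts is a stand-in for whatever actually sets the H-bridge duty cycle):

# spring-like acceleration pulling position1 back towards the centre position1c,
# then invert the motor model to find the voltage that should produce it
acceleration1 = -acc1fac*(position1 - position1c)
tq = velocity1 - (acceleration1 - cvc)/cvm
if tq < 0:
    volts1 = tq/cm1 + cv1
else:
    volts1 = tq/cm0 + cv0
setmotorvolts(volts1)   # stand-in output function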

Of course, one source of the problem is knowing the velocity. I've bodged this by recording a couple of positions up to 0.2 seconds in the past and measuring to them.

# keep two past positions, shuffled along every 0.1 seconds
if currenttime - prevtime > 0.1:
    position1prev2 = position1prev
    position1prev = position1
    prev2time = prevtime
    prevtime = currenttime
velocity1 = (position1 - position1prev2)/(currenttime - prev2time)

Unfortunately, this produces lots of bumps, so I think I’ll have to do this properly by applying, say, an exponential filter to the position value which delays the position, and use the difference between that and the current position as a measure of velocity.
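Something like this is what I have in mind (a sketch; the smoothing factor f and the loop period are placeholders that would need tuning, and position1filt starts off equal to position1):

f = 0.02                 # per-sample smoothing factor (placeholder, needs tuning)
timestep = 1.0/4500      # approximate loop period in seconds (placeholder)
position1filt += (position1 - position1filt)*f
# for a steadily moving position the filtered value lags roughly (1-f)/f samples
# behind, so the gap between the current and filtered positions gives a smoothed velocity
velocity1 = (position1 - position1filt)/(((1 - f)/f)*timestep)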

Like all such runtime velocity measurements (unlike the ones done in pandas, which are calculated by picking a position on either side of the current time, like so:

(df.shift(-10).position1 - df.shift(10).position1)/(df.time.shift(-10) - df.time.shift(10))

…) this estimate is also going to be delayed by a certain amount and be invalid under acceleration.

(more…)

Thursday, April 6th, 2017 at 11:22 am - - Flightlogger

On my last sorry flight I was carrying a total of four GPS units. One of them is a pile of poo. It’s the bluefly one I programmed to read at the frequency of 10 times a second, so greedy was I for precision and data.

[image: gpscompare]

Why do they have that option if it’s a complete waste of my time? All that work learning how to remotely step up its baudrate so that it could transmit data at this higher rate.

Here’s how the error comparisons line up, including the minus 3-second correction on the 6030 vario’s timestamps. The “u” value is the number of milliseconds from midnight according to the GPS satellite timestamp.

gnames = [ "gps6030", "gpsxcsoar", "gpsTopShelf", "gpsBlueFly" ]
print("samples, gpsA, gpsB, stddev")
for i in range(len(g)):
    for j in range(i+1, len(g)):
        qA, qB = g[i], g[j]
        dt = 3000 if i == 0 else 0 # observed timeslip on 6030 unit
        dx = qA.x - numpy.interp(qA.u-dt, qB.u, qB.x)
        dy = qA.y - numpy.interp(qA.u-dt, qB.u, qB.y)
        k = (dx**2 + dy**2)[tf0:tf1]
        print(len(k), gnames[i], gnames[j], math.sqrt(k.mean()))
samples gpsA gpsB stddev
400 gps6030 gpsxcsoar 6.086
400 gps6030 gpsTopShelf 7.914
400 gps6030 gpsBlueFly 17.040
550 gpsxcsoar gpsTopShelf 7.845
550 gpsxcsoar gpsBlueFly 19.095
4120 gpsTopShelf gpsBlueFly 17.451

Now I have to program the BlueFly to sample at 200ms like the spare top-shelf GPS does, and see if that fixes it.

Meantime, how is the accelerometer doing with this not-crap GPS?
(more…)

Tuesday, April 4th, 2017 at 1:29 pm - - Machining

Suppose we apply a random series of fixed duty cycles to a servo motor, like so:

[image: volts1]

A 50% duty cycle means that the volts are applied half one way and half the other at a frequency of about 100kHz, so it’s equivalent to zero volts.
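So the mapping from duty cycle to effective voltage is just linear about that midpoint, something like this (supplyvolts is a placeholder for whatever the H-bridge is actually fed with):

def effectivevolts(dutycycle, supplyvolts=12.0):
    # dutycycle 1.0 is full volts one way, 0.0 is full volts the other, 0.5 is nothing
    return supplyvolts*(2*dutycycle - 1)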

I can plot the position from the encoder at the same time, like so:

[image: volts1pos]

There are about 1000 ticks per revolution of the wheel, and it’s wired backwards, so a positive voltage makes it spin backwards.

Amazingly, with the control loop written in Python on a BeagleBone controlling the H-bridges, it makes 4500 samples per second, which is perfectly adequate, and I have not had to resort to any fancy tricks with C++ or the PRUs.

With a pandas timeseries Series object pos, we can calculate the velocity over a window of +-100 samples (about 1/20th of a second) like so:

# I'd like a better way to shift and difference a timeseries index.
vel = (pos.shift(-100) - pos.shift(100)) / \
      (pos.index.to_series().shift(-100) - pos.index.to_series().shift(100))

And then there is the acceleration:

acc = (vel.shift(-100) - vel.shift(100)) / \
      (vel.index.to_series().shift(-100) - vel.index.to_series().shift(100))

We can plot them all up like so:

[image: volts1velacc]

Here’s what the first two seconds look like.
(more…)

Friday, March 31st, 2017 at 10:10 am - - Machining

No time for blogging, but here is a video of the servo motor doing some random lifts with the string.

And this is the graph of the data acquired (red is the step-changing voltage applied) including position, velocity and acceleration:
[image: timetrackservo]

Back later when I have time to write this up and do some further analysis.

Friday, March 24th, 2017 at 12:19 pm - - Flightlogger

We move on to the temperature sensor work, and the controversial concept that the rising air in a thermal is hotter than the non-upwardly-mobile surrounding environmental atmosphere.

I say it’s controversial because the meaning of “the temperature” in relation to a mobile and turbulent airmass whose structure spans hundreds of metres in the vertical dimension and thousands of Pascals in relative pressure is undefined. Remember that the adiabatic temperature differential is about 0.7 degrees per 100 m change in altitude.

We do, however, have a single point temperature sensor on a mobile glider which is also progressing up and down the air column (depending on the wind currents and skill of the pilot). The location of the glider with the single temperature sensor is imperfectly known (due to bad GPS (see previous post) and inexplicable barometric behavior (see next post)), and the sensor itself has a thermal mass which means its readings have a delay half-life of about 9 seconds in flowing air (see this other post).

I have taken the precaution of jettisoning my slow, accurate and low-resolution Dallas temperature sensor in favour of two humidity/temperature sensor combos and an infrared thermometer that has quite a good ambient temperature sensor within its metal can.

These sensors tracked one another pretty well, except for one small problem when I spotted a series of spikes in one and then the other humidity/temperature sensor.

[image: temperaturespikes]

What is going on?

(more…)

Monday, March 20th, 2017 at 1:45 pm - - Flightlogger

Last week I finally had my first flight of the year with my newly built flight data logger. I can’t believe the number of issues it’s already thrown up.

At least I may be making quick enough progress to get past the issues (rather than being swamped by them) using this exceptionally efficient Jupyter/Pandas technology.

For example, my code for parsing and loading the IGC file is 15 lines long.
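It boils down to pulling the B-records (the GPS fixes) out into a DataFrame. A simplified sketch of the idea (not the actual function; the real one reads the date out of the HFDTE header rather than having it passed in):

import pandas

def LoadIGCsketch(fname, flightdate="2017-03-09"):
    # B-record layout: B HHMMSS DDMMmmmN DDDMMmmmE A PPPPP GGGGG
    recs = [ ]
    for ln in open(fname):
        if ln.startswith("B"):
            t = pandas.Timestamp("%s %s:%s:%s" % (flightdate, ln[1:3], ln[3:5], ln[5:7]))
            lat = (int(ln[7:9]) + int(ln[9:14])/60000.0)*(1 if ln[14] == "N" else -1)
            lng = (int(ln[15:18]) + int(ln[18:23])/60000.0)*(1 if ln[23] == "E" else -1)
            alt = int(ln[30:35])   # GNSS altitude; the pressure altitude is in ln[25:30]
            recs.append((t, lat, lng, alt))
    return pandas.DataFrame.from_records(recs, columns=["t", "lat", "lng", "alt"]).set_index("t")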

The code for loading in my flight logger data into a timeseries is just as brief, if you consider each data type individually (there are more than 13 of them from humidity sensors to an orientation meter).

The GPS time series from my flight logger (at 10Hz) can be crudely converted to XYs in metres, like so:

# pQ is the GPS position pandas.DataFrame
earthrad = 6378137
lng0, lat0 = pQ.iloc[0].lng, pQ.iloc[0].lat
nyfac = 2*math.pi*earthrad/360              # metres per degree of latitude
exfac = nyfac*math.cos(math.radians(lat0))  # metres per degree of longitude at this latitude
pQ["x"] = (pQ.lng - lng0)*exfac
pQ["y"] = (pQ.lat - lat0)*nyfac
plt.plot(pQ.x, pQ.y)

[image: gpstrack1]

Note the suspicious sharp turn near (-1000, -400). Here’s another sharp turn somewhere else in the sequence covering a 1 minute 5 second period using time slicing technology:

t0, t1 = pandas.Timestamp("2017-03-09 15:42:55"), pandas.Timestamp("2017-03-09 15:44:00")
q = fd.pQ[t0:t1]
plt.plot(q.x, q.y)

[image: gpstrack2]

The dot is at the start point, time=t0.

(more…)

Friday, March 3rd, 2017 at 11:04 am - - Flightlogger, Hang-glide, Uncategorized

To be clear, I haven’t got mathematical proofs here (I don’t have the time), but the experimental evidence is quick to get.

Take the differential barometer sensor (used to measure airspeed) of the hang-glider flight logger. The Arduino code which updates the reading every 200ms looks like this:

long lastpx4timestamp;
void Flylogger::FetchPX4pitot()
{
    long mstamp = millis();
    if (mstamp >= lastpx4timestamp + 200) {
        px4pitot->readpitot();
        sdlogger->logpitot(px4timestampset, px4pitot->rawpressure, px4pitot->rawtemp);
        lastpx4timestamp = mstamp;
    }
}

Why did I choose 200 milliseconds? It sounded like a good number to read it at, and this is a quick way to program a regular reading.

A better way is to synchronize it with a divided-down clock, rather than simply adding 200ms to the previous reading time, like so:

int mstampdivider = 20; 
int prevmstampdivided = 0; 
void loop()
{
    long mstampdivided = millis()/mstampdivider; 
    if (mstampdivided != prevmstampdivided) {
        prevmstampdivided = mstampdivided; 
        P(micros());  P(" ");  P(singlereading());  P("\n"); 
    }
}

Now that code reads every 20ms rather than every 200ms, and it prints a load of output which I can cut and paste into a file and read into pandas, like so:

lines = open("../logfiles/dmprapidtest.txt").readlines()
rows = [ (int(s[0]), int(s[1]))  for s in (ln.split()  for ln in lines)  if len(s) == 2 ]
k = pandas.DataFrame.from_records(rows, columns=["t", "d"])

And then we can plot the autocovariance (the covariance of the measurement with a time-shifted copy of itself), like so:

d = k.d   # just the measurement Series
dm = d.mean()
ss = [((d - dm)*(d.shift(i) - dm)).mean()  for i in range(400)]

[image: autocov1]
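If you'd rather have the conventional autocorrelation than the raw covariances, divide through by the lag-0 value (the variance) so everything lands between -1 and 1:

acf = [ s/ss[0]  for s in ss ]   # ss[0] is the variance of d
plt.plot(acf)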

Let’s zoom in on the first 50 covariances:
(more…)

Wednesday, March 1st, 2017 at 7:55 pm - - Flightlogger

Lately I have become interested in noisy sensors; all sensors are noisy, so this is a widely applicable subject.

The interest comes from my attempts to read up on Kalman filters where the variance of the sensor measurements is a crucial input into calculation of the weight to apply to that measurement as well as the subsequent variance of the output.

I wanted to do something more scientific with this noisy CO2 sensor than simply filtering it and saying to myself: “That looks like a nice clean line”.

[image: noisyco2]

At the same time, a minor brush with the concepts of statistical seasonal adjustment led eventually to the autoregressive model and a way to test the quality of the noise — eg whether it is pure noise or the result of noise that has been filtered by a smoothing process.

For example, if you take a pure white noise function X[*] with a standard deviation of 1 and pass it through the usual exponential decay filter:

Y[n] = Y[n-1]*f + X[n]*(1-f)

the variance of Y[*] is

(1-f)^2/(1-f^2), which simplifies to (1-f)/(1+f),

and the covariance between Y[*] and Y[*-1] (formed by averaging the product of the sequence with an offset of itself) is f times this value (whereas the covariance between the purely random sequence X[*] and X[*-1] is obviously zero).
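These are quick to check numerically with a throwaway sketch:

import numpy

f = 0.9
X = numpy.random.normal(0, 1, 200000)
Y = numpy.zeros(len(X))
for n in range(1, len(X)):
    Y[n] = Y[n-1]*f + X[n]*(1-f)

print("predicted variance", (1-f)**2/(1-f**2), "measured", Y.var())
cov1 = ((Y[1:] - Y.mean())*(Y[:-1] - Y.mean())).mean()
print("predicted lag-1 covariance", f*(1-f)**2/(1-f**2), "measured", cov1)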

In the absence of attendance at a university lecture course taught by a professor who has structured a procession of scientific concepts into a coherent order in which to learn over the course of a semester, I am happy to follow this journey where it takes me.

My glider data logger has two wind sensors now which, as with most sensor problems, disagree unpredictably. Airspeed is a tricky thing, due to local turbulence and large scale vortices which exist owing to the existence of a levitating wing.

One of the sensors is a pitot tube connected to an MS4525DO differential pressure device purporting to measure to 14 bits of precision as often as every 0.5 milliseconds.

I’ve set up the pitot to read as fast as possible, about every millisecond, like so:

Wire.beginTransmission(0x28);
Wire.endTransmission();
delayMicroseconds(500); // needs a delay of 500 microseconds or the reading is marked stale

Wire.requestFrom(0x28, 4);
uint16_t pmsb = Wire.read(); // this reading takes 400 microseconds whatever happens
uint16_t stat = (pmsb & 0xC0) >> 6; // 0 good, 2 stale, 3 fault
uint16_t plsb = Wire.read();
uint16_t rawpressure = ((pmsb & 0x3F) << 8) | (plsb);

uint16_t tmsb = Wire.read();
uint16_t tlsb = Wire.read();
uint16_t rawtemperature = (tmsb << 3) | (tlsb >> 5);

Now immediately you’re going to have a problem because the measurements aren’t continuous; the values are quantized.

[image: shortplotdmb]

(more…)

Monday, February 27th, 2017 at 7:13 pm - - Machining, Whipping

For a number of years I have been familiar with the observation that the sophistication of time series data analysis, in particular, is adversely impacted by the use of the Excel spreadsheet program. More recently I have discovered exactly how it is an irreparably deficient application, and I am convinced that its use should be abolished everywhere except small-business accounting (ie the one thing it was originally intended for).

Hitherto I did not attach much importance to this view, because it tends to be dismissed as anti-Microsoft bias and as a “lost cause” since “everyone uses it”. However, on learning of the existence of a large body of signal processing theory which is all but inaccessible to users of Excel due to its fundamental nature, I submit my observations for consideration below.

My first remark is that if data scientists don’t know about the benefits and substantial applications of multi-source data combinations, Kalman filters and seasonal adjustments reliant on the autoregressive-moving-average model, they are missing an important part of their job and are deferring the implementations of these concepts to mere “estimation by eye” from the graphs.

My second remark is that when external software exists that can, say, calculate and compensate for the seasonal adjustment, it generally requires the data to be submitted in a time series format, and that requires a great deal of preparation of the spreadsheet data. So the appearance of being able to open up the spreadsheet and work with the data within seconds is deceptive, because there is now a long route to move the data out into a form that can be processed, and then re-import the results back into the spreadsheet for familiar viewing.

Let us consider a couple of time-series data sets. For example, the monthly national GDP and the employment statistics, or imagine one minute intervals of temperature and electricity use in a building.

What elements of the data are required to perform any useful processes on it, beyond simply the production of visual graphs?

For time series data (which a great proportion of data can be described as being), the existence of a reliable datetime value is paramount. Excel may in theory have a datetime cell type, but it is not visibly distinguishable from an unstructured string type with its ambiguous American and English date orderings. As such, it cannot be used consistently, because improper use does not generate an unignorable error (eg anything in column A must be in this datetime form or you can’t save the file).

Furthermore, just the datetime is not enough, because there are time series intervals (for example, monthly or quarterly data) and these cannot always be approximated by a single time point. By convention quarterly intervals can either be represented by the end of the quarter (eg sales results) or the start of the quarter (eg sales targets) but both need to be boxed into the same entity in any subsequent regression model.

Finally, when you have different data series from different data sources they usually work to different sample rates, so you cannot represent them adequately as a single row per time sample. This would apply to the power use of a heating system, which is recorded every minute, set against the average outdoor temperature, which is recorded daily.
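For contrast, in a proper time series library this is a non-problem, because each series keeps its own datetime index and you only align them when the analysis needs it. A sketch with made-up numbers:

import pandas, numpy

minutes = pandas.date_range("2017-02-01", periods=7*24*60, freq="T")
days = pandas.date_range("2017-02-01", periods=7, freq="D")
power = pandas.Series(numpy.random.rand(len(minutes)), index=minutes)      # metered every minute
temperature = pandas.Series(numpy.random.rand(len(days))*10, index=days)   # recorded once a day

# align on whatever interval the analysis needs, eg daily energy against temperature
daily = pandas.DataFrame({"power": power.resample("D").sum(),
                          "temperature": temperature})
print(daily)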

Accordingly, the primary dimension of the data points, the datetimes, is problematical. But what of the data point values, the measured quantities? If they are each recorded into a single spreadsheet cell we will invariably be lacking an associated standard deviation/confidence interval for them. The standard deviation is a crucial input to the Kalman filter for the determination of the weight applied to each individual measurement.

Take the example of the monthly political polling data prior to an election. These are often supplied by different agencies and almost always come with a confidence interval that depends on the size of sample so we know to take less notice of a poll which defies the steady trend when it has a wide margin of error. But then if there are more polls with the same wide margin of error that are also in line with that new trend, the best guess of the trend will be pulled in the new direction as much as it would have been by one very good accurate poll with a narrow margin of error. This balancing of the estimations from the aggregation of the location and accuracy of the measures is optimized by the Kalman filter, and should not be done by eye from the charts themselves merely because it can’t easily be applied in Excel and we’re too lazy to convert our working platform to something where it could have been easily applied.
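The static version of that balancing act is just inverse-variance weighting, which takes a couple of lines once the margin of error travels with each number (made-up poll figures here):

import numpy

estimates = numpy.array([42.0, 45.0, 44.0])   # poll estimates of a vote share
stderrs   = numpy.array([ 3.0,  1.0,  3.0])   # their standard errors; the narrow one dominates

weights = 1/stderrs**2
combined = (estimates*weights).sum()/weights.sum()
print(combined, "+/-", numpy.sqrt(1/weights.sum()))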

And this brings me to the final point about Excel, which apparently can do anything because it can run programmed macros. Really? Who can honestly think, if they have ever stopped to consider it, that it is a good idea to co-mingle software with data? You might as well nail your vinyl record onto the record player and then parcel-tape it into a heavy cardboard box to prevent interchangeability.

The co-mingling of data and code with no reliable means of disconnection leads to dangerous and ridiculous practices, such as copying the data into and out of the spreadsheet by hand just to access the service of the macros.

Come on folks. If you’re going to call yourself data scientists, you cannot rely on a single tool that prevents you from applying the last fifty years of data science know-how optimally and productively — and then rely on its inadequacy as an excuse to not challenge yourself to learn and master the amazing mathematical technology that would have been at your disposal had you not chosen ignorance over education.

We have got to get beyond the pointless issue of formatting data into “machine readable form” for the mere purpose of making graphs for visual titillation, and get to grips with actual intelligence and effective control theory.

There is, for example, nothing smart about having to control a car with a human looking out through a window for white lines on the tarmac and the traffic lights on the verge in order to move the steering wheel and pedals in response. Smart is getting the data to feed-back directly into a computer that controls these motions optimally while you sit back in awe having merely specified the destination. But if someone out there building the tech has dared to embed a copy of Excel within the process chain between the sensor acquisition and the motor control actuators, then we are doomed.

Think of the famous Go To Considered Harmful letter of 1968. There was a heated debate about it at the time, yet 30 years later it was unconscionable that a programming language could even be conceived of as including a goto statement. Kids these days probably don’t even know what one is.

But just think of all the programming wasted, and how much further on we could have been, without the inclusion of that single statement which caused so much unnecessary expense and buggy code over the years. Then imagine how much damage is being caused by inappropriate use of this inadequate data non-analysis tool, now and for the next 20 years, before it too finally gets buried in the ash-can of history and people don’t even remember that we ever used it in the first place.

This has been the moment of truth.

Good day.