I am into a long-overdue project with personal weather station data.
My weather station produces a record every minute, containing just over 100 fields of data - of which perhaps only 30-40 contain real data. The rest are just "filled", because Ambient's Virtual Weather Station (VWS) can gather data from a huge number of different weather stations, in different formats/types.
The csv files from VWS populate a MySQL database via VWSql, a product which 'just' reads the csv file and pumps the data into the MySQL database.
So, it is only one table.
Sometimes I get some bad data, which I know how to look out for. So my idea is to write a trigger that checks for bad data and shovels such possibly bad data into a table with exactly the same fields, but with another name.
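A MySQL trigger can't silently redirect a row, but a BEFORE INSERT trigger can copy a suspect row into a quarantine table and then abort the insert into the main table. A minimal sketch, assuming made-up table and column names (weather_main, weather_suspect, obs_time, temp_out) and MySQL 5.5+ for SIGNAL:

```sql
-- Sketch only: weather_main, weather_suspect, obs_time and temp_out
-- are hypothetical names; adjust the sanity checks to your station.
DELIMITER //
CREATE TRIGGER weather_quarantine
BEFORE INSERT ON weather_main
FOR EACH ROW
BEGIN
  IF NEW.temp_out NOT BETWEEN -60 AND 60 THEN
    -- copy the suspect row aside ...
    INSERT INTO weather_suspect (obs_time, temp_out /* , ... */)
    VALUES (NEW.obs_time, NEW.temp_out /* , ... */);
    -- ... then abort the insert into the main table (MySQL 5.5+)
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'row diverted to weather_suspect';
  END IF;
END//
DELIMITER ;
```

One caveat: if both tables are InnoDB, the error rolls the quarantine INSERT back along with the statement, so a common workaround is a non-transactional (MyISAM) quarantine table, or just flagging the row instead of aborting.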
If the data is good, I want to update hourly, daily, monthly, and yearly tables with statistics - like average, max, and min temperature, max wind, wind direction, UV, solar incidence, barometric pressure, precipitation (there are 7 basic readings made), and maybe some derived data like air density (which is calculated from the basic seven readings). It all adds up to some 30-40 fields.
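Derived values like air density don't necessarily have to be stored at all, since the dry-air approximation is just rho = P / (R · T) and can be computed right in the SELECT. A sketch with hypothetical column names (pressure_hpa in hPa, temp_c in °C):

```sql
-- Dry-air approximation: rho = P / (R_specific * T)
-- R_specific for dry air is about 287.05 J/(kg*K);
-- weather_main, pressure_hpa and temp_c are hypothetical names.
SELECT obs_time,
       (pressure_hpa * 100.0) / (287.05 * (temp_c + 273.15))
         AS air_density_kg_m3
FROM weather_main;
```

This ignores the humidity correction; a full moist-air density would also bring in the humidity reading.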
My thought was to store these updated values in the different calculated tables every minute - but I think this goes against some kind of database rule, like: don't store calculated values - get them on the fly with SQL SELECT statements...
On the other hand, it may be faster - especially for another site, in a city, where there could be lots and lots of visitors to my web site.
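For comparison, "getting them on the fly" is just a GROUP-BY-style aggregate over the raw table - a sketch, again with hypothetical names:

```sql
-- Today's statistics straight from the minute-by-minute table
-- (weather_main, obs_time, temp_c, wind_speed are hypothetical names)
SELECT AVG(temp_c)     AS avg_temp,
       MIN(temp_c)     AS min_temp,
       MAX(temp_c)     AS max_temp,
       MAX(wind_speed) AS max_wind
FROM weather_main
WHERE obs_time >= CURRENT_DATE;
```

At one record per minute, today is at most 1,440 rows, so this is cheap; scanning five-plus years of minutes on every page view is a different matter.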
I'd like to present each visitor with the current weather, minute and hourly averages, today's average, ... and comparisons with the last two to five years, and ten years ago (I only have 5+ years of data so far).
Users would only see result sets from views, through PHP code (for security).
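That layering can be enforced in MySQL itself: grant the web account SELECT on the views only, never on the base table. A sketch, assuming a hypothetical database named weather and a user webuser:

```sql
-- Expose only selected columns through a view
-- (all names here are hypothetical)
CREATE VIEW public_current AS
SELECT obs_time, temp_c, humidity_out, pressure_hpa, wind_speed
FROM weather_main;

-- The PHP connection's account can read the view but not the table
GRANT SELECT ON weather.public_current TO 'webuser'@'localhost';
```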
The main table uses a DATETIME field as the unique key (including minutes - YYYYMMDDHHMM).
I have had a look at splitting up the datetime into separate fields, which by itself is quite a bit of coding. I thought it might make for simpler SQL, but on the other hand quite a bit of programming in the stored procedures/triggers.
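An alternative to splitting the datetime into separate columns is to lean on MySQL's date functions, which can pull any part out of a single DATETIME on demand - e.g. a monthly average (hypothetical names again):

```sql
-- Group a single DATETIME column by year and month;
-- no separate year/month/day columns needed
SELECT YEAR(obs_time)  AS yr,
       MONTH(obs_time) AS mo,
       AVG(temp_c)     AS avg_temp
FROM weather_main
GROUP BY YEAR(obs_time), MONTH(obs_time);
```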
Database space is not that much of a problem - the main table grows with just around 140 MB per year.
So, what are your recommendations? Large SQL SELECT statements, or large triggers/stored procedures with simpler SELECTs?
(The stubborn old man (in a positive way, too) - as they say in the dialects of northern Sweden)
first off, i would simply store the raw data, or the cleaned up data, whichever you want
and i would calculate statistics on the fly only for hourly and daily, because this limits the number of rows involved in the calculation
but any data older than today would come out of a statistics table
that means you need to run something once a day (overnight) to add yesterday's stats to the weekly, monthly, yearly tables
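the overnight job can be a single INSERT ... SELECT run from cron or the event scheduler - a sketch, with hypothetical table/column names:

```sql
-- roll yesterday's raw rows up into a daily statistics table
-- (daily_stats, weather_main and the column names are hypothetical)
INSERT INTO daily_stats
  (stat_date, avg_temp, min_temp, max_temp, max_wind, rain_total)
SELECT DATE(obs_time),
       AVG(temp_c), MIN(temp_c), MAX(temp_c),
       MAX(wind_speed), SUM(rain_mm)
FROM weather_main
WHERE obs_time >= CURRENT_DATE - INTERVAL 1 DAY
  AND obs_time <  CURRENT_DATE
GROUP BY DATE(obs_time);
```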
does that help?
Thanks Rudy! Yes, that absolutely makes sense - a little of both - I immediately see the advantages.
With such a quick (and good) answer I go to bed now without any worries!
PS! I have read the downloadable pdf chapters of Simply SQL - I wish I had had that back with Knowledgeman/Kman and MDBS IV... (yes, the mid-'80s). The book is already ordered - really superb to brush up my SQL, and with all the modern stuff! (I have only used MS Access and SQL Server a bit professionally later on), and I have hardly been in contact with databases for almost the last ten years.