How select on interval of 15min for drawing charts with adding empty records?

Hi all,

I’m trying to build a query that is a little out of my leak. The thread on http://www.sitepoint.com/forums/showthread.php?594274-How-do-I-return-a-group-count-for-every-15-minutes-of-the-hour-on-my-Table/page2 is almost the same what i want to achieve to. But (ofcourse) a little different:)

As i’m trying to build my own ‘smart home’ as a hobby project. I’ve put several sensors across my house. Temprature and humidity mostly for the moment. Every minute I poll those sensors and put the data in a MySQL database.

I use a very simple tabel:

CREATE DATABASE IF NOT EXISTS `Arduino`;
CREATE TABLE `sensorid18` (
  `DateTime` datetime DEFAULT NULL,
  `SensorValue` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

/*Data for the table `sensorid18` */
insert into `sensorid18` values ('2012-01-18 02:00:00',19);
insert into `sensorid18` values ('2012-01-18 02:15:00',18);
insert into `sensorid18` values ('2012-01-18 02:30:00',18);
insert into `sensorid18` values ('2012-01-18 02:45:00',17);
insert into `sensorid18` values ('2012-01-18 03:00:00',17);
insert into `sensorid18` values ('2012-01-18 03:15:00',17);
insert into `sensorid18` values ('2012-01-18 17:30:00',14);
insert into `sensorid18` values ('2012-01-18 17:45:00',14);

Note: De example data above is actually every minute. But i just post enough test data to support my question. And not all the data in my table.

From the thread I mentioned above, r937 explains very well how the build the query. Espacially the second question from olivier89. But I can’t get my head arround it to give me the results I want.

So far I took r937’s example from the excellent post above and altered it a little bit into this:

SELECT 

FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(sensorid18.DateTime)/900)*900) AS timeslice,
sensorid18.SensorValue AS SensorValue

FROM sensorid18
WHERE sensorid18.DateTime BETWEEN '20120118020000000' AND '20120118180000000'

GROUP BY timeslice
Order by sensorid18.DateTime ASC;

This gives me almost the results that i want, but not really. As you can see there is a gap between ‘2012-01-18 03:15:00’ and 2012-01-18 17:30:00’. So the result that i’m getting is:

timeslice,SensorValue
2012-01-18 02:00:00,19
2012-01-18 02:15:00,18
2012-01-18 02:30:00,18
2012-01-18 02:45:00,17
2012-01-18 03:00:00,17
2012-01-18 03:15:00,17
2012-01-18 17:30:00,14
2012-01-18 17:45:00,14

And what i need is records that will fill in the time gaps in between ‘2012-01-18 03:15:00,17’ and ‘2012-01-18 17:30:00,14’.
Like this:

timeslice,SensorValue
2012-01-18 02:00:00,19
2012-01-18 02:15:00,18
2012-01-18 02:30:00,18
2012-01-18 02:45:00,17
2012-01-18 03:00:00,17
2012-01-18 03:15:00,17
2012-01-18 03:30:00,0
2012-01-18 03:45:00,0
2012-01-18 04:00:00,0
2012-01-18 04:15:00,0
2012-01-18 04:30:00,0
2012-01-18 04:45:00,0
...
2012-01-18 17:30:00,14
2012-01-18 17:45:00,14

I know that i must use the INNER JOIN statement. But i can’t get my head around it. I know there are guy’s here that know how to do it :slight_smile: I’ve beeing reading many threads on this forum :smiley: So, who is giving me a nutch in the right direction ?

Thank you in advange,
Greetings from Belgiüm.
Sven.

no, you want LEFT OUTER JOIN :slight_smile:

the left table will be the generated datetimes for all time slices you want to see, including those for which there is no corresponding data

that’s what the LEFT OUTER JOIN will accomplish – all timeslices, with or without matching sensor data

generating the timeslices is not a big deal, and i’d be happy to write that part for you, if you can always provide an initial starting datetime value, plus an interval number (e.g. 15 minutes), plus an ending datetime value

can you explain again what it is you want? the minimum sensor value in the timeslice? the average? the maximum?

because the GROUP BY query you posted provides an indeterminate value for each timeslice (for reasons i won’t go into unless you’re really curious)

Hi r937,

I was hoping you will be commenting my question :slight_smile: Because all your answers in the numerous threads i’ve been reading are always spot on :slight_smile: So thank you for that in advance.

In theorie I know what I have to do. The mechanism behind getting 2 tables, one with preformated data (in this case timeslices from one datetime stamp to another). And another table with the actual timeslice data pulled from the database. After getting to 2 tables populated. I just do LEFT OUTER JOIN to merge the 2 tabels together. With alle the records found in table 1, and only the records from table 2 that holds the same ‘ID’ as table 1.

But I have no idea how to translate this theorie in one big fat query :slight_smile: After spending more then 7 houres staring at your previous thread. I dedcided to ask the question myself :wink: Before that, i’ve spend 3 day’s browsing the web to find your initial thread in the first place. So i’m very excited to learn from you.

The value that I want to retrieve in the end, is the average value. I already modified my query so it will give me the average instead of the indeterminate value. I think I did the right thing to modify the string into FORMAT(AVG(sensorid18.SensorValue),0). But i still have to do a manual calculation to be absolutely sure. Thank you for pointing that out !


SELECT 

FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(sensorid18.DateTime)/900)*900) AS timeslice,
FORMAT(AVG(sensorid18.SensorValue),0) AS SensorValue

FROM sensorid18
WHERE sensorid18.DateTime BETWEEN '20120118020000000' AND '20120118180000000'

GROUP BY timeslice
Order by sensorid18.DateTime ASC;

So the problems that i’m facing are:

  1. How can I make a ‘dummy’ table from a ‘start datetime stamp’ to an ‘stop datetime stamp’ with a interval of 15mins
  2. And how do i ‘integrate’ this with my query I have so far (LEFT OUTER JOIN) ?

I really appreciate your input/insight r937.

Thank you in advance,
Greetings from Belgiüm.
Sven.

please see this post for creating the timeslice “dummy” table

then i’ll help you with the LEFT OUTER JOIN if you need it

Hi r937,

Oké, i’ll give it another try.

This is the first post I was refering to in my initial post. I’ve been studying this post very well prior to asking my question on this forum :slight_smile:

i’ll give it another shot !

Greetings from Belgiüm.
Sven.

Hi r937,

ALRIGHT !!!

After having another look at your example I suddently got the ‘AHA’ moment :D. So I altered it a bit so i will give me exact the same result as the query I use for the sensor data.


SELECT STR_TO_DATE('20120118020000000','%Y%m%d%h%i%s') + INTERVAL (n-1) * 15 MINUTE AS timeslice

FROM numbers

WHERE STR_TO_DATE('20120118020000000','%Y%m%d%h%i%s') + INTERVAL n * 15 MINUTE <= STR_TO_DATE('20120118030000000','%Y%m%d%h%i%s');

I never looked at the example that closely before because it relies on another fysical table to do the math. And I was seeking for a solution insight the query itself. The reason i was pursuing this methode was because I thought fysical reads would be slower then virtual reads. But Hey !! This query is blasing fast :slight_smile: So i’ll stick with this :smiley:

So now having 2 bits of code:

Query #1
SELECT 

FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(sensorid18.DateTime)/900)*900) AS timeslice,
FORMAT(avg(sensorid18.SensorValue),0) AS SensorValue

FROM sensorid18
WHERE sensorid18.DateTime BETWEEN '20120118020000000' AND '20120118030000000'

GROUP BY timeslice
Order by sensorid18.DateTime ASC;

Query #2
SELECT STR_TO_DATE('20120118020000000','%Y%m%d%h%i%s') + INTERVAL (n-1) * 15 MINUTE AS timeslice
FROM numbers
WHERE STR_TO_DATE('20120118020000000','%Y%m%d%h%i%s') + INTERVAL n * 15 MINUTE <= STR_TO_DATE('20120118030000000','%Y%m%d%h%i%s');

All I need now is a LEFT OUTER JOIN :slight_smile: I’m going to give it a try :cool:

Greetings from Belgiüm.
Sven.

And this version is even better :slight_smile:

SELECT min_date + INTERVAL (n-1) * 15 MINUTE AS timeslice
FROM (
   SELECT
   STR_TO_DATE('20120118020000000','%Y%m%d%h%i%s') AS min_date,
   STR_TO_DATE('20120118030000000','%Y%m%d%h%i%s') AS max_date
   ) AS m
CROSS JOIN numbers
WHERE min_date + INTERVAL n*15 MINUTE <= max_date

Greetings from Belgiüm.
Sven.

i am pleased that you had an “aha” moment :slight_smile:

regarding your LEFT OUTER JOIN, i will help you with that if you run into trouble

i don’t think you will need to use STR_TO_DATE, though

try this for your query #2

SELECT timeslice
  FROM ( SELECT '2012-01-18 02:00' + INTERVAL n*15 MINUTE AS timeslice
           FROM numbers ) AS t
 WHERE timeslice <= '2012-01-18 03:00'

i don’t really know why you used n-1 instead of n… your numbers table should have 0 as the first number

Hi,

I got the LEFT OUTER JOIN working. At least I think. It is extreemly slow :confused: Take a look at the screenshot. I takes 11 seconds to return 164 records. If I run the two querys indepent, i get 70ms for each of them.

After this I used the EXPLAIN object to find out why it’s taking so long. After this I know it is the variable ‘t’ that’s causing the slow query excution time. But i can’t get it to run faster.

This is the query i have so far.

SELECT d.timeslice,t.SensorValue

FROM 	(
	SELECT timeslice
	FROM 	(
		SELECT '2012-01-18 02:14' + INTERVAL (n-1) * 15 MINUTE AS timeslice
		FROM numbers
		)
	AS t
	WHERE timeslice <= '2012-01-19 19:00'
	)
	AS d
LEFT OUTER JOIN Sensorid18 AS t ON t.DateTime BETWEEN d.timeslice AND (d.timeslice + INTERVAL 15 MINUTE)	

GROUP BY d.timeslice

LIMIT  500

I did put a PRIMARY KEY on the sensorid18.datetime field and the query execution time reduced to 8 seconds. But this is still ridiculous long.

Any Idea where I went wrong ?

BTW: The reason i use (n-1) is because my numbers table is starting from “1”. Not “0” :slight_smile:

Greetings from Belgiüm.
Sven.

i think you wanted AVG(t.SensorValue) in the SELECT clause, didn’t you?

i don’t know why it’s ridiculously long…

please do an EXPLAIN on your query and let’s see if that tells us anything

Hi,

Yes indeed. I still have to add AVG(t.SensorValue). Thank you for pointing that out. Actually, I use FORMAT(avg(t.SensorValue),0) to get a nice round figure.

Look at my previous post (Attachment 58830), this image has the result of the EXPLAIN object.

Greetings from Belgiüm.
Sven.

oops :blush:

i’m afraid i cannot figure out why it’s not using the index on the sensorid18 table

Hi,

@r937
This is a new screenshot that shows the index beeing used. The query went from 11sec to 8sec execution time.

@DerekWayne
Hmm, interresting … the timeslice file is not a fysical file. How do i add an index on that ? I only have a table called numbers. With one field (=n). And that field has an index. How do i put an index on timeslice ?

Greetings from Belgiüm.
Sven.

Hi,

I tried to convert my db from Innodb to MyIsam. Ad the query took 16sec instead of 8. I also noticed that during the query the CPU of my MySQL server went to 100%. I’m running Win2008R2x64 on AMD64+3500+ with 4GBram. And the version of MySQL is x64 5.5.

This on a side note :slight_smile:

The reason i think this query is so slow is because his using all the records inside the table sensorid18. There are now 28981 rows in it. And every minute there is a new line added. I think I have to narrow the initial selection of table sensorid18 to the records that are between the 2 dates (start en stop date). Does this make any sence ?

Hi,

Alright. We are now at 6,8 sec instead of 8,4 sec :slight_smile: We have won 1,6 sec.

LEFT OUTER JOIN Sensorid18 AS t ON t.DateTime BETWEEN d.timeslice AND (d.timeslice + INTERVAL 15 MINUTE)

to

LEFT OUTER JOIN Sensorid18 AS t ON t.DateTime >= timeslice AND t.DateTime <= (d.timeslice + INTERVAL 15 MINUTE)

want another suggestion?

instead of running the query based on the numbers table, you might consider setting up a table of timeslices

this table would be easy to generate, and could be initialized to contain enough timeslices for the future so that you would only need to re-populate it occasionally…

:slight_smile:

hmm, i’m afraid this is not an option. Because i use different timeslices all the time. I’ll try to explain

Start: 2012-01-18 02:14 to 2012-01-18 03:24 with an interval of 1min
Start: 2012-01-18 02:14 to 2012-01-19 03:24 with an interval of 10min
Start: 2012-01-18 02:14 to 2012-01-18 03:11 with an interval of 30min
Start: 2012-01-18 02:14 to 2013-01-18 03:24 with an interval of 1 month
ect …

I’m writing an interface that will give me a chart from every giving start time to every giving stop time. With a selectable interval from 1min, 10min, 15min, 30min, 1hour, 6hours, 12houres, 1 day, 1week, 1month en 1 year. I don’t know if you know Cacti ? If you do, it’s the same graphing functionality i’m trying te achieve. But for numerous reasons I can’t use Cacti.

But it is not a bad idea at all :slight_smile:

sounds like your minimum timeslice interval is one minute

create a table with timeslices spaced one minute apart

there’s only a half million rows per year, quite a manageable size for a mysql table, and and index would be effective

That’s correct. My minimum timeslice is 1min.

I’m going to have a crack at your idea r937.

I’ll create a table starting from 2011-01-01 00:00:00 till 2012-12-31 23:59:00 right ? So, if i’m planning to hold like 5 years of data. I need to create a table starting from 2011-01-01 00:00:00 till 2017-12-31 23:59:00. it’s about 2,5mil records. I have databases with over 2mil records and they still perform great. So I have no worries about that.

The only thing i don’t like in this idea, is the fact that it depends from pre-formated timeslices. It feels like a ‘dirty’ solution. But i’m going to give it a try nevertheless … It should be nicer not to be restricted by the timeslice table i’ll create.

But hey, thanks for the idea and the effort you put into this. It’s getting quite challenging …

Alright, after a few day’s not beeing able to work on the query i finally found some time to report.

Like you suggested in your previous post, i’ve made a table with timeslices starting from 2012-01-11 00:00:00 to 2026-12-31 12:59:00. So covering 5 years.

The query i made so far dossen’t give me the right results dow.

SELECT "2012-04-04 12:14:00" + INTERVAL 15 MINUTE AS timeslice1
FROM timeslices where timeslice >= "2012-04-04 13:14:00" and timeslice <= "2012-04-04 14:14:00";

Is giving me:


2012-04-04 12:29:00
2012-04-04 12:29:00
2012-04-04 12:29:00
2012-04-04 12:29:00
2012-04-04 12:29:00
2012-04-04 12:29:00
...

The record count is ok (61 records) and the result according the query is also correct :slight_smile: But not what i want. But i have no idea how to alter the query so it will give me


2012-04-04 12:14:00
2012-04-04 12:29:00
2012-04-04 12:44:00
2012-04-04 12:59:00
2012-04-04 13:14:00
2012-04-04 13:29:00
...

Anyone has an idea ?

Regarding to the initial query that works perfectly but is ridiculous slow. I got another insight to share. I’ve put the Log-slow-query feature on. And after looking at the output. I now know why the query is slow.

SELECT d.timeslice, FORMAT(avg(t.SensorValue),0)

FROM 	(
	SELECT timeslice
	FROM 	(
		SELECT '2012-01-18 02:14' + INTERVAL (n-1) * 15 MINUTE AS timeslice
		FROM numbers
		)
	AS t
	WHERE timeslice <= '2012-01-19 19:00'
	)
	AS d
LEFT OUTER JOIN Sensorid18 AS t ON t.DateTime >= timeslice AND t.DateTime <= (d.timeslice + INTERVAL 15 MINUTE)	

GROUP BY timeslice

Look at the records he has to go through. So for returing 164 records, he has to walk trough 5,7mil records. No wonder that it takes more then 8 seconds to complete. Any ideas to optimize this query ? All the indexes are in place.

Query_time: 8.775000
Lock_time: 0.000000
Rows_sent: 164
Rows_examined: 5704100

Greetings from Belgiüm.
Sven.