I’m trying to extract start and end dates as the delimiters of different periods. I would like them to be on the same row, but I can’t figure out how to do it.
The query I’m using is:
-- This select gets the start and end dates, but on different rows
SELECT date AS start_date
, date_sub(date, interval 1 second) AS end_date
FROM log
WHERE date(date) >= '2013-06-10 00:00:00'
AND date(date) < '2013-06-16 23:59:59'
AND type in (1,2,3)
GROUP BY date(date), type
UNION
-- This next one gets the end date for the last date in the period
SELECT date_sub(date, interval 1 second) AS end_date
, date
from (
SELECT date
FROM log
WHERE date(date) > '2013-06-16 23:59:59'
AND type IN (1,2,3)
GROUP BY date(date), type
LIMIT 1
) AS next_one
ORDER BY start_date
An example of the data I’m working with is:
date type
2013-06-06 13:05:44 1
2013-06-13 15:34:45 2
2013-06-13 15:31:08 1
2013-06-20 13:10:32 1
2013-06-27 14:05:16 1
2013-07-04 13:58:14 1
The way you do this is to do a join to the same table, joining on the earliest date after the date in the main table. And you get the earliest date by doing a subquery. It always feels like I’m doing one join too many when I do this, so I’m open to correction.
SELECT `log`.type, `log`.`date` AS start_date, next_log_entry.date AS endDate
FROM `log`
LEFT JOIN `log` AS next_log_entry
ON next_log_entry.type = `log`.type AND next_log_entry.date = (
SELECT MIN(next_log_entry2.date)
FROM
`log` next_log_entry2
WHERE next_log_entry2.type=`log`.type
AND next_log_entry2.date>log.date
)
WHERE `log`.`date` >= '2013-06-01 00:00:00'
AND `log`.`date` < '2013-06-16 23:59:59'
AND `log`.`type` IN (1,2,3)
ORDER BY `log`.`type`, `log`.`date`
I’d also be inclined to avoid using function calls in your WHERE statements (eg DATE(date)) because, as far as I know, MySql will have to apply that function to every row in the table, thus skipping indexes and slowing down the query.
Yes, I do agree about the use of the date function. That was an oversight, as I have to use it on another very similar table that has its dates stored in unix timestamp format in an integer column. I’ve never understood why people do that when there are perfectly good date data types available. Oh well, never mind.
r937 - That’s certainly another way to do it. Or one could use BETWEEN, which is what I did.
BETWEEN will do an equals match at both end points of the range, and that would be wrong at the top end
whereas what i posted will work correctly always, regardless of the precision of the column in question, whether there are seconds, milliseconds, or whatever
the benefit of my technique (not specifying BETWEEN) becomes obvious when you are dealing with the last day in february
Unfortunately I have to work with the dates that are fed from the application, which means I need to include the upper one in the range, so the between will do that, although it may not be the most future proof method.
The application is only a few years old, so no legacy data.
Date data types give more scope for easy date manipulation than a straight integer does.
But that doesn’t detract from your points, though, as they all quite valid.
if you are tasked with returning data for the dates of january 27 through january 29, and given that your date column is a datetime and not just a simple date, would you please show me your BETWEEN code
okay, at least you captured most of the datetimes on the 31st (you’re missing any that took place between 23:59:59.000 and 00:00:00.000, which is admittedly a small window of error, but don’t forget our dear cousin murphy)
when you say the dates are fed from the application, are you saying that the application will already pre-calculate things like properly handling the last day of february?