Struggling with a JOIN query

Been working on this report query for quite some time and hitting a brick wall at every turn. No matter which way I approach it the figures suddenly go coompletley out, so I’m hoping that someone can help. I have two tables - delay and delayHistory. I started off creating a query against the delay table which works great. giving me the results I was looking for

     SELECT 
        delayID, delayBookClientRef, MONTH(delayCreated)-1 AS mnth,
        COUNT(CASE WHEN delayDecision >=0  OR delayDecision IS NULL THEN delayID end) AS toteD,
        COUNT(CASE WHEN delayDecision >=1 THEN delayID end) AS cfrmd,
        COUNT(CASE WHEN delayDecision IS NULL THEN delayID end) AS ucfrmd,
        COUNT(CASE WHEN delayDecision = 0 THEN delayID end) AS eie,
        
        SUM(
            IF(DATEDIFF(delay.delayStopped, delay.delayCreated) IS NULL, 
               DATEDIFF(CURDATE(),delayCreated),
            IF(DATEDIFF(delay.delayBookADD,delay.delayBookEDD) > DATEDIFF(delay.delayStopped, delay.delayCreated) 
            OR DATEDIFF(delay.delayBookADD,delay.delayBookEDD) IS NULL, 
               DATEDIFF(delay.delayStopped, delay.delayCreated),DATEDIFF(delay.delayBookADD,delay.delayBookEDD)))) 
            AS maxD,
        SUM(
            IF(DATEDIFF(delay.delayStopped, delay.delayCreated) IS NULL, 
               DATEDIFF(CURDATE(),delayCreated),
            IF(DATEDIFF(delay.delayBookADD,delay.delayBookEDD) > DATEDIFF(delay.delayStopped, delay.delayCreated) 
            OR DATEDIFF(delay.delayBookADD,delay.delayBookEDD) IS NULL, 
               DATEDIFF(delay.delayStopped, delay.delayCreated),DATEDIFF(delay.delayBookADD,delay.delayBookEDD)))) * 120 
            AS maxC
FROM delay
        GROUP BY  MONTH(delayCreated)

What I’m trying to do now is get a SUM of the accumulating delay days and cost figures for each delay in the delayHistory table and this is where it all goes horribly wrong. None of the column names are the same in either table and yet by adding this simple line

JOIN delayhistory ON delayhistory.delayIDH = delay.delayID

between the FROM and the GROUP throws all my previous figures out and I’m failing to understand why. I feel as though I’m disappearing up my own query…
Clearly I’m missing something and its probably something really obvious.! or am I going about this in the wrong way? Clearly I’m not a SQL guy but If anyone can help ( in easy to understand words) it would be appreciated.

i don’t know your data but from your description of the problem, there is a one-to-many relationship between delay (one) and delayhistory (many)

please explain why you wanted to add that join

also, please note that if you GROUP BY month only, then the values in delayID and delayBookClientRef are meaningless

@r937, thanks for coming to my assitance once again. In response to your comment - I don’t have a relationship between the two tables (Perhaps thats one of the issue!? I didn’t realise until I checked just now).
I Appreciate that the delayID and delayBookClientRef are meanigless when grouped by month but I want to construct a more dynamic query that might be grouped by WEEK or by just delayID to give a more granular perspective.

The delay table holds all delays and relfects their current status, so there’ll only ever be one line per delay
The delayHistory holds pretty much the same table structure but obviously is updated by a trigger on the delay table everytime a delay is updated and thus there’ll be many references to the one delay.
From the delay table my query can bring back such things as total delays, delays that are confirmed as delays, delays that were entered in error (eie)and unconfirmed delays etc… The maxD and maxC give figures for the worst case scenario if the fullness of these delays is realised. If I construct a similar query on the delayHistory table, I’d get the currD and currC which reflects the current amount of delay as opposed to the potential ‘max’. like this

SELECT MONTH(delayCreatedH)-1 AS mnth,
SUM(CASE WHEN delayReviewStatusH = 2 AND delayUpdateTypeH = 1 THEN delayReviewDaysH ELSE 0 END) AS currD,
SUM(CASE WHEN delayReviewStatusH = 2 AND delayUpdateTypeH = 1 THEN delayReviewDaysH ELSE 0 END) * 120 as currC

FROM delayhistory
    GROUP BY MONTH(delayCreatedH)-1

So, in my crazy little head I thought I might be able to JOIN the two tables in a query or create a subQuery to bring everything together. JOINS I am OK with to an extent but subQueries baffle me at times (probably because I don’t do them very often)… What I wanted to do is have the results from the delay table and delayhistory table in one result set showing month, toteD, cfrmd, ucfrmd, eie, maxD, maxC, currD, currC.

Thanks again for your input

[quote=“ReggieTheDog, post:3, topic:192118, full:true”]
The delay table holds all delays and relfects their current status, so there’ll only ever be one line per delay
The delayHistory holds pretty much the same table structure but obviously is updated by a trigger on the delay table everytime a delay is updated and thus there’ll be many references to the one delay.[/quote]that wasn’t obvious at all :wink:

give me a day to think this over please

@r937, apologies, you are right it isn’t/wasn’t obvious… :blush: I’ve been plugging away at this for quite a few days now so perhaps can’t see the wood for the trees. Appreciate your continued support. Sometimes SQL is an enigma to me, other days it falls neatly into place…!

okay, i’ve jumped into your query a couple of times and i’m drowning

i guess it’s because i don’t understand your columns

i can understand the COUNTs but not the SUMs with those wacky IFs

i wonder if i could get you to test a modified query that returns raw data from the delay table –

SELECT delay.* -- all original columns, plus some more... , MONTH(delay.delayCreated) AS mth , CASE WHEN COALESCE(delay.delayDecision,0) >=0 THEN delayID END AS toteD , CASE WHEN delay.delayDecision >=1 THEN delayID END AS cfrmd , CASE WHEN delay.delayDecision IS NULL THEN delayID END AS ucfrmd , CASE WHEN delay.delayDecision = 0 THEN delayID END AS eie , DATEDIFF(delay.delayStopped,delay.delayCreated) AS diff1 , DATEDIFF(delay.delayBookADD,delay.delayBookEDD) AS diff2 FROM delay

if you declare the above as a view, it will make any subsequent queries easier to write and understand

anyhow, can you please talk some more about what you want from the history table?

@r937… “but not the SUMs with those wacky IFs”… It took me hours to craft those wakcy IFs… :blush:

From the history table I only want three result colunms > Where the diff1 and diff2 represent the projected max days and costs, the history colums will/should return the actual days and the actual costs and the month but they must corespond to the same delayID’s… hopefully this little crseenshot will reflect what I’m looking for in the end result…
In the hisotry table there may be numerous entries for each delayID and some may reflect an update to actualD or actualC values. Where this happens I need to sum the values to give a total to each delayID. Have I gone off on a tangent or was this the type of information you were looking for?

you’re making my head hurt :slight_smile:

your query has GROUP BY month, yet the screenshot of the results you want shows individual delays

and nowhere do you explain how “projected max days and costs” (if these represent the diff1 and diff22 amounts) should match actual days and the actual costs in the history table

@r937… Sorry about that. If its any concelation - my head hurts too.
maxD and maxC came from the initial query I posted - those wacky IF’s (represented in your response query by diff1 and diff2).
Delay generation is pretty difficult to explain… When a delay is created it is given a start date and a projected end/stop date (Estimated End Delay Date “EDD”). The initial value of madD is also set to the EDD. However, if the delay exceeds the projected end delay date the maxD will continue to expand until such time as the delay finally stops. So when the delay finally stops we have a start date, Estimated End Delay Date and a the final end/stop Date.

Scenario: A hotel room cannot be booked because it is being referbished so a delay is generated. If the delay contiues past the Estimated End Delay Date, This extnded delay may be down to poor hotel/project management or lazy decorators but the delay continues to grow until such time as the refurbishment is complete ( lets say 30 days atfer the Estimated End Delay Date) so the maxD and maxC are then fixed values.
However, each week the hotel/project management sits down with the decorating/refurb company and decide who is to blame for the extended delay for that week. If the the fault of the first 7 days of extended delay is down to the hotel because they couldn’t find the key, the delay is updated accordingly in the delay table, reflecting that the hotel is responsible and a trigger ( on the delay table for after update) fires to insert a row into the history table. Now let’s say that the second 7 days of delay is down to the referb/decorating comapny, because they brought the wrong color paint. Again, the delay table is updated (same as above) this time showing that the refurb Co is to blame. Once all 30 days of extended delay have been accounted for, the hotel chain directors want to know how much the extended delay cost them and how much they can claim back from the poorly performing refeb company.
(NB: They may also want to know this count on a week by week basis before the extended delay has ended).

The history table holds all the update rows for the delay and thus, will reflect the number of days that the hotel is responsible for and the number of days the refurb company is responsible for. The actualD result is the SUM of the days that the refurb is responsible for so it’s highly unlikely that actualD and maxD would be the same (unless of course the refurb Co and hotel is totally incompitent).
So the key figures for the report will will be:- each delay (delayID), the projected max days (maxD) , projected max cost of delay (maxC) and, where the 3rd party was responsible, the actual days (actualD) and the Actual cost (actualC)…

This is a very basis sceanario but hopefully reflects the key elements. If you see my screenshot and look at delay 3 you’ll see that there was a toal of 63 extra days delay but it was agreed between the refurbCo and hotel that the refurbCo was only responsible for 40 of those days. This is all test data but hopefully make sense.

Apologies for any head pain hat you experience but mine’s right up there in Migrane territory with mySQl and wacky IF’s swimming round freely and unfettered.

I wanted to create a single dynamic query, so that I could group by month, week of delayID and if required, have a WHERE clause in there too so that individual hotels could also be queried… Again apologies for the confusion…

i appreciate the background on your business problem

i still have no clue how to query your history table, sorry

@r937 I’m really sorry but clear I’m too close to this… The query I wrote for the Hisotry table returns the data I want ie any delay which have been updated

SELECT MONTH(delayCreatedH)-1 AS mnth,
SUM(CASE WHEN delayReviewStatusH = 2 AND delayUpdateTypeH = 1 THEN delayReviewDaysH ELSE 0 END) AS actualD,
SUM(CASE WHEN delayReviewStatusH = 2 AND delayUpdateTypeH = 1 THEN delayReviewDaysH ELSE 0 END) * 120 as actualC

FROM delayhistory
    GROUP BY MONTH(delayCreatedH)-1

And, if I want to, I can GROUP BY WEEK or BY delayID and the query totals everything up nicely. The same can be said for the first query I posted.

Where I struggle is putting the delay table query and the delayhisotry query together to form a single query. However, I have made some advances with this

SELECT
   *
FROM
    (
        SELECT 
            delayIDH as dIDH, MONTH(delayCreatedH)-1 AS mnth,
            SUM(CASE WHEN delayReviewStatusH = 2 AND delayUpdateTypeH = 1 THEN delayReviewDaysH ELSE 0 END) AS ActualD,
            SUM(CASE WHEN delayReviewStatusH = 2 AND delayUpdateTypeH = 1 THEN delayReviewDaysH ELSE 0 END) * 120 as ActualC

        FROM delayhistory
        GROUP BY MONTH(delayCreatedH)-1
    ) q1
INNER JOIN
    (
        SELECT 
            delayID AS dID, MONTH(delayCreated)-1 AS mnth,
            COUNT(CASE WHEN delayDecision >=0  OR delayDecision IS NULL THEN delayID end) AS toteD,
            COUNT(CASE WHEN delayDecision >=1 THEN delayID end) AS cfrmd,
            COUNT(CASE WHEN delayDecision IS NULL THEN delayID end) AS ucfrmd,
            COUNT(CASE WHEN delayDecision = 0 THEN delayID end) AS eie,

        SUM(
            IF(DATEDIFF(delay.delayStopped, delay.delayCreated) IS NULL, 
               DATEDIFF(CURDATE(),delayCreated),
            IF(DATEDIFF(delay.delayBookADD,delay.delayBookEDD) > DATEDIFF(delay.delayStopped, delay.delayCreated) 
            OR DATEDIFF(delay.delayBookADD,delay.delayBookEDD) IS NULL, 
               DATEDIFF(delay.delayStopped, delay.delayBookEDD),DATEDIFF(delay.delayBookADD,delay.delayBookEDD)))) 
            AS maxD,
            
        SUM(
            IF(DATEDIFF(delay.delayStopped, delay.delayCreated) IS NULL, 
               DATEDIFF(CURDATE(),delayCreated),
            IF(DATEDIFF(delay.delayBookADD,delay.delayBookEDD) > DATEDIFF(delay.delayStopped, delay.delayCreated) 
            OR DATEDIFF(delay.delayBookADD,delay.delayBookEDD) IS NULL, 
               DATEDIFF(delay.delayStopped, delay.delayBookEDD),DATEDIFF(delay.delayBookADD,delay.delayBookEDD)))) * 120 
            AS maxC
        FROM delay
        GROUP BY MONTH(delayCreated)-1
    ) q2 ON
        q1.dIDH = q2.dID

Unfortunately, it doesn’t pick up delays in the delay table that have no updates in the delayHistory table.
EDIT it does pick up other things either so its back to the drawing board… again
Hoping this this is not adding fuel or confusion to the query fire, I look forward to your insight.

@r937, yippy ya hooooo,
Looks like someone had a similar issue as me in 2013 and the answer given fit like a glove… For anyone who is interested here’s the link
http://stackoverflow.com/questions/18148188/query-sum-for-two-fields-in-two-different-tables

Apprecaite all your help and responses r937 and I’m really sorry I couldn’t articulate myself as clearly as this guy did in 2013… The learning curve is still steep

whew, well, i’m relieved you’re sorted :slight_smile:

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.