Selecting one through many in a many-to-many relationship

The situations is this (somewhat simplified):

I have a MySQL database that contains data about photographs. One table (photos) contains basic data such as year, place etc. Then I have another table (names) that contains names of people in the pictures. And finally there’s a table (pic_name) that connects pictures with people, linking photoid with nameid corresponding to photos.id and names.id respectively.

Let’s say that the picture with id 1 (photos.id = 1) is a photo of three people called John, Lisa and Esmeralda having the IDs 1, 2 and 3 in the names table (as well as the names encoded in the first_name column). Then pic_name has three rows connecting photoid = 1 with nameid = 1, 2 and 3.

Now to my question: How do I construct a query that retrieves some specific info from the photos table, for example place or year, for all photos that contain both John and Lisa (not just either one of them)? That is, how do I get a list of where or when all my pictures containing both John and Lisa (and possibly others such as Esmeralda, but that’s irrelevant) were shot?

Intuitively, it seems like a fairly simple question, but I can’t for my life come up with a query that produces the desired result. Any help would be highly appreciated.

SELECT photos.id
     , photos.place
     , photos.year
  FROM ( SELECT pic_name.photoid
           FROM names
         INNER
           JOIN pic_name
             ON pic_name.nameid = names.id
          WHERE names.first_name IN ( 'John' , 'Lisa' )
         GROUP
             BY pic_name.photoid
         HAVING COUNT(*) = 2
       ) AS these
INNER
  JOIN photos
    ON photos.id = these.photoid

Thanks a lot. This seems like a very good start with fast response time and everything. There’s only one slight problem: If I have a picture with two persons called John and no Lisa (or vice versa), I get that as well. If you can fix that too I’ll definitely buy your book … Or maybe I’ll do it anyway … :slight_smile:

let me turn it around and ask you to explain how your application interface will decide which people you want to base the query on… in other words, sticking with your original example, where did John and Lisa come from? how were they chosen?

It’ll be a public archive where anyone should be able to search for pictures based on various criteria. There already is a database that I have exported parts of in order to migrate to MySQL. The original database uses 4D as its DBMS and the querys work ok, but the whole database is a complete mess structure-wise and there are some other problems with it which is why I’m trying to clean it up and switch to MySQL. I have no access to the server code of the original database, so I have no idea what’s going on there. What I can do is to export its contents to an XML file and take it from there in whatever direction I choose to. There will be a web form identical to the original (see link below) and then I’ll process the querystring through PHP and send the query to the database from there.

Here’s the web site as it is today: http://bildarkivet.orsa.se/
John and Lisa wasn’t a very good example, but you get at least two hits where the Lisas are represented by Anna-Lisa and Anna Lisa respectively. It’s all in Swedish, but the top right field in the middle column (“förnamn”) represents “first name”. So if you enter “John Lisa” there (without the quotes) you get two hits. (And if you search for “John,Lisa” instead, you get pictures with either John or Lisa.) Similarly, if you write “ko gris” (= “cow pig”) in the “Nyckelord” field in the left column (= key words) you get three pictures with both a cow and a pig.

I hope this explains the context. So far I’ve managed to build all the queries I need through PHP except the one I’m asking about here. (Sometime in the future I hope I’ll be able to transform the name mess into a decent persons table, but that’ll take quite a lot of work.)

okay, if you are going to allow a search based only on firstnames, then pretty much the only thiing you can do is change HAVING COUNT(*) = N (where N is the number of names entered) to HAVING COUNT(DISTINCT names.first_name) = N

Yes, I don’t want to remove any of the present functionality although people probably won’t search for two first names particularly often. But for keywords this kind of query will probably be more useful. Anyway, thanks a lot again! This made my day (week, month …) :slight_smile:

There’s still a lot to learn …

I have this query:

SELECT photos.id
FROM (SELECT pic_name.photoid
           FROM names
           INNER JOIN pic_name
           ON pic_name.nameid = names.id
           WHERE names.first_name IN ( 'John' , 'Lisa' )
           GROUP BY pic_name.photoid
           HAVING COUNT(DISTINCT names.first_name) = 2
        ) AS these
INNER JOIN photos
ON photos.id = these.photoid

Now I want a negative filter on the above selection. For example, I might want all pictures with John and Lisa except those with Eric in them as well. Or perhaps all pictures of John and Lisa except the one with photo.id = 1. I had assumed that it would be possible to construct a query selecting what I don’t want and then just add it with a “WHERE NOT EXISTS”, like so:

WHERE NOT EXISTS (
    SELECT id
    FROM photos
    WHERE id = 1)

but that won’t return any results whatsoever. Where did my mind wander astray? :slight_smile:

in general you need either to rewrite the HAVING clause to suit each circumstance…

  FROM ( SELECT pic_name.photoid
           FROM names
         INNER
           JOIN pic_name
             ON pic_name.nameid = names.id
          WHERE names.first_name IN ( 'John' , 'Lisa' ,[COLOR="#FF0000"] 'Eric' [/COLOR])
         GROUP
             BY pic_name.photoid
         HAVING COUNT(DISTINCT names.first_name) = 2
            AND [COLOR="#FF0000"][/COLOR][COLOR="#FF0000"]COUNT(CASE WHEN names.first_name = 'Eric'
                           THEN 'uh oh'
                           ELSE NULL END) = 0 [/COLOR]
        ) AS these

… or add WHERE conditions to the outer query…

INNER
  JOIN photos
    ON photos.id = these.photoid
 WHERE photos.id <> 1

Thanks. At least I did put my erroneous WHERE condition in the right place. :slight_smile:

The first solution seems easily expandable also to cases when pics with more than one person should be excluded, e.g.:


AND COUNT(CASE WHEN names.first_name IN ('eric', 'zoe')
    THEN 'uh oh'
    ELSE NULL END) = 0 

And it should be easy enough to construct by a PHP script.

But I also tried to write a query with a WHERE condiction to exclude Eric, but I can’t seem to come up with a working solution there either. For example

INNER JOIN pic_name as pn
ON pn.photoid = photos.id
INNER JOIN names as n
ON n.id = pn.nameid
WHERE n.first_name <> 'eric'
GROUP BY photos.id

won’t give me what I want.

Maybe I should find myself another hobby .:mad:

Could anyone tell me if I am correct in assuming that the added WHERE condition with a photo id works because the id is a unique, single, non-NULL value, whereas first names are not? I.e. “WHERE n.first_name <> ‘eric’” will retrieve rows as long as there are non-Erics in the image as well? Understanding why it doesn’t work could be a step towards understanding what will work instead and why. :slight_smile:

Ah, well. I’ll go with the suggested CASE solution anyway. I came up with the following alternative solution, which seems to be working as it should, although it’s a bit slower. But it’ll do at my present stage of learning.

WHERE NOT photos.id
IN (
    SELECT photoid
    FROM pic_name AS pn
    INNER JOIN names AS n
    ON n.id = pn.nameid
    WHERE n.first_name = 'eric'
    )

WHERE NOT photos.id
IN (
SELECT photoid
FROM pic_name AS pn
INNER JOIN names AS n
ON n.id = pn.nameid
WHERE n.first_name = ‘eric’
)
is not correct
WHERE photos.id NOT IN
(
SELECT photoid
FROM pic_name AS pn
INNER JOIN names AS n
ON n.id = pn.nameid
WHERE n.first_name = ‘eric’
)
is correct sentence

yes, it is

did you test it? i did :slight_smile:

Next problem: LIKE.

If I simplify the inner query in the above example to this

SELECT x
FROM y
WHERE z IN ('a','b','c')
GROUP BY x.id
HAVING COUNT(DISTINCT z) = 3

I could rewrite it as

SELECT x
FROM y
WHERE z = 'a' OR z = 'b' OR z = 'c'
GROUP BY x.id
HAVING COUNT(DISTINCT z) = 3

and then introduce a LIKE like this:

SELECT x
FROM y
WHERE z = 'a' OR z = 'b' OR z LIKE 'c%'
GROUP BY x.id
HAVING COUNT(DISTINCT z) = 3

But that won’t give me what I want (a, b and c), since it will also retrieve b, cd and ce; cd, ce and cf; etc.

So I tried

SELECT x
FROM y
WHERE z = 'a' OR z = 'b' OR z LIKE 'c%'
GROUP BY x.id
HAVING COUNT(DISTINCT z) = 3
AND COUNT(z NOT LIKE 'c%') = 2

which doesn’t work either.

Nor does

SELECT x
FROM y
WHERE z = 'a' OR z = 'b' OR z LIKE 'c%'
GROUP BY x.id
HAVING COUNT(DISTINCT z) = 3
AND COUNT(z = 'a') = 1
AND COUNT(z = 'b') = 1

Any suggestions what I should try next?

then you need count, separately, the different results

HAVING MAX(CASE WHEN z='a' THEN 1 ELSE NULL END) +
       MAX(CASE WHEN z='b' THEN 1 ELSE NULL END) +
       MAX(CASE WHEN z LIKE 'c%' THEN 1 ELSE NULL END) = 3

Mr. L. to the rescue … Thanks. :slight_smile:

Seeing that solution I spontaneously imagined that this would work as well:

HAVING COUNT(CASE WHEN z='a' THEN 1 ELSE NULL END) +
       COUNT(CASE WHEN z='b' THEN 1 ELSE NULL END) +
       COUNT(CASE WHEN z LIKE 'c%' THEN 1 ELSE NULL END) = 3

But obviously it doesn’t. Does anyone see why I don’t see why it doesn’t work? :slight_smile:

Edit: Or rather: I see why this solution doesn’t work but I’m not sure why the MAX solution works. :confused:

because even if there are seventeen ‘a’ rows, MAX will ensure that only one 1 is included in the count

Sweet. :tup:

And I suppose that the negative version from earlier in the thread takes care of itself automatically by excluding all possible variations of c% leaving only ‘a’ and ‘b’ to add up to the distinct 2? Or am I overlooking something?

SELECT x
FROM y
WHERE z = 'a' OR z = 'b' OR z LIKE 'c%'
GROUP BY x.id
HAVING COUNT(DISTINCT z) = 2
AND COUNT(
    CASE WHEN z LIKE 'c%'
    THEN ''
    ELSE NULL END
    ) = 0

that works :slight_smile: