Need help with a query

Hi there!

I’m building an event-related application where I have user defined “streams” of events. When a new event is added to the system, I need to find the streams to which the event should be added.

The event takes place at a venue (which belongs to a city), belongs to a category, is organized by an organization and is tagged with certain tags (among other things).

The streams have filters. So I can have a stream that only wants events from a particular city (without caring which venue it is at, who the organizer is, etc). I can have another stream where I only want events by a certain organizer. Or I can have a stream where I only want events from, say, 3 categories in a certain city.

A filter can be set to “any” (represented as 0) or it can be one or more id’s of the thing it’s filtering. To be flexible, I created a “filters” table with the following columns:

id | stream_id | filter | value

Say I only have 4 filters which are saved as such (0 means any; multiple values for the same filter are saved as separate records - see row 16 and 17):

[table=“width: 500, class: grid”]
[tr]
[td]id[/td]
[td]stream_id[/td]
[td]filter[/td]
[td]value[/td]
[/tr]
[tr]
[td]1[/td]
[td]1[/td]
[td]city[/td]
[td]1[/td]
[/tr]
[tr]
[td]2[/td]
[td]1[/td]
[td]venue[/td]
[td]0[/td]
[/tr]
[tr]
[td]3[/td]
[td]1[/td]
[td]org[/td]
[td]2[/td]
[/tr]
[tr]
[td]4[/td]
[td]1[/td]
[td]tag[/td]
[td]0[/td]
[/tr]
[tr]
[td]5[/td]
[td]2[/td]
[td]city[/td]
[td]1[/td]
[/tr]
[tr]
[td]6[/td]
[td]2[/td]
[td]venue[/td]
[td]0[/td]
[/tr]
[tr]
[td]7[/td]
[td]2[/td]
[td]org[/td]
[td]0[/td]
[/tr]
[tr]
[td]8[/td]
[td]2[/td]
[td]tag[/td]
[td]1[/td]
[/tr]
[tr]
[td]9[/td]
[td]3[/td]
[td]city[/td]
[td]1[/td]
[/tr]
[tr]
[td]10[/td]
[td]3[/td]
[td]venue[/td]
[td]3[/td]
[/tr]
[tr]
[td]11[/td]
[td]3[/td]
[td]org[/td]
[td]0[/td]
[/tr]
[tr]
[td]12[/td]
[td]3[/td]
[td]org[/td]
[td]0[/td]
[/tr]
[tr]
[td]13[/td]
[td]4[/td]
[td]city[/td]
[td]1[/td]
[/tr]
[tr]
[td]14[/td]
[td]4[/td]
[td]venue[/td]
[td]3[/td]
[/tr]
[tr]
[td]15[/td]
[td]4[/td]
[td]org[/td]
[td]0[/td]
[/tr]
[tr]
[td]16[/td]
[td]4[/td]
[td]tag[/td]
[td]2[/td]
[/tr]
[tr]
[td]17[/td]
[td]4[/td]
[td]tag[/td]
[td]3[/td]
[/tr]
[/table]

I need a query that would return stream_id’s for streams where the event should be added.

Say I have an event added that is in:
city => 1
venue => 3
organizer =>2
tag => 1

It should return stream_id: 1,2,3.

I have not been able to figure out how to apply the “where_clauses” to records sharing the same stream_id.

I’m open to changing how the data is represented as well if that will make the query easier.

Thank you very much!

V

SELECT stream_id
  FROM filters
 WHERE filter = 'city'  AND value IN ( 0 , 1 )
    OR filter = 'venue' AND value IN ( 0 , 3 )
    OR filter = 'org'   AND value IN ( 0 , 2 )
    OR filter = 'tag'   AND value IN ( 0 , 1 )
GROUP
    BY stream_id
HAVING COUNT(*) = 4 -- number of filters matched

Thank you, r937. I will try and report back. :slight_smile:

r937 - seems to work perfectly. Not that I am very surprised given your credentials. lol

Thank you kindly!

I have a follow-up question. The query seems to work, but not in all cases. The situation where it does not work is when the new event is tagged with more than one matching ‘tag’. Since the IN array is dynamically generated based on the new event details, if this is the generated query:


SELECT stream_id
  FROM filters
 WHERE filter = 'city'  AND value IN ( 0 , 1 )
    OR filter = 'venue' AND value IN ( 0 , 3 )
    OR filter = 'org'   AND value IN ( 0 , 2 )
    OR filter = 'tag'   AND value IN ( 0 , 1, 2 )
GROUP
    BY stream_id
HAVING COUNT(*) = 4 -- number of filters matched

Stream that has filters defined as:

city = 1
venue = 3
org = 2
tag = 1,2

will not be in the results (since COUNT(*) will be 5).

And filter defined as:

city = 1
venue = 3
org = 3
tag = 1,2

will match, even though ‘org’ is different, but 2 matching 'tag’s will pass it.

So, how do I adjust the query so that a stream_id is returned only if each filter matches at least once?

Thank you!

would the query make more sense if it were written like this –

SELECT stream_id
  FROM filters
 WHERE filter = 'city'  AND value IN ( 0 , 1 )
    OR filter = 'venue' AND value IN ( 0 , 3 )
    OR filter = 'org'   AND value IN ( 0 , 2 )
    [COLOR="#FF0000"]OR filter = 'tag'   AND value IN ( 0 , 1 )
    OR filter = 'tag'   AND value IN ( 0 , 2 )[/COLOR]
GROUP
    BY stream_id
HAVING COUNT(*) = [COLOR="red"][B]5[/B][/COLOR] -- number of filters matched

Hmm, I don’t think that will work:

If I have a stream with filters:

city = 1
venue = 3
org = 2
tag = 1

And an event added that generates the modified query (city = 1, venue = 3, org = 2, tag = array(1,2)), it won’t match since it will have COUNT(*) = 4.

you cannot put “array(1,2)” into a query statement, it’s not valid sql

what you want is to count the number of different value tag/value pairs

so in that case it should be 5

How about moving the checks to the HAVING?


SELECT stream_id
FROM filters
GROUP
    BY stream_id
HAVING
    MAX(CASE WHEN filter = 'city'  AND value IN ( 0 , 1 ) THEN 1
             ELSE 0
        END) = 1
AND MAX(CASE WHEN filter = 'venue' AND value IN ( 0 , 3 ) THEN 1
             ELSE 0
        END) = 1
...

guido2004: I’ll try that one

Sorry for being confusing - I’m not putting array(1,2) into the SQL…that was my way of saying that the event was tagged with 2 tags having IDs 1 and 2.

Is there a performance issue that you are using this method rather than just using AND for all 4 criteria?

If using AND instead of OR, I think you won’t get any matches because no single record will have a filter ‘city’ AND filter ‘venue’ AND etc

This seems to work. I’ll keep testing and see if there are use cases where it fails. Thanks!!