It seems that ordering within an IN clause does not work.
select user_id
from table1
where user_id in (select user_id from table2 order by created_date desc) #desc so that I get the newest entries only
limit 25
Running this query will always give me the same results. It seems that the user_id is not being sorted correctly.
I have been googling and it seems that the field() function may work for me.
But based on the documentation and examples I’ve seen so far, it seems that I have to run the select sub query twice? I’m not sure if I’m understanding it correctly.
select user_id
from table1
where user_id in (select user_id from table2)
order by field(user_id, select user_id from table2) #I need desc here
limit 25
But I’m afraid this would also be slow (I’m moving the query from a join to IN instead) because the indexed column would be inside the function. :-/
I may result in using a stored procedure in order to sort the query and then pass the result set within the IN clause.
r937 that was the original query I had. Unfortunately it takes about 3 seconds for the whole operation. There are over 100k rows in each table. That is why I was avoiding a join to begin with.
I actually got better performance out of using the IN clause. I brought it down from 3 seconds to ~ .01 seconds. The only issue I had was getting the sorting right.
All I need is the user id and both columns are indexed.
I think the problem my be how mysql was compiled. It seem’s that the cache is not working right neither. Running the same query multiple times from my machine works fine (although slow, the second+ times are much faster due to caching) and terrible on production.
EDIT:
Oops, it seems that I made a mistake when posting the query. I actually need their profile_link column as it contains important information.
It should be like this:
SELECT t1.profile_link
FROM table t1
WHERE t1.user_id IN (SELECT t2.user_id FROM table2 t2 ORDER BY t2.created_date desc) #desc so that I get the newest entries only
LIMIT 25
table2 contains data that is inserted all the time and I would like the newest entries only.
Because you can’t put a limit on the sub query using the IN clause.
I also forgot to mention that whatever the limit is, it’s also the maximum amount of IDs within the IN clause.
I may have to solve this programmatically if I can’t find a valid solution. Maybe query it through the application code and pass the results into this query. Or somewhere around those lines.
I’m nearly home and will be testing out the same tables but under inno db instead (currently running under myisam).
I just joined 500,000 from each table and it ran extremely fast (at home). I think the slow speeds are due to the way mysql was configured. I’ll have to do more research on the production server.
Do you guys have any advice on any flags I should be looking for? I’m going to be checking the cnf file and see if there is anything there weird.
SELECT table1.user_id
FROM table1
INNER JOIN table2
ON table2.user_id = table1.user_id
GROUP BY table2.user_id
ORDER
BY table1.created_date DESC LIMIT 25
Looks like when I issue a group by it creates a temporary table. I need to filter out duplicates from this query. :-/
please note, whenever you use a GROUP BY clause, then the ORDER BY clause can name only columns that are in the SELECT clause
you’re trying to sort by a column that isn’t included in the SELECT clause
not only is this problematic (you can’t sort by a column that isn’t included in the rows being sorted), but it also doesn’t make sense, since if you don’t display it, you can’t really discern the sequence of the results properly
what’s your actual query, i.e. with real not fake column and table names?
you can’t sort by created_date because it isn’t in the SELECT clause
and the GROUP BY clause makes no sense to me at all, you will get unique profile_link values for each separate user, but no way to know which user each profile_link belongs to
in the news table the same user can insert new rows. In fact, new rows are inserted by the same user around 5 times a day on a average. If the latest rows are by the same user I would like to avoid repeating the same user multiple times.