I guess I mean either/or. Being somewhere between novice and intermediate, I have yet to delve into creating temporary tables of any kind. At first thought, it would seem that creating a table, populating/updating it, and then querying it would take longer than a subquery within a query, but I have a feeling that isn’t the case.
Also, I’m a bit unclear on best practices for indexes. I know they work well on columns that are searched often, but if I search a.addressid, a.businessid, and a.zip and/or use them in WHERE clauses or JOINs, is it better to create individual indexes on each or one index on all three? Should I consider the fields in my SELECT for indexes at all? If I want to see all of the above fields in a query but only use one in a condition, should I pay attention only to the one in the condition? For example:
SELECT a.addressid, a.businessid, a.zip, p.title
FROM addresses a
INNER JOIN perks p
    ON p.businessid = a.businessid
Would it be most efficient for the above query to have an index solely on p.businessid and another on a.businessid? Or one index on all the fields mentioned in the SELECT as well as a single index on p.businessid?
Also, does it become a bad idea to create an index over a certain number of columns? If a table has 16 columns, is it a bad idea to create an index on 8 of them if they are all frequently searched at one time or another in different clusters? E.g., on one page I need 3 of the 8 in the index, but on another page I need 5 of the 8.
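To make the individual-versus-composite question concrete, here is a sketch of the two options (index names are just examples). The key point, as I understand it, is that MySQL uses a composite index only for queries that filter on a leftmost prefix of its columns:

```sql
-- Option 1: one composite index. Serves queries filtering on businessid
-- alone, or on businessid AND zip (leftmost-prefix rule), but NOT
-- queries filtering on zip alone.
CREATE INDEX bid_zip ON addresses (businessid, zip);

-- Option 2: individual indexes. Each serves its own column independently.
CREATE INDEX bid_only ON addresses (businessid);
CREATE INDEX zip_only ON addresses (zip);
```

So the right choice depends on whether the columns are usually filtered together or separately.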
I must say again that I appreciate the time you are spending with me. I am gaining quite a bit of knowledge from you!
It’s funny you mention the EXPLAIN statement because I started using it right when I started this thread! That’s how I figured out the type of joins I was doing. It has been very helpful so far.
I’m happy to say that I think I have solved my query woes, thanks to you along with a couple of other webpages. It’s my fault for not including all the information, so I must apologize for that. The zipcode table my client purchased has 16 columns that include MUCH about the zipcodes it holds; in fact, the thing is over 80,000 rows large. Much of the bloat is because some zipcodes repeat due to a different city alias, city name, or something else about that location. So 1) I don’t need those extra rows, because the lats and longs are all the same for the repeated zips, and 2) I never use anything other than zips, lats, and longs, so the other 13 columns are moot. I went ahead and created another table with just the distinct zip/lat/long columns, and it’s just over 42,000 rows. Then I created one index on all 3 columns, because I always query them in conjunction with each other, changed my query to use the new table and field names, re-ran the query, and it went from 0.4620 seconds down to 0.0017 seconds!!! WOO!
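For anyone following along, the steps above looked roughly like this (the table name is illustrative; the index name matches the `ziplatlong` index that shows up in the EXPLAIN below):

```sql
-- build a slimmed-down table holding only the distinct zip/lat/long rows,
-- dropping the 13 unused columns and the duplicate city-alias rows
CREATE TABLE ziplatlongs AS
SELECT DISTINCT zip, latitude, longitude
  FROM zipcodes;

-- one composite index, since the three columns are always queried together
CREATE INDEX ziplatlong ON ziplatlongs (zip, latitude, longitude);
```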
I re-ran the EXPLAIN right before and after the change, and I went from
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 8 Using temporary; Using filesort
1 PRIMARY p ALL NULL NULL NULL NULL 12 Using where
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 16 Using where
1 PRIMARY a eq_ref PRIMARY,aid_bid_zip_srch PRIMARY 2 d.addressid 1
4 DERIVED a index NULL aid_bid_zip_srch 19 NULL 11 Using index
4 DERIVED z index NULL zip_lat_long 19 NULL 80171 Using where; Using index
2 DERIVED <derived3> ALL NULL NULL NULL NULL 16 Using temporary; Using filesort
3 DERIVED a index NULL aid_bid_zip_srch 19 NULL 11 Using index
3 DERIVED z index NULL zip_lat_long 19 NULL 80171 Using where; Using index
to
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 8 Using temporary; Using filesort
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 11 Using where
1 PRIMARY a eq_ref PRIMARY,aid_bid_zip_srch PRIMARY 2 d.addressid 1
1 PRIMARY p ALL NULL NULL NULL NULL 12 Using where
4 DERIVED a index NULL aid_bid_zip_srch 19 NULL 11 Using index
4 DERIVED z ref ziplatlong ziplatlong 15 p4wdb.a.zip 1 Using index
2 DERIVED <derived3> ALL NULL NULL NULL NULL 11 Using temporary; Using filesort
3 DERIVED a index NULL aid_bid_zip_srch 19 NULL 11 Using index
3 DERIVED z ref ziplatlong ziplatlong 15 p4wdb.a.zip 1 Using index
Again, I must thank you for all your time and the original query you provided. This has definitely been an enlightening experience for me!
There’s only one thing that is bothering me. I have an index on multiple columns in the perks table, including “businessid”, “perkid”, and “title.” Yet no index is being used and the type is ALL for table p in both EXPLAINs. I can’t figure it out. The index I created in the perks table was on 8 columns including the ones I mentioned above. What am I missing? Oh, and besides the 8-column index I have a PRIMARY KEY on “perkid” and a FULLTEXT index on “title” and “description”.
sorry, i’m having a little trouble associating your EXPLAINs with your query
but i think what’s happening is that because of the GROUP BY perks.id, and an absence of WHERE conditions on that table, the optimizer figured it’s gonna need all perks rows, so it does a table scan
any time the optimizer figures that it will need more than x% of a table (where x is some number like 60 or 80, i dunno), it calculates that the table scan is cheaper than going through the index
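if you really wanted to second-guess that decision, mysql does let you hint the optimizer, though the table scan is usually the right call when most rows are needed anyway -- something like this (the index name here is made up, substitute your own):

```sql
-- FORCE INDEX tells mysql to avoid the table scan if the named index
-- can be used at all; only worth trying if the scan is measurably slower
SELECT p.businessid
     , p.title
  FROM perks p FORCE INDEX (your_8_column_index)
 GROUP BY p.perkid;
```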
I know my response is a little late but I think you are approaching this incorrectly.
Most sites like this search within a certain radius. The approach is to look at the starting zip code and create a lat/long box based on the radius required (50 miles, 25 miles, etc.), then pull only the records that fall within the lat/long bounds of the box. This requires running the complex trig functions against only one record, uses only indexes to find the matching records, and returns only relevant records.
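A sketch of the idea, assuming the slimmed-down zip table described above (table and column names are illustrative). One degree of latitude is roughly 69 miles, and a degree of longitude shrinks by the cosine of the latitude, so you compute the box edges once and the query itself needs no trig per row:

```sql
-- example: 25-mile box around a starting point at lat 40.75, long -73.99
-- (the starting point's lat/long comes from a single lookup on its zip)
SELECT zip
     , latitude
     , longitude
  FROM ziplatlongs
 WHERE latitude  BETWEEN 40.75 - (25 / 69.0)
                     AND 40.75 + (25 / 69.0)
   AND longitude BETWEEN -73.99 - (25 / (69.0 * COS(RADIANS(40.75))))
                     AND -73.99 + (25 / (69.0 * COS(RADIANS(40.75))))
```

If you need an exact circle rather than a box, you can apply the full great-circle distance formula afterward, but only to the handful of rows the box returns.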