CREATE FUNCTION vs CALL PROCEDURE

I guess I mean either/or. As someone somewhere between novice and intermediate, I have yet to delve into creating temporary tables of any kind. At first thought, it would seem that creating a table, populating/updating it, and then querying it would take longer than a subquery within a query. But I have a feeling that isn’t the case.

Also, I’m a bit unclear on best practices for indexes. I know they work well on columns that are searched often, but if I search a.addressid, a.businessid, and a.zip and/or use them in WHERE clauses or JOINs, is it better to create individual indexes on each or one index on all three? Should I not consider fields in my SELECT for indexes? That is, if I want to see all of the above fields in a query but only use one in a condition, should I only pay attention to the one in the condition? For example:


SELECT a.addressid, a.businessid, a.zip, p.title
  FROM addresses a
INNER
  JOIN perks p
    ON p.businessid = a.businessid

Would it be most efficient for the above query if I have an index solely on p.businessid and a.businessid? Or one index on all the fields mentioned in the SELECT as well as a single index on p.businessid?

Also, does it become a bad idea to create an index on a certain number of columns? If a table has 16 columns, is it a bad idea to create an index on 8 of them if they are all frequently searched at some time or another in different clusters? e.g. on one page I need 3 of the 8 in the index, but on another page I need 5 of the 8.
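One thing worth knowing here is MySQL’s leftmost-prefix rule: a composite index on columns (a, b, c) can serve conditions on a, on a and b, or on all three, but not on b or c alone. As a sketch against the addresses table from the question (the index names here are made up):

```sql
-- One composite index: usable for conditions on businessid alone,
-- or on businessid together with zip (leftmost-prefix rule),
-- but NOT for conditions on zip alone.
CREATE INDEX bid_zip ON addresses (businessid, zip);

-- Versus separate single-column indexes: each usable on its own,
-- but the optimizer will normally pick only one per table access.
CREATE INDEX idx_bid ON addresses (businessid);
CREATE INDEX idx_zip ON addresses (zip);
```

A composite index that happens to contain every column a query touches also lets MySQL answer from the index alone, which shows up as “Using index” in EXPLAIN.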

I must say again that I appreciate the time you are spending with me. I am gaining quite a bit of knowledge from you! :slight_smile:

only one index will help this query, and it will be one of the primary key indexes, on one of the ids :slight_smile:

why? because there is no WHERE clause, all rows from both tables will be involved in the join

so the optimizer will table scan one of the tables, and use the index on the other’s id to match those rows

obviously, it’ll choose to scan the table with the fewer rows first

one of the things you will want to get familiar with is the EXPLAIN

put the word EXPLAIN in front of your SELECT and see if you can figure out what the optimizer is doing by studying the output

try it for several different queries
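Using the query from earlier in the thread, that looks like this (the output columns include id, select_type, table, type, possible_keys, key, and rows):

```sql
EXPLAIN
SELECT a.addressid, a.businessid, a.zip, p.title
  FROM addresses a
INNER
  JOIN perks p
    ON p.businessid = a.businessid;
```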

It’s funny you mention the EXPLAIN statement, because I started using it right when I started this thread! That’s how I figured out what types of joins I was doing. It has been very helpful so far.

I’m happy to say that I think I have solved my query woes thanks to you, along with a couple of other webpages. It’s my fault for not including all the information, so I must apologize for that. The zipcode table my client purchased has 16 columns that include MUCH about the zipcodes it holds. In fact, the thing is over 80,000 rows. Much of the bloat is because some zipcodes repeat due to a different city alias or city name or something else about that location. So 1) I don’t need those extra rows, because the lats and longs are all the same for the repeated zips, and 2) I never use anything other than zips, lats, and longs, so the other 13 columns are moot. I went ahead and created another table with just the distinct zip/lat/long columns, and it’s just over 42,000 rows. Then I created an index on all 3 columns, because I always query them in conjunction with each other, changed my query to use the new table and field names, re-ran the query, and it went from 0.4620 seconds down to 0.0017 seconds!!! WOO!
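That rollup can be sketched like this; the source table and column names (zipcodes, zip, lat, lng) are assumptions, while the index name matches the ziplatlong index that appears in the second EXPLAIN:

```sql
-- Build the slimmed-down table from the purchased one, keeping only
-- the distinct zip/lat/long combinations.
CREATE TABLE zips AS
SELECT DISTINCT zip, lat, lng
  FROM zipcodes;

-- One composite index on all three columns, since they are always
-- queried together.
CREATE INDEX ziplatlong ON zips (zip, lat, lng);
```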

I ran the EXPLAIN right before and after, and I went from


id  select_type  table       type    possible_keys             key               key_len  ref          rows   Extra
1   PRIMARY      <derived2>  ALL     NULL                      NULL              NULL     NULL         8      Using temporary; Using filesort
1   PRIMARY      p           ALL     NULL                      NULL              NULL     NULL         12     Using where
1   PRIMARY      <derived4>  ALL     NULL                      NULL              NULL     NULL         16     Using where
1   PRIMARY      a           eq_ref  PRIMARY,aid_bid_zip_srch  PRIMARY           2        d.addressid  1
4   DERIVED      a           index   NULL                      aid_bid_zip_srch  19       NULL         11     Using index
4   DERIVED      z           index   NULL                      zip_lat_long      19       NULL         80171  Using where; Using index
2   DERIVED      <derived3>  ALL     NULL                      NULL              NULL     NULL         16     Using temporary; Using filesort
3   DERIVED      a           index   NULL                      aid_bid_zip_srch  19       NULL         11     Using index
3   DERIVED      z           index   NULL                      zip_lat_long      19       NULL         80171  Using where; Using index

to


id  select_type  table       type    possible_keys             key               key_len  ref          rows   Extra
1   PRIMARY      <derived2>  ALL     NULL                      NULL              NULL     NULL         8      Using temporary; Using filesort
1   PRIMARY      <derived4>  ALL     NULL                      NULL              NULL     NULL         11     Using where
1   PRIMARY      a           eq_ref  PRIMARY,aid_bid_zip_srch  PRIMARY           2        d.addressid  1
1   PRIMARY      p           ALL     NULL                      NULL              NULL     NULL         12     Using where
4   DERIVED      a           index   NULL                      aid_bid_zip_srch  19       NULL         11     Using index
4   DERIVED      z           ref     ziplatlong                ziplatlong        15       p4wdb.a.zip  1      Using index
2   DERIVED      <derived3>  ALL     NULL                      NULL              NULL     NULL         11     Using temporary; Using filesort
3   DERIVED      a           index   NULL                      aid_bid_zip_srch  19       NULL         11     Using index
3   DERIVED      z           ref     ziplatlong                ziplatlong        15       p4wdb.a.zip  1      Using index

Again, I must thank you for all your time and the original query you provided. This has definitely been an enlightening experience for me! :smiley:

There’s only one thing that is bothering me. I have an index on multiple columns in the perks table, including “businessid”, “perkid”, and “title.” Yet no index is being used and the type is ALL for table p in both EXPLAINs. I can’t figure it out. The index I created in the perks table was on 8 columns including the ones I mentioned above. What am I missing? Oh, and besides the 8-column index I have a PRIMARY KEY on “perkid” and a FULLTEXT index on “title” and “description”.

sorry, i’m having a little trouble associating your EXPLAINs with your query

but i think what’s happening is that because of the GROUP BY perks.id, and an absence of WHERE conditions on that table, the optimizer figured it’s gonna need all perks rows, so it does a table scan

any time the optimizer figures that it will need more than x% of a table (where x is some number like 60 or 80, i dunno), it calculates that the table scan is cheaper than going through the index

I know my response is a little late, but I think you are approaching this incorrectly.

Most sites that search within a certain radius take a different approach: look at the starting zip code and create a lat/long box based on the radius required (50 miles, 25 miles, etc.), then just pull the records that fall within the lat/long bounds of the box. This way the complex trig functions are computed only once (for the starting point), the records are found using only indexes, and only relevant records are returned.
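A sketch of that bounding-box approach, assuming the slimmed-down zips (zip, lat, lng) table from earlier in the thread; the starting point is made up, and 69 miles per degree of latitude is an approximation:

```sql
-- Hypothetical starting point and a 50-mile radius.
SET @lat0 = 40.75, @lng0 = -73.99, @radius = 50;

-- One degree of latitude is roughly 69 miles; a degree of longitude
-- shrinks by COS(latitude). The trig runs once, not once per row.
SELECT zip, lat, lng
  FROM zips
 WHERE lat BETWEEN @lat0 - @radius / 69.0
               AND @lat0 + @radius / 69.0
   AND lng BETWEEN @lng0 - @radius / (69.0 * COS(RADIANS(@lat0)))
               AND @lng0 + @radius / (69.0 * COS(RADIANS(@lat0)));
```

An index that leads with lat would let the first BETWEEN be resolved from the index; a final exact-distance check on the few rows inside the box can then trim the corners.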