Today I am going to share something I know about indexing. I am sure you all must be using it somewhere or must have heard of it.
Let’s see one wonderful example of its usage today which will show how important it is to use indexes properly in Database. I would like to thank my client and my mentor Stephen Fegel of Galatea Systems for imparting this wonderful knowledge.
We all use “Select” for retrieving data from the tables. In most cases if data is not too much it takes minimal time to fetch records without indexing also. But in case of large data this may be a severe problem.
Lets take an example:
I have two tables one is houses which is my main table and it has 25,666 records.
And another table is house_desc which has 6,04,255 records.
And the query is
SELECT
SQL_NO_CACHE h.*,
(SELECT
SQL_NO_CACHE desc_value
FROM
house_desc AS d
WHERE
language_id = ‘EN’
AND
d.house_id = h.house_id
AND
(
desc_type = ‘description’
OR
desc_type = ‘introduction’
)
ORDER BY desc_type
LIMIT 0 , 1
)
AS
desc_value
FROM
houses AS h
WHERE
house_id
IN (’12345678′)
Now lets join both the tables and fire sub query for retrieving data from house_desc table.
Structure of the house_desc is
`c_code` varchar(50)
`house_id` int(10)
`country` char(2)
`language_id` enum(‘NL’,'EN’,'DE’,'FR’,'ES’,'IT’)
`desc_type` varchar(50)
`desc_value` longtext
So when I fire the query without indexing and I get following result
Showing rows 0 – 0 (1 total, Query took 13.0067 sec)
It returns only 1 row but still it took 13 secs!!
Now just see the magic of indexing here. Just set the index on house_id field
Fire the same query again and here is the result:
Showing rows 0 – 0 (1 total, Query took 0.0899 sec)
Output is same but just look at the difference in time to fetch records!
Before indexing it took 13.0067 seconds and after indexing it took 0.0899 seconds i.e. it is 1000 time faster than without indexing.
The result may be different in your case as in sometimes it can be 100 time faster or 10 time faster but one thing is for sure that it will be faster than without indexing (if done correctly!).
When you have large amount of records in your table and you want to search that table then it is better to set index on those fields which are there in where clause.
You can’t set index on all the fields from table. It is also very important thing on which filed you set the index.
You can read more on indexing from this
http://www.informit.com/articles/article.aspx?p=377652
Nice article,but we need to take care when create index on any field because
it’s affect many thing.
Really a nice article, the real case…awesome…
Good Article.
However, few more things to keep in mind.
Subquery in SELECT statement runs everytime when above query runs. It should be moved ideally to FROM clause in JOIN which will further improve performance.
In this example if you are retrieving 6,04,255 subquery will run that many times, if you move that to FROM clause it will run only 1 time.
Kind Regards,
Pinal
Article is really nice. Even in my current project I used indexes with the help from Mr. Pinal Dave. And believe me I got real performance improvement for slow running queries.
Pinal
You are right that using subqueries are not the preferred way to do things, at least most of the times. The popular believe is that subqueries are a bad thing and are only used by junior developers. However, subqueries can, if used right, be a wonderful thing and can be quicker than joins. I do agree with you that they should be used carefully.
In this case, we only use the query for paging purposes. So we fetch a maximum number of 20 records, so the subquery will also run a maximum of 20 times.
Since we only want one type of description from the descriptions table, a JOIN would probably be much slower. For the JOIN to work, we also need to GROUP BY and ORDER BY which will slow it down considerably.
Stephen,
You are correct but JOIN does not need GROUP BY and ORDER BY to work. It can just work fine without them.
In above case, ORDER BY will be running 20 times in subquery, which should be looked at carefully.
Ultimately, what runs faster should be the solution.
Kind Regards,
Pinal
I think we both agree, especially that you should always look carefully at the used queries.
In general a JOIN doesn’t need a GROUP BY/ORDER BY, however in this case, we only want one record from the description table and according to the numbers as stated above, there are, for an average house, 240 descriptions per house. That is why we need a GROUP BY and ORDER BY. I don’t see another way to do it.
A quick benchmark shows the subquery-version will run in 0.0055 seconds and the JOIN will run in 0.013 seconds. In this particular case, the subquery performs better (more than two times as fast) and thus is (imho) a better solution.
Main conclusion should be, that when speed is important (I think it always is!), try out different solutions and see what is fastest. Every project is different so there is no 1 way to do things.