There are a few basic operations which SQL will perform when looking for the data that you need. Here they are listed in the order of worst to best.
- Table Scan
- Clustered Index Scan
- Index Scan
- Clustered Index Seek
- Index Seek
The basic rule to follow is Scans are bad, Seeks are good.
When SQL Server does a scan it loads the object which it wants to read from disk into memory, then reads through that object from top to bottom looking for the records that it needs.
When SQL Server does a seek it knows where in the index that the data is going to be, so it loads up the index from disk, goes directly to the part of the index that it needs and reads to where the data that it needs ends. This is obviously a must more efficient operation than a scan, as SQL already knows where the data is that it is looking for.
When SQL Server is looking for your data probably one of the largest things which will make SQL Server switch from a seek to a scan is when some of the columns are you looking for are not included in the index you want it to use. Most often this will have SQL Server fall back to doing a clustered index scan, since the Clustered index contains all the columns in the table. This is one of the biggest reasons (in my opinion at least) that we now have the ability to INCUDE columns in an index, without adding those columns to the indexed columns of the index. By including the additional columns in the index we increase the size of the index, but we allow SQL Server to read the index, without having to go back to the clustered index, or to the table it self to get these values.
We’ll look at this more shortly when we look at execution plans.
Quick question….how is a Clustered Index Scan better than an Index Scan?
Isn’t CI Scan essentially a table scan?
Your right, I somehow managed to get those backwards.
Thanks for pointing that out. I’ll get them in the correct order in just a second.
Wouldn’t a table scan better after a certain amount of data is required from the table?
It is possible that it would, but not likely. The reason is that the table has other columns which you may or may not need in your query. If there are a lot of other columns in your query the additional IO required to pull all those pages from disk, and then filtering them would almost always be slower then pulling a large number of records from an index.
Especially when you consider than an Index seek doesn’t need to load all the records of the index from disk. It only needs to load the needed pages from disk (based on the statistics) to get the information required.
“It is possible that it would, but not likely. The reason is that the table has other columns which you may or may not need in your query.”
Denny, thanx for responding. I want to learn this.
If the only COLUMNs being pulled are those on the INDEX, an INDEX anything would be faster because there’s less to read. But if other COLUMNs are used, it would have to hit the TABLE anyway. So, why not just go to the TABLE directly?
Isn’t it simple logic/math?
IF time-to-read-index + time-to-fetch-table > time-to-read-entire-table THEN go directly to the TABLE; ELSE use the INDEX;
Please expound on your comment. I have not got the point yet.
You wouldn’t want to hit the table directly because the columns in the table probably aren’t sorted in the correct order. Without knowing that the columns are in the correct order the SQL Server will go to the index which is presorted and quicker to search.
OK, sorting, interesting. Is there then an equivalent equation to
IF time-to-read-index + time-to-fetch-table > time-to-read-entire-table + time-to-sort THEN go directly to the TABLE; ELSE use the INDEX;
Or is it just use the INDEX unless the entire TABLE is requested?
Sort of. If the SQL Server decides that the most efficient method to access the data is a full table scan it will do so. If there’s no filter on the index, it’ll figure out which index to use to sort the index. If no index matches it’ll use the clustered index. If there’s no clustered index then it’ll use a table scan.
Ah, but reading the INDEX is always preferred because of the ordering, if that is required?
If you have an ORDER BY on your query then yes. If you don’t have an ORDER BY, then the SQL Server will probably sort by the clustered index and use it for filtering. If there is no clustered index it would then fall back to using the table itself and return the data in the order it comes across it.
Thanx for the explanation. That was very helpful.