tag:blogger.com,1999:blog-55536942207667036552024-03-14T00:18:36.886-07:00didrik @ MySQLCurrently I work on the optimizer team at MySQL.
Previously: Google, SUN, Telenor, Sintef, NTNU, ...didrikhttp://www.blogger.com/profile/05846571303485695354noreply@blogger.comBlogger1125tag:blogger.com,1999:blog-5553694220766703655.post-33570885748851227222011-04-12T01:44:00.000-07:002011-04-12T01:44:00.875-07:00Optimizing MySQL filesort with small limit.<div dir="ltr" style="text-align: left;" trbidi="on">"filesort" is the MySQL catchall algorithm for producing ordered results. (For an explanation on how it works, see Sergey Petrunias blog <a href="http://s.petrunia.net/blog/?p=24">http://s.petrunia.net/blog/?p=24</a>).<br />
<br />
For small data sets, "filesort" simply uses a quicksort in memory. However, for large data sets we create temporary (sorted) files, which are later merged.<br />
<br />
In the recent MySQL release candidate <a href="http://dev.mysql.com/tech-resources/articles/whats-new-in-mysql-5.6.html">http://dev.mysql.com/tech-resources/articles/whats-new-in-mysql-5.6.html</a> this has been optimized for the following kinds of queries (and subqueries)<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">SELECT ... FROM single_table ... ORDER BY non_index_column [DESC] LIMIT N [OFFSET M];</span><br />
<br />
If we have space in memory for (N + M) rows, we create a (bounded) Priority Queue to hold the data. This means we can produce the ordered result using a single table scan, thus saving the writing/reading of temporary files, and saving the merge step.<br />
<br />
How can you see it in action? Sorry, there's no explain for it (yet). What you can do currently is to inspect the status counters which are associated with filesort, and verify they are all zero.<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">FLUSH STATUS;</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">SHOW SESSION STATUS LIKE 'Sort%';</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"><run your select ... limit query></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">SHOW SESSION STATUS LIKE 'Sort%';</span></span><br />
What kind of speedup can you expect? This depends on your data set, and your sort_buffer_size of course. With this table, and about 20 million rows, and default sort buffer, execution time drops from about 40 seconds to 10 seconds on my desktop machine:<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">CREATE TABLE t1(</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> f0 int auto_increment PRIMARY KEY,</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> f1 int,</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> f2 varchar(200)</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">);</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">SELECT * FROM t1 ORDER BY f2 LIMIT 100;</span></span><br />
</div>didrikhttp://www.blogger.com/profile/05846571303485695354noreply@blogger.com4