Stop Sphinx and MySQL from Truncating Grouped Data
Clinton R. Dreisbach, Former Viget
Sphinx is an open-source full-text search engine that I like to use with my Rails projects. It's very fast, easy to understand, and has a great plugin, Thinking Sphinx that provides a Ruby API for search.
I found a very annoying issue with Thinking Sphinx the other day, though. I was using its search indexing DSL with the Page model in Radiant, which has PageParts associated with it. My Page model looked similar to this:
class Page < ActiveRecord::Base define_index do indexes title, :sortable => true indexes description indexes parts.content, :as => :parts_contents end end
You can see that I'm indexing the content of all the page parts with the page in order to make pages searchable by their content. The SQL query that was generated for MySQL looked similar to like this:
SELECT `pages`.`id` * 3 + 0 AS `id` , CAST(`pages`.`title` AS CHAR) AS `title`, CAST(`pages`.`description` AS CHAR) AS `description`, CAST(GROUP_CONCAT(`page_parts`.`content` SEPARATOR ' ') AS CHAR) AS `parts_contents` FROM pages LEFT OUTER JOIN `page_parts` ON page_parts.page_id = pages.id GROUP BY `pages`.`id`
GROUP_CONCAT is a MySQL function that will join together data, which is perfect for this situation.
Whenever I would search, however, I often would not see pages that I knew had my search term in them. There didn't seem to be a reason why some pages showed up and others didn't. When I ran the SQL query myself, though, I found it:
GROUP_CONCAT was truncating what it returned, which limited the text I could search against. From the MySQL manual entry on
The result is truncated to the maximum length that is given by the
group_concat_max_lensystem variable, which has a default value of 1024. The value can be set higher, although the effective maximum length of the return value is constrained by the value of max_allowed_packet.
The solution was simple: add a line to my MySQL configuration.
[mysqld] group_concat_max_len 32768
That's all it took to fix the problem. If you expect to return data of more than 32,768 characters, you can adjust the number accordingly.