

Full-text search in Rails applications with Sphinx


by , July 29, 2010, Ruby on Rails

Whether you run an online store, a community site or a news portal, sooner or later you may want to add search for your users, and they will expect it to be fast and to return relevant results.

Problems & Solutions

Rails applications can run into a specific problem here. If your application uses MySQL, Rails creates its tables with the InnoDB engine by default, and InnoDB does not support FULLTEXT indexes. Searching several columns of such tables for words is then only possible with LIKE '%term%' conditions, which cannot use an index, force a full table scan and give no relevance ranking.
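To make the problem concrete, here is a minimal sketch of the LIKE-based fallback that remains on InnoDB tables. The helper `like_conditions` and the column names are hypothetical; it builds a Rails-style conditions array in which every search term must appear in at least one of the given columns.

```ruby
# Hypothetical helper: builds a Rails-style conditions array
# ["SQL with ? placeholders", bind1, bind2, ...] requiring every term
# to appear in at least one of the given columns.
def like_conditions(columns, query)
  terms = query.split

  # One "(col1 LIKE ? OR col2 LIKE ? ...)" group per term, joined with AND.
  clauses = terms.map do |_term|
    "(" + columns.map { |column| "#{column} LIKE ?" }.join(" OR ") + ")"
  end

  binds = terms.flat_map do |term|
    # Escape LIKE wildcards so user input is matched literally
    # (backslash is MySQL's default LIKE escape character).
    escaped = term.gsub(/[\\%_]/) { |char| "\\#{char}" }
    Array.new(columns.size, "%#{escaped}%")
  end

  [clauses.join(" AND "), *binds]
end

sql, *binds = like_conditions(%w[first_name location], "chicago ruby")
# sql   => "(first_name LIKE ? OR location LIKE ?) AND (first_name LIKE ? OR location LIKE ?)"
# binds => ["%chicago%", "%chicago%", "%ruby%", "%ruby%"]
```

In Rails 2.x such an array could be passed as `Candidate.find(:all, :conditions => ...)`. Because every pattern starts with %, MySQL cannot use a B-tree index for it, and the matches come back unranked.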

There are a few possible solutions:

  • You can convert some tables to MyISAM, which supports full-text search, but mixing InnoDB and MyISAM tables on one server is not ideal: the two engines use separate memory caches (the key buffer for MyISAM, the buffer pool for InnoDB), so the server’s memory has to be split between them.
  • You can replicate the data to MyISAM slaves and run searches there, but this increases the architectural complexity.
  • You can use an external full-text search engine such as Apache Lucene/Solr, Sphinx or Xapian.

At Sphere we usually use Sphinx. It’s an open-source, very fast and reliable solution.

Sphinx Overview

Sphinx is an open-source full-text search server, designed from the ground up with performance, relevance and integration simplicity in mind.
Sphinx lets you either batch-index and search data stored in an SQL database, NoSQL storage or plain files, or index and search data on the fly, talking to Sphinx much as you would to a database server.
Let’s look at Sphinx’s features:

  • High indexing speed (up to 10 MB/sec on modern CPUs)
  • High search speed (average query is under 0.1 sec on 2-4 GB text collections)
  • High scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
  • Supports distributed searching (since v.0.9.6)
  • Supports MySQL natively (MyISAM and InnoDB tables are both supported)
  • Supports phrase searching
  • Supports phrase proximity ranking, providing good relevance
  • Supports English and Russian stemming
  • Supports any number of document fields (weights can be changed on the fly)
  • Supports document groups
  • Supports stop words
  • Supports different search modes (“match all”, “match phrase” and “match any” as of v.0.9.5)
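The search modes and stop words listed above can be illustrated with a toy matcher. This is a hypothetical sketch for intuition only, not how Sphinx implements them: it has no stemming, ranking or index, and the stop-word list and the `tokenize`/`toy_match?` helpers are made up for the example.

```ruby
# Toy illustration of Sphinx-style search modes (NOT Sphinx's real algorithm).
# The stop-word list below is an arbitrary example.
STOP_WORDS = %w[a an and in of the].freeze

# Lowercase, split on non-alphanumerics and drop stop words.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |word| STOP_WORDS.include?(word) }
end

# mode is :all (every word present), :any (at least one word present)
# or :phrase (the words appear consecutively, in order).
def toy_match?(mode, document, query)
  doc_words = tokenize(document)
  query_words = tokenize(query)
  return false if query_words.empty?

  case mode
  when :all    then query_words.all? { |word| doc_words.include?(word) }
  when :any    then query_words.any? { |word| doc_words.include?(word) }
  when :phrase then doc_words.each_cons(query_words.size).include?(query_words)
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

resume = "Senior Ruby on Rails developer in Chicago"
toy_match?(:all, resume, "chicago ruby")      # => true
toy_match?(:all, resume, "boston ruby")       # => false
toy_match?(:any, resume, "boston ruby")       # => true
toy_match?(:phrase, resume, "ruby on rails")  # => true
toy_match?(:phrase, resume, "rails ruby")     # => false
```

"Match all" is the strictest of the three non-phrase checks; "match phrase" additionally cares about word order and adjacency.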

Sphinx Installation

Let’s download the latest version (0.9.9 at the time of writing) from the official website and untar it:

wget http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
tar -xzf sphinx-0.9.9.tar.gz

After that, we compile and install Sphinx from source:

cd sphinx-0.9.9/
./configure
make
sudo make install

That’s it.
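By default, make install puts the Sphinx programs, including indexer and searchd, into /usr/local/bin. Thinking Sphinx’s rake tasks run these programs, so a quick check that they are on your PATH can save some debugging. A minimal sketch, assuming a Unix-like system; `executable_on_path?` is a hypothetical helper, not part of Sphinx or Thinking Sphinx:

```ruby
# Hypothetical helper: true if an executable file named `name` exists in
# one of the directories listed in the PATH environment variable.
def executable_on_path?(name)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    path = File.join(dir, name)
    File.file?(path) && File.executable?(path)
  end
end

# Report whether the two programs Thinking Sphinx needs can be found.
%w[indexer searchd].each do |binary|
  status = executable_on_path?(binary) ? "found" : "NOT found"
  puts "#{binary}: #{status}"
end
```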

Thinking Sphinx Installation

Next, we need the Thinking Sphinx gem by Pat Allan, which lets Rails applications work with Sphinx. Other gems exist, such as acts_as_sphinx and Ultrasphinx, but they appear to be abandoned.

If you use Rails 2.x, run from the application’s root directory:

script/plugin install git://github.com/freelancing-god/thinking-sphinx.git 

If you are already on Rails 3, add the following line to the Gemfile in the application’s root directory:

gem 'thinking-sphinx', :git => 'http://github.com/freelancing-god/thinking-sphinx.git', :require => 'thinking_sphinx', :branch => 'rails3'

And run the following command:

bundle install

The Thinking Sphinx gem adds a few rake tasks to your application. The most important ones are:

rake thinking_sphinx:index – Generate the configuration file and build the index
rake thinking_sphinx:reindex – Reindex Sphinx without regenerating the configuration file
rake thinking_sphinx:start – Start up Sphinx's daemon
rake thinking_sphinx:stop – Shut down the daemon

Usage

Let’s imagine a website with a database of job candidates who are ready to start work.
Every candidate has several attached documents, such as a resume, a cover letter and certificates. We want to search by the candidate’s name and location, and by the text inside their documents.

We need to define the indexes in our Candidate model:

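class Candidate < ActiveRecord::Base
  # Assumes a Document model with belongs_to :candidate and a text column "content"
  has_many :documents, :dependent => :destroy

  define_index do
    # Plain text field
    indexes location
    # Both columns combined into one "name" field; :sortable allows ordering by it
    indexes [first_name, last_name], :as => :name, :sortable => true
    # Text of all associated documents, searchable as "document_content"
    indexes documents.content, :as => :document_content
  end
end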
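Conceptually, this index turns each candidate into a single Sphinx document with three text fields. The sketch below mimics that shape in plain Ruby. `CandidateRow`, `DocumentRow` and `to_sphinx_document` are hypothetical stand-ins: in reality Thinking Sphinx generates SQL that Sphinx’s indexer runs against the database, not Ruby code like this.

```ruby
# Hypothetical plain-Ruby stand-ins for the ActiveRecord models.
CandidateRow = Struct.new(:id, :first_name, :last_name, :location, :documents)
DocumentRow  = Struct.new(:content)

# Mirrors the define_index block: location as-is, first and last name
# joined into one "name" field, and the content of every attached
# document joined into one "document_content" field.
def to_sphinx_document(candidate)
  {
    id: candidate.id,
    location: candidate.location,
    name: [candidate.first_name, candidate.last_name].compact.join(" "),
    document_content: candidate.documents.map(&:content).compact.join(" ")
  }
end

john = CandidateRow.new(1, "John", "Smith", "Boston",
                        [DocumentRow.new("Ruby developer"),
                         DocumentRow.new("Rails certificate")])
doc = to_sphinx_document(john)
doc[:name]              # => "John Smith"
doc[:document_content]  # => "Ruby developer Rails certificate"
```

This is why a single query can match a candidate’s name, city and resume text at once: they all live in one indexed document.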

Now we can perform a search by calling the search method:

Candidate.search "chicago ruby"

or

Candidate.search "smith boston ruby rails"

As you can see, we added the :sortable option to the name field, which lets us sort the results by name:

Candidate.search("chicago rails", :order => :name)
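A toy model of what these calls do may help. The hypothetical `toy_search` below assumes every query word must appear in at least one field and, when an order field is given, sorts the matches by it. Real Sphinx adds stemming, relevance ranking and an inverted index, and sorts by relevance when no :order is passed; this sketch simply keeps the input order in that case.

```ruby
# Hypothetical, already-flattened candidate documents.
CANDIDATES = [
  { name: "Mary Jones", location: "Chicago", document_content: "Rails developer" },
  { name: "Adam Brown", location: "Chicago", document_content: "Rails and Ruby" },
  { name: "John Smith", location: "Boston",  document_content: "Ruby on Rails" }
].freeze

# Keeps records where every query word appears in some field,
# then optionally sorts by the given field (like :order => :name).
def toy_search(records, query, order: nil)
  words = query.downcase.split
  hits = records.select do |record|
    tokens = record.values_at(:name, :location, :document_content).join(" ").downcase.split
    words.all? { |word| tokens.include?(word) }
  end
  order ? hits.sort_by { |record| record[order] } : hits
end

toy_search(CANDIDATES, "chicago rails", order: :name).map { |r| r[:name] }
# => ["Adam Brown", "Mary Jones"]
toy_search(CANDIDATES, "smith boston ruby rails").map { |r| r[:name] }
# => ["John Smith"]
```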

Conclusion

MySQL can become a bottleneck when you need to search large text fields, and an external full-text search engine is often a good solution. As you could see above, it’s easy to install Sphinx and start using it with the Thinking Sphinx gem.

Resources

Sphinx’s official site (http://www.sphinxsearch.com/)
Thinking Sphinx gem (http://freelancing-god.github.com/ts/en/)
Thinking Sphinx PDF (http://peepcode.com/products/thinking-sphinx-pdf)
Apache Lucene (http://lucene.apache.org/)
Solr (http://lucene.apache.org/solr/)
Xapian (http://xapian.org/)
