Searching

Basic Searching

Once you’ve got an index set up on your model and the Sphinx daemon running, you can start searching, using a class method named just that:

Article.search 'pancakes'

Note that Sphinx paginates search results, with a default page size of 20. You can find more information further down, in the Pagination section.

Field Conditions

To focus a query on a specific field, you can use the :conditions option – much like in ActiveRecord:

Article.search :conditions => {:subject => 'pancakes'}

You can combine both field-specific queries and generic queries too:

Article.search 'pancakes', :conditions => {:subject => 'tasty'}

Please keep in mind that Sphinx does not support SQL comparison operators – it has its own query language. The :conditions option must be a hash, with each key a field and each value a string.
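Field conditions map onto Sphinx’s extended query syntax, where @field limits the keywords that follow it to that field. As a rough, hypothetical sketch (conditions_to_query is made up for illustration and is not part of Thinking Sphinx; the string it actually generates may differ), the combined search above corresponds to a query string along these lines:

```ruby
# Hypothetical helper, not part of Thinking Sphinx: sketches how a generic
# query plus a :conditions hash could be expressed in Sphinx's extended syntax.
def conditions_to_query(query, conditions = {})
  parts = []
  parts << query unless query.nil? || query.empty?
  conditions.each do |field, value|
    # @field limits the keywords that follow it to that field
    parts << "@#{field} #{value}"
  end
  parts.join(' ')
end

puts conditions_to_query('pancakes', :subject => 'tasty')
# prints: pancakes @subject tasty
```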

Attribute Filters

Filters on attributes use a similar syntax, but with the :with option:

Article.search 'pancakes', :with => {:author_id => @pat.id}

Unlike field-specific conditions, filters accept arrays and ranges:

Article.search 'pancakes', :with => {
  :created_at => 1.week.ago..Time.now,
  :author_id  => @fab_four.collect { |author| author.id }
}
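In plain Ruby terms, each :with entry is checked roughly like this: a single value must match exactly, an array matches any of its members, a range matches anything between its endpoints, and every entry has to pass. filter_matches? is a hypothetical helper for illustration only; Sphinx does the real filtering itself.

```ruby
# Hypothetical sketch of how a single :with filter value is interpreted.
def filter_matches?(attribute_value, condition)
  case condition
  when Range then condition.cover?(attribute_value)   # between the endpoints
  when Array then condition.include?(attribute_value) # any listed value
  else attribute_value == condition                   # exact match
  end
end

now = Time.now
article = { :created_at => now - 3600, :author_id => 4 }
with = { :created_at => (now - 7 * 86_400)..now, :author_id => [1, 2, 3, 4] }

# Every filter must pass for the document to be included
puts with.all? { |attr, cond| filter_matches?(article[attr], cond) }
# prints: true
```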

And of course, you can mix and match global terms, field-specific terms, and filters:

Article.search 'pancakes',
  :conditions => {:subject => 'tasty'},
  :with       => {:created_at => 1.week.ago..Time.now}

If you wish to exclude specific attribute values, then you can specify them using :without:

Article.search 'pancakes',
  :without => {:user_id => current_user.id}

On a multi-value attribute, :with matches documents that have any one of the given values. To require all of them, give :with_all a try instead:

Article.search 'pancakes',
  :with_all => {:tag_ids => @tags.collect(&:id)}
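The difference between :with, :with_all and :without on a multi-value attribute such as tag_ids can be sketched with array operations. These three helpers are hypothetical illustrations, not Thinking Sphinx methods, and they assume :without excludes a document when any of its values is listed:

```ruby
# Hypothetical sketch of filter semantics on a multi-value attribute.
def with_match?(values, wanted)      # :with, at least one wanted value present
  !(values & wanted).empty?
end

def with_all_match?(values, wanted)  # :with_all, every wanted value present
  (wanted - values).empty?
end

def without_match?(values, unwanted) # :without, none of the values present
  (values & unwanted).empty?
end

tag_ids = [1, 5, 9]
puts with_match?(tag_ids, [5, 7])      # prints: true
puts with_all_match?(tag_ids, [5, 7])  # prints: false
puts with_all_match?(tag_ids, [1, 9])  # prints: true
puts without_match?(tag_ids, [7])      # prints: true
```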

Application-Wide Search

You can use all the same syntax to search across all indexed models in your application:

ThinkingSphinx.search 'pancakes'

If you’re using a version of Thinking Sphinx prior to 1.2, you will need to use a slightly deeper namespaced method: ThinkingSphinx::Search.search.

This search will return all objects that match, no matter what model they are from, ordered by relevance (unless you specify a custom order clause, of course). Don’t expect references to attributes and fields to work perfectly if they don’t exist in all the models.

If you want to limit global searches to a few specific models, you can do so with the :classes option:

ThinkingSphinx.search 'pancakes', :classes => [Article, Comment]

Pagination

Sphinx paginates search results by default. Indeed, there’s no way to turn it off (but you can request really big pages should you wish). The pagination parameters in Thinking Sphinx are the same as Will Paginate’s: :page and :per_page.

Article.search 'pancakes', :page => params[:page], :per_page => 42

Search results also work with Will Paginate’s view helper, just to keep things nice and easy:

# in the controller:
@articles = Article.search 'pancakes', :page => params[:page]

# in the view:
<%= will_paginate @articles %>
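Under the hood, :page and :per_page boil down to an offset and a limit for the Sphinx query. This hypothetical page_window helper (not part of Thinking Sphinx or Will Paginate) sketches that arithmetic, assuming a missing page means page 1 and using the default page size of 20:

```ruby
# Hypothetical helper: turn page/per_page into an offset and limit.
# nil or blank pages (e.g. params[:page] on the first request) become page 1.
def page_window(page, per_page = 20)
  page = page.to_i
  page = 1 if page < 1
  per_page = per_page.to_i
  { :offset => (page - 1) * per_page, :limit => per_page }
end

page_window(nil)      # offset 0, limit 20
page_window('3', 42)  # offset 84, limit 42
```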

Match Modes

Sphinx has several different ways of matching the given search keywords, which can be set on a per-query basis using the :match_mode option.

Article.search 'pancakes waffles', :match_mode => :any

Most are pretty self-explanatory, but here’s a quick guide. If you need more detail, check out Sphinx’s own documentation.

:all

This is the default for Thinking Sphinx, and requires a document to have every given word somewhere in its fields.

:any

This will return documents that include at least one of the keywords in their fields.

:phrase

This matches all given words together in one place, in the same order. It’s just the same as wrapping a Google search in quotes.

:boolean

This allows you to use boolean logic with your keywords. & is AND, | is OR, and both ! and - function as NOT. You can group logic within parentheses.

Article.search 'pancakes & waffles', :match_mode => :boolean
Article.search 'pancakes | waffles', :match_mode => :boolean
Article.search 'pancakes !waffles',  :match_mode => :boolean
Article.search '( pancakes topping ) | waffles',
  :match_mode => :boolean

Keep in mind that ANDs are used implicitly if no logic is given, and you can’t query with just a NOT – Sphinx needs at least one keyword to match.
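If you build boolean queries from user input, a small helper keeps those rules in one place. This is a hypothetical sketch, not part of Thinking Sphinx: required terms are ANDed implicitly by spaces, alternatives are grouped with |, exclusions get !, and a NOT-only query is rejected.

```ruby
# Hypothetical helper for composing a :boolean match mode query.
def boolean_query(required, alternatives = [], excluded = [])
  # Sphinx needs at least one positive keyword to match
  if required.empty? && alternatives.empty?
    raise ArgumentError, 'at least one positive keyword is needed'
  end

  parts = required.dup
  parts << "( #{alternatives.join(' | ')} )" unless alternatives.empty?
  parts.concat(excluded.map { |term| "!#{term}" })
  parts.join(' ')
end

puts boolean_query(['pancakes'], [], ['waffles'])
# prints: pancakes !waffles
puts boolean_query(['syrup'], ['pancakes', 'waffles'])
# prints: syrup ( pancakes | waffles )
```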

:extended

Extended mode combines boolean searching with phrase searching, field-specific searching, field position limits, proximity searching, quorum matching, the strict order operator, exact form modifiers (since 0.9.9rc1), and field-start and field-end modifiers (since 0.9.9rc2).

I highly recommend having a look at Sphinx’s syntax examples. Also keep in mind that if you use the :conditions option, then this match mode will be used automatically.

:extended2

This is much like the normal extended mode, but with some quirks that Sphinx’s documentation doesn’t cover. Generally, if you don’t know you want to use it, don’t worry about using it.

:fullscan

This match mode ignores all keywords, and just pays attention to filters, sorting and grouping.
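To make the first three modes concrete, here’s a simplified, hypothetical model of which documents they match. It splits on word characters and ignores ranking, stopwords and Sphinx’s charset settings, so treat it as an illustration only:

```ruby
# Hypothetical, simplified model of :all, :any and :phrase matching.
def tokens(text)
  text.downcase.scan(/\w+/)
end

def mode_matches?(mode, query, document)
  words = tokens(query)
  doc = tokens(document)
  case mode
  when :all    then words.all? { |w| doc.include?(w) }
  when :any    then words.any? { |w| doc.include?(w) }
  # :phrase needs the words next to each other, in the same order
  when :phrase then doc.each_cons(words.size).include?(words)
  else raise ArgumentError, "unsupported mode: #{mode}"
  end
end

doc = 'Pancakes and waffles for breakfast'
puts mode_matches?(:all, 'waffles pancakes', doc)     # prints: true
puts mode_matches?(:any, 'waffles crepes', doc)       # prints: true
puts mode_matches?(:phrase, 'waffles pancakes', doc)  # prints: false
puts mode_matches?(:phrase, 'and waffles', doc)       # prints: true
```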

Ranking Modes

Sphinx also has a few different ranking modes (again, the Sphinx documentation is the best source of information on these). They can be set using the :rank_mode option:

Article.search "pancakes", :rank_mode => :bm25

:proximity_bm25

The default ranking mode, which combines both phrase proximity and BM25 ranking (see below).

:bm25

A statistical ranking mode, similar to most other full-text search engines.

:none

No ranking – every result has a weight of 1.

:wordcount (since 0.9.9rc1)

Ranks results purely on the number of times the keywords are found in a document. Field weights are taken into account.

:proximity (since 0.9.9rc1)

Ranks documents by raw proximity value.

:match_any (since 0.9.9rc1)

Returns rankings calculated in the same way as a match mode of :any.

:fieldmask (since 0.9.9rc2)

Returns rankings as a 32-bit mask, with the N-th bit corresponding to the N-th field, numbering from 0. A bit is set only when at least one of the keywords matches the respective field. If you want to know which fields matched your search for each document, this is the only way.
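If you do use :fieldmask, decoding the weight back into field names is a matter of checking bits. matched_fields is a hypothetical helper, and it assumes the field list is in the same order as the fields in your index definition:

```ruby
# Hypothetical helper: bit N of the mask is set when field N matched.
def matched_fields(mask, field_names)
  field_names.each_with_index.select { |_name, i| mask[i] == 1 }.map(&:first)
end

fields = [:subject, :content, :tags]
puts matched_fields(0b101, fields).inspect
# prints: [:subject, :tags]
```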

Sorting

By default, Sphinx sorts by how relevant it believes the documents to be to the given search keywords. However, you can also sort by attributes (and fields flagged as sortable), as well as time segments or custom mathematical expressions.

Attribute sorting defaults to ascending order:

Article.search "pancakes", :order => :created_at

If you want to switch the direction to descending, use the :sort_mode option:

Article.search "pancakes", :order => :created_at,
  :sort_mode => :desc

If you want to use multiple attributes, or Sphinx’s ranking scores, then you’ll need to use the :extended sort mode. This will be set by default if you pass in a string to :order, but you can set it manually if you wish. This syntax is pretty much the same as SQL, and directions (ASC and DESC) are required for each attribute.

Article.search "pancakes", :sort_mode => :extended,
  :order => "created_at DESC, @relevance DESC"

As well as using any attributes and sortable fields here, you can also use Sphinx’s internal attributes (prefixed with @). These are:

  • @id (The match’s document id)
  • @weight, @rank or @relevance (The match’s ranking weight)
  • @random (Returns results in random order)
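Multi-key ordering works as in SQL: the first key decides, and later keys only break ties. Here’s a plain-Ruby illustration of "created_at DESC, @relevance DESC", using made-up data and a hypothetical helper (Sphinx does this sorting itself):

```ruby
# Hypothetical helper: sort descending by each key in turn.
def multi_key_desc(results, *keys)
  results.sort_by { |r| keys.map { |k| -r[k] } }
end

results = [
  { :id => 1, :created_at => 100, :weight => 5 },
  { :id => 2, :created_at => 200, :weight => 1 },
  { :id => 3, :created_at => 100, :weight => 9 }
]

sorted = multi_key_desc(results, :created_at, :weight)
puts sorted.map { |r| r[:id] }.inspect
# prints: [2, 3, 1]
```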

Expression Sorting

If you’re hoping to make your ranking algorithm a bit more complex, then you can break out the arithmetic and use Sphinx’s expression sort mode:

Article.search "pancakes", :sort_mode => :expr,
  :order => "@weight * views * karma"

Reading the Sphinx documentation is required if you really want to understand the power and options around this sorting method.
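As a rough illustration of what the expression above computes: each match’s weight is multiplied by its views and karma attributes, and results are ordered by that product, highest first (assuming, as Sphinx’s documentation describes, that expression sorting is descending). The data and the expr_rank helper are made up:

```ruby
# Hypothetical illustration of ordering by "@weight * views * karma".
def expr_rank(matches)
  scored = matches.map { |m| [m[:id], m[:weight] * m[:views] * m[:karma]] }
  scored.sort_by { |_id, score| -score }
end

matches = [
  { :id => 1, :weight => 2, :views => 10, :karma => 3 },
  { :id => 2, :weight => 4, :views => 5,  :karma => 1 },
  { :id => 3, :weight => 1, :views => 50, :karma => 2 }
]

puts expr_rank(matches).inspect
# prints: [[3, 100], [1, 60], [2, 20]]
```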

Time Segment Sorting

Sphinx also has a curious sort mode, :time_segments. This breaks down a given timestamp/datetime attribute into the following segments, and then the matches within the segments are sorted by their ranking.

  • Last Hour
  • Last Day
  • Last Week
  • Last Month
  • Last 3 Months
  • Everything else

You can’t change the segment points – these are fixed by Sphinx. To use this sort method, you need to specify it as well as the attribute to use as a reference point:

Article.search "pancakes", :sort_mode => :time_segments,
  :sort_by => :updated_at
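Here’s a simplified, hypothetical model of the time segment ordering. It assumes a month means 30 days and three months 90 days, and it sorts the newest segment first, then by weight within each segment; check Sphinx’s documentation for the exact boundaries:

```ruby
HOUR = 3600
DAY = 24 * HOUR
# Assumed segment limits: hour, day, week, 30-day month, 90 days
SEGMENT_LIMITS = [HOUR, DAY, 7 * DAY, 30 * DAY, 90 * DAY]

# 0 = last hour ... 5 = everything else
def time_segment(timestamp, now)
  age = now - timestamp
  SEGMENT_LIMITS.index { |limit| age < limit } || SEGMENT_LIMITS.size
end

def time_segment_sort(matches, now)
  matches.sort_by { |m| [time_segment(m[:updated_at], now), -m[:weight]] }
end

now = 1_000_000_000
matches = [
  { :id => 1, :updated_at => now - 2 * DAY, :weight => 9 },
  { :id => 2, :updated_at => now - 10,      :weight => 1 },
  { :id => 3, :updated_at => now - 3 * DAY, :weight => 20 }
]
puts time_segment_sort(matches, now).map { |m| m[:id] }.inspect
# prints: [2, 3, 1]
```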

Field Weights

Sphinx has the ability to weight fields with differing levels of importance. You can set this using the :field_weights option in your searches:

Article.search "pancakes", :field_weights => {
  :subject => 10,
  :tags    => 6,
  :content => 3
}

You don’t need to specify all fields: any field without a given value keeps the default weight of 1.

If you’d like the same custom weightings to apply to all searches, you can set the values in the define_index block:

set_property :field_weights => {
  :subject => 10,
  :tags    => 6,
  :content => 3
}

Search Results Information

If you’re building your own pagination output, then you can find out the statistics of your search using the following accessors:

@articles = Article.search 'pancakes'
# Number of matches in Sphinx
@articles.total_entries
# Number of pages available
@articles.total_pages
# Current page index
@articles.current_page
# Number of results per page
@articles.per_page
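If you’re rolling your own links, these values relate through simple arithmetic. A hypothetical sketch (pagination_info and its :previous_page and :next_page keys are illustrative names, not Thinking Sphinx methods):

```ruby
# Hypothetical helper deriving page links from the search statistics.
def pagination_info(total_entries, per_page, current_page)
  total_pages = (total_entries + per_page - 1) / per_page # ceiling division
  {
    :total_pages   => total_pages,
    :previous_page => current_page > 1 ? current_page - 1 : nil,
    :next_page     => current_page < total_pages ? current_page + 1 : nil
  }
end

info = pagination_info(45, 20, 3)
puts info[:total_pages]        # prints: 3
puts info[:next_page].inspect  # prints: nil
```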

Grouping / Clustering

Sphinx allows you to group search records that share a common attribute, which can be useful when you want to show aggregated collections. For example, if your posts each belong to a category and have a category_id, you could group your results by category_id and show all the categories matched by your search, as well as the posts themselves. You can read more about it in the official Sphinx documentation.

For grouping to work, you need to pass in the :group_by parameter and a :group_function parameter.

Searching posts, for example:

Post.search 'syrup',
  :group_by       => 'category_id',
  :group_function => :attr

By default, this will return your Post objects, but only one per category_id. If you want to sort the groups by how many posts each category contains, pass in :group_clause:

Post.search 'syrup',
  :group_by       => 'category_id',
  :group_function => :attr,
  :group_clause   => "@count desc"
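What grouping returns can be pictured with Ruby’s own group_by: one representative record per group, plus the group value and the number of matches in it, here ordered like "@count desc". This is a made-up illustration; in Sphinx the representative is chosen by the search’s sort order, whereas the sketch just takes the first record:

```ruby
# Hypothetical illustration of grouping matches by an attribute.
def group_with_counts(records, key)
  records.group_by { |r| r[key] }.map do |value, members|
    { :record => members.first, :group => value, :count => members.size }
  end
end

posts = [
  { :id => 1, :category_id => 7 }, { :id => 2, :category_id => 8 },
  { :id => 3, :category_id => 7 }, { :id => 4, :category_id => 7 },
  { :id => 5, :category_id => 8 }, { :id => 6, :category_id => 9 }
]

by_count = group_with_counts(posts, :category_id).sort_by { |g| -g[:count] }
puts by_count.map { |g| [g[:group], g[:count]] }.inspect
# prints: [[7, 3], [8, 2], [9, 1]]
```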

You can also group results by date. Given you have a date attribute in your index:

class Post < ActiveRecord::Base
  define_index do
    # ...
    has :created_at
  end
end

Then you can group search results by that date field:

Post.search 'treacle',
  :group_by       => 'created_at',
  :group_function => :day

You can use the following date types:

  • :day
  • :week
  • :month
  • :year
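For reference, Sphinx turns the timestamp into an integer group value. Assuming the formats described in Sphinx’s documentation (YYYYMMDD for :day, YYYYMM for :month, YYYY for :year; :week uses a YYYYNNN format not reproduced here), a hypothetical helper computing them looks like this:

```ruby
# Hypothetical helper: the integer group value for a timestamp.
def date_group_value(time, function)
  case function
  when :day   then time.strftime('%Y%m%d').to_i
  when :month then time.strftime('%Y%m').to_i
  when :year  then time.year
  else raise ArgumentError, "unsupported group function: #{function}"
  end
end

created_at = Time.utc(2009, 3, 14, 15, 9)
puts date_group_value(created_at, :day)    # prints: 20090314
puts date_group_value(created_at, :month)  # prints: 200903
puts date_group_value(created_at, :year)   # prints: 2009
```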

Once you have the grouped results, you can iterate over each result along with its group value, the number of objects that matched that group value, or both, using the following methods respectively:

posts.each_with_groupby           { |post, group| }
posts.each_with_count             { |post, count| }
posts.each_with_groupby_and_count { |post, group, count| }

Searching for Object Ids

If you would like just the primary key values returned, instead of instances of ActiveRecord objects, you can use all the same search options in a call to search_for_ids instead.

Article.search_for_ids 'pancakes'
ThinkingSphinx.search_for_ids 'pancakes'

Search Counts

If you just want the number of matches, instead of the matched objects themselves, then you can use the search_count method (which accepts all the same arguments as a normal search call). If you’re searching globally, use ThinkingSphinx.count, which is also available as ThinkingSphinx.search_count.

Article.search_count 'pancakes'
ThinkingSphinx.count 'pancakes'
ThinkingSphinx.search_count 'pancakes'

Avoiding Nil Results

Thinking Sphinx tries its hardest to make sure Sphinx knows when records are deleted, but sometimes stale objects slip through the gaps. To get around this, Thinking Sphinx has the option of retrying searches.

To enable this, you can set :retry_stale to true, and Thinking Sphinx will make up to three tries at retrieving a full result set that has no nil values. If you want to change the number of tries, set :retry_stale to an integer.

Obviously, this can be quite an expensive call (as it instantiates objects on each try), but it provides a better end result in some situations.

Article.search 'pancakes', :retry_stale => true
Article.search 'pancakes', :retry_stale => 1
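The retry idea can be sketched in plain Ruby. This is a hypothetical model, not Thinking Sphinx’s implementation: fetch_ids stands in for the Sphinx query (taking the ids to exclude), load_record for the database lookup, and ids whose records come back nil are excluded on the next try. With :retry_stale => true, the equivalent would be three tries.

```ruby
# Hypothetical model of retrying a search until no stale (nil) results remain.
def search_without_stale(tries, fetch_ids, load_record)
  stale_ids = []
  results = []
  tries.times do
    ids = fetch_ids.call(stale_ids)
    results = ids.map { |id| load_record.call(id) }
    # ids that Sphinx returned but the database no longer has
    missing = ids.select.with_index { |_id, i| results[i].nil? }
    return results if missing.empty?
    stale_ids.concat(missing)
  end
  results # may still contain nils if every try found stale ids
end

sphinx_ids = [1, 2, 3, 4]                     # Sphinx still lists record 2
database   = { 1 => 'a', 3 => 'c', 4 => 'd' } # but it has been deleted
fetch = lambda { |excluded| (sphinx_ids - excluded).first(3) }
find  = lambda { |id| database[id] }

puts search_without_stale(3, fetch, find).inspect
# prints: ["a", "c", "d"]
```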

Automatic Wildcards

If you’d like your search keywords to be wildcards for every search, you can use the :star option, which automatically prepends and appends wildcard stars to each word.

Article.search 'pancakes waffles', :star => true
# => becomes '*pancakes* *waffles*'
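The transformation itself is simple enough to sketch. star_keywords is a hypothetical helper mirroring the example above, not Thinking Sphinx’s own code, which may treat punctuation and query operators differently:

```ruby
# Hypothetical helper: wrap every word in wildcard stars.
def star_keywords(query)
  query.gsub(/\w+/) { |word| "*#{word}*" }
end

puts star_keywords('pancakes waffles')
# prints: *pancakes* *waffles*
```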

Errors

Sometimes Sphinx returns no results because there was a problem with the query itself. When this happens, Sphinx includes the error message in the results.

You can access errors with error and test for errors with error?.

If an error is encountered, ThinkingSphinx will log it and then raise a ThinkingSphinx::SphinxError exception. You can tell ThinkingSphinx to ignore errors (though it will still log them) by passing in :ignore_errors => true or setting the property in your index with set_property :ignore_errors => true.

For example:

r = Article.search '@doesntexist foo', :match_mode => :extended,
                                       :ignore_errors => true
r.error? # => true

Sphinx also issues warnings that you can test for with warning? and inspect with warning. No exception is raised on warnings.

Advanced Options

Thinking Sphinx also accepts the following advanced Sphinx arguments:

Additionally, Thinking Sphinx accepts :comment, which sets the search’s comment (printed in the query log), and :sql_order, which is passed through to the SQL query that instantiates the ActiveRecord objects. The latter might be useful if Sphinx’s data isn’t quite accurate for sorting (as can be the case with ordinal attributes).

One other option is :populate, which avoids lazily loading search results and makes sure Thinking Sphinx processes the search query immediately:

Article.search 'pancakes', :populate => true

This is particularly useful to ensure exceptions are raised where you expect them to.
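To see why :populate affects where exceptions appear, here’s a hypothetical model of lazy loading (LazyResults is made up for illustration, not a Thinking Sphinx class): the query runs on first access unless it’s forced to run up front.

```ruby
# Hypothetical model of lazily loaded search results.
class LazyResults
  def initialize(populate = false, &query)
    @query = query
    @results = nil
    populate! if populate # run the query now instead of on first access
  end

  def populate!
    @results ||= @query.call
  end

  def each(&block)
    populate!.each(&block)
  end
end

failing = lambda { raise 'query error' }

lazy = LazyResults.new(&failing) # no error yet
begin
  lazy.each { |r| puts r }       # error surfaces here, e.g. in a view
rescue RuntimeError => e
  puts "raised on access: #{e.message}"
end

begin
  LazyResults.new(true, &failing) # error surfaces immediately
rescue RuntimeError => e
  puts "raised on search: #{e.message}"
end
```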