Searching
- Basic Searching
- Field Conditions
- Attribute Filters
- Application-Wide Search
- Pagination
- Match Modes
- Ranking Modes
- Sorting
- Field Weights
- Search Results Information
- Grouping/Clustering
- Searching for Object Ids
- Search Counts
- Avoiding Nil Results
- Automatic Wildcards
- Errors
- Advanced Options
Basic Searching
Once you’ve got an index set up on your model, and have the Sphinx daemon running, you can start searching with the model’s search method:
Article.search 'pancakes'
Please note that Sphinx paginates search results, and the default page size is 20. You can find more information further down in the pagination section.
Field Conditions
To focus a query on a specific field, you can use the :conditions option, much like in ActiveRecord:
Article.search :conditions => {:subject => 'pancakes'}
You can combine both field-specific queries and generic queries too:
Article.search 'pancakes', :conditions => {:subject => 'tasty'}
Please keep in mind that Sphinx does not support SQL comparison operators; it has its own query language. The :conditions option must be a hash, with each key a field and each value a string.
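Field conditions end up in Sphinx’s extended query syntax, where @field limits the keywords that follow it to that field. The helper below is a hypothetical plain-Ruby sketch of that idea, not Thinking Sphinx’s own code. It assumes simple keyword values and deliberately skips escaping Sphinx’s special characters.

```ruby
# Hypothetical helper (not part of Thinking Sphinx): combine a generic
# query with a :conditions-style hash using Sphinx's extended syntax,
# where "@field" scopes the following keywords to that field.
# Escaping of Sphinx's special characters is deliberately omitted.
def extended_query(query, conditions = {})
  parts = [query]
  conditions.each { |field, value| parts << "@#{field} #{value}" }
  parts.compact.reject(&:empty?).join(" ")
end

extended_query("pancakes", { subject: "tasty" }) # => "pancakes @subject tasty"
extended_query(nil, { subject: "pancakes" })     # => "@subject pancakes"
```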
Attribute Filters
Filters on attributes can be defined using a similar syntax, but with the :with option:
Article.search 'pancakes', :with => {:author_id => @pat.id}
Unlike field conditions, filters also accept arrays and ranges:
Article.search 'pancakes', :with => {
:created_at => 1.week.ago..Time.now,
:author_id => @fab_four.collect { |author| author.id }
}
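To make the filter semantics concrete, here is a plain-Ruby model of how a :with hash is applied to a single document: a range must cover the value, an array must include it, and a scalar must equal it, with all keys ANDed together. The helper names are hypothetical; Sphinx applies filters itself on the server.

```ruby
# Hypothetical helpers modelling :with filter semantics in plain Ruby.
# This is NOT Thinking Sphinx code; Sphinx evaluates filters server-side.
def filter_matches?(filter, value)
  case filter
  when Range then filter.cover?(value)   # e.g. 1.week.ago..Time.now
  when Array then filter.include?(value) # e.g. a list of author ids
  else filter == value                   # a single value
  end
end

# Every key in the :with hash must match (filters are ANDed together).
def passes_filters?(with, attributes)
  with.all? { |attr, filter| filter_matches?(filter, attributes[attr]) }
end

doc = { author_id: 3, created_at: 5 }
passes_filters?({ author_id: [1, 2, 3], created_at: 0..10 }, doc) # => true
passes_filters?({ author_id: 4 }, doc)                            # => false
```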
And of course, you can mix and match global terms, field-specific terms, and filters:
Article.search 'pancakes',
:conditions => {:subject => 'tasty'},
:with => {:created_at => 1.week.ago..Time.now}
If you wish to exclude specific attribute values, you can specify them using :without:
Article.search 'pancakes',
:without => {:user_id => current_user.id}
On a multi-value attribute, :with matches documents that have any one of the given values. To require all of the values, use :with_all instead:
Article.search 'pancakes',
:with_all => {:tag_ids => @tags.collect(&:id)}
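The difference between the two options on a multi-value attribute such as tag_ids can be modelled with plain Ruby array operations. This is a sketch of the semantics only; the method names are hypothetical and the tag ids are sample values.

```ruby
# Plain-Ruby model (not Thinking Sphinx's implementation) of the two
# filter styles applied to a multi-value attribute such as tag_ids.
def with_match?(wanted, mva)      # :with => at least one wanted value present
  (wanted & mva).any?
end

def with_all_match?(wanted, mva)  # :with_all => every wanted value present
  (wanted - mva).empty?
end

article_tag_ids = [1, 2, 5]
with_match?([2, 9], article_tag_ids)     # => true  (2 is present)
with_all_match?([2, 9], article_tag_ids) # => false (9 is missing)
with_all_match?([1, 5], article_tag_ids) # => true
```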
Application-Wide Search
You can use all the same syntax to search across all indexed models in your application:
ThinkingSphinx.search 'pancakes'
If you’re using a version of Thinking Sphinx prior to 1.2, you will need to use a slightly deeper namespaced method: ThinkingSphinx::Search.search.
This search will return all objects that match, no matter what model they are from, ordered by relevance (unless you specify a custom order clause, of course). References to attributes and fields may not work reliably if they don’t exist in every indexed model.
If you want to limit global searches to a few specific models, you can do so with the :classes option:
ThinkingSphinx.search 'pancakes', :classes => [Article, Comment]
Pagination
Sphinx paginates search results by default. Indeed, there’s no way to turn it off (but you can request really big pages should you wish). The parameters for pagination in Thinking Sphinx are exactly the same as Will Paginate: :page and :per_page.
Article.search 'pancakes', :page => params[:page], :per_page => 42
Search results can also be passed straight to Will Paginate’s view helper, just to keep things nice and easy:
# in the controller:
@articles = Article.search 'pancakes'
# in the view:
will_paginate @articles
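The arithmetic behind :page and :per_page is the usual offset calculation. The sketch below is a hypothetical plain-Ruby illustration, assuming the default page size of 20 mentioned earlier and that params[:page] may arrive as nil or a string; Thinking Sphinx does this for you internally.

```ruby
# Hypothetical helpers illustrating pagination arithmetic; not Thinking
# Sphinx's code. The default of 20 matches Sphinx's default page size.
def page_offset(page, per_page = 20)
  page = [page.to_i, 1].max   # params[:page] may be nil or a string
  (page - 1) * per_page
end

def total_pages(total_entries, per_page = 20)
  (total_entries / per_page.to_f).ceil
end

page_offset("3", 42) # => 84
page_offset(nil)     # => 0
total_pages(85, 42)  # => 3
```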
Match Modes
Sphinx has several different ways of matching the given search keywords, which can be set on a per-query basis using the :match_mode option.
Article.search 'pancakes waffles', :match_mode => :any
Most are pretty self-explanatory, but here’s a quick guide. If you need more detail, check out Sphinx’s own documentation.
:all
This is the default for Thinking Sphinx, and requires a document to have every given word somewhere in its fields.
:any
This will return documents that include at least one of the keywords in their fields.
:phrase
This matches all given words together in one place, in the same order. It’s just the same as wrapping a Google search in quotes.
:boolean
This allows you to use boolean logic with your keywords. & is AND, | is OR, and both - and ! function as NOT. You can group logic within parentheses.
Article.search 'pancakes & waffles', :match_mode => :boolean
Article.search 'pancakes | waffles', :match_mode => :boolean
Article.search 'pancakes !waffles', :match_mode => :boolean
Article.search '( pancakes topping ) | waffles',
:match_mode => :boolean
Keep in mind that ANDs are used implicitly if no logic is given, and you can’t query with just a NOT – Sphinx needs at least one keyword to match.
:extended
Extended combines boolean searching with phrase searching, field-specific searching, field position limits, proximity searching, quorum matching, strict order operator, exact form modifiers (since 0.9.9rc1) and field-start and field-end modifiers (since 0.9.9rc2).
I highly recommend having a look at Sphinx’s syntax examples. Also keep in mind that if you use the :conditions option, this match mode will be used automatically.
:extended2
This is much like the normal extended mode, but with some quirks that Sphinx’s documentation doesn’t cover. Generally, if you don’t know you want to use it, don’t worry about using it.
:fullscan
This match mode ignores all keywords, and just pays attention to filters, sorting and grouping.
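As a rough mental model of the three simplest modes, the sketch below checks keywords against a document that has already been split into words. It is a hypothetical plain-Ruby illustration that ignores stemming, fields, and ranking entirely, so it is not how Sphinx actually matches.

```ruby
# Rough plain-Ruby model of :all, :any and :phrase over a tokenised
# document. Ignores stemming, fields and ranking; not Sphinx's algorithm.
def matches?(mode, keywords, doc_words)
  case mode
  when :all    then keywords.all? { |w| doc_words.include?(w) }
  when :any    then keywords.any? { |w| doc_words.include?(w) }
  when :phrase then doc_words.each_cons(keywords.size).include?(keywords)
  else raise ArgumentError, "unmodelled match mode: #{mode}"
  end
end

doc = %w[waffles with maple syrup and pancakes]
matches?(:all, %w[pancakes waffles], doc) # => true
matches?(:any, %w[pancakes crepes], doc)  # => true
matches?(:phrase, %w[maple syrup], doc)   # => true
matches?(:phrase, %w[syrup maple], doc)   # => false
```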
Ranking Modes
Sphinx also has a few different ranking modes (again, the Sphinx documentation is the best source of information on these). They can be set using the :rank_mode option:
Article.search "pancakes", :rank_mode => :bm25
:proximity_bm25
The default ranking mode, which combines both phrase proximity and BM25 ranking (see below).
:bm25
A statistical ranking mode, similar to most other full-text search engines.
:none
No ranking – every result has a weight of 1.
:wordcount
(since 0.9.9rc1)
Ranks results purely on the number of times the keywords are found in a document. Field weights are taken into account.
:proximity
(since 0.9.9rc1)
Ranks documents by raw proximity value.
:match_any
(since 0.9.9rc1)
Returns rankings calculated in the same way as a match mode of :any.
:fieldmask
(since 0.9.9rc2)
Returns rankings as a 32-bit mask with the N-th bit corresponding to the N-th field, numbering from 0. A bit is set only when at least one of the keywords matches the respective field. If you want to know which fields match your search for each document, this is the only way.
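Once you have a :fieldmask weight for a result, turning it back into field names is a matter of testing bits. The helper below is a hypothetical sketch; it assumes you know the fields in the order they are numbered in the index, and it does not show how to read the weight from a result.

```ruby
# Hypothetical helper: decode a :fieldmask rank value into field names.
# Assumes field_names lists the fields in index order (bit 0 first).
def matched_fields(mask, field_names)
  field_names.each_with_index
             .select { |_name, i| mask[i] == 1 }  # Integer#[] reads bit i
             .map(&:first)
end

matched_fields(0b101, [:subject, :tags, :content]) # => [:subject, :content]
```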
Sorting
By default, Sphinx sorts by how relevant it believes the documents to be to the given search keywords. However, you can also sort by attributes (and fields flagged as sortable), as well as time segments or custom mathematical expressions.
Attribute sorting defaults to ascending order:
Article.search "pancakes", :order => :created_at
If you want to switch the direction to descending, use the :sort_mode option:
Article.search "pancakes", :order => :created_at,
:sort_mode => :desc
If you want to use multiple attributes, or Sphinx’s ranking scores, then you’ll need to use the :extended sort mode. This will be set by default if you pass in a string to :order, but you can set it manually if you wish. This syntax is pretty much the same as SQL, and directions (ASC and DESC) are required for each attribute.
Article.search "pancakes", :sort_mode => :extended,
:order => "created_at DESC, @relevance DESC"
As well as using any attributes and sortable fields here, you can also use Sphinx’s internal attributes (prefixed with @). These are:
- @id (The match’s document id)
- @weight, @rank or @relevance (The match’s ranking weight)
- @random (Returns results in random order)
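Because every attribute in an :extended order string needs an explicit direction, it can help to build the string from a hash and reject anything else. This is a hypothetical helper, not part of Thinking Sphinx; it only produces the string you would pass to :order.

```ruby
# Hypothetical helper: build an :extended :order string from a hash,
# enforcing the explicit ASC/DESC direction Sphinx requires per attribute.
def extended_order(sorts)
  sorts.map do |attr, dir|
    dir = dir.to_s.upcase
    unless %w[ASC DESC].include?(dir)
      raise ArgumentError, "bad direction #{dir.inspect} for #{attr}"
    end
    "#{attr} #{dir}"
  end.join(", ")
end

extended_order({ created_at: :desc, "@relevance" => :desc })
# => "created_at DESC, @relevance DESC"
```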
Expression Sorting
If you’re hoping to make your ranking algorithm a bit more complex, then you can break out the arithmetic and use Sphinx’s expression sort mode:
Article.search "pancakes", :sort_mode => :expr,
:order => "@weight * views * karma"
Reading the Sphinx documentation is required if you really want to understand the power and options around this sorting method.
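To get a feel for what an expression like "@weight * views * karma" does, here is a plain-Ruby model: each match gets a score from the expression, and the highest score comes first (descending order is assumed here). The data is made-up sample input, and :weight stands in for Sphinx’s internal @weight.

```ruby
# Plain-Ruby model of expression sorting with sample data. :weight stands
# in for Sphinx's @weight; views and karma are ordinary attributes.
matches = [
  { id: 1, weight: 2, views: 10, karma: 3 },  # score 60
  { id: 2, weight: 1, views: 50, karma: 1 },  # score 50
  { id: 3, weight: 4, views: 10, karma: 2 }   # score 80
]

# Highest expression value first.
ranked = matches.sort_by { |m| -(m[:weight] * m[:views] * m[:karma]) }
ranked.map { |m| m[:id] } # => [3, 1, 2]
```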
Time Segment Sorting
Sphinx also has a curious sort mode, :time_segments. This breaks down a given timestamp/datetime attribute into the following segments, and then sorts the matches within each segment by their ranking.
- Last Hour
- Last Day
- Last Week
- Last Month
- Last 3 Months
- Everything else
You can’t change the segment points – these are fixed by Sphinx. To use this sort method, you need to specify it as well as the attribute to use as a reference point:
Article.search "pancakes", :sort_mode => :time_segments,
:sort_by => :updated_at
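If you want to label results in your views with the same buckets, a helper along these lines is one option. It is a hypothetical sketch: it approximates a month as 30 days and three months as 90 days, and Sphinx’s exact boundaries may differ.

```ruby
# Hypothetical helper mirroring the segment list above, e.g. for labelling
# results in a view. Month lengths are approximations, not Sphinx's values.
DAY_SECONDS = 24 * 60 * 60

# [label, maximum age in seconds], checked in order.
TIME_SEGMENTS = [
  ["Last Hour",     60 * 60],
  ["Last Day",      DAY_SECONDS],
  ["Last Week",     7 * DAY_SECONDS],
  ["Last Month",    30 * DAY_SECONDS],
  ["Last 3 Months", 90 * DAY_SECONDS]
].freeze

def time_segment(time, now = Time.now)
  age = now - time # seconds, as a Float
  label, _max_age = TIME_SEGMENTS.find { |_label, max_age| age <= max_age }
  label || "Everything else"
end

now = Time.at(1_000_000_000)
time_segment(now - 30 * 60, now)           # => "Last Hour"
time_segment(now - 3 * DAY_SECONDS, now)   # => "Last Week"
time_segment(now - 400 * DAY_SECONDS, now) # => "Everything else"
```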
Field Weights
Sphinx has the ability to weight fields with differing levels of importance. You can set this using the :field_weights option in your searches:
Article.search "pancakes", :field_weights => {
:subject => 10,
:tags => 6,
:content => 3
}
You don’t need to specify all fields; any field without a given value keeps the default weight of 1.
If you’d like the same custom weightings to apply to all searches, you can set the values in the define_index block:
set_property :field_weights => {
:subject => 10,
:tags => 6,
:content => 3
}
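The "default of 1" rule means a partial weights hash is effectively expanded to cover every field. The helper below is a hypothetical illustration of that expansion; the :author field is an assumed extra field, not one from the examples above.

```ruby
# Hypothetical helper: expand a partial :field_weights hash so every field
# has a weight, filling the gaps with Sphinx's default weight of 1.
def effective_weights(fields, overrides)
  fields.to_h { |f| [f, overrides.fetch(f, 1)] }
end

effective_weights([:subject, :tags, :content, :author],
                  { subject: 10, tags: 6, content: 3 })
# => { subject: 10, tags: 6, content: 3, author: 1 }
```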
Search Results Information
If you’re building your own pagination output, then you can find out the statistics of your search using the following accessors:
@articles = Article.search 'pancakes'
# Number of matches in Sphinx
@articles.total_entries
# Number of pages available
@articles.total_pages
# Current page index
@articles.current_page
# Number of results per page
@articles.per_page
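For example, a "Showing X-Y of Z" line can be built from those four accessors. So that the sketch runs on its own, it uses a hypothetical Struct standing in for a real result collection; in an application you would pass @articles instead.

```ruby
# Hypothetical stand-in for a search result collection; only the four
# accessors described above are modelled.
ResultsStub = Struct.new(:total_entries, :total_pages, :current_page, :per_page)

def page_summary(results)
  return "No matches" if results.total_entries.zero?
  first = (results.current_page - 1) * results.per_page + 1
  last  = [results.current_page * results.per_page, results.total_entries].min
  "Showing #{first}-#{last} of #{results.total_entries}"
end

page_summary(ResultsStub.new(45, 3, 3, 20)) # => "Showing 41-45 of 45"
```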
Grouping / Clustering
Sphinx allows you to group search records that share a common attribute, which can be useful when you want to show aggregated collections. For example, if you have a set of posts that each belong to a category and have a category_id, you could group your results by category_id and show a set of all the categories matched by your search, as well as all the posts. You can read more about it in the official Sphinx documentation.
For grouping to work, you need to pass in the :group_by parameter and a :group_function parameter.
Searching posts, for example:
Post.search 'syrup',
:group_by => 'category_id',
:group_function => :attr
By default, this will return your Post objects, but only one per category_id. If you want to sort by how many posts each category contains, you can pass in :group_clause:
Post.search 'syrup',
:group_by => 'category_id',
:group_function => :attr,
:group_clause => "@count desc"
You can also group results by date. Given you have a date attribute in your index:
class Post < ActiveRecord::Base
  define_index do
    # ...
    has :created_at
  end
end
Then you can group search results by that date field:
Post.search 'treacle',
:group_by => 'created_at',
:group_function => :day
You can use the following date types:
- :day
- :week
- :month
- :year
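As far as we know, Sphinx computes the group value for these date functions as a number: YYYYMMDD for a day, YYYYMM for a month, and YYYY for a year. The plain-Ruby approximation below is a sketch under that assumption. It treats times as UTC and leaves out :week, whose format is best checked in the Sphinx documentation.

```ruby
# Plain-Ruby approximation of Sphinx's date group values (assumed formats:
# YYYYMMDD, YYYYMM, YYYY). Times are treated as UTC; :week is not modelled.
def date_group_key(time, function)
  t = time.getutc # non-destructive, unlike Time#utc
  case function
  when :day   then t.strftime("%Y%m%d").to_i
  when :month then t.strftime("%Y%m").to_i
  when :year  then t.year
  else raise ArgumentError, "unmodelled group function: #{function}"
  end
end

t = Time.utc(2009, 3, 14, 15, 9)
date_group_key(t, :day)   # => 20090314
date_group_key(t, :month) # => 200903
date_group_key(t, :year)  # => 2009
```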
Once you have the grouped results, you can iterate over each result along with its group value, the number of objects that matched that group value, or both, using the following methods respectively:
posts.each_with_groupby { |post, group| }
posts.each_with_count { |post, count| }
posts.each_with_groupby_and_count { |post, group, count| }
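To see what the group value and count represent, here is a plain-Ruby model of grouping by category_id with "@count desc": one post per category, the shared category_id, and the number of matches in that group, largest group first. It uses a hypothetical Struct and sample data. Here the first match in each group is kept, whereas Sphinx picks the representative according to its own sort order.

```ruby
# Plain-Ruby model of :group_function => :attr with "@count desc",
# using a hypothetical Struct and sample data (not Thinking Sphinx).
FakePost = Struct.new(:id, :category_id)
matches = [FakePost.new(1, 7), FakePost.new(2, 3), FakePost.new(3, 7),
           FakePost.new(4, 7), FakePost.new(5, 3), FakePost.new(6, 9)]

groups = matches.group_by(&:category_id)
                .map { |category_id, posts| [posts.first, category_id, posts.size] }
                .sort_by { |_post, _group, count| -count }

# Same shape as each_with_groupby_and_count { |post, group, count| }.
groups.each do |post, group, count|
  puts "post #{post.id}: category #{group} (#{count} matches)"
end
```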
Searching for Object Ids
If you would like just the primary key values returned, instead of instances of ActiveRecord objects, you can use all the same search options in a call to search_for_ids instead.
Article.search_for_ids 'pancakes'
ThinkingSphinx.search_for_ids 'pancakes'
Search Counts
If you just want the number of matches, instead of the matched objects themselves, then you can use the search_count method (which accepts all the same arguments as a normal search call). If you’re searching globally, use ThinkingSphinx.count; ThinkingSphinx.search_count is an alias for it.
Article.search_count 'pancakes'
ThinkingSphinx.count 'pancakes'
ThinkingSphinx.search_count 'pancakes'
Avoiding Nil Results
Thinking Sphinx tries its hardest to make sure Sphinx knows when records are deleted, but sometimes stale objects slip through the gaps. To get around this, Thinking Sphinx has the option of retrying searches.
To enable this, you can set :retry_stale to true, and Thinking Sphinx will make up to three tries at retrieving a full result set that has no nil values. If you want to change the number of tries, set :retry_stale to an integer.
This can be quite an expensive call, since it instantiates objects on each try, but it gives a better end result in some situations.
Article.search 'pancakes', :retry_stale => true
Article.search 'pancakes', :retry_stale => 1
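The idea behind :retry_stale can be sketched as a bounded retry loop: run the search, and if any result is nil (a stale Sphinx match whose record was deleted), try again. This is a hypothetical illustration with a block standing in for the real search; Thinking Sphinx’s actual implementation may handle stale ids differently.

```ruby
# Hypothetical sketch of the retry idea behind :retry_stale; the block
# stands in for a real search and receives the attempt number.
def search_with_retries(tries = 3)
  results = nil
  tries.times do |attempt|
    results = yield(attempt)
    return results unless results.include?(nil)
  end
  results # may still contain nils after the final attempt
end

attempts = [[1, nil, 3], [1, 3, 4]]
search_with_retries { |i| attempts[i] } # => [1, 3, 4]
```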
Automatic Wildcards
If you’d like your search keywords to be wildcards for every search, you can use the :star option, which automatically prepends and appends wildcard stars to each word.
Article.search 'pancakes waffles', :star => true
# => becomes '*pancakes* *waffles*'
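For simple keyword strings, the transformation shown above amounts to wrapping each whitespace-separated word in stars. The helper below is a hypothetical sketch of just that case; it does not try to handle field markers or query operators.

```ruby
# Hypothetical helper imitating :star => true for plain keywords only;
# field markers and query operators are not handled.
def starify(query)
  query.split.map { |word| "*#{word}*" }.join(" ")
end

starify("pancakes waffles") # => "*pancakes* *waffles*"
```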
Errors
Sometimes Sphinx returns no results because there was a problem with the query itself. When this happens, Sphinx includes the error message in the results.
You can access errors with error and test for errors with error?.
If an error is encountered, Thinking Sphinx will log it and then raise a ThinkingSphinx::SphinxError exception. You can tell Thinking Sphinx to ignore errors (though it will still log them) by passing in :ignore_errors => true or setting the property in your index with set_property :ignore_errors => true.
For example:
r = Article.search '@doesntexist foo', :match_mode => :extended,
:ignore_errors => true
r.error? # => true
Sphinx also issues warnings, which you can test for with warning? and inspect with warning. No exception is raised on warnings.
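If you’d rather handle query errors at the call site than ignore them globally, one option is a small rescue wrapper. This is a hypothetical sketch. So that it runs without Thinking Sphinx loaded, it defines a stand-in ThinkingSphinx::SphinxError only when the real class is absent, and that stand-in’s superclass is an assumption.

```ruby
# Stand-in for the exception class named above, defined only when Thinking
# Sphinx isn't loaded; its StandardError superclass is an assumption.
unless defined?(ThinkingSphinx::SphinxError)
  module ThinkingSphinx
    class SphinxError < StandardError; end
  end
end

# Hypothetical wrapper: run a search block and turn a Sphinx query error
# into an empty result plus a message, instead of letting it propagate.
def safe_search
  [yield, nil]
rescue ThinkingSphinx::SphinxError => e
  [[], "Search failed: #{e.message}"]
end

results, message = safe_search { raise ThinkingSphinx::SphinxError, "bad query" }
# results => [], message => "Search failed: bad query"
```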
Advanced Options
Thinking Sphinx also accepts the following advanced Sphinx arguments:
Additionally, Thinking Sphinx accepts :comment, as the search’s comment (which is printed in the query log), and :sql_order, which is passed through to the SQL query that instantiates the ActiveRecord objects. The latter might be useful if Sphinx’s data isn’t quite accurate for sorting (as can be the case with ordinal attributes).
One other option is :populate, which avoids lazily loading search results and makes sure Thinking Sphinx processes the search query immediately:
Article.search 'pancakes', :populate => true
This is particularly useful to ensure exceptions are raised where you expect them to.