Full text search for document attachments with Rails & ElasticSearch
I've started working on a project that requires full-text search on uploaded documents using ElasticSearch. Luckily, ElasticSearch has the Mapper Attachments Type. It is a plugin and is easy to install. There are a few important things to note:
- ES accepts an attachment as a Base64-encoded string
- By default, only the first 100,000 characters are extracted from an attachment. You need to change the configuration if you need more
- It handles many file types, not just documents. See the plugin documentation for the list
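The first two points can be sketched in plain Ruby. This is a minimal sketch, not the plugin's official client code: `attachment_payload` is a hypothetical helper name, and the `_content` and `_indexed_chars` keys are assumed from the object form of the attachment field described in the plugin documentation, so check them against the plugin version you install.

```ruby
require "base64"

# Hypothetical helper: builds the value sent to ES for an attachment-type field.
# Assumption: the plugin accepts an object with "_content" (Base64 data) and an
# optional "_indexed_chars" override (-1 meaning "extract everything").
def attachment_payload(path, indexed_chars: nil)
  # Read raw bytes so binary formats (PDF, DOCX, ...) are not mangled.
  content = Base64.strict_encode64(File.binread(path))
  payload = { "_content" => content }
  payload["_indexed_chars"] = indexed_chars if indexed_chars
  payload
end
```

Calling `attachment_payload("report.pdf", indexed_chars: -1)` would return a hash ready to be serialized as the field value; without `indexed_chars`, the default extraction limit applies.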
Several gems make it easy to work with ElasticSearch, such as Tire, Chewy, ElasticSearch Rails and Searchkick. Tire was retired a long time ago, but I believe any of the other three will work well. I chose Chewy because it has a dedicated wiki page with an example configuration for attachment full-text search.
CarrierWave handles the upload process.
Following is sample code for a Product model with two fields, name and attachment:
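class ProductsIndex < Chewy::Index
  define_type Product do
    field :name
    # The attachment type expects the file content as a Base64 string.
    field :attachment, type: "attachment", value: ->(product) {
      if product.attachment.present?
        # Read in binary mode; File.binread also closes the file handle.
        Base64.encode64(File.binread(product.attachment.path))
      else
        ""
      end
    }
  end
end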
A shortcut for quick search:
class << self
  def search(keyword)
    fields = %w[name attachment]
    ProductsIndex.query(multi_match: { query: keyword, fields: fields })
  end
end
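The query body itself is just a hash, so it can be built and inspected without a running cluster. The following is a sketch only: `attachment_search_query` is a hypothetical helper, and falling back to `match_all` for a blank keyword is my own assumption about what a quick search should do, not something the demo does.

```ruby
# Hypothetical helper: builds the multi_match body used by the search shortcut.
def attachment_search_query(keyword, fields: %w[name attachment])
  term = keyword.to_s.strip
  # Assumption: a blank keyword should match every document instead of none.
  return { match_all: {} } if term.empty?

  { multi_match: { query: term, fields: fields } }
end
```

The result can be passed straight to Chewy, for example `ProductsIndex.query(attachment_search_query("invoice"))`.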
The demo is available at https://github.com/nguyenducgiang/chewy-demo
But it's not the only solution
What if we just extract the text content from the document ourselves before passing it to ES as a normal string? That is possible using a gem like Yomu.
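A rough sketch of that alternative follows. It assumes the `yomu` gem is installed (it wraps Apache Tika and needs Java); `attachment_text`, `YOMU_EXTRACTOR` and the `max_chars` cap are hypothetical names introduced here for illustration. The extractor is injectable so the helper can be exercised without Yomu.

```ruby
# Default extractor. Yomu is loaded lazily, so this file still loads
# on machines where the gem (or Java) is missing.
YOMU_EXTRACTOR = lambda do |path|
  require "yomu"
  Yomu.new(path).text
end

# Hypothetical helper: returns plain text to index as a normal string field.
def attachment_text(path, extractor: YOMU_EXTRACTOR, max_chars: 100_000)
  return "" if path.nil? || path.to_s.empty?

  text = extractor.call(path).to_s
  # Cap the size ourselves, since ES no longer does the extraction.
  text.strip[0, max_chars]
end
```

With this approach the index field would be an ordinary string field, for example `field :attachment, value: ->(product) { attachment_text(product.attachment.path) }`, and the Mapper Attachments plugin would not be needed.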