Introduction
CouchDB
CouchDB is a NoSQL database that stores data as JSON documents. It is well suited to situations where a fixed schema would get in the way and a flexible data model is required. CouchDB also supports continuous master-master replication, which means changes can be replicated in both directions between two databases without setting up a complex system of master and slave databases.
ElasticSearch
ElasticSearch is a full-text search engine that indexes the documents you send it so that their contents can be searched. It works well alongside CouchDB because one of CouchDB's limitations is that every query must either use a known document ID or go through a map/reduce view.
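To make that limitation concrete, here is a minimal sketch of the two ways to read data back out of CouchDB 1.x. It assumes CouchDB is listening on 127.0.0.1:5984 and uses the testdb database and documents created later in this tutorial. The function names are hypothetical, and the ad hoc `_temp_view` endpoint of CouchDB 1.x stands in for a view saved in a design document.

```shell
# Minimal sketch: the two ways to read data back out of CouchDB 1.x.
# Assumes CouchDB on 127.0.0.1:5984; both function names are hypothetical.
COUCH_URL="http://127.0.0.1:5984"

# 1. You know the document's _id: fetch it directly.
#    Usage: couch_get_doc <database> <id>
couch_get_doc() {
  curl -s "$COUCH_URL/$1/$2"
}

# 2. You don't: run a map function over every document. _temp_view builds
#    the view on the fly (slow, meant for development); a real application
#    would save the map function in a design document instead.
#    Usage: couch_names_view <database>
couch_names_view() {
  curl -s -X POST "$COUCH_URL/$1/_temp_view" \
    -H 'Content-Type: application/json' \
    -d '{"map": "function(doc) { if (doc.name) emit(doc.name, null); }"}'
}
```

For example, `couch_get_doc testdb 1` returns document 1, and `couch_names_view testdb` returns one row per document keyed by its name. A view like this can be queried by exact key or by key range, but not for a word in the middle of a name; that is the gap ElasticSearch fills.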
Installing CouchDB
We will install CouchDB from source to get a newer version than the distribution's packages provide. A more thorough tutorial on installing CouchDB from source is available on progressiverobot.com.
Setting up the Environment
Update the package manager:
~~~~
apt-get update
~~~~
Install the tools needed to compile CouchDB:
~~~~
apt-get install -y build-essential
~~~~
Install Erlang, the programming language that CouchDB is written in:
~~~~
apt-get install -y erlang-base erlang-dev erlang-nox erlang-eunit
~~~~
Install the rest of the libraries that CouchDB needs:
~~~~
apt-get install -y libmozjs185-dev libicu-dev libcurl4-gnutls-dev libtool
~~~~
Acquire the Source Files
Go to the directory where the CouchDB source files will reside:
~~~~
cd /usr/local/src
~~~~
Get the source files:
~~~~
curl -O http://apache.mirrors.tds.net/couchdb/source/1.5.0/apache-couchdb-1.5.0.tar.gz
~~~~
Untar the source files:
~~~~
tar xvzf apache-couchdb-1.5.0.tar.gz
~~~~
Go to the new directory:
~~~~
cd apache-couchdb-1.5.0
~~~~
Configure, compile, and install CouchDB:
~~~~
./configure
make && make install
~~~~
Note: This step can take a while. Once it finishes, CouchDB is installed; the next step is to create a dedicated user for it and assign the appropriate permissions.
Finalizing the CouchDB Installation
Create a CouchDB user:
~~~~
adduser --disabled-login --disabled-password --no-create-home couchdb
~~~~
Note: adduser will still prompt for details such as Full Name. You can press Enter at each prompt to accept the default value.
Assign the appropriate permissions to the CouchDB user:
~~~~
chown -R couchdb:couchdb /usr/local/var/log/couchdb /usr/local/var/lib/couchdb /usr/local/var/run/couchdb
~~~~
Set up CouchDB as a service so that it starts automatically instead of having to be started manually:
~~~~
ln -s /usr/local/etc/init.d/couchdb /etc/init.d
update-rc.d couchdb defaults
~~~~
Start CouchDB:
~~~~
service couchdb start
~~~~
Verify that CouchDB is running:
~~~~
curl localhost:5984
~~~~
You should see a response that starts with:
~~~~
{"couchdb":"Welcome"…
~~~~
Installing ElasticSearch
Initial Setup
Install the headless OpenJDK 7 Java runtime, which ElasticSearch requires:
~~~~
apt-get install openjdk-7-jre-headless
~~~~
Download the ElasticSearch 0.90.8 package:
~~~~
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.8.deb
~~~~
Install the package:
~~~~
dpkg -i elasticsearch-0.90.8.deb
~~~~
Before continuing, configure Elasticsearch so that it is not accessible from the public Internet. Elasticsearch has no built-in security, so anyone who can reach its HTTP API can control it. You can restrict it by editing elasticsearch.yml; assuming you installed from the package, open the configuration with this command:
~~~~
sudo vi /etc/elasticsearch/elasticsearch.yml
~~~~
Find the line that sets network.bind_host, uncomment it, and change its value to localhost so that it reads:
~~~~
network.bind_host: localhost
~~~~
Next, add the following line anywhere in the file to disable dynamic scripts:
~~~~
script.disable_dynamic: true
~~~~
Save and exit. Now restart Elasticsearch to put the changes into effect:
~~~~
sudo service elasticsearch restart
~~~~
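If you would rather script the two elasticsearch.yml edits than make them in vi, the following is a minimal sketch. set_es_setting is a hypothetical helper, not part of ElasticSearch; it assumes GNU sed (for in-place editing with -i) and values that contain no "/" characters.

```shell
# Minimal sketch: set a key in elasticsearch.yml without an editor.
# set_es_setting is a hypothetical helper. Assumes GNU sed (-i) and a
# value that contains no "/" characters.
# Usage: set_es_setting <file> <key> <value>
set_es_setting() {
  file="$1" key="$2" value="$3"
  # Escape the dots in the key so grep and sed match them literally.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^[#[:space:]]*$pattern:" "$file"; then
    # Uncomment the existing line (if needed) and replace its value.
    sed -i "s/^[#[:space:]]*$pattern:.*/$key: $value/" "$file"
  else
    # The key is not in the file yet: append it.
    printf '%s: %s\n' "$key" "$value" >> "$file"
  fi
}
```

As root, run `set_es_setting /etc/elasticsearch/elasticsearch.yml network.bind_host localhost` and `set_es_setting /etc/elasticsearch/elasticsearch.yml script.disable_dynamic true`, then restart the service. Rewriting the existing commented line, rather than always appending a new one, keeps each setting defined only once.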
Verify that ElasticSearch is running. If the request fails the first time, wait a few seconds and try again; it can take a little while for ElasticSearch to start:
~~~~
curl http://127.0.0.1:9200
~~~~
You should see a response that starts with:
~~~~
{ "ok" : true, "status" : 200,
~~~~
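Rather than retrying by hand, you can poll until ElasticSearch answers. This is a minimal sketch; wait_for_http is a hypothetical helper that tries the URL once per second, up to a given number of attempts (30 by default).

```shell
# Minimal sketch: poll a URL until it responds.
# wait_for_http is a hypothetical helper.
# Usage: wait_for_http <url> [max_attempts]   (one attempt per second, default 30)
wait_for_http() {
  url="$1"
  tries="${2:-30}"
  while :; do
    # -s: no progress output; -f: treat HTTP error statuses as failures
    if curl -sf "$url" >/dev/null; then
      return 0
    fi
    tries=$((tries - 1))
    [ "$tries" -le 0 ] && return 1
    sleep 1
  done
}
```

For example, `wait_for_http http://127.0.0.1:9200 && curl http://127.0.0.1:9200` waits for the node and then prints its status. The same helper works for CouchDB on port 5984.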
Change Where ElasticSearch Stores Indices
Stop ElasticSearch:
~~~~
/etc/init.d/elasticsearch stop
~~~~
Create the new directory:
~~~~
mkdir /var/data/
mkdir /var/data/elasticsearch
~~~~
Change ownership of the directory to the 'elasticsearch' user:
~~~~
chown elasticsearch /var/data/elasticsearch
~~~~
Change the group:
~~~~
chgrp elasticsearch /var/data/elasticsearch
~~~~
Change the ElasticSearch configuration file to reflect the new data directory
Open /etc/default/elasticsearch, which holds the settings the ElasticSearch init script reads at startup, in nano:
~~~~
nano /etc/default/elasticsearch
~~~~
Find the line containing the following (if it begins with #, remove the # to uncomment it):
~~~~
DATA_DIR=
~~~~
to
~~~~
DATA_DIR=/var/data/elasticsearch
~~~~
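Make sure there is no space after the equals sign: this file is read by the shell, which would otherwise treat /var/data/elasticsearch as a command to run instead of assigning it to DATA_DIR. Save and close the file.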
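The same change can be made without an editor. This is a minimal sketch assuming GNU sed; set_default_var is a hypothetical helper that uncomments and rewrites an existing NAME= line, or appends one if none exists, always without a space after the equals sign.

```shell
# Minimal sketch: set NAME=VALUE in a shell-style defaults file.
# set_default_var is a hypothetical helper. Assumes GNU sed (-i) and a
# value without "|", "&" or backslashes, which sed would treat specially.
# Usage: set_default_var <file> <NAME> <value>
set_default_var() {
  file="$1" name="$2" value="$3"
  if grep -q "^#\{0,1\}[[:space:]]*$name=" "$file"; then
    # Uncomment the line if needed and rewrite it with no space after "=".
    sed -i "s|^#\{0,1\}[[:space:]]*$name=.*|$name=$value|" "$file"
  else
    printf '%s=%s\n' "$name" "$value" >> "$file"
  fi
}
```

As root: `set_default_var /etc/default/elasticsearch DATA_DIR /var/data/elasticsearch`.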
Make the Two Work Together
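ElasticSearch 0.90 can pull data from outside sources through plugins called rivers. The CouchDB river follows a database's _changes feed and indexes each new or updated document, so the ElasticSearch index stays in sync with CouchDB without any extra application code.
Install the CouchDB River Plugin for ElasticSearch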
Navigate to the ElasticSearch directory:
~~~~
cd /usr/share/elasticsearch/
~~~~
Install the plugin:
~~~~
./bin/plugin -install elasticsearch/elasticsearch-river-couchdb/1.2.0
~~~~
Start ElasticSearch Back Up
Start ElasticSearch:
~~~~
/etc/init.d/elasticsearch start
~~~~
Create the CouchDB Database and ElasticSearch Index
Put Some Stuff into CouchDB
Create the CouchDB database:
~~~~
curl -X PUT http://127.0.0.1:5984/testdb
~~~~
Create some test documents:
~~~~
curl -X PUT 'http://127.0.0.1:5984/testdb/1' -d '{"name":"My Name 1"}'
curl -X PUT 'http://127.0.0.1:5984/testdb/2' -d '{"name":"My Name 2"}'
curl -X PUT 'http://127.0.0.1:5984/testdb/3' -d '{"name":"My Name 3"}'
curl -X PUT 'http://127.0.0.1:5984/testdb/4' -d '{"name":"My Name 4"}'
~~~~
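The four documents can also be created in a loop. This is a minimal sketch that assumes CouchDB on 127.0.0.1:5984 and the testdb database created above. doc_body and create_test_docs are hypothetical names, and the Content-Type header is an addition to the original commands that declares the body as JSON.

```shell
# Minimal sketch: create the four test documents in a loop.
# Assumes CouchDB on 127.0.0.1:5984 and an existing "testdb" database;
# doc_body and create_test_docs are hypothetical names.

# Print the JSON body for test document number $1.
doc_body() {
  printf '{"name":"My Name %s"}' "$1"
}

create_test_docs() {
  for i in 1 2 3 4; do
    # Each successful PUT returns JSON such as {"ok":true,"id":"1","rev":"1-..."}
    curl -s -X PUT "http://127.0.0.1:5984/testdb/$i" \
      -H 'Content-Type: application/json' \
      -d "$(doc_body "$i")"
  done
}
```

Running `create_test_docs` a second time returns conflict errors, because updating an existing CouchDB document requires its current _rev.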
Set Up ElasticSearch with the Database
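Register a CouchDB river named testdb. The river creates the testdb index and indexes the database's documents into it: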
~~~~
curl -X PUT '127.0.0.1:9200/_river/testdb/_meta' -d '{
  "type" : "couchdb",
  "couchdb" : { "host" : "localhost", "port" : 5984, "db" : "testdb", "filter" : null },
  "index" : { "index" : "testdb", "type" : "testdb", "bulk_size" : "100", "bulk_timeout" : "10ms" }
}'
~~~~
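While experimenting, it helps to be able to inspect and undo the river registration. This is a minimal sketch assuming ElasticSearch 0.90 on 127.0.0.1:9200; river_show and river_delete are hypothetical helpers built on ordinary search and delete requests against the _river index, where river definitions are stored.

```shell
# Minimal sketch: inspect or remove a river registered in ElasticSearch 0.90.
# Assumes ElasticSearch on 127.0.0.1:9200; river_show and river_delete are
# hypothetical helpers.
ES_URL="http://127.0.0.1:9200"

# River definitions are ordinary documents in the _river index, so a normal
# search on _river/<name> shows the _meta document registered above.
# Usage: river_show <river_name>
river_show() {
  curl -s "$ES_URL/_river/$1/_search?pretty=true"
}

# Deleting the river's type from the _river index stops the river.
# Usage: river_delete <river_name>
river_delete() {
  curl -s -X DELETE "$ES_URL/_river/$1/"
}
```

For example, `river_show testdb` should list the _meta document you just created. `river_delete testdb` should stop the river (for instance before registering it again with different settings) while leaving the testdb index it populated in place.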
Test it!
Do a test query with ElasticSearch:
~~~~
curl 'http://127.0.0.1:9200/testdb/testdb/_search?pretty=true'
~~~~
You should see something similar to this:
~~~~
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "testdb",
      "_type" : "testdb",
      "_id" : "4",
      "_score" : 1.0, "_source" : {"_rev":"1-7e9376fc8bfa6b8c8788b0f408154584","_id":"4","name":"My Name 4"}
    }, {
      "_index" : "testdb",
      "_type" : "testdb",
      "_id" : "1",
      "_score" : 1.0, "_source" : {"_rev":"1-87386bd54c821354a93cf62add449d31","_id":"1","name":"My Name"}
    }, {
      "_index" : "testdb",
      "_type" : "testdb",
      "_id" : "2",
      "_score" : 1.0, "_source" : {"_rev":"1-194582c1e02d84ae36e59f568a459633","_id":"2","name":"My Name 2"}
    }, {
      "_index" : "testdb",
      "_type" : "testdb",
      "_id" : "3",
      "_score" : 1.0, "_source" : {"_rev":"1-62a53c50e7df02ec22973fc802fb9fc0","_id":"3","name":"My Name 3"}
    } ]
  }
}
~~~~
Now, rather than being limited to looking documents up by _id or through map/reduce views, you can run full-text queries on your data with ElasticSearch.
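As an example of such a query, here is a minimal sketch using a match query, which splits the search text into words and returns documents whose field contains them, best matches first. It assumes ElasticSearch 0.90 on 127.0.0.1:9200 and the testdb index created by the river. match_query and es_search are hypothetical helpers, and the search text is inserted into the JSON without escaping, so it should not contain double quotes.

```shell
# Minimal sketch: full-text search with a "match" query (ElasticSearch 0.90).
# Assumes ElasticSearch on 127.0.0.1:9200 and the testdb index/type created
# by the river; match_query and es_search are hypothetical helpers.
ES_URL="http://127.0.0.1:9200"

# Print a match query body. The text is inserted without JSON escaping,
# so it must not contain double quotes or backslashes.
# Usage: match_query <field> <text>
match_query() {
  printf '{"query":{"match":{"%s":"%s"}}}' "$1" "$2"
}

# Usage: es_search <field> <text>
es_search() {
  curl -s -X POST "$ES_URL/testdb/testdb/_search?pretty=true" \
    -d "$(match_query "$1" "$2")"
}
```

With the documents created above, `es_search name 2` should return only document 2, since "2" appears only in its name, while `es_search name "my name"` matches all four. For quick tests, the URI form `curl 'http://127.0.0.1:9200/testdb/testdb/_search?q=name:2&pretty=true'` runs a comparable query-string search.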