I built out a new Elasticsearch 5.4 cluster today.
Typically, it’s a tedious task. I haven’t invested in any sort of infrastructure automation technology, because, well, there aren’t enough hours in the day. Then I remembered a trick a few of us came up with at a bank I used to work for: keep a shell script in AWS S3, have the EC2 user data script download and run it at boot, and bam, off to the races!
I won’t give away any tricks here, since my boss would kick me… again, but, since this process was used heavily by me and my team previously, I don’t mind sharing.
We didn’t use it specifically for Elasticsearch, but, you can get the gist of how to use it in other applications.
First step, upload the script to AWS S3. Here, I’ll use an example bucket of “notmybucket.com” – that’s my bucket, don’t try to own it. For reals.
Let’s call the script “provision.es5.sh”
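For the upload itself, here is a minimal sketch using the AWS CLI, with the file name the user data fetches later (provision.es5.sh). The BUCKET and DRY_RUN variables and the upload_to_bucket helper are made up for illustration; by default it only prints the commands, so you can eyeball them before touching AWS. The elasticsearch.data.yml.template file goes up the same way.

```shell
#!/bin/bash
# Sketch: upload provisioning files to the S3 bucket.
# BUCKET, DRY_RUN and upload_to_bucket are assumptions for illustration.
BUCKET="${BUCKET:-notmybucket.com}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

upload_to_bucket() {
  local file
  for file in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      # Show what would run without calling AWS.
      echo "aws s3 cp $file s3://$BUCKET/$file"
    else
      aws s3 cp "$file" "s3://$BUCKET/$file"
    fi
  done
}

upload_to_bucket provision.es5.sh
```

Run it with DRY_RUN=0 once the printed command looks right. The instance will also need an IAM role that lets it read from the bucket, since the boot-time download runs without any stored credentials.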
The provision file can look something like this:
#!/bin/bash
# Swap Java 7 for Java 8 and install the tools this script needs
yum -y remove java-1.7.0-openjdk
yum -y install java-1.8.0-openjdk jq wget

# Install Elasticsearch 5.4.2 and register the service
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.2.rpm
rpm -i elasticsearch-5.4.2.rpm
chkconfig --add elasticsearch

# Pull the config template from S3
aws s3 cp s3://notmybucket.com/elasticsearch.data.yml.template /etc/elasticsearch/elasticsearch.yml

# Install plugins (-b is batch mode, so no prompts)
cd /usr/share/elasticsearch
./bin/elasticsearch-plugin install -b discovery-ec2
./bin/elasticsearch-plugin install -b repository-s3
./bin/elasticsearch-plugin install -b x-pack

cd /etc/elasticsearch
echo "EDIT ES Settings"
INSTANCE_ID=$(/opt/aws/bin/ec2-metadata --instance-id | cut -f2 -d " ")
AVAIL_ZONE=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .availabilityZone)
REGION=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)
# Filter on the "Name" tag so we don't just grab whichever tag comes back first
NAME=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=Name" --region=$REGION --output=json | jq -r '.Tags[0].Value')
echo $INSTANCE_ID
echo $AVAIL_ZONE
echo $REGION
echo $NAME

# Fill in the placeholders; the names must match the ones in the template
sed -i -- "s/INPUT_INSTANCE_NAME/$NAME/" elasticsearch.yml
sed -i -- "s/AVAIL_ZONE/$AVAIL_ZONE/" elasticsearch.yml
sed -i -- "s/REGION/$REGION/" elasticsearch.yml
cat elasticsearch.yml

echo "Now run service elasticsearch start"
service elasticsearch start
You’ll see a reference to elasticsearch.data.yml.template. That one’s super simple:
cluster.name: fauie.com.es5
node.name: INPUT_INSTANCE_NAME
node.master: false
node.data: true
node.ingest: true
node.attr.rack: AVAIL_ZONE
network.host: 0.0.0.0
http.port: 9200
cloud:
  aws:
    region: REGION
discovery:
  zen.hosts_provider: ec2
  ec2:
    groups: es5.in
script.inline: true
xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.watcher.enabled: false
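The placeholder names in the template (INPUT_INSTANCE_NAME, AVAIL_ZONE, REGION) have to match whatever the sed lines in the provision script search for, or the literal placeholder ends up in your config. A quick way to sanity-check that off the box is to run the substitutions on a scratch copy. This is only a sketch: the sample values are made up, and it prints the result instead of editing in place.

```shell
#!/bin/bash
# Sketch: fill the template placeholders with sample values on a scratch copy.
# The values below are made up; on EC2 they come from tags and instance metadata.
NAME="es5-data-1"
AVAIL_ZONE="us-east-1a"
REGION="us-east-1"

TMP_YML="$(mktemp)"
cat > "$TMP_YML" <<'EOF'
node.name: INPUT_INSTANCE_NAME
node.attr.rack: AVAIL_ZONE
cloud:
  aws:
    region: REGION
EOF

# sed is case-sensitive, so the lowercase "region:" key is left alone.
# Caveat: a value containing "/" would break these expressions.
RENDERED="$(sed -e "s/INPUT_INSTANCE_NAME/$NAME/" \
                -e "s/AVAIL_ZONE/$AVAIL_ZONE/" \
                -e "s/REGION/$REGION/" "$TMP_YML")"
echo "$RENDERED"
rm -f "$TMP_YML"
```

If any of the uppercase placeholders survive in the output, the names are out of sync between the template and the script.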
The security group name (es5.in) is made up. Set it to whatever security group you’re using for your ES cluster, and change the bucket to your bucket.
Each ES host needs a unique name (beats me what will happen to elasticsearch if you have multiple nodes with the same name.. they’re geniuses.. it’s probably fine, but, you can test it, not me). The script above pulls the node name from the instance’s tags. Alternatively, try to use your instance ID as your node name!
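One way to get the instance ID idea without giving up on tags is a fallback: use the tag value when there is one, otherwise the instance ID. A minimal sketch, where pick_node_name is a hypothetical helper and the IDs are sample values:

```shell
#!/bin/bash
# Sketch: choose a node name, falling back to the instance ID.
# pick_node_name is a hypothetical helper, not part of the original script.
pick_node_name() {
  local tag_name="$1" instance_id="$2"
  # jq -r prints the string "null" when the tag lookup comes back empty.
  if [ -z "$tag_name" ] || [ "$tag_name" = "null" ]; then
    echo "$instance_id"
  else
    echo "$tag_name"
  fi
}

pick_node_name "" "i-0123456789abcdef0"            # falls back to the instance ID
pick_node_name "es5-data-1" "i-0123456789abcdef0"  # keeps the tag value
```

In the provision script this would look like NAME=$(pick_node_name "$NAME" "$INSTANCE_ID") right before the sed lines. Instance IDs are unique, so untagged nodes can’t collide on name.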
Then your user init data looks super stupid and simple:
#!/bin/bash
# User data already runs as root, so no sudo is needed
aws s3 cp s3://notmybucket.com/provision.es5.sh .
bash ./provision.es5.sh

[Screenshot: Add user data]
Once you complete your EC2 creation, you can verify the output in:
/var/log/cloud-init-output.log
# grep "Now run" /var/log/cloud-init-output.log
Now run service elasticsearch start
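If you want that check in script form, something like the sketch below works. provision_finished is a hypothetical helper name. Keep in mind the marker line is echoed just before the service start command, so finding it only tells you the script got that far, not that Elasticsearch is actually up.

```shell
#!/bin/bash
# Sketch: check whether the provision script reached its final echo.
# provision_finished is a hypothetical helper name.
provision_finished() {
  local log_file="${1:-/var/log/cloud-init-output.log}"
  # -q: no output, just an exit status; a missing log counts as "not finished".
  grep -q "Now run service elasticsearch start" "$log_file" 2>/dev/null
}

if provision_finished; then
  echo "provision script reached the end"
else
  echo "marker not found (still running, failed early, or no log on this host)"
fi
```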