Fauie Technology

eclectic blogging, technology and hobby farming


Simple Tip: Provision an Elasticsearch Node Automatically!

I built out a new Elasticsearch 5.4 cluster today.

Typically, it's a tedious task. I haven't invested in any infrastructure automation technology because, well, there aren't enough hours in the day. Then I remembered a trick a few of us came up with at a bank I used to work for: put a shell script in AWS S3, download it from an EC2 user data script, and bam, off to the races!

I won't give away any tricks here, since my boss would kick me… again, but since this process was used heavily by me and my team previously, I don't mind sharing.

We didn't use it specifically for Elasticsearch, but you can get the gist of how to apply it to other applications.

First step: upload the script to AWS S3. Here, I'll use an example bucket of "notmybucket.com" – that's my bucket, don't try to own it. For reals.

Let's call the script "provision.es5.sh".
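Just to make that first step concrete (assuming your AWS CLI already has credentials that can write to the bucket), the upload is a couple of one-liners. The instance itself will also need an IAM role that can read from the bucket and call ec2:DescribeTags, since the provision script uses the AWS CLI.

# upload the provision script, plus the config template it pulls down later
aws s3 cp provision.es5.sh s3://notmybucket.com/provision.es5.sh
aws s3 cp elasticsearch.data.yml.template s3://notmybucket.com/elasticsearch.data.yml.template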

The provision file can look something like this:

#!/bin/bash
# swap Java 1.7 for 1.8 and install the tools the rest of the script needs
yum -y remove java-1.7.0-openjdk
yum -y install java-1.8.0-openjdk jq wget

# install Elasticsearch 5.4.2 from the official RPM and register the service
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.2.rpm
rpm -i elasticsearch-5.4.2.rpm
chkconfig --add elasticsearch

# pull down the config template and install the plugins this cluster needs
aws s3 cp s3://notmybucket.com/elasticsearch.data.yml.template /etc/elasticsearch/elasticsearch.yml
cd /usr/share/elasticsearch
./bin/elasticsearch-plugin install -b discovery-ec2
./bin/elasticsearch-plugin install -b repository-s3
./bin/elasticsearch-plugin install -b x-pack

cd /etc/elasticsearch
echo "EDIT ES Settings"

# look up this node's instance ID, availability zone, region and Name tag
INSTANCE_ID=$(/opt/aws/bin/ec2-metadata --instance-id | cut -f2 -d " ")
AVAIL_ZONE=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .availabilityZone)
REGION=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)
NAME=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=Name" --region $REGION --output json | jq -r '.Tags[0].Value')
echo $INSTANCE_ID
echo $AVAIL_ZONE
echo $REGION
echo $NAME

# fill in the placeholders from elasticsearch.data.yml.template with this node's values
sed -i "s/INPUT_INSTANCE_NAME/$NAME/" elasticsearch.yml
sed -i "s/AVAIL_ZONE/$AVAIL_ZONE/"    elasticsearch.yml
sed -i "s/REGION/$REGION/"            elasticsearch.yml

cat elasticsearch.yml
echo "Now run service elasticsearch start"
service elasticsearch start

You'll see a reference above to elasticsearch.data.yml.template… that one's super simple:

cluster.name: fauie.com.es5

node.name: INPUT_INSTANCE_NAME
node.master: false
node.data: true
node.ingest: true

node.attr.rack: AVAIL_ZONE
network.host: 0.0.0.0
http.port: 9200
cloud:
    aws:
        region:  REGION

discovery:
    zen.hosts_provider: ec2
    ec2:
        groups:  es5.in



script.inline: true
xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.watcher.enabled: false

I made up the security group name ("es5.in"); set discovery.ec2.groups to whatever security group you're actually using for your ES cluster, and change the bucket to your own.

Each ES host needs a unique name (beats me what happens to Elasticsearch if you have multiple nodes with the same name… they're geniuses, it's probably fine, but you can test that one, not me). Alternatively, use your instance ID as your node name!
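If you want to go the instance ID route, here's a minimal sketch of the tweak to the provision script; the fallback logic is hypothetical, not something in the original:

# in provision.es5.sh, after INSTANCE_ID and NAME are set:
# fall back to the EC2 instance ID if the Name tag is missing or empty
if [ -z "$NAME" ] || [ "$NAME" == "null" ]; then
    NAME=$INSTANCE_ID
fi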

Then your user init data looks super stupid and simple:

#!/bin/bash
# cloud-init runs user data as root, so there's no need for sudo/su here
aws s3 cp s3://notmybucket.com/provision.es5.sh .
bash ./provision.es5.sh

(Add this as the instance's user data when launching it.)

Once you complete your EC2 creation, you can verify the output in:

/var/log/cloud-init-output.log

# grep "Now run" /var/log/cloud-init-output.log
Now run service elasticsearch start

 

Spilling the beans…

Today, I'm at ILTA's LegalSec Summit 2017, giving a talk later about Threat Automation.


I’m excited about sharing some information about Threat Intelligence, automation and application on a network sensor.  That’s all good stuff.

What I'm really happy about is that I can be totally open with the technology. My goal is to educate folks on how they can do what my company does on a daily basis. As an open source advocate, and a giant fan of a lot of the technology I use every day (duh!), it's good to show others how to do it. We don't provide anything that qualifies as super cool intellectual property… we have some, but anyone can build the basics to run in their own shop. The real challenge is the human capital needed to build and run this stuff.

Amazon Linux – Java 1.8.0 for Elasticsearch 5.3

Quick note, and it’s not too hard, but took a few minutes to remember.

Amazon Linux comes with Java 1.7.0 installed. I wanted to upgrade to 1.8.0 for Elasticsearch 5.3.

sudo yum -y install java-1.8.0-openjdk

 

Awesome!

$ java -version
java version "1.7.0_131"

 

Not Awesome

Just yank out 1.7.0

$ sudo yum remove java-1.7.0-openjdk

If you need both installed (maybe an old piece of code needs 1.7.0 while everything else can deal with a global default of 1.8.0), update your legacy apps to point their JAVA_HOME environment variable at the real location of java-1.7.0, and update the global default like this:

$ which java
/usr/bin/java
$ ls -altr /usr/bin/java
lrwxrwxrwx 1 root root 22 Apr 20 17:00 /usr/bin/java -> /etc/alternatives/java
$ ls -latr /etc/alternatives/java
lrwxrwxrwx 1 root root 46 Apr 20 17:00 /etc/alternatives/java -> /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java
$ sudo update-alternatives --config java

There is 1 program that provides 'java'.

Selection Command
-----------------------------------------------
*+ 1 /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java

Enter to keep the current selection[+], or type selection number: 1

Had I not removed Java 1.7 already, I’d have 1.7 and 1.8 in that list to choose from.
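For the legacy-app side of that, a minimal sketch; the exact JVM path is an assumption (check what's actually under /usr/lib/jvm on your box):

# in the legacy app's startup script: pin just this process to the 1.7.0 JRE
export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64
export PATH="$JAVA_HOME/bin:$PATH"
java -version    # reports 1.7.0 for this shell only; the system default stays 1.8.0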

 

 

The Cyber Pivot

Over the years, I’ve put a lot of thought into pivoting.   Not in the startup lingo, but in the data sense.   Pivot from one piece of data to another, in order to build a picture.

Data is all about pivoting. When I'm investigating an alert, I very rarely have a good picture of all the events and correlating data surrounding it. This leads to frustrating, repeated work for each alert triage: looking at external sources, internal sources, etc.

The 'simplest' challenge to solve would be to auto-pivot around the internal log data we have. Since I'm a wanna-be SOC analyst but a pretty good software engineer, I need to build some code that will auto-pivot. Basically, given a specific moment in time, or a specific net flow record, spider out until I get six degrees of separation. Effectively, build my own graph database on top of log storage.

To the code!

I've started writing some code and will probably post it here, as long as we don't want to claim it as IP at Perch (https://perchsecurity.com)… but we may, so hang tight. Keep an eye out here:

https://github.com/chrisfauerbach

If this were Neo4j:

MATCH (a:Alert)-[]-(f:Flow)-[]-(n:NSM)
OPTIONAL MATCH (n)-[r:REFERS]-(n2:NSM {type:"HTTP"})
RETURN a, n, f, n2

… or something like that.

 

 

 

Elasticsearch Maintenance with Jenkins

 

Maintaining production systems is one of those unfortunate tasks we just have to deal with… I mean, why can't they run themselves? I get tired of daily tasks extremely quickly. Now that I have a few Elasticsearch clusters to look after, I had to come up with a way to keep them singing.

As a developer, I usually don't have to deal with these kinds of things, but in startup world, I get to do it all: maintenance, monitoring, development, etc.

Jenkins makes this kind of stuff super easy. With a slew of Python programs that use parameters/environment variables to connect to the right Elasticsearch cluster, I'm able to perform the following tasks, in order (order is key):

  1. Create Snapshot
  2. Monitor Snapshot until it's done
  3. Delete Old Data (this is especially interesting in our use case, since we have a lot of intentional False Positive data for connectivity testing)
  4. Force Merge Indices

I have Jenkins set up to trigger each downstream job after the prior one completes.

I could do a cool Jenkins Pipeline…. in my spare time.
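If that spare time ever shows up, a minimal declarative Jenkinsfile for the same four steps might look something like this; the shell script names are hypothetical stand-ins for my Python jobs, and ES_HOST is just whichever parameter you use to pick the cluster:

pipeline {
    agent any
    environment {
        // hypothetical: however you point the jobs at the right cluster
        ES_HOST = 'https://es-cluster.example.internal:9200'
    }
    stages {
        stage('Create snapshot')     { steps { sh './create_snapshot.sh' } }
        stage('Wait for snapshot')   { steps { sh './wait_for_snapshot.sh' } }
        stage('Delete old data')     { steps { sh './delete_old_data.sh' } }
        stage('Force merge indices') { steps { sh './force_merge.sh' } }
    }
}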

Snapshots:

Daily snapshots are critical in case of cluster failure. With a four-node cluster, I'm running a fairly safe setup, but if something goes catastrophically bad, I can always restore from a snapshot. My setup sends snapshots to an AWS S3 bucket.
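Since the provision script above already installs the repository-s3 plugin, the underlying API calls are simple. A minimal curl sketch; the repository name, bucket path, and snapshot naming are assumptions:

# one time: register an S3 snapshot repository
curl -XPUT "http://localhost:9200/_snapshot/s3_backup" -H 'Content-Type: application/json' -d '{
  "type": "s3",
  "settings": { "bucket": "notmybucket.com", "base_path": "es5-snapshots" }
}'

# daily: create a snapshot named for the day, then poll its state until it reports SUCCESS
curl -XPUT "http://localhost:9200/_snapshot/s3_backup/snap-$(date +%Y%m%d)"
curl -s "http://localhost:9200/_snapshot/s3_backup/snap-$(date +%Y%m%d)" | jq -r '.snapshots[0].state'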

Delete Old Data:

When dealing with network monitoring, network sensors, and storing NSM data (see Suricata NSM Fields), we've found one easy way to test end-to-end integration: insert some obviously fake False Positives into the system. We stood up a Threat Intelligence Platform (Soltra Edge) to serve some fake Indicators/Observables: google.com, yahoo.com, etc. They show up in everyone's network if there's any user traffic. This is great for confirming connectivity, but long term it adds up to LOTS of traffic that I really don't need to store… so it gets deleted.
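The delete step depends entirely on how your documents are laid out, so this is only an illustrative sketch; the index pattern and field name are assumptions:

# purge the fake connectivity-test hits from older indices
curl -XPOST "http://localhost:9200/logstash-2017.06.*/_delete_by_query" -H 'Content-Type: application/json' -d '{
  "query": { "terms": { "indicator": ["google.com", "yahoo.com"] } }
}'

# or, if whole indices simply age out, drop them outright
curl -XDELETE "http://localhost:9200/logstash-2017.01.*"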

Force Merge Indices

There is a lot of magic that happens in Elasticsearch. That's fantastic. Force merging allows ES to shrink the number of segments in a shard, thereby increasing query performance. This is really only useful for indices that are no longer receiving data; in our use case, that's historical data. I delete the old data, then force merge it.
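The force merge itself is a one-liner against the old, read-only indices; the index pattern and segment count here are assumptions:

# collapse each shard of the older indices down to a single segment
curl -XPOST "http://localhost:9200/logstash-2017.06.*/_forcemerge?max_num_segments=1"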

 

A day in the life.. of Jenkins.

 

 

 

 
