De-Coder’s Ring

Consumable Security and Technology


Startup Series: Life at a Startup

My colleagues know (since I talk A LOT) that I’ve had a long, loving history with startups.  Like, honest to God, hired-by-the-founders-as-the-first-engineer (twice) kind of startups.

The first one was a very slow ramp up to going full time at nPulse Technologies.  Randy, the founder, and I had been friends for years.  He started this packet capture company as a lifestyle company.  Something fun to do, enough to make a living, but that’s all.  For a while, I’d bill him $500/month to write a web app and some APIs to pull packets off this appliance.  We didn’t do the SaaS thing, or the PaaS thing.  We built a Linux-based server that ran our software.  After a few years of that, he and his co-founder decided they wanted to make nPulse a real thing.  He pulled me in as VP of Development, and it was off to the races.  By the end of the first year, we had approximately 5 full-time employees.  Two more years, and we were up to 30 by the time we got acquired by FireEye.  I stuck around FireEye for a bit, but decided it wasn’t for me, and went to a big bank.  It had been approximately 10 years since I had worked for a Fortune 500 company.

After two years of ups and downs (yeah, I keep it real in my blogs), I got a call out of the blue to join another company as VP of Engineering.  This one was different.  I started day one with another developer at Perch Security.  The founder hadn’t quit his day job yet, but he had landed enough funding for us to get started.  I got to build a network monitoring appliance that shipped MASSIVE amounts of data to a cloud service running in AWS.  There, data went through a pretty big orchestration to ultimately land in Elasticsearch for storage and search by customers.  Speaking of customers, we started signing them up early, and often.  I stayed there for 14 months, until I was confident the infrastructure and code were solid, and, due to many reasons, came back to the previously mentioned big bank.

Here are a few observations:

Startups are HARD

There’s no such thing as a slow day if you have customers.  Customers demand quality (duh), and if anything goes wrong, you have to fix it immediately.  There’s no “oh, I’ll fix it when I come in in the morning”.  Small bugs, big bugs and crashed systems.  It was critical we kept everything top-notch, especially when we were trying hard to find new customers and leverage the goodwill and good word of our early ones.

Scaling is HARD

Five network sensors is easy.  It looks like things will scale, since all the tools are there to scale… then one day, it stops scaling.  Add nodes, it still doesn’t scale.  Something is wrong.  Rewrite… and fast.  I switched data platforms three times with no downtime or loss of data.  Using intermediary queues like SQS, Kafka, etc. is critical for scaling: the queue buffers incoming data while you rebuild whatever sits behind it.

Building things is fun!

I shine when I get to build new things.  Give me a whiteboard, and I can fill it up with a pretty darned good solution.  Building an MVP is my dream job.  I get to write just enough code to prove a point, or try out a new approach.  Then it gets harder.

You can make a big impact

You can make a big impact at a large company.  You don’t need a tiny company to make a big impact.  Heck, I think I make a bigger impact here.  At my last startup, Perch Security, I had a team of cyber security analysts and a team of engineers.  We were up to around 30-35 customers.  That was awesome!  I could say “I built this!”  At the bank, I’m supporting our messaging platform as the embedded technical lead, technical platform owner, whatever you want to call me.  A huge enterprise platform with over 150 developers that sends notifications and emails to every customer account holder…  Talk about an impact!

Good and Bad

Startups can be a blast.  They’re not all foosball and free lunches.  It’s collaboration at its finest, because you know everyone involved is on board 100%, or they will lose their job.  Not by being redeployed in a down economy, but because if they fail to deliver, the company goes under.

When a startup succeeds, and grows, and gets acquired, it can be REALLY rewarding for those early folks (I’m still holding out hope on my Perch stock!).

Want to talk about startups? Hit me up!

Threat Hunting with Open Source Software

I’ve begun working on a new project, with a spiffy/catchy/snazzy name:
Threat Hunting: With Open Source Software, Suricata and Bro

I’ve planned out multiple chapters, from raw PCAP analysis, through session reassembly, to full-on network monitoring and hunting with Suricata and Elasticsearch.

This project will take a long time. While I work through it, I’ll be posting here regularly. I very much welcome feedback.

Here’s a little introduction video; more will come as I add them.

The next video will look at how data is transmitted over a network… anyone ready for a super brief OSI network model overview?

Simple Tip: Provision an Elasticsearch Node Automatically!

I built out a new Elasticsearch 5.4 cluster today.

Typically, it’s a tedious task.  I haven’t invested in any sort of infrastructure automation technology because, well, there aren’t enough hours in the day.  I remembered a trick a few of us came up with at a previous bank I used to work for: a shell script in AWS S3 that gets downloaded by an EC2 user data (init) script, and bam, off to the races!

I won’t give away any of my current employer’s tricks here, since my boss would kick me… again, but since this process was used heavily by me and my team previously, I don’t mind sharing.

We didn’t use it specifically for Elasticsearch, but you can get the gist of how to apply it to other applications.

First step, upload the script to AWS S3.  Here, I’ll use an example bucket of “notmybucket.com” – that’s my bucket, don’t try to own it.  For reals.

Let’s call the script “provision.es.sh”
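Assuming the AWS CLI is configured with credentials that can write to the bucket, the upload itself is one command:

    aws s3 cp provision.es.sh s3://notmybucket.com/provision.es.sh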

The provision file can look something like this:
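Something in this spirit; the package setup, file paths, and the NODE_NAME placeholder below are illustrative rather than a copy of a production script:

    #!/bin/bash
    # Provision a fresh EC2 instance as an Elasticsearch 5.4 data node.
    set -e

    BUCKET="notmybucket.com"

    # Java + Elasticsearch (RPM-based distro assumed; adjust for yours).
    yum install -y java-1.8.0-openjdk
    rpm --install https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.0.rpm

    # EC2 discovery plugin so nodes can find each other by security group.
    /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch discovery-ec2

    # Pull down the config template and drop in a unique node name (the instance ID).
    aws s3 cp "s3://${BUCKET}/elasticsearch.data.yml.template" /tmp/elasticsearch.data.yml.template
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    sed "s/NODE_NAME/${INSTANCE_ID}/" /tmp/elasticsearch.data.yml.template > /etc/elasticsearch/elasticsearch.yml

    # Start now and on every boot.
    systemctl enable elasticsearch
    systemctl start elasticsearch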

You’ll see reference to an elasticsearch.data.yml.template.. that’s super simple:
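The cluster name, security group and paths are whatever you pick; the NODE_NAME token is what the provision script swaps out (this assumes the discovery-ec2 plugin installed above):

    cluster.name: my-es-cluster
    node.name: NODE_NAME
    network.host: 0.0.0.0
    path.data: /var/lib/elasticsearch
    discovery.zen.hosts_provider: ec2
    discovery.ec2.groups: sg-xxxxxxxx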

I made up the security group in the template; configure it to whatever you’re using for your ES cluster, and change the bucket to your bucket.

Each ES host needs a unique name (beats me what will happen to Elasticsearch if you have multiple nodes with the same name… they’re geniuses, it’s probably fine, but you can test that, not me).  An easy way to get one: use your instance ID as your node name!

Then your user init data looks super stupid and simple:
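Something like this, assuming the instance profile is allowed to read from the bucket:

    #!/bin/bash
    aws s3 cp s3://notmybucket.com/provision.es.sh /tmp/provision.es.sh
    bash /tmp/provision.es.sh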

(Screenshot: adding the user data during EC2 instance creation.)

Once you complete your EC2 creation, you can verify the output in:

/var/log/cloud-init-output.log

 

Elasticsearch Maintenance with Jenkins

 

Maintaining production systems is one of those unfortunate tasks that we need to deal with…  I mean, why can’t they just run themselves?   I get tired of daily tasks extremely quickly.   Now that I have a few ongoing Elasticsearch clusters to deal with, I had to come up with a way to keep them singing.

As a developer, I usually don’t have to deal with these kinds of things, but in startup world, I get to do it all: maintenance, monitoring, development, etc.

Jenkins makes this kind of stuff super easy.  With a slew of Python programs that use parameters/environment variables to connect to the right Elasticsearch cluster, I’m able to perform the following tasks, in order (order is key; a rough sketch of the whole sequence follows the list):

  1. Create Snapshot
  2. Monitor Snapshot until it’s done
  3. Delete Old Data (this is especially interesting in our use case: we have a lot of intentional false-positive data for connectivity testing)
  4. Force Merge Indices
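Each of those is its own Jenkins job, but mashed into one script the flow looks roughly like this; the cluster URL, repository name and “logs-*” index naming are assumptions, not our exact setup:

    # Rough sketch of the nightly maintenance flow, using the official
    # elasticsearch-py client and daily indices named like logs-YYYY.MM.DD.
    import os
    import time
    from datetime import datetime, timedelta

    from elasticsearch import Elasticsearch

    es = Elasticsearch(os.environ["ES_URL"])  # which cluster to hit comes from Jenkins
    repo = os.environ.get("ES_SNAPSHOT_REPO", "s3_backups")
    snapshot = "daily-" + datetime.utcnow().strftime("%Y.%m.%d")

    # 1. Create the snapshot without blocking on the HTTP call.
    es.snapshot.create(repository=repo, snapshot=snapshot,
                       body={"indices": "logs-*"}, wait_for_completion=False)

    # 2. Poll until the snapshot finishes.
    while True:
        state = es.snapshot.get(repository=repo, snapshot=snapshot)["snapshots"][0]["state"]
        if state in ("SUCCESS", "PARTIAL", "FAILED"):
            break
        time.sleep(30)
    if state != "SUCCESS":
        raise SystemExit("snapshot %s ended in state %s" % (snapshot, state))

    # 3. Delete old data: here, any daily index older than 90 days.
    cutoff = datetime.utcnow() - timedelta(days=90)
    for index in es.indices.get(index="logs-*"):
        if datetime.strptime(index.split("-")[-1], "%Y.%m.%d") < cutoff:
            es.indices.delete(index=index)

    # 4. Force merge what's left; these indices no longer take writes.
    es.indices.forcemerge(index="logs-*", max_num_segments=1)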

I have Jenkins set up to trigger the downstream jobs after the prior one completes.

I could do a cool Jenkins Pipeline…. in my spare time.

Snapshots:

Daily snapshots are critical in case of cluster failure.  With a four-node cluster, I’m running in a fairly safe setup, but if something goes catastrophically bad, I can always restore from a snapshot.  My setup has my snapshots going to AWS S3 buckets.
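Pointing snapshots at S3 is a one-time repository registration.  A minimal version, assuming the repository-s3 plugin is installed on every node and a made-up bucket:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # One-time setup: register an S3 bucket as the snapshot repository.
    es.snapshot.create_repository(
        repository="s3_backups",
        body={"type": "s3",
              "settings": {"bucket": "notmybucket.com", "base_path": "es-snapshots"}},
    )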

Delete Old Data:

When dealing with network monitoring, network sensors and storing NSM data (see Suricata NSM Fields), we have determined one easy way to test end-to-end integration is by inserting some obviously fake false positives into our system.  We have stood up a Threat Intelligence Platform (Soltra Edge) to serve some fake Indicators/Observables.  Google.com, Yahoo.com, etc.  They show up in everyone’s network if there is user traffic.  Now, this is great for determining connectivity, but long term that adds up to LOTS of traffic that I really don’t need to store… so, they get deleted.
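The cleanup is just a scheduled delete-by-query; the index pattern and field name here are guesses at a Suricata-style mapping, not our exact schema:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Drop the intentional false-positive hits (connectivity-test indicators)
    # once they've served their purpose.
    es.delete_by_query(
        index="logs-*",
        body={"query": {"terms": {"dns.rrname": ["google.com", "yahoo.com"]}}},
    )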

Force Merge Indices

There is a lot of magic that happens in Elasticsearch.  That’s fantastic.  Force merging allows ES to shrink the number of segments in a shard, thereby increasing performance when querying it.  This is really only useful for indices that are no longer receiving data.  In our use case, that’s historical data.  I delete the old data, then force merge it.

 

A day in the life… of Jenkins.

Finding Relationships in Elasticsearch document data with Neo4j

I’m working on a new project, in all my spare time (ha ha ha), that I thought was interesting.  We’ve been building relational databases for a bajillion years (or almost as long as computers have been storing data?).  Data gets normalized, stored, related, queried, joined, etc.

These days, we have Elasticsearch, which is a NoSQL database that stores documents.  These documents are semi-structured, and have almost no relationships that can be defined.  This is all well and good when you’re just looking at data in a full-text index, but when looking for insights or patterns, we need some tricks to find those needles in the giant haystacks that we’re tackling.

Finding Unknown Relationships

Elasticsearch is a staple in my tool belt.  I use it for almost any project that will collect data over time, or has a large ingestion rate.  Look at the ELK stack ( http://elastic.co ) and you’ll see some cool things demoed.  Using Kibana, Logstash and Elasticsearch, you can build a very simple monitoring solution, log parser and collector, dashboarding tool, etc.

One of the ideas that I came up with today while working on the Perch Security platform was a way to discover relationships within the data.  Almost taking a denormalized data set and normalizing it.  Well, that’s not entirely accurate.  What if we could find relationships in seemingly unrelated data?

Pulling a log file from Apache?  Also getting data from a user group on cyber attack victims?  Your data structures may be set up to relate those data components, and you may not have thought about it.  That’s kind of a contrived example, because it’s pretty obvious you can search ‘bad’ data looking for indicators that are in your network, but my point is: what can we find out there?

Here’s a GitHub repo for what I’m going to be working on.  It’s not really meant for public consumption, but I threw an MIT license on it if you’re interested in helping.

 

https://github.com/chrisfauerbach/GraphMyRelationships

Oh!  I forgot to mention, I’m going to be using a lot of my recently learned graph database skills (thanks again, Neo4j) to help discover some relationships.  It will help with the ‘obvious’ relationships, and maybe even when I get into figuring out “significant” terms and whatnot.  Let’s see if I can get some cool hits!
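As a first pass, the idea looks roughly like this; the index names, field names, Bolt URL and the “shared value” heuristic are all placeholders while I figure out the real model:

    # Rough sketch: pull documents out of Elasticsearch and turn shared
    # field values into nodes and relationships in Neo4j.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan
    from neo4j import GraphDatabase

    es = Elasticsearch("http://localhost:9200")
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))

    def load(index, field):
        """Create a node per document and link it to the value of one field."""
        with driver.session() as session:
            for hit in scan(es, index=index, query={"query": {"exists": {"field": field}}}):
                session.run(
                    "MERGE (v:Value {name: $value}) "
                    "MERGE (d:Document {id: $doc_id, source: $source}) "
                    "MERGE (d)-[:MENTIONS]->(v)",
                    value=str(hit["_source"][field]),
                    doc_id=hit["_id"],
                    source=index,
                )

    # Two unrelated-looking data sets; any documents that MENTION the same
    # value end up one hop apart in the graph.
    load("apache-logs", "clientip")
    load("threat-intel", "indicator")
    driver.close()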

Any data sets?


© 2017 De-Coder’s Ring
