I’m working on a new project, in all my spare time ( ha ha ha), that I thought was interesting.   We’ve been building relational data bases for a bajillion years (or almost as long as computers were storing data?).  DAta gets normalized, stored, related, queried, joined, etc.

These days, we have Elasticsearch, which is a No SQL database that stores documents.  These documents are semi-structured, and have almost no relationships that can be defined.  This is all well and good when you’re just looking at data in a full text index, but, when looking for insights or patterns, we need some tricks to find those needles in the giant haystacks that we’re tackling.

Finding Unknown Relationships

Elasticsearch is a staple in my tool belt.  I use it for almost any project that will collect data over time, or has a large ingestion rate.  Look at the ELK stack ( http://elastic.co ) and you’ll see some cool things demoed.  Using kibana, logstash and elasticsearch, you can build a very simple monitoring solution, log parser and collector, dashboarding tool, etc.

One of the ideas that I came up with today while working on the Perch Security platform was a way to discover relationships within the data.  Almost taking a denormalized data set, and normalizing it.   Well, that’s not entirely accurate.   What if we can find relationships in seemingly unrelated data?

Pulling a log file from Apache?   Also getting data from a usegroup on cyber attack victims?  Your data structures may be set to relate those data components, and you may not have thought about it..  That’s kind of a contrived example, because it’s pretty obvious you can search ‘bad’ data looking for indicators that are in your network, but my point is, what can we find out there?

Here’s a github repo for what I’m going to be working on.   It’s not really meant for public consumption, but I threw an MIT license on it if your’e interested in helping.



Oh!   I forgot to mention, I’m going to be using a lot of my recently learned Graph database skills (Thanks again Neo4j) to help discovery some relationships.    It will help with the ‘obvious’ relationships, and maybe even when I get into figuring out “significant” terms and whatnot.     Let’s see if I can get some cool hits!

Any data sets?





Short URL: http://bit.ly/2f9I15C