I’m working on a new project, in all my spare time (ha ha ha), that I thought was interesting. We’ve been building relational databases for a bajillion years (or almost as long as computers have been storing data?). Data gets normalized, stored, related, queried, joined, etc.
These days, we have Elasticsearch, which is a NoSQL database that stores documents. These documents are semi-structured, and there is almost no way to define relationships between them. This is all well and good when you’re just looking at data in a full-text index, but when we’re looking for insights or patterns, we need some tricks to find the needles in the giant haystacks we’re tackling.
Elasticsearch is a staple in my tool belt. I use it for almost any project that will collect data over time or has a large ingestion rate. Look at the ELK stack (http://elastic.co) and you’ll see some cool things demoed. Using Kibana, Logstash and Elasticsearch, you can build a very simple monitoring solution, log parser and collector, dashboarding tool, etc.
One of the ideas I came up with today while working on the Perch Security platform was a way to discover relationships within the data: almost like taking a denormalized data set and normalizing it. Well, that’s not entirely accurate. What if we could find relationships in seemingly unrelated data?
Pulling a log file from Apache? Also getting data from a user group on cyber attack victims? Your data structures may already allow those data components to be related, and you may not have thought about it. That’s kind of a contrived example, because it’s pretty obvious that you can search ‘bad’ data for indicators that show up in your network, but my point is: what else can we find out there?
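To make the Apache-log-versus-attack-data idea concrete, here is a minimal sketch, assuming both sources have already been pulled out of Elasticsearch as plain Python dicts. It looks for string values that appear in both sets and reports which field each one came from, which is a crude first pass at spotting a relationship nobody defined. The sample documents, the field names (`clientip`, `indicator`, etc.) and the helper names `index_values` and `shared_values` are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set, Tuple

Doc = Dict[str, Any]


def index_values(docs: Iterable[Doc]) -> Dict[str, Set[str]]:
    """Map each string value to the set of field names it appeared under."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            if isinstance(value, str):  # only compare string values in this sketch
                seen[value].add(field)
    return seen


def shared_values(docs_a: Iterable[Doc], docs_b: Iterable[Doc]) -> List[Tuple[str, str, str]]:
    """Return (value, field_in_a, field_in_b) for every value present in both sets."""
    a = index_values(docs_a)
    b = index_values(docs_b)
    links = []
    for value in sorted(a.keys() & b.keys()):  # values seen in both data sets
        for field_a in sorted(a[value]):
            for field_b in sorted(b[value]):
                links.append((value, field_a, field_b))
    return links


# Hypothetical sample data: Apache access-log entries and a threat-intel feed.
apache_logs = [
    {"clientip": "203.0.113.7", "verb": "GET", "request": "/login"},
    {"clientip": "198.51.100.2", "verb": "POST", "request": "/upload"},
]
threat_feed = [
    {"indicator": "203.0.113.7", "type": "ip", "source": "usergroup"},
    {"indicator": "evil.example", "type": "domain", "source": "usergroup"},
]

if __name__ == "__main__":
    for value, field_a, field_b in shared_values(apache_logs, threat_feed):
        print(f"{value}: apache.{field_a} <-> feed.{field_b}")
```

Run against the sample data, this reports the one IP address that shows up in both sets, linking `clientip` in the logs to `indicator` in the feed. On real data you would likely want to ignore very common values (HTTP verbs, status codes) before treating a match as interesting.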
Here’s a GitHub repo for what I’m going to be working on. It’s not really meant for public consumption, but I threw an MIT license on it if you’re interested in helping.
Oh! I forgot to mention: I’m going to be using a lot of my recently learned graph database skills (thanks again, Neo4j) to help discover some relationships. That should help with the ‘obvious’ relationships, and maybe even later, when I get into figuring out “significant” terms and whatnot. Let’s see if I can get some cool hits!
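Here is a rough sketch of the graph angle, written in plain Python so it runs without a Neo4j instance: treat each document as a node, connect two documents when they share a value, and read off the connected components as candidate groups of related records. The `max_doc_freq` cutoff, which skips values shared by too many documents (like `GET`), is a hypothetical knob added for illustration; it is not Elasticsearch’s significant-terms scoring. The document IDs and fields are made up.

```python
from collections import defaultdict, deque
from typing import Any, Dict, List, Set


def related_groups(docs: Dict[str, Dict[str, Any]], max_doc_freq: int = 10) -> List[Set[str]]:
    """Group document IDs that are linked, directly or transitively, by shared values."""
    # value -> IDs of the documents that contain it
    docs_by_value: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for value in doc.values():
            if isinstance(value, str):
                docs_by_value[value].add(doc_id)

    # Edges between documents sharing a value, ignoring values that are too common.
    neighbours: Dict[str, Set[str]] = {doc_id: set() for doc_id in docs}
    for ids in docs_by_value.values():
        if 1 < len(ids) <= max_doc_freq:
            for doc_id in ids:
                neighbours[doc_id] |= ids - {doc_id}

    # Breadth-first search to collect connected components.
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in sorted(docs):
        if start in seen:
            continue
        component: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            current = queue.popleft()
            component.add(current)
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if len(component) > 1:  # a lone document has no discovered relationship
            groups.append(component)
    return groups


# Hypothetical documents from three unrelated sources.
docs = {
    "apache-1": {"clientip": "203.0.113.7", "verb": "GET"},
    "apache-2": {"clientip": "198.51.100.2", "verb": "GET"},
    "apache-3": {"clientip": "198.51.100.9", "verb": "GET"},
    "intel-1": {"indicator": "203.0.113.7", "type": "ip"},
    "intel-2": {"indicator": "evil.example", "type": "domain"},
    "dns-1": {"query": "evil.example", "answer": "192.0.2.10"},
}

if __name__ == "__main__":
    for group in related_groups(docs, max_doc_freq=2):
        print(sorted(group))
```

With `max_doc_freq=2`, the shared `GET` verb is ignored, leaving two groups: an Apache hit tied to a threat-intel IP, and a DNS query tied to a threat-intel domain. In a graph database the same idea maps naturally onto document and value nodes joined by relationships, which is where the Neo4j skills would come in.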
Any data sets?