I spent years building a packet capture and network forensics tool. Slicing and dicing packets makes sense to me. Headers, payloads, etc. Easy peasy (no, it’s not really easy, but like I said, years). Understanding complex data structures comes with the territory, and so far, I haven’t met a challenge that took me too long to understand.
Then I met TAXII. Then STIX. I had forgotten how painful XML was.
FYI: All the visualizations and screenshots were grabbed from Neo4j, the top-rated and most widely used graph database in the world. My work has some specific requirements that I think are best served by modeling data as nodes and edges and finding relationships between them, so I thought I’d give it a shot. It’s nice to get a built-in browser that does some pretty fantastic drawing and layouts without any work on my part. (And a Docker image to boot!)
TAXII is a set of standards for transporting intelligence data. The standard (now an OASIS standard) defines the HTTP(S) requests a client makes to a server to query for and receive intelligence. For most use cases, there are three main phases of interaction with a server:
- Discovery – Figure out the ‘other’ endpoints. This is where you start.
- Collection Information – Determine how the intelligence is stored. Think of a collection as a repository, or grouping, of intelligence data within the server.
- Poll (pull) – Receive intelligence data for further processing. (You can also push, but I’m focusing on pull.) Poll responses contain STIX packages (more to come).
I’m not going to go into detail on the interactions here, but the Python library for TAXII does a good enough job to get you started. It’s not perfectly clear, but it helps.
STIX defines the data structures for intelligence data. Everything is organized in a ‘package’, which contains information about the package itself and about the intelligence. In this article, I’ll focus on ‘observables’ and ‘indicators’. The items I won’t talk much about are:
- TTPs: Tactics, Techniques and Procedures. What mechanisms are the ‘bad guys’ using? Software packages, exploit kits, etc.
- Exploit Target: What’s being attacked?
- Threat Actor: If known, who/what’s attacking?
- TLPs, kill chains, etc.
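The package layout is easier to see in XML. The sketch below parses a made-up, heavily trimmed STIX 1.x package and counts the items in each top-level section. Both the sample and the section list are illustrative assumptions, not a complete reading of the schema.

```python
import xml.etree.ElementTree as ET

STIX_NS = "http://stix.mitre.org/stix-1"

# Some of the top-level STIX 1.x sections that hold lists of items (not exhaustive).
SECTIONS = (
    "Observables", "Indicators", "TTPs", "Exploit_Targets",
    "Threat_Actors", "Incidents", "Campaigns", "Courses_Of_Action",
)

# A made-up, trimmed package for illustration; it is not schema-validated.
SAMPLE_PACKAGE = """<stix:STIX_Package xmlns:stix="http://stix.mitre.org/stix-1"
    xmlns:cybox="http://cybox.mitre.org/cybox-2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    id="example:Package-1" version="1.1.1">
  <stix:STIX_Header><stix:Title>Example feed item</stix:Title></stix:STIX_Header>
  <stix:Observables cybox_major_version="2" cybox_minor_version="1">
    <cybox:Observable id="example:Observable-1"/>
    <cybox:Observable id="example:Observable-2"/>
  </stix:Observables>
  <stix:Indicators>
    <stix:Indicator xsi:type="indicator:IndicatorType" id="example:Indicator-1"/>
  </stix:Indicators>
  <stix:TTPs>
    <stix:TTP xsi:type="ttp:TTPType" id="example:TTP-1"/>
  </stix:TTPs>
</stix:STIX_Package>"""


def summarize_package(xml_text: str) -> dict:
    """Count the child items in each top-level section of a STIX 1.x package."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{STIX_NS}}}STIX_Package":
        raise ValueError(f"not a STIX 1.x package: {root.tag}")
    summary = {}
    for section in SECTIONS:
        node = root.find(f"{{{STIX_NS}}}{section}")
        if node is not None:
            summary[section] = len(node)  # number of child elements
    return summary
```

Here `summarize_package(SAMPLE_PACKAGE)` returns `{'Observables': 2, 'Indicators': 1, 'TTPs': 1}`. This is a quick way to see what a feed actually sends before writing a real parser.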
Observables are the facts. They are pieces of data that you may see on your network, on a host, in an email, etc. These can be URLs, email addresses, files (and their corresponding hashes), IP addresses, etc. A fact is a fact. There’s no context around it; it’s just a fact.
Indicators are the ‘why’ around the facts. These tell you what’s wrong with an IP address, or give the context and story about an email that was seen.
In the above pictures, you’ll see a malicious URL (hulk**, seriously, don’t follow it). The observable component is the URL. The indicator component tells us that it’s malicious. The description above tells us that the intelligence center at phishtank.com identified the URL as part of a phishing scheme.
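The split between facts and context can be modeled with two small types. This is a hypothetical sketch, not STIX’s own object model. `Observable`, `Indicator` and `explain` are names made up for illustration, and the phishing URL is a placeholder for the real one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Observable:
    """A bare fact, with no judgement attached."""
    kind: str   # e.g. "URL", "IP", "Email", "FileHash" (labels chosen for this sketch)
    value: str


@dataclass
class Indicator:
    """The 'why': context that explains what is wrong with some observables."""
    title: str
    description: str
    source: Optional[str] = None
    observables: List[Observable] = field(default_factory=list)


def explain(value: str, indicators: List[Indicator]) -> List[str]:
    """Return the titles of indicators that reference an observed value."""
    return [ind.title for ind in indicators
            if any(obs.value == value for obs in ind.observables)]


# Hypothetical example shaped like the PhishTank case (the URL is made up).
phish_url = Observable(kind="URL", value="http://phish.example/login")
phish_ind = Indicator(
    title="Phishing URL",
    description="URL identified as part of a phishing scheme",
    source="phishtank.com",
    observables=[phish_url],
)
```

Calling `explain(...)` on a value seen on the wire answers the ‘why’ question. The observable alone only says the URL exists. The indicator says it was reported as phishing.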
Source of data
Every security analyst knows at least some of the open source intelligence feeds: Emerging Threats, PhishTank, etc. This data is updated regularly, and each source provides it in its own format. Since we’re talking about using TAXII to transport this data, we need an open source/free TAXII source. Enter http://hailataxii.com.
When you query Hail a TAXII’s discovery endpoint, you learn the collection information and poll URLs. You also get the inbox URL, which we’re not using today. (Incidentally, HAT’s URLs are all the same.)
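Reading a discovery response boils down to mapping each service type to its address. This is a minimal sketch, assuming a TAXII 1.1 XML response. The sample is made up and trimmed, with the required binding elements omitted, and every service shares one address.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"taxii_11": "http://taxii.mitre.org/messages/taxii_xml_binding-1.1"}

# A made-up, trimmed Discovery_Response (binding elements omitted).
SAMPLE_DISCOVERY = """<taxii_11:Discovery_Response
    xmlns:taxii_11="http://taxii.mitre.org/messages/taxii_xml_binding-1.1"
    message_id="2" in_response_to="1">
  <taxii_11:Service_Instance service_type="DISCOVERY" service_version="urn:taxii.mitre.org:services:1.1">
    <taxii_11:Address>http://taxii.example.com/taxii</taxii_11:Address>
  </taxii_11:Service_Instance>
  <taxii_11:Service_Instance service_type="COLLECTION_MANAGEMENT" service_version="urn:taxii.mitre.org:services:1.1">
    <taxii_11:Address>http://taxii.example.com/taxii</taxii_11:Address>
  </taxii_11:Service_Instance>
  <taxii_11:Service_Instance service_type="POLL" service_version="urn:taxii.mitre.org:services:1.1">
    <taxii_11:Address>http://taxii.example.com/taxii</taxii_11:Address>
  </taxii_11:Service_Instance>
  <taxii_11:Service_Instance service_type="INBOX" service_version="urn:taxii.mitre.org:services:1.1">
    <taxii_11:Address>http://taxii.example.com/taxii</taxii_11:Address>
  </taxii_11:Service_Instance>
</taxii_11:Discovery_Response>"""


def service_addresses(xml_text: str) -> Dict[str, List[str]]:
    """Map each advertised service type (DISCOVERY, POLL, ...) to its address(es)."""
    root = ET.fromstring(xml_text)
    services: Dict[str, List[str]] = {}
    for inst in root.findall("taxii_11:Service_Instance", NS):
        address = inst.findtext("taxii_11:Address", default="", namespaces=NS).strip()
        services.setdefault(inst.get("service_type"), []).append(address)
    return services
```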
Once you query the collection information endpoint, you see 11 collections (at the time of writing). I’ll list those below. From there, we can make poll requests to each collection and start receiving hundreds (thousands?) of STIX packages.
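The collection information response can be handled the same way: pull out each collection’s name and its polling address. This is again a sketch over a made-up response. The collection names are hypothetical, and treating a missing `available` attribute as available is my assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"taxii_11": "http://taxii.mitre.org/messages/taxii_xml_binding-1.1"}

# A made-up Collection_Information_Response with two hypothetical collections.
SAMPLE_COLLECTIONS = """<taxii_11:Collection_Information_Response
    xmlns:taxii_11="http://taxii.mitre.org/messages/taxii_xml_binding-1.1"
    message_id="4" in_response_to="3">
  <taxii_11:Collection collection_name="example_feed_a" collection_type="DATA_FEED" available="true">
    <taxii_11:Description>First made-up feed</taxii_11:Description>
    <taxii_11:Polling_Service>
      <taxii_11:Protocol_Binding>urn:taxii.mitre.org:protocol:http:1.0</taxii_11:Protocol_Binding>
      <taxii_11:Address>http://taxii.example.com/taxii</taxii_11:Address>
      <taxii_11:Message_Binding>urn:taxii.mitre.org:message:xml:1.1</taxii_11:Message_Binding>
    </taxii_11:Polling_Service>
  </taxii_11:Collection>
  <taxii_11:Collection collection_name="example_feed_b" collection_type="DATA_FEED" available="false">
    <taxii_11:Description>Second made-up feed</taxii_11:Description>
  </taxii_11:Collection>
</taxii_11:Collection_Information_Response>"""


def pollable_collections(xml_text: str) -> Dict[str, List[str]]:
    """Map each available collection's name to the addresses of its polling services."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[str]] = {}
    for coll in root.findall("taxii_11:Collection", NS):
        # Assumption: a missing 'available' attribute is treated as available.
        if coll.get("available", "true") != "true":
            continue
        addresses = [a.text.strip()
                     for a in coll.findall("taxii_11:Polling_Service/taxii_11:Address", NS)
                     if a.text]
        result[coll.get("collection_name")] = addresses
    return result
```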
Since I’m a network monitoring junkie, I want to see the observables I can monitor, specifically IPs and URLs. Parsing through the data, I found some interesting tidbits: some packages have observables at the top level, and some have observables as children of the indicators. No big deal; we’ll keep it all and start storing and displaying it.
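Here is a minimal sketch of keeping both placements, assuming STIX 1.1-style XML with CybOX URI and Address objects. It records where each URL or IP came from: the package’s top-level observables or an indicator. It also skips email addresses, which share the Address object. Observables that an indicator only references by `idref` are not resolved here, and the sample package is made up.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = {
    "stix": "http://stix.mitre.org/stix-1",
    "indicator": "http://stix.mitre.org/Indicator-2",
    "cybox": "http://cybox.mitre.org/cybox-2",
    "URIObj": "http://cybox.mitre.org/objects#URIObject-2",
    "AddressObj": "http://cybox.mitre.org/objects#AddressObject-2",
}
# CybOX Address objects also carry email addresses; keep only IP categories.
IP_CATEGORIES = {"ipv4-addr", "ipv6-addr"}

# A made-up package: one IP at the top level, one URL inside an indicator.
SAMPLE = """<stix:STIX_Package xmlns:stix="http://stix.mitre.org/stix-1"
    xmlns:indicator="http://stix.mitre.org/Indicator-2"
    xmlns:cybox="http://cybox.mitre.org/cybox-2"
    xmlns:URIObj="http://cybox.mitre.org/objects#URIObject-2"
    xmlns:AddressObj="http://cybox.mitre.org/objects#AddressObject-2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    id="example:Package-1" version="1.1.1">
  <stix:Observables cybox_major_version="2" cybox_minor_version="1">
    <cybox:Observable id="example:Observable-1">
      <cybox:Object>
        <cybox:Properties xsi:type="AddressObj:AddressObjectType" category="ipv4-addr">
          <AddressObj:Address_Value>198.51.100.7</AddressObj:Address_Value>
        </cybox:Properties>
      </cybox:Object>
    </cybox:Observable>
  </stix:Observables>
  <stix:Indicators>
    <stix:Indicator xsi:type="indicator:IndicatorType" id="example:Indicator-1">
      <indicator:Title>Phishing URL</indicator:Title>
      <indicator:Observable id="example:Observable-2">
        <cybox:Object>
          <cybox:Properties xsi:type="URIObj:URIObjectType" type="URL">
            <URIObj:Value condition="Equals">http://phish.example/login</URIObj:Value>
          </cybox:Properties>
        </cybox:Object>
      </indicator:Observable>
    </stix:Indicator>
  </stix:Indicators>
</stix:STIX_Package>"""


def _values(observable: ET.Element) -> List[Tuple[str, str]]:
    """Pull URL and IP values out of one CybOX observable (including nested ones)."""
    found: List[Tuple[str, str]] = []
    for props in observable.iterfind(".//cybox:Properties", NS):
        if props.get("type") == "URL":
            found += [("URL", v.text.strip())
                      for v in props.iterfind("URIObj:Value", NS) if v.text]
        if props.get("category") in IP_CATEGORIES:
            found += [("IP", v.text.strip())
                      for v in props.iterfind("AddressObj:Address_Value", NS) if v.text]
    return found


def extract_network_observables(xml_text: str) -> List[Tuple[str, str, str]]:
    """Return (kind, value, context) for every URL/IP, wherever it sits in the package."""
    root = ET.fromstring(xml_text)
    results: List[Tuple[str, str, str]] = []
    # Observables listed at the top level of the package.
    for obs in root.iterfind("stix:Observables/cybox:Observable", NS):
        results += [(kind, value, "package") for kind, value in _values(obs)]
    # Observables embedded as children of an indicator.
    for ind in root.iterfind("stix:Indicators/stix:Indicator", NS):
        title = ind.findtext("indicator:Title", default="", namespaces=NS)
        for obs in ind.iterfind("indicator:Observable", NS):
            results += [(kind, value, f"indicator:{title}") for kind, value in _values(obs)]
    return results
```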
Once it’s all parsed using some custom Python (what a mess!), I’m able to start loading my nodes and edges. It’s straightforward: I build nodes for the community (Hail a TAXII), the collection, the package, the indicators and the observables. An observable can be related to an indicator, a package, or both.
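The loading step can be sketched as parameterized Cypher. The node labels follow the community, collection, package, indicator and observable model described above. The relationship names and properties are hypothetical choices for this sketch, not the original loader. Merging observables on `(kind, value)` means an IP that shows up in many packages becomes a single node with many edges, which is what makes the graph useful for spotting overlaps.

```python
from typing import Any, Dict, List, Tuple

Statement = Tuple[str, Dict[str, Any]]

# Hypothetical graph model: relationship types and properties are chosen for
# this sketch and are not taken from the original loader.
PACKAGE_CYPHER = """
MERGE (c:Community {name: $community})
MERGE (col:Collection {name: $collection})
MERGE (c)-[:HAS_COLLECTION]->(col)
MERGE (p:Package {id: $package_id})
MERGE (col)-[:CONTAINS]->(p)
"""

INDICATOR_CYPHER = """
MATCH (p:Package {id: $package_id})
MERGE (i:Indicator {id: $indicator_id})
SET i.title = $title
MERGE (p)-[:HAS_INDICATOR]->(i)
"""

# An observable hangs off either its package or an indicator. MERGE on
# (kind, value) means the same IP/URL seen in many packages becomes one node.
OBSERVABLE_CYPHER = """
MATCH (parent) WHERE (parent:Package OR parent:Indicator) AND parent.id = $parent_id
MERGE (o:Observable {kind: $kind, value: $value})
MERGE (parent)-[:HAS_OBSERVABLE]->(o)
"""


def statements_for(
    community: str,
    collection: str,
    package_id: str,
    indicators: List[Tuple[str, str]],        # (indicator_id, title)
    observables: List[Tuple[str, str, str]],  # (kind, value, parent_id)
) -> List[Statement]:
    """Build parameterized Cypher statements for one STIX package, in load order."""
    stmts: List[Statement] = [(PACKAGE_CYPHER, {
        "community": community, "collection": collection, "package_id": package_id,
    })]
    for indicator_id, title in indicators:
        stmts.append((INDICATOR_CYPHER, {
            "package_id": package_id, "indicator_id": indicator_id, "title": title,
        }))
    for kind, value, parent_id in observables:
        stmts.append((OBSERVABLE_CYPHER, {"kind": kind, "value": value, "parent_id": parent_id}))
    return stmts
```

With the official `neo4j` Python driver, each `(query, params)` pair could be passed to `session.run(query, params)`. The observable statement only creates an edge if its parent package or indicator already exists, so the order returned by `statements_for` matters.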
In the graph, the yellow circle is the community, the green circle is the collection, the small blue circles are the packages (told you there could be hundreds), purple is the indicator and reddish is the observable.
That’s about it! Don’t forget to check out my last post on Suricata NSM fields to see how some of these observables can be found on a network.
Please leave feedback if you have any questions!
Collections from Hail a TAXII: