Where’s my network traffic?
Using Graphite and nPulse CPX API to map out the network
It was late. The lab administrators were gone, and I work 95 miles away from our data center. We're setting up a new and improved QA/testing rack of equipment, and I was trying to run my automated tests. Unfortunately, I had misread a memo and didn't know where the data was going.
For our testing purposes, we have a custom replay appliance that exposes its operations via a RESTful API. Our CPX platform does as well (more on that in a second). When I passed some commands to the replay box, I didn't get the data I expected. I tried again. Nothing. Hmmm... No one was there to help troubleshoot, so I had to figure it out remotely.
Tools
The newest tool in my tool chain is:
Graphite (Graphite Web, Carbon, Whisper). http://goo.gl/UnqbN
combined with our CPX platform: http://goo.gl/qqnLx
old faithful, Tornado’s HTTP Client: http://goo.gl/O4kHH
Process
I have access to a bunch of machines in our development and test lab, so that helps. Using ‘my’ general-purpose virtual machine (Debian 6 Linux), I set up a graphite-web installation. More on that later. It’s kind of a bear to get installed on Debian.
I whipped up a quick script that loops through our CPX boxes to watch their stats. We have a pretty simple RESTful API to get capture statistics. The plan is to grab the stats, create some entries in the Whisper database, and then watch a graph to see where the traffic spikes. (From now on, I’m just going to say Graphite for the entire system. So, I will put data in Graphite, although really the data goes to Carbon, which puts it in Whisper, which is then served and visualized by Graphite-Web.)
The format of the data is:
name.spaced.attribute value timestamp
In Python:
"%s %s %d" % (name, value, timestamp)
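As a slightly fuller sketch of that one-liner, the helper below builds a single line in Carbon's plaintext format. The name `format_metric` is made up for this example and is not part of Graphite. The value goes through `%s` rather than `%d` so that fractional stats such as `mbps` are not truncated.

```python
import time


def format_metric(name, value, timestamp=None):
    """Return one line in Carbon's plaintext format: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = time.time()
    # Carbon expects whole seconds since the epoch and a newline after each metric.
    return "%s %s %d\n" % (name, value, int(timestamp))


# Hypothetical metric name and numbers, just to show the shape of a line.
print(format_metric("cpx.capturestats.taylor.mbps.total", 412.7, 1350000000), end="")
# prints: cpx.capturestats.taylor.mbps.total 412.7 1350000000
```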
The CPX capture statistics endpoint lives at a URL of this form and returns a JSON structure:
'https://%s/api/channel/capture?polling=true' % cpx['url']
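The response itself isn't shown in this post, so here is a rough sketch of its shape, inferred only from the fields the script further down reads: a top-level `feeds` list whose entries carry a `feed` number plus one key per statistic. The sample payload and its numbers are invented for illustration, and `total_per_stat` is a hypothetical helper, not part of the CPX API.

```python
import json
from collections import defaultdict

# Made-up sample. Only the key names ('feeds', 'feed', 'mbps', 'dropped')
# come from the script in this post; a real response may carry more fields.
SAMPLE = ('{"feeds": [{"feed": 0, "mbps": "120.5", "dropped": 0},'
          ' {"feed": 1, "mbps": "80.25", "dropped": 3}]}')


def total_per_stat(body, stats):
    """Sum each requested stat across all feeds in a JSON response body."""
    totals = defaultdict(int)
    for feed in json.loads(body).get("feeds", []):
        for stat in stats:
            # Like the script in this post, coerce via float and truncate
            # to an int, so both numeric and string values are tolerated.
            totals[stat] += int(float(feed.get(stat, 0)))
    return dict(totals)


print(total_per_stat(SAMPLE, ["mbps", "dropped"]))
# prints: {'mbps': 200, 'dropped': 3}
```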
So, to set up my Python list of CPXs:
cpxs = []
cpxs.append({'url': 'localhost:1443', 'name': 'taylor', 'username': 'cpx', 'password': 'cpx'})
cpxs.append({'url': 'localhost:2443', 'name': 'hhext1', 'username': 'cpx', 'password': 'cpx'})
cpxs.append({'url': 'localhost:3443', 'name': 'harrison', 'username': 'cpx', 'password': 'cpx'})
cpxs.append({'url': 'localhost:4443', 'name': 'pierce', 'username': 'cpx', 'password': 'cpx'})
cpxs.append({'url': 'localhost:5443', 'name': 'ike', 'username': 'cpx', 'password': 'cpx'})
Then, simply enough, I loop over the CPXs, build my URL, make a Tornado request, and get the data back.
For each response, I loop through the stats of interest, build the appropriate Graphite-formatted string, append it to my buffer, and then send it away.
import json
import sys
import time
from collections import defaultdict

from tornado import httpclient

stats = ['errors', 'mbps', 'octets', 'sliced', 'mfps', 'frames', 'violations', 'dfps', 'dropped']
client = httpclient.HTTPClient()

while True:  # Keep doing it. There's enough delay in each HTTP request so nothing gets overwhelmed.
    lines = []
    for cpx in cpxs:
        now = int(time.time())
        # Set the name up front so the totals can never pick up a stale name from a previous CPX.
        name = cpx['name']
        # I'm keeping a total per CPX. This is a quick way to aggregate the data to simplify some visualizations.
        totals = defaultdict(int)
        url = 'https://%s/api/channel/capture?polling=true' % cpx['url']
        print('%s : %d' % (name, now))
        requ = httpclient.HTTPRequest(url, auth_username=cpx['username'],
                                      auth_password=cpx['password'], validate_cert=False)
        try:
            response = client.fetch(requ)
            if response.error:
                print(response.error)
            else:
                responsebody = response.body
                try:
                    data = json.loads(responsebody)
                    # Each CPX load balances traffic across virtual 'feeds'
                    # for improved performance and data localization.
                    for feed in data['feeds']:
                        feednum = feed['feed']
                        for stat in stats:
                            totals[stat] += int(float(feed.get(stat, 0)))
                            # You'll see here, I'm pumping in data per feed.
                            lines.append('cpx.capturestats.%s.%s.%d %s %d' % (
                                name, stat, feednum, feed.get(stat, 0), now))
                except Exception:
                    print('Error: %s' % (sys.exc_info(),))
                    print('Unable to parse: %s' % responsebody)
                else:
                    # Also pushing the 'totals' after each CPX, but only if the parse succeeded.
                    for stat in stats:
                        lines.append('cpx.capturestats.%s.%s.total %s %d' % (
                            name, stat, totals.get(stat, 0), now))
        except Exception:  # not a bare except, so Ctrl-C can still stop the loop
            print('Error getting a URL: %s' % name)
            print(sys.exc_info())
    # 'lines' now holds one full pass of data; this is where it gets sent to Carbon (not shown).
## My error handling could be a lot better, but hey, this is a small utility. It'll be fine
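The listing stops before the "send it away" step. One way to do it, assuming Carbon's plaintext listener is reachable on its default port 2003, is to write the buffered lines over a plain TCP socket. The host, the port, and the names `build_payload` and `send_to_carbon` are assumptions for this sketch, not taken from the original script.

```python
import socket


def build_payload(lines):
    """Join metric lines into one newline-terminated ASCII payload."""
    if not lines:
        return b""
    # Every metric must end with a newline, including the last one.
    return ("\n".join(line.rstrip("\n") for line in lines) + "\n").encode("ascii")


def send_to_carbon(lines, host="127.0.0.1", port=2003, timeout=5.0):
    """Send metric lines to Carbon's plaintext listener; return bytes sent."""
    payload = build_payload(lines)
    if not payload:
        return 0  # nothing buffered, so don't bother opening a connection
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(payload)
    finally:
        sock.close()
    return len(payload)
```

In the loop above, this would be called once per pass, after the `for cpx in cpxs` loop finishes and before `lines` is reset.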
Graphite Web
The simplest way I found to chart exactly what I wanted was the ‘render’ API that graphite-web provides. Essentially, it’s a URL that outputs a PNG based on its parameters. It even takes a wildcard, so in one fell swoop I can get a PNG showing the total ‘mbps’ per CPX.
/render?width=900&from=-1h&until=now&height=600&yMin=&target=cpx.capturestats.*.mbps.total&yMinLeft=&lineWidth=2&lineMode=connected
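If you end up building several of these URLs, a small helper keeps the parameters straight. This is only a sketch: `render_url` is a made-up name, `graphite.example.com` is a placeholder host, and the defaults simply mirror the URL above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def render_url(base, target, **options):
    """Build a graphite-web /render URL for one target expression."""
    params = {"target": target, "from": "-1h", "until": "now",
              "width": 900, "height": 600}
    # 'from' is a Python keyword, so override it with **{"from": "-6h"}.
    params.update(options)
    # Keep '*' readable in the query string instead of percent-encoding it.
    return "%s/render?%s" % (base.rstrip("/"), urlencode(params, safe="*"))


url = render_url("http://graphite.example.com", "cpx.capturestats.*.mbps.total",
                 lineWidth=2, lineMode="connected")
print(url)
```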
Looks like this, for our ‘steady state’ traffic.
Then, after doing some experimentation with our replay end point, I can watch the graphite charts, to see which CPX is getting traffic, based on different parameters. Pretty slick! Now, I know where my traffic is going!
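The same question can be answered without eyeballing the PNG. The render API can return JSON instead of an image when `format=json` is added to the URL. The sketch below assumes that response is a list of series, each with a `target` and a `datapoints` list of `[value, timestamp]` pairs. The helper name `busiest_series` and the sample data are made up for illustration.

```python
def busiest_series(series_list):
    """Return (target, peak) for the series with the highest non-null value."""
    best = (None, None)
    for series in series_list:
        # Graphite reports gaps as null, which json decodes to None.
        values = [v for v, _ts in series.get("datapoints", []) if v is not None]
        if values and (best[1] is None or max(values) > best[1]):
            best = (series["target"], max(values))
    return best


# Made-up data in the assumed format=json shape.
sample = [
    {"target": "cpx.capturestats.taylor.mbps.total",
     "datapoints": [[10.0, 1350000000], [12.0, 1350000060]]},
    {"target": "cpx.capturestats.ike.mbps.total",
     "datapoints": [[None, 1350000000], [950.0, 1350000060]]},
]
print(busiest_series(sample))
# prints: ('cpx.capturestats.ike.mbps.total', 950.0)
```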
Footnote URLs:
http://graphite.readthedocs.org/en/0.9.10/overview.html#about-the-project
http://www.npulsetech.com/Products/HammerHead-Flow-Packet-Capture
http://www.tornadoweb.org/en/branch2.4/httpclient.html