De-Coder’s Ring

fauie.com: eclectic technology, gentleman farming and careers

Category: aws

AWS Fargate: What ECS Should Have Been in the First Place

Introducing AWS Fargate

AWS has been able to manage and run Docker containers for a long time with ECS, the Elastic Container Service. I found that it's difficult to operate unless you start with a solid understanding of it and with infrastructure as code. In startup mode, that's not always easy, so I led myself astray and got stuck in a manual maintenance mode of ECS.

ECS allowed you to store Docker images in a registry, like Docker Hub or any other Docker registry. You could then create Tasks, which define how to run a Docker image. The task definition configures things like port mappings and volume mounts, and keeps a reference to the tagged Docker image. The next step, to actually run the Task, was to set up a Service. The service keeps a certain number of tasks running; you configure which ECS cluster to run the tasks on, and finally the conditions to auto scale the service by adding or removing instances of the task.
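For reference, here's roughly what that flow looks like with boto3. This is just a sketch; the family, image, cluster, and service names are made up.

    import boto3

    ecs = boto3.client('ecs')

    # Task definition: how to run the image (ports, mounts, memory, tagged image)
    ecs.register_task_definition(
        family='my-web-app',
        containerDefinitions=[{
            'name': 'web',
            'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-web-app:1.0',
            'memory': 512,
            'portMappings': [{'containerPort': 8080, 'hostPort': 0}],  # dynamic host port
        }],
    )

    # Service: keep a desired number of copies of that task running on a cluster
    ecs.create_service(
        cluster='my-ecs-cluster',
        serviceName='my-web-service',
        taskDefinition='my-web-app',
        desiredCount=2,
    )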

Here’s a snippet from the AWS blog:


AWS Fargate is an easy way to deploy your containers on AWS. To put it simply, Fargate is like EC2 but instead of giving you a virtual machine you get a container. It’s a technology that allows you to use containers as a fundamental compute primitive without having to manage the underlying instances. All you need to do is build your container image, specify the CPU and memory requirements, define your networking and IAM policies, and launch. With Fargate, you have flexible configuration options to closely match your application needs and you’re billed with per-second granularity.

Fargate solves an important pain point with ECS: the ECS cluster of EC2 instances. You, as an administrator/devops engineer on your AWS account, need to provision an ECS cluster. That's a fancy and abbreviated way of saying: "Create an auto scaling group, using a launch configuration that has User Data configured to join the ECS cluster that you've defined." Not difficult by any stretch, but it has always felt like a layer that shouldn't be there.
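To make that concrete, here is a rough sketch of the boilerplate Fargate removes: an auto scaling group whose launch configuration uses User Data to join instances to the ECS cluster. The AMI ID, instance profile, subnets, and sizes below are placeholders.

    import boto3

    autoscaling = boto3.client('autoscaling')

    autoscaling.create_launch_configuration(
        LaunchConfigurationName='ecs-cluster-lc',
        ImageId='ami-0123456789abcdef0',        # an ECS-optimized AMI
        InstanceType='t2.medium',
        IamInstanceProfile='ecsInstanceRole',   # lets the ECS agent register itself
        # The one line that joins each instance to your cluster:
        UserData='#!/bin/bash\necho ECS_CLUSTER=my-ecs-cluster >> /etc/ecs/ecs.config\n',
    )

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName='ecs-cluster-asg',
        LaunchConfigurationName='ecs-cluster-lc',
        MinSize=2,
        MaxSize=10,
        VPCZoneIdentifier='subnet-aaaa1111,subnet-bbbb2222',
    )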

I always found auto scaling to be a challenge.  Are you autoscaling your ECS Service?  Are you auto scaling the ECS cluster?   Yeah, you kind of have to do both.

Frankly, since I was primarily dealing with ECS while in startup mode (and believe me, I dealt with ECS a ton), I never took the time to figure out how to 'get it right'…

I got it working.   In startup land, getting it working is first and foremost… getting it done “right” is secondary (as long as you aren’t getting it done awfully poorly)….

Back to the point of Fargate. This is a major simplification of the ECS/Docker process. Now you can configure a group of ECS tasks to run without provisioning an EC2 cluster.

The magic happens behind the scenes, managed 100% by AWS. You are even presented with a "Fargate" cluster when you look at your ECS clusters in the web console.
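Here's what launching the same kind of task on Fargate might look like with boto3, with no EC2 cluster to manage. Again just a sketch; the names, image, subnet, and CPU/memory sizes are made up, and in practice you'd also attach an execution role so the task can pull from ECR.

    import boto3

    ecs = boto3.client('ecs')

    # Fargate task definitions declare CPU/memory up front and use awsvpc networking
    ecs.register_task_definition(
        family='my-web-app-fargate',
        requiresCompatibilities=['FARGATE'],
        networkMode='awsvpc',
        cpu='256',
        memory='512',
        containerDefinitions=[{
            'name': 'web',
            'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-web-app:1.0',
            'portMappings': [{'containerPort': 8080}],
        }],
    )

    ecs.run_task(
        cluster='default',
        launchType='FARGATE',
        taskDefinition='my-web-app-fargate',
        count=1,
        networkConfiguration={'awsvpcConfiguration': {
            'subnets': ['subnet-aaaa1111'],
            'assignPublicIp': 'ENABLED',
        }},
    )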


Amazon has taken away the need to be particular about how your tasks are running across your instances.  You don’t have to stress about making sure you’re using your ECS Cluster optimally. AWS takes care of scaling your tasks to meet your jobs’ needs.

This simplification makes Docker containers first-class citizens within AWS. This is a huge change and will definitely streamline administration and provisioning of your containers.


SQS Cost Optimization: Save $1684 per month

I made a bonehead move. Yes, I admit it.

Amazon SQS has always been talked about as ‘free’.  In terms of passing messages for an application, it’s supposed to be freaking cheap as can be.

I was blown away when my July 2017 SQS bill was $1720!!

What?  How’s that FREE?!

Digging into my SQS reports, I made 4.3 billion (with a B) SQS calls.    Billed at $0.0000004 per call, that adds up to $1,720!

Well, my architecture would only be scaling up from there.. I had to do something about it.

I moved to Kafka.

… but that's not the point of this post. I realized later that I could have stayed close to free and optimized a ton of my downstream pipeline.

SQS requests are billed per 64 KB chunk of payload. My messages were averaging 1,300 bytes (1.3 KB). Doing some quick math, I could have been batching up to 49 'messages' at a time per SQS call. This would have saved my producer and my consumer a ton of API calls to SQS.

If I can batch 49 'messages' per API call, then my 4.3 billion calls become about 87.8 million SQS calls.

87.8 million SQS calls comes out to about $35.10.
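A sketch of what that batching could look like with boto3: pack 49 small records into a single message body so each SQS call stays under one 64 KB billed chunk. The queue name is made up, and the consumer would json.loads the body and iterate the records back out.

    import json

    import boto3

    queue = boto3.resource('sqs').get_queue_by_name(QueueName='my-pipeline-queue')

    def send_batched(records, per_call=49):
        # 49 records x ~1.3 KB each stays just under one 64 KB billed chunk
        for i in range(0, len(records), per_call):
            queue.send_message(MessageBody=json.dumps(records[i:i + per_call]))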

Too late for this implementation (Kafka is better in the long run, in my opinion), but if the goal were a serverless, infrastructure-less implementation, then shoot, I could have saved $1,684 per month.

Information on SQS pricing can be found here:

https://aws.amazon.com/sqs/pricing/

TL;DR

Batch your data before pushing to SQS, save moneys…. #profit

Spilling the beans…

Today, I'm at ILTA's LegalSec Summit 2017, where I'm giving a talk later about Threat Automation.


I’m excited about sharing some information about Threat Intelligence, automation and application on a network sensor.  That’s all good stuff.

What I'm really happy about is that I can be totally open with the technology. My goal is to educate folks on how they can do what my company does on a daily basis. As an open source advocate, and a giant fan of a lot of the technology that I use every day (duh!), it's good to show others how to do it. We don't provide anything that qualifies as super cool intellectual property… we have some, but anyone can build the basics to run in their own shop. The real challenge comes with the human capital needed to build and run this stuff.

Two (or Three?) Tips to Improve AWS SQS Throughput

I have software that is processing approximately 200k records per 5 minutes from SQS.   That’s about 40k/ minute, or 666 records per second.  Not bad!

The only problem is, I had to run 25 instances of my Python program to get that throughput. Ummm… that's not cool. It should be WAY faster than that. Shoot, my back end is Elasticsearch, which I KNOW can EASILY support tens of thousands of inserts per second.

My code isn't terribly slow either, so I was definitely bound up on the ingest from SQS.

Even batching, reading 10 messages at a time, didn't help much… well, it did; reading one message at a time was abysmal.

I kept reading about 'infinite' throughput with SQS and figured out how to get closer to it. Well, computers aren't infinite, so that's not exactly right, but I was able to increase throughput roughly linearly by running multiple consumer threads per processing thread in my code. Since I then output to another SQS queue, I also have multiple OUTPUT threads. Now, the code I wrote is a multi-threaded beast.

  1. Multi-thread your input. Use something like Python threading and Queues, or any other threading library with 0MQ.
  2. Multi-thread your output. Same thing.
  3. … bonus tip: be nice to AWS.

Think of your code as a pipeline, because that’s really what it is.  Input -> Processing -> Output

Pipeline that mess

If you have multiple threads reading from SQS, they can each block on I/O all day long. That type of thread should be super duper light-weight: essentially, read from the SQS queue and shove the record into your thread-safe work queue (0MQ or a Python Queue). I'm a huge fan of Python since I can code and get something working fast and efficiently. Sure, I could GoLang it, but Python makes me happy.

Here’s a quick code snippet of an example SQS reading thread:
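(The original snippet was an embedded image; below is a minimal sketch of the same idea, assuming boto3 and Python 3. The queue name and reader count are made up.)

    import queue
    import threading

    import boto3


    class ReceiveMessageThread(threading.Thread):
        """Lightweight reader: pull batches from SQS and hand them to the workers."""

        def __init__(self, queue_name, outbound_queue):
            super().__init__(daemon=True)
            self.sqs_queue = boto3.resource('sqs').get_queue_by_name(QueueName=queue_name)
            self.outbound_queue = outbound_queue  # thread-safe queue.Queue shared with workers

        def run(self):
            while True:
                # Long poll: wait up to 5 seconds instead of hammering SQS with empty receives
                messages = self.sqs_queue.receive_messages(
                    MaxNumberOfMessages=10, WaitTimeSeconds=5)
                for message in messages:
                    self.outbound_queue.put(message.body)
                    message.delete()


    # Several readers feeding one shared work queue
    outbound_queue = queue.Queue()
    for _ in range(4):
        ReceiveMessageThread('my-input-queue', outbound_queue).start()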


Like I said, super simple. That outbound_queue object is a Python Queue: a thread-safe FIFO queue that can be shared between threads. In my example, I have a single consumer thread that reads the objects that my ReceiveMessageThread puts in. Once they're read and processed, I use the same construct for down-streaming messages to another SQS queue.
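For completeness, here is a rough sketch of that mirrored output construct: a thread that drains the processed-work queue and pushes results downstream with SendMessageBatch, which allows up to 10 entries per call. The downstream queue name is made up.

    import queue
    import threading

    import boto3


    class SendMessageThread(threading.Thread):
        def __init__(self, queue_name, inbound_queue):
            super().__init__(daemon=True)
            self.sqs_queue = boto3.resource('sqs').get_queue_by_name(QueueName=queue_name)
            self.inbound_queue = inbound_queue  # processed records waiting to go downstream

        def run(self):
            while True:
                entries = []
                while len(entries) < 10:  # SendMessageBatch limit is 10 entries per call
                    try:
                        body = self.inbound_queue.get(timeout=1)
                    except queue.Empty:
                        break
                    entries.append({'Id': str(len(entries)), 'MessageBody': body})
                if entries:
                    self.sqs_queue.send_messages(Entries=entries)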

Oh.. here’s the second (bonus?) tip, I apparently can’t count today.

See that "WaitTimeSeconds=5"? My first attempt didn't have that wait, so I'd get a lot of "Empty Receives": the messages array would come back empty. No big deal; the code can handle it, and I'm sure AWS doesn't mind too much if you spam them, but I figured I'd try not to DDoS them. WaitTimeSeconds turns on long polling, so the call waits up to five seconds for a message instead of returning immediately. Check out the graph below: this is the difference between not having a WaitTimeSeconds and having one.

[Graph: "Less Empty – Full?" showing Empty Receives dropping off after adding WaitTimeSeconds]

See that HUGE drop? Yeah, I stopped DDoS-ing AWS. I'm sure that'll save me some money too, or something like that. OH… dang, I just read the pricing for SQS. Yep, that'll save me cashola. Each API call (receive_messages) counts as a billed request. Don't tell my boss that I didn't have the timeout at first… ha!

After these updates, I'm able to process the same 200k records per 5 minutes, but now I only need 3 instances of my code running. That'll free up a TON of compute in my Elastic Container Service cluster.


© 2018 De-Coder’s Ring
