Monday, April 11, 2011

Network Intrusion Detection Systems


Incident response starts with a seed event that triggers an investigation. In my organization, this usually starts with a network intrusion detection system (NIDS) generating an alert based on predefined signatures. There are two major open-source NIDS: Snort and Suricata. There are plenty of FAQs and descriptions on the respective project web sites, so I will not cover the basics here. Instead, I will discuss some of the differences between them as well as performance guidelines.

Architecture Overview
The general flow of information is the same for both Snort and Suricata, though their internals differ substantially. At a high level, the network inspection looks like this:
Network packets → Packet header parsing → Payload normalization → Payload inspection → Alert generation
Generally, the processing effort for each phase breaks down as roughly 10% for parsing, 10-20% for normalization, and 70-80% for payload inspection; alert generation is usually negligible. If the total effort it takes to complete these phases for a given packet exceeds the resources available, the packet will be “dropped,” meaning it goes uninspected and anything in it goes unnoticed. Therefore, performance tuning is critical to reliably detecting intrusions.

Capacity Planning
Clearly, the vast majority of performance is dictated by the effort it takes to complete the payload inspection. This means that the reliability of a sensor is ultimately decided by whether or not it has enough resources to inspect the payload of the traffic it is assigned. (Some IDS events are generated solely based on IP addresses, but they are an exception.) When sizing hardware to create an IDS, here is my rule of thumb:
CPUs required = (number of signatures / 1000) * (megabits of network traffic / 500)
That is, you need one CPU for every thousand signatures inspecting 500 megabits of network traffic. So if your rule set has 4000 signatures and your Internet gateway carries 300 megabits of network traffic, you will need at least (4000/1000) * (300/500) = 4 * 0.6 = 2.4 CPUs, meaning you'll need to spread the traffic across three CPUs. I should point out that this formula applies to the standard traffic mix for most organizations, in which web makes up 80-90% of the traffic, followed by email. In a server hosting environment, you will need to establish your own benchmarks.
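To make the arithmetic concrete, here is a minimal sketch (an awk one-liner with the example numbers above hard-coded; substitute your own signature count and traffic level):

# Rough sketch: turn the rule of thumb into a CPU count (example numbers from above)
awk 'BEGIN { sigs=4000; mbits=300; printf "CPUs needed: %.1f\n", (sigs/1000) * (mbits/500) }'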

Sizing Preprocessor Overhead
After you've acquired a box to use, but before you work out how many signatures your resources can support, you need to baseline the performance of running only the payload normalization, which is handled by what are usually referred to as preprocessors. Doing so is very simple: run the sensor with no rules loaded during a peak traffic period (around noon in an office environment). If you run the NIDS in the foreground (not daemonized) under the time command, you can get a fairly accurate reading of how much CPU time it takes. Here's an example:
time snort -c /etc/snort/snort-norules.conf
Let it run for around five minutes, then kill it with Ctrl+C. The time command will then print statistics for the run that look something like this:
real 4m33.143s
user 0m36.218s
sys 0m14.937s
“User” and “sys” refer to how much time was spent performing the inspection and how much time was spent moving packets from the network card into RAM and then into the NIDS, respectively. Add up “user” and “sys” and divide by “real” to get the percentage of CPU required. In this run, it would be (36.2 + 14.9)/273.1 = .18, or about 18%. That is the percentage of CPU it takes to normalize the payloads. Keep in mind that the traffic mix changes during off-peak hours (most packets will not carry large payloads), so a reading taken off-peak will not reflect what the sensor faces during peak usage periods; that is why you measure at peak.
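If you'd rather not do the division by hand, a one-liner like this (with the sample values above hard-coded) computes the same percentage:

# Rough sketch: CPU percentage = (user + sys) / real, using the sample time output above
awk 'BEGIN { real=273.143; user=36.218; sys=14.937; printf "CPU usage: %.1f%%\n", (user + sys) / real * 100 }'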

Detecting Packet Drops
Any libpcap-based IDS, such as Snort and Suricata, will give you packet drop statistics. However, these numbers are unreliable, especially in high-drop situations. One quick way to find out whether you are dropping packets is to run the test above, but with your rules loaded. If your total CPU percentage is over 75%, there is a very good chance that you are occasionally dropping packets; if it's 95% or more, you are frequently dropping packets.

However, despite the above math, detecting packet loss when it really counts is still something of an art, as it's very difficult to account for any given packet. The best way to determine whether your sensor is generally catching what it is supposed to catch is to set up a “heartbeat” signature. This consists of two parts: a script that makes a web request to a test site at a regular interval, and a signature designed to alert on that request. Here's an example Perl command:
perl -MLWP::UserAgent -e 'LWP::UserAgent->new()->get("http://example.com/testheartbeat123");'
That will make a web request to example.com. You should replace example.com with a site you have permission to make requests against.

The second part is a signature that detects the heartbeat request. Here's a corresponding Snort/Suricata rule:
alert tcp any any -> any 80 (msg:"Heartbeat"; content:"/testheartbeat123"; http_uri; classtype:not-suspicious; sid:1; rev:1;)
Put an entry in your sensor's crontab (assuming you're running Linux) to make the request every minute:
* * * * * perl -MLWP::UserAgent -e 'LWP::UserAgent->new()->get("http://example.com/testheartbeat123");' > /dev/null 2>&1

Now you should get an alert for your heartbeat signature every minute, on the minute. If there is ever a missing entry in your alert log, you know the sensor had a lapse in coverage at that moment. If you log your alerts to a database, you can then use a spreadsheet to graph the times when your sensor was overloaded. You may also consider setting up a monitoring script that feeds a program like Nagios to detect when an entry is missing. The really nice thing about this setup is that it is a true check of the entire chain: traffic sourcing, detection, and alert output. If any part of the chain breaks, you'll be able to tell.
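As a rough illustration of that monitoring idea, here is a sketch of a Nagios-style check. The log path, the two-minute window, and the assumption that alerts land in a plain-text log containing the word "Heartbeat" are all assumptions to adapt to your output configuration, and the check itself is a loose heuristic rather than a per-minute accounting:

#!/bin/sh
# Loose heuristic: fail if the alert log hasn't been written to in the last two
# minutes or if recent lines don't contain the heartbeat message.
LOG=/var/log/snort/alert    # assumed path to a plain-text alert log
if [ -n "$(find "$LOG" -mmin -2 2>/dev/null)" ] && tail -n 200 "$LOG" | grep -q "Heartbeat"; then
    echo "OK: recent heartbeat alert found"
    exit 0
else
    echo "CRITICAL: heartbeat alert missing"
    exit 2
fi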

The caveat, of course, is that it is very possible to get a heartbeat alert in the same minute that the sensor is overloaded, so this is better for establishing trends, e.g. “the sensor got all 60 heartbeats last hour” versus “there's no way we could've missed a packet that minute; we got a heartbeat.” The absolute, definitive test is to record all traffic to a pcap file for a given amount of time and then replay it through the NIDS as a readfile. If the alerts from the readfile run match the alerts you got live, then you're all set.
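A minimal sketch of that record-and-replay test, assuming tcpdump is available and making up the interface name and file path, might look like this:

# Record one hour of traffic from the monitored interface (interface and path are assumptions)
tcpdump -i eth0 -w /tmp/replay-test.pcap -G 3600 -W 1
# Read the capture back through the same Snort configuration as an offline readfile run
snort -c /etc/snort/snort.conf -r /tmp/replay-test.pcap
# Compare the alerts from this run against what the sensor produced live during the capture window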

Multi-CPU Setups
Snort, unlike Suricata, is single-threaded. This means that any time a single CPU cannot handle the load, packets will be lost. Suricata, by contrast, will attempt to use all of the CPUs on the sensor and will load-balance the traffic across them, so little tuning is needed in this regard. To inspect more traffic than a single CPU can handle with Snort, you will need to run multiple Snort instances. In addition to the extra management overhead, this means you will have to find a way to split the traffic evenly across those instances; the easiest way to do so is to use Luca Deri's PF_RING module to create a pfring DAQ for Snort, as sketched below. Details can be found on Luca's blog.
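As a rough sketch (the DAQ options below come from the PF_RING documentation of the time and should be treated as assumptions), two Snort instances sharing one interface might be started like this. Giving both the same cluster ID is what tells PF_RING to balance flows between them, and taskset pins each instance to its own CPU:

# Sketch: two Snort instances load-balanced across one interface via the pfring DAQ
taskset -c 0 snort -c /etc/snort/snort.conf --daq pfring --daq-var clusterid=10 -i eth0 -D
taskset -c 1 snort -c /etc/snort/snort.conf --daq pfring --daq-var clusterid=10 -i eth0 -D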

Note that when running multiple Snort instances, each instance has to normalize the traffic on its own, so you will incur the 10-20% normalization penalty once per Snort instance. Suricata improves on this by normalizing the traffic once (depending on configuration) and then pushing the normalized traffic to worker threads that perform the payload inspection.

Advanced Performance Enhancement
I stated above that about 10% of the CPU is devoted to the initial packet header parsing. This can be reduced or even eliminated through several means. Using PF_RING, as mentioned above, that overhead can be cut drastically, to more like 1-2%. This matters most when the link carries gigabit or faster traffic but you only want to inspect some of it: filtering high-speed networks can itself be CPU intensive unless you use something like PF_RING to offload the packet filtering.
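For illustration, one lightweight way to narrow what the engine sees is a BPF expression on the command line (Snort accepts a tcpdump-style filter after its options; the ports and interface here are just examples), with the caveat that on a plain libpcap sensor this filtering still happens on the CPU:

# Sketch: only pass web and mail traffic to the detection engine (filter is illustrative)
snort -c /etc/snort/snort.conf -i eth0 'tcp port 80 or tcp port 25'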

Another alternative is to purchase a pcap acceleration card, such as the DAG cards manufactured by Endace Corporation. These cards range in price from a few thousand to tens of thousands of dollars, depending on whether they are built for 1 or 10 gigabit links. They offload the entire burden of packet header parsing and filtering from the CPU onto a built-in packet processor.

Lastly, you can purchase a pcap load balancer, such as those made by Datasys and Gigamon. The feature sets range from basic replication to appliance-based pcap filtering. This option is the most effective but also the most costly.

Conclusion
If at all possible, I recommend diversifying your security portfolio by running both Snort and Suricata. Rapid development continues on Suricata (as well as improvements to Snort), so I hesitate to make any claims right now regarding which performs better overall. It is clear that Suricata has some major advantages when it comes to IP-only rules as well as detecting protocols on non-standard ports, but Snort has a solid track record and a mature codebase.