Thursday, March 31, 2011

Comprehensive Log Collection

In my last post, I described the importance of comprehensive logging in an enterprise and how you can use the open-source ELSA to get your logs collected and indexed. In this post, I'll describe the various things you can use to generate logs so that you have something to collect.

The Haystack

The classic dilemma with log collection is that the volume of ordinary logs drowns out important ones. ELSA solves this problem by allowing an analyst to cut through a massive amount of irrelevant logs and immediately find the sought-after ones. This allows an organization to enable extremely verbose logging on all of its devices without sacrificing the ability to find important logs, which in turn means verbose logs can assist in investigations when they would normally have been sacrificed for efficiency. As a secondary benefit, it reduces the amount of time spent managing logs, because no one is tasked with the difficult choices of which logs should be filtered and which should be kept.

Historically, network devices and UNIX/Linux hosts have been the main sources of syslog. Network logs are critical to detecting attacks and conducting incident response. In addition to network connection records for both allowed and denied connections, network devices send other important logs. For instance, denial-of-service attacks can produce logs from firewalls indicating that they have reached their open-connection limit; a Cisco FWSM will generate logs like “Connection limit exceeded,” which ELSA can alert on. Other logs may not be errors but are still anomalies. In particular, logs regarding configuration changes are helpful for detecting unauthorized access or improper change management.

ELSA can help zero in on these kinds of logs by providing the negative operator in a search query. If most logs from a device contain a certain string like “connection,” then the query can be altered with “-connection” to exclude all of those. These searches happen so quickly that you can work through adding a half-dozen negations in a few seconds to uncover a new anomaly. The interesting string representing the anomaly can then be added as an alert for the future. In the screenshot below, you can see a series of queries, each with a decreasing number of results (the number in parentheses on the tab) and an increasing amount of negation.
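As a rough sketch of what this refinement looks like in practice (the search terms are illustrative; “Built” and “Teardown” are the keywords Cisco firewalls use for ordinary connection setup and close logs):

FWSM

FWSM -Built

FWSM -Built -Teardown

Each successive query strips away another class of routine logs until only the anomalies remain.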

Collecting Network Logs

Let's start with an example of configuring a Cisco router to log all network connection records to syslog. There's a great example of setting up logging on both Cisco Catalyst switches and routers, but in a nutshell, it comes down to a single line added for your log host (for example, 192.168.10.10):

logging 192.168.10.10

Almost all network vendors provide a way to export logs as syslog. If possible, use TCP to prevent log loss. ELSA will handle either TCP or UDP logs.
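On Cisco IOS, for example, a slightly fuller configuration might look like the following; the trap level and TCP transport lines are additions of mine, and the exact syntax varies by platform and software version, so treat this as a sketch:

logging on

logging trap informational

logging host 192.168.10.10 transport tcp port 514

The “trap” level controls how verbose the forwarded logs are; “informational” captures connection records without drowning you in debug output.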

Collecting Linux Server Logs

Setting up logging on UNIX and Linux is generally simple, but there are differences among the logging clients in use. Until a few years ago, standard Linux boxes used the venerable syslogd agent to perform logging. To forward all logs to a syslog server at 192.168.10.10, you would add this to /etc/syslog.conf:

*.* @192.168.10.10

Then just restart syslogd:

/etc/init.d/syslogd restart

Different Linux distributions may change the restart command or the location of syslog.conf. A similar syntax is used by the newer rsyslog.
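For reference, the equivalent entry in /etc/rsyslog.conf uses the same syntax, with a single @ for UDP and a double @@ for TCP (preferred, to prevent log loss):

*.* @192.168.10.10

*.* @@192.168.10.10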

For Syslog-NG, adding a remote logging destination is a bit more involved, but is still not overly complicated. A typical syslog-ng.conf file (usually located in /etc/syslog-ng) will have a source entry like this:

source src {
    internal();
    unix-dgram("/dev/log");
};

This is the entry that allows it to listen on the /dev/log socket for incoming local logs. To forward all of these logs to a remote server, we first add a destination pointing at our remote server, 192.168.10.10:

destination d_remote { udp("192.168.10.10"); };

Then, we add a statement that forwards all logs:

log { source(src); destination(d_remote); };

Restart syslog-ng (the init script is named syslog on some distributions):

/etc/init.d/syslog-ng restart

Now the server should be forwarding all of its local logs to the remote syslog server.

Collecting Windows Server Logs

Windows Server 2008 introduced a new feature in which servers can subscribe to logs from another server. Unfortunately, Microsoft implemented this in a proprietary way, which means it is not syslog-compatible. Luckily, there is a good solution: Eventlog-to-Syslog (evtsys). Evtsys works on all Windows versions and is available in 32- and 64-bit builds. Installation could not be simpler: download the executable from the site and run it from a command line:

evtsys.exe -i -h my.syslog.server.address

Done! There are also a number of enterprise options for configuring backup syslog servers, as well as for fine-tuning, via the registry, which events are sent. See the evtsys documentation for more details.

The great thing about evtsys, in addition to its very small footprint and ease of install, is that it will by default forward all event log categories, including application-specific categories like SQL Server. ELSA has a built-in parser for events forwarded by evtsys and will parse them so that producing reports on event ID and other characteristics is possible.

For ultra-verbose logging, you can enable Windows process accounting, which creates a log entry for every process launched. This creates a veritable torrent of logs, but ELSA has the horsepower to take them in stride, making them available in case of a breach. It is nearly impossible for an attacker to infiltrate a server and do damage without starting any new processes. Logging Active Directory account creations alone makes this a worthwhile endeavor.
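As a sketch, on Windows Server 2008 and later, one way to turn this on is with auditpol (on older versions, the equivalent is the Audit Process Tracking setting under Group Policy):

auditpol /set /subcategory:"Process Creation" /success:enable

Once enabled, each process launch generates an event (ID 4688 on 2008-era systems, 592 on older ones) that evtsys will forward along with everything else.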

Evtsys works on Windows desktops just as well as servers. Malware hunting is much easier when you have a log of all the processes created on a machine during the installation of a rootkit.

Collecting Miscellaneous Logs

Applications on servers often generate very helpful, verbose logs which provide a critical view into the business logic of the app. The only way to catch particularly sophisticated attacks is through monitoring of that business logic, because no observable exploits or attacks will be used. Unfortunately, most apps log to flat files instead of the system's built-in logging facility, and forwarding flat files is often more challenging than it should be. However, there are a few tricks for taking flat-file logs on a server and streaming them as syslog, which I will detail below.

On Windows, you will need to install yet another third-party program to perform the logging. It's called Epilog and it's available from Intersect Alliance. This small program will run as a Windows service and stream all files that match a pattern in configured directories as syslog.

Linux makes this much easier if you have a recent version of Syslog-NG. Check out this excellent post from Syslog-NG's makers, Balabit, on how to set up Syslog-NG 3.2 for forwarding flat files. Of particular interest is the ability to use wildcards to specify intermediate directories like this:

/var/log/apache2/*/*.log

which would allow a web server with a lot of virtual hosts on it to easily forward all of its logs.
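A sketch of what such a configuration might look like, assuming the file() source options described in the Balabit post (option names vary between syslog-ng versions, so treat this as a starting point):

source s_apache {
    file("/var/log/apache2/*/*.log" follow-freq(1) flags(no-parse));
};

log { source(s_apache); destination(d_remote); };

The flags(no-parse) option tells syslog-ng to treat each line as an opaque message rather than trying to parse a syslog header out of it.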

Creating Your Own Log Generators

Sometimes you will find that there isn't a good existing source for the information you want to get into ELSA. I wanted an easy, efficient way to record the URLs requested on a network into ELSA so that I could correlate them with IDS alerts. Unfortunately, we didn't use a web proxy, so there was no easy way of logging this. So, I created httpry_logger to address the issue. It forwards every request, with the response code and size, as a log like this:

10.124.19.12|66.35.45.157|GET|isc.sans.org|/diary.html?storyid=10501&rss|-|Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.14) Gecko/20110221 Ubuntu/10.10 (maverick) Firefox/3.6.14|,org,sans.org,isc.sans.org|301|260|8583

ELSA parses this output and creates fields like this:

host=10.68.2.28 program=url class=URL srcip=10.124.19.12 dstip=66.35.45.157 status_code=301 content_length=260 country_code=US method=GET site=isc.sans.org uri=/diary.html?storyid=10501&rss referer=- user_agent=Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.14) Gecko/20110221 Ubuntu/10.10 (maverick) Firefox/3.6.14 domains=,org,sans.org,isc.sans.org

Notice how the domains field includes the comma-separated list of possible subdomains for easy searching. So, an ELSA search for “sans.org” will return all results for web requests to sites under the sans.org domain.

Log What You Can

Even if you're unable to get every log source you want streaming logs, don't let that stop you from getting the quick wins under your belt by enabling logging where you can. Remember, the benefits are linear: the more you log, the more benefit you get. Ignore perfection and concentrate on progress!


Monday, March 28, 2011

Fighting APT with Open-source Software, Part 1: Logging

Just because Advanced Persistent Threat (APT) is a marketing buzzword doesn't mean that it isn't a problem. The Cisco Security Blog had a fantastic post detailing what APT is, what it is not, and what it takes to defend against it. From the post: “The state of the art in response to APT does not involve new magic software hardware solution divining for APT, but relies more on asking the right questions and being able to effectively use the existing detection tools (logging, netflow, IDS and DPI).”

The article then goes on to detail exactly what you need to combat APT. As they stated, it is in fact NOT a product. It is a collection of information and tools which provides a capability utilized by a team in a perpetual investigatory process. They are dead-on as they describe what you need. Here is my paraphrased reproduction:

  1. A comprehensive collection of logs and the ability to search and alert on them.

  2. Network intrusion detection.

  3. A comprehensive collection of network connection records.

  4. Information sharing and collaboration with other orgs.

  5. The ability to understand the malware you collect.

I'm going to add another requirement of my own:

  6. The ability to quickly view prior network traffic to gain context for a network event and collect network forensic data.

These items shouldn't be a huge shock to anyone, and are probably already on a to-do list somewhere in your organization. It's like asking a doctor what you should do to be healthy: she'll say to exercise and eat right. She will certainly not prescribe diet pills. But much like some people find a workout schedule that works for them, I'm going to detail the implementations and techniques that work for us and will probably work for you.

There is a lot of ground to cover here, so I am going to address solutions to these tasks in a series of posts which detail what is needed to fulfill the above requirements and how it can be done with 100% open-source software.

In this introductory post, I'll tackle the biggest, most important, and perhaps most familiar topic: logs.

Enterprise Log Management (Omniscience Through Logging)

Producing and collecting logs is a crucial part of fighting APT because it allows individual assets and applications to record events that by themselves may be insignificant but together may be an indicator of malicious activity. There is no way to know ahead of time which logs are important, so all logs must be generated and collected. APT will not generate “error” logs unless you are very lucky; it's the “info” (and sometimes “debug”) logs that have the good stuff.

The first major hurdle for most organizations is collecting all of the relevant information and putting it in one place (at least from a query standpoint). There are a lot of reasons why this task is so difficult. The first is that, historically, log collection is just not sexy. It's hard for people to get excited about it, and it takes a herculean effort to do it effectively. Unless you have a passion for it, it's not going to get done. Sure, it's easy enough to get a few logs collected, but for effective defense, you're going to need comprehensive logging. This is generally accomplished by enabling logging on every asset and sending it all to a SIEM or log management solution. It's a daunting task, and it is one of the biggest reasons why fighting APT is so hard. Omniscience does not come easily or cheaply.

If you have the money, there are a lot of commercial SIEM and log management solutions out there that can do the job. Balabit makes a log collection product with a solid framework, ArcSight has an excellent reputation for its SIEM, and I can personally vouch for Splunk as a terrific log search and reporting product. However, large orgs will have massive amounts of logs, and large-scale commercial deployments are going to be extremely expensive (easily six figures or more). There are a number of free, open-source solutions that provide a means for log collection, searching, and alerting, but they are not designed to scale to collecting all events from a large organization while still making that data full-text searchable with millisecond response times. That kind of functionality costs a lot of money.

Building Big

Almost two years ago, I set out to create a log collection solution that would allow Google-fast searching on a massive set of logs. The problem was twofold: find a syslog server that could receive, normalize, and index logs at a very high rate, and find a database that could query those logs at Google speeds, all with massive scalability. I have to say that when I first started, I believed the task was impossible, but I was glad to prove myself wrong.

The first breakthrough was finding Syslog-NG version 3, which includes support for the pattern-db parser. It allows Syslog-NG to be given an XML file specifying log patterns to normalize into fields which can be inserted into a database. It does this with a high-speed pattern matching algorithm (Aho-Corasick) instead of a traditional regular expression. This allows it to parse logs at over 100k logs/second on commodity hardware. Combined with MySQL's ability to bulk load data at very high rates (over 100k rows/second), I had an extremely efficient mechanism for getting the logs from the network, parsed, and stored in a database.
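To give a feel for the format, here is a minimal sketch of a pattern-db rule for sshd logins; the rule IDs and field names are illustrative, so consult the Syslog-NG pattern-db documentation for the real schema:

<patterndb version='3' pub_date='2011-03-28'>
  <ruleset name='sshd' id='1'>
    <pattern>sshd</pattern>
    <rules>
      <rule provider='example' id='2' class='system'>
        <patterns>
          <pattern>Accepted @ESTRING:auth_method: @for @ESTRING:user: @from @ESTRING:srcip: @port @NUMBER:port@</pattern>
        </patterns>
      </rule>
    </rules>
  </ruleset>
</patterndb>

The @ESTRING@ and @NUMBER@ parsers pull fields like user and srcip out of the message, which is what makes the database columns (and later, ELSA's field searches) possible.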

The second task, finding an efficient indexing system, was much more challenging. After trying about a half-dozen different indexing techniques and technologies, including native MySQL full-text, MongoDB, TokuDB, HBase, Lucene, and CouchDB, I found that none of them were even close to being fast enough to keep up with a sustained log stream of more than a few thousand logs per second when indexing each word in the message. I was especially surprised when HBase proved too slow, as it's the open-source version of what Google uses.

Then I found Sphinx (sphinxsearch.com), which specializes in open-source full-text search for MySQL. Sphinx was able to index log tables at rates of 50k logs/second, and it provided a huge added feature: distributed group-by functionality. So, armed with Syslog-NG, MySQL, and Sphinx, I was able to put together a formal Perl framework to manage bulk loading the log files written by Syslog-NG into MySQL and indexing the new rows.
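The distributed piece is configured on the Sphinx side. A minimal sketch of a distributed index in sphinx.conf might look like this (the host and index names are illustrative):

index logs_dist
{
    type  = distributed
    local = logs_local
    agent = node2.example.com:9312:logs_local
    agent = node3.example.com:9312:logs_local
}

A query against logs_dist fans out to every agent in parallel and merges the results, which is what lets searches and group-bys scale across n nodes.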

That all proved to be the easy part. Writing a web frontend and middleware server around the whole distributed system proved to be the tougher challenge. Many thousands of lines of Perl and Javascript later, I had a web app that used the industry standard Yahoo User Interface (YUI) to manage searching, reporting, and alerting on the vast store of logs available.

Introducing Enterprise Log Search and Archive (ELSA)


ELSA collects and indexes syslog data as described above, archives and deletes old logs when they reach configured ages, and sends out alerts when preset searches match new logs. It is 100% web-based, for both use and administration.

Main features:

  • Full-text search on any word in a message or parsed field.

  • Group by any field and produce reports based on results.

  • Schedule searches.

  • Alert on search hits on new logs.

  • Save searches, email saved search results.

  • Create incident tickets based on search results (with plugin).

  • Complete plugin system for results.

  • Export results as permalink or in Excel, PDF, CSV, and HTML.

  • Full LDAP integration for permissions.

  • Statistics for queries by user and log size and count.

  • Fully distributed architecture, can handle n nodes with all queries executing in parallel.

  • Compressed archive with better than 10:1 ratio.



One of the biggest differences in requirements between a large-scale, enterprise logging solution and your average log collector is assigning permissions to the logs so that users receive only the logs they are authorized for. ELSA accomplishes this by assigning each log a class when it is parsed and allowing administrators to assign permissions based on a combination of log class, sending host, and generating program. The permissions can be tied either to local database users for small implementations, or to LDAP group names if an LDAP directory is configured.

Permissions are a crucial and powerful component of any comprehensive logging solution. They give the security department the power to let web developers have access to the logs specific to their web site to look for problems, without allowing them access to sensitive logs. The site authors may be the most qualified to notice suspicious activity because they have the most knowledge of what is normal. The same goes for administrators and developers in other areas of the enterprise.

However, the biggest win for the security department is that log queries finish quickly. Ad-hoc searches on billions of logs finish in about 500-2000 milliseconds. This is critical, because it allows security analysts to explore hunches and build context for the incident they are analyzing without having to decide whether a query is worthwhile before running it. That is, they are free to guess, hypothesize, and explore without being penalized by having to wait around for results. This means that the data from a seed incident may quickly blossom into supporting data for several other, tangentially related incidents because of a common piece of data. It means that the full extent and context of an incident becomes apparent quickly, allowing the analyst to decide whether the incident warrants further action or can be set aside as a false positive.

Getting ELSA

ELSA is available under GPLv2 licensing at http://code.google.com/p/enterprise-log-search-and-archive/ . Please see the INSTALL doc for specifics, but the basic components, as mentioned above, are Linux (untested on *BSD), Syslog-NG 3.1, MySQL 5.1, Sphinx search, Apache, and Perl. It is a complex system and will require a fair amount of initial configuration, but once it is up and running, it will not need much maintenance or tuning. If you run into issues, let me know and I will try to help you get up and running.