Saturday, March 31, 2012

Why ELSA Doesn't Use Hadoop for Big Data

Of late I've been reading and hearing a lot about Apache Hadoop whenever the topic is Big Data.  Hadoop solves the Big Data problem:  How do I store and analyze data that is of an arbitrary size?

Apache's answer:

Hadoop is incredibly complicated, and though it used to be a pain to set up and manage, things have improved quite a bit.  However, it is still computationally inefficient when compared to non-clustered operations.  A famous recent example is the RIPE 100GB pcap analysis on Hadoop.  The article brags about being able to analyze a 100GB pcap on 100 Amazon EC2 instances in 180 seconds.  This performance is atrocious.  A simple bash script that breaks a pcap into smaller parts by time using tcpslice in parallel and pipes them to tcpdump will analyze an 80GB pcap in 120 seconds on a single server with four processors.  By those numbers, you pay roughly a 100x penalty for using Hadoop.
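
The penalty can be checked with some back-of-the-envelope arithmetic.  This is only a rough sketch: it treats an EC2 instance and the four-processor server as the same unit of "machine" (they are not), and the helper name is made up for illustration.

```python
def machine_seconds_per_gb(machines, seconds, gigabytes):
    """Cost of an analysis in machine-seconds per GB processed."""
    return machines * seconds / gigabytes

# Figures quoted in the paragraph above.
hadoop = machine_seconds_per_gb(machines=100, seconds=180, gigabytes=100)
single = machine_seconds_per_gb(machines=1, seconds=120, gigabytes=80)
penalty = hadoop / single

print(f"Hadoop:        {hadoop:.1f} machine-seconds/GB")
print(f"Single server: {single:.1f} machine-seconds/GB")
print(f"Penalty:       {penalty:.0f}x")
```

By this naive measure the gap comes out at the same order of magnitude as the 100x figure.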

But you don't use Hadoop because you want something done quickly, you use it because you don't have a way to analyze data at the scale you require.  Cisco has a good overview of why orgs choose Hadoop.  The main points from the article, Hadoop:
  •  Moves the compute to the data
  •  Is designed to scale massively and predictably
  •  Is designed to support partial failure and recoverability
  •  Provides a powerful and flexible data analytics framework

Let's examine these reasons one-by-one:

Move the Compute to the Data
If your solution uses servers that have the data they need to do the job on local disk, you've just moved the compute to the data.  From the article:

"this is a distributed filesystem that leverages local processing for application execution and minimizes data movement"

Minimizing the data movement is key.  If your app is grepping data on a remote file server in such a way that it's bringing the data back to look at, then you're going to have a bottleneck somewhere.

However, there are a lot of ways to do this.  The principle is a foundation of ELSA, but ELSA doesn't use Hadoop.  Logs are delivered to ELSA and, once stored, never migrate.  Queries are run against the individual nodes, and the results of the query are delivered to the client.  The data never moves and so the network is never a bottleneck.  In fact, most systems do things this way.  It's what normal databases do.

Scale Massively and Predictably
"Hadoop was built with the assumption that many relatively small and inexpensive computers with local storage could be clustered together to provide a single system with massive aggregated throughput to handle the big data growth problem."

Amen.  Again, note the importance of local disk (not SAN/NAS disk) and how not moving the data allows each node to be self-sufficient.

In ELSA, each node is ignorant of the other nodes.  This guarantees that when you add a node, it will provide exactly the same amount of benefit as the other nodes you added provide.  That is, its returns will not be diminished by increased cluster synchronization overhead or inter-node communications.

This is quite different from traditional RDBMS clustering, which requires a lot of complexity.  Hadoop and ELSA both solve this, but they do it in different ways.  Hadoop tries to make the synchronization as lightweight as possible, but it still requires a fair amount of overhead to make sure that all data is replicated where it should be.  Conversely, ELSA provides a framework for distributing data and queries across nodes in such a way that no synchronization is done whatsoever.  In fact, one ELSA log node may participate in any number of ELSA frontends.  It acts as a simple data repository and search engine, leaving all metadata, query overhead, etc. up to the frontend.  This is what makes it scalable, but also what makes it so simple.

Support Partial Failure and Recoverability 
"Data is (by default) replicated three times in the cluster"

This is where the Hadoop inefficiencies start to show.  Now, this is obviously a design decision to use 3x the amount of disk you actually need in favor of resiliency, but I'd argue that most of the data you're using is "best-effort."  If it's not, that's fine, but know up-front that you're going to be paying a massive premium for the redundancy.  The premium is two-fold: the raw disk needed plus the overhead of having to replicate and synchronize all of the data.

In our ELSA setup, we replicate our logs from production down to development through extremely basic syslog forwarding.  That is a 2x redundancy that gives us the utility of a dev environment and the resiliency of having a completely separate environment ready if production fails.  I will grant, however, that we don't have any fault tolerance on the query side, so if a node dies during a query, the query will indeed fail or have partial results. We do, however, have a load balancer in front of our log receivers which detects if a node goes down and reroutes logs accordingly, giving us full resiliency for log reception.  I think most orgs are willing to sacrifice guaranteed query success for the massive cost savings, as long as they can guarantee that logs aren't being lost.

Powerful and Flexible Data Analytics Framework
Hadoop provides a great general purpose framework in Java, and there are plenty of extensions to other languages.  This is a huge win and probably the overall reason for Hadoop's existence.  However, I want to stress a key point:  It's a general framework and not optimized for whatever task you are giving it.  Unless you're performing very basic arithmetic, the operations you are doing will be slower than a native program.  It also means that your map and reduce functions will be generic.  For instance, in log parsing on Hadoop, you've distributed the work to many nodes, but each node is only doing basic regular expressions, and you will have to custom-code all of the field and attribute parsers yourself.  ELSA uses advanced pattern matchers (Syslog-NG's PatternDB) to be incredibly efficient at parsing logs without using regular expressions.  This allows one ELSA node to do the work of dozens of standard syslog receivers.

One could certainly write an Aho-Corasick-based pattern matcher that could be run in Hadoop, but that is not a trivial task, and provides no more benefit than the already-distributed workload of ELSA.  So, if what you're doing is very generic, Hadoop may be a good fit.  Very often, however, the capabilities you gain from distributing the workload will be eclipsed by the natural performance of custom-built, existing apps.


ELSA Will Always Be Faster Than Hadoop
ELSA is not a generic data framework like Hadoop, so it benefits from not having the overhead of:
  •  Versioning
  •  3x replication
  •  Synchronization
  •  Java virtual machine
  •  Hadoop file system
Here's what it does have:

Unparalleled Indexing Speed
ELSA uses Sphinx, and Sphinx has the absolute fastest full-text indexing engine on the planet.  Desktop-grade hardware can see as many as 100k records/second indexed from standard MySQL databases with data rates above 30 MB/sec of data indexed.  It does this while still storing attributes to go along with each record.  It is this unparalleled indexing speed which is the single largest factor for why ELSA is the fastest log collection and searching solution.

Efficient Deletes
Any logging solution is dependent on the efficiency of deletes once the dataset has grown to the final retention size.  (This is often overlooked during testing because a full dataset is not yet present.)  Old logs must be dropped to make room for the new ones.  HBase (the NoSQL database for Hadoop) does not delete data!  Rather, data is marked for later deletion, which happens during compaction.  Now, this may be OK for small or sporadically large workloads, but ELSA is designed for speed and write-heavy workloads.  HBase must suffer the overhead of deleted data (slower queries, more disk utilization) until it gets around to doing its costly compaction.  ELSA has extremely efficient deletes by simply marking an entire index (which encompasses a time range) as expired and issuing a re-index, which effectively truncates the file.  Not having to check each record in a giant index to see if it should be deleted is critical for quickly dumping old data.
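
The idea of expiring a whole time range at once can be illustrated with a toy store.  This is a minimal sketch, not ELSA's or Sphinx's implementation; the class and its parameters are invented for the example.

```python
class TimePartitionedStore:
    """Toy log store with one bucket ("index") per time range."""

    def __init__(self, bucket_seconds, max_buckets):
        self.bucket_seconds = bucket_seconds
        self.max_buckets = max_buckets
        self.buckets = {}  # bucket start time -> list of records

    def add(self, timestamp, message):
        start = timestamp - (timestamp % self.bucket_seconds)
        self.buckets.setdefault(start, []).append((timestamp, message))
        # Retention: drop whole buckets, oldest first, never single records.
        while len(self.buckets) > self.max_buckets:
            del self.buckets[min(self.buckets)]

    def count(self):
        return sum(len(records) for records in self.buckets.values())

store = TimePartitionedStore(bucket_seconds=3600, max_buckets=2)
for ts in (0, 10, 3600, 3610, 7200):
    store.add(ts, f"log at {ts}")

print(sorted(store.buckets), store.count())
```

Expiring data costs one dictionary delete per bucket, no matter how many records the bucket holds, which is the property the paragraph above relies on.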

Unparalleled Index Consolidation Speed
It is the speed of compaction (termed "consolidation" in ELSA or "index merge" in Sphinx) which is the actual overriding bottleneck for the entire system during sustained input.  Almost any database or NoSQL solution can scale to tens of thousands of writes per second per server for bursts, but as those records are flushed to disk periodically, it is this flushing and the subsequent consolidation of disk buffers that dictates the overall sustainable writes per second.  ELSA consolidates its indexes at rates of around 30k records/second, which establishes its sustained receiving limit.
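
A small simulation shows why the consolidation rate, not the burst write rate, sets the sustained limit.  This is a simplified sketch with a made-up helper; only the 30k records/second figure comes from the text above, and the incoming rates are arbitrary.

```python
def backlog_after(seconds, incoming_rate, consolidation_rate):
    """Records still waiting for consolidation after `seconds` of steady input."""
    backlog = 0
    for _ in range(seconds):
        backlog = max(0, backlog + incoming_rate - consolidation_rate)
    return backlog

CONSOLIDATION_RATE = 30_000  # records/second, the figure quoted above

below_limit = backlog_after(60, 25_000, CONSOLIDATION_RATE)
above_limit = backlog_after(60, 50_000, CONSOLIDATION_RATE)
print(below_limit, above_limit)
```

Input below the consolidation rate never builds a backlog, while input above it grows the backlog every second until buffers run out.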

Purpose-built Specialized Features
Sphinx provides critical features for full-text indexing such as stopwords (to boost performance when certain words are very common), advanced search techniques including phrase proximity matching (as in when quoting a search phrase), character set translation features, and many, many more.


When to Use Hadoop
This is a description of why Hadoop isn't always the right solution to Big Data problems, but that certainly doesn't mean that it's not a valuable project or that it isn't the best solution for a lot of challenges.  It's important to use the right tool for the job, and thinking critically about what features each tool provides is paramount to a project's success.  In general, you should use Hadoop when:
  • Data access patterns will be very basic but analytics will be very complicated.
  • Your data needs absolutely guaranteed availability for both reading and writing.
  • There are inadequate traditional database-oriented tools which currently exist for your problem. 
Do not use Hadoop if:
  • You don't know exactly why you're using it.
  • You want to maximize hardware efficiency.
  • Your data fits on a single "beefy" server.
  • You don't have full-time staff to dedicate to it.
The easiest alternative to using Hadoop for Big Data is to use multiple traditional databases and architect your read and write patterns such that the data in one database does not rely on the data in another.  Once that is established, it is much easier than you'd think to write basic aggregation routines in languages you're already invested in and familiar with.  This means you need to think very critically about your app architecture before you throw more hardware at it.
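
As a rough illustration of such an aggregation routine, the sketch below runs the same query against several independent databases and merges the partial results.  In-memory SQLite stands in for whatever databases you already run; the table layout, host names, and function names are assumptions made for the example.

```python
import sqlite3
from collections import Counter

def make_node(rows):
    """Create an independent in-memory database holding its own slice of the logs."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE logs (host TEXT, msg TEXT)")
    db.executemany("INSERT INTO logs VALUES (?, ?)", rows)
    return db

def count_by_host(nodes, term):
    """Run the same query on every node and merge the partial counts."""
    query = "SELECT host, COUNT(*) FROM logs WHERE msg LIKE ? GROUP BY host"
    totals = Counter()
    for db in nodes:
        for host, n in db.execute(query, (f"%{term}%",)):
            totals[host] += n
    return dict(totals)

# No node needs to know about any other node.
nodes = [
    make_node([("web1", "login failed"), ("web2", "login ok")]),
    make_node([("web1", "login failed"), ("db1", "disk full")]),
]
print(count_by_host(nodes, "failed"))
```

Because the nodes share nothing, the loop over nodes could be run in parallel without changing the merge step.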

Friday, March 23, 2012

Deobfuscating XOR Executables

Several exploit kits are going to great lengths to obfuscate distributed binaries with a simple XOR key to evade network-based anti-virus and IDS.  This is a highly effective technique because it significantly burdens researchers and evades most signature-based detection.  At the very least, it creates trivial but annoying hurdles for analysis because an extra deobfuscation step must be made before putting a sample through the normal testing cycles in sandboxes, VirusTotal.com, etc.

As of tonight, StreamDB will now attempt to deobfuscate any streams it finds that have HTTP objects which contain unknown types of data using a couple of different algorithms.  It will display the still-obfuscated executable, but change the description to add "XOR obfuscated."  However, when extracting the file by object ID (streamdb/?oid=<oid>), as done when grabbing a file to submit to a sandbox or VirusTotal, it will return the deobfuscated version.

So, a file that normally would look like this:

0000000: 2f9f 9425 62c5 c425 2d9f 9425 2b9f 9b25  /..%b..%-..%+..%
0000010: d060 9425 979f 9425 2f9f 9425 6f9f 8e25  .`.%...%/..%o..%
0000020: 2f9f 9425 2f9f 9425 2f9f 9425 2f9f 9425  /..%/..%/..%/..%
0000030: 2f9f 9425 2f9f 9425 2f9f 9425 2f9f 9425  /..%/..%/..%/..%
0000040: 2f9e 9425 958f 942b 302b 9de8 0e27 9569  /..%...+0+...'.i
0000050: e2be 04b5 7bf7 fd56 0fef e64a 48ed f548  ....{..V...JH..H
0000060: 0ff2 e156 5bbf f640 0fed e14b 0fea fa41  ...V[..@...K...A
0000070: 4aed b472 46f1 a717 2295 b012 2f9f 9425  J..rF...".../..%
0000080: 2f9f 9425 2f9f 9425 2f9f 9425 2f9f 9425  /..%/..%/..%/..%
0000090: 2f9f 9425 2f9f 9425 2f9f 9425 2f9f 9425  /..%/..%/..%/..%
00000a0: 2f9f 9425 2f9f 9425 2f9f 9425 2f9f 9425  /..%/..%/..%/..%
00000b0: 2f9f 9425 2f9f 9425 2f9f 9425 2f9f 9425  /..%/..%/..%/..%
00000c0: 2f9f 9425 2f9f 9425 2f9f 9425 2f9f 9425  /..%/..%/..%/..%
00000d0: 2f9f 9425 2f9f 9425 2f9f 9425 2f9f 9425  /..%/..%/..%/..%
00000e0: 2f9f 9425 2f9f 9425 2f9f 9425 2f9f 9425  /..%/..%/..%/..%
00000f0: 2f9f 9425 2f9f 9425 2f9f 9425 2f9f 9425  /..%/..%/..%/..%

Will be correctly downloaded to look like this:

0000000: 4d5a 5000 0200 0000 0400 0f00 ffff 0000  MZP.............
0000010: b800 0000 0000 0000 4000 1a00 0000 0000  ........@.......
0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000030: 0000 0000 0000 0000 0000 0000 0001 0000  ................
0000040: ba10 000e 1fb4 09cd 21b8 014c cd21 9090  ........!..L.!..
0000050: 5468 6973 2070 726f 6772 616d 206d 7573  This program mus
0000060: 7420 6265 2072 756e 2075 6e64 6572 2057  t be run under W
0000070: 696e 3332 0d0a 2437 0000 0000 0000 0000  in32..$7........
0000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000100: 5045 0000 4c01 0300 195e 422a 0000 0000  PE..L....^B*....
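
One simple way to recover a repeating XOR key is sketched below.  It is not necessarily what StreamDB does (its algorithms are not described here); it only relies on the observation that executables contain long runs of zero bytes, so the most common block in the obfuscated file is likely the key itself, as with the repeated 2f9f 9425 in the first dump.  The sample data and function names are made up for illustration.

```python
from collections import Counter

def guess_xor_key(data, key_len=4):
    """Guess a repeating XOR key as the most common aligned block of key_len bytes."""
    blocks = [data[i:i + key_len] for i in range(0, len(data) - key_len + 1, key_len)]
    return Counter(blocks).most_common(1)[0][0]

def xor_with_key(data, key):
    """XOR data with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Hypothetical sample: a tiny fake "executable" with a zero-padded header.
plain = b"MZ" + b"\x00" * 62 + b"This program must be run under Win32"
obfuscated = xor_with_key(plain, bytes.fromhex("2f9f9425"))

key = guess_xor_key(obfuscated)
recovered = xor_with_key(obfuscated, key)
print(key.hex(), recovered[:2])
```

A real implementation would also have to try other key lengths and offsets and confirm the result, for example by checking for the MZ and PE markers.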

Additionally, there is an experimental feature for auto-submission to VirusTotal.com if you configure the API key in the streamdb.conf file and download the Perl API for VirusTotal.  There is also an example of submitting the code to an internal sandbox, like CuckooBox.org via your own custom submission page.  To submit a sample, add submit=executable (or any type of file) to the StreamDB query.

Sunday, March 11, 2012

Detection is the New Prevention

Like most large orgs, over the past few months, we've been dealing with many phishing campaigns which spread from infected webmail accounts, scrape the contact list, and propagate their phishing payload.  The catch is that when the spam comes from whitelisted orgs, the phishing all gets delivered because the spam filters either completely whitelist or are less guarded against mail coming from the authenticated mail servers of large orgs.  As a result, the success rate is currently higher because so many legitimate accounts are compromised in so many orgs across the globe.

After a few painful incidents in which it took days to clean all of the infected accounts, I'm pleased to say we've honed our incident response plan for this scenario to the point where it is very effective:

Prevention
Our user awareness training teaches that any suspicious emails are to be reported immediately as phishing instead of normal spam.  The recent incidents have kept this in the forefront of our users' minds.  In the future, we may begin using testing services or our own SP Toolkit instance to simulate phishing campaigns.

Detection
Our users became our initial detection mechanisms.  With the spam gateways letting lots of phishing messages through and with phishing subject lines that are too varied to filter manually, the last line of defense and chance for detection is the user.

Containment
We streamlined and formalized our procedures for mass phishing scenarios.  Our staff in key roles knew ahead of time exactly what to do and when.  Specifically:
  1. Lock an account
  2. Force a logout
  3. Purge all email boxes of a specific subject line
  4. Monitor for successful phishes
  5. Block related IP's and web sites

Monitoring for successful phishes is where ELSA comes in.  With ELSA logging URL's, we were able to take a given phishing email and monitor for any access to the phishing URL's.  StreamDB would then be used to pull out the POST data to show exactly which user submitted credentials to the phishing site so that containment procedures could be followed.

However, ELSA gives us the ability to go deeper than that and use generalizations from the phishing campaigns our users notice to find phishing that has gone unreported.

Fishing for Phishing with ELSA
In our case, a user reported a phishing email which had a URL of:
hxxp://www.pirini-trade .hr/narudzba/use/web/form1.html
My staff searched ELSA for:
site:www.pirini-trade.hr
No hits, so that particular email failed in its phishing attempt.  However, what if other phishing emails went out with a similar format?  Let's try some other searches:
uri:narudzba
Nope.
uri:"use/web/form1.html"
Nope.
How about this?
+method:POST +uri:use +uri:form1.html
Nope.
Just this:
+uri:use +uri:form1.html
Ok, now we've got some very interesting hits, but no POST's:
GET www.charlesgorley.ca /form/use/Text/form1.html
That looks very similar, but did anyone post data?
+method:POST +site:www.charlesgorley.ca
Hit!
POST www.charlesgorley.ca /form/use/Text/process.php
So, we can use this for our query:
+method:POST +uri:use +referer:form1.html
Set it to alert, and now we've got some decent coverage for variants of this campaign.  If it continues, writing this into a Suricata alert may even be a good idea.  In the meantime, we initiate our incident response plan for this user account.
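
The logic of that final query can be sketched outside of ELSA as a simple filter over URL records.  This is only an illustration: substring matching stands in for ELSA's term matching, the record layout is invented, and the referer values are assumed because the hits above do not show them.

```python
def matches(record, required):
    """True if every required field contains its term (case-insensitive)."""
    return all(term.lower() in record.get(field, "").lower()
               for field, term in required.items())

# Equivalent of: +method:POST +uri:use +referer:form1.html
query = {"method": "POST", "uri": "use", "referer": "form1.html"}

# Records modeled on the hits above; the referer values are assumed.
records = [
    {"method": "GET", "site": "www.charlesgorley.ca",
     "uri": "/form/use/Text/form1.html", "referer": ""},
    {"method": "POST", "site": "www.charlesgorley.ca",
     "uri": "/form/use/Text/process.php",
     "referer": "www.charlesgorley.ca/form/use/Text/form1.html"},
]

hits = [r for r in records if matches(r, query)]
for r in hits:
    print(r["method"], r["site"], r["uri"])
```

Keying on the referer rather than the POST target is what lets the query catch variants whose form handler has a different name.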

Detection as Prevention
So, this is great, but how is this prevention if the account was already compromised?  In the case of this example, the POST we found was from two days ago.  The key concept to understand is that even though the credentials are compromised, they are not used immediately.  In fact, we have noticed an average turnaround of six days between account compromise and account misuse.  The criminals harvesting the credentials have staff, code, and infrastructure to maintain just like the good guys, so all but the very best lack automation when it comes to actually utilizing the accounts for further phishing.  In fact, our experience has shown an organizational structure much more akin to sweatshop labor than elaborate and sophisticated technology.  Specifically, it is more economical for the criminals to pay low-wage workers to perform the mundane tasks of exploiting compromised accounts to send spam than it is to write scripts to do it.

How do we know this?  The attackers in our previous phishing attacks have all been from a few distinct geographic locations (Spain, France, and Nigeria) and have a distinct operational signature.  They use a browser plugin called "Crazy Browser" which allows a basic Internet Explorer instance to have multiple sub-windows open at the same time in one IE window.  The most likely reason to do this is for easy copy-and-paste between their personal email from which they receive orders from their criminal boss and the various webmail accounts they're logged into for the purposes of spamming.  The point is, if you need a lot of browser windows open, then you're obviously not doing a lot of scripting, and there is most certainly a human involved in the account use.

So, why is this important in shaping our IR plan?  It means that if we know it will be several days between account compromise and account misuse, we can lock down the account before it is misused with sufficiently fast detection.  Yes, the initial prevention via spam filter failed, and the phishtank.com coverage did not show up until a day after the phish was delivered, but from a business perspective, we were able to detect the compromise in time to prevent it from becoming a problem for the business.  ELSA enhances our detection abilities by letting us retroactively investigate phishes based on new leads.

I'm not the first to suggest that detection is the new prevention, but I think it's important to use examples to show how the two are intertwined, and what that means for the business.  Specifically, it means that visibility and analytics are more valuable than (or at least necessary alongside) traditional preventative measures, and we as security professionals need to arm our orgs with the tactical resources to provide these capabilities.

Thursday, March 8, 2012

Correlation in ELSA

In my previous post, I showed how complex the data sets in security have become.  In order to cope with this amount of heterogeneous records, powerful correlation capabilities are required.  I'm pleased to announce that as of today, basic correlation is now possible in ELSA through the new subsearch() transform.

Let's look at an example.  A recent investigation was focusing on what sites were hosting exploit kit materials.  We had a starting point in that we knew a given IP address was hosting an exploit kit, and we wanted to see what other sites hosted related material.  The initial search was:

+dstip:82.192.87.28 groupby:site
   
freshnewstoday.org
893g.in
panam-airline.ru
www.fruitcoctails.com
hife.in
hub7.yourgreatfind.info

So, the question was, what other IP's were also hosting these site names?  This will lead us to other hostile IP's which could be actionable intelligence for blocking at the firewall.  For this, we use the new subsearch() transform:

+dstip:82.192.87.28 groupby:site | subsearch(class:url groupby:dstip)

Which yields all of the unique IP's:

82.192.87.28
218.85.136.44
199.19.53.1
208.67.222.222
8.28.16.201
123.108.111.67
199.254.31.1
74.54.56.23
199.249.112.1
199.19.54.1
184.173.149.222
192.33.14.30
216.69.185.8
8.28.16.203
125.19.40.90

Now, let's say we are only concerned with the non-US IP addresses.  We can apply one of the previously existing transforms to do a whois lookup followed by a filter:

+dstip:82.192.87.28 groupby:site | subsearch(class:url groupby:dstip) | whois | filter(cc,us)

123.108.111.67 org=PANGNET cc=HK name=Pang International Limited descr=Pang International Limited
125.19.40.90 org=EXL-9803-nod cc=IN name=EXL EXL, A-48 SECTOR-58 NOIDA Uttar Pradesh India Contact Person: Mr.Neeraj Email: Neeraj.Jain@exlservice.com Phone:981809784 descr=EXL EXL, A-48 SECTOR-58 NOIDA Uttar Pradesh India Contact Person: Mr.Neeraj Email: Neeraj.Jain@exlservice.com Phone:981809784
218.85.136.44 org=CHINANET-FJ cc=CN name=CHINANET Fujian province network Data Communication Division China Telecom descr=CHINANET Fujian province network Data Communication Division China Telecom
82.192.87.28 org=LEASEWEB cc=NL name=LeaseWeb P.O. Box 93054 1090BB AMSTERDAM Netherlands www.leaseweb.com descr=LeaseWeb P.O. Box 93054 1090BB AMSTERDAM Netherlands www.leaseweb.com

Here something interesting has happened: because the last subsearch ended in a groupby which summarizes the dstip field, ELSA has rolled up all of the whois transform fields into an opaque string for presentation after filtering for the non-US addresses.  If we continue with another transform, ELSA will skip the string summary so that it can pass the results to the next transform in their native format.

So, let's say that we don't want to see the results for 82.192.87.28 because we already know about that one.  We can add another subsearch to filter that out:

+dstip:82.192.87.28 groupby:site | subsearch(class:url groupby:dstip) | whois | filter(cc,us) | subsearch(class:url -82.192.87.28 groupby:site)

We get back:

apps.db.ripe.net
whois.arin.net

Obviously, these are hits from our node's lookups and not relevant.  Why did these come up?  The subsearch actually ran this query:

class:url -82.192.87.28 groupby:site +(82.192.87.28 218.85.136.44 193.0.6.142 10.56.64.145 125.19.40.90 123.108.111.67)

So, any log in class URL that contains any of these terms (except 82.192.87.28 because it was negated) will match.  That means that a URL like

GET whois.arin.net/rest/ip/218.85.136.44

will be found in this match.  We don't want that.  Luckily, there is a way to further filter this by adding on a field name (dstip) as the second argument to subsearch like this:

+dstip:82.192.87.28 groupby:site | subsearch(class:url groupby:dstip) | whois | filter(cc,us) | subsearch(class:url -82.192.87.28 groupby:site,dstip)

Now we get no results back, which means that no clients visited sites on the other IP addresses: none of the IP's from the previous search showed up as a dstip in class URL.
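
The difference between the two forms of subsearch can be modeled in a few lines.  This is a simplified sketch of the behavior described above, not ELSA's code; the function names, record layout, and sample logs (which use documentation-range addresses) are all hypothetical.

```python
def groupby(records, field):
    """Distinct values of a field, in first-seen order."""
    return list(dict.fromkeys(r[field] for r in records if field in r))

def subsearch(records, values, field=None):
    """Keep records matching any value: anywhere in the record, or only in `field`."""
    def hit(record):
        if field is not None:
            return record.get(field) in values
        return any(v in str(x) for v in values for x in record.values())
    return [r for r in records if hit(r)]

# Hypothetical URL-class logs.
logs = [
    {"site": "bad.example", "dstip": "203.0.113.5", "uri": "/kit"},
    {"site": "whois.example", "dstip": "198.51.100.9", "uri": "/rest/ip/203.0.113.5"},
]
ips = ["203.0.113.5"]

any_field = groupby(subsearch(logs, ips), "site")
dstip_only = groupby(subsearch(logs, ips, "dstip"), "site")
print(any_field)
print(dstip_only)
```

Without a field name the whois-style lookup matches because the address appears in its URI; restricting the match to dstip removes it.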

Let's look at another example.  Find all clients who were attacked by an exploit kit like Blackhole:
+sig_msg:"exploit kit" groupby:srcip

Within those results, find the dstip's that any Java web requests went to:
| subsearch(+user_agent:java groupby:dstip)

Perform a whois transform and only include Russian sites:
| whois | grep(cc,ru)

Now find all sites hosted on these Russian IP's:
| subsearch(class:url groupby:site,dstip)

Found some more bad sites!

www.spbtraveler.com
cat.ms.softspb.com
svyaz62.findhere.org
svyaz61.findhere.org
vlhrusbi.athersite.com
sjpvmakd.3d-game.com
www.spbtraveler1.com
www.spbtraveler2.com
91.196.216.152
www.spbtraveler3.com
svyaz54.findhere.org

Correlation is a powerful tool for combining multiple simple notions to create a very sophisticated question.  The ELSA utility transforms, like whois, provide a good way of whittling down large amounts of data into something interesting for use in alerting and reporting.  It is sometimes the only way to tease out bits of data that would otherwise blend in with the vast amount of similar logs.

Friday, March 2, 2012

Picking Up the Pieces

(When AV Fails)

Managing alerts with ELSA makes it easy to unobtrusively monitor for events that aren't necessarily noteworthy in and of themselves.  One such alert I have is for Symantec Anti-virus alerts, which get logged to the Windows eventlog as event ID 51.  Using eventlog-to-syslog, all of our Windows event logs are forwarded to ELSA.  We regularly see cases in which Symantec logs an alert to the local Windows event log but fails to forward the event to the Symantec management server.  This lets us see the alerts that fall through the cracks.  So, using the search:

eventid:51 program:symantec_antivirus

I get an event back that looks like this:

51: Security Risk Found!Trojan Horse in File: E:\Profile\<user name redacted>\Application Data\BE6CCC87FECC\prf275.tmp by: Auto-Protect scan. Action: Quarantine succeeded : Access denied. Action Description: The file was quarantined successfully.

This log was sent by a network file server, not the client desktop.  By the looks of things, AV did its job and this host is fine.  Just to be sure, I wanted a closer look.  Taking the name of the user from the profile directory, I did a follow-up ELSA search for just the user's name:

<user name redacted>

Which returned the domain controller log showing me which workstation they were logged in from at the time:

672: NT AUTHORITY\SYSTEM: Authentication Ticket Request: User Name: <user name redacted> Supplied Realm Name: <redacted> User ID: %{S-1-5-21-3772051035-1815270206-2219246618-5753} Service Name: krbtgt Service ID: %{S-1-5-21-3772051035-1815270206-2219246618-502} Ticket Options: 0x40810010 Result Code: - Ticket Encryption Type: 0x17 Pre-Authentication Type: 2 Client Address: 10.102.210.89 Certificate Issuer Name: Certificate Serial Number: Certificate Thumbprint:

This told me the user was at IP 10.102.210.89.  I wanted to see what IDS alerts we got during that time, so I searched for:

10.102.210.89 class:snort

and got:

[1:2014170:2] ET POLICY HTTP Request to .su TLD (Soviet Union) Often Malware Related [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 10.102.210.89:2676 -> 78.159.112.131:80

Which is suspicious.  However, we've seen plenty of legitimate sites using .su domains, so this in and of itself doesn't prove anything.  Again, we need to investigate further:

10.102.210.89 class:url

shows:

10.102.210.89|78.159.112.131|GET|imagedumper.su|/imagedump/image.php?size=0&imageid=0&resize=0&data=&thumbnail=1|-|B7B2BAC0C0C0A4ABA3C2C09F92908A9185A8C6C6C1C4C7DEC5C7C3DEAE8FC2C6C0C4\\0\\AK-81|,su,imagedumper.su|200|22528|6869

This looks even more suspicious, but still isn't enough to confirm a compromise.  What else was the host up to?  I wanted to see a list of every site they went to within a minute or so:

10.102.210.89 class:url groupby:site

This returned a list of about a hundred sites, including some very suspicious looking site names:

75686f68.kz
626f6f686f6f.ws
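
Part of what makes these names stand out is that each label consists solely of hex digits.  A heuristic like the following could flag such names automatically.  It is only a sketch and not an ELSA feature; the function name and the minimum length of 8 are arbitrary choices.

```python
import string

def hex_label_decoded(hostname):
    """If the first DNS label is an even-length hex string, return its ASCII decoding."""
    label = hostname.split(".")[0].lower()
    if len(label) < 8 or len(label) % 2 or any(c not in string.hexdigits for c in label):
        return None
    decoded = bytes.fromhex(label)
    # Only report labels that decode to printable ASCII.
    return decoded.decode("ascii") if all(32 <= b < 127 for b in decoded) else None

for name in ("75686f68.kz", "626f6f686f6f.ws", "www.example.com"):
    print(name, "->", hex_label_decoded(name))
```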

Something is definitely up.  So, what was the content returned by the request to imagedumper.su?  I use the StreamDB plugin to quickly pull the packet content, which shows that this was actually an executable download:

Returning 4 of 2 at offset 0 from Fri Mar  2 07:51:51 2012 to Fri Mar  2 07:51:51 2012 (92 ms)

2012-03-02 07:51:51 10.102.210.89:3812 <- 78.159.112.131:80 5s 22968 bytes Time cutoff

GET imagedumper.su/imagedump/image.php?size=0&imageid=0&resize=0&data=&thumbnail=1

oid=18043-3966842627-452-0

GET /imagedump/image.php?size=0&imageid=0&resize=0&data=&thumbnail=1
Cache-Control: no-cache
Host: imagedumper.su
User-Agent: B7B2BAC0C0C0A4ABA3C2C09F92908A9185A8C6C6C1C4C7DEC5C7C3DEAE8FC2C6C0C4|0|AK-81
X-HTTP-Version: 1.1



200 MS-DOS executable PE  for MS Windows (GUI) Intel 80386 32-bit, UPX compressed

oid=18043-3966843079-22968-0

200 OK
Date: Fri, 02 Mar 2012 16:04:33 GMT
Retry-After: 61867
Server: Apache/2.2.16 (Debian)
Content-Length: 22528
Content-Type: text/html
Content-Disposition: inline
Content-Transfer-Encoding: binary
X-HTTP-Version: 1.1
X-Powered-By: PHP/5.3.3-7+squeeze7

MZ......................@.............................................    .!..L.!This program cannot be run in DOS mode.

$...........A...A...A...8...@.......@...RichA...........PE..L......C.....

So, I extract the executable using the object ID: streamdb/?oid=18043-3966843079-22968-0 and save that to my analysis machine.  I then upload it to the malwr.com online sandbox:

http://malwr.com/analysis/00b714468f1bc2254559dd8fd84186f1/

This shows me that the file should make a DNS request to 1051811156619.com.  Did this happen?  A quick ELSA search through Bro DNS logs confirms that it did:

1051811156619.com

shows:

1330696345.046917|2NMPpT5BTo|<DNS server redacted>|28707|209.236.76.107|53|udp|52601|1051811156619.com|1|C_INTERNET|1|A|0|NOERROR|F|T|F|F|F|0|207.112.46.80|3600.000000
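
For anyone handling such a record outside of ELSA, the line splits cleanly on the pipe character.  In this small sketch the field positions are read off this one log line rather than taken from a Bro schema, so treat them as assumptions.

```python
from datetime import datetime, timezone

line = ("1330696345.046917|2NMPpT5BTo|<DNS server redacted>|28707|209.236.76.107|53|udp|"
        "52601|1051811156619.com|1|C_INTERNET|1|A|0|NOERROR|F|T|F|F|F|0|207.112.46.80|3600.000000")

fields = line.split("|")
# Positions below are inferred from this line only.
when = datetime.fromtimestamp(float(fields[0]), tz=timezone.utc)
query, answer, ttl = fields[8], fields[21], float(fields[22])

print(when.isoformat(), query, answer, ttl)
```

Converting the epoch timestamp to UTC makes it easier to line the lookup up against the other log sources in the timeline.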

So, our host did in fact execute the binary and make the DNS lookup, despite the Symantec log indicating that it was quarantined!  What did our host do with the answer it got?  Another ELSA search for the DNS answer IP:

207.112.46.80

shows:

Deny tcp src inside:10.102.210.89/3820 dst OUTSIDE:207.112.46.80/20001 by access-group "CSM_FW_ACL_inside" [0xfeb273c6, 0x0]

So, our strict outbound firewall policy is denying our host from connecting to this IP on a weird port.  The host is definitely compromised, but the command-and-control (C2) channel is being blocked.  Based on this, I decide that we need to have an IDS signature for this.  The question is, was the imagedumper.su web request a check-in or was it just a loader?  A quick ELSA query

+uri:"/imagedump/image.php"

shows that there haven't been any other requests to any other sites using that URI and the host makes this request regularly, so this is a good one for a sig.  As a result, I submitted the following to Emerging Threats:

alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"ET TROJAN Win32/Kryptik.ABUD Checkin"; flow:established,to_server; content:"/imagedump/image.php"; http_uri; reference:url,http://malwr.com/analysis/00b714468f1bc2254559dd8fd84186f1/; classtype:trojan-activity; sid:x; rev:1;)

So, what appeared to be a simple case of AV quarantining a virus proved to be an indicator of a full compromise which had been going on for days.  Performing this kind of investigation without ELSA and StreamDB is certainly possible, but I guarantee it would've taken longer than the five minutes it took me, beginning to end.  In total, the following information was needed:
  • AV console logs/alerts from file server
  • Windows logs from AD controller
  • IDS logs
  • URL logs
  • DNS logs
  • Full packet capture
If any one of these information sources had been missing, then we would not have been able to confirm a compromise.  This just goes to show how complicated the security landscape is getting, and how much effort it takes to get it right.