Wednesday, October 3, 2012

Multi-node Bro Cluster Setup Howto

My previous post on setting up a Bro cluster was a good starting point for using all of the cores on a server to process network traffic in Bro.  This post shows how to take that a step further and set up a multi-node cluster spanning more than one server.  We'll also go further with PF_RING and install the custom drivers.

For each node:


We'll begin as before by installing PF_RING first:

Install prereqs
sudo apt-get install ethtool libcap2-bin make g++ swig python-dev libmagic-dev libpcre3-dev libssl-dev cmake git-core subversion ruby-dev libgeoip-dev flex bison
Uninstall conflicting tcpdump
sudo apt-get remove tcpdump libpcap-0.8
Make the PF_RING kernel module
cd
svn export https://svn.ntop.org/svn/ntop/trunk/PF_RING/ pfring-svn
cd pfring-svn/kernel
make && sudo make install
Make PF_RING-aware driver (for an Intel NIC, Broadcom is also provided). 
PF_RING-DNA (even faster) drivers are available, but they come with tradeoffs and are not required for less than one gigabit of traffic.
First, find out which driver you need
lsmod | egrep "e1000|igb|ixgbe|bnx"
If you have multiple drivers listed, which is likely, use lspci to see which one drives the tap or span interface you'll be monitoring.  Note that when you're installing drivers, you will lose your remote connection if the driver is also controlling the management interface.  I also recommend backing up the original driver that ships with the system.  In our example below, I will use a standard Intel gigabit NIC (igb).
find /lib/modules -name igb.ko
Copy this file for safe keeping as a backup in case it gets overwritten (unlikely, but better safe than sorry).  Now build and install the driver:
cd ../drivers/PF_RING_aware/intel/igb/igb-3.4.7/src
make && sudo make install
Load the new driver (this will take down any active links using the driver)
sudo rmmod igb && sudo modprobe igb
Build the PF_RING library and new utilities
cd ~/pfring-svn/userland/lib
./configure --prefix=/usr/local/pfring && make && sudo make install
cd ../libpcap-1.1.1-ring
./configure --prefix=/usr/local/pfring && make && sudo make install
# Add PF_RING to the ldconfig include list
echo "/usr/local/pfring/lib" | sudo tee -a /etc/ld.so.conf
sudo ldconfig
cd ../tcpdump-4.1.1
./configure --prefix=/usr/local/pfring && make && sudo make install
# Add the PF_RING tools to the PATH (single quotes keep $PATH unexpanded until login)
echo 'PATH=$PATH:/usr/local/pfring/bin:/usr/local/pfring/sbin' | sudo tee -a /etc/bash.bashrc


Create the Bro dir
sudo mkdir /usr/local/bro 

Set the interface-specific settings as root, assuming eth4 is your gigabit interface with an MTU of 1514:

rmmod pf_ring
modprobe pf_ring transparent_mode=2 enable_tx_capture=0
ifconfig eth4 down
ethtool -K eth4 rx off
ethtool -K eth4 tx off
ethtool -K eth4 sg off
ethtool -K eth4 tso off
ethtool -K eth4 gso off
ethtool -K eth4 gro off
ethtool -K eth4 lro off
ethtool -K eth4 rxvlan off
ethtool -K eth4 txvlan off
ethtool -s eth4 speed 1000 duplex full
ifconfig eth4 mtu 1514
ifconfig eth4 up
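
The offload settings above all follow one pattern, so they can be sketched as a loop.  This is only a minimal sketch, not part of the original setup: the offload_cmds function name is made up for illustration, and it deliberately prints the commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) the ethtool commands that switch off NIC offloads
# on one interface. "offload_cmds" is a hypothetical helper name.
offload_cmds() {
    iface="$1"
    # Same feature list, in the same order, as the commands above
    for feature in rx tx sg tso gso gro lro rxvlan txvlan; do
        echo "ethtool -K $iface $feature off"
    done
}

# Dry run for the example interface
offload_cmds eth4
```

Once the output looks right, it can be piped to a root shell, for example offload_cmds eth4 | sudo sh.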


Create the bro user:
sudo adduser bro --disabled-login
sudo mkdir /home/bro/.ssh
sudo chown -R bro:bro /home/bro

Now we need to create a helper script (called bro_init.sh below) that fixes permissions so our Bro user can run Bro in promiscuous mode.  You can put the script anywhere, but it needs to be run after each Bro update from the manager (broctl install).  I'm hoping to find a clean way of doing this in the future via the broctl plugin system.  The script looks like this:


#!/bin/sh
setcap cap_net_raw,cap_net_admin=eip /usr/local/bro/bin/bro
setcap cap_net_raw,cap_net_admin=eip /usr/local/bro/bin/capstats


On the manager:

Create SSH keys:
sudo ssh-keygen -t rsa -f /home/bro/.ssh/id_rsa
sudo chown -R bro:bro /home/bro

On each node, you will need to create a file called /home/bro/.ssh/authorized_keys and place the text from the manager's /home/bro/.ssh/id_rsa.pub in it.  This will allow the manager to log in without a password, which is needed for cluster admin.  We need to log in once to get each node's host key loaded into known_hosts locally.  So for each node, also execute:
sudo su bro -c 'ssh bro@<node> ls'

Accept the key when asked (unless you have some reason to be suspicious).

Get and make Bro
cd
mkdir brobuild && cd brobuild
git clone --recursive git://git.bro-ids.org/bro
cd bro
./configure --prefix=/usr/local/bro --with-pcap=/usr/local/pfring && cd build && make -j8 && sudo make install
cd /usr/local/bro

Create the node.cfg
vi etc/node.cfg
It should look like this:

[manager]
type=manager
host=<manager IP>

[proxy-0]
type=proxy
host=<first node IP>

[worker-0]
type=worker
host=<first node IP>
# Use whatever your monitoring interface is
interface=eth4
lb_method=pf_ring
# Set this to half the number of CPUs available
lb_procs=8



Repeat this for as many nodes as there will be.
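
Writing the repeated sections by hand gets tedious, so here is a minimal sketch that generates them.  It assumes that "repeat this" means one proxy and one worker section per node, which is my reading of the example and not something broctl requires.  The gen_node_cfg name and the 192.0.2.x addresses are placeholders.

```shell
#!/bin/sh
# Usage: gen_node_cfg MANAGER_IP INTERFACE LB_PROCS NODE_IP...
# Prints a node.cfg with one proxy-N and one worker-N section per node IP.
# Assumption: every node monitors the same interface name.
gen_node_cfg() {
    manager_ip="$1"; iface="$2"; procs="$3"
    shift 3
    printf '[manager]\ntype=manager\nhost=%s\n' "$manager_ip"
    i=0
    for node_ip in "$@"; do
        printf '\n[proxy-%d]\ntype=proxy\nhost=%s\n' "$i" "$node_ip"
        printf '\n[worker-%d]\ntype=worker\nhost=%s\n' "$i" "$node_ip"
        printf 'interface=%s\nlb_method=pf_ring\nlb_procs=%s\n' "$iface" "$procs"
        i=$((i + 1))
    done
}

# Example: a manager plus two monitoring nodes (placeholder addresses)
gen_node_cfg 192.0.2.1 eth4 8 192.0.2.11 192.0.2.12
```

Redirect the output to a scratch file and compare it with etc/node.cfg before replacing anything.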

Now, for each node, we need to create a packet filter to act as a poor man's load balancer.  You could always use a hardware load balancer to deal with this, but in our scenario, that's not possible, and all nodes are receiving the same traffic.  We're going to have each node focus on just its own part of the traffic stream, which it will then load balance internally, using PF_RING, across all its local worker processes.  To accomplish this, we're going to use a rather strange BPF that hashes each source/destination pair so that a given pair always lands on the same box.  The expression adds the last two bytes of the source address (ip[14:2]) to the last two bytes of the destination address (ip[18:2]) and takes the remainder after dividing by the number of nodes, written out as x - (n*(x/n)) using integer division.  This will load balance based on the IP pairs talking, but it may be suboptimal if you have some very busy IP addresses.

In our example, there will be four nodes monitoring traffic, so the BPF looks like this for the first node:
(ip[14:2]+ip[18:2]) - (4*((ip[14:2]+ip[18:2])/4)) == 0
So, in local.bro, we have this:
redef cmd_line_bpf_filter="(ip[14:2]+ip[18:2]) - (4*((ip[14:2]+ip[18:2])/4)) == 0";
On the second node, we would have this:
redef cmd_line_bpf_filter="(ip[14:2]+ip[18:2]) - (4*((ip[14:2]+ip[18:2])/4)) == 1";
Third:
redef cmd_line_bpf_filter="(ip[14:2]+ip[18:2]) - (4*((ip[14:2]+ip[18:2])/4)) == 2";
And fourth:
redef cmd_line_bpf_filter="(ip[14:2]+ip[18:2]) - (4*((ip[14:2]+ip[18:2])/4)) == 3";
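
To sanity-check which node a given conversation lands on, the same arithmetic can be reproduced outside of BPF.  This is a rough sketch for IPv4 only; the low16 and bpf_bucket function names are made up for illustration, and the addresses are arbitrary examples.

```shell
#!/bin/sh
# low16 A.B.C.D -> C*256 + D, i.e. what ip[14:2] / ip[18:2] read from the header
low16() {
    old_ifs="$IFS"; IFS=.
    set -- $1            # split the dotted quad into $1..$4
    IFS="$old_ifs"
    echo $(( $3 * 256 + $4 ))
}

# bpf_bucket SRC_IP DST_IP NODES -> index of the node whose filter matches
bpf_bucket() {
    src="$(low16 "$1")"
    dst="$(low16 "$2")"
    echo $(( (src + dst) % $3 ))
}

# (1*256+2) + (3*256+4) = 1030, and 1030 mod 4 = 2
bpf_bucket 10.0.1.2 192.168.3.4 4
```

Because addition is commutative, swapping source and destination gives the same bucket, so both directions of a conversation are seen by the same node.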

Special note:  If you are monitoring a link that is still VLAN tagged (like from an RSPAN), then you will need to put vlan <vlan id> && in front of each of the BPFs.

We wrap a check around these statements so that the correct one gets executed on the correct node.  The final version is added to the bottom of our /usr/local/bro/share/bro/site/local.bro file, which will be copied out to each of the nodes:

# Set BPF load balancer for 4 worker nodes
@if ( Cluster::node == /worker-0.*/ )
redef cmd_line_bpf_filter="(ip[14:2]+ip[18:2]) - (4*((ip[14:2]+ip[18:2])/4)) == 0";
@endif
@if ( Cluster::node == /worker-1.*/ )
redef cmd_line_bpf_filter="(ip[14:2]+ip[18:2]) - (4*((ip[14:2]+ip[18:2])/4)) == 1";
@endif
@if ( Cluster::node == /worker-2.*/ )
redef cmd_line_bpf_filter="(ip[14:2]+ip[18:2]) - (4*((ip[14:2]+ip[18:2])/4)) == 2";
@endif
@if ( Cluster::node == /worker-3.*/ )
redef cmd_line_bpf_filter="(ip[14:2]+ip[18:2]) - (4*((ip[14:2]+ip[18:2])/4)) == 3";
@endif
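
If the number of nodes changes, every divisor and remainder in that block has to change with it.  As a sketch only, a small generator keeps them consistent; gen_bpf_blocks is a hypothetical helper name, and the output should be reviewed before pasting it into local.bro.

```shell
#!/bin/sh
# Usage: gen_bpf_blocks NODES
# Prints one @if/@endif block per worker node, in the format used above.
gen_bpf_blocks() {
    nodes="$1"
    sum='(ip[14:2]+ip[18:2])'
    echo "# Set BPF load balancer for $nodes worker nodes"
    i=0
    while [ "$i" -lt "$nodes" ]; do
        printf '@if ( Cluster::node == /worker-%d.*/ )\n' "$i"
        printf 'redef cmd_line_bpf_filter="%s - (%d*(%s/%d)) == %d";\n' \
            "$sum" "$nodes" "$sum" "$nodes" "$i"
        printf '@endif\n'
        i=$((i + 1))
    done
}

gen_bpf_blocks 4
```

One caveat with this naming scheme: with ten or more nodes, the pattern worker-1.* would also match worker-10 and up, so the patterns would need tightening.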


Finally, we need to send all of our logs somewhere like ELSA.  We can do this with either syslog-ng or rsyslogd.  Since rsyslog is installed by default on Ubuntu, I'll show that example.  It's the same as in the previous blog post on setting up Bro:

Create /etc/rsyslog.d/60-bro.conf and insert the following, changing @central_syslog_server to whatever your ELSA IP is:

$ModLoad imfile
$InputFileName /usr/local/bro/logs/current/ssl.log
$InputFileTag bro_ssl:
$InputFileStateFile stat-bro_ssl
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFileName /usr/local/bro/logs/current/smtp.log
$InputFileTag bro_smtp:
$InputFileStateFile stat-bro_smtp
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFileName /usr/local/bro/logs/current/smtp_entities.log
$InputFileTag bro_smtp_entities:
$InputFileStateFile stat-bro_smtp_entities
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFileName /usr/local/bro/logs/current/notice.log
$InputFileTag bro_notice:
$InputFileStateFile stat-bro_notice
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFileName /usr/local/bro/logs/current/ssh.log
$InputFileTag bro_ssh:
$InputFileStateFile stat-bro_ssh
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFileName /usr/local/bro/logs/current/ftp.log
$InputFileTag bro_ftp:
$InputFileStateFile stat-bro_ftp
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
# check for new lines every second
$InputFilePollingInterval 1
local7.* @central_syslog_server
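
Each log file gets the same six-line stanza, so the config above can be generated for whatever set of Bro logs you want to forward.  A minimal sketch follows; gen_imfile is a made-up helper name, and the log names passed to it are just the ones used in this post.

```shell
#!/bin/sh
# Usage: gen_imfile LOG_DIR SYSLOG_SERVER LOG_NAME...
# Prints an rsyslog imfile config with one stanza per Bro log name.
gen_imfile() {
    log_dir="$1"; server="$2"
    shift 2
    echo '$ModLoad imfile'
    for name in "$@"; do
        echo "\$InputFileName ${log_dir}/${name}.log"
        echo "\$InputFileTag bro_${name}:"
        echo "\$InputFileStateFile stat-bro_${name}"
        echo '$InputFileSeverity info'
        echo '$InputFileFacility local7'
        echo '$InputRunFileMonitor'
    done
    echo '# check for new lines every second'
    echo '$InputFilePollingInterval 1'
    echo "local7.* @${server}"
}

gen_imfile /usr/local/bro/logs/current central_syslog_server \
    ssl smtp smtp_entities notice ssh ftp
```

Redirect the output to a scratch file and diff it against /etc/rsyslog.d/60-bro.conf before swapping it in.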


Then restart rsyslog:

sudo service rsyslog restart

We're ready to start the cluster.  Broctl will automatically copy over all of the Bro files, so we don't have to worry about syncing any config or Bro program files.

cd /usr/local/bro
sudo su bro -c 'bin/broctl install'
sudo su bro -c 'bin/broctl check'


On each node (this is the annoying part), run the bro_init.sh script:
ssh <admin user>@<node> "sudo sh /path/to/bro_init.sh"

This only needs to be done after 'install', because 'install' overwrites the Bro binaries that had the special permissions set.
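
To take some of the annoyance out of this step, the per-node command can be generated in a loop.  This is a sketch under assumptions: init_cmds is a hypothetical helper, "admin", "node1" and "node2" are placeholder names, and it only prints the commands so nothing is run by accident.

```shell
#!/bin/sh
# Usage: init_cmds ADMIN_USER SCRIPT_PATH NODE...
# Prints the ssh command that re-runs bro_init.sh on each node.
init_cmds() {
    admin="$1"; script="$2"
    shift 2
    for node in "$@"; do
        echo "ssh ${admin}@${node} \"sudo sh ${script}\""
    done
}

# Dry run with placeholder user and host names
init_cmds admin /path/to/bro_init.sh node1 node2
```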

Now we can start the cluster.

sudo su bro -c 'bin/broctl start'

If you cd to /usr/local/bro/logs/current, you should see the files growing as logs come in.  I recommend checking the /proc/net/pf_ring/ directory on each node and catting the pid files there to inspect packets per second, etc. to ensure that everything is being recorded properly.  Now all you have to do is go rummaging around for some old servers headed to surplus, and you'll have a very powerful, distributed (tell management it's "cloud") IDS that can do some amazing things.
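
For the /proc/net/pf_ring check, a short loop saves catting each pid file by hand.  Treat this as a sketch: the pf_ring_stats name is made up, and I am assuming the per-ring files are named <pid>-<interface>.<n> and contain lines beginning with "Tot Packets" and "Tot Pkt Lost", which may differ between PF_RING versions.

```shell
#!/bin/sh
# Usage: pf_ring_stats [DIR]
# Prints the packet and drop counters from each per-ring file in DIR
# (default /proc/net/pf_ring). Field names are an assumption; adjust the
# grep pattern to whatever your PF_RING version prints.
pf_ring_stats() {
    dir="${1:-/proc/net/pf_ring}"
    for f in "$dir"/*-*; do
        [ -f "$f" ] || continue
        echo "== $(basename "$f")"
        grep -E '^Tot (Packets|Pkt Lost)' "$f" || true
    done
}

pf_ring_stats
```

Running it a few seconds apart on each node gives a rough packets-per-second figure and shows whether the lost counter is climbing.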