Friday, April 20, 2012

Accelerating CIF with Sphinx

The Collective Intelligence Framework, CIF, is a way to consolidate public and private security intel into a single repository which can then be safely shared and accessed.  ELSA has had a transform plugin for CIF for months which allows search results to be looked up in CIF and any hits appended to the displayed fields.  In my work on integrating ELSA with CIF, I found that the CIF lookups for each record were taking too long to be effective when doing bulk lookups.  Having a good understanding of the Sphinx full-text search engine, which is the core component of ELSA, I knew that I could make it faster by overlaying Sphinx on top of the stock CIF Postgres database.

This proved to be much easier than I thought, and so even though I only needed the database handle that Sphinx provides for the ELSA plugin, I decided to create a full web frontend for it that would be compatible with the existing CIF web API.  I'm pleased to announce this code can now be found on Github here: https://github.com/mcholste/cif-rest-sphinx.  The code is very small and easy to install.  It allows for simple queries such as this:

http://my.cif.host/zeus


For the moment, it returns human-readable JSON records for each search match, like this:

{
      "subnet_end" : "778887474",
      "description" : "zeus v2 drop zone",
      "asn" : "50244",
      "asn_desc" : "ITELECOM Pixel View SRL",
      "created" : "1315668846",
      "subnet_start" : "778887474",
      "alternativeid" : "http://www.malwaredomainlist.com/mdl.php?search=46.108.225.50/~ishigo4/zs/ishi.php",
      "cc" : "RO",
      "detecttime" : "1315377480",
      "weight" : "1502",
      "confidence" : "25",
      "id" : "292466",
      "address" : "46.108.225.50/32",
      "severity" : "medium"
   }

Any of these terms are searchable, so you can further search on "ishi.php" or "ITELECOM" to see what else those terms are related to.  The most common searches are for IP's or domain names, like:

http://my.cif.host/deonixion.com

or

http://my.cif.host/46.108.225.50

In addition to easy manual lookups, it also makes it easy to plug external tools into CIF, either via the database handle or through the web API.  The JSON format that's returned is easily parsed by almost any client-side library or integrated into existing web pages.

Access can be controlled by adding API keys to the config file.  Queries include the API key in the request, so you should use an SSL-capable server if requiring API key-controlled access.

Most lookups take about one or two milliseconds to complete, so bulk queries should complete at a rate of around 150/second with database or web frontend overhead.  In the future, I plan to optimize the web API for batch queries which should make this query time even faster.

Reporting

In addition to search, the Sphinx wrapper also allows some almost instantaneous reporting via the database handle.  Here's a list of all ASN's currently hosting Zeus:

mysql -h127.0.0.1 -P9306

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 1

Server version: 2.0.4-id64-release (r3135)



Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.



mysql> SELECT @count, asn_desc FROM infrastructure WHERE MATCH('zeus') GROUP BY asn_desc ORDER BY @count DESC;

+-------+--------+------------------------------------------------------------------+--------+

| id | weight | asn_desc | @count |

+-------+--------+------------------------------------------------------------------+--------+

| 5120 | 1581 | THEPLANET-AS ThePlanet.com Internet Services, Inc. | 188 |

| 272 | 1581 | SOFTLAYER SoftLayer Technologies Inc. | 91 |

| 253 | 1581 | | 84 |

| 89680 | 1581 | CHINANET-IDC-BJ-AP IDC, China Telecommunications Corporation | 74 |

| 314 | 1581 | MASTER-AS Master Internet s.r.o / Czech Republic / www.master.cz | 69 |

| 255 | 1581 | ENOMAS1 eNom, Incorporated | 60 |

| 282 | 1581 | SERVINT ServInt | 60 |

| 8721 | 1581 | PAH-INC GoDaddy.com, Inc. | 59 |

| 4642 | 1581 | OVERSEE-DOT-NET Oversee.net | 57 |

| 268 | 1581 | LEASEWEB LeaseWeb B.V. | 56 |

| 283 | 1581 | CHINANET-BACKBONE No.31,Jin-rong Street | 56 |

| 422 | 1581 | NOC Network Operations Center Inc. | 51 |

| 450 | 1581 | AGAVA3 Agava Ltd. | 51 |

| 356 | 1581 | ONEANDONE-AS 1&1 Internet AG | 46 |

| 228 | 1581 | ADANET-AS Azerbaijan Data Network | 43 |

| 368 | 1581 | DINET-AS Digital Network JSC | 42 |

| 419 | 1581 | MASTERHOST-AS CJSC _MasterHost_ | 41 |

| 32886 | 1581 | KIXS-AS-KR Korea Telecom | 41 |

| 226 | 1581 | HOSTING-MEDIA Aurimas Rapalis trading as _II Hosting Media_ | 40 |

| 5346 | 1581 | ARUBA-ASN Aruba S.p.A. - Network | 40 |

+-------+--------+------------------------------------------------------------------+--------+

20 rows in set (0.00 sec)

Note the response time: 0.00 seconds!

How about all the IP's hosting Zeus?

mysql> SELECT @count, address FROM infrastructure WHERE MATCH('zeus') GROUP BY address ORDER BY @count DESC; SHOW META;
+-------+--------+--------------------+--------+
| id | weight | address | @count |
+-------+--------+--------------------+--------+
| 9239 | 1581 | 113.53.251.236/32 | 38 |
| 43178 | 1581 | 208.43.173.207/32 | 38 |
| 43185 | 1581 | 66.197.143.117/32 | 38 |
| 9240 | 1581 | 178.74.105.55/32 | 36 |
| 9241 | 1581 | 186.114.212.20/32 | 36 |
| 9242 | 1581 | 60.19.30.131/32 | 36 |
| 9243 | 1581 | 110.138.25.251/32 | 36 |
| 228 | 1581 | 109.127.8.242/32 | 35 |
| 401 | 1581 | 216.22.25.10/32 | 35 |
| 6348 | 1581 | 127.0.0.1/32 | 33 |
| 35148 | 1581 | 203.169.164.2/32 | 25 |
| 51102 | 1581 | 77.221.159.237/32 | 24 |
| 51105 | 1581 | 188.120.40.166/32 | 24 |
| 51107 | 1581 | 89.108.122.149/32 | 24 |
| 51109 | 1581 | 195.161.113.218/32 | 24 |
| 89669 | 1581 | 178.218.208.130/32 | 24 |
| 89671 | 1581 | 95.163.69.51/32 | 24 |
| 28932 | 1581 | 178.238.36.64/32 | 23 |
| 28937 | 1581 | 178.238.36.6/32 | 23 |
| 35140 | 1581 | 119.146.223.131/32 | 23 |
+-------+--------+--------------------+--------+
20 rows in set (0.00 sec)

+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 717 |
| total_found | 717 |
| time | 0.003 |
| keyword[0] | zeus |
| docs[0] | 3292 |
| hits[0] | 3292 |
+---------------+-------+
6 rows in set (0.01 sec)


Note how adding "SHOW META" to the end of the query will yield a second result table which shows how many more entries there are.  By default, on the first twenty are displayed.

Installation

Installation is fairly straightforward and can be done by following the INSTALL doc.  In general, it covers installing Sphinx and configuring everything on Ubuntu/Debian, but there aren't many steps, so performing the installation on other distributions should be fairly easy.  You can use the Github page to report any bugs or request help.