Friday, April 20, 2012

Accelerating CIF with Sphinx

The Collective Intelligence Framework, CIF, is a way to consolidate public and private security intel into a single repository which can then be safely shared and accessed.  ELSA has had a transform plugin for CIF for months which allows search results to be looked up in CIF and any hits appended to the displayed fields.  In my work on integrating ELSA with CIF, I found that the CIF lookups for each record were taking too long to be effective when doing bulk lookups.  Having a good understanding of the Sphinx full-text search engine, which is the core component of ELSA, I knew that I could make it faster by overlaying Sphinx on top of the stock CIF Postgres database.

This proved to be much easier than I thought, and so even though I only needed the database handle that Sphinx provides for the ELSA plugin, I decided to create a full web frontend for it that would be compatible with the existing CIF web API.  I'm pleased to announce this code can now be found on Github here: https://github.com/mcholste/cif-rest-sphinx.  The code is very small and easy to install.  It allows for simple queries such as this:

http://my.cif.host/zeus


For the moment, it returns human-readable JSON records for each search match, like this:

{
      "subnet_end" : "778887474",
      "description" : "zeus v2 drop zone",
      "asn" : "50244",
      "asn_desc" : "ITELECOM Pixel View SRL",
      "created" : "1315668846",
      "subnet_start" : "778887474",
      "alternativeid" : "http://www.malwaredomainlist.com/mdl.php?search=46.108.225.50/~ishigo4/zs/ishi.php",
      "cc" : "RO",
      "detecttime" : "1315377480",
      "weight" : "1502",
      "confidence" : "25",
      "id" : "292466",
      "address" : "46.108.225.50/32",
      "severity" : "medium"
   }

Any of these terms are searchable, so you can further search on "ishi.php" or "ITELECOM" to see what else those terms are related to.  The most common searches are for IP's or domain names, like:

http://my.cif.host/deonixion.com

or

http://my.cif.host/46.108.225.50

In addition to easy manual lookups, it also makes it easy to plug external tools into CIF, either via the database handle or through the web API.  The JSON format that's returned is easily parsed by almost any client-side library or integrated into existing web pages.

Access can be controlled by adding API keys to the config file.  Queries include the API key in the request, so you should use an SSL-capable server if requiring API key-controlled access.

Most lookups take about one or two milliseconds to complete, so bulk queries should complete at a rate of around 150/second with database or web frontend overhead.  In the future, I plan to optimize the web API for batch queries which should make this query time even faster.

Reporting

In addition to search, the Sphinx wrapper also allows some almost instantaneous reporting via the database handle.  Here's a list of all ASN's currently hosting Zeus:

mysql -h127.0.0.1 -P9306

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 1

Server version: 2.0.4-id64-release (r3135)



Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.



mysql> SELECT @count, asn_desc FROM infrastructure WHERE MATCH('zeus') GROUP BY asn_desc ORDER BY @count DESC;

+-------+--------+------------------------------------------------------------------+--------+

| id | weight | asn_desc | @count |

+-------+--------+------------------------------------------------------------------+--------+

| 5120 | 1581 | THEPLANET-AS ThePlanet.com Internet Services, Inc. | 188 |

| 272 | 1581 | SOFTLAYER SoftLayer Technologies Inc. | 91 |

| 253 | 1581 | | 84 |

| 89680 | 1581 | CHINANET-IDC-BJ-AP IDC, China Telecommunications Corporation | 74 |

| 314 | 1581 | MASTER-AS Master Internet s.r.o / Czech Republic / www.master.cz | 69 |

| 255 | 1581 | ENOMAS1 eNom, Incorporated | 60 |

| 282 | 1581 | SERVINT ServInt | 60 |

| 8721 | 1581 | PAH-INC GoDaddy.com, Inc. | 59 |

| 4642 | 1581 | OVERSEE-DOT-NET Oversee.net | 57 |

| 268 | 1581 | LEASEWEB LeaseWeb B.V. | 56 |

| 283 | 1581 | CHINANET-BACKBONE No.31,Jin-rong Street | 56 |

| 422 | 1581 | NOC Network Operations Center Inc. | 51 |

| 450 | 1581 | AGAVA3 Agava Ltd. | 51 |

| 356 | 1581 | ONEANDONE-AS 1&1 Internet AG | 46 |

| 228 | 1581 | ADANET-AS Azerbaijan Data Network | 43 |

| 368 | 1581 | DINET-AS Digital Network JSC | 42 |

| 419 | 1581 | MASTERHOST-AS CJSC _MasterHost_ | 41 |

| 32886 | 1581 | KIXS-AS-KR Korea Telecom | 41 |

| 226 | 1581 | HOSTING-MEDIA Aurimas Rapalis trading as _II Hosting Media_ | 40 |

| 5346 | 1581 | ARUBA-ASN Aruba S.p.A. - Network | 40 |

+-------+--------+------------------------------------------------------------------+--------+

20 rows in set (0.00 sec)

Note the response time: 0.00 seconds!

How about all the IP's hosting Zeus?

mysql> SELECT @count, address FROM infrastructure WHERE MATCH('zeus') GROUP BY address ORDER BY @count DESC; SHOW META;
+-------+--------+--------------------+--------+
| id | weight | address | @count |
+-------+--------+--------------------+--------+
| 9239 | 1581 | 113.53.251.236/32 | 38 |
| 43178 | 1581 | 208.43.173.207/32 | 38 |
| 43185 | 1581 | 66.197.143.117/32 | 38 |
| 9240 | 1581 | 178.74.105.55/32 | 36 |
| 9241 | 1581 | 186.114.212.20/32 | 36 |
| 9242 | 1581 | 60.19.30.131/32 | 36 |
| 9243 | 1581 | 110.138.25.251/32 | 36 |
| 228 | 1581 | 109.127.8.242/32 | 35 |
| 401 | 1581 | 216.22.25.10/32 | 35 |
| 6348 | 1581 | 127.0.0.1/32 | 33 |
| 35148 | 1581 | 203.169.164.2/32 | 25 |
| 51102 | 1581 | 77.221.159.237/32 | 24 |
| 51105 | 1581 | 188.120.40.166/32 | 24 |
| 51107 | 1581 | 89.108.122.149/32 | 24 |
| 51109 | 1581 | 195.161.113.218/32 | 24 |
| 89669 | 1581 | 178.218.208.130/32 | 24 |
| 89671 | 1581 | 95.163.69.51/32 | 24 |
| 28932 | 1581 | 178.238.36.64/32 | 23 |
| 28937 | 1581 | 178.238.36.6/32 | 23 |
| 35140 | 1581 | 119.146.223.131/32 | 23 |
+-------+--------+--------------------+--------+
20 rows in set (0.00 sec)

+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 717 |
| total_found | 717 |
| time | 0.003 |
| keyword[0] | zeus |
| docs[0] | 3292 |
| hits[0] | 3292 |
+---------------+-------+
6 rows in set (0.01 sec)


Note how adding "SHOW META" to the end of the query will yield a second result table which shows how many more entries there are.  By default, on the first twenty are displayed.

Installation

Installation is fairly straightforward and can be done by following the INSTALL doc.  In general, it covers installing Sphinx and configuring everything on Ubuntu/Debian, but there aren't many steps, so performing the installation on other distributions should be fairly easy.  You can use the Github page to report any bugs or request help.

12 comments:

  1. Martin - This looks awesome I have been playing around with CIF lately and this looks like a great addition to it. I will be trying this out and let you know how it goes. Thanks again for all the work you do.
    -- DFIRN00B

    ReplyDelete
  2. Nice work again Martin! I'm going to take a crack at this on Friday, so your timing is great. Have you discussed getting this rolled back in with Wes? Did you think about dropping an note out on the CIF mailing list?

    ReplyDelete
  3. Thanks! Yep, I put a notice on the list back in April with a link to the github code. As a side note, the upcoming new CIF plugin for ELSA uses this as the backend which lets us crunch through all Snort alerts or URLs for a given client IP against CIF very quickly.

    ReplyDelete
  4. Hi,
    I'm receiving an error when trying to install it.
    Can you please help?
    Thanks!


    root@ip-10-80-117-162:/usr/local/cif-rest-sphinx# sudo dpkg -i sphinxsearch_2.0.4-lucid_amd64.deb
    dpkg-deb: error: `sphinxsearch_2.0.4-lucid_amd64.deb' is not a debian format archive
    dpkg: error processing sphinxsearch_2.0.4-lucid_amd64.deb (--install):
    subprocess dpkg-deb --control returned error exit status 2
    Errors were encountered while processing:
    sphinxsearch_2.0.4-lucid_amd64.deb

    ReplyDelete
  5. Sorry about that, fixed a bad link. Try the latest link in the INSTALL doc: "http://sphinxsearch.com/files/sphinxsearch_2.0.4-lucid_amd64.deb"

    ReplyDelete
  6. Hi there
    How does the web frontend look like?
    Is there a GUI through which searching is done?

    Thanks (:

    ReplyDelete
  7. Yes, see many of the other posts on this blog, such as the latest: http://ossectools.blogspot.com/2012/06/j-edgar-and-big-data.html

    ReplyDelete
  8. Hi,
    So I understand that once i install it i don't have to use ELSA, i can query with http requests via my browser, correct?
    Thanks for the great info and ELSA is great.

    ReplyDelete
  9. Right, you can query it directly with your browser--you don't need ELSA. Or, you can query it with the MySQL interface in a different script, etc.

    ReplyDelete
  10. Hi, i've completed this,but i cant see where to input the apikey
    Thanks (:

    # Copy the config
    sudo cp cif-rest-sphinx.conf /etc/cif-rest-sphinx.conf
    # Edit as necessary for apikeys, etc.

    ReplyDelete
  11. This is an excellent post I seen thanks to share it. It is really what I wanted to see hope in future you will continue for sharing such a excellent post. multiple ip hosting

    ReplyDelete