Sunday, June 9, 2013

Understanding ELSA Query Performance

Most queries in ELSA are very fast and complete in under a second or two.  However, some queries can take several seconds or even several minutes, and it can be annoying to wait for them.  A recent update to ELSA should make these slow queries less likely, but understanding the factors that drive query execution time can help a user both to write better queries and to take full advantage of the new improvements.

First, let's look at what happens when ELSA makes a query.  ELSA uses Sphinx as its search engine, and it uses two types of Sphinx indexes.  The first is a "temporary" index that ELSA initially creates for new logs.  It stores the fields (technically, attributes) of the events in RAM, deduplicated, using Sphinx's default "extern" docinfo.  The other is the "permanent" form, which stores the attributes using Sphinx's "inline" docinfo.  Inline means that the attributes are stored something like a database table, where the name of the table is the keyword being searched, and each entry in the table corresponds to a hit for that keyword.

So let's say we have log entries that look like this:

term1 term2 id=1, timestamp=x1, host=y, class=z
term1 term3 id=2, timestamp=x2, host=y, class=z

Sphinx's inline docinfo would store this as three total keywords, each with the list of attributes beneath it like a database table:

term1
id | timestamp | host | class
1  | x1        | y    | z
2  | x2        | y    | z


term2
id | timestamp | host | class
1  | x1        | y    | z


term3
id | timestamp | host | class
2  | x2        | y    | z

So when you query for +term1 +term2, Sphinx does a pseudo-SQL query like this:

SELECT * FROM term1 JOIN term2 WHERE term1.id=term2.id
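
To make the pseudo-table picture concrete, here is a minimal Python sketch of the same idea.  It is only an illustration of the analogy above, not how Sphinx actually lays out its index: the names build_pseudo_tables and search_all are made up for this example, and the attribute values are kept as the placeholder strings from the example events.

```python
from collections import defaultdict

# The two example events from the text; attribute values are placeholders.
events = [
    {"terms": ["term1", "term2"], "id": 1, "timestamp": "x1", "host": "y", "class": "z"},
    {"terms": ["term1", "term3"], "id": 2, "timestamp": "x2", "host": "y", "class": "z"},
]


def build_pseudo_tables(events):
    """Map each keyword to its own 'table' of attribute rows, keyed by event id."""
    tables = defaultdict(dict)
    for event in events:
        row = {key: value for key, value in event.items() if key != "terms"}
        for term in event["terms"]:
            tables[term][event["id"]] = row
    return tables


def search_all(tables, *terms):
    """Return the rows of events that contain every term (the +term1 +term2 case)."""
    if not terms:
        return []
    # Illustrative choice: start from the smallest table so that the join
    # only has to probe the larger tables for a handful of ids.
    ordered = sorted(terms, key=lambda term: len(tables.get(term, {})))
    smallest = tables.get(ordered[0], {})
    matching_ids = set(smallest)
    for term in ordered[1:]:
        other = tables.get(term, {})
        matching_ids = {event_id for event_id in matching_ids if event_id in other}
    return [smallest[event_id] for event_id in sorted(matching_ids)]


tables = build_pseudo_tables(events)
result = search_all(tables, "term1", "term2")
print(result)
```

With the two example events, only the event with id=1 contains both term1 and term2, so that is the only row returned.  The cost of the join grows with the size of the tables involved, which is why a very common term is the problem case.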

Most terms are fairly rare, so the join is incredibly fast.  However, consider a situation in which "term1" appeared in hundreds of millions of events.  If your query includes "term1," then the number of "rows" in the "table" for that term could be millions or even billions, making that JOIN extremely expensive, especially if you've asked for the query to filter the results to specific time values or do a group by.

In addition to the slow querying, note that the disk required to store the Sphinx indexes is a function of the number of attributes it must store in these pseudo-tables.  So, a very common term will incur a massive disk cost to store the large pseudo-table.

Below is the count of the one hundred most common terms in a test dataset of ten million events.  You can think of each bar as representing the number of rows in that term's pseudo-table, so a query for "0 -" (the two most common terms) would require a join across a pseudo-table with 45,355,729 rows multiplied with another with 33,907,455 rows.  Note how quickly the hit count of a given term drops off.

[Figure: bar chart of hit counts for the one hundred most common terms]



Stopwords

This is where Sphinx stopwords save the day.  Sphinx's indexer has an option to calculate the frequency with which keywords occur in the data to be indexed.  You can invoke this by adding --buildstops <outfile> <n> --buildfreqs to the indexing command, and it will find the top n most frequent keywords and write them to outfile, along with a count of how many times each keyword appeared.  A subsequent run of indexer, sans the stopword options, can then refer to this file to ignore these n most frequent keywords.  This will save a massive amount of disk space (expect savings of around 60 percent) and also guarantee that queries including one of these keywords won't take forever, because the index won't have any knowledge of them.
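
As a rough illustration of what this frequency pass computes, the following Python sketch counts keywords across a few sample messages and keeps the top n.  It is not Sphinx's implementation: the tokenizer here is a simple word regex, which will differ from Sphinx's charset rules, and the function names and the "keyword count" output layout are assumptions made for the example.

```python
import re
from collections import Counter


def top_keywords(messages, n):
    """Return the n most frequent keywords as (keyword, count) pairs."""
    counts = Counter()
    for message in messages:
        # Assumed tokenization: lowercase runs of word characters.
        counts.update(re.findall(r"\w+", message.lower()))
    return counts.most_common(n)


def format_freqs(pairs):
    """Render one 'keyword count' pair per line (layout assumed for illustration)."""
    return "\n".join(f"{word} {count}" for word, count in pairs)


sample_messages = ["term1 term2", "term1 term3", "term1 term2 term4"]
stop = top_keywords(sample_messages, 2)
print(format_freqs(stop))
```

Running the same kind of count over a sample of your own logs is a quick way to see which terms would dominate the index before deciding how large n should be.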

However, this obviously means that the keywords can't be searched.  To cope with this, ELSA has a configuration item in the elsa_web.conf file where you can specify a hash of stopwords.  If a query attempts to search one of these keywords, then one of several things can happen:

  1. If some terms are stopwords and some are not, then the query will use the non-stopwords as the basis for the Sphinx search, and results will be filtered using the stopwords in ELSA.
  2. If all terms are stopwords, the query is run against the raw SQL database and Sphinx is not queried at all.
  3. If a query contains no keywords, just attributes (such as a query for just a class or a range query), the query will be run against the raw SQL database and not Sphinx.

Currently, stopwords must be manually created and added, but the optimization code exists in the current ELSA codebase.  I will be adding automatic stopword management in the near future so that all ELSA users will benefit from the massive disk savings and predictable performance that shifting stopword and attribute-only searches to SQL can provide.
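
The three cases listed above can be sketched as a small routing function.  This is a hypothetical Python illustration of the decision logic only, not ELSA's actual code: the Plan class, the plan_query function, and the example stopword set are all invented for this sketch.

```python
from dataclasses import dataclass, field

# Hypothetical stopword set; in ELSA this comes from the elsa_web.conf hash.
STOPWORDS = {"term1"}


@dataclass
class Plan:
    backend: str                                            # "sphinx" or "sql"
    search_terms: list = field(default_factory=list)        # terms sent to the backend
    post_filter_terms: list = field(default_factory=list)   # stopwords filtered afterwards


def plan_query(terms, stopwords=STOPWORDS):
    """Choose a backend for a query, following the three cases in the list above."""
    keywords = [term for term in terms if term not in stopwords]
    stopped = [term for term in terms if term in stopwords]
    if not terms:
        # Case 3: no keywords at all, only attribute filters, so use SQL.
        return Plan("sql")
    if not keywords:
        # Case 2: every term is a stopword, so Sphinx is skipped entirely.
        return Plan("sql", search_terms=stopped)
    # Case 1: search Sphinx on the non-stopwords, then filter on the stopwords.
    return Plan("sphinx", search_terms=keywords, post_filter_terms=stopped)


mixed = plan_query(["term1", "term2"])
all_stop = plan_query(["term1"])
attrs_only = plan_query([])
print(mixed, all_stop, attrs_only, sep="\n")
```

Keeping the decision in one place like this makes the performance behaviour predictable: a query only ever reaches Sphinx with terms that the index actually knows about.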
