LogBlog

« Log management in the age of compliance | Main | PCI Progress »

Musings on 100% Log Collection

One of the most exciting, complicated and, at the same time, very common questions from the field of log management is the "what logs to collect?" question. This comes up during compliance-driven log management projects (in the form of "what to collect for PCI DSS compliance?") as well as operationally-driven (in the form of "what logs from this application do I need to detect faults and errors?") or security-driven log management projects (in the form of "which logs will help me during the incident response?")

What are the answers that one sometimes hear? Otherwise awesome log management guidance NIST 800-92 "Guide to Computer Security Log Management" confuses the reader with this fascinating blurb: "generally, organizations should only require logging and analyzing the data that is of greatest importance." And how do people to know which logs are of importance? (I did have a bit of a debate with NIST folks on that...)

Other answers are situation-specific and thus limited in their usefulness ("need IDS alerts + server logs to detect intrusions via correlation", "need all logs that show access to PHI").  I spoke about the pitfalls of "prioritizing before collection" in my presentation "Six Mistakes of Log Management" and its companion paper.  In some cases, such as the incident response scenario, you might be naturally leaning towards grabbing as much as possible, since you never know which bit will help you answer that dreaded "WHAT happened?!" question ...

On the other hand, there is a simple answer that doesn't suffer from the above issues: collect everything. However, many folks go into a state of shock upon hearing it :-) "Everything!?! HOW can you collect 'everything''? What about storage, bandwidth, hardware, etc?"

But you know what? It really isn't as bad as you think! Just think that:

  1. Logs compress really well (1:10 to 1:15 compression ratios are not unheard of), so bandwidth and storage are less of an issue that initially estimated
  2. Disk storage is cheap (and tape is cheaper still); holding a billion of log records might well cost  under $200 (!) in disk drive cost
  3. Figuring out what you need, might need, can be "told to need", will need, etc is genuinely hard. You will never get it right!
  4. Many logs don't need to get collected the next second after generation, thus allowing you to save some bandwidth by moving them when network is less busy.
  5. Technologies exist to make sense of "everything", not just "hand-picked" and parsed logs. More on the "making sense" part later

Convinced yet? So, if you are pondering "what logs to collect?", try to switch your mindset into thinking "what will it take for me to collect everything?" You probably won't regret this decision!  At the same time, one can try to reverse this question and ask "why collect everything?" - in this case, the answer will be "because any other collection strategy is worse."

Related posts:

Technorati tags: ,

Posted July 19, 2007 in Compliance , Log Management & Intelligence , LogEd , Security | Permalink


TrackBack

TrackBack URL for this entry:
http://www.loglogic.com/mt/mt-tb.cgi/228

Visit loglogic.com

I ♥ Logs

Subscribe to this blog’s feed RSS

November 2007
Sun Mon Tue Wed Thu Fri Sat
        1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30  
Categories
Archives
Blogroll
Blogroll
Compliance
Good Reading
LogLogic
LogLogic Partners
Sites We Watch