LogBlog

« April 2008 | Main

Log data for the people, Facebook style

While LogLogic’s log data warehouse solution has featured an open web services API since late 2006, customer and partner interest in using the API has recently increased sharply.  In part, this trend can be explained by the re-discovery of log data in the fault management community.  Help desk staff have different requirements for visualizing log data than internal auditors.  Rather than waiting for commercial software vendors (such as LogLogic) or internal application developers, many end users are taking their destiny into their own hands and customizing portal views into log data for their specific use case.  More than the emergence of a long tail of use cases for log data, it is web 2.0 and Facebook that are at the root of the surge in log-creativity.  The Facebook generation is increasingly self-sufficient and EXPECTS applications to be open and programmable.  Of course, this trend is not limited to LogLogic.  Analyst Vishwanath Venugopalan of The 451 Group says that 49% of enterprises create their own mash-ups today and that 74% of enterprises will increase mash-up activity next year.  It is therefore no surprise that customers are demanding that their log management solution have an open, standards-based API.  What is surprising is how few vendors have picked up on this trend by opening up their solutions.

Posted May 15, 2008


Anton Logging Tip of the Day #15: Fear and Loathing in Event 560 (and 562 and 567)

Following the new "tradition" of posting a security tip of the week (mentioned here and here; SANS jumped in as well), I decided to follow along and join the initiative. One of the bloggers called it "pay it forward" to the community.

So, Anton Logging Tip of the Day #15: Fear and Loathing in Event 560 (and 562 and 567)

This tip digs into a seemingly simple, but really VERY esoteric subject: monitoring file access and modification via a Windows event log. Now, some people - who never studied this subject - tend to have a very simplistic view of this: just enable Object Access auditing, then right-click on a file or directory, click Security->Advanced->Auditing and then pick what types of events will be logged and by what accessing entities (i.e. users or computers). OK, so this will produce some logs, that is for sure. But are they useful?

First, why are we doing this? We typically need to know the following when we audit file access in Windows (or any other OS for that matter) for security (monitoring and investigation) or compliance:

Can we get this from the above logs? No.

What? No!?! Really?

Yes, really. We can get some of the above, some of the time, not all of the above, all of the time. Here is an example: we are looking at event ID 560 (picture) and then at an extract from its description field.

Event:

[Screenshot: Windows event ID 560]

Description (selected field):

Object Server: Security

Object Type: File

Object Name: C:\0\TestBed\simple_text_file.txt

Image File Name: C:\WINDOWS\system32\notepad.exe

Primary User Name: Anton

Primary Domain: XXXXXX

Accesses: READ_CONTROL
          SYNCHRONIZE
          ReadData (or ListDirectory)
          WriteData (or AddFile)
          AppendData (or AddSubdirectory or CreatePipeInstance)
          ReadEA
          WriteEA
          ReadAttributes
          WriteAttributes

WTH is that? Well, we know that the user 'Anton' has successfully read? written? changed attributes? done something? with a file named "C:\0\TestBed\simple_text_file.txt" using a program named "C:\WINDOWS\system32\notepad.exe". That's the best we can get in this case! We may try to look at event IDs 562 and 567, but the missing information (i.e. the exact action performed) will not be added.

BTW, there will be a few dozen more (sometimes hundreds!) of the 560s, 562s and 567s produced, all from just opening the text file in Notepad. The above event is notable for having BOTH "notepad" and "simple_text_file.txt" in the same event; others will have only one of the two.

Does anything else get in the way? Yes, lots! MS Office will write to all files, even those just opened for reading (with no user modifications to the content whatsoever), which will screw up your log monitoring efforts. If the file is on a share, more information will be missing (e.g. the username might be).

So, how to use Windows event logs for file access tracking?

  1. Enable logging (as described above)
  2. Pick events 560 (most useful) and 562, 567 (useful too)
  3. Look for fun filenames that might be touched by the users (have a list of files and users handy)
  4. Figure out what programs were used to access them (this is called "Image File Name" in "WinLogSpeak")
  5. Ponder the 'Accesses' section of each event until your brain turns blue :-) or until you decide whether such access is authorized or not...
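To make steps 3 to 5 a little less painful, here is a minimal sketch that parses the description text of an event like the one above and reports which kinds of access appear in its 'Accesses' list. It works only on exported description text; the function names and the split of access names into "read" and "write" groups are assumptions made for this example, not part of any Windows API. As discussed above, the event still does not say which action was actually performed.

```python
import re

# Assumed grouping of access names into "read" and "write" buckets;
# this split is an illustration, not an official Windows classification.
READ_ACCESSES = {"ReadData", "ReadEA", "ReadAttributes", "READ_CONTROL"}
WRITE_ACCESSES = {"WriteData", "AppendData", "WriteEA", "WriteAttributes"}

# A field line looks like "Name: value"; access entries carry no colon.
FIELD_RE = re.compile(r"^([A-Za-z ]+):\s*(.*)$")

SAMPLE_DESCRIPTION = r"""
Object Server: Security
Object Type: File
Object Name: C:\0\TestBed\simple_text_file.txt
Image File Name: C:\WINDOWS\system32\notepad.exe
Primary User Name: Anton
Accesses: READ_CONTROL
SYNCHRONIZE
ReadData (or ListDirectory)
WriteData (or AddFile)
AppendData (or AddSubdirectory or CreatePipeInstance)
ReadEA
WriteEA
ReadAttributes
WriteAttributes
"""


def parse_event_560(description):
    """Extract named fields and the list of accesses from a description."""
    event = {"Accesses": []}
    in_accesses = False
    for raw_line in description.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = FIELD_RE.match(line)
        if match:
            name, value = match.group(1).strip(), match.group(2).strip()
            in_accesses = name == "Accesses"
            if in_accesses:
                if value:
                    # Drop the "(or ...)" alternatives after the access name.
                    event["Accesses"].append(value.split(" (")[0])
            else:
                event[name] = value
        elif in_accesses:
            # Continuation lines of the Accesses field.
            event["Accesses"].append(line.split(" (")[0])
    return event


def summarize_access(event):
    """Report who touched what, with which program, and which access kinds are listed."""
    accesses = set(event["Accesses"])
    return {
        "user": event.get("Primary User Name"),
        "file": event.get("Object Name"),
        "program": event.get("Image File Name"),
        "lists_read": bool(accesses & READ_ACCESSES),
        "lists_write": bool(accesses & WRITE_ACCESSES),
    }


print(summarize_access(parse_event_560(SAMPLE_DESCRIPTION)))
```

For the sample event both flags come out true, which is exactly the ambiguity described above: Notepad asked for read and write access, and the log alone cannot tell us which one it used.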

Overall, this is still very useful for file access monitoring, but the process is somewhat painful.

BTW, I am tagging all the tips on my del.icio.us feed. Here is the link: All Security Tips of the Day.


Posted May 08, 2008 in Compliance, Innovation, Log Management & Intelligence, Security


More 80s: Rubik's Cube for Log Operations


While log management for operations and log management for compliance or security are different applications, they share many of the same foundational requirements, so system administrators can benefit from recent advances inspired by security applications:

- Collection

The ability to collect log data from a large variety of sources – with different protocols and different formats – through either an agent-less or an agent-based infrastructure. Near-real-time collection is also critical to both security and operations uses of logs. Such timely collection enables alerting that warns users of recent or even impending system failures.

- Normalization

The ability to compare log data from disparate sources. One example is a user activity report that aggregates all login activity for a particular user, including logins to the VPN and the finance server. Another is a single report that shows all activity for a particular user, from e-mails sent to websites visited. For operational use, performance measurement across different systems can only be done on normalized data.
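As a rough sketch of what normalization means in practice, the example below maps login messages from two sources onto one common record so that a single per-user report can be run across both. The two log line formats, the field names and the helper functions are all invented for this illustration; real VPN and server logs will look different.

```python
import re
from collections import namedtuple

# Common record that every source is normalized into.
LoginEvent = namedtuple("LoginEvent", "timestamp user source outcome")

# Both line formats below are hypothetical, made up for this example.
VPN_RE = re.compile(
    r"^(?P<ts>\S+) vpn: user=(?P<user>\S+) action=(?P<action>connect|reject)$"
)
SRV_RE = re.compile(
    r"^(?P<ts>\S+) finance-srv login (?P<result>OK|FAILED) for (?P<user>\S+)$"
)


def normalize(line):
    """Turn one raw log line into a LoginEvent, or None if unrecognized."""
    m = VPN_RE.match(line)
    if m:
        outcome = "success" if m.group("action") == "connect" else "failure"
        return LoginEvent(m.group("ts"), m.group("user").lower(), "vpn", outcome)
    m = SRV_RE.match(line)
    if m:
        outcome = "success" if m.group("result") == "OK" else "failure"
        return LoginEvent(
            m.group("ts"), m.group("user").lower(), "finance-server", outcome
        )
    return None


def user_activity(lines, user):
    """All normalized login events for one user, ordered by timestamp."""
    events = [e for e in map(normalize, lines) if e is not None]
    wanted = user.lower()
    return sorted((e for e in events if e.user == wanted), key=lambda e: e.timestamp)


RAW_LINES = [
    "2008-05-06T09:02:17 finance-srv login FAILED for jsmith",
    "2008-05-06T09:00:01 vpn: user=JSmith action=connect",
    "2008-05-06T09:05:00 vpn: user=adoe action=reject",
    "2008-05-06T09:02:30 finance-srv login OK for jsmith",
]

for event in user_activity(RAW_LINES, "jsmith"):
    print(event)
```

Once both sources share the same fields (and the same spelling of the user name), the "all logins for this user" report is a simple filter and sort.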

- Summarization

The ability to count and summarize the log messages collected, by log type, by message type and such. One failed login perhaps isn’t meaningful, but more than five in a row could be significant. The same logic applies to system errors and failures that need to be reviewed while using logs for maintaining and optimizing system and network operations.
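Here is a small sketch of both ideas: counting messages by source and type, and flagging users with more than five failed logins in a row. It assumes the events have already been normalized into dictionaries with "source", "type" and "user" keys; those key names, the type labels and the threshold default are assumptions for this example.

```python
from collections import Counter


def summarize(events):
    """Count events by (source, message type)."""
    return Counter((e["source"], e["type"]) for e in events)


def failed_login_streaks(events, threshold=5):
    """Return the users with more than `threshold` consecutive failed logins.

    Events are expected in chronological order; a successful login
    resets that user's streak.
    """
    streak = Counter()
    flagged = set()
    for e in events:
        if e["type"] == "login_failed":
            streak[e["user"]] += 1
            if streak[e["user"]] > threshold:
                flagged.add(e["user"])
        elif e["type"] == "login_ok":
            streak[e["user"]] = 0
    return flagged


EVENTS = (
    [{"source": "vpn", "type": "login_failed", "user": "mallory"}] * 6
    + [
        {"source": "finance", "type": "login_failed", "user": "alice"},
        {"source": "finance", "type": "login_ok", "user": "alice"},
    ]
)

print(summarize(EVENTS))
print(failed_login_streaks(EVENTS))
```

One mistyped password by "alice" stays below the radar, while six straight failures for "mallory" get flagged.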

- Statistical analysis

Unusual patterns in log data, an unusual ratio between accepted and denied connections on a firewall for example, can be an indication of a security breach. In the future, statistical algorithms applied to log data may enable failure prediction and other advanced analysis that directly contributes to improved SLAs.
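To illustrate the firewall example, the sketch below compares the current ratio of denied connections against a short history and flags it when it sits far from the historical mean. The z-score test, the threshold of 3 and the sample counts are assumptions chosen for illustration; real deployments would need a longer baseline and probably a more robust statistic.

```python
from statistics import mean, stdev


def deny_ratio(accepted, denied):
    """Fraction of connections that were denied."""
    total = accepted + denied
    return denied / total if total else 0.0


def is_unusual(history, current, z_threshold=3.0):
    """True if `current` is more than z_threshold standard deviations
    away from the mean of `history` (needs at least two history points)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold


# Hypothetical hourly (accepted, denied) counts from a firewall.
BASELINE = [(950, 50), (940, 60), (960, 40), (955, 45), (945, 55)]
history = [deny_ratio(a, d) for a, d in BASELINE]

print(is_unusual(history, deny_ratio(948, 52)))   # close to the baseline
print(is_unusual(history, deny_ratio(600, 400)))  # far more denials than usual
```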

- Alerting

The ability to trigger (near) real-time alerts that are user configurable, based either on manually written rules or on automated statistical analysis. Such alerts serve to bring urgent issues to the attention of system operators or security analysts.

- Search

Search is central to log-based investigations, whether for an operations use (such as system fault investigation) or security use (hacker or insider attack). An ability to go through 100% of logs is key for all three uses for logs: security, compliance and operations. Such searches must be fast and easy – so that users are able to run them while under pressure of a troubleshooting or security incident.

It is also important to note that log management for operations has its own unique requirements:

- Collection revisited

Faults are notoriously singular – this means that they occur once, but never again in quite the same manner. Therefore it is very difficult to predict what log messages are going to be most useful for problem isolation and most practitioners now admit it is best to keep all log data around for post-incident analysis. Therefore, the requirement to collect 100% of all log messages of all log sources is even more important in operations than it is in security.

- Log browsing (data mining)

While for compliance an auditor may review the same report (say, failed logins) every quarter, no two troubleshooting sessions are quite the same. Problem isolation is an interactive process of trial and error. An administrator may look at the same data from many different angles before understanding the root cause – like examining a Rubik’s cube. Reports have to be customizable on the fly. Pre- and post-report filtering options are important to allow for dynamic report (re)-configuration. Search is important but not sufficient, and you will likely want to combine search with access to normalized and cross-correlated information.

- Search (and reporting) speed

Speed truly matters when it comes to fault detection and problem isolation. Whether a forensic investigation takes one hour or one day or one week usually doesn’t really break the bank, but whether a down-time situation persists for minutes or hours can be a matter of many millions of dollars in missed revenues. When troubleshooting a problem, every query must be very fast: whether indexed search or a report against normalized data, every second and every minute counts.

- GUI and Workflow

An external auditor looking at logs to verify that nobody improperly accessed credit card information is going to follow a very different workflow from an internal investigator examining a potential fraud case, and both differ again from the help-desk person who is trying to tell you why your e-mail isn’t being delivered or your VPN connection is so slow. For optimal functionality and productivity, the best graphical user interfaces and workflows are application specific.

- SOA-based portal or mash-up

The initial fault alarm will likely land with a help-desk employee, in the form of an HP Software (or equivalent) alert, a log alert or a phone call from an unhappy user. Either way, the first-level support person will attempt to perform some analysis. In many cases, truly understanding the problem requires access to log data. Without log automation, it could require a phone call to a third-level support person and a long wait until the escalation manager returns the log analysis. However, in the brave new world of log analysis, the help-desk employee could access log data remotely with a single mouse-click, assuming the task is made easy enough. It probably means further customizing the workflow and GUI to a particular customer’s situation. This is easy to do with today’s web 2.0 technologies and open web services APIs: a custom portal or mash-up can be created in days.

- SOA based integration

Unlike with log management for security, for fault analysis very mature consoles and dashboards exist. These event management systems even have correlation and alerting capabilities. Rather than replacing these systems with yet another console, most companies are going to look for the ability to integrate a new information source, log data in this case, into the existing fault management console. Web services likely will be the mechanism of choice.

- (Lack of) archiving

Keeping log data around for long periods of time is not a requirement. Data quickly loses its value after the fact. However, mining historical data patterns to predict future failures before they occur can be very valuable. This field is still in its infancy, but shows a lot of promise. Given enough data, both error data and fault data, predictive analysis is not far in the future.

It appears to me that the ideal technical architecture for log management recognizes both similarities and differences of the various log management use cases (and there are many more than just security and operations). Would the ideal solution perhaps be a common log data platform that can collect, aggregate, summarize, normalize, index and apply basic analytics to log data once, while allowing for many different user experiences depending on the use case?

Posted May 06, 2008


Logging Poll #8 Context for Log Analysis

So, my next poll is up - and it is fun (but more technical): what information is most useful when trying to make sense of a log entry?
Vote here! Analysis will be posted here in a few weeks.

Past polls:

  • Poll #7 "What tools do you use for Windows Event Log collection?" (analysis)
  • Poll #6 "Which logs do you LOOK at?" (analysis)
  • Poll #5 "What are your top challenges with logs?" (analysis)
  • Poll #4 "Who looks at logs in your organization?" (analysis)
  • Poll #3 "What do you do with logs?" (analysis)
  • Poll #2 "Why collect logs?" (analysis)
  • Poll #1 "Which logs do you collect?" (analysis)

     

Posted May 05, 2008 in Innovation, Log Management & Intelligence, LogEd


The best of the 80s: log management for operations!


Log management has been around for a loooong time. In the 80s, log file management was the primary mechanism for fault analysis and management of computer systems. Also in the 80s, Eric Allman at the University of California, Berkeley developed a logging standard called syslog as part of the Sendmail project. While syslog was adopted by quite a few applications, many other protocols and formats persist to this day.

The sheer success of log data nearly killed it. Given the cacophony of log formats and the sheer volume of messages generated (up to 40 terabytes a month for a mid-sized organization or, shall we say, 100,000 log messages every second!), it is impossible for any human being to keep track of all that logs have to say. Based on SNMP alerts and other event data, including selected error log messages, large-scale event management systems such as HP OpenView emerged as the new kings of fault detection.
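As a quick back-of-the-envelope check on how the two volume figures relate, the sketch below works out the average message size they imply together. The 30-day month and decimal terabytes are assumptions of this calculation, not figures from any survey.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assuming a 30-day month
MESSAGES_PER_SECOND = 100_000           # figure quoted above
BYTES_PER_MONTH = 40 * 10**12           # 40 TB, assuming decimal terabytes

messages_per_month = MESSAGES_PER_SECOND * SECONDS_PER_MONTH
avg_bytes_per_message = BYTES_PER_MONTH / messages_per_month

print(f"{messages_per_month:,} messages per month")
print(f"about {avg_bytes_per_message:.0f} bytes per message")
```

Under those assumptions, that works out to roughly 259 billion messages a month at an average of about 154 bytes each, which is a plausible size for a single log line.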

If it were not for compliance and security concerns, log management might not have made it back. But out of a need to track user activity and to identify potential insider and outsider intrusions and transgressions of corporate networks, a new form of log file analysis emerged. Log data featured prominently in Paul Proctor’s Practical Intrusion Detection Handbook in the late 90s, for example, and tens of companies emerged to perfect the art of security event management based on log data.

Now, in part due to virtualization and the ever increasing cost of downtime in our networked economy, system and network administrators have re-discovered log data. In surveys, 70%+ of organizations confess their primary budget for log management still comes from compliance. However, this same group has admitted for years now that 70% of their use of log data is driven by operational needs such as fault detection and problem isolation. This is no surprise, because operations use cases can drive true log management ROI. One minute of down-time could cost millions, so if automating log management can help to accelerate problem isolation, then companies are willing to pay big bucks. If giving help-desk employees access to normalized log data can off-load expensive third-level support personnel, that is even better.

So, as the sun is setting on HP OpenView (the name was changed to HP Software in 2007), a new dawn has broken for log management in operations! Hoorah!

Posted May 01, 2008
