
While LogLogic’s log data warehouse solution has featured an open web services API since late 2006, interest from customers and partners in using the API has recently increased sharply. In part, this trend can be explained by the rediscovery of log data in the fault management community: help desk staff have different requirements for visualizing log data than internal auditors do. Rather than waiting for commercial software vendors (such as LogLogic) or internal application developers, many end users are taking their destiny into their own hands and customizing portal views into log data for their specific use cases. Even more than the emergence of a long tail of use cases for log data, it is web 2.0 and Facebook that are at the root of this surge in log creativity. The Facebook generation is increasingly self-sufficient and EXPECTS applications to be open and programmable. Of course, this trend is not limited to LogLogic. Analyst Vishwanath Venugopalan of The 451 Group says that 49% of enterprises create their own mash-ups today and that 74% of enterprises will increase mash-up activity next year. It is therefore no surprise that customers are demanding that their log management solutions have an open, standards-based API. What is surprising is how few vendors have picked up on this trend by opening up their solutions.
Posted May 15, 2008
Following the new "tradition" of posting a security tip of the week (mentioned here and here; SANS jumped in as well), I decided to follow along and join the initiative. One of the bloggers called it "paying it forward" to the community.
So, Anton Logging Tip of the Day #15: Fear and Loathing in Event 560 (and 562 and 567)
This tip digs into a seemingly simple, but really VERY esoteric subject: monitoring file access and modification via the Windows event log. Now, some people (who never studied this subject) tend to have a very simplistic view of it: just enable Object Access auditing, then right-click on a file or directory, click Security->Advanced->Auditing and pick which types of events will be logged and for which accessing entities (i.e. users or computers). OK, so this will produce some logs, that is for sure. But are they useful?
First, why are we doing this? We typically need to know the following when we audit file access in Windows (or any other OS for that matter) for security (monitoring and investigation) or compliance:
Can we get this from the above logs? No.
What? No!?! Really?
Yes, really. We can get some of the above some of the time, but not all of the above all of the time. Here is an example: an extract from the description field of an event with ID 560.
Description (selected field):
Object Server: Security
Object Type: File
Object Name: C:\0\TestBed\simple_text_file.txt
Image File Name: C:\WINDOWS\system32\notepad.exe
Primary User Name: Anton
Primary Domain: XXXXXX
Accesses: READ_CONTROL
SYNCHRONIZE
ReadData (or ListDirectory)
WriteData (or AddFile)
AppendData (or AddSubdirectory or CreatePipeInstance)
ReadEA
WriteEA
ReadAttributes
WriteAttributes
WTH is that? Well, we know that the user 'Anton' has successfully read? written? changed attributes? done something? with a file named "C:\0\TestBed\simple_text_file.txt" using a program named "C:\WINDOWS\system32\notepad.exe". That's the best we can get in this case! We may try to look at event IDs 562 and 567, but they will not add the missing information (i.e. the exact action performed).
BTW, there will be a few dozen more (sometimes hundreds!) 560s, 562s and 567s produced, all from just opening the text file in Notepad. The above event is notable for having BOTH "notepad" and "simple_text_file.txt" in the same event; the others will have only one of the two.
Anything else getting in the way? Yes, lots! MS Office will write to every file it opens, even one opened just for reading (with no user modifications to the content whatsoever), which will screw up your log monitoring efforts. If the file is on a share, more information will be missing (e.g. the username might be).
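To see why the event answers "who" and "with what program" but not "what was done", it helps to treat the description as data. Below is a minimal sketch, assuming the description has already been extracted as plain "Key: value" lines exactly like the extract above; the parser, its field handling and the WRITE_RIGHTS grouping are illustrative only, not a Windows or LogLogic API. The Accesses field is just a list of rights attached to the opened file, so the most a script can conclude is that a write was possible.

```python
import re

# A "Key: value" line; the key may contain spaces but no colon.
FIELD_RE = re.compile(r"^([A-Za-z ]+):\s*(.*)$")

# Rights we treat as "could modify the file or its metadata" (our own grouping).
WRITE_RIGHTS = {"WriteData", "AppendData", "WriteEA", "WriteAttributes"}


def parse_560(description: str) -> dict:
    """Parse an event 560 description extract into a dict.

    "Key: value" lines become fields. Any other non-empty line is treated as
    a continuation of "Accesses", the only multi-line field in the extract.
    """
    fields = {"Accesses": []}
    for raw in description.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = FIELD_RE.match(line)
        if m and m.group(1) != "Accesses":
            fields[m.group(1)] = m.group(2)
            continue
        value = m.group(2) if m else line
        if value:
            # Keep the primary name of the right, dropping the "(or ...)" alias.
            fields["Accesses"].append(value.split(" (")[0])
    return fields


def may_have_written(fields: dict) -> bool:
    """True if any write-type right is listed: a write was possible, not proven."""
    return any(right in WRITE_RIGHTS for right in fields["Accesses"])


SAMPLE = r"""Object Server: Security
Object Type: File
Object Name: C:\0\TestBed\simple_text_file.txt
Image File Name: C:\WINDOWS\system32\notepad.exe
Primary User Name: Anton
Primary Domain: XXXXXX
Accesses: READ_CONTROL
SYNCHRONIZE
ReadData (or ListDirectory)
WriteData (or AddFile)
AppendData (or AddSubdirectory or CreatePipeInstance)
ReadEA
WriteEA
ReadAttributes
WriteAttributes"""

event = parse_560(SAMPLE)
print(event["Primary User Name"], event["Image File Name"])
print(may_have_written(event))  # True, although the file was only opened
```

Nothing in the parsed fields distinguishes "opened and read" from "opened and modified", which is exactly the gap described above.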
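A minimal sketch of pulling the one "notable" event out of such a burst, assuming the events have already been parsed into dictionaries keyed by the description field names, plus a hypothetical "id" key holding the event ID. The sample burst is invented for illustration and far smaller than a real one.

```python
from collections import Counter

FILE = r"C:\0\TestBed\simple_text_file.txt"
IMAGE = r"C:\WINDOWS\system32\notepad.exe"


def notable_events(events, file_name, image_name):
    """Keep only events naming BOTH the file and the program.

    Paths are compared case-insensitively, as Windows paths usually are.
    """
    f, img = file_name.lower(), image_name.lower()
    return [
        e for e in events
        if e.get("Object Name", "").lower() == f
        and e.get("Image File Name", "").lower() == img
    ]


def noise_summary(events):
    """Count events per event ID to show how noisy a single file open is."""
    return Counter(e["id"] for e in events)


# Hypothetical, hand-made burst; real bursts contain dozens to hundreds.
burst = [
    {"id": 560, "Object Name": FILE, "Image File Name": IMAGE},
    {"id": 567, "Image File Name": IMAGE},
    {"id": 562},
    {"id": 560, "Object Name": FILE, "Image File Name": r"C:\WINDOWS\explorer.exe"},
]

notable = notable_events(burst, FILE, IMAGE)
print(len(notable), dict(noise_summary(burst)))
```

Filtering this way shrinks the volume, but it still cannot recover the action performed; it only tells you which event is worth reading.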
So, how to use Windows event logs for file access tracking?
Overall, this is still very useful for file access monitoring, but the process is somewhat painful.
BTW, I am tagging all the tips on my del.icio.us feed. Here is the link: All Security Tips of the Day.
Posted May 08, 2008 in Compliance, Innovation, Log Management & Intelligence, Security
While log management for operations and log management for compliance or security are different applications, they share many of the same foundational requirements, so system administrators can benefit from recent advances inspired by security applications:
- Collection
The ability to collect log data from a large variety of sources, with different protocols and different formats, through either an agent-less or an agent-based infrastructure. Near-real-time collection is also critical to both security and operational uses of logs: such timely collection enables alerts that warn users of recent or even impending system failures.
- Normalization
The ability to compare log data from disparate sources. For example, the ability to run a user activity report aggregating all login activity for a particular user, including logins to the VPN and the finance server, or to run one report that shows all activity for a particular user, from e-mails sent to websites visited. For operational use, performance measurement across different systems can only be done on normalized data.
- Summarization
The ability to count and summarize the collected log messages by log type, message type and so on. One failed login perhaps isn’t meaningful, but more than five in a row could be significant. The same logic applies to system errors and failures that need to be reviewed when using logs to maintain and optimize system and network operations.
- Statistical analysis
Unusual patterns in log data (for example, an unusual ratio between accepted and denied connections on a firewall) can indicate a security breach. In the future, statistical algorithms applied to log data may enable failure prediction and other advanced analysis that directly contributes to improved SLAs.
- Alerting
The ability to trigger user-configurable, (near) real-time alerts, based either on manually written rules or on automated statistical analysis. Such alerts bring urgent issues to the attention of system operators or security analysts.
- Search
Search is central to log-based investigations, whether for an operations use (such as system fault investigation) or security use (hacker or insider attack). An ability to go through 100% of logs is key for all three uses for logs: security, compliance and operations. Such searches must be fast and easy – so that users are able to run them while under pressure of a troubleshooting or security incident.
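The normalization, summarization and alerting requirements build on one another. Below is a minimal sketch, assuming two sources: OpenSSH-style syslog lines and a made-up VPN CSV format (both the LoginEvent schema and the VPN format are hypothetical, and variants such as "invalid user" lines are ignored). Both sources are mapped onto one record type, failures are counted per user, and an alert fires on more than five failures in a row.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEvent:
    """Normalized login record shared by all sources (hypothetical schema)."""
    source: str
    user: str
    success: bool


# OpenSSH-style syslog text, e.g. "Failed password for anton from 10.0.0.5 ...".
SSHD_RE = re.compile(r"(Accepted|Failed) password for (\S+) from")


def normalize(line):
    """Map one raw line from either source onto LoginEvent, or return None."""
    line = line.strip()
    m = SSHD_RE.search(line)
    if m:
        return LoginEvent("sshd", m.group(2), m.group(1) == "Accepted")
    parts = line.split(",")
    if len(parts) == 4 and parts[1] == "vpn":
        # Hypothetical VPN CSV format: timestamp,vpn,user,LOGIN_OK|LOGIN_FAIL
        return LoginEvent("vpn", parts[2], parts[3] == "LOGIN_OK")
    return None


def failures_per_user(events):
    """Summarization: failed logins per user, across all sources."""
    return Counter(e.user for e in events if not e.success)


def streak_alerts(events, threshold=5):
    """Alerting: users with MORE than `threshold` failures in a row."""
    streak = defaultdict(int)
    alerted = []
    for e in events:
        if e.success:
            streak[e.user] = 0
            continue
        streak[e.user] += 1
        if streak[e.user] == threshold + 1:  # alert once per streak
            alerted.append(e.user)
    return alerted


raw = (
    ["May  6 10:00:01 host sshd[311]: Failed password for anton from 10.0.0.5 port 4242 ssh2"] * 3
    + ["2008-05-06T10:00:09,vpn,anton,LOGIN_FAIL"] * 3
    + ["2008-05-06T10:01:00,vpn,maria,LOGIN_OK",
       "May  6 10:02:00 host sshd[312]: Accepted password for maria from 10.0.0.7 port 4243 ssh2"]
)
events = [e for e in map(normalize, raw) if e is not None]
print(failures_per_user(events), streak_alerts(events))
```

Neither source on its own shows more than three failures for anton; only after normalization does the six-failure streak cross the threshold.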
It is also important to note that log management for operations has its own unique requirements:
- Collection revisited
Faults are notoriously singular: they occur once, but never again in quite the same manner. It is therefore very difficult to predict which log messages will be most useful for problem isolation, and most practitioners now admit it is best to keep all log data around for post-incident analysis. As a result, the requirement to collect 100% of log messages from all log sources is even more important in operations than it is in security.
- Log browsing (data mining)
While for compliance an auditor may review the same report (say, failed logins) every quarter, no two troubleshooting sessions are quite the same. Problem isolation is an interactive process of trial and error. An administrator may look at the same data from many different angles before understanding the root cause, like examining a Rubik’s cube. Reports have to be customizable on the fly, and pre- and post-report filtering options are important to allow dynamic report (re)configuration. Search is important, but not sufficient: you will likely want to combine search with access to normalized and cross-correlated information.
- Search (and reporting) speed
Speed truly matters when it comes to fault detection and problem isolation. Whether a forensic investigation takes one hour or one day or one week usually doesn’t really break the bank, but whether a down-time situation persists for minutes or hours can be a matter of many millions of dollars in missed revenues. When troubleshooting a problem, every query must be very fast: whether indexed search or a report against normalized data, every second and every minute counts.
- GUI and Workflow
An external auditor looking at logs to verify that nobody improperly accessed credit card information is going to follow a very different workflow from an internal investigator examining a potential fraud case, and both differ completely from the help-desk person who is trying to tell you why your e-mail isn’t being delivered or your VPN connection is so slow. For optimal functionality and productivity, the best graphical user interfaces and workflows are application specific.
- SOA-based portal or mash-up
The initial fault alarm will likely land with a help-desk employee, in the form of an HP Software (or equivalent) alert, a log alert or a phone call from an unhappy user. Either way, the first-level support person will attempt to perform some analysis. In many cases, truly understanding the problem requires access to log data. Without log automation, this could mean a phone call to a third-level support person and a long wait until the escalation manager returns his log analysis. In the brave new world of log analysis, however, the help-desk employee could access log data remotely with a single mouse-click, assuming the task is made easy enough. That probably means further customizing the workflow and GUI to a particular customer’s situation, which is easy to do with today’s web 2.0 technologies and open web services APIs: a custom portal or mash-up can be created in days.
- SOA-based integration
Unlike log management for security, fault analysis already has very mature consoles and dashboards. These event management systems even have correlation and alerting capabilities. Rather than replacing these systems with yet another console, most companies are going to look for the ability to integrate a new information source, in this case log data, into the existing fault management console. Web services will likely be the mechanism of choice.
- (Lack of) archiving
Keeping log data around for long periods of time is not a requirement. Data quickly loses its value after the fact. However, mining historical data patterns to predict future failures before they occur can be very valuable. This field is still in its infancy, but shows a lot of promise. Given enough data, both error data and fault data, predictive analysis is not far in the future.
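Both the speed requirement and the browse-from-many-angles requirement usually come down to indexing. Below is a toy sketch, not how LogLogic or any other product implements it: an inverted index maps each token to the set of lines containing it, so an AND query intersects a few small sets instead of scanning every stored line, and an optional filter (the hypothetical `where` parameter) re-slices the same hits as the investigation evolves.

```python
import re

# Lowercase word-ish tokens; dots and dashes stay inside tokens such as IPs.
TOKEN_RE = re.compile(r"[a-z0-9_.\-]+")


class LogIndex:
    """Toy in-memory inverted index over raw log lines (illustrative only)."""

    def __init__(self):
        self.lines = []     # doc id -> (host, line)
        self.postings = {}  # token -> set of doc ids

    def add(self, host, line):
        doc_id = len(self.lines)
        self.lines.append((host, line))
        for tok in TOKEN_RE.findall(line.lower()):
            self.postings.setdefault(tok, set()).add(doc_id)

    def search(self, *terms, where=None):
        """AND-query over the index, then an optional (host, line) filter.

        Only the posting sets for the query terms are consulted, so no full
        scan of the stored lines is needed; `where` runs on the hits only.
        """
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t.lower(), set()) for t in terms))
        hits = [self.lines[i] for i in sorted(ids)]
        if where is not None:
            hits = [(host, line) for host, line in hits if where(host, line)]
        return hits


idx = LogIndex()
idx.add("mail01", "smtp queue full, deferring delivery")
idx.add("mail02", "smtp delivery ok to anton@example.com")
idx.add("vpn01", "tunnel latency 900ms for user anton")
idx.add("mail01", "smtp delivery deferred for anton@example.com")

for host, line in idx.search("smtp", "delivery", where=lambda host, line: host == "mail01"):
    print(host, line)
```

Changing only the `where` predicate lets an administrator look at the same query result from another angle without rebuilding anything.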
It appears to me that the ideal technical architecture for log management recognizes both the similarities and the differences among the various log management use cases (and there are many more than just security and operations). Would the ideal solution perhaps be a common log data platform that can collect, aggregate, summarize, normalize, index and apply basic analytics to log data once, while allowing for many different user experiences depending on the use case?
Posted May 06, 2008
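The closing question describes an architecture that is easy to picture in code. Below is a minimal sketch, with a hypothetical record schema and view names: records are collected and normalized once into a shared store, and each use case registers its own view (a compliance report, a help-desk summary) over the same data rather than running its own pipeline.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    """Normalized record, produced once at ingest (hypothetical schema)."""
    ts: int        # seconds since the epoch
    host: str
    user: str
    action: str
    ok: bool


View = Callable[[List[Record]], object]


@dataclass
class LogPlatform:
    """One shared store; each use case plugs in its own view over it."""
    records: List[Record] = field(default_factory=list)
    views: Dict[str, View] = field(default_factory=dict)

    def ingest(self, rec: Record) -> None:
        # Collection and normalization happen once, before any view runs.
        self.records.append(rec)

    def register_view(self, name: str, view: View) -> None:
        self.views[name] = view

    def render(self, name: str):
        return self.views[name](self.records)


def audit_failed_logins(records):
    """Compliance-style view: total failed logins."""
    return sum(1 for r in records if r.action == "login" and not r.ok)


def helpdesk_errors_by_host(records):
    """Operations-style view: failures of any kind, grouped by host."""
    return Counter(r.host for r in records if not r.ok)


platform = LogPlatform()
platform.ingest(Record(1, "vpn01", "anton", "login", False))
platform.ingest(Record(2, "vpn01", "anton", "login", True))
platform.ingest(Record(3, "mail01", "maria", "send", False))
platform.register_view("audit", audit_failed_logins)
platform.register_view("helpdesk", helpdesk_errors_by_host)
print(platform.render("audit"), platform.render("helpdesk"))
```

Adding a new use case means registering one more view, not building one more collection and normalization pipeline.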
So, my next poll is up - and it is fun (but more technical): what information is most useful when trying to make sense of a log entry?
Vote here! Analysis will be posted here in a few weeks.
Posted May 05, 2008 in Innovation, Log Management & Intelligence, LogEd
Log management has been around for a loooong time. In the 80s, log file management was the primary mechanism for fault analysis and management of computer systems. Also in the 80s, Eric Allman at the University of California, Berkeley developed a logging standard called syslog as part of the Sendmail project. While it was adopted by quite a few applications, many other protocols and formats persist to this day.
The very success of log data nearly killed it. Given the cacophony of log formats and the sheer volume of messages generated (up to 40 terabytes a month for a mid-sized organization or, shall we say, 100,000 log messages every second!), it became impossible for any human being to keep track of everything logs have to say. Based on SNMP alerts and other event data, including selected error log messages, large-scale event management systems such as HP OpenView emerged as the new kings of fault detection.
If it were not for compliance and security concerns, log management might not have made it back. But out of the need to track user activity and to identify potential insider and outsider intrusions into, and transgressions of, corporate networks, a new form of log file analysis emerged. Log data featured prominently in Paul Proctor’s Practical Intrusion Detection Handbook in the late 90s, for example, and dozens of companies emerged to perfect the art of security event management based on log data.
Now, in part due to virtualization and the ever-increasing cost of downtime in our networked economy, system and network administrators have rediscovered log data. In surveys, 70%+ of organizations confess that their primary budget for log management still comes from compliance. However, this same group has admitted for years now that 70% of their use of log data is driven by operational needs such as fault detection and problem isolation. This is no surprise, because operations use cases can drive true log management ROI. One minute of downtime could cost millions, so if automating log management can help accelerate problem isolation, companies are willing to pay big bucks. If giving help-desk employees access to normalized log data can also offload expensive third-level support personnel, even better.
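The two volume figures describe the same load only under an assumed average message size. Assuming a 30-day month and decimal terabytes, the arithmetic works out to roughly 150 bytes per message, a plausible size for a short single-line syslog message:

$$
\frac{40 \times 10^{12}\,\text{B}}{30 \times 86\,400\,\text{s}} \approx 1.54 \times 10^{7}\,\text{B/s},
\qquad
\frac{1.54 \times 10^{7}\,\text{B/s}}{10^{5}\,\text{msg/s}} \approx 154\,\text{B/msg}
$$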
So, as the sun is setting on HP OpenView (the name was changed to HP Software in 2007), a new dawn has broken for log management in operations! Hoorah!
Posted May 01, 2008