LogBlog

« July 2008 | Main | September 2008 »

Anton Logging Tip of the Day #16: Virtually There - Journey Into VMWare ESX Log Analysis

Following the new "tradition" of posting tips of the week, I decided to follow along and join the initiative.

So, after a long delay, Anton Logging Tip of the Day #16: Virtually There - Journey Into VMWare ESX Log Analysis

CISecurity guide for VMWare (here) and DISA STIG for virtual machines (here) both mandate collection and analysis of VM platform logs; neither goes into enough detail on what to look for in the logs. Let's try to shed some light on security-focused log analysis of VMWare ESX v. 3.x logs.

First, at least until ESXi becomes the default choice, one needs to keep in mind that ESX has "Linux-inside" and thus diving into /var/log will not reveal any "alien technology" (well, not much of it  :-)). However, one of the most useful logs is /var/log/hostd.N which is not a descendant of Linux standard logs. Extensive VM event records are written into this file.

Let's focus on various types of logins to the ESX platform and identify logs that indicate successful and failed attempts to log in. Here are a few useful examples to analyze:

Successful logins:

This is a classic Linux root login message; you can watch for these by searching VMWare ESX logs for "session AND opened AND user AND root."  Notice the user name of the user who switched to root.

This is also a classic Linux message for a normal (non-root) user login.

This is a VMWare-specific application login to ESX. You can track such events by username, by event ID or by keywords "event AND logged AND user" (if you are using search).
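The keyword searches mentioned above ("session AND opened AND user AND root", "event AND logged AND user") boil down to keeping only the lines that contain every term. Here is a minimal Python sketch of that idea, assuming the logs are available as plain text lines. The function names and the sample lines are made up for illustration; they were not captured from a real ESX host.

```python
def match_all(line, terms):
    """Return True if every term appears in the line (case-insensitive)."""
    lowered = line.lower()
    return all(term.lower() in lowered for term in terms)


def search(lines, query):
    """Yield lines matching a query such as 'session AND opened AND user AND root'."""
    terms = [t.strip() for t in query.split(" AND ") if t.strip()]
    for line in lines:
        if match_all(line, terms):
            yield line


# Made-up sample lines, for illustration only.
SAMPLE = [
    "su(pam_unix)[2345]: session opened for user root by anton(uid=500)",
    "sshd(pam_unix)[2400]: session opened for user anton by (uid=0)",
    "sshd(pam_unix)[2410]: session closed for user anton",
]

root_logins = list(search(SAMPLE, "session AND opened AND user AND root"))
```

Note that this is plain substring matching, so a term like "root" would also hit a line containing "rootkit"; a real search tool will usually tokenize the message first.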

Failed logins:

Another classic Linux message from the ESX system: a failure to log in due to an incorrect password.

A message indicating a failure to log in due to an incorrect username (note the typo).

This ESX Linux platform message should also be familiar to Linux/Unix admins: it indicates multiple sudo password failures; look for such messages in the logs.
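To go from "look for such messages" to something you can review daily, it helps to count failures per user and source address. Below is a rough Python sketch, assuming OpenSSH-style "Failed password" messages in the collected text; the exact wording on your ESX build may differ, so treat the regular expression as an assumption to adjust. The sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed OpenSSH-style failure message; "invalid user" or "illegal user"
# (depending on the OpenSSH version) marks a username that does not exist.
FAILED = re.compile(
    r"Failed password for (?:(?:invalid|illegal) user )?(\S+) from (\S+)"
)


def failed_logins(lines):
    """Count failed password attempts per (user, source address)."""
    counts = Counter()
    for line in lines:
        m = FAILED.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    "sshd[3001]: Failed password for anton from 10.0.0.5 port 4001 ssh2",
    "sshd[3002]: Failed password for illegal user antom from 10.0.0.5 port 4002 ssh2",
    "sshd[3003]: Failed password for anton from 10.0.0.5 port 4003 ssh2",
    "sshd[3004]: Accepted password for anton from 10.0.0.5 port 4004 ssh2",
]

counts = failed_logins(SAMPLE)
```

The same loop with a second pattern for accepted logins would cover the successful side as well.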

BTW, do you need to be reminded to track NOT only failed, but also successful login events?! This applies to virtual as well as physical environments.

Overall, you must prepare for the future by learning to analyze VMWare logs, just like you handled logs from "legacy OS" such as Linux/Unix and Windows.

As I said before, I am tagging all the tips on my del.icio.us feed; here is the link: All Security Tips of the Day.

Posted August 27, 2008 in Innovation , Log Management & Intelligence , Security | Permalink | TrackBack (0)

Logging Stories from the Field

Our brilliant field engineer, Dimitri McKay (his blog) brings another fun and insightful story from the field:  "I recently went on-site for a proof of concept. I’ve always loved these exercises, as it gives me a chance to help a customer see that which was invisible in the past, whether it be virus-outbreaks, users abusing bandwidth via bit-torrent and file sharing, or VOIP phones assaulting DHCP servers for IP addresses. This particular customer had an interesting configuration.  They had been sending their critical/alert and emergency firewall logs to a 3rd party security operations center. That SOC was supposed to monitor the firewall data for any risky traffic, identify any anomalies, and report the instant there’s an issue.

Because of this, we wanted to forward those specific messages on to the 3rd party security operations center. We configured the LogLogic appliance to collect ALL firewall messages, and send off all Emergency, Critical and Alert messages to that 3rd party vendor.  Over the next couple of hours, we routed stream after stream of syslog data to the LogLogic appliance, slowly raising our MPS rates: collecting data for alerting, collecting data for reporting; overall collecting data to sift and sort through. We cranked the logging levels way up. Our goal was to "abuse" the box, and get as much out of this POC as possible. The firewalls, routers, switches, and Unix/Linux boxes were logging; we then added several dozen Windows hosts logging via Lasso.

Now came the fun part. We then started drilling down into the data. I illustrated to our customer the agile reports, we ran searches across a mountain of data, and I showed the onlookers what interesting information we could mine from that mass of daily log data.  At one point, I ran an FTP report.

Suddenly there were questions: "Why are there like 450 active FTP connections from Germany?"
Which led to more questions: who is that? what are they attempting to do?

Within several minutes, we were able to see that an FTP server had been left wide open in the DMZ with 'anonymous' logins allowed. We were also able to see the file names being uploaded/downloaded via the firewall port 21 traffic logs.  Next, we began looking at the logs from the compromised server itself.  We were able to see that the server had actually been compromised back in March.

The logs also revealed that they attempted a dictionary attack (over nearly 3 months) hoping to get access to the box, and the fear was: if they gained access to the box and ran something like L0phtCrack on it, user accounts and passwords could be revealed, since the box had once been part of the domain.  The box was a virtual machine "test box" that a developer had fired up, added to the domain, didn't harden, and then transferred to the DMZ, violating a half dozen or more security protocols.  In that same DMZ was their email system, which likely had the same login/pass combo, and if that was compromised, who knows.

At the end of the day we were able to identify a compromised machine, make the security officer look like a superstar, and illustrate just how fast and agile our reports and search capabilities are. If we didn't prove that LogLogic is the clear best fit, we certainly created a perfect use case for Log Management."

Enjoy more of Dimitri's writing on his blog!

Posted August 26, 2008 in Log Management & Intelligence | Permalink | TrackBack (0)

Challenges of Enterprise Cloud Computing

[ Originally posted at OnSaaS ]

Today, enterprise use of cloud computing is still in its infancy (heck, the whole cloud computing space is in its infancy). Most enterprises use cloud computing for testing, development and other peripheral tasks. However, few, if any, are using the cloud for production. This is fairly similar to the virtualization space, where early use of the technology was for testing and development. Ten years later, we are seeing more and more enterprises adopt virtualization for production use, and virtualization has become mainstream.

What are these challenges for enterprise cloud computing? I have tried to summarize them here (in no particular order).

Data Governance

I've written extensively about the need for data governance in previous posts. In essence, enterprises have a ton of sensitive data that requires access monitoring and protection. Data (and information generated from the data) is the lifeblood of many enterprises, and losing control of it is not acceptable. Whole markets (read: DLP) have been created to protect enterprise data and information. On top of all that, enterprises must comply with many regulations that require data governance. By moving data into the cloud, enterprises will, for now, lose some of their ability to govern their own data. They would have to rely on the service providers to guarantee the safety of their data.

I hate to invoke the ILM acronym but much of data governance is about

So who's tackling this problem? As far as I know, nobody is and nobody really can except for the service providers themselves. It is really up to the service providers such as Amazon, Google and Salesforce to provide guarantees that customer data are safe and that access to the data is restricted and protected.

Manageability

There are some great IaaS/PaaS offerings out there, including Amazon's web services (S3, EC2, EBS, etc), Google's App Engine, Salesforce's Force.com, Joyent, etc. However, most of these are raw infrastructures and platforms that do not have great management capabilities. This is not unusual. Throughout computing history, raw capabilities have generally appeared on the market first, and management of these raw capabilities becomes a differentiator when competition heats up. The blade server and virtualization spaces are great examples of that trend. The hypervisor was the key technology that enabled enterprise virtualization; however, that piece is now being given away (see VMware's ESXi) and management capabilities have become the main differentiator.

Cloud computing is no different. An example of missing management capabilities for cloud infrastructures is auto-scaling. Amazon EC2 claims to be elastic; however, it really means that it has the potential to be elastic. Amazon EC2 will not automatically scale your application as your server becomes heavily loaded. It is still up to the developer to manage that scalability problem.
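To make "it is still up to the developer" concrete, here is a minimal Python sketch of the kind of scaling decision a developer (or a management layer) has to supply. It is a simple threshold policy of my own choosing, not how EC2 or any vendor product works; the function name, the load metric and all threshold values are assumptions for illustration.

```python
def desired_instances(current, load, low=0.30, high=0.70, minimum=1, maximum=10):
    """Return the instance count a simple threshold policy would ask for.

    `load` is an average utilization between 0.0 and 1.0 (hypothetical metric).
    The policy adds one instance above `high`, removes one below `low`,
    and always stays within [minimum, maximum].
    """
    if load > high:
        target = current + 1
    elif load < low:
        target = current - 1
    else:
        target = current
    return max(minimum, min(maximum, target))
```

For example, `desired_instances(2, 0.9)` asks for a third instance, while `desired_instances(1, 0.1)` stays at one because of the floor. Actually starting or stopping servers would be a separate step through whatever interface the provider offers.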

So who's tackling this problem? Many startups have recognized the need for management early on and have built management capabilities on top of the existing cloud infrastructure/platforms. RightScale is one of the early pioneers in this space. Their solution solves many of the management issues such as auto-scaling and load balancing.

Monitoring

Monitoring, whether it is for performance or availability, is critical to any IT shop. We are not talking about just how much CPU or memory the machines are using. We are talking about the performance of transactions, disk I/O and more. CPU and memory usage are misleading most of the time in virtual environments. The only real measurement is how long your transactions are taking and how much latency there is. According to High Availability's article on latency:

Amazon found every 100ms of latency cost them 1% in sales. Google found an extra .5 seconds in search page generation time dropped traffic by 20%. A broker could lose $4 million in revenues per millisecond if their electronic trading platform is 5 milliseconds behind the competition.
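Measuring how long transactions take does not require much machinery. Here is a small Python sketch, assuming each transaction can be wrapped in a function call; the helper names are mine, and the nearest-rank percentile is just one common way to summarize the samples.

```python
import math
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]
```

Collect the elapsed times from `timed(...)` over many transactions and watch something like `percentile(latencies, 95)` rather than the average, since a few very slow transactions are easy to hide in a mean.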

So who's tackling this problem? Hypernic's CloudStatus was one of the first to recognize this issue and develop a solution for it. They started with monitoring of Amazon's web services, then recently added monitoring for Google App Engine. In addition, RightScale's solution can also provide monitoring for the virtual machines under their management.

Reliability and Availability

I won't beat the dead "Gmail down, EC2 down, etc down" horse here. But the truth of the matter is enterprises today cannot reasonably rely on the cloud infrastructures/platforms to run their business. There are almost no SLAs provided by the cloud providers today. Even Jeff Barr from Amazon said that AWS only provides an SLA for their S3 service. I haven’t researched the SLA issue, so I am not sure how true that is. But if it’s true, I think this will be one of the biggest factors, if not the biggest factor, in enterprise adoption. Can you imagine enterprises signing cloud computing contracts without clearly defined SLAs? It’s like hosting their business-critical infrastructure in a data center that doesn’t have a clearly defined SLA.

We all know that SLAs really don’t buy you much. In most cases, enterprises get refunded for the amount of time that the network was down. No SLA will cover business loss. However, as one of the CSOs I met said, it’s about risk transfer. As long as there’s a defined SLA on paper, when the network/site goes down, they can go after somebody. If there’s no SLA, it will be the CIO/CSO’s head that’s on the chopping block.

So who's tackling this problem? Well, again, no one is today as far as I know. Maybe some startup will come up with a clever idea to provide SLAs as a third-party vendor (read: cloud insurance). Or maybe the cloud providers will grow/wake up and actually do something to encourage enterprise adoption.

Virtualization Security

Security is a huge area that encompasses many different things, including the standard enterprise security policies on access control, activity monitoring, patch management, etc. On top of that, virtualization security is something that most enterprises are just starting to grasp but don't fully understand. Many IT people still believe that the hypervisor and virtual machines are safe. Recent presentations at Black Hat have demonstrated that we shouldn't sleep so tight at night. As IT shops get more educated on the virtualization security issues, it will become one of the factors they consider when they move into the cloud. Access control and monitoring of the virtual infrastructure will be top of mind.

So who's tackling this problem? There are quite a few startups like Reflex, Blue Lane and Catbird that are creating privileged VAs that claim to protect the VAs running on VMware's ESX servers. However, be sure to research the performance of these solutions before adopting one of them. Other startups (unnamed) are creating interesting solutions for protecting the virtual infrastructure itself, e.g., how do you protect and monitor access to the ESX servers? How do you control and monitor the movement of virtual machines using live migration or VMotion?

---

Cloud computing is here to stay. It will be the next big wave and will be adopted by enterprises. However, the industry as a whole needs to answer some of these challenges and ease the enterprises' concerns.

Posted August 23, 2008 in Cloud Computing , Security | Permalink | TrackBack (0)

Even More Critical Logging Questions - Answered

I recently did this webcast on logging for accountability (slides and recording here) and people asked a lot of good questions. Here are some of the answers for them as well as our blog readers.

 

Q1: How do you handle variety of log sources? There are so many, almost beyond my capability.

A1: Sorry to ponder the meaning of "is" here, but what is meant by "handle"? It is really not that hard to collect logs from a large number of diverse sources, given the right tools (as long as the logs can be delivered via syslog or grabbed as files). Now, there will certainly be challenges  when the volume of logs gets large, but if by "handle" you mean "collect + store", it is really not that hard, again, given the right tools. Now, if "handle" means "make sense of what all those logs are trying to tell you," it is a different story altogether. It is indeed pretty hard to extract the meaning of all those logs automatically.

 

Q2: You talked about the importance of logging; however for an intermediate or novice admin what are the starting steps .. what are the minimal logs they should start at once?

A2: Answered in "Log Management - Day 1". If you want a simple list of logging things to "enable today," I cannot really give one, since I know neither your needs nor your environment. Remember, "requirements first - tools second!"

 

Q3: What regulations, rules or guidance exist regarding sharing or visibility of logs to users?

A3: PCI DSS says in Requirement 10.5:  "Secure audit trails so they cannot be altered.
10.5.1 Limit viewing of audit trails to those with a job-related need
10.5.2 Protect audit trail files from unauthorized modifications"

NIST guidance for FISMA also says something similar (for example, look in NIST 800-92 doc). Overall, log protection and security are mentioned in many other regulations as well, all the way to ISO and COBIT.
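One common way to make unauthorized modification detectable is to keep a running hash over the log lines and store the digests somewhere the log writer cannot reach. The Python sketch below shows that hash-chaining idea; it is my own illustration of a technique, not something PCI DSS or NIST 800-92 prescribes in this form, and the function names are hypothetical.

```python
import hashlib


def chain_digests(lines, seed=b""):
    """Return one SHA-256 digest per line; each digest covers all earlier lines too."""
    digests = []
    previous = hashlib.sha256(seed).hexdigest()
    for line in lines:
        previous = hashlib.sha256((previous + line).encode("utf-8")).hexdigest()
        digests.append(previous)
    return digests


def first_mismatch(lines, digests, seed=b""):
    """Return the index where the log stops matching its stored digests, or None."""
    recomputed = chain_digests(lines, seed)
    for i, (expected, actual) in enumerate(zip(digests, recomputed)):
        if expected != actual:
            return i
    if len(lines) != len(digests):
        # Lines were appended or truncated after the digests were taken.
        return min(len(lines), len(digests))
    return None
```

Because every digest depends on the one before it, changing or deleting a line in the middle invalidates everything after it. This only helps if the digests are stored separately from the logs, with access limited in the spirit of 10.5.1.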

 

Q4: How I can learn what exactly I need to log?

A4: Let me answer the "how can I learn" part and not the "what exactly I need to log" part, as the former is actually answerable (also see discussion on "MUST-DO Logging for PCI?"). To learn what you need to log, first ask "Why?" (and then see this) - basically establish what you want to accomplish with logs, then catalogue your systems, then figure out how to tweak the logging knobs - and then actually go and tweak them.

 

Q5: What is "more control" and what is "less control" that you mention in the webcast? Can you give an example?

A5: OK, I did say that "sometimes when you implement more controls, you actually have less control." What do I mean? If you buy a firewall (a network security control) and then - over time, of course - configure it with 7800 rules (!) that are supposed to give you control over who can and cannot access your network, you will not gain control over your environment. You will actually be less in control of who is touching your network, compared to, say, having only 20 rules.

 

Q6: What about mandated NIST controls for government systems? Auditing is a specific control for Moderate and High risk systems. What list of events do you recommend for auditing?

A6: This is too long to answer here, but the NIST 800-92 Guide is a really good source of such info ("Guide to Computer Security Log Management [PDF]"). Also, see my presentation on NIST 800-92 Guide in the Real World.

 

Q7: The issue that many organizations get stuck on is the monitoring process, and defining what exceptions to monitor for? Is there guidance for this? How much of it is system specific and how much is applicable generally to all systems?

A7: I outlined some general ideas back in 2004 via this presentation; it is mostly general, but also has pointers to specific systems. Keep in mind that it is focused on security, not operational monitoring (which is often no less important - in fact, often MORE important).

Enjoy! Sorry for being brief with some of the answers.

Other questions that I answered in the past:

Posted August 07, 2008 in Innovation , Log Management & Intelligence | Permalink | TrackBack (0)

Tomorrow's Logging Problems - Part II

I would like to continue the discussion I started in my previous post called "Today's Logging Problems - Then Future Problems - Part I." Specifically, upon outlining some problems with logging, I will now forecast what will happen with them in 18-24 months.

First, I'll predict that the "Not knowing what to log" problem will be mostly solved in 18-24 months; at least as far as major regulations go, people will have a pretty good idea of a) what the auditors want them to log (and review!) and b) what they need to log for solving their own problems. Now, esoteric log sources and custom applications might still present a challenge from that point of view, but for basic "staples" (firewall, network gear, major OS) the mystery will be over (again, see "Tell me EXACTLY what to log for PCI?" for some reference).

Next, the problem of "Log volume" will definitely get worse, much worse. One might think that 100,000 log messages each second is a lot - but there WILL BE more at many companies! A big application log explosion is coming, fueled by the need to address logging in areas where such motivation was lacking before (basically, custom and vertical applications) as well as to harness the power of "uncommon" logs for such tasks as fraud analysis or SOA monitoring. Keep in mind that even though in some areas logging is not a preferred way of monitoring and auditing activities (see this discussion on database logs here), application logging will still explode on us ...
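To get a feel for what such a rate means for storage, here is a back-of-the-envelope calculation in Python. It reads the 100,000-per-second figure above as messages per second; the 200-byte average message size is purely my assumption, so plug in your own number.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_bytes(messages_per_second, avg_message_bytes):
    """Raw (uncompressed) log volume per day, in bytes."""
    return messages_per_second * avg_message_bytes * SECONDS_PER_DAY


# Assumed average message size of 200 bytes, for illustration only.
per_day = daily_volume_bytes(100_000, 200)
terabytes_per_day = per_day / 10**12
```

Under that assumption the sustained rate works out to roughly 1.7 TB of raw log data per day, before compression and indexing.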

The problem of "Log diversity" (the fact that most logs all look different in format and meaning) will get worse before it will get better - and better it WILL get since standards are being developed. We will see people struggling with all sorts of bizarre log data in the coming years. Virtualization (on logs and VM), web services and SOA, various ERP applications and even cloud services will increase the diversity of logging in the coming years.

Similar to the above, a problem of "Bad logs" (ones that are subjective, miss key information, require groping for a crystal ball to understand, turn log analysis into a painful experience or are useless in some other way) will also follow the pattern of the above log diversity problems - it will get worse before it gets better (via the CEE standard effort that now covers the OpenXDAS effort as well!). I noticed that people have started asking me questions about "how to do application logging right?" and "what to tell application developers about logging?", which almost never happened in the past. More on this in the future!

"Getting the logs" has gotten much easier in recent years; agentless collectors like Project Lasso (which, BTW, just got updated) and grabbing files remotely via secure protocols made application log collection easier (TCP syslog and buffering also helped). Next, Windows 2008 will make it MUCH easier for the whole Windows kingdom due to their use of web services. However, in the future it might resurface as we try to collect logs from unusual places, again, clouds come to mind as well as virtual environments (e.g. how do you get logs off a dormant VM?). What's the next frontier in this area? Log discovery - automatic finding and identifying log files on systems in order to analyze and retain them.
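As a taste of what log discovery could look like, here is a rough Python sketch that walks a directory tree and flags files whose first lines start with something timestamp-like. The heuristic, the two timestamp patterns and the function names are my assumptions for illustration; real discovery would need many more formats and a smarter way to skip binaries.

```python
import os
import re

# Heuristic: a syslog-style "Aug 27 10:15:32" prefix or an ISO-style
# "2008-08-27" date (optionally after an opening bracket).
TIMESTAMP = re.compile(
    r"^(?:[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}|\[?\d{4}-\d{2}-\d{2})"
)


def looks_like_log(path, sample_lines=5):
    """Guess whether a file is a text log by checking its first lines for timestamps."""
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            # Cap the characters read per line so files without newlines stay cheap.
            lines = [handle.readline(4096) for _ in range(sample_lines)]
    except OSError:
        return False
    lines = [line for line in lines if line.strip()]
    return bool(lines) and all(TIMESTAMP.match(line) for line in lines)


def discover_logs(root):
    """Walk a directory tree and return sorted paths of files that look like logs."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if looks_like_log(path):
                found.append(path)
    return sorted(found)
```

Pointing `discover_logs` at an application's install directory gives a first candidate list to review; it deliberately ignores file names, since plenty of logs do not end in ".log".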

All this, however, pales in comparison with my favorite "uber-challenge", "Making sense of logs in an automated fashion" - this baby is definitely not going away in 2-3 years. Much more research is needed to make that "log->conclusion" jump automatically without head-scratching, invoking ancient deities and making wild guesses. Once there, we can attempt to reliably handle "proactive logging" (i.e. analyzing various failure or compromise precursors in logs and then predicting the future based on them), another Holy Grail of the logging domain.

Will anything new emerge? Yes, I think awareness of the "Logging Gap" problem will grow. A "logging gap" happens when you combine "a need to log" with utter "inability to do so." For example, this will happen when people know that they HAVE TO log, say, for compliance, but have no way of doing it due to application or platform limitations. This will become one of the challenges, and special "logging add-ons" will appear to close the logging gap and create additional logs where activity audit is desperately needed, but native logging is not helping to achieve it.

Also, I think people will finally wake up to "Log security" challenges - i.e. protecting logs so that they can be produced as evidence, used for compliance attestations, etc. Log security is not getting the attention it deserves, but I think this challenge will finally emerge in full force in the next 2-3 years. BTW, my next poll addresses that (vote).

Anything else I missed? Share away!

Related posts:

Posted August 06, 2008 in Innovation , Log Management & Intelligence | Permalink | TrackBack (0)

Logging Poll #8 - Log Security and Protections

My next logging poll is out - with it I set out to figure out an old mystery of mine: why people don't protect their log data (e.g. see this lamentation "Top 11 Reasons to Secure and Protect Your Logs").

Vote away! As always, results will be posted.

Past polls and analysis are all here.

Posted August 05, 2008 in Log Management & Intelligence , LogMatters | Permalink | TrackBack (0)
