Greetings all. Hopefully lots of you were able to join us at our Catalyst Conference last week. I know I enjoyed sharing a beer with many of you by the pool (wish it was a little warmer, but not exactly a rough life).

Anyway, today I wanted to spend a little time talking about a topic I brought up a while ago: monitoring. I don’t pretend to be an industry expert on monitoring, or even a ChannelAdvisor expert for that matter, but I do know enough to be dangerous. I also feel this is a pretty important topic for you to understand, as our use of monitoring really can have a material impact on your business (or more appropriately, a lack of good monitoring can have a materially negative impact on your business, something we take seriously).

What is Monitoring?

As I covered in a prior posting, our products operate as a Software as a Service solution. This means our software must be online and ready to service transactions 24 hours a day, 7 days a week. The ChannelAdvisor software accomplishes a wide range of tasks under this charter: from posting auctions to eBay, to bidding on keywords on Google, to sending data feeds to Shopping.com, to serving up checkouts to your customers. All of this software runs on lots and lots of servers: well over 200 by my last count, and growing steadily.

The more servers you have in a system, the more likely it is that something, somewhere, will go wrong. A circuit board may fry, Windows may crash, a meteor may strike, whatever…you just don’t know. We’ve built a lot of redundancy and compensators into our systems to account for these failures (and in my last shameless narcissistic plug, you can refer to Long Live the Data and Evil Monkeys and Self Healing Software for more info on that), but there are still problems that can occur outside of our control. Additionally, since our systems are so interconnected with the outside world (eBay, Amazon, Google, PayPal, Shopping.com, etc.), any change at any of these partners (API changes, systems down for maintenance, whatever) will have repercussions on our systems as well.

This is where monitoring comes in: monitoring is essentially an automated, repeatable process for polling specific resources and systems for a predicted result. If the expected result is not found, the monitoring system alerts a person for further analysis and correction. Think of it like the lookout by the campfire: if the bad guys are coming over the ridge, you need to wake up your buddies to run for the hills (or in the case of 300, start kicking some Persian butt).
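For the curious, here is a rough Python sketch of that poll, compare, and alert loop. To be clear, this is not our actual monitoring code: the Check class, the run_checks function, and the alert callback are all names I made up for illustration, and a real system would add scheduling, retries, and paging on top.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    """One monitor: what to poll and what result we predict (hypothetical type)."""
    name: str
    probe: Callable[[], object]          # polls a resource, returns what it saw
    expected: Callable[[object], bool]   # True if the observation matches the prediction


def run_checks(checks: List[Check], alert: Callable[[str], None]) -> List[str]:
    """Run every check once; call alert() for each one that misses its prediction.

    Returns the names of the failed checks.
    """
    failed = []
    for check in checks:
        try:
            observed = check.probe()
            ok = check.expected(observed)
        except Exception as exc:
            # A probe that blows up counts as a failure too.
            observed, ok = exc, False
        if not ok:
            alert(f"{check.name}: unexpected result {observed!r}")
            failed.append(check.name)
    return failed
```

In practice you would call run_checks on a timer and hand it an alert function that pages whoever is on call; for a quick experiment, passing print as the alert works fine.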

Monitors are (or at least should be) a way of life for large software enterprises. With hundreds of servers you can’t rely on a human to watch all of the servers, 24 hours a day, 7 days a week. If server #192 runs out of hard drive space, somebody needs to know about it. It is the job of the monitor to warn you about these problems before they really become “problems”.

Types of Monitors

At CA our monitors fall primarily into 2 categories:

1. System Monitors
System monitors for the most part watch the physical resources of the IT infrastructure: disk space, network utilization, available memory, I/O activity, etc. System monitors are also useful for tracking availability of physical resources. If a database server runs low on storage, or a particular network link starts getting saturated, it’s important to alert an operator before the problem becomes bad enough to affect performance. System monitors are somewhat turnkey in nature, since hardware parameters are common to any application regardless of what that application does. There is of course still a high level of customization possible within the parameters specific to your enterprise.

Example: Tracking CPU Utilization on multiple servers
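To make the idea concrete, here is a small Python sketch of a system monitor. I have used disk space rather than CPU because Python's standard library can read it directly through shutil.disk_usage. The function names and the 10% threshold are my own assumptions for the example, not our production settings.

```python
import shutil


def free_fraction(path="."):
    """Return the fraction of the disk holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_disk(path=".", min_free=0.10, read_free=free_fraction):
    """Return an alert message if free space is below min_free, else None.

    read_free is injectable so the check can be exercised without
    actually filling up a disk (a hypothetical convenience).
    """
    free = read_free(path)
    if free < min_free:
        return f"LOW DISK on {path}: {free:.0%} free (threshold {min_free:.0%})"
    return None
```

The point of alerting at a threshold, rather than at zero, is the one made above: the operator hears about it while there is still time to act.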

2. Application Monitors
Application monitors look into the guts of our software to validate state specific to what we do here at CA. When you run software in a 24/7 capacity, certain predictable patterns emerge. For example, we can always expect a certain number of PayPal transactions to be processed at any particular hour of the day. We can also expect a certain number of emails to get dispatched or a certain number of auction postings to push out, etc., within a given hour of the day.

Given those trends, you can set up business rules to page/alert a human if your expectations are not met. For example, if we expect at least 50 PayPal transactions to occur every 5 minutes based on historical trends, and suddenly 5 minutes elapse without any transactions, we know something may be amiss. In some cases the alert may be a false alarm (say PayPal is down for system maintenance), but in others it may be indicative of something damaging to the system that needs to be addressed immediately (say PayPal changed their API, or we deployed a bug in our code).
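A business rule like that one can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are invented, the transaction timestamps are assumed to be available as a simple list, and I alert whenever the count drops below the expected floor (which covers the "no transactions at all" case described above).

```python
from datetime import datetime, timedelta


def count_recent(timestamps, now, window_minutes=5):
    """Count the transactions that happened in the last window_minutes."""
    start = now - timedelta(minutes=window_minutes)
    return sum(1 for t in timestamps if start < t <= now)


def check_volume(timestamps, now, expected_min=50, window_minutes=5):
    """Return an alert message if recent volume is below expectations, else None.

    expected_min would come from historical trends; 50 is just the
    figure used in the example above.
    """
    count = count_recent(timestamps, now, window_minutes)
    if count < expected_min:
        return (f"only {count} transactions in the last {window_minutes} "
                f"minutes, expected at least {expected_min}")
    return None
```

Note that the check cannot tell a partner's maintenance window from a bug on our side. It just flags that the pattern broke, and a human takes it from there.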

How We Use Monitors

We take monitoring very seriously at ChannelAdvisor. So much so that we have a dedicated team of engineers devoted to creating, maintaining, and responding to those monitors. This “Systems” team rotates through an on-call schedule that guarantees someone is always available to respond to monitor alerts 24 hours a day, 7 days a week, 365 days a year - including even Groundhog Day.

We also take writing our monitors pretty seriously, and consider that part of our secret sauce. We work hard to spot, and resolve, problems before our customers notice them. Ideally the software is architected in a manner where problems just don’t occur, or if they do occur they automatically fix themselves (self-healing software), but we also realize we live in a reality where some problems still make it through despite best efforts. Because of this, monitoring is a crucial last-stop safety net. If a problem occurs and one of our customers has to tell us, that is our failure. If something crashes at 3am, we WANT the monitor to wake us up so we can fix it by 3:10. Sleep is for wimps.

We’re human, though, and sometimes a problem arises that did not have a monitor to spot it - there are a lot of moving parts after all. We strive to minimize this, but I do assure you that when such a circumstance arises, we take the lesson as an opportunity to write a new monitor to cover that case for the next time. Fool me once, shame on you. Fool me twice, shame on me (as Scotty would say).

Anyway, that’s about it for now. Until next time.