Wednesday, December 9, 2009
Arcsight Unified Windows Connector de-mystified
As I'm now consulting full-time for Arcsight, I figured this would be a good place to share some of the black magic knowledge that may help others be successful. A great first topic to jump into is the Windows Unified Connector.
This will require some basic to advanced Arcsight administration experience, but hopefully it's easy for anyone to understand.
Windows Unified is one of the most heavily used connectors, but it is also one of the hardest to understand. Hopefully this post will give you a better idea of how it works and how to properly troubleshoot and tune it.
Windows Event logs hold a wealth of security information, especially on the domain controllers: who logged on or off, who changed user or file permissions, and so on.
Windows Unified is a polling Connector: at regular intervals it connects to each specified Windows server, authenticates, and grabs a copy of the latest event logs via WMI (Windows API), then normalizes them and forwards them to Arcsight ESM.
There are several common issues with the Windows Unified Connector. Perhaps the most prevalent is delayed events. It is possible for a Windows Unified Connector to send events to ESM hours or even days late. Obviously this kills your ability to do real-time correlation, along with pretty much anything else.
Limiting Connectors. A client recently had two separate Connectors, one in production and one running remotely as a backup, both configured the same and actively polling the same Windows machines. This means the Windows hosts were getting hammered with double-duty polling. It's a must to poll from only one Connector at a time. To feed a backup site, simply add a new ESM destination on the production Connector so it also forwards events to the backup ESM. In your Disaster Recovery plan, have a procedure for quickly turning up the Connector on the backup network to take over during a failure.
The biggest cause of latency is grouping. The Unified Connector polls all systems with the same frequency for the same number of events. This can lead to serious event delay and backlog if you are polling high event rate servers and low event rate servers on the same Connector.
Create multiple Unified Windows Connectors: group the high event rate systems on one or several Connectors and leave the low event rate systems on another. This is a key to eliminating event latency.
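If you have a rough idea of each host's event rate, the split can be planned with a few lines of Python. This is only a sketch: the host names, the rates, and the 50 events-per-second threshold are made-up sample values, and `group_hosts_by_rate` is a hypothetical helper, not part of any Arcsight tooling.

```python
from typing import Dict, List


def group_hosts_by_rate(rates: Dict[str, float], threshold_eps: float) -> Dict[str, List[str]]:
    """Split hosts into 'high' and 'low' groups by average events per second."""
    groups: Dict[str, List[str]] = {"high": [], "low": []}
    # Sort by host name so the resulting plan is stable between runs.
    for host, eps in sorted(rates.items()):
        key = "high" if eps >= threshold_eps else "low"
        groups[key].append(host)
    return groups


# Hypothetical sample data: average events per second observed per host.
sample_rates = {"dc01": 120.0, "dc02": 95.0, "file01": 4.0, "print01": 0.5}

# Hosts in plan["high"] would go on their own Connector(s),
# hosts in plan["low"] can share another one.
plan = group_hosts_by_rate(sample_rates, threshold_eps=50.0)
```

The threshold is a judgment call for your environment; the point is simply that busy domain controllers should not share a Connector with quiet servers.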
Finally, there are a few knobs which allow you to tune both the polling frequency and the number of events fetched at a time.
For the number of events fetched per poll, 50 is the default, but it would not hurt to bump this value up on your high event rate Connector.
The sleeptime value controls how long, in seconds, to wait until the next event poll. The default, -1, means poll continuously without delay. On a slow network, or when polling over long WAN or VPN links, it makes sense to add sleeptime: start with 20 seconds and work your way up until you find the right setting for your network.
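To get a feel for how these two knobs interact, here is a rough back-of-the-envelope model in Python. It assumes that each poll cycle fetches at most one batch from a host and then waits for the sleeptime. The function names, the `poll_duration_s` parameter, and the numbers are illustrative assumptions, not a description of the Connector's internals.

```python
def max_events_per_second(events_per_poll: int, sleeptime_s: float, poll_duration_s: float) -> float:
    """Estimate the most events per second one host can be drained at.

    Simplified model (an assumption): one batch per cycle, and a cycle
    lasts poll_duration_s plus the configured sleeptime.
    """
    wait_s = max(sleeptime_s, 0.0)  # -1 (poll without delay) counts as zero wait
    cycle_s = poll_duration_s + wait_s
    if cycle_s <= 0:
        raise ValueError("cycle time must be positive")
    return events_per_poll / cycle_s


def falls_behind(host_eps: float, events_per_poll: int, sleeptime_s: float, poll_duration_s: float) -> bool:
    """True if the host produces events faster than the model can fetch them."""
    return host_eps > max_events_per_second(events_per_poll, sleeptime_s, poll_duration_s)


# Default batch of 50, no sleeptime, and an assumed half second per poll:
busy_dc_lags = falls_behind(120.0, 50, -1, 0.5)
# Same batch size with 20 seconds of sleeptime on a quiet host:
quiet_host_lags = falls_behind(2.0, 50, 20, 0.5)
```

Under this model, any host that generates events faster than the estimate builds a backlog, which is why a higher batch size suits the high event rate Connector and extra sleeptime suits only the quiet or slow-link one.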
The Windows Unified Connector uses a CIFS connection via RPC on TCP/445. Make sure the RPC service is turned on and is not firewalled to or from the Unified Connector's IP.
Unable to open RPC Handler: if you see this in your Connector logs, it means the remote machine cannot be reached, is down, or authentication is failing. Start with ping, then telnet to port 445 from the Connector, and finally check the login credentials.
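If telnet is not installed on the Connector host, the port check can be approximated with a short Python script. This is a minimal sketch using only the standard library; `can_reach_port` is a hypothetical helper name, and it only tests TCP reachability, not authentication.

```python
import socket


def can_reach_port(host: str, port: int = 445, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run it from the Connector machine, for example `can_reach_port("dc01.example.local")` with your own server name in place of that placeholder. A result of False points at the network, firewall, or the host being down; a result of True means the next thing to check is the login credentials.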