Friday, November 11, 2016

Notes from the field




During a recent engagement I ran into a perplexing situation: everything was working great, and then it all went downhill!

The symptom: sudden duplicate IP address errors on the network, affecting both statically and DHCP-assigned addresses.

The customer built out the infrastructure servers for Horizon View and joined them to the domain, and everything functioned normally. We were able to register vCenter with the Connection Server, and all seemed fine.

The next morning we completed the Composer server installation. Returning to the Connection Server management interface, we proceeded to register the standalone Composer server, which failed. During this same period we were also optimizing the gold image for the linked clones.

With the priority on at least confirming that the Horizon View infrastructure worked, we cloned the optimized parent image to create a manual desktop pool, which would validate communication between a desktop running the Horizon Agent and the Connection Server. As we brought the cloned image online, we received a duplicate IP address error. This led me down the path of reviewing and removing any non-present (hidden) devices, reviewing the registry, and validating that the necessary Microsoft hotfixes were in place for known issues related to cloning and the vmxnet3 adapter. All of this checked out.

While I was banging my head against a wall (which does become painful after a bit), I got word that other infrastructure servers, along with our SQL server, could no longer communicate with Active Directory. This confirmed for me that the problem had nothing to do with our gold image or the configuration of the Horizon environment. The basic troubleshooting question "What changed?", which is often met with "nothing" or "it's not the network!", became the focus.

I was able to determine that while we were completing the build-out of the Horizon View infrastructure that morning, the network team had opened up external access for the zone in which the Horizon View infrastructure servers reside. This was done so the Windows servers could be patched from the external Windows Update service, as an internal patching solution was not available.

After a few minutes with Google I quickly found reports of similar behavior caused by Proxy-ARP being enabled on the internal interface of the Cisco ASA. I was able to convince the network team to look at the settings, and sure enough, Proxy-ARP was enabled. Once it was disabled, everything returned to normal. Thankfully!
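If you suspect the same problem, one quick check from an affected Windows host is its ARP cache. With Proxy-ARP on, the firewall can answer ARP requests for other hosts' addresses using its own MAC address, so a single MAC may show up against several IPs. The sketch below is a hypothetical helper, not something used during this engagement. It assumes the text layout of Windows `arp -a` output, and the function names `parse_arp_table` and `suspicious_macs` and the default threshold of three addresses are illustrative choices. A MAC mapped to several IPs is a hint worth taking to the network team, not proof, since a device with secondary addresses can look the same.

```python
import re
from collections import defaultdict

# Assumed format: an entry line from Windows "arp -a" output, e.g.
#   10.10.20.1            00-aa-bb-cc-dd-01     dynamic
ARP_LINE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"      # IPv4 address
    r"([0-9a-fA-F]{2}(?:-[0-9a-fA-F]{2}){5})"  # MAC in xx-xx-xx-xx-xx-xx form
    r"\s+(\w+)"                                # entry type (dynamic/static)
)


def parse_arp_table(text):
    """Return (ip, mac) pairs from 'arp -a' style text; other lines are skipped."""
    pairs = []
    for line in text.splitlines():
        match = ARP_LINE.match(line)
        if match:
            ip, mac, _entry_type = match.groups()
            pairs.append((ip, mac.lower()))
    return pairs


def suspicious_macs(pairs, threshold=3, ignore=("ff-ff-ff-ff-ff-ff",)):
    """Group IPs by MAC and return MACs that claim `threshold` or more IPs.

    The broadcast MAC is ignored by default because it legitimately maps
    to several broadcast addresses.
    """
    ips_by_mac = defaultdict(set)
    for ip, mac in pairs:
        if mac in ignore:
            continue
        ips_by_mac[mac].add(ip)
    return {
        mac: sorted(ips)
        for mac, ips in ips_by_mac.items()
        if len(ips) >= threshold
    }


# Hypothetical sample output; the addresses and MACs are made up.
sample = """
Interface: 10.10.20.15 --- 0xb
  Internet Address      Physical Address      Type
  10.10.20.1            00-aa-bb-cc-dd-01     dynamic
  10.10.20.10           00-aa-bb-cc-dd-01     dynamic
  10.10.20.11           00-aa-bb-cc-dd-01     dynamic
  10.10.20.12           00-50-56-01-02-03     dynamic
  10.10.20.255          ff-ff-ff-ff-ff-ff     static
"""

if __name__ == "__main__":
    print(suspicious_macs(parse_arp_table(sample)))
    # prints {'00-aa-bb-cc-dd-01': ['10.10.20.1', '10.10.20.10', '10.10.20.11']}
```

In practice you would feed it the real output of `arp -a` captured on a host that is reporting the conflict, then compare any flagged MAC against the firewall's interface MAC.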

A good link related to proxy-arp:

https://lkhill.com/proxy-arp-sucks/
