Doing consulting I get to see a lot of errors (who calls in the consultant when things are working fine?). One of the most common errors to see on a domain controller is the Netlogon event ID 5807 error. I see this error on so many clients’ domain controllers that I have a PowerShell script just to find the culprits.
Let’s start at the beginning. AD Sites and Services is used to define subnets so that a client can figure out which site it is in, and which domain controllers it should use.
Sounds simple enough right? But what happens when there is no subnet defined?
Human logic would say, “Go to the closest DC then”.
But just how does the client pc figure out who is closest?
Simple answer... it can’t.
A client receives a list of the DCs offering services according to the sites defined in AD Sites and Services. These are what guide our PCs to the correct domain controllers for AD services. When you create subnets and bind them to sites you are setting up the connections for the clients. So if you bind the subnet 10.0.0.0/24 to Site1, any client in that subnet will use the DCs in Site1.
Now if we have a subnet that is not defined in AD Sites and Services, say 10.1.1.0/24, the client will not get a list of the DCs in its site, but will randomly select one of the DCs in your domain. This sounds good at first until you think about that satellite sales office in Singapore that has a congested 512K WAN link, and the fact that now PCs in the London office are using the DC there for logon services!
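To make the lookup concrete, here is a minimal Python sketch of the subnet-to-site idea. It is only an illustration, not how Netlogon is implemented: the `SUBNET_TO_SITE` table and the `find_site` function are hypothetical names, and picking the longest matching prefix is my assumption about how overlapping subnets would resolve.

```python
import ipaddress

# Hypothetical subnet-to-site bindings, mirroring the example in the text.
SUBNET_TO_SITE = {
    ipaddress.ip_network("10.0.0.0/24"): "Site1",
}

def find_site(client_ip, subnet_to_site=SUBNET_TO_SITE):
    """Return the site bound to the most specific matching subnet, or None."""
    addr = ipaddress.ip_address(client_ip)
    matches = [net for net in subnet_to_site if addr in net]
    if not matches:
        # No subnet object covers this address: the client has no site.
        return None
    # Assumption: the longest prefix (most specific subnet) wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return subnet_to_site[best]
```

With this table, `find_site("10.0.0.15")` returns `"Site1"`, while `find_site("10.1.1.25")` returns `None`, which is exactly the situation where the client ends up on an arbitrary DC.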
Now let’s take a look at the System event log on a DC in a domain where there are subnets that haven’t been defined. We’re all busy and we totally meant to define the 10.1.1.0/24 subnet that was used for the new expansion of the London office but somehow it just slipped through the cracks. Now we see in the system log the NETLOGON event ID: 5807.
Here’s the error text:
During the past 4.15 hours there have been 132 connections to this Domain Controller from client machines whose IP addresses don’t map to any of the existing sites in the enterprise. Those clients, therefore, have undefined sites and may connect to any Domain Controller including those that are in far distant locations from the clients. A client’s site is determined by the mapping of its subnet to one of the existing sites. To move the above clients to one of the sites, please consider creating subnet object(s) covering the above IP addresses with mapping to one of the existing sites. The names and IP addresses of the clients in question have been logged on this computer in the following log file ‘%SystemRoot%\debug\netlogon.log’ and, potentially, in the log file ‘%SystemRoot%\debug\netlogon.bak’ created if the former log becomes full. The log(s) may contain additional unrelated debugging information. To filter out the needed information, please search for lines which contain text ‘NO_CLIENT_SITE:’. The first word after this string is the client name and the second word is the client IP address. The maximum size of the log(s) is controlled by the following registry DWORD value ‘HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters\LogFileMaxSize’; the default is 20000000 bytes. The current maximum size is 20000000 bytes. To set a different maximum size, create the above registry value and set the desired maximum size in bytes.
This DC has seen 132 connections, and the error is telling us to go look in the log file to see what IPs are the culprits. It’s such a nice error that it even gives us the path to the file so that we don’t have to go hunt for it!
So let’s copy the path from the error and paste it into the Run box. Make sure you don’t copy the single quotes!
The file will look something like this:
So we can see that the offending IPs are in the 10.1.1.x range, which is the subnet we forgot to create for the London office. The good news is that the clients are able to log in. The bad news is that this is the DC in the Singapore office that has a congested WAN link!
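If you would rather not eyeball the file, the filtering rule from the event text (find `NO_CLIENT_SITE:`, then take the next two words as client name and IP) is easy to sketch. The Python below is just an illustration of that rule. The function name and the sample line are made up; only the marker and the "two words after it" layout come from the event description.

```python
MARKER = "NO_CLIENT_SITE:"

def parse_no_client_site(line):
    """Return (client_name, client_ip) from a log line, or None if it doesn't apply."""
    _, found, rest = line.partition(MARKER)
    if not found:
        return None  # unrelated debugging line
    words = rest.split()
    if len(words) < 2:
        return None  # marker present but the line is incomplete
    # Per the event text: first word is the client name, second is its IP.
    return words[0], words[1]

# Hypothetical line; the timestamp and bracketed field are illustrative only.
sample = "06/12 10:15:32 [LOGON] NO_CLIENT_SITE: LONPC042 10.1.1.25"
client = parse_no_client_site(sample)
```

Here `client` ends up as the pair `("LONPC042", "10.1.1.25")`, and any line without the marker simply yields `None`.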
The solution to the problem is simple: create a subnet for 10.1.1.0/24 and bind it to the London site. Once replication happens, all the clients that log in from PCs within that subnet will be directed to the London DCs.
The resolution for this error is so easy that you might be getting complacent right now. The problem is that in most mid-sized and large companies the Network team sets up the physical subnets and the Server team is responsible for creating the subnets in Sites and Services. This is where a disconnect usually occurs. A new subnet is set up for clients in a satellite office by the Network team. The Server team now needs to create the subnet in AD, but the Network team assigns the ticket to the wrong team, or the information gets lost by the Server team. There are a thousand ways for it to slip through the cracks.
Most of us don’t have time to go through our event logs often enough, and not everyone has log monitoring in place (try the demo version of System Center!). So what I use is a simple script that I run on the first of each month. It grabs the contents of the Netlogon.log files on all my DCs and returns a CSV of the unique IPs seen in the last month. This way I can see any subnets that haven’t been defined and define them.
It also helps to catch things like that wireless segment that was set up just for the test lab and couldn’t possibly connect to the production network, because you made sure that it was isolated... but hey, look at that, there’s your laptop trying to log in to our production AD from that wireless segment. Guess that “isolated test network” wasn’t so isolated after all, huh?
So here’s the script. It’s not very complicated, but there are a few twists that make it work, so let’s break it down.
First we set up a function to grab all the domain controllers in the domain and stuff them into an array.
Next we set up a function that will connect to a server’s c$ share to grab the netlogon.log file and push the contents into an array. The annoying thing about the log file is that it simply appends to the end of the file but has no year in the date stamp. I’m only really interested in the last month’s entries, but if I match just June I’ll get the entries from this June and every other June for the past few years!
The trick is to use the [array]::Reverse function to flip the array around so that you are effectively reading the log file from the end. Now we just need to loop through the array until we hit an entry that is not from this month (since I run it on the 1st of the month) or last month. As soon as we see something from two months ago we break out of the loop. I added a little bit of logic in that I also check whether the IP address column is actually an IP address using a regex match. The reason is that if you have the logging level turned up (a registry entry), the value in that column won’t always be an IP address, so we skip those lines.
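The read-it-backwards idea is easier to see in a few lines of code. This is a rough Python sketch of the same logic, not the PowerShell from my script. It assumes each line starts with an `MM/DD` date stamp and that the IP address is the last field; the function name and regexes are just for illustration.

```python
import re
from datetime import date

DATE_STAMP = re.compile(r"^(\d{2})/\d{2}$")       # MM/DD, no year in the log
IP_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def recent_ips(lines, today):
    """Walk the log from the end and collect IPs from this month and last month."""
    this_month = today.month
    last_month = 12 if this_month == 1 else this_month - 1
    wanted = {this_month, last_month}
    ips = []
    for line in reversed(lines):          # newest entries first
        fields = line.split()
        if not fields:
            continue
        stamp = DATE_STAMP.match(fields[0])
        if not stamp:
            continue                      # not a dated log line
        if int(stamp.group(1)) not in wanted:
            break                         # reached older entries, stop reading
        if IP_PATTERN.match(fields[-1]):  # skip lines whose last field isn't an IP
            ips.append(fields[-1])
    return ips
```

Because the loop stops at the first entry from an older month, last year’s June entries further up the file never get a chance to match. Note that this relies on the log having at least one entry from some other month in between.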
Now that we have just the entries we want, we push them into a variable and return that to the main script, where we use Sort-Object -Unique to get only the unique IP addresses, because we really don’t care that 10.1.1.2 has 55 entries in the log. We just need to know it’s there, not how many times the IP was logged.
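For comparison, the de-duplication step looks like this as a small Python sketch. It is an illustration only: the function name and the `IPAddress` column header are hypothetical, and it assumes every value is a valid IPv4 address (which the regex check earlier is meant to ensure).

```python
import csv
import io
import ipaddress

def unique_ips_csv(ips):
    """Return CSV text listing each distinct IP once, in numeric order."""
    # A set drops the repeats; sorting by parsed address avoids
    # "10.1.1.10" landing before "10.1.1.2" as it would with a text sort.
    unique = sorted(set(ips), key=ipaddress.ip_address)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["IPAddress"])  # hypothetical column header
    for ip in unique:
        writer.writerow([ip])
    return buf.getvalue()
```

Feeding it 55 copies of 10.1.1.2 gives you one row for 10.1.1.2, which is all the report needs.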
The final part of the script is to set up your email variables and send the report on its merry way. Just watch out for the $to variable: when sending to multiple addresses, you must use a comma-separated list of quoted strings for the send-mail function to work right.
Schedule this script to run on the 1st of each month and you won’t have to worry about missing subnets anymore!
Here’s the full script. As always this is provided as is, use at your own risk.