Lync Edge DNS Round Robin w/ NAT: Hairpin/50k Port Range Issue
Issue:
Bob (remote user) tries to call Carol (internal user) but receives the error message “Call failed due to network issues”. Snooper reveals the following error: “Call failed to establish due to a media connectivity failure when one endpoint is internal and the other is remote” with an ICE Warning “ICEWarn=0x40003e0”.
Assume the following scenario:
· Lync is deployed in a scaled consolidated topology using NAT
· The 50k port range inbound is blocked
· DNS load balancing
· The External Corporate Firewall is blocking Hairpin traffic
Bob initiates a call to Carol
Before Bob can send a SIP Invite message to Carol, Lync uses STUN, TURN, and ICE to discover a candidate list for completing the media path. To understand that process, see the following article: http://blogs.technet.com/b/nexthop/archive/2009/04/22/how-communicator-uses-sdp-and-ice-to-establish-a-media-channel.aspx
SIP Invite
Here is the SDP candidate list that Bob sends as part of the SIP Invite to Carol:
a=candidate:1 1 UDP 2130705919 192.168.1.100 33728 typ host
a=candidate:1 2 UDP 2130705406 192.168.1.100 33729 typ host
a=candidate:2 1 TCP-PASS 6556159 178.64.39.80 50468 typ relay raddr 65.10.10.189 rport 26654
a=candidate:2 2 TCP-PASS 6556158 178.64.39.80 50468 typ relay raddr 65.10.10.189 rport 26654
a=candidate:3 1 UDP 16648703 178.64.39.80 57548 typ relay raddr 65.10.10.189 rport 14932
a=candidate:3 2 UDP 16648702 178.64.39.80 57555 typ relay raddr 65.10.10.189 rport 14933
a=candidate:4 1 UDP 1694235135 65.10.10.189 14932 typ srflx raddr 192.168.1.100 rport 14932
a=candidate:4 2 UDP 1694233598 65.10.10.189 14933 typ srflx raddr 192.168.1.100 rport 14933
a=candidate:5 1 TCP-ACT 7075839 178.64.39.80 50468 typ relay raddr 65.10.10.189 rport 26654
a=candidate:5 2 TCP-ACT 7075326 178.64.39.80 50468 typ relay raddr 65.10.10.189 rport 26654
a=candidate:6 1 TCP-ACT 1684796927 65.10.10.189 26654 typ srflx raddr 192.168.1.100 rport 26654
a=candidate:6 2 TCP-ACT 1684796414 65.10.10.189 26654 typ srflx raddr 192.168.1.100 rport 26654
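To make the fields in these lines easier to compare, here is a minimal Python sketch that splits a candidate line into named fields. The names Candidate and parse_candidate are hypothetical and introduced only for illustration. The field order (foundation, component, transport, priority, address, port, followed by typ/raddr/rport pairs) is read off the lines above, not taken from a formal specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    foundation: str
    component: int          # 1 or 2 in the lists above
    transport: str          # UDP, TCP-PASS, TCP-ACT
    priority: int
    address: str
    port: int
    typ: str                # host, srflx, or relay
    raddr: Optional[str] = None
    rport: Optional[int] = None


def parse_candidate(line: str) -> Candidate:
    """Parse one 'a=candidate:' line as it appears in the SDP above."""
    prefix = "a=candidate:"
    line = line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"not a candidate line: {line!r}")
    fields = line[len(prefix):].split()
    if len(fields) < 8:
        raise ValueError(f"too few fields in candidate line: {line!r}")
    foundation, component, transport, priority, address, port = fields[:6]
    # The remaining fields come in name/value pairs: typ, raddr, rport.
    extras = dict(zip(fields[6::2], fields[7::2]))
    rport = extras.get("rport")
    return Candidate(
        foundation=foundation,
        component=int(component),
        transport=transport.upper(),   # the two clients differ in letter case
        priority=int(priority),
        address=address,
        port=int(port),
        typ=extras["typ"],
        raddr=extras.get("raddr"),
        rport=int(rport) if rport is not None else None,
    )


relay = parse_candidate(
    "a=candidate:2 1 TCP-PASS 6556159 178.64.39.80 50468 "
    "typ relay raddr 65.10.10.189 rport 26654"
)
print(relay.typ, relay.address, relay.port, "related address:", relay.raddr, relay.rport)
```

Normalizing the transport to upper case is a choice made here so that Bob's and Carol's lists can be compared directly.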
SIP/2.0 200 OK
Carol uses the same discovery process with STUN, TURN, and ICE to create an SDP candidate list to send to Bob.
Here is the SDP candidate list that Carol sends to Bob in the SIP/2.0 200 OK response:
a=candidate:1 1 UDP 2130706431 10.10.10.211 55476 typ host
a=candidate:1 2 UDP 2130705918 10.10.10.211 55477 typ host
a=candidate:2 1 tcp-pass 6555135 178.64.39.81 54978 typ relay raddr 10.10.10.211 rport 49583
a=candidate:2 2 tcp-pass 6555134 178.64.39.81 54978 typ relay raddr 10.10.10.211 rport 49583
a=candidate:3 1 UDP 16647679 178.64.39.81 52755 typ relay raddr 10.10.10.211 rport 53324
a=candidate:3 2 UDP 16647678 178.64.39.81 56065 typ relay raddr 10.10.10.211 rport 53325
a=candidate:4 1 tcp-act 7076863 178.64.39.81 54978 typ relay raddr 10.10.10.211 rport 49583
a=candidate:4 2 tcp-act 7076350 178.64.39.81 54978 typ relay raddr 10.10.10.211 rport 49583
a=candidate:5 1 tcp-act 1684797951 10.10.10.211 49583 typ srflx raddr 10.10.10.211 rport 49583
a=candidate:5 2 tcp-act 1684797438 10.10.10.211 49583 typ srflx raddr 10.10.10.211 rport 49583
Carol tries Bob’s candidate list
When Carol receives Bob’s candidate list, she tries to connect directly using this information:
192.168.1.100 (Bob’s real IP)
Carol is unable to establish a connection with Bob’s real IP because his IP is non-routable
65.10.10.189 (Bob’s public IP)
Carol is unable to establish a connection with Bob’s public IP because Bob’s Home Firewall blocks this traffic
178.64.39.80 (LyncEdge1 AV Edge public IP)
Carol is unable to connect directly to LyncEdge1’s AV edge interface because hairpin traffic is blocked on the corporate network, and because the 50k port range is blocked inbound on LyncEdge1’s AV public IP.
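The three addresses above can be derived mechanically from Bob's candidate list. The following Python sketch, offered as an illustration only, collects each distinct type/address pair and uses the standard ipaddress module to flag private-range addresses. The names CANDIDATE_RE, distinct_targets, and BOB_CANDIDATES are hypothetical, and only a subset of Bob's lines is included to keep the example short.

```python
import ipaddress
import re
from typing import Iterable, List, Tuple

# Captures the address and type of a candidate line laid out as in the SDP above.
CANDIDATE_RE = re.compile(
    r"^a=candidate:\S+ \d+ \S+ \d+ (?P<addr>\S+) \d+ typ (?P<typ>\S+)"
)


def distinct_targets(sdp_lines: Iterable[str]) -> List[Tuple[str, str, bool]]:
    """Return (type, address, is_private) once per distinct pair, in list order."""
    seen = set()
    targets = []
    for line in sdp_lines:
        match = CANDIDATE_RE.match(line.strip())
        if not match:
            continue
        key = (match["typ"], match["addr"])
        if key in seen:
            continue
        seen.add(key)
        is_private = ipaddress.ip_address(match["addr"]).is_private
        targets.append((key[0], key[1], is_private))
    return targets


BOB_CANDIDATES = [
    "a=candidate:1 1 UDP 2130705919 192.168.1.100 33728 typ host",
    "a=candidate:2 1 TCP-PASS 6556159 178.64.39.80 50468 typ relay raddr 65.10.10.189 rport 26654",
    "a=candidate:3 1 UDP 16648703 178.64.39.80 57548 typ relay raddr 65.10.10.189 rport 14932",
    "a=candidate:4 1 UDP 1694235135 65.10.10.189 14932 typ srflx raddr 192.168.1.100 rport 14932",
    "a=candidate:6 1 TCP-ACT 1684796927 65.10.10.189 26654 typ srflx raddr 192.168.1.100 rport 26654",
]

for typ, addr, is_private in distinct_targets(BOB_CANDIDATES):
    note = "private range" if is_private else "public address"
    print(f"{typ:5} {addr:15} {note}")
```

A private-range flag only explains the first failure (Bob's real IP). The other two failures come from firewall policy, which an address check alone cannot detect.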
Bob tries Carol’s candidate list
When Bob receives Carol’s candidate list, he tries to connect directly using this information:
10.10.10.211 (Carol’s Real IP)
Bob is unable to connect directly to Carol’s real IP because he is unable to route to this address
178.64.39.81 (LyncEdge2 AV Edge Public IP)
Bob is unable to connect directly to Carol’s Media Relay because the inbound 50k port range is blocked on the External Corporate Firewall
Lync Edge AV Media Relay tries candidate list
Bob and Carol have exhausted every attempt to establish a media path directly. There is no direct line of sight over which the connection can be made.
The Media Relay service on the Edge AV server will attempt to relay the connection using the candidate lists provided by each client. The Media Relay service will initiate a TURN Forward request with a source port of UDP 3478 and a destination port of UDP 3478.
LyncEdge1 Media Relay using TURN Forward
LyncEdge1 tries to relay the connection to Carol, for Bob, via the Media Relay service using TURN Forward on UDP 3478.
10.10.10.211 (Carol’s Real IP)
Inbound traffic on UDP 3478 to Carol is blocked on the Internal Corporate Firewall
178.64.39.81 (LyncEdge2 AV Edge Public IP)
LyncEdge2’s AV edge interface is unreachable because Hairpin traffic is blocked on the External Corporate Firewall
LyncEdge2 Media Relay using TURN Forward
Similarly, LyncEdge2 tries to relay the connection to Bob, for Carol, via the Media Relay service using TURN Forward on UDP 3478
192.168.1.100 (Bob’s real IP)
Bob’s real IP is non-routable
65.10.10.189 (Bob’s public IP)
Bob’s Home Firewall blocks inbound traffic on this port
178.64.39.80 (LyncEdge1 AV Edge public IP)
LyncEdge1’s AV edge interface is unreachable because hairpin traffic is blocked on the External Corporate Firewall
Findings
Given the assumptions stated at the top of this article:
· Lync is deployed in a scaled consolidated topology using NAT
· The 50k port range inbound is blocked
· DNS load balancing
· The External Corporate Firewall is blocking Hairpin traffic
The media path cannot be established, and we can expect the call to fail.
“Call failed due to network issues”
ms-client-diagnostics: 23; reason="Call failed to establish due to a media connectivity failure when one endpoint is internal and the other is remote";CallerMediaDebug="audio:ICEWarn=0x40003e0,LocalSite=65.10.10.189:26654,LocalMR=178.64.39.80:50468,RemoteSite=10.10.10.211:49583,RemoteMR=178.64.39.81:54978,PortRange=1025:65000,LocalMRTCPPort=50468,RemoteMRTCPPort=54978,LocalLocation=1,RemoteLocation=2,FederationType=0"
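The CallerMediaDebug value is easier to read once it is split into fields. Below is a small Python sketch, with hypothetical helper names (parse_media_debug, split_endpoint, set_bits), that does this for the string above. Note that LocalSite and LocalMR match Bob's TCP srflx and relay candidates, and RemoteSite and RemoteMR match Carol's. The sketch lists which bits are set in ICEWarn but does not interpret them, since the meaning of each flag is not covered in this article.

```python
from typing import Dict, List, Tuple


def parse_media_debug(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'audio:Key=Value,Key=Value,...' into (modality, fields)."""
    modality, _, body = value.partition(":")
    fields: Dict[str, str] = {}
    for item in body.split(","):
        key, _, val = item.partition("=")
        fields[key.strip()] = val.strip()
    return modality.strip(), fields


def split_endpoint(endpoint: str) -> Tuple[str, int]:
    """Turn 'ip:port' into (ip, port)."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def set_bits(flags: int) -> List[int]:
    """Return the positions of the bits that are set, lowest first."""
    return [bit for bit in range(flags.bit_length()) if (flags >> bit) & 1]


DEBUG = (
    "audio:ICEWarn=0x40003e0,LocalSite=65.10.10.189:26654,"
    "LocalMR=178.64.39.80:50468,RemoteSite=10.10.10.211:49583,"
    "RemoteMR=178.64.39.81:54978,PortRange=1025:65000,"
    "LocalMRTCPPort=50468,RemoteMRTCPPort=54978,"
    "LocalLocation=1,RemoteLocation=2,FederationType=0"
)

modality, fields = parse_media_debug(DEBUG)
print(modality, "local relay:", split_endpoint(fields["LocalMR"]))
print(modality, "remote relay:", split_endpoint(fields["RemoteMR"]))
print("ICEWarn bits set:", set_bits(int(fields["ICEWarn"], 16)))
```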
There are two ways to resolve this issue:
1. Open the 50k port range inbound. When Bob tries to complete the media path using Carol’s candidate list, we see him try to connect to Carol’s Media Relay server (LyncEdge2) using the 50k port range. If this connection is successful, the call will complete. While this is the easiest method, it is not always preferred given the number of ports required.
UPDATE: Opening the 50K port range requires that the remote user (Bob) is able to make an outbound TCP connection to LyncEdge2 on a port in the 50K range. This is usually not a problem when Bob is working remotely from a home office behind a personal wireless router/firewall. However, when Bob travels to a customer site and connects to the corporate guest WiFi network, outbound ports in the 50K range may be blocked. It is not unusual for 80 and 443 to be the only ports open outbound from corporate networks, especially guest WiFi networks. Thanks to Thomas Binder for providing this additional information. I highly recommend his presentation from TechEd Europe, “Lync Deep Dive: Edge Media Connectivity with ICE” (http://channel9.msdn.com/Events/TechEd/Europe/2012/EXL412); this scenario is discussed about one hour in.
UPDATE 2: Check out the new session from Thomas Binder at the Lync Conference 2014: Edge Media Connectivity in Lync 2013 http://aka.ms/AVEdge.
2. Allow Hairpin traffic on the Corporate Edge Firewall between the Lync Edge servers. When the Media Relay service on the Edge AV server tries the candidate lists, it will attempt to connect to the public IP of the opposing Edge server using UDP port 3478. UDP 3478 should already be open on the External Corporate Firewall, as described in “Determining External A/V Firewall and Port Requirements”.
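The effect of each resolution can be summarized with a toy decision table. The Python sketch below is only a model of the findings in this article, not a network test: the Firewall class, the ATTEMPTS table, and working_paths are hypothetical names, and each rule simply encodes the blocking reason given in the walkthrough above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Firewall:
    open_50k_inbound: bool = False   # resolution 1
    allow_hairpin: bool = False      # resolution 2


# (attempt, rule that decides whether the attempt gets through)
ATTEMPTS: List[Tuple[str, Callable[[Firewall], bool]]] = [
    ("Carol -> 192.168.1.100 (Bob's real IP)", lambda fw: False),   # non-routable
    ("Carol -> 65.10.10.189 (Bob's public IP)", lambda fw: False),  # home firewall
    ("Bob -> 10.10.10.211 (Carol's real IP)", lambda fw: False),    # non-routable
    ("Bob -> 178.64.39.81 (LyncEdge2 relay, 50k range)",
     lambda fw: fw.open_50k_inbound),
    ("LyncEdge1 -> 178.64.39.81 (TURN Forward, UDP 3478)",
     lambda fw: fw.allow_hairpin),
    ("LyncEdge2 -> 178.64.39.80 (TURN Forward, UDP 3478)",
     lambda fw: fw.allow_hairpin),
]


def working_paths(fw: Firewall) -> List[str]:
    """Return the attempts that succeed under the given firewall settings."""
    return [name for name, rule in ATTEMPTS if rule(fw)]


for label, fw in [
    ("as described", Firewall()),
    ("resolution 1", Firewall(open_50k_inbound=True)),
    ("resolution 2", Firewall(allow_hairpin=True)),
]:
    paths = working_paths(fw)
    print(f"{label}: {paths if paths else 'no media path, call fails'}")
```

The model deliberately leaves out Bob's outbound restrictions (see the UPDATE under resolution 1), which can still defeat resolution 1 on a locked-down guest network.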
Resolution 2 Implementation:
Based on resolution number 2, the following solution was implemented for a Cisco ASA:
access-list tcp_state_bypass permit tcp host 10.1.0.77 host 10.1.0.78
access-list tcp_state_bypass permit tcp host 10.1.0.78 host 10.1.0.77
access-list tcp_state_bypass permit udp host 10.1.0.77 host 10.1.0.78
access-list tcp_state_bypass permit udp host 10.1.0.78 host 10.1.0.77
class-map tcp_bypass
 match access-list tcp_state_bypass
policy-map bypass_policy
 class tcp_bypass
  set connection advanced-options tcp-state-bypass
static (dmz,dmz) 178.64.39.80 10.1.0.77 netmask 255.255.255.255
static (dmz,dmz) 178.64.39.81 10.1.0.78 netmask 255.255.255.255
Now that traffic is allowed to traverse between the two AV Edge servers’ public IP addresses, the media path is complete and the call is established. Bob maintains a connection with LyncEdge1, while Carol maintains a connection with LyncEdge2. LyncEdge1 uses TURN Forward to relay the media path through LyncEdge2’s public IP address, and LyncEdge2 uses TURN Forward to relay the media path through LyncEdge1’s public IP address.
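The four access-list entries in the ASA configuration above cover every ordered pair of Edge servers for both TCP and UDP. If the pool had more than two Edge servers (a case not covered in this article), the same pattern could be generated instead of typed by hand. The Python sketch below is an assumption-labeled helper: bypass_acl_lines is a hypothetical name, and the output only reproduces the access-list syntax already shown, so review it before applying it to a firewall.

```python
from itertools import permutations
from typing import Iterable, List


def bypass_acl_lines(edge_ips: Iterable[str],
                     acl_name: str = "tcp_state_bypass") -> List[str]:
    """Build one permit entry per protocol and ordered pair of Edge servers."""
    hosts = list(edge_ips)
    lines = []
    for protocol in ("tcp", "udp"):
        for src, dst in permutations(hosts, 2):
            lines.append(
                f"access-list {acl_name} permit {protocol} host {src} host {dst}"
            )
    return lines


# The two private AV Edge addresses used in the configuration above.
for line in bypass_acl_lines(["10.1.0.77", "10.1.0.78"]):
    print(line)
```

For two servers this yields the same four entries as the configuration above; each additional server adds entries for every new ordered pair.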