Lync Edge Hairpin Requirement

December 9, 2011, 10:53 am


Lync Edge DNS Round Robin w/ NAT: Hairpin/50k Port Range Issue

Issue:

Bob (remote user) tries to call Carol (internal user) but receives the error message “Call failed due to network issues”. Snooper reveals the following error: “Call failed to establish due to a media connectivity failure when one endpoint is internal and the other is remote” with the ICE warning “ICEWarn=0x40003e0”.

Assuming the following scenario:

· Lync is deployed in a scaled consolidated topology using NAT

· The 50k port range inbound is blocked

· DNS load balancing

· The External Corporate firewall is blocking Hairpin traffic

Bob initiates a call to Carol

Before Bob can send a SIP Invite message to Carol, Lync uses STUN, TURN, and ICE to discover a candidate list for completing the media path. To understand that process, have a look at the following article: http://blogs.technet.com/b/nexthop/archive/2009/04/22/how-communicator-uses-sdp-and-ice-to-establish-a-media-channel.aspx

SIP Invite

Here is the SDP candidate list that Bob sends as part of the SIP Invite to Carol:

a=candidate:1 1 UDP 2130705919 192.168.1.100 33728 typ host

a=candidate:1 2 UDP 2130705406 192.168.1.100 33729 typ host

a=candidate:2 1 TCP-PASS 6556159 178.64.39.80 50468 typ relay raddr 65.10.10.189 rport 26654

a=candidate:2 2 TCP-PASS 6556158 178.64.39.80 50468 typ relay raddr 65.10.10.189 rport 26654

a=candidate:3 1 UDP 16648703 178.64.39.80 57548 typ relay raddr 65.10.10.189 rport 14932

a=candidate:3 2 UDP 16648702 178.64.39.80 57555 typ relay raddr 65.10.10.189 rport 14933

a=candidate:4 1 UDP 1694235135 65.10.10.189 14932 typ srflx raddr 192.168.1.100 rport 14932

a=candidate:4 2 UDP 1694233598 65.10.10.189 14933 typ srflx raddr 192.168.1.100 rport 14933

a=candidate:5 1 TCP-ACT 7075839 178.64.39.80 50468 typ relay raddr 65.10.10.189 rport 26654

a=candidate:5 2 TCP-ACT 7075326 178.64.39.80 50468 typ relay raddr 65.10.10.189 rport 26654

a=candidate:6 1 TCP-ACT 1684796927 65.10.10.189 26654 typ srflx raddr 192.168.1.100 rport 26654

a=candidate:6 2 TCP-ACT 1684796414 65.10.10.189 26654 typ srflx raddr 192.168.1.100 rport 26654
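To read candidate lines like these more easily, here is a minimal Python sketch that splits an a=candidate line into its fields. The `Candidate` class and `parse_candidate` helper are hypothetical names. The `priority_parts` method assumes the RFC 5245 priority formula, priority = 2^24 × type preference + 2^8 × local preference + (256 − component ID); the values in Bob's list fit it, with host candidates at type preference 126, server-reflexive (srflx) at 100 and relay at 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    foundation: str
    component: int               # 1 = RTP, 2 = RTCP
    transport: str               # UDP, TCP-PASS or TCP-ACT in these logs
    priority: int
    address: str
    port: int
    typ: str                     # host, srflx or relay
    raddr: Optional[str] = None  # related address, if present
    rport: Optional[int] = None

    def priority_parts(self) -> tuple:
        """Split priority into (type preference, local preference, component),
        assuming the RFC 5245 formula."""
        type_pref = self.priority >> 24
        local_pref = (self.priority >> 8) & 0xFFFF
        component = 256 - (self.priority & 0xFF)
        return type_pref, local_pref, component


def parse_candidate(line: str) -> Candidate:
    """Parse one 'a=candidate:...' SDP line as printed in the logs above."""
    body = line.strip()
    if body.startswith("a=candidate:"):
        body = body[len("a=candidate:"):]
    fields = body.split()
    foundation, component, transport, priority, address, port, _typ, typ = fields[:8]
    extras = dict(zip(fields[8::2], fields[9::2]))  # raddr/rport pairs, if any
    return Candidate(
        foundation=foundation,
        component=int(component),
        transport=transport.upper(),
        priority=int(priority),
        address=address,
        port=int(port),
        typ=typ,
        raddr=extras.get("raddr"),
        rport=int(extras["rport"]) if "rport" in extras else None,
    )
```

For example, Bob's candidate 4 parses to an srflx candidate on 65.10.10.189:14932 whose related address is his private IP, 192.168.1.100.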

SIP/2.0 200 OK

Carol uses the same discovery process with STUN, TURN, and ICE to create an SDP candidate list to send to Bob.

Here is the SDP candidate list that Carol sends to Bob in the SIP/2.0 200 OK response:

a=candidate:1 1 UDP 2130706431 10.10.10.211 55476 typ host

a=candidate:1 2 UDP 2130705918 10.10.10.211 55477 typ host

a=candidate:2 1 tcp-pass 6555135 178.64.39.81 54978 typ relay raddr 10.10.10.211 rport 49583

a=candidate:2 2 tcp-pass 6555134 178.64.39.81 54978 typ relay raddr 10.10.10.211 rport 49583

a=candidate:3 1 UDP 16647679 178.64.39.81 52755 typ relay raddr 10.10.10.211 rport 53324

a=candidate:3 2 UDP 16647678 178.64.39.81 56065 typ relay raddr 10.10.10.211 rport 53325

a=candidate:4 1 tcp-act 7076863 178.64.39.81 54978 typ relay raddr 10.10.10.211 rport 49583

a=candidate:4 2 tcp-act 7076350 178.64.39.81 54978 typ relay raddr 10.10.10.211 rport 49583

a=candidate:5 1 tcp-act 1684797951 10.10.10.211 49583 typ srflx raddr 10.10.10.211 rport 49583

a=candidate:5 2 tcp-act 1684797438 10.10.10.211 49583 typ srflx raddr 10.10.10.211 rport 49583
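Candidates from both lists are paired up and checked roughly in order of pair priority. The sketch below applies the RFC 5245 pair-priority formula to the component-1 UDP candidates above. It assumes that Bob, as the caller, is the controlling agent and that Lync's ICE implementation orders pairs the same way; this capture does not show either directly.

```python
# Pair priority per RFC 5245 section 5.7.2:
#   2^32 * MIN(G, D) + 2 * MAX(G, D) + (1 if G > D else 0)
# Assumption: Bob (the caller) is controlling, so his priority is G, Carol's is D.
def pair_priority(g: int, d: int) -> int:
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


# Component-1 UDP candidate priorities copied from the two SDP lists above.
bob = {"host": 2130705919, "srflx": 1694235135, "relay": 16648703}
carol = {"host": 2130706431, "relay": 16647679}

# Every Bob/Carol combination, highest pair priority (checked first) at the top.
pairs = sorted(
    ((b, c, pair_priority(bob[b], carol[c])) for b in bob for c in carol),
    key=lambda item: item[2],
    reverse=True,
)

for b, c, prio in pairs:
    print(f"Bob {b:<5} -> Carol {c:<5}  priority={prio}")
```

Relay-to-relay pairs sort last, which fits the walkthrough that follows: the Media Relay path only comes into play once every more direct pair has failed.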

Carol tries Bob’s candidate list

When Carol receives Bob’s candidate list, she tries to connect directly using this information:

192.168.1.100 (Bob’s real IP)

Carol is unable to establish a connection with Bob’s real IP because his IP is non-routable

65.10.10.189 (Bob’s public IP)

Carol is unable to establish a connection with Bob’s public IP because Bob’s Home Firewall blocks this traffic

178.64.39.80 (LyncEdge1 AV Edge public IP)

Carol is unable to connect directly to LyncEdge1’s AV edge interface because hairpin traffic is blocked on the corporate network, and because the 50k port range is blocked inbound on LyncEdge1’s AV public IP.

Bob tries Carol’s candidate list

When Bob receives Carol’s candidate list, he tries to connect directly using this information:

10.10.10.211 (Carol’s Real IP)

Bob is unable to connect directly to Carol’s real IP because he is unable to route to this address

178.64.39.81 (LyncEdge2 AV Edge Public IP)

Bob is unable to connect directly to Carol’s Media Relay because the inbound 50k port range is blocked on the External Corporate Firewall

Lync Edge AV Media Relay tries candidate list

Bob and Carol have exhausted all efforts to establish a media path directly. There is no direct line of sight over which the connection can be made.

The Media Relay service on the Edge AV server will attempt to relay the connection using the candidate lists provided by each client. The Media Relay service will initiate a TURN Forward request with a source port of UDP 3478 and a destination port of UDP 3478.

LyncEdge1 Media Relay using TURN Forward

LyncEdge1 tries to relay the connection to Carol, for Bob, via the Media Relay service using TURN Forward on UDP 3478.

10.10.10.211 (Carol’s Real IP)

Inbound traffic on UDP 3478 to Carol is blocked on the Internal Corporate Firewall

178.64.39.81 (LyncEdge2 AV Edge Public IP)

LyncEdge2’s AV edge interface is unreachable because Hairpin traffic is blocked on the External Corporate Firewall

LyncEdge2 Media Relay using TURN Forward

Similarly, LyncEdge2 tries to relay the connection to Bob, for Carol, via the Media Relay service using TURN Forward on UDP 3478

192.168.1.100 (Bob’s real IP)

Bob’s real IP is non-routable

65.10.10.189 (Bob’s public IP)

Bob’s Home Firewall blocks inbound traffic on this port

178.64.39.80 (LyncEdge1 AV Edge public IP)

LyncEdge1’s AV edge interface is unreachable because hairpin traffic is blocked on the External Corporate Firewall

Findings

Given the assumptions stated at the top of this article:

· Lync is deployed in a scaled consolidated topology using NAT

· The 50k port range inbound is blocked

· DNS load balancing

· The External Corporate Firewall is blocking Hairpin traffic

The media path cannot be established, and we can expect the call to fail.

“Call failed due to network issues”

ms-client-diagnostics: 23; reason="Call failed to establish due to a media connectivity failure when one endpoint is internal and the other is remote";CallerMediaDebug="audio:ICEWarn=0x40003e0,LocalSite=65.10.10.189:26654,LocalMR= 178.64.39.80:50468,RemoteSite=10.10.10.211:49583,RemoteMR=178.64.39.81:54978,PortRange= 1025:65000,LocalMRTCPPort=50468,RemoteMRTCPPort=54978,LocalLocation=1, RemoteLocation=2,FederationType=0"
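When comparing several of these failures, a small parser helps pull out the fields that matter (ICEWarn, LocalMR, RemoteMR and so on). This Python sketch assumes only the layout visible above: a numeric code, key="value" pairs separated by semicolons, and a CallerMediaDebug value of the form media:Key=Value,Key=Value. The function name is hypothetical, and the field names come from this one log rather than from a published schema.

```python
import re


def parse_client_diagnostics(header: str) -> dict:
    """Split an ms-client-diagnostics value into its code and fields.

    Assumes the layout seen in this log: '<code>; key="value";key="value"',
    where CallerMediaDebug looks like 'audio:Key=Value,Key=Value,...'.
    """
    match = re.match(r"\s*(?:ms-client-diagnostics:)?\s*(\d+)", header)
    result = {"code": int(match.group(1)) if match else None}
    for key, value in re.findall(r'(\w+)="([^"]*)"', header):
        result[key] = value.strip()

    # 'audio:ICEWarn=...,LocalSite=...' -> media type plus a dict of media fields
    media_type, _, pairs = result.get("CallerMediaDebug", "").partition(":")
    result["media_type"] = media_type
    media = {}
    for pair in pairs.split(","):
        key, _, value = pair.partition("=")
        if key.strip():
            media[key.strip()] = value.strip()  # 'LocalMR= 178...' -> '178...'
    result["media"] = media
    return result
```

Applied to the header above, `parse_client_diagnostics(...)["media"]["LocalMR"]` gives 178.64.39.80:50468, LyncEdge1's AV Edge public IP and TCP port.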

There are 2 ways to resolve this issue:

1. Open the 50k port range inbound. When Bob tries to complete the media path using Carol’s candidate list, we see him try to connect to Carol’s Media Relay server (LyncEdge2) using the 50k port range. If this connection is successful, the call will complete. While this is the easiest method, it is not always preferred, given the number of ports that must be opened.

UPDATE: Opening the 50K port range will require that the remote user (Bob) is able to make an outbound TCP connection to LyncEdge2 on a port in the 50K range. This is usually not a problem when Bob is working remotely from a home office using a personal wireless router/firewall. However, when Bob is traveling to a customer site and connects to the corporate guest WiFi network, outbound ports in the 50K range may be blocked. It is not unusual for 80 and 443 to be the only ports open outbound from corporate networks, especially guest WiFi networks. Thanks to Thomas Binder for providing this additional information. I highly suggest having a look at his presentation from TechEd Europe: “Lync Deep Dive: Edge Media Connectivity with ICE” http://channel9.msdn.com/Events/TechEd/Europe/2012/EXL412. About one hour in, he discusses this scenario.

UPDATE 2: Check out the new session from Thomas Binder at the Lync Conference 2014: Edge Media Connectivity in Lync 2013 http://aka.ms/AVEdge.

2. Allow Hairpin traffic on the External Corporate Firewall between the Lync Edge servers. When the Media Relay service on the Edge AV server tries the candidate lists, it will attempt to connect to the public IP of the opposing Edge server using port UDP 3478. UDP 3478 should already be open on the External Corporate Firewall based on Determining External A/V Firewall and Port Requirements.

Resolution 2 Implementation:

Based on resolution number 2, the following solution was implemented for a Cisco ASA:

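! Disable TCP state checking for traffic between the two AV Edge servers (DMZ IPs)
access-list tcp_state_bypass permit tcp host 10.1.0.77 host 10.1.0.78
access-list tcp_state_bypass permit tcp host 10.1.0.78 host 10.1.0.77
access-list tcp_state_bypass permit udp host 10.1.0.77 host 10.1.0.78
access-list tcp_state_bypass permit udp host 10.1.0.78 host 10.1.0.77
class-map tcp_bypass
 match access-list tcp_state_bypass
policy-map bypass_policy
 class tcp_bypass
  set connection advanced-options tcp-state-bypass
! The policy map only takes effect once applied with service-policy, which the
! original config does not show (assumed; interface name taken from the statics):
! service-policy bypass_policy interface dmz
! Traffic entering and leaving the same interface usually also needs:
! same-security-traffic permit intra-interface
! Hairpin NAT: each Edge server's public AV IP maps back to its DMZ IP
static (dmz,dmz) 178.64.39.80 10.1.0.77 netmask 255.255.255.255
static (dmz,dmz) 178.64.39.81 10.1.0.78 netmask 255.255.255.255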

Now that traffic is allowed to traverse between the two AV Edge servers’ public IP addresses, the media path is complete and the call is established. Bob maintains a connection with LyncEdge1, while Carol maintains a connection with LyncEdge2. LyncEdge1 uses TURN Forward to relay the media path through LyncEdge2’s public IP address, and LyncEdge2 uses TURN Forward to relay the media path through LyncEdge1’s public IP address.


Lync DTMF issues with CUCM 8.6(1a)

April 10, 2012, 12:52 pm


Issue:

When setting up a Direct SIP connection between Lync and CUCM 8.6(1a), DTMF digits are not passed from CUCM to Lync. DTMF has been negotiated properly as RFC 2833 within the SIP signaling, but CUCM does not properly pass the DTMF digits it receives from MGCP through the MTP to the SIP trunk.
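One way to see which leg drops the digits is to look for RFC 2833 telephone-event packets on the SIP trunk side in a packet capture. Below is a minimal Python sketch of the 4-byte RFC 2833 payload (event code, end bit, reserved bit, 6-bit volume, 16-bit duration). The helper names are hypothetical, and extracting the payload from the RTP packets is left to your capture tool.

```python
import struct

# RFC 2833 DTMF event codes: digits 0-9 map to 0-9, then *, #, A, B, C, D.
DTMF_DIGITS = "0123456789*#ABCD"


def encode_telephone_event(digit: str, end: bool, volume: int, duration: int) -> bytes:
    """Build the 4-byte telephone-event payload carried inside an RTP packet."""
    if len(digit) != 1 or digit.upper() not in DTMF_DIGITS:
        raise ValueError(f"not a DTMF digit: {digit!r}")
    if not 0 <= volume <= 63:
        raise ValueError("volume is a 6-bit field (0-63, i.e. 0 to -63 dBm0)")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration is a 16-bit field in RTP timestamp units")
    event = DTMF_DIGITS.index(digit.upper())
    flags = (0x80 if end else 0x00) | volume  # E bit, R bit left at 0, volume
    return struct.pack("!BBH", event, flags, duration)


def decode_telephone_event(payload: bytes) -> tuple:
    """Return (digit, end_of_event, volume, duration) from a telephone-event payload."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    digit = DTMF_DIGITS[event] if event < len(DTMF_DIGITS) else None
    return digit, bool(flags & 0x80), flags & 0x3F, duration
```

If these events show up on the MGCP/MTP side of CUCM but never on the SIP trunk toward Lync, that matches the symptom described here.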

Cause:

After working with Cisco TAC for some time, they determined that this appears to be caused by bug CSCtw70877.

Resolution:

There is no documented workaround for this bug. The fix is to upgrade to 8.6(2a)SU1.

http://www.cisco.com/cisco/software/release.html?mdfid=283782839&flowid=26422&softwareid=282074295&release=8.6%282a%29SU1&relind=AVAILABLE&rellifecycle=&reltype=latest


Lync Edge DNS LB EE Pool using Hosts File

May 15, 2012, 7:51 am


Issue:

Adding multiple host names within a single line item in the Hosts file results in the Edge server not properly failing over between the internal FE servers in the EE Pool.

When using DNS round robin for an internal Enterprise Edition Pool, the Edge servers need to be able to resolve all IP addresses that would be associated with the Pool. Typically the EE Pool IP addresses would be returned to the server via round robin DNS entries that have been entered on the DMZ DNS servers. In an environment that does not have local DNS servers in the DMZ, but instead uses public DNS servers for resolution, the Edge servers cannot resolve the private IP addresses for the EE pool members.

TechNet documentation shows the following: (Use local Host file)

Set Up Network Interfaces for Edge Servers
- Each Edge Server is a multihomed computer with external and internal facing interfaces. The adapter Domain Name System (DNS) settings depend on whether there are DNS servers in the perimeter network. If DNS servers exist in the perimeter, they must have a zone containing one or more DNS A records for the next hop server or Pool (that is, either a Director or a designated Front End Pool), and for external queries they refer name lookups to other public DNS servers. If no DNS servers exist in the perimeter, the Edge Server(s) use external DNS servers to resolve Internet name lookups, and each Edge Server uses a HOST to resolve the next hop server names to IP addresses.
Security Note
- For security reasons, we recommend that you do not have your Edge Servers access a DNS server located in the internal network.

When configuring the local Hosts file on Windows Servers, it is typical to set up the Hosts file with multiple names in the following format:

IP <tab> hostname <tab> FQDN

10.0.0.15 server1 lyncpool.domain.local

10.0.0.16 server2 lyncpool.domain.local

For many reasons, it is convenient to set up the local Hosts file to include multiple names, including the local host name of the internal Lync FE server.

When the Hosts file is set up this way, the Edge server will not fail over between the two internal FE servers in the EE Pool as expected when using DNS LB. When the FE server listed first in the Hosts file goes offline, the Access Edge service does not try to re-establish service with the second FE server.

Cause:

When looking at the cached values from the Hosts files on the system, we can see the following differences when multiple names are listed vs. when only the FQDN is listed.

Example 1: (Hosts file contains server host name)

In this example we set the hosts file to include the local host name of the internal FE server, plus the FQDN of the EE Pool. When displaying the local DNS Cache “ipconfig /displaydns” we see the records as recorded by the local system cache.

What causes the problem is the entry for “lyncpool.domain.local”: it is cached as a CNAME that points to the host name on the first line of the Hosts file, “server1”.

When server1 goes offline, the Edge server never fails over to server2, since lyncpool.domain.local resolves only to server1 in the local cache.

Hosts file:

10.0.0.15 server1 lyncpool.domain.local

10.0.0.16 server2 lyncpool.domain.local

C:\Users\Administrator>ipconfig /displaydns

Windows IP Configuration

server1
    ----------------------------------------
    Record Name . . . . . : server1
    Record Type . . . . . : 1
    Time To Live . . . . : 86400
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 10.0.0.15

server2
    ----------------------------------------
    Record Name . . . . . : server2
    Record Type . . . . . : 1
    Time To Live . . . . : 86400
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 10.0.0.16

lyncpool.domain.local
    ----------------------------------------
    Record Name . . . . . : lyncpool.domain.local
    Record Type . . . . . : 5
    Time To Live . . . . : 86400
    Data Length . . . . . : 8
    Section . . . . . . . : Answer
    CNAME Record . . . . : server1

Example 2: (Hosts file contains only the Lync FE Pool FQDN)

In this example we set the hosts file to include only the FQDN of the internal EE Pool. When displaying the local DNS Cache “ipconfig /displaydns” we see the records as recorded by the local system cache.

In this example, there is no longer a CNAME value. Instead, lyncpool.domain.local has two A records containing the IPs for server1 and server2.

When server1 goes offline, it takes about 60 seconds, but eventually the Access Edge service starts connecting to the second FE server in the EE Pool as expected.

Hosts file:

10.0.0.15 lyncpool.domain.local

10.0.0.16 lyncpool.domain.local

C:\Users\Administrator>ipconfig /displaydns

Windows IP Configuration

lyncpool.domain.local
    ----------------------------------------
    Record Name . . . . . : lyncpool.domain.local
    Record Type . . . . . : 1
    Time To Live . . . . : 86400
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 10.0.0.15

    Record Name . . . . . : lyncpool.domain.local
    Record Type . . . . . : 1
    Time To Live . . . . : 86400
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 10.0.0.16

Resolution:

When using local Hosts files with DNS LB for the internal EE Pool, the Hosts file must list only the FQDN of the internal EE Pool:

10.0.0.15 lyncpool.domain.local

10.0.0.16 lyncpool.domain.local

Adding multiple host names within a single line item results in a CNAME value being created in the local DNS cache that only resolves to the first entry listed. As a result, the Edge server does not properly fail over between the internal FE servers in the EE Pool.
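To catch this layout before it causes a failover problem, a small check can flag Hosts file lines where the pool FQDN shares a line with other names. This is a minimal Python sketch with a hypothetical function name; it only looks for the pattern described in this post and does not inspect the DNS cache itself.

```python
def find_multi_name_lines(hosts_text: str, pool_fqdn: str) -> list:
    """Return (line_number, line) for each Hosts entry that lists pool_fqdn
    on the same line as another name, the layout that produced the CNAME above."""
    problems = []
    for lineno, raw in enumerate(hosts_text.splitlines(), start=1):
        entry = raw.split("#", 1)[0].split()  # drop comments, split on tabs/spaces
        if len(entry) < 2:
            continue  # blank, comment-only, or malformed line
        names = [name.lower() for name in entry[1:]]
        if pool_fqdn.lower() in names and len(names) > 1:
            problems.append((lineno, raw))
    return problems
```

On Windows the Hosts file normally lives at C:\Windows\System32\drivers\etc\hosts; passing its text and the pool FQDN (lyncpool.domain.local in this example) returns the offending lines, and an empty list means every pool entry is FQDN-only.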


Application Layer Firewall Blocks Lync Application Sharing

February 4, 2013, 3:01 pm


Anyone familiar with deploying Lync knows there are a lot of firewall requirements. There are plenty of great articles detailing the port requirements, so I won’t get into that. But what happens when you have all the ports open and you still run into issues? You’ve verified with the firewall guys that the configurations are correct, you’ve tested the ports with Telnet or PortQry, and everything appears to be configured as defined by Microsoft’s requirements.

I ran into a similar situation recently at a customer site, which proved to be a great learning experience.

In this article we will look at such an example where the environment has an application layer firewall (unbeknownst to the Lync admin, but knownst to us. Bonus points if you know the movie reference), how this impacts Lync, and how we can troubleshoot it.

An application layer firewall has the ability to inspect network traffic up to the application layer (Layer 7) of the OSI model, whereas a traditional stateful firewall only inspects traffic up to the transport layer (Layer 4).

In this example, the following assumptions are made:

A single Standard Edition server is deployed
A single Edge Server is deployed
There are multiple firewalls on the network. One separating the internet from the DMZ, and one separating the DMZ from the internal network
All ports as required and defined are open: http://technet.microsoft.com/en-us/library/gg425891.aspx
Alice is a user connected to the external network via the Lync Edge. No VPN.
Bob is on the internal network

Issue:

Users are reporting issues with Lync desktop sharing, saying they receive the following error message: “Sharing failed to connect due to network issues.” The problem appears to be intermittent: sometimes it works, while other times it fails.

Upon further investigation, you notice that desktop sharing works when both users are internal. You also notice it works when both users are external. However, it is not working when one user is internal and the other user is external. It seems to be impacting application sharing and file transfers, but not IM/P and Voice.

Being the great Lync admins that we are, we get out Snooper and OCSLogger. We can use these to look at SIP logs on both the client (UCCAPI log) and server (SIPStack log). In the diagnostic logs we see a SIP BYE packet, with the following diagnostic error: (We can also pull this from the monitoring server if deployed)

Table: SIP BYE Diagnostic

In this diagnostic message we see several things of importance:

ICEWarn=0x800029

LocalSite=192.168.100.50

LocalMR=10.0.0.100

RemoteSite=178.64.39.80

RemoteMR=65.10.10.189

So where do we turn to start looking at ICEWarning errors? How about Chapter 9 of the Lync Server 2010 Resource Kit. Inside we find Table 2: ICE Protocol Warning Flags. From the results we find that 0x8xxxxx typically refers to an issue communicating with a TURN server.

Next we see that the LocalMR=10.0.0.100. This is the local media relay, aka TURN server, that the client should use. The error message indicates a failure when communicating with the TURN server.

To verify that the Lync client is pulling the correct Media Relay information, we filter the client UCCAPI logs to look at the media relay authentication service (MRAS) details. Below, we can see that the Edge pool is listening on ports UDP 3478 and TCP 443 for media relay. The hostname equals the FQDN of the Edge Pool. Quickly validate that the client can resolve the FQDN to the internal IP of the Edge Pool to eliminate a DNS issue.
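The DNS part of that check can be scripted on the client. This Python sketch (hypothetical helper names) returns every IPv4 address the local resolver, including the Hosts file, gives for a name, and reports whether the expected internal Edge IP is among them.

```python
import socket


def resolve_ipv4(fqdn: str) -> set:
    """Every IPv4 address the local resolver (DNS or Hosts file) returns for fqdn."""
    try:
        infos = socket.getaddrinfo(fqdn, None, family=socket.AF_INET)
    except socket.gaierror:
        return set()  # the name did not resolve at all
    return {info[4][0] for info in infos}


def check_edge_fqdn(fqdn: str, expected_ip: str) -> bool:
    """True if fqdn resolves to the expected internal Edge IP (among possibly others)."""
    addresses = resolve_ipv4(fqdn)
    print(f"{fqdn} -> {sorted(addresses) or 'no IPv4 answer'}")
    return expected_ip in addresses
```

From the internal client in this article, that would be `check_edge_fqdn("edgepool.silbers.net", "10.0.0.100")`, using the Edge Pool FQDN and the LocalMR address shown here.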

Table: MRAS

Knowing that the client is successfully getting the correct media relay (TURN) server information, we should start looking at the SIP session details relating to media relay.

To do this, we look in the original SIP Invite for the Application Sharing request and see the SDP candidate list below. This is the candidate list that the internal client sends to the external client. It should include all possible connection points, including the local host IP as well as the Edge server’s media relay IP. What we notice here is that it only includes the client’s local host IP and no Media Relay candidate, which is odd, because MRAS clearly shows that the client gets a successful response back from the Edge Pool with Media Relay information.

Table: App Sharing SDP

IM/P and Voice calls are working without any issue. Why aren’t app sharing and file transfer?

Let’s compare the SDP logs for a voice call with the app sharing SDP logs from above.

Table: Voice SDP

In this candidate list we see both the local host IP, as well as the Media Relay IP of the Edge. What we also notice is that voice is using UDP, not TCP as the app sharing request was.

So what do we know at this point?

Lync IM/P works internal and external
Lync Voice works internal and external
App sharing works between two internal users
App sharing works between two external users
App sharing does not work when one user is internal and one is external
Lync Voice is using UDP
App Sharing is using TCP

The fact that Voice is using UDP and only includes UDP information in the SDP raises a few red flags. Why don’t we see any TCP candidates? To answer this, we turn to our good friend Netmon to see what exactly is occurring at the transport layer.

For a good reference for SDP and ICE negotiations in Lync, I suggest you check out the following link: How Communicator Uses SDP and ICE To Establish a Media Channel.

In the picture below, copied from Mr. Ott’s TechNet article, we can see the SDP discovery process for both TCP and UDP.

Table: Copied from How Communicator Uses SDP and ICE to Establish a Media Channel

During the TCP connection test, a TLS handshake is completed and then TURN is negotiated, whereas with UDP there is no TLS requirement and TURN is negotiated immediately.

We will want to start Netmon traces on both the internal client and the Lync Edge server, and then initiate a desktop sharing session.

First we will check Netmon for the TCP connection test. For this, we want to filter requests including the client IP, the Edge internal interface IP, and TCP port 443. We will expect to see the TLS Handshake negotiation, as well as the TURN negotiation.

Below we can see the filtered Netmon traces from the client. The first packet we see is the TLS Handshake request from the client. What we don’t see is a TLS Handshake response back from the Edge. We see packets that appear to be from the Edge server’s private IP, yet we do not see any TURN negotiation packets.

Netmon Filter:

ipv4.address==192.168.100.50 and ipv4.address==10.0.0.100 and tcp.port==443

Table: Netmon Client Results (App Sharing)

A similar filter on the Netmon capture for the Edge server returns zero packets, indicating that the client traffic never makes it to the Edge. Even though the client traces show responses that appear to come from the Edge, the Edge server does not show these packets.

Table: Netmon Edge Results (App Sharing)

So we know the client is trying to send the SDP discovery, but the Edge is not receiving the packets, and therefore never responding.

If we look at the same Netmon logs for the UDP Connection Test, we see the communication on both the client and the server. We see TURN negotiation on UDP 3478.

Earlier we validated that the ports were open and listening via Telnet and PortQry. While we are running Netmon, let’s go ahead and initiate another Telnet and PortQry test to validate the ports are still open and listening, and see if we capture the traffic.

Table: Netmon Telnet Capture Client

Table: Netmon Telnet Capture Edge

Sure enough, the traffic is captured on both the client and the server, validating that traffic is successfully making it through the firewall on port 443. The difference between Telnet and the SDP negotiation is that Telnet simply connects to the port, whereas the SDP connection test starts with a TLS Handshake to initiate the encryption process.
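The difference between the two tests can be reproduced in a few lines of Python: one function only completes the TCP handshake, like Telnet or PortQry, while the other also sends a TLS ClientHello and waits for the server's reply. This is a rough sketch with hypothetical function names. It uses a standard TLS client with certificate checks disabled, which is not the Lync client's own MS-TURN handshake, and newer Python/OpenSSL builds may refuse older TLS versions, so confirm any failure with Netmon as above.

```python
import socket
import ssl


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Telnet/PortQry-style test: True once the TCP three-way handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def tls_handshake_ok(host: str, port: int, timeout: float = 3.0) -> bool:
    """Connect, send a TLS ClientHello, and report whether a handshake completes.

    Certificate checks are disabled on purpose: the question is whether anything
    answers the handshake, not whether the certificate is trusted.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:  # covers refused, timed out, and ssl.SSLError
        return False
```

A True from `tcp_port_open` together with a False from `tls_handshake_ok` against the Edge internal IP (10.0.0.100 on TCP 443 here) is the same pattern this troubleshooting uncovered: the port is open, but nothing answers the handshake.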

It still does not seem to be a firewall port issue, as all tests show the port to be open. But the traffic is still not making it to the Edge based on our Netmon queries. Further, when we look at our Netmon traces, we appear to be getting a response back from the Edge.

We know Lync voice is working over UDP, and App Sharing is not working over TCP 443.

What other traffic uses port TCP 443?

Secure web traffic uses TCP 443. In many environments, customers deploy forward proxies for web traffic. These can be used for many reasons, but a common reason is to filter all web traffic through a single point for filtering purposes. In this way companies can filter the types of web searches employees are able to perform.

Is it possible that all port TCP 443 traffic is being funneled through a proxy?

What happens if we fire up a web browser and try to hit the Lync Edge Pool internal FQDN via https://edgepool.silbers.net? While the Edge server listens on port 443, it does not use this port to serve content to a web browser, so we should not get anything displayed in the browser. However, to our surprise, the browser displays a web page saying we must authenticate to access the page we are trying to reach. Sure enough, the firewall is acting as a forward proxy and inspecting layer 7 traffic. All outbound traffic on TCP 80 and TCP 443 is being forwarded through the proxy for inspection, which causes the SDP negotiation to fail.

Cause:

In this environment, the firewall between the internal user and the Edge server was acting as both a stateful firewall and an application layer firewall, inspecting traffic at both layer 4 and layer 7.

The stateful firewall was correctly configured to allow ports UDP 3478 and TCP 443; however, the application layer firewall was filtering web traffic on port TCP 443 via its forward proxy feature set. Telnet and PortQry succeed because they are simply connecting to a port. They are not sending any SSL traffic, and therefore the application firewall was not forwarding the traffic via the proxy. When Lync tries the SDP negotiation, it sends a TLS Handshake request. The application firewall sees this and forwards it to the proxy for inspection. Therefore we never see any SDP or TURN traffic actually hit the Edge server. We only see the telnet and PortQry traffic.

Resolution:

To resolve this issue, we work with the networking team to disable the application and proxy filtering on the firewall for traffic destined to the Lync Edge servers.

Once application filtering is disabled on the firewall, we test Desktop Sharing again with an external user and validate everything now works as expected.

Further, we can look at the SIP traffic via snooper, and now see the media relay included in the SDP candidate list.


Lync On-Premise Mobility Configuration Can Cause issues with Lync Online (O365) Meetings

March 12, 2013, 1:16 pm


Setup:

In this scenario:

Bob, a Lync On-Premise user, receives a Lync Meeting request from Carol, a Lync Online (Office 365) user
These users are not in the same organization
Federation is not setup for these domains
Open Federation is not setup for the Lync On-Premise environment
Mobility has been setup in the Lync On-Premise environment

Issue:

When Bob, the Lync On-Premise user, receives a Lync Meeting request from Carol, a Lync Online (O365) user, and clicks the Join Lync Meeting link (the meet URL), he receives the following error in his client: “A server error occurred. Please contact your support team.”

Using Snooper, we open the Lync client diagnostic logs: Communicator-uccapi-0.uccapilog

We see the following error message:

SIP/2.0 500 The server encountered an unexpected internal error

ms-diagnostics: 1028;reason="Domain resolved by DNS SRV to a configured hosting service but the domain is not in the allow list";domain="domain.com";fqdn1="sipfed.online.lync.comtrue5061";source="sip.silbers.net"

Cause:

As mentioned in the setup, no federation is configured between the two Lync environments, and Open Federation is not enabled for the Lync On-Premise environment. What we notice in the error message, however, is that the client is trying to communicate with a domain configured as a “Hosting Service”, and is trying to connect to sipfed.online.lync.com.

We can check the configured hosting providers in Lync with the following:

PS C:\> Get-CsHostingProvider

Identity                  : LyncOnline
Name                      : LyncOnline
ProxyFqdn                 : sipfed.online.lync.com
VerificationLevel         : UseSourceVerification
Enabled                   : True
EnabledSharedAddressSpace : False
HostsOCSUsers             : False
IsLocal                   : False
AutoDiscoverUrl           :

Here we can see that sipfed.online.lync.com is set up as a Hosting Provider. This was configured as part of the Lync Mobility configuration. Why is the Lync Meeting trying to talk to the hosting provider used for Lync Mobility?

Sipfed.online.lync.com is also used as the access edge for federation with Office 365.

So let’s check the federation SRV record for domain.com and see if it is configured to point to Office 365.

Using nslookup for the SIP Federation SRV record:

C:\>nslookup

Default Server: google-public-dns-a.google.com
Address: 8.8.8.8

>set type=srv
>_sipfederationtls._tcp.domain.com
Default Server: google-public-dns-a.google.com
Address: 8.8.8.8

Non-authoritative answer:
_sipfederationtls._tcp.domain.com SRV service location:
          priority       = 100
          weight         = 1
          port           = 5061
          svr hostname   = sipfed.online.lync.com

Here we can see that the SRV record for domain.com points to sipfed.online.lync.com, which means Carol’s Lync environment is hosted with Office 365.

When Bob attempts to join Carol’s meeting, Lync does a federation validation for Carol’s domain “domain.com” and finds a valid SRV record pointing to sipfed.online.lync.com. Bob’s On-Premise Lync environment has sipfed.online.lync.com configured as a valid hosting provider.

Since sipfed.online.lync.com is a valid hosting provider, Lync next checks to see if “domain.com” is an Allowed Domain. In this scenario, the only Allowed Domain configured is Push.Lync.Com.

PS C:\> Get-CsAllowedDomain

Identity          : Push.lync.com
Domain            : Push.lync.com
ProxyFqdn         :
Comment           :
MarkForMonitoring : False

In the results we see only “Push.Lync.Com”, which is configured for push notifications with Lync Mobility.

Since Domain.com is not an Allowed Domain, Lync blocks the connection with the error: “Domain resolved by DNS SRV to a configured hosting service but the domain is not in the allow list”
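The decision chain above can be summarized as a toy model. This Python sketch is a simplification written from the behavior described in this post, not Lync's actual logic; the function name and return strings are hypothetical, and the `open_federation` flag stands in for the Open Federation alternative discussed under Resolution.

```python
from typing import Iterable, Optional


def federation_decision(
    domain: str,
    srv_target: Optional[str],
    hosting_provider_fqdns: Iterable[str],
    allowed_domains: Iterable[str],
    open_federation: bool = False,
) -> str:
    """Toy model of the federation checks described above (not Lync's code)."""
    domain = domain.lower()
    providers = {p.lower() for p in hosting_provider_fqdns}
    allowed = {d.lower() for d in allowed_domains}

    if domain in allowed:
        return "allow: domain is in the allow list"
    if open_federation and srv_target:
        return "allow: open federation"
    if srv_target and srv_target.lower() in providers:
        # The case in this post: ms-diagnostics 1028
        return "reject 1028: hosting service found, domain not in allow list"
    return "reject: no federation configured for this domain"
```

With the values from this post (SRV target sipfed.online.lync.com, the LyncOnline hosting provider, and only Push.lync.com allowed), the model lands on the 1028 rejection; adding domain.com to the allow list flips it to allow.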

Resolution:

Since domain.com is hosted on Office 365, which uses the same FQDN for federation as Lync Mobility, domain.com must be added as an Allowed Domain. Keep in mind, though, that this not only allows Lync Meetings but essentially enables federation with this entire domain, so review any other policies that may target federation.

New-CsAllowedDomain -Identity "domain.com"

An alternate method would be to allow Open Federation. This comes with its own caveats, however, as Open Federation isn’t always the best solution.

↧

Lync EWS Broken During Exchange 2013/2007 Transition

December 19, 2013, 2:15 pm

Issue:

During the transition to Exchange 2013 from Exchange 2007, Exchange Web Services (EWS) integration for Lync will be unavailable for users whose mailboxes remain on Exchange 2007.

MAPI integration via Outlook will continue to be available.

Microsoft has confirmed that this is a known issue, and the only workarounds at this time are to rely on MAPI integration via Outlook or to migrate the users to Exchange 2013.

Microsoft also stated that this is only an issue with Exchange 2007/2013 coexistence, and will not impact 2010/2013 migrations.

If Outlook is not running, the following error message will be displayed. Other EWS integration errors may also be displayed.

“Lync cannot connect to the Exchange server. Please try signing out and signing back in. Outlook contact and calendar information will be unavailable until the connection is restored.”

If you open the Lync Configuration Information, you will see that the EWS Information is not displayed, and the status is “EWS not deployed.”

You will also notice that there is no information listed for the EWS Internal URL or EWS External URL.

CAUSE:

NOTE: This is only an issue for mailboxes which still remain on Exchange 2007. Mailboxes already migrated to 2013 are not impacted, and the issue resolves itself when a 2007 mailbox is migrated to 2013.

During the transition from Exchange 2007 to 2013, the AutoDiscover DNS records for Exchange are updated to point to the Exchange 2013 CAS. This is a necessary part of the transition because Exchange 2013 can proxy connections to legacy versions, but Exchange 2007 cannot proxy connections forward to newer versions.

The Lync client retrieves EWS configuration information from the Exchange AutoDiscover service. When the Lync client is launched, it makes a request to https://autodiscover.domain.com/autodiscover/autodiscover.xml. There are many good posts on this process, so I won’t cover it.

When the Lync client connects to the Exchange 2013 AutoDiscover virtual directory, it receives a 401 Unauthorized (authentication required) challenge. If the workstation is joined to the domain, the client authenticates via Integrated Windows Authentication. If it is not domain joined, the user receives a Basic Authentication prompt.

After authenticating, the Lync client immediately gets an EWS integration error in the bottom right corner of the client, and red exclamation marks on the additional tabs.

If we look at the Configuration Information by holding CTRL and right-clicking the Lync icon in the system tray, we will see the wonderful “EWS Not Deployed” message.

If we open a NetMon trace, we can see a request made to the Exchange 2013 CAS.

Further, if we look at the IIS logs on the Exchange 2013 CAS, we can see Lync’s connection attempt receive a 401 Unauthorized, followed by a 200 OK.

2013-12-19 22:01:55 10.10.10.10 POST /autodiscover/autodiscover.svc &cafeReqId=b5283628-655c-4478-9e3d-d6fa43a3c44a; 443 - 10.10.10.155 OC/15.0.4563.1000+(Microsoft+Lync) 401 1 2148074254 78
2013-12-19 22:01:55 10.10.10.10 POST /autodiscover/autodiscover.svc &cafeReqId=4a99f3d0-694c-42a3-92cb-cf0a6293034c; 443 domain\test 10.10.10.155 OC/15.0.4563.1000+(Microsoft+Lync) 200 0 0 234

So it appears everything should be working, right? The Lync client connects to AutoDiscover successfully, and we see the 200 OK. But the Lync client never pulls down the XML configuration.
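If you need to pull these entries out of a busy log, a short parser helps. The sketch below is a minimal Python example that assumes the field order seen in the two lines above (the IIS default W3C fields without `cs(Referer)`); if the log contains a `#Fields:` directive, that header is used instead. The function names are hypothetical. It reports the status, user name, and `sc-win32-status` for each Lync Autodiscover request, and `win32_hex` renders the win32 status 2148074254 as 0x8009030E for easier lookup. IIS writes `-` for an empty field, such as the user name on the first, unauthenticated leg of the handshake.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed default field order, inferred from the sample lines (IIS default W3C
# fields without cs(Referer)). A "#Fields:" directive in the log overrides it.
DEFAULT_FIELDS = ["date", "time", "s-ip", "cs-method", "cs-uri-stem",
                  "cs-uri-query", "s-port", "cs-username", "c-ip",
                  "cs(User-Agent)", "sc-status", "sc-substatus",
                  "sc-win32-status", "time-taken"]


def parse_w3c(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn W3C extended log lines into dicts keyed by field name."""
    fields = DEFAULT_FIELDS
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
            continue
        if line.startswith("#"):
            continue  # other directives such as #Software or #Date
        values = line.split()
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
        records.append(dict(zip(fields, values)))
    return records


def lync_autodiscover_attempts(lines: Iterable[str]) -> List[Tuple[int, str, int]]:
    """(sc-status, cs-username, sc-win32-status) for each Lync Autodiscover request."""
    out = []
    for rec in parse_w3c(lines):
        if ("Lync" in rec.get("cs(User-Agent)", "")
                and rec.get("cs-uri-stem", "").lower().startswith("/autodiscover/")):
            out.append((int(rec["sc-status"]), rec["cs-username"],
                        int(rec["sc-win32-status"])))
    return out


def win32_hex(code: int) -> str:
    """Render sc-win32-status in the 0x... form used by Windows error references."""
    return f"0x{code:08X}"
```

Run against the two lines above, the sketch yields a 401 for user `-` followed by a 200 for `domain\test`, which is the same "looks healthy" picture the IIS log gives.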

Now we can go look at the AutoDiscover log on the Exchange 2013 server. This can be found at C:\Program Files\Microsoft\Exchange Server\V15\Logging\Autodiscover.

Lo and behold, we finally find an error:

ErrorMessage=The SOAP-based AutoDiscover service is not available for mailboxes on Exchange 2007.
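Rather than paging through the Autodiscover log by hand, a small script can list every `ErrorMessage=` entry. This is a minimal Python sketch under two assumptions: the log files end in `.log` (case-insensitive), and each error is written as `ErrorMessage=<text>` terminated by a semicolon or the end of the line. `find_autodiscover_errors` is a hypothetical helper name.

```python
import os
import re
from typing import List, Tuple

# Assumption: errors appear as "ErrorMessage=<text>" ending at ';' or end of line.
ERROR_RE = re.compile(r"ErrorMessage=([^;\r\n]+)")


def find_autodiscover_errors(log_dir: str) -> List[Tuple[str, str]]:
    """Return (file name, error text) for every ErrorMessage= entry in *.log files."""
    hits = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not (os.path.isfile(path) and name.lower().endswith(".log")):
            continue
        # utf-8-sig tolerates a byte-order mark; undecodable bytes are replaced.
        with open(path, encoding="utf-8-sig", errors="replace") as fh:
            for line in fh:
                for match in ERROR_RE.finditer(line):
                    hits.append((name, match.group(1).strip()))
    return hits
```

For example, point it at C:\Program Files\Microsoft\Exchange Server\V15\Logging\Autodiscover (or a copy of that folder) and look for the Exchange 2007 message above.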

Resolution:

Microsoft has confirmed that this is a “Known Issue” at this time, and there is no fix. Apart from relying on MAPI integration via Outlook, the only resolution is to migrate the user to Exchange 2013.

↧

Storage Service had an EWS Autodiscovery Failure–32054

September 18, 2015, 1:57 pm

I ran into the following scenario during a recent project. We were migrating Exchange On-Premise to Exchange Online.

Issue:

Users are unable to see Meetings using the Lync Mobile Client, and Event ID 32054 appears in the Lync event log.

Event ID:

Log Name:      Lync Server
Source:        LS Storage Service
Date:          9/18/2015 10:00:00 AM
Event ID:      32054
Task Category: (4006)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      LyncFE.company.com

Description:

Storage Service had an EWS Autodiscovery failure.

StoreWebException: code=ErrorEwsAutodiscover, reason=GetUserSettings failed, user1@company.com, Autodiscover Uri=https://autodiscover.company.com/Autodiscover/Autodiscover.svc, Autodiscover WebProxy=<NULL>, WebExceptionStatus=ConnectFailure ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused it 132.245.71.8:443

Cause: Autodiscovery Uri was not correctly configured or unreachable, that there is a problem with the Proxy, or other errors.
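When several of these events pile up, it helps to pull out the fields that matter: the user, the Autodiscover URI being tried, the WebExceptionStatus, and the endpoint that refused the connection. The minimal Python sketch below does this with regular expressions tuned to the event text shown above. `parse_storage_exception` is a hypothetical name, and other StoreWebException variants may need different patterns.

```python
import re
from typing import Dict, Optional

# Patterns tuned to the 32054 description text; adjust for other variants.
_PATTERNS = {
    "code": re.compile(r"code=([^,]+)"),
    "user": re.compile(r"([\w.+-]+@[\w-]+(?:\.[\w-]+)+)"),
    "autodiscover_uri": re.compile(r"Autodiscover Uri=([^,\s]+)"),
    "web_exception_status": re.compile(r"WebExceptionStatus=(\w+)"),
    "refused_endpoint": re.compile(r"actively refused it\s+([\d.]+:\d+)"),
}


def parse_storage_exception(description: str) -> Dict[str, Optional[str]]:
    """Pull the useful fields out of an LS Storage Service StoreWebException."""
    result: Dict[str, Optional[str]] = {}
    for key, pattern in _PATTERNS.items():
        match = pattern.search(description)
        result[key] = match.group(1) if match else None
    return result
```

A refused endpoint that lands in Office 365 address space, as here, is the first hint that the Autodiscover URL, not the Lync server, is the problem.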

Setup:

Lync 2013 OnPrem with Enterprise Voice
Exchange Online – All user mailboxes hosted in O365
Exchange UM Online
Exchange 2013 OnPrem – Single Hybrid server used only for management
SCP – AutoDiscoverServiceInternalUri = $null
DNS – A CNAME exists for autodiscover.company.com that resolves to autodiscover.outlook.com

To integrate Exchange Online in O365 with Lync 2013 On-Premise, the following article was used.

How to integrate Exchange Online with Skype for Business Online, Lync Server 2013, or a Lync Server 2013 hybrid deployment

After following all the steps, we verified the “Online Meeting” link was visible when scheduling meetings in OWA, and it was “assumed” that OAuth integration was working as expected.

Yet we were still receiving 32054 LS Storage Service errors.

Troubleshooting:

Test-CsExStorageConnectivity -SipUri user1@company.com -Verbose

No connection could be made because the target machine actively refused it 132.245.226.72:443

Test failed.

What we can see here is that the connection was being made to 132.245.226.72, an O365 IP Address.

This is because the ExchangeAutodiscoverUrl was set to https://autodiscover.company.com/autodiscover/autodiscover.svc, and autodiscover.company.com has a CNAME in DNS that resolves to autodiscover.outlook.com.

If I open a web browser and try the same request, I receive ERR_CONNECTION_REFUSED. So it seems O365 does not like this, even though step 6 in the article above says to configure it this way.

The next test was to point the ExchangeAutodiscoverUrl to the On-Premise Hybrid server.

Set-CsOAuthConfiguration -Identity Global -ExchangeAutodiscoverUrl "https://exhybrid.company.com/autodiscover/autodiscover.svc"

Then we re-ran the Test-CsExStorageConnectivity test:

Test-CsExStorageConnectivity -SipUri user1@company.com -Verbose

“The autodiscover service couldn’t be located”

Test Failed.

I didn’t copy the entire verbose log, but essentially Autodiscover finds the user on the Hybrid server and then attempts a subsequent Autodiscover lookup based on the user’s “targetAddress”, which is set to “user1@company.onmicrosoft.com” for mail flow during coexistence. This lookup obviously fails.

So what do we do next??? Call Microsoft support, of course.

Resolution:

HTTP instead of HTTPS

The resolution was actually quite simple. Change HTTPS to HTTP in the ExchangeAutodiscoverURL when pointing to O365.

Set-CsOAuthConfiguration -Identity Global -ExchangeAutodiscoverUrl "http://autodiscover.company.com/autodiscover/autodiscover.svc"

The HTTP request will actually redirect to https://autodiscover-s.outlook.com/autodiscover/autodiscover.svc
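To see the redirect for yourself without letting a client follow it, you can send a single request and read the `Location` header. The minimal Python sketch below uses only `http.client`, which never follows redirects on its own. `peek_redirect` is a hypothetical helper, and the exact 3xx status O365 returns is not something this post records.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def peek_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one GET to url and return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port or 443, timeout=timeout)
    elif parts.scheme == "http":
        conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    else:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `peek_redirect("http://autodiscover.company.com/autodiscover/autodiscover.svc")` should come back with a 3xx status and a `Location` pointing at https://autodiscover-s.outlook.com/autodiscover/autodiscover.svc, assuming DNS is set up as in this post.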

After changing this to HTTP, we re-ran the Test-CsExStorageConnectivity test:

Test-CsExStorageConnectivity -SipUri user1@company.com -Verbose

Test passed.

The following Event IDs were also registered:

Log Name:      Lync Server
Source:        LS Storage Service
Date:          9/18/2015 2:00:00 PM
Event ID:      32048
Task Category: (4006)
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      LyncFE.company.com

Description:

OAuth was properly configured for Storage Service.

Log Name:      Lync Server
Source:        LS Storage Service
Date:          9/18/2015 2:00:00 PM
Event ID:      32052
Task Category: (4006)
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      LyncFE.company.com

Description:

OAuth STS was properly configured for Storage Service.

The 32054 events went away, and users were able to see Meetings via the Lync Mobile Client.

↧

Office Web Apps Not Supported on Non-System Drive

May 23, 2016, 12:58 pm

Issue:

I ran into the following scenario during a recent project.

Skype for Business users were unable to present PowerPoint presentations and received the following error message:

“Either you’ve lost network connectivity or our server is too busy to handle your request. Please check your network connection and try again later”

As you watched the attempt to share the PPT, you could see that the presentation was successfully uploaded to the meeting. You could see the PPT as an attachment within the “Manage Content” section of the meeting, but after a minute or two of waiting, users would see the above error displayed in their client.

Snooper Traces revealed the following:

54020;reason="The WAC presentation failed with a critical error."

ErrorMessage="Either you’ve lost network connectivity or our server is too busy to handle your request. Please check your network connection and try again later."

ULS Logs revealed the following:

AsyncResult::SetCompleted – Completed with unthrown exception Microsoft.Office.Server.Powerpoint.Pipe.Interface.PipeApplicationException: Exception of type ‘Microsoft.Office.Server.Powerpoint.Pipe.Interface.PipeApplicationException’ was thrown.

at Microsoft.Office.Server.Powerpoint.Pipe.Web.WacViewServices.EndGetItem(IAsyncResult asyncResult)

at Microsoft.Office.Server.Powerpoint.Pipe.Interface.ResourceRequest.End(IAsyncResult asyncResult)

at Microsoft.Office.Server.Powerpoint.Pipe.Interface.PipeManager.OnRequestComplete(IAsyncResult result) WacItemRetrievalResultErrorInfo; ItemRetrivalStatus: InProgress; ErrorCode: ErrorInProgress;

The frustrating part is that this had been working for several months and then suddenly stopped. The first reports of issues correlated with reboots caused by monthly Windows OS patching.

After spending many hours troubleshooting OWAS, reviewing/uninstalling/reinstalling updates, uninstalling/reinstalling OWAS, digging through Snooper traces, tracing CorrelationIds from Fiddler to ULS logs, etc…it seems we had finally stumbled upon the answer.

Resolution:

Installing OWAS on a Non-System Drive (in this case D:\) resulted in the errors above, while installing OWAS on the System Drive (C:\) resulted in PowerPoint presentations being presented as expected.

Following up with Microsoft, they said they have had other reports of OWAS being deployed on Non-System drives, and that doing so is UNSUPPORTED.

Here is the link they referenced as documentation for it being unsupported… https://blogs.technet.microsoft.com/office_web_apps_server_2013_support_blog/2014/03/26/office-web-apps-2013-vmware/

So be warned, DON’T INSTALL OWAS ON A NON-SYSTEM DRIVE!!!

↧

Skype Meeting Icon missing from OWA in Exchange Online (TLS 1.0 Required)

June 22, 2016, 1:39 pm

In an effort to adhere to stricter security policies and updated PCI guidelines, a recent customer implemented a policy that required SSL 3.0, TLS 1.0, and known vulnerable cipher suites to be disabled. Only TLS 1.1 and higher would be allowed.

Setup:

Skype for Business deployed OnPremise and Exchange Online in O365.

Reverse Proxy had SSL 3.0 and TLS 1.0 disabled.

Findings:

One of our findings was that the Skype Meeting icon was missing from OWA in Exchange Online.

Troubleshooting:

We double-checked all the integration steps located here and here.

Unfortunately, none of these seemed to do the trick.

Test-OauthConnectivity returned successfully in Exchange Online

Test-CsExStorageConnectivity returned successfully in Skype4B OnPrem

The Skype for Business Autodiscover Web Service test (via the Remote Connectivity Analyzer site) failed with “The certificate couldn’t be validated because SSL negotiation wasn’t successful”

So we turned to Fiddler…

Fiddler traces for browser initiated sessions to the Meeting join page, or to Lyncdiscover, showed the connection was established using TLS 1.2 and successfully connected.

Fiddler traces from the Microsoft Lync Connectivity Analyzer showed the connection was established using TLS 1.0, and resulted in an error stating “Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host”

Version: 3.1 (TLS/1.0)
Random:
"Time": 10/25/2084 8:40:55 AM
SessionID: empty
Extensions:
server_name skypewebext.company.com
elliptic_curves secp256r1 [0x17], secp384r1 [0x18]
ec_point_formats uncompressed [0x0]
SessionTicket empty
extended_master_secret empty
renegotiation_info 00

So it appears that Web Service calls from the Skype4B/Lync client, as well as those coming from Exchange Online, are hard-coded to use TLS 1.0.
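The `Version: 3.1` line is the key detail: in a ClientHello, the version bytes 3.1 mean TLS 1.0, which a "TLS 1.1 and higher" policy rejects. The minimal Python sketch below maps those bytes to protocol names and checks them against such a policy. The helper names are hypothetical, and the parser assumes Fiddler’s `Version: <major>.<minor>` text format.

```python
import re
from typing import Optional, Tuple

# (major, minor) version bytes as shown for a ClientHello.
# TLS 1.3 clients also send 3.3 here and announce 1.3 in the supported_versions
# extension, so (3, 3) alone only tells you "TLS 1.2 or later".
PROTOCOL_NAMES = {
    (3, 0): "SSL 3.0",
    (3, 1): "TLS 1.0",
    (3, 2): "TLS 1.1",
    (3, 3): "TLS 1.2",
}

_VERSION_RE = re.compile(r"Version:\s*(\d+)\.(\d+)")


def clienthello_version(fiddler_text: str) -> Optional[Tuple[int, int]]:
    """Extract the (major, minor) pair from Fiddler's 'Version: 3.1 (TLS/1.0)' line."""
    m = _VERSION_RE.search(fiddler_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def meets_floor(version: Tuple[int, int], floor: Tuple[int, int] = (3, 2)) -> bool:
    """True if the highest version the client offers satisfies a TLS 1.1+ floor."""
    return version >= floor
```

Fed the trace above, `clienthello_version` returns `(3, 1)`, which maps to TLS 1.0 and fails the TLS 1.1 floor, so the Reverse Proxy has nothing it is allowed to negotiate with that client.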

Resolution:

Re-enabling TLS 1.0 on the Reverse Proxy resolved all these issues.

Scheduling Skype Meetings became available in Exchange Online
Remote Connectivity Analyzer Autodiscover tests ran successfully

I followed up with Microsoft on my findings, and they indicated that disabling TLS 1.0 on Skype4B servers wasn’t supported. They also said a KB article would be released in the future saying disabling TLS 1.0 wasn’t supported, but didn’t have a time frame.

↧

Azure Voice Mail…Explained (Kind of)

January 23, 2017, 12:22 pm

I’m not sure about the rest of the world, but Azure Voicemail (AVM), aka Cloud PBX Voicemail, has had me utterly confused. I want to take a minute and explain what I’ve discovered, deduced, and can divulge from conversations I’ve had with Microsoft.

Let’s be clear, I think AVM is a good thing. AVM brings all things SfB under one product group, which makes sense. It eliminates the Exchange group from the conversation and streamlines the process. I’m sure it makes development easier, regression testing easier, and support easier when it all falls under one umbrella. I know my job is much easier when resources and decision makers all align in one practice.

What I don’t understand, though, is why it came to market without any documentation, without feature parity, and with options that frankly don’t work in certain scenarios.

So what is AVM? Well, plain and simple, it’s a voicemail solution for SfB Cloud PBX with PSTN Calling or SfB Cloud PBX with OnPrem-PSTN Calling, that deposits voicemail messages into an Exchange Online or OnPrem mailbox.

Why do I find AVM confusing?

It still requires ExUM…really?…yeah kind of!
It doesn’t require an ExUM Policy
But…if an ExUM Policy is applied, which can happen in certain scenarios, none of the settings actually apply to the user
There’s no configuration available for AVM at the moment
There’s no documentation to help administrators make the right decision, or explain WTF is going on

There are 3 scenarios I want to look at:

1. A new user is enabled in O365, SfB Online and PSTN Calling

2. A user is migrated from SfB OnPrem and ExUM OnPrem, to SfB Online and AVM

3. A confused admin enables ExUM for a SfB Online user

Scenario 1: A new user enabled in O365, SfB Online and PSTN Calling

A new user (Skype4bT5) is licensed with an E5 and PSTN Domestic Calling plan. A new mailbox is auto-provisioned in Exchange Online and a new SfB account is created in SfB Online.

If we look in Exchange Online, Skype4bT5’s mailbox is auto-provisioned, and Unified Messaging is Disabled. Pretty straightforward.

Next we sign into the SfBO portal and assign Skype4bT5 a phone number.

After the user is enabled for PSTN Calling and assigned a number, we can go back and have a look at Exchange Online to see if UM is enabled. Nope, still disabled. Checked via console and PowerShell just to be sure.

Next we place a call to Skype4bT5 and leave a message.

The message is delivered to the mailbox, and Skype4bT5 is able to see it and listen to it. Keep in mind that at no point did we enable ExUM for this user.

This is where things get interesting.

If we go back to Exchange, we can see that Unified Messaging is now Enabled, without any administrator intervention, mind you. There seems to be an automated process (Magic) behind the scenes that enables UM when a VM is delivered, if UM is not already enabled. You’ll also notice there is no extension and no UM MailboxPolicy.

When we check in the EAC, the user is enabled, but when we click “View Details”, we actually get an error message. This is apparently the expected result…a feature, I suppose you could say.

Why does it enable ExUM? Well, it would seem that ExUM gets enabled for a single purpose. From what I’m told by Microsoft, this is so the client-side APIs light up visual voicemail in the Outlook client.

Screenshots: ExUM Disabled (MP3 Attached) | ExUM Enabled (Visual Voicemail Available)

The user now gets visual voicemail in the Outlook client, and voicemail appears to be functioning properly in the SfB client. However, the user gets no control over things like Call Answering Rules, Greetings, Notifications, Outlook Voice Access, Play on Phone, Reset PIN, and Voice Mail Preview. These settings would traditionally be on the Settings page in OWA, but here the options don’t exist.

Which means there are no additional features currently available in AVM other than receiving voicemail to your mailbox. Transcription is currently available in Preview, but is not enabled by default.

Scenario 2: User migrated from SfB OnPrem and ExUM OnPrem, to SfB Online and AVM

In this scenario, we are making a few assumptions. First, let’s assume that we migrated our Exchange mailboxes from Exchange OnPrem to Exchange Online (ExO) sometime in the past. This means we have set up an ExUM Dial Plan and ExUM Mailbox Policy in ExO and assigned an ExUM policy to every OnPrem SfB Enterprise Voice user.

Just so everyone follows, Exchange UM Online is setup as the voicemail solution for SfB OnPrem Enterprise Voice users. Follow me here?

Now, at some point in the future (let’s just assume we operated in this hybrid configuration for some time), SfB is migrated to SfB Online, and the users’ numbers are ported to Microsoft.

Now, both SfB and Exchange are enabled and homed purely Online in O365.

In this scenario, ExUM would already have been enabled for each user, a UM Mailbox Policy would already have been set, and an Extension would already have been configured, as seen below.

Since ExUM was already enabled and an ExUM Policy is set for the user in ExO, additional options are lit up on the Settings page in OWA:

That’s great, right? It would appear that we now have the full range of ExUM features available for the SfBO end user! Well, not exactly. While the features appear to be available to configure, this unfortunately is just a side effect of having an ExUM Policy set for the user. The user can go in and configure these settings to their heart’s content, which gives the appearance that these options are set and should work. Unfortunately, most of them don’t actually do anything in this scenario…such as Call Answering Rules, Greetings, Notifications, and Voice Mail Preview options.

This leaves the user with a less than ideal experience. They *think* they’ve configured a bunch of settings, and will expect them to work, but in reality they don’t.

This can be especially confusing/frustrating if the user was using these features prior to the SfB Online migration, since the settings may already have been configured and simply stop working.

What’s still unclear to me is what the admin should do at this point. Now that users are enabled for SfBO, PSTN Calling, and have an ExUM Mailbox Policy assigned, how do you clear the UM Mailbox Policy and reset the mailbox to get the expected AVM experience?

If the admin goes back and disables Exchange UM for the user, AVM does not do its behind the scenes magic and re-enable UM without a UM Mailbox Policy. ExUM simply remains disabled. Meaning AVM is still enabled and the user receives new voicemails, but Visual Voicemail is not available in Outlook.

So it would appear that the users are stuck with ExUM enabled with the pre-existing ExUM Mailbox Policy. End-user communication becomes critical to ensure users are aware that these settings no longer function.

Next time, I will try to disable ExUM on the users prior to the SfB migration. This will hopefully allow AVM to auto-provision ExUM behind the scenes, enabling Visual Voicemail, all while not exposing the additional ExUM settings in the OWA settings page.

Scenario 3: A confused admin enables ExUM for SfB Online user

I’ll admit, this scenario is based on my experiences prior to working through this.

In this scenario, the administrator doesn’t have any guidelines, documentation, or best practices to follow (there aren’t any published), and he/she goes in and enables all SfB Online PSTN Calling Users for ExUM. Every user gets an ExUM Mailbox Policy, and an EUM extension.

The net results are exactly the same as Scenario 2. The user is enabled for ExUM and AVM continues to deliver the voicemail to the user’s mailbox as expected; unfortunately, the user is assigned an ExUM Voicemail Policy, which means Voice Mail settings are again exposed in the OWA settings, and they still don’t function.
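To keep the three scenarios straight, the observed results can be written down as a small decision table. This Python sketch encodes only what this post observed, not Microsoft’s logic: voicemail is always delivered, an enabled ExUM lights up visual voicemail (otherwise the message arrives with an MP3 attached), and an assigned ExUM Mailbox Policy exposes OWA voicemail settings that do not take effect. `observed_avm_experience` and `AvmExperience` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvmExperience:
    voicemail_delivered: bool
    visual_voicemail: bool       # False: message arrives with an MP3 attached
    owa_vm_settings_shown: bool
    owa_vm_settings_work: bool


def observed_avm_experience(um_enabled: bool, um_policy_assigned: bool) -> AvmExperience:
    """Summarize what this post observed for an AVM user's mailbox state."""
    if um_policy_assigned and not um_enabled:
        raise ValueError("state not covered: UM Mailbox Policy without UM enabled")
    return AvmExperience(
        voicemail_delivered=True,            # AVM delivered in every scenario tested
        visual_voicemail=um_enabled,         # ExUM enabled lights up visual voicemail
        owa_vm_settings_shown=um_policy_assigned,
        owa_vm_settings_work=False,          # exposed settings did not take effect
    )
```

In these terms, Scenario 1 after the first voicemail is `(True, False)`, Scenarios 2 and 3 are `(True, True)`, and Scenario 2 after the admin disables ExUM is `(False, False)`.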

Summary:

In summary, I hope this is only a temporary issue. The Skype for Business product group is continuing to release features and add new things in Preview. I’m sure it won’t be long until features Online surpass those OnPrem. In the meantime, I hope this helps clear up a little bit of the confusion with AVM and ExUM…if not make the waters just a bit murkier.

↧

Adding a "Cloud Only" SIP Domain in a Skype for Business Hybrid World

March 21, 2017, 3:42 pm

You know the old saying, when it rains it pours. Well, I had 4 different inquiries last week alone on this topic. I thought I knew how it would work, but figured I needed proof. So naturally, I took to my lab.

Setup:

Contoso has Skype for Business deployed in a hybrid (split-domain) configuration. 80% of the users are homed On-Prem as they use Enterprise Voice. 20% do not use Enterprise Voice, and these users are homed online. Contoso uses the SIP namespace contoso.com for all users, both On-Prem and Online.

A very common scenario in today’s hybrid world.

Scenario:

A need then arises to introduce a new SIP namespace (Fabrikam.com). The desire is to home these users 100% Online.

The natural thought process would be to simply add the validated domain to the O365 tenant, and enable the new users for Skype for Business Online using the new SIP namespace. Meaning, create the new users in AD with a UPN matching user@fabrikam.com, let AADC sync the user to O365, then license the user for Skype for Business.

This would also include creating all the necessary DNS records and pointing them to O365. The rationale here is that this is a SfB Online ONLY SIP namespace. No On-Prem users will use this namespace. So, let’s set it up as such.

Issue:

Once this has been setup, we see our first issue. Users created in SfB Online using the new SIP domain (Fabrikam.com) cannot see presence or IM with users On-Prem (Contoso.com).

Test Users:

User	Location	SIP Domain	SIP Address
Alice Wonderland	On-Prem	Contoso.com	awonderland@contoso.com
Peter Parker	Online	Contoso.com	pparker@contoso.com
Jack Bauer	Online	Fabrikam.com	jbauer@fabrikam.com
Jeremy Silber	Federated User	Hidden to protect the innocent	Hidden to protect the innocent

Symptoms: (Screenshots Below per Scenario)

On-Prem to Online:
- Alice Wonderland (Contoso.com – Homed On-Prem) can see presence and initiate IMs with Jack Bauer (Fabrikam.com – Homed Online)
- Jack Bauer (Fabrikam.com – Homed Online) can receive IM’s from Alice Wonderland (Contoso.com – Homed On-Prem) and reply to IM’s, but presence for Alice is not available.
Online to OnPrem:
- Jack Bauer (Fabrikam.com – Homed Online) can sign-in to SfB using the new SIP namespace
- Jack Bauer (Fabrikam.com – Homed Online) can see presence and initiate IM’s with users homed online within the same tenant (both SIP domains) and vice-versa.
- Jack Bauer (Fabrikam.com – Homed Online) can see presence and initiate IM’s with federated domains and vice-versa
- Jack Bauer (Fabrikam.com – Homed Online), CANNOT see presence or initiate IM’s with Alice Wonderland (Contoso.com – Homed On-Prem)
Online to Online:
- Peter Parker (Contoso.com – Homed Online) can see presence and IM with everyone.

Troubleshooting:

From within Snooper, we can see a “504 Server Time-out” error, when Jack Bauer tries to initiate an IM with Alice Wonderland.

Naturally, my first troubleshooting step is to Google the error: “Cannot route From and To domains in this combination”;cause=”Possible server configuration issue”;summary=”The domain of the message that corresponds to local deployment (internal) is not shared with remote peer.” This doesn’t return anything of value, hence this article.

In previous conversations with Microsoft, they have stated something to the effect of all Online SIP Namespaces must also be valid On-Prem SIP Domains. Meaning both Contoso.com and Fabrikam.com should be added as valid SIP domains in the Topology Builder of my On-Prem SfB Deployment. While the error message is kind of vague and cryptic, it sounds plausible that this could be the issue.

Testing:

To test this theory, I figured that I would need to add Fabrikam.com as an “Additional supported SIP Domain” within the topology builder.

I wanted to test each stage individually to see when exactly this would start working.

1. Add Fabrikam.com as a supported SIP domain and publish the topology: No change (Before/After screenshots)
2. Update the internal Front End certificates: No change
3. Restart the Front End service and Access Edge service after successful replication: No change
4. Update the Access Edge certificate: Success

Updating the Access Edge Server certificate to include “Sip.fabrikam.com” is the step that made this start working.

This made me think, do I really need Fabrikam.com added as a valid SIP domain in the On-Prem topology, or does it only want the certificate? So, I removed fabrikam.com from the topology builder, but left the certificate in place on the Edge server. What do you know, it still worked! Presence was still available and IM continued to work.

To double-check, I re-applied the old certificate without the sip.fabrikam.com SAN entry. Again, IM/P broke. I then re-applied the new cert with the SAN entry sip.fabrikam.com and voila, IM/P started working again.

Updating the Access Edge certificate to include a SAN entry for the new SIP namespace (sip.fabrikam.com) works without having to update the entire Lync/SfB Topology. That said, what works…isn’t always supported.
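Since the Access Edge certificate turned out to be the deciding factor, it is worth checking its SAN list against every SIP domain before and after a change. The minimal Python sketch below assumes the `sip.<domain>` naming convention used in this post and takes the SAN list as plain strings (for example, copied from the certificate details). `missing_access_edge_sans` is a hypothetical helper; it does exact, case-insensitive matching and makes no attempt to interpret wildcard SANs.

```python
from typing import Iterable, List


def missing_access_edge_sans(sip_domains: Iterable[str], cert_sans: Iterable[str]) -> List[str]:
    """List the sip.<domain> names that the Access Edge certificate does not cover.

    Exact, case-insensitive matching only; wildcard SANs are not treated as a match.
    """
    present = {san.strip().lower().rstrip(".") for san in cert_sans}
    wanted = {f"sip.{d.strip().lower().rstrip('.')}" for d in sip_domains}
    return sorted(wanted - present)
```

With the lab’s domains, a certificate carrying only `sip.contoso.com` reports `sip.fabrikam.com` as missing, which matches the point at which IM/P broke.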

DNS CNAME and SRV records for fabrikam.com point to SfB Online. Which is exactly what I wanted.

Resolution:

From a support perspective, Microsoft states (although I’m still looking for an official statement) that any SIP domain used in SfB Online must also be a valid SIP domain in the Lync/SfB On-Prem topology. This makes sense. If you update the topology builder to include the new SIP namespace, and re-run the certificate wizard on the Edge servers as you’re supposed to, the new SIP namespace will automatically be included as a new SAN entry. As this is an expected outcome of adding the new SIP namespace to the topology, this is what Microsoft tested, and therefore supports.

While it’s always my recommendation to stay in a supported scenario, it does seem plausible to just update the Edge server certificates with the new SIP SAN entry, without updating the entire topology. I’d also bet that updating the certificate really is the only step necessary. Of course, I will admit that I did not test all functionality. Only IM/P in this scenario with a single Edge server. Further testing may prove other workloads don’t work as expected.

UPDATE 4/5/2017 – In reading through the new SOF material, specifically “3 – Design-Cloud PBX, PSTN Conferencing and Client – Design and Migration Document”, I came across the Hybrid Deployment Prerequisite section, Table 38.

Table 38 – Hybrid Deployment prerequisites

Question	Answer	Comments
SIP domain(s) in the on-premises Lync Server or Skype for Business Server deployment		Verify the list of SIP domain(s) matches the list of Office 365 tenant’s validated domain(s) for Skype for Business Online. If not, plan and document the effort to ensure SIP domain(s) between on-premises and the cloud are in-sync as there are impacts to certificates and DNS records.
Office 365 tenant’s validated domain(s) enabled/planned for Skype for Business Online		Verify the list of Office 365 tenant’s validated domain(s) for Skype for Business Online matches the list of on-premises SIP domain(s). If not, plan and document the effort to ensure Office 365 tenant’s validated domain(s) for Skype for Business Online are in-sync with on-premises SIP domain(s) as there are impacts to external DNS configuration to verify domain ownership.
Details of on-premises Edge server’s Access Edge certificate (issuer, subject name, subject alternative name(s), etc.)		Verify the certificate is issued by a public Certificate Authority (CA), or a CA listed in the list of Unified Communications certificate partners. Verify the list of subject alternative name(s) contains access edge FQDN(s) for all SIP domain(s) intended for the hybrid relationship. If not, plan and document the effort to reissue the certificate to include subject alternative name(s) to support all SIP domain(s) in the hybrid relationship.
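The first two rows of Table 38 boil down to a set comparison between the on-premises SIP domains and the tenant’s validated Skype for Business Online domains. Here is a minimal Python sketch, with `sip_domain_drift` as a hypothetical helper and both lists supplied by hand (for example, from Topology Builder and the O365 admin portal):

```python
from typing import Dict, Iterable, List


def sip_domain_drift(on_prem: Iterable[str], online_validated: Iterable[str]) -> Dict[str, List[str]]:
    """Compare on-prem SIP domains with the tenant's validated SfB Online domains."""
    a = {d.strip().lower() for d in on_prem}
    b = {d.strip().lower() for d in online_validated}
    return {
        "missing_on_prem": sorted(b - a),  # validated in the tenant, absent from the topology
        "missing_online": sorted(a - b),   # in the topology, not validated in O365
    }
```

For this post’s lab, passing `["contoso.com"]` and `["contoso.com", "fabrikam.com"]` reports `fabrikam.com` as missing on-prem, which is exactly the gap the Access Edge certificate workaround papers over.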

While not entirely the same use case, there is also a reference to this in the “Plan for Skype for Business Cloud Connector Edition” documentation https://technet.microsoft.com/en-us/library/mt605227.aspx#BKMK_Certs. “You will need to add sip.sipdomain.com for every SIP domain and the name of the access Edge pools per domain”.

Further, in discussing this with a colleague at Microsoft, it would seem that there is logic in CCE that requires a SAN entry for “sip.sipdomain.com” on the Access Edge Pool certificate for every SIP domain in the environment. This is used as part of the authentication/validation check that permits communication from SfB Online to CCE. It would seem this same logic is used for authentication/validation in a Hybrid deployment as we can see here.

Next…In certain scenarios, unsupported ones of course, I can see where adding the SIP namespace On-Prem would not be possible. Take the scenario of an acquisition: 2 separate Active Directory forests, with the assumption they will stay separate for some time. We can’t share the SIP namespace between 2 On-Prem Lync/SfB deployments. Plus, we want to start moving workloads to the cloud. Well, Azure AD Connect can be set up to sync both forests to the same O365 tenant (Supported). Exchange Hybrid can be configured in both forests to sync to the same O365 tenant (Supported). Why wouldn’t we want Lync/SfB set up in hybrid with the same O365 tenant as well (unsupported)? I know dual-Hybrid in Lync/SfB is not supported, but now I’m curious if we can make it work. Can we use the same workaround used here to trick SfB Online into working in this unsupported configuration? We’re out of time for today, though, so that will have to wait for another post. Stay Tuned!

↧
