With the General Availability of VMware NSX-T version 3.0, one of the most anticipated features has been released: “Federation”. NSX-T Federation enables customers to stretch their NSX-T deployments over multiple sites and/or towards the public cloud, while still keeping a single pane of management. NSX for vSphere (NSX-v) offered a similar feature, called the Cross-vCenter deployment, which allowed you to create universal objects that were then distributed to all connected NSX Managers. The Cross-vCenter feature of NSX-v had some limitations, and VMware has learned from those shortcomings and overcome them with NSX-T Federation: you could say that NSX-T Federation is the next evolution of multi-location SDN for VMware.
NSX-T Federation Use Cases
The main goal of NSX-T Federation is to enable multi-site NSX-T deployments, where the management, control and data planes are distributed over multiple locations (connected sites) and/or the public cloud (with NSX Cloud). NSX Cloud integrates the native networking constructs of the public cloud (Amazon Web Services (AWS) and Microsoft Azure) into the NSX management plane. VMConAWS can be seen as a normal NSX-T connected site.
Simplified Operational management: Uniform Configuration & Security Enforcement
The management plane is distributed over all sites, but still managed as a single entity. Objects are created “globally” and automatically become available at each connected site, which takes away the burden of manually replicating the objects to each site individually. You end up with a uniform environment throughout all sites. Having the objects available at each connected site lowers the operational impact significantly and improves security, as security policies are uniformly enforced at each connected site.
Simplified Disaster Recovery (DR)
NSX-T Federation enables customers to stretch their networks (logical segments and logical routers) over the connected sites. When a disaster occurs, the administrator (or a DR tool like VMware Site Recovery Manager) can seamlessly recover your virtual machines on the DR site, without having to deal with network modifications for the recovered virtual machines. Traditionally, networks were only operational in one site: when a virtual machine was recovered to a DR site, the network connection was not established automatically, and a manual or scripted network modification of the recovered VM(s) was needed to re-establish connectivity. This old architecture adds operational complexity in today's rapidly changing application landscapes. Removing the need to re-IP your virtual machines in a DR scenario greatly improves recovery time objectives (RTO) and lowers the operational burden.
Network services, which are provided by logical routers, can now also be distributed over multiple locations. A logical router can be stretched over multiple locations, with primary and secondary instances. The primary instance is responsible for the uplink (ramp-on/ramp-off) connectivity towards the physical network; the secondary instance connects to the primary instance for its uplink connectivity. In case of a site outage of the primary instance, after a Global Manager failover (which I will discuss in the next section), the secondary logical router instance is promoted to primary and can then handle the uplink connectivity for the secondary site. This solution will be discussed in more depth in the section Spanned Logical Routers.
NSX-T Federation Topology
With NSX-T Federation a new concept of a Global Manager (GM) is introduced, which enables a Single Pane of Glass for all connected NSX-T enabled locations. Global objects from the Global Manager are pushed to the site-local NSX Managers.
The GM component itself consists of two clusters placed in an active/standby configuration, ideally distributed over 2 sites for availability purposes: when a site outage of the active cluster occurs, the standby site (including the standby cluster) takes over the active role. The configuration is synchronized between the active and standby clusters to prevent data loss. Each (node-majority) cluster consists of 3 management appliances, which brings the GM component to a total of 6 NSX Manager appliances.
The GM is connected to all Local Managers (LMs) and/or public cloud NSX Managers, which can still be managed independently. The GM pushes global objects to the LMs, where they are placed alongside the site-local objects. So an LM can serve both its site-local and global objects. The global objects are shared between connected LMs for uniform configuration and security enforcement. The GM only pushes global objects to the LMs where the objects are relevant; if a global object is not relevant for an LM, the configuration is not pushed to it. This prevents unnecessary resource consumption on the LMs.
From a management plane perspective the following rule of thumb applies:
6 + (number of sites x 3) = total number of NSX-T Managers
So, for example, when you want to federate 3 sites, you need (6+3*3) = 15 NSX-T Managers.
So be aware of the resource consumption that NSX-T Federation requires, as the number of NSX-T Managers needs to be multiplied by the large-sized NSX-T Manager system requirements (currently 12 vCPUs and 48 GB RAM).
For a 3-site NSX-T Federation topology, 180 vCPUs and 720 GB of RAM are required.
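The sizing rule of thumb above can be sketched as a small helper. This is purely illustrative arithmetic based on the numbers in this article (6 GM appliances plus 3 LM appliances per site, large appliance size of 12 vCPUs / 48 GB RAM); the function name and structure are my own, not anything from the NSX-T product.

```python
# Illustrative sizing helper; constants follow the large-sized
# NSX-T Manager appliance figures quoted in the text.
LARGE_MANAGER_VCPU = 12
LARGE_MANAGER_RAM_GB = 48

def federation_sizing(num_sites: int) -> dict:
    """Total manager count and aggregate resources: 6 GM nodes + 3 LM nodes per site."""
    managers = 6 + num_sites * 3
    return {
        "managers": managers,
        "vcpus": managers * LARGE_MANAGER_VCPU,
        "ram_gb": managers * LARGE_MANAGER_RAM_GB,
    }

# Example: a 3-site federation.
print(federation_sizing(3))  # {'managers': 15, 'vcpus': 180, 'ram_gb': 720}
```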
At this moment there is no limit on the number of sites that you can federate: the only limitation that applies is the number of hypervisor hosts. When building an NSX-T Federation topology, the hypervisor host limit (currently 1024 hosts) is shared between all federated NSX Managers: this means that an NSX-T Federation topology, in total, may not exceed 1024 hypervisor hosts at this time. The reason behind this shared limit is the GM component, which federates all hypervisor hosts: the management burden for all the hypervisor hosts is consolidated into the GM component. This “general” limitation applies to the GM component because it is a “normal” NSX Management cluster.
If you run into this limitation, please contact me: I think you have a different set of problems when your environment scales like this #gnifl.
Until now we have only talked about GM-to-LM communication, but with NSX-T Federation LM-to-LM communication is also established. In the NSX for vSphere Cross-vCenter era, there was no communication channel between the (secondary) NSX Managers, which introduced one of the shortcomings of that solution: when a VM was attached to an NSX Universal Security Group, the translation of that VM was not shared with all connected NSX Managers. This could result in blocked communication, as group membership translations weren't shared and the Distributed Firewall actively blocked that traffic because crucial pieces of IP information were missing. Your only option was to work with NSX IPSet objects, which were shared by all connected NSX Managers.
This limitation has been overcome with NSX-T Federation by introducing LM-to-LM communication. The GM pushes Security Group objects to the LMs, but the group membership and translated IP addresses are shared between the LMs. This allows customers to use security tagging for the membership of VMs in global security groups (something that wasn't available with NSX-v active/active Cross-vCenter configurations).
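Why the translation sharing matters can be shown with a minimal sketch. This is not the NSX API or its internal data model; it is a simplified assumption of the idea: each LM resolves tag-based membership to IPs for the VMs it manages, and the effective group at any site is the union of local and peer-shared translations.

```python
# Simplified model (not NSX internals) of LM-to-LM translation sharing.
def resolve_local(vms, tag):
    """IPs of locally-managed VMs carrying the given security tag."""
    return {vm["ip"] for vm in vms if tag in vm["tags"]}

def effective_group(local_vms, peer_translations, tag):
    """Union of local translations and those shared by other sites' LMs."""
    members = resolve_local(local_vms, tag)
    for peer_ips in peer_translations:
        members |= peer_ips
    return members

# Site A manages one tagged VM; site B's LM shares its own translation.
site_a_vms = [{"ip": "10.0.1.10", "tags": {"web"}}]
site_b_shared = [{"10.0.2.10"}]
print(sorted(effective_group(site_a_vms, site_b_shared, "web")))
# ['10.0.1.10', '10.0.2.10']
```

Without the peer-shared set, site A's firewall would be missing the site B IP, which is exactly the NSX-v Cross-vCenter shortcoming described above.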
NSX-T Global Security Group overview
NSX-T offers Security Groups: VMs can be made members of a security group, and these security groups can be used to create firewall rules. VMs can be added to a security group directly or dynamically through membership criteria (including security tagging and name-based membership).
This feature is available in all NSX solutions and versions, but with NSX-T Federation the concept of global and regional security groups is introduced. The difference is the span of a security group: a security group can now span multiple locations (LMs). The span of a security group can be selected on a per-object basis: you can have security groups available on all LMs, while other security groups are only available on selected LMs. This is interesting for infrastructure-based firewall rules (for example: Active Directory, DNS, NTP, SMTP, etc.), which must be globally available, versus application-based firewall rules, which only need to be available in a region (i.e. selected sites). Security Group objects can be created globally, regionally and locally, but they can all be used together in the same firewall rules: the DFW firewall rules can use a mix of the different types of Security Group objects. This allows you to re-use firewall rules, lowering the total number of firewall rules that must be managed.
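The span-based distribution described above can be illustrated with a short sketch. The field names and LM names here are hypothetical, not the NSX object model; the point is simply how a group's configured span determines which LMs receive it.

```python
# Illustrative span selection: a "global" group goes to every LM,
# a regional/local group only to the LMs listed in its span.
def lms_in_span(group, all_lms):
    if group["span"] == "global":
        return set(all_lms)
    return set(group["span"])  # regional/local: explicit list of LM names

all_lms = ["paris", "london", "nyc"]
dns_group = {"name": "infra-dns", "span": "global"}        # infrastructure rule
app_group = {"name": "app-hr", "span": ["paris", "london"]}  # regional rule

print(sorted(lms_in_span(dns_group, all_lms)))  # ['london', 'nyc', 'paris']
print(sorted(lms_in_span(app_group, all_lms)))  # ['london', 'paris']
```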
Spanned Network Topologies
As with security groups, other network objects (like logical segments and logical routers) can now have different spans (global, regional and local) too. NSX-T Federation allows you to have a mix of the different types of logical routers and segments. As said, enlarging the span of logical segments and logical routers improves disaster recovery scenarios: you can migrate virtual machines to another location without having to re-IP them.
The concept of stretching networks over multiple locations is not new, but the way NSX-T Federation addresses stretched networks is rather “new” (from a VMware perspective). Each logical segment is assigned a Virtual Network Identifier (VNI): an identification number of 5000 or higher. With NSX for vSphere, a universal ID pool was used to assign VNIs to stretched networks; with NSX-T, a network is assigned a separate VNI for each location where it exists. For example: if you have a stretched “HR” network that spans 3 locations, this network is assigned VNI 5001 in Location 1, VNI 6001 in Location 2 and VNI 7001 in Location 3. To establish communication between those segments, bridging logical segments are used, which are connected to logical routers. The diagram below shows the topology of this concept.
The bridging segments provide connectivity between the local segments, creating one spanned (stretched) logical segment. These bridging segments are automatically created by NSX-T: you only select the locations where the segment should be available and NSX-T takes care of the rest.
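The per-location VNI idea from the “HR” example can be captured in a tiny sketch. The VNI values follow the example in the text; the data structure and function are my own illustration, not how NSX-T stores segments.

```python
# One stretched segment, one VNI per location; bridging segments
# (auto-created by NSX-T) stitch the per-location segments together.
stretched_segment = {
    "name": "HR",
    "vni_by_location": {"Location1": 5001, "Location2": 6001, "Location3": 7001},
}

def local_vni(segment, location):
    """VNI used for this segment within a given location."""
    return segment["vni_by_location"][location]

print(local_vni(stretched_segment, "Location2"))  # 6001
```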
Spanned Logical Routers
With NSX-T Federation it is also possible to span your (Tier-0 and Tier-1) logical router instances over multiple locations (just like logical segments). Be aware that an NSX-T logical router consists of 2 components: a service router (SR) component and a distributed router (DR) component. The SR is responsible for stateful services (like VPN, firewalling, DHCP server, metadata proxy, etc.) and for providing uplink connectivity to the physical network. The DR is responsible for the routing between logical segments and is offloaded to the hypervisor hosts. With NSX-T Federation, both components span the locations.
NSX-T Federation supports both active/active and active/standby high availability modes for Tier-0 and Tier-1 routers. A (spanned) logical router is deployed on location-specific Edge Clusters, which means that a spanned logical router is deployed on an Edge Cluster in location 1 and on an Edge Cluster in location 2 (and so on). Availability modes are configured per Edge Cluster and are not distributed over the Edge Clusters: each Edge Cluster in each location has its own active/standby pair. A spanned logical router can therefore have multiple active/standby pairs: one pair per Edge Cluster (location).
Only the SR component is an exception, as it can only be active in one location (= Edge Cluster) at any given time, due to the nature of stateful services and the lack of flow table synchronization in NSX-T (which potentially could enable distributed stateful services .. okay, this is wishful thinking). Logical routers are deployed on primary and secondary Edge Clusters: the SR will only be active (or standby) on the primary Edge Cluster. The SRs on the secondary Edge Clusters are configured as “remote-active” and/or “remote-standby”: they accept incoming traffic but send it to the primary Edge Cluster over the routerlink (which is a spanned logical segment).
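The SR placement logic above can be summarized in a short sketch. This is a simplified model of the behavior described in this article (not NSX internals): the SR is active only on the primary Edge Cluster, and secondary clusters hand traffic to the primary over the spanned routerlink segment.

```python
# Simplified model of spanned-router SR placement and traffic hand-off.
def sr_role(cluster, primary):
    """SR on the primary cluster is active; elsewhere it is remote-active."""
    return "active" if cluster == primary else "remote-active"

def next_hop(cluster, primary):
    """Secondary SRs forward to the primary cluster for uplink egress."""
    return "physical-uplink" if cluster == primary else f"routerlink->{primary}"

clusters = ["edge-site1", "edge-site2"]
primary = "edge-site1"
print({c: sr_role(c, primary) for c in clusters})
# {'edge-site1': 'active', 'edge-site2': 'remote-active'}
print(next_hop("edge-site2", primary))  # routerlink->edge-site1
```

On a site failure of the primary, promoting the secondary cluster to primary (after a GM failover, as described earlier) simply changes which cluster the two functions treat as `primary`.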