Technical and Non-Technical lessons learned from SDN implementation.

An SDN implementation from any vendor is a big undertaking. In traditional networks, the network, compute, storage and security teams can carry out their lifecycle activities in a fair amount of isolation from each other. An SDN implementation requires all teams, especially the network, compute, application and security teams, to work more closely with each other than ever before.


We’ll first cover the differences between traditional application deployment and SDN-based application deployment. Then we’ll go over the lessons learned based on that insight.

In the past, the network team would provision some interfaces, the storage team would provision some LUNs, the firewall team would configure some flows, and then the compute/application team would build their virtual and physical infrastructure. With SDN, multiple teams merge into one or two larger teams. The virtualization and storage teams provision the infrastructure once for all the hypervisors, and are then mostly out of the picture. The network/security/compute team (now effectively one team) works directly with the application owners to provision virtual networks with (preferably) zero-trust security, and provisions VMs (or even physical servers/appliances) based on application requirements.

In smaller organizations, the network, security and compute roles could be merged into fewer roles. In larger organizations, individuals may continue to specialize and control their specific areas of responsibility, but they need to learn about each other’s domains and work much more closely than before.

Without SDN:

Previously, the network specialist could configure VLAN 220 for all the AD servers and not know anything about AD beyond that. The compute specialist did not have to know anything about VLAN 220 as long as the AD server could reach all the other servers. The security specialist would configure flows between IP x/port x and IP y/port y, and not have to know anything about AD or VLANs.

With SDN:

With SDN, an application owner, for example the one administering the AD environment, needs to sit down with the network, security, and compute specialists and tell them in more detail how AD functions and what its requirements are.

Since all servers and workstations need to communicate with AD for authentication and for acquiring GPO/policy, the firewall specialist may create a global policy allowing appropriate flows to just the required TCP/UDP ports from all subnets within the organization. There will naturally be redundant AD servers, so they would be made part of one group, and all the rest of the servers/workstations would be made part of another logical group. The flows would then be enabled between the groups, where group membership is dynamic and can be based on IP address, VM name, location, or almost any other factor. The security team would reuse these groups, possibly nesting them inside other groups, or nesting other groups inside the AD-related groups, a practice which ultimately helps the firewall functionality scale better.
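To make the group-based approach concrete, here is a minimal Python sketch of dynamic groups and a default-deny rule check. It is not NSX's API or data model; the class names (`Workload`, `Group`, `Rule`), the `is_allowed` helper, the `ad-` naming convention, the 10.0.0.0/8 range and the sample workloads are all assumptions made for illustration. The ports used are the well-known Kerberos (88) and LDAP (389) ports, standing in for whatever list the AD owner actually supplies.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Callable, List


@dataclass(frozen=True)
class Workload:
    """A VM or physical server (hypothetical model)."""
    name: str
    ip: str
    location: str


@dataclass
class Group:
    """A logical group whose membership is evaluated on demand."""
    name: str
    criteria: List[Callable[[Workload], bool]] = field(default_factory=list)
    nested: List["Group"] = field(default_factory=list)

    def contains(self, workload: Workload) -> bool:
        # A workload is a member if any criterion matches,
        # or if any nested group contains it.
        return (any(match(workload) for match in self.criteria)
                or any(group.contains(workload) for group in self.nested))


@dataclass
class Rule:
    """Allow traffic from one group to another on a protocol/port."""
    source: Group
    destination: Group
    protocol: str
    port: int


def is_allowed(rules: List[Rule], src: Workload, dst: Workload,
               protocol: str, port: int) -> bool:
    # Default deny: traffic passes only if some rule explicitly allows it.
    return any(rule.source.contains(src)
               and rule.destination.contains(dst)
               and rule.protocol == protocol
               and rule.port == port
               for rule in rules)


# Membership by VM name (assumed naming convention "ad-*").
ad_servers = Group("ad-servers",
                   criteria=[lambda w: w.name.startswith("ad-")])
# Membership by IP address (assumed internal range).
internal_clients = Group(
    "internal-clients",
    criteria=[lambda w: ip_address(w.ip) in ip_network("10.0.0.0/8")])
# Nesting: a broader group that reuses the AD group.
core_infra = Group("core-infrastructure", nested=[ad_servers])

rules = [
    Rule(internal_clients, ad_servers, "tcp", 88),   # Kerberos
    Rule(internal_clients, ad_servers, "tcp", 389),  # LDAP
]

ad01 = Workload("ad-01", "10.1.0.10", "dc1")
ws42 = Workload("ws-42", "10.2.3.4", "branch-office")
guest = Workload("guest-laptop", "192.168.1.5", "guest-wifi")
```

With these definitions, `is_allowed(rules, ws42, ad01, "tcp", 389)` is true, while the same request from `guest`, or from `ws42` on an unlisted port such as 22, is denied. Adding a third AD server named `ad-03` needs no rule change, because membership is computed from the name rather than from a fixed IP list.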

The network specialist would make sure that the AD servers’ group is located in the DC, subnet and pod that overall provide the least latency and appropriate throughput to all of AD’s clients. In addition to the appropriate VLAN/Layer 2 segment configuration, and appropriate routing configuration to ensure AD is accessible by all enterprise clients, it may even be worthwhile to prioritize AD traffic via QoS policies (that translate between SDN virtual routers and switches and the physical routers and switches). For some other types of applications (e.g. email, web servers), other networking functions such as load balancing, IDS/IPS and VPNs may also be implemented.

The compute specialist would make sure that the VMs serving the AD application are appropriately sized, have the required levels of backup, have appropriate high-availability and disaster recovery options available, affinity rules where required, and have appropriate priority above other VMs in terms of CPU and memory.

A similar exercise needs to happen for every application! Depending on the size of the data center, there could be hundreds or even thousands of applications hosted in just one data center.

So where do you begin?

Since the above-mentioned process is a daunting task, most SDN solutions initially focus on migrating the infrastructure into the SDN-based solution mostly intact, mimicking the existing routing, switching, firewall, load balancing and other components of the network. Then, either existing applications are gradually evaluated using the above process, or only new applications are implemented using it, with old ones eventually being sunset. Migrating in this manner is still a lot of work, but it’s a little more manageable than having to go through 1000+ applications, understand exactly how each one works, and implement zero-trust policies from the get-go.

In traditional networks, each specialist may have their own monitoring tools, which they may continue to use. However, the advent of SDN brings a new class of monitoring tools, often built into the SDN solution, that monitor the health of the whole infrastructure: network, compute, firewalls and storage.

This finally brings us to the topic of this post. Having been through the early stages of deploying an SDN solution, in my case with VMware NSX in particular, I can see that the larger the data center, the more complex the migration will be. Based on the above considerations, and other lessons learned by my team and the overall project team, here are some recommendations:

Non-technical:

ensure that you have buy-in from all top-level management/executives
complete a detailed Total Cost of Ownership (TCO) exercise for the SDN solution
ensure that the SDN technical leads have appropriate training to implement the SDN solution (see my previous post)
ensure that you have all technical teams on board from the early stages of the project and have buy-in from all teams. This is absolutely essential, and may simply involve educating all project team members about the benefits of the chosen SDN solution. This should include team members from, but not limited to:
- Networking
- Security
- VMware
- Storage (Array)
- Storage (SAN)
- Project Management Office (PMO)
all project members must have a clear picture (through meetings, documentation, project charter, etc.) of the goals and objectives of the project – something that is signed off by the appropriate project sponsor. This can be facilitated over one or more webinars.
ensure that every technical team makes at least one resource available for the duration of the project with backups identified
since many project members may be working with each other for the first time (e.g. network and application teams), ensure that the project is started far enough in advance that the project team has time to get to know each other, and that such teamwork is deliberately facilitated by the PMs
ensure that the project manager delegated by the sponsoring executive has authority over the technical resources for the scope of the project
have buffer funding and buffered timelines to deal with contingencies (a good PM will take care of this by default)
plan, plan and plan some more

Technical:

migrate to SDN mimicking existing policies (routing, switching, firewall, load balancers, etc.) and then migrate to zero-trust based policies and SDN design later
ensure that the end-state SDN design is completed well before the implementation (and possibly vetted by the vendor's Sales Engineer)
set up at least five different hour-long webinars (in the early stages of the project) in which the respective technical leads give the whole project team an overview of each technical domain (network, storage, security, compute, applications), possibly even including executive(s) and sponsor(s)
set up at least one or two different hour-long webinars in which the SDN technical lead provides a clear picture of the planned implementation, again possibly including executive(s) and sponsor(s)

Although my current project is exceptionally well managed, I am hoping the above pointers will save teams working in larger organizations some headache 🙂