
Urs Hölzle gave the keynote talk at the 2012 Open Networking Summit, focusing on Software Defined Networking in wide area networking. Urs leads the Technical Infrastructure group at Google, where he is Senior VP and Technical Fellow. Software defined networking (SDN) is the central management of network routing decisions, rather than depending upon distributed routing algorithms running semi-autonomously on each router. Essentially, what is playing out in the networking world is a replay of what we have seen in the server world across many dimensions. The dimension central to the SDN discussion is this: a datacenter full of 10k to 50k servers is not managed individually by an administrator, and the nodes making up the networking fabric shouldn't be either.

The key observations behind SDN are: 1) if the entire system is under single administrative control, central routing control is possible; 2) at the scale of a single administrative domain, central control of routing decisions is practical; and 3) central routing control brings many advantages, including faster convergence on failure, priority-based routing decisions when resource constrained, and application-aware routing, and it enables the same software system that manages application deployment to manage network configuration.
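
To make the third observation concrete, here's a minimal sketch of what central route control means: a controller holds the full network topology and computes every switch's forwarding table from that one global view. The topology, names, and functions below are my hypothetical illustration, not Google's implementation:

```python
import heapq

# Hypothetical global view held by the controller: directed link -> cost.
TOPOLOGY = {
    ("a", "b"): 1, ("b", "a"): 1,
    ("b", "c"): 1, ("c", "b"): 1,
    ("a", "c"): 5, ("c", "a"): 5,
}

def first_hops(topology, source):
    """Dijkstra over the global view, returning destination -> next hop."""
    adj = {}
    for (u, v), cost in topology.items():
        adj.setdefault(u, []).append((v, cost))
    dist = {source: 0}
    hops = {}
    heap = [(0, source, None)]  # (distance, node, first hop from source)
    while heap:
        d, node, hop = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if hop is not None:
            hops[node] = hop
        for nbr, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr, hop if hop else nbr))
    return hops

# One consistent view drives every switch's forwarding table, rather than
# each box running its own semi-autonomous convergence.
for switch in ("a", "b", "c"):
    print(switch, "->", first_hops(TOPOLOGY, switch))
```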

In his talk, Hölzle motivated SDN by first talking about WAN economics:

· Cost per bit/sec delivered should go down with scale rather than up (consider analogy in compute and storage)

· However, cost/bit doesn’t naturally decrease with size due to:

o Quadratic complexity in pairwise interactions

o Manual management and configuration of individual elements

o Complexity of automation due to non-standard vendor configuration APIs

· Solution: Manage the WAN as a fabric rather than as a collection of individual boxes

· Current equipment and protocols don’t support this:

o Internet protocols are box-centric rather than fabric-centric

o Little support for monitoring and operations

o Optimized for “eventual consistency” in networking

o Little baseline support for low-latency routing and fast failover

· Advantages of central traffic engineering:

o Better network utilization with a global view (see the placement sketch after this list)

o Converges faster to target optimum on failure

o Allows more control and lets applications specify intent:

§ Deterministic behavior simplifies planning vs overprovisioning for worst case variability

o Can mirror production event streams for testing to support faster innovation and robust software development

o Controller uses modern server hardware (50x better performance)
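
As a toy illustration of the global-view advantage (my sketch, not Google's traffic engineering algorithm): a central controller can place flows on paths in priority order against a single picture of residual link capacity, which no individual router can do on its own. The flow names, demands, and paths below are hypothetical:

```python
# Toy centralized TE: place flows in priority order against a global
# view of residual link capacity. Real TE solves a global optimization;
# this greedy pass just shows what a single consistent view enables.
CAPACITY = {("a", "b"): 10.0, ("b", "c"): 10.0, ("a", "c"): 10.0}

FLOWS = [  # (name, priority [lower = more important], demand Gbps, candidate paths)
    ("user-facing", 0, 8.0, [[("a", "b"), ("b", "c")], [("a", "c")]]),
    ("batch-copy", 1, 12.0, [[("a", "c")], [("a", "b"), ("b", "c")]]),
]

def place(flows, capacity):
    residual = dict(capacity)
    placement = {}
    for name, _prio, demand, paths in sorted(flows, key=lambda f: f[1]):
        for path in paths:
            headroom = min(residual[link] for link in path)
            if headroom <= 0:
                continue
            granted = min(demand, headroom)  # lower priorities may be squeezed
            for link in path:
                residual[link] -= granted
            placement[name] = (path, granted)
            break
    return placement

for name, (path, gbps) in place(FLOWS, CAPACITY).items():
    print(f"{name}: {gbps} Gbps on {path}")
```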

· Testability matters:

o A decentralized control plane requires a full-scale test bed of the production network to test new traffic engineering features

o A centralized control plane can tap real production input to research new ideas and to test new implementations

· SDN Testing Strategy:

o Various logical modules enable testing in isolation

o Virtual environment to experiment and test with the complete system end-to-end

§ Everything is real except the hardware

o Allows use of tools to validate state across all devices after every update from central server

§ Enforce ‘make before break’ semantics (sketched in the example after this list)

o Able to simulate the entire backbone with real monitoring and alerts
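
Here's a minimal sketch of the ‘make before break’ idea, assuming a hypothetical switch-table API (install/remove are illustrative names, not a real controller interface): the new route is installed and validated on every device before any entry of the old route is withdrawn, so forwarding never has a gap:

```python
# Minimal 'make before break' sketch with hypothetical switch tables.
# Each switch table maps flow -> next hop; lists model the devices on a path.

class Switch:
    def __init__(self, name):
        self.name = name
        self.table = {}  # flow -> next hop

    def install(self, flow, next_hop):
        self.table[flow] = next_hop

    def remove(self, flow):
        self.table.pop(flow, None)

def migrate(flow, old_path, new_path):
    """Move `flow` from old_path to new_path without a forwarding gap."""
    # 1. Make: install the new route end-to-end first.
    for switch, next_hop in new_path:
        switch.install(flow, next_hop)
    # 2. Validate state on every device before touching the old route,
    #    as the central server can do after each update.
    assert all(sw.table.get(flow) == hop for sw, hop in new_path)
    # 3. Break: only now withdraw entries unique to the old route.
    new_switches = {sw for sw, _ in new_path}
    for switch, _ in old_path:
        if switch not in new_switches:
            switch.remove(flow)

a, b, c = Switch("a"), Switch("b"), Switch("c")
migrate("flow1", old_path=[(a, "c")], new_path=[(a, "b"), (b, "c")])
print(a.table, b.table, c.table)
```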

· Google is using custom networking equipment with hundreds of 10GigE ports

o Dataplane runs on merchant silicon routing ASICs

o Control plane runs on Linux hosted on custom hardware

o Supports OpenFlow

o Quagga BGP and ISIS stacks

o Only supports the protocols in use at Google

· OpenFlow Deployment History:

o The OpenFlow deployment was done on the Google internal (non-customer facing) network

o Phase I: Spring 2010

§ Install OpenFlow-controlled switches but make them look like regular routers

§ BGP/ISIS/OSPF now interface with the OpenFlow controller to program switch state

§ Installation procedure:

· Pre-deploy gear at one site, take down 50% of bandwidth, perform upgrade, bring new equipment online and repeat with the remaining capacity

· Repeat at other sites

o Phase II: Mid 2011

§ Activate simple SDN without traffic engineering

§ Ramp traffic up on test network

§ Test transparent software rollouts

o Phase III: Early 2012

§ All datacenter backbone traffic carried by new network

§ Rolled out central traffic engineering

· Optimized routing based upon 7 application level priorities

· Globally optimized flow placement

§ External copy scheduler works with the OpenFlow controller to implement deadline scheduling for large data copies
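
The deadline scheduling idea reduces to simple arithmetic, sketched below with made-up numbers (this is my illustration, not the actual copy scheduler): a copy with B bytes remaining and time T to its deadline needs at least B/T of bandwidth, and a central scheduler can check whether the aggregate fits the links:

```python
# Toy deadline scheduler: each large copy needs at least
# bytes_remaining / seconds_to_deadline of bandwidth to finish on time.
# Names and numbers are illustrative, not the real copy scheduler.

COPIES = [  # (name, remaining gigabytes, seconds until deadline)
    ("index-push", 4000, 3600),
    ("log-archive", 9000, 7200),
]
LINK_GBPS = 20.0

def required_gbps(gigabytes, seconds):
    return gigabytes * 8 / seconds  # GB -> gigabits, spread over the window

demands = {name: required_gbps(gb, s) for name, gb, s in COPIES}
total = sum(demands.values())

for name, gbps in sorted(demands.items(), key=lambda kv: -kv[1]):
    print(f"{name}: needs {gbps:.1f} Gbps to make its deadline")

# If aggregate demand exceeds the link, something must slip or be preempted;
# a central scheduler can decide which, per application priority.
print("feasible" if total <= LINK_GBPS else "over-subscribed",
      f"({total:.1f}/{LINK_GBPS} Gbps)")
```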

· Google SDN Experience:

o Much faster iteration: deployed production quality centralized traffic engineering in 2 months

§ Fewer devices to update

§ Much better testing prior to roll-out

o Simplified high-fidelity test environment

o No packet loss during upgrade

o No capacity loss during upgrade

o Most features don’t touch the switch

o Higher network utilization

o More stable

o Unified view of entire network fabric (rather than router-by-router view)

o Able to implement:

§ Traffic engineering with higher quality of service awareness and predictability

§ Latency, loss, bandwidth, and deadline sensitivity in routing decisions (see the path-cost sketch after this list)

o Improved routing decisions:

§ Based upon a priori knowledge of network topology

§ Based upon L1 and L3 connectivity

o Improved monitoring and alerts
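
One way to picture latency- and loss-aware routing decisions (my illustration; the talk doesn't describe the actual cost model) is to fold per-link latency and loss into a single path cost and pick the cheapest path that still satisfies the flow's bandwidth request:

```python
# Toy path scoring that folds latency and loss into one cost and
# filters on available bandwidth; weights and links are hypothetical.

LINKS = {  # link -> (latency ms, loss fraction, available Gbps)
    ("a", "b"): (10.0, 0.001, 40.0),
    ("b", "c"): (12.0, 0.000, 40.0),
    ("a", "c"): (18.0, 0.020, 100.0),
}
LOSS_PENALTY_MS = 1000.0  # trade 0.1% loss against 1 ms of latency

def path_cost(path, min_gbps):
    if any(LINKS[link][2] < min_gbps for link in path):
        return float("inf")  # can't satisfy the bandwidth request
    return sum(lat + loss * LOSS_PENALTY_MS
               for lat, loss, _ in map(LINKS.get, path))

candidates = [[("a", "b"), ("b", "c")], [("a", "c")]]
best = min(candidates, key=lambda p: path_cost(p, min_gbps=20.0))
print(best, path_cost(best, 20.0))
```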

· SDN Challenges:

o The OpenFlow protocol is barebones but good enough

o Master election and control plane partitions are challenging to handle (see the election sketch after this list)

o What to leave on the router and what to run centrally?

o Flow programming can be slow for large networks
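
To show why master election is delicate, here's a minimal lease-based election sketch (hypothetical and simplified; production systems typically rely on a consensus service such as Chubby or ZooKeeper): a controller is master only while it holds an unexpired lease, so a partitioned master loses authority when its lease lapses instead of fighting a newly elected one:

```python
import time

# Minimal lease-based master election sketch. This just shows the shape
# of the problem: authority must expire, or a partitioned master lingers.

LEASE_SECONDS = 5.0

class LeaseServer:
    """Stands in for the replicated lock service."""
    def __init__(self):
        self.holder, self.expires = None, 0.0

    def try_acquire(self, who, now):
        if self.holder is None or now >= self.expires or self.holder == who:
            self.holder, self.expires = who, now + LEASE_SECONDS
            return True
        return False

class Controller:
    def __init__(self, name, server):
        self.name, self.server = name, server

    def is_master(self, now):
        # Must re-acquire (renew) before the lease lapses; a controller
        # cut off from the lease server stops acting as master on its own.
        return self.server.try_acquire(self.name, now)

server = LeaseServer()
c1, c2 = Controller("c1", server), Controller("c2", server)
t = time.time()
print(c1.is_master(t))        # True: c1 takes the lease
print(c2.is_master(t))        # False: lease held by c1
print(c2.is_master(t + 6.0))  # True: c1 failed to renew, c2 takes over
```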

· Conclusions:

o OpenFlow is ready for real world use

o SDN is ready for real world use

§ Enables rich feature deployment

§ Simplified network management

o Google's datacenter WAN runs on OpenFlow

§ Largest production network at Google

§ Improved manageability

§ Lower cost

A video of Urs’ talk is available at: OpenFlow @ Google

James Hamilton
e: jrh@mvdirona.com
http://blog.mvdirona.com / http://perspectives.mvdirona.com