Navigating Google's Cloud Ecosystem

In 2009, Google experienced a significant security breach known as Operation Aurora. During this attack, hackers gained access to Google's internal network by enticing an employee to click on a malicious link, which subsequently installed malware. While Google effectively resolved the specific incident, its repercussions continue to shape many of the practices that govern our interactions on the internet today. It also underscored the pivotal role technology companies play in safeguarding users and their businesses by fortifying their infrastructures.

Heather Adkins, VP of Security Engineering at Google, commented on the incident:

"It's not a surprise that we would see governments hacking each other. I think it's a little bit of a surprise to us when we saw attacks happening against private companies, against companies that were enabling business online, helping students learn, helping people express themselves."

Introduction: Balancing Ease of Use, Security, and Performance

Creating a secure and reliable internet infrastructure relies on a delicate balance of practices and technologies. Among other things, the current Google approach prioritizes Ease of Use, Security, and Performance. A system that is easy to use invites fewer user errors, fewer errors leave fewer openings for attackers, and a simpler path is easier to optimize for performance. Together, these priorities help users swiftly obtain accurate results from trusted sources.

In this context, let's explore what happens when you type "google.com" and hit enter in a Google Chrome browser.

Where Do You Type It?

Google Chrome introduced the Omnibox. Prior to this innovation, users typically had to choose between searching in one field and entering a URL in another. The omnibox merged these functions into one box, making the address bar a multifunctional entry point. Users can simply type anything into the omnibox, and the browser determines the appropriate action, whether it's searching or navigating to a URL. The omnibox can also perform additional functions, such as acting as a calculator for simple expressions.

The omnibox also powers autocomplete, drawing from the user's history, location, cookies, and popular searches to provide a seamless and accurate typing experience.
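The search-or-navigate decision described above can be sketched in a few lines of Python. This is a toy heuristic for illustration only, not Chrome's actual logic; the function name `classify_omnibox_input` and the matching rules are assumptions.

```python
import re

# Hypothetical rule: "looks like host.tld", optionally with a port and path.
_HOST_RE = re.compile(
    r"^(?:[a-z0-9-]+\.)+[a-z]{2,}(?::\d+)?(?:/\S*)?$", re.IGNORECASE
)


def classify_omnibox_input(text):
    """Return ("navigate", url) or ("search", query) for the typed text."""
    text = text.strip()
    # Input that already carries a scheme is treated as a URL as-is.
    if re.match(r"^https?://\S+$", text, re.IGNORECASE):
        return ("navigate", text)
    # Something shaped like a hostname gets a default https:// scheme.
    if _HOST_RE.match(text):
        return ("navigate", "https://" + text)
    # Everything else is handed to the search engine.
    return ("search", text)
```

A real browser also weighs history, bookmarks, and installed search engines, so this sketch only captures the basic idea of a single box serving both purposes.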

The URL

From the moment you type a URL, e.g. "google.com", the security procedures begin. Google has taken measures to secure closely related domains like "googl.com" and "gogle.com" to prevent users from falling victim to typosquatting attacks. After you press the Enter key, the browser adds the "https://" scheme to the URL so that the connection is secure.

The browser initiates a DNS (Domain Name System) query to find the IP address of the web server host. In this process, frequently visited sites may be cached in the user's browser, operating system, or router for quicker access. If the address is not found in these caches, a recursive lookup is initiated, eventually reaching an authoritative name server. Google operates its own authoritative name servers, ensuring a speedy and reliable resolution of the domain name.
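The cache-then-recurse order described above can be modeled with a small Python sketch. The layer names, the `resolve` function, and the stand-in recursive lookup are all hypothetical; real resolvers also honor record TTLs, which this sketch ignores.

```python
def resolve(hostname, caches, recursive_lookup):
    """Check each cache layer in order, then fall back to a recursive lookup.

    caches: list of (layer_name, dict) pairs, nearest layer first.
    recursive_lookup: callable standing in for the recursive resolver.
    Returns (ip_address, name_of_layer_that_answered).
    """
    for layer_name, cache in caches:
        ip = cache.get(hostname)
        if ip is not None:
            return ip, layer_name
    # Cache miss everywhere: ask the (simulated) recursive resolver.
    ip = recursive_lookup(hostname)
    # Populate every layer so the next lookup is answered locally.
    for _, cache in caches:
        cache[hostname] = ip
    return ip, "recursive"


def fake_recursive_lookup(hostname):
    """Stand-in for a real resolver; returns a documentation-range address."""
    return "203.0.113.10"
```

Calling `resolve` twice for the same name with caches ordered as browser, operating system, router shows the effect: the first call is answered by the recursive lookup, the second by the browser cache.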

Once the IP address that corresponds to the domain name is found, the browser can begin communicating with the web server responsible for handling the request.
The connection from the client to the web server can be simplified as in the illustration below.

This connection is made possible by the worldwide interconnection of computers.
In this process, the request is sent from the client's browser to the ISP, and from the ISP to the internet exchange point, where it joins the global traffic. However, because this is a request to Google, things are slightly different due to Google's expansive network.

The Connection

Internet Service Provider (ISP)

The ISP is responsible for routing the request toward the appropriate destination using its routers. It does this by forwarding the request to the regional IX (Internet Exchange) point, a pivotal piece of network infrastructure where multiple internet service providers and network operators interconnect to exchange traffic. A Point of Presence (PoP), typically a strategically located data center, serves as an intermediary link between the ISP and the broader internet infrastructure, including the global network maintained by a company like Google. At the PoP, the user's request joins a larger flow of internet traffic. This interconnection helps streamline the flow of data and reduces latency by making content available closer to end users.
Static content that is very popular with the ISP's user base, including content from YouTube and Google, can also be temporarily cached on edge nodes (Google Global Cache) hosted within the ISP's network, so it can be served rapidly without leaving that network.

Google's Network

Google's Edge Node servers, strategically distributed across the globe, serve as the initial point of contact for users. These servers are integral to Google's Content Delivery Network (CDN) and play a crucial role in optimizing the user experience. At the edge, Google employs security measures, including firewalls, to protect against online threats such as DDoS attacks. Additionally, load balancing at the edge distributes incoming user requests among a network of backend servers, so that resources are allocated efficiently and requests are processed with minimal latency, even during high-demand periods. As a result, users benefit from fast and reliable access to Google services.

A map of Google's network and server infrastructure is linked in the References section.

The Load Balancer

Google handles an estimated 8.5 billion daily searches on its website, demanding a high-performance, low-latency infrastructure. To meet this colossal demand, Google employs a custom distributed packet-level load balancer known as Maglev.

Maglev differs from other traditional hardware load balancers in a few key ways:

It spreads all packets destined for a given service IP address evenly across a pool of Maglev machines using Equal-Cost Multi-Path (ECMP) forwarding. This design allows Google to increase Maglev's capacity by simply adding more servers to the pool. Just as importantly, spreading packets evenly allows redundancy to be modeled as N + 1, boosting system availability and reliability compared to conventional load balancing systems that typically rely on active/passive pairs for 1 + 1 redundancy.
Maglev is developed in-house by Google, which gives Google full end-to-end control. This control lets Google experiment and iterate rapidly, so the system can adapt to evolving requirements.
In Google's data centers, Maglev operates on commodity hardware, simplifying deployment and making it an economical choice for their massive infrastructure.
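The difference between 1 + 1 and N + 1 redundancy mentioned above comes down to simple arithmetic: how much of the provisioned capacity can carry traffic while still tolerating one failure. The following Python sketch makes that concrete; the function name and the pool sizes are illustrative assumptions, not figures from Google.

```python
def usable_fraction(total_machines, spares=1):
    """Fraction of provisioned capacity that can carry traffic while
    still tolerating the loss of `spares` machines."""
    if total_machines <= spares:
        raise ValueError("need more machines than spares")
    return (total_machines - spares) / total_machines


# Active/passive pair (1 + 1): one of two machines sits idle as a spare.
pair = usable_fraction(2)

# N + 1 with N = 9: one spare's worth of headroom across ten machines.
pool = usable_fraction(10)
```

Under these assumptions the pair can use half of what is provisioned, while the ten-machine pool can use nine tenths, which is why an evenly loaded pool is the more economical way to survive a single failure.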

Traffic is directed to the Load Balancers using the Anycast network addressing and routing methodology. This technique allows data packets from a single sender to be routed to the nearest node within a group of potential receivers, all identified by the same destination IP address through the use of Virtual IPs. Every Google service depends on one or more Virtual IP addresses (VIPs), which differ from physical IPs as they aren't tied to specific network interfaces but instead are served by multiple service endpoints within Google's load-balancing system. Sometimes, Maglev may need to process a packet destined for an anycast VIP for a client located closer to another front-end site. In such cases, Maglev forwards the packet to another Maglev on a machine situated at the nearest front-end site for efficient delivery.
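As a rough model of anycast, several sites announce the same VIP and the network delivers each packet to the site with the lowest path cost from the sender. In practice this selection is made by routers using routing protocols, not by application code; the Python sketch below only illustrates the outcome. The site names, costs, and the `pick_anycast_site` function are hypothetical.

```python
def pick_anycast_site(vip, advertisements, path_cost):
    """Pick the lowest-cost site among those announcing `vip`.

    advertisements: dict mapping site name -> set of VIPs it announces.
    path_cost: dict mapping site name -> routing cost from this client.
    """
    candidates = [site for site, vips in advertisements.items() if vip in vips]
    if not candidates:
        raise LookupError("no site announces " + vip)
    return min(candidates, key=lambda site: path_cost[site])


# Hypothetical front-end sites all announcing one documentation-range VIP.
ADVERTISEMENTS = {
    "site-a": {"198.51.100.1"},
    "site-b": {"198.51.100.1"},
    "site-c": {"198.51.100.1"},
}
```

Two clients with different path costs end up at different sites even though both send to the same destination address.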

ECMP

Equal-Cost Multi-Path (ECMP) routing is a network routing technique where multiple paths of equal cost are available to reach a destination. Instead of picking just one path for traffic, ECMP spreads the load across these equal-cost paths. This helps in improving network efficiency, reducing congestion, and enhancing fault tolerance because if one path fails, traffic can switch to another. In essence, it's like having multiple lanes on a highway, with traffic distributed across them to avoid congestion and ensure reliability.
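A common way to spread load across equal-cost paths is to hash each flow's 5-tuple and use the result to pick a next hop, so that all packets of one connection follow the same path. The Python sketch below assumes that approach; SHA-256 is only a stand-in for whatever hash a real router uses, and the function and hop names are hypothetical.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Choose a next hop for a flow by hashing its 5-tuple.

    flow: (src_ip, src_port, dst_ip, dst_port, protocol)
    next_hops: non-empty list of equal-cost next hops.
    """
    if not next_hops:
        raise ValueError("no next hops available")
    key = "|".join(str(field) for field in flow).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then take a modulo.
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]
```

Because the choice depends only on the flow's fields, repeated packets of the same flow map to the same hop, while different flows spread across the available hops. A plain modulo reshuffles many flows when the hop list changes, which is one reason production systems use more careful schemes.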

The Destination: HTTP Daemon

HTTPd, short for HTTP daemon or HTTP server, is a software application designed to manage incoming HTTP requests from clients, typically web browsers. It serves web pages, files, and various resources in response to these requests. In simpler terms, it's the software responsible for hosting and delivering websites and web applications on the internet.
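To make the idea of an HTTP daemon concrete, here is a minimal sketch using Python's standard `http.server` module. It is a toy for illustration and bears no relation to how production web servers are built; the handler name, response body, and `make_server` helper are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a small HTML page."""

    def do_GET(self):
        body = b"<h1>Hello from a tiny HTTP daemon</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress per-request logging to keep the example quiet.
        pass


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 asks the OS for any free port."""
    return HTTPServer((host, port), HelloHandler)
```

Calling `make_server().serve_forever()` starts the daemon, which then waits for requests and answers each one with a status line, headers, and a body.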

In the realm of HTTP servers, there are several options to choose from. Some of the most popular ones include Apache HTTP Server (Apache), Nginx, Microsoft Internet Information Services (IIS), and Google's custom web server, known as Google Web Server (GWS). Google developed GWS to meet its unique scale and performance requirements, allowing it to fine-tune the server for efficiency, scalability, and seamless integration with its services. This customization also enables Google to implement features tailored to its specific applications, offering capabilities that may not be available in off-the-shelf web servers. Additionally, developing in-house solutions grants Google greater control over security and the ability to implement proprietary optimizations for enhanced service performance.

The Response

Once the response is prepared, it's encapsulated into an IP packet, with the source address set as the VIP and the destination address as the user's IP. A technique called Direct Server Return (DSR) streamlines the return traffic from load-balanced servers, allowing it to bypass the load balancer and go directly to the router. This optimizes the handling of returning packets and avoids unnecessary traffic through the load balancer.
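The addressing used by Direct Server Return can be illustrated with a short Python sketch: the backend answers with the VIP as the source address and the client as the destination, so the reply never needs to pass back through the load balancer. The `Packet` type, the function name, and the addresses (taken from documentation ranges) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: str


def build_dsr_reply(request, body):
    """Build the backend's reply to a request that arrived for a VIP.

    The request was addressed client -> VIP, so the reply swaps the two:
    source is the VIP, destination is the client's own address.
    """
    return Packet(src=request.dst, dst=request.src, payload=body)


# A client request addressed to the service's VIP.
request = Packet(src="203.0.113.7", dst="198.51.100.1", payload="GET /")
reply = build_dsr_reply(request, "200 OK")
```

From the client's point of view the reply comes from the same address it contacted, even though a backend behind the load balancer produced it.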

Subsequently, the user's web browser takes over, rendering the web page using the information from the response, thereby completing the communication cycle. Additionally, cookies and tokens are often included in the response header to facilitate session tracking and management.

References

What happens when https://github.com/alex/what-happens-when
Google mitigated the largest DDoS attack to date, peaking above 398 million rps https://cloud.google.com/blog/products/identity-security/google-cloud-mitigated-largest-ddos-attack-peaking-above-398-million-rps
How DNS works https://www.cloudflare.com/learning/dns/what-is-dns/
Google's network infrastructure https://peering.google.com/#/infrastructure
Google Cloud Load Balancing https://sre.google/workbook/managing-load/
Maglev Research Paper https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf
Map identifying aspects of Google's network and server infrastructure. https://www.google.com/maps/d/viewer?mid=1nXSNhvDo5jaSS1h9gFuqQnRNIqg&ll=-6.90207176296886%2C-49.606406191355404&z=2
What is an Internet exchange point? https://www.cloudflare.com/learning/cdn/glossary/internet-exchange-point-ixp/
Doing our part: How Google’s network helps internet content reach users https://cloud.google.com/blog/products/infrastructure/google-network-infrastructure-investments
Google Data Center FAQ https://www.datacenterknowledge.com/data-center-faqs/google-data-center-faq

A Blog by Victor