Rounak Kumar Gunjan
Oct 05, 2021 12:02:08 IST
Facebook faced what is being described as one of its largest outages ever last night, with users unable to use the service for hours on end. The crippling outage at Facebook, WhatsApp, Instagram, Facebook Messenger, and other Facebook services occurred because of a problem in the company’s domain name system, a relatively unknown but crucial component of the internet.
Facebook has said that the outage was due to a configuration change to its routers, and users have nothing to worry about. “The underlying cause of this outage also impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem,” the company said in a blog post. Facebook said that the configuration changes on the routers that coordinate network traffic between its data centres caused issues that interrupted this communication. Basically, Facebook’s machines weren’t able to talk to each other.
Before Facebook’s official statement, web infrastructure and website security company Cloudflare also detailed what caused the issue. In a blog post, Cloudflare said that “Facebook and related properties disappeared from the Internet in a flurry of Border Gateway Protocol (BGP) updates.” The problems began with a routine BGP update that went wrong, wiping out the DNS routing information that Facebook needs to allow other networks to find its sites.
To explain what technical terms like DNS and BGP stand for and mean, we have tried to break the problem down into simpler words. Let’s start with the basics:
What is DNS and what went wrong with it?
According to a report by Bloomberg, DNS is like a phone book for the internet. It’s the tool that converts a web domain, like Facebook.com, into the actual internet protocol, or IP, address where the site resides. Think of Facebook.com as the person one might look up in the white pages, and the IP address as the physical address they’ll find.
On Monday, a technical problem related to Facebook’s DNS records caused the outages. When a DNS error occurs, turning Facebook.com into a user’s profile page becomes impossible. That’s apparently what happened inside Facebook, but at a scale that temporarily crippled the entire Facebook ecosystem.
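To make the phone book analogy concrete, here is a minimal sketch in Python. It is only an illustration: the `PHONE_BOOK` table and the `resolve` function are hypothetical names made up for this example, and the IP address comes from a range reserved for documentation, not from Facebook's real records.

```python
# Toy "phone book" for the internet: domain name -> IP address.
# 192.0.2.10 is a documentation-only address (an assumption for
# illustration), not where Facebook.com actually lives.
PHONE_BOOK = {"facebook.com": "192.0.2.10"}


def resolve(domain, records):
    """Return the IP address listed for a domain, or None if no record is found."""
    return records.get(domain.lower())


# Normal day: the name is in the book, so the lookup succeeds.
working = resolve("Facebook.com", PHONE_BOOK)

# Outage: the records cannot be reached, modelled here as an empty book.
broken = resolve("Facebook.com", {})

print(working)  # 192.0.2.10
print(broken)   # None
```

When the lookup returns nothing, a browser or app has no address to connect to, which is roughly why the services appeared to vanish even though the servers themselves had not been switched off.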
Not only were Facebook’s primary platforms down, but so were its internal applications, including the company’s own email system. Users on Twitter and Reddit also indicated that employees at the company’s Menlo Park, California, campus were unable to access offices and conference rooms that required a security badge. That could happen if the system that grants access is also connected to the same domain, Facebook.com.
Now what is BGP?
The same report by Bloomberg also states that the problem at Facebook Inc. appeared to have its origins in the Border Gateway Protocol or BGP. If DNS is the internet’s phone book, BGP is its postal service. When a user enters data on the internet, BGP determines the best available paths that data could travel.
Minutes before Facebook’s platforms stopped loading, public records show that a large number of changes were made to Facebook’s BGP routes, according to Cloudflare Inc.’s chief technology officer, John Graham-Cumming, in a Tweet.
While the BGP snafu may explain why Facebook’s DNS failed, the company hasn’t yet commented on why the BGP routes were withdrawn early on October 4.
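The link between withdrawn BGP routes and a DNS failure can be sketched with a toy routing table in Python. This is a simplified model under stated assumptions, not how real routers work: the `RouteTable` class, the network prefix `198.51.100.0/24`, the server address `198.51.100.53`, and the label `AS64500` are all hypothetical values taken from ranges reserved for documentation.

```python
import ipaddress


class RouteTable:
    """A toy table of announced routes: network prefix -> who to send traffic to."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop):
        # A network tells the rest of the internet how to reach this prefix.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        # The announcement is taken back; other networks forget the path.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        # Pick the most specific announced prefix that contains the address.
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RouteTable()
dns_server = "198.51.100.53"  # hypothetical address of a DNS server

table.announce("198.51.100.0/24", "AS64500")
before = table.lookup(dns_server)  # a path exists

table.withdraw("198.51.100.0/24")
after = table.lookup(dns_server)   # no path is left

print(before)  # AS64500
print(after)   # None
```

Once the route is withdrawn, nothing on the outside knows how to deliver a question to the DNS server, so the phone book itself becomes unreachable even if its contents are intact.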
Are Facebook services back up?
Yes. Most of Facebook’s services, including WhatsApp, Instagram, and Facebook itself, are back up and running. We were able to use all three services without problems at the time of writing this article. Facebook also says that its systems are all back up and running. Even the company’s WhatsApp Twitter handle has said that the instant messaging platform is back and running at 100 percent.