How Web browsers function

Shaun Michael Stone

What is this video about?

This video explains what goes on inside a web browser. What happens when you enter a URL and press Enter, and how does the browser know how to present the page to you? The video covers all of this, along with the components that make up a web browser.




We use web browsers every day to display web pages, but have you ever wondered what is happening behind the scenes? This is the basic flow of viewing a web page: you send a request over the network to a server, and the server responds with the content that makes up the page. Your browser interprets the returned content and displays the page. Let’s look at the high-level structure of a browser and the components it uses to accomplish this.
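Before turning to those components, it may help to see roughly what the request in that flow looks like. The sketch below builds the text of a very small HTTP/1.1 GET request from a URL. The function name `build_get_request`, the example URL and the choice of headers are assumptions made for illustration; a real browser sends many more headers and usually encrypts the exchange.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    # An empty path means the site root.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, protocol version
        f"Host: {parts.netloc}",  # which site on the server we want
        "Accept: text/html",      # the kind of content we would like back
        "Connection: close",
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


# Hypothetical URL, used only to show the shape of the request.
request = build_get_request("https://example.com/articles?id=7")
print(request)
```

The server's response follows the same pattern: a status line, some headers, a blank line, and then the HTML that the browser goes on to interpret.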

The user interface is what the user sees and interacts with: the address bar, the back and forward buttons, and any other visual element such as tabs. A browser also has a rendering engine, which is responsible for displaying the visual representation of the web page. Think of the rendering engine as a painter working on a blank canvas, whose job is to construct the page by applying the right structures and colours.

The rendering engine takes in HTML and CSS documents, then displays its interpretation of both. HTML exists to mark up our content, and CSS is used to style and animate it. The browser engine acts as a marshal, directing actions between the user interface and the rendering engine, as well as external communication with servers.

To receive the content for a web page, the browser has to communicate over the network, asking for all the images and documents that make up the page. You’ve probably encountered a page where an image is missing. This usually means the browser failed to fetch the image from the server.
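To give a feel for how a browser might work out what else it needs to ask for, here is a rough sketch that scans some HTML for images, scripts and linked files and resolves each address against the page's own URL. The class name `ResourceFinder`, the sample markup and the URLs are all made up for this example, and real browsers are far more selective about what they fetch and when.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect the absolute URLs of resources a page refers to."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Images and scripts point at their file with "src".
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        # Simplification: treat every <link href> as something to fetch.
        elif tag == "link" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))


# Hypothetical page content and address.
page = (
    '<link rel="stylesheet" href="/css/site.css">'
    '<img src="images/cat.png">'
    '<script src="https://cdn.example.net/app.js"></script>'
)
finder = ResourceFinder("https://example.com/blog/post.html")
finder.feed(page)
finder.close()

for url in finder.resources:
    print(url)
```

Each of the collected URLs becomes its own network request. If one of those requests fails, the matching part of the page, such as an image, is simply left out.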

To add interactive logic and functionality to our website, we rely on a programming language called JavaScript. The rest of the browser has no idea how to deal with JavaScript directly. It’s like a person who only knows Spanish trying to listen to someone speaking Chinese. We need a way to translate the communication, and this is done by an interpreter, usually called a JavaScript engine. Browsers ship their own JavaScript engines: V8 is used by Google Chrome, SpiderMonkey by Firefox, and Chakra by the original Microsoft Edge (the current Chromium-based Edge uses V8).

Browsers also provide data storage, such as cookies and Local Storage. This helps a site retain state even when you refresh the page. Cookies are often used to remember small bits of information, such as your name.
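As a small illustration of the cookie side of this, the sketch below uses Python's standard `http.cookies` module to read a cookie the way a browser would receive it from a server, then builds the header the browser would send back on a later request. The cookie name `username`, its value and the attributes are hypothetical.

```python
from http.cookies import SimpleCookie

# What a server might send in a Set-Cookie response header (made-up values).
set_cookie_header = "username=Ada; Path=/; Max-Age=3600"

# The browser parses the header and keeps the cookie in its store.
jar = SimpleCookie()
jar.load(set_cookie_header)

stored_name = jar["username"].value     # the remembered value
max_age = jar["username"]["max-age"]    # lifetime in seconds, kept as text

# On a later request to the same site, the stored cookies are sent back
# in a Cookie header, which is how the site "remembers" you after a refresh.
cookie_header = "; ".join(
    f"{key}={morsel.value}" for key, morsel in jar.items()
)

print(stored_name)
print(cookie_header)
```

Local Storage works differently: it is read and written by JavaScript running in the page and is not sent to the server with each request.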

The rendering engine can render images, videos, SVG files and audio files, but by default it displays HTML and XML documents, which are both types of markup language. These documents are made up of tags. Not all browsers use the same rendering engine, which is why you sometimes see inconsistencies in how things look from browser to browser. Chrome and Opera use an engine called Blink, Safari uses WebKit and Firefox uses Gecko. These engines have their own implementations of how to render the page, but all tend to follow the same flow.

It’s the responsibility of the network layer to provide the rendering engine with the requested document. First, the rendering engine reads the HTML and constructs a DOM tree. DOM stands for ‘Document Object Model’, an object representation of the HTML document. The DOM tree is made up of DOM nodes. Nodes can be images, text blocks, buttons or any other element.
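To make the idea of a tree of nodes more concrete, here is a minimal sketch that turns a small piece of HTML into a tree using Python's standard `html.parser` module. The `Node` and `TreeBuilder` classes and the sample markup are invented for this example. A real browser's parser does much more, including recovering from badly formed HTML.

```python
from html.parser import HTMLParser

# Tags that never have children or a closing tag (a partial list).
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class Node:
    """One node in our toy DOM tree: an element or a block of text."""

    def __init__(self, name, attrs=None, text=""):
        self.name = name
        self.attrs = dict(attrs or [])
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # the element we are currently inside is last

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].name == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data.strip()))


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = node.text if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


html = (
    "<html><body><h1>Hello</h1>"
    '<img src="cat.png"><button>Click</button></body></html>'
)
builder = TreeBuilder()
builder.feed(html)
builder.close()
tree_lines = outline(builder.root)
print("\n".join(tree_lines))
```

The printed outline shows the heading, the image and the button as children of the body, each one a node in the tree, with the text blocks as nodes of their own.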

All the CSS styling associated with these nodes is parsed by the engine. With this styling information and these visual instructions, a new tree can be created: the render tree. Once it has been constructed, it goes through a layout process in which each node is given coordinates for its position on the screen. The render tree is then traversed, with each node painted using the UI backend layer. The process happens so fast that you can’t see each node being rendered, which is why the whole page seems to appear at once when it has finished loading.
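The three steps described here (render tree, layout, paint) can be sketched in a heavily simplified form. Everything in the example below is an assumption made for illustration: the style rules, the fixed heights, and the decision to flatten the render tree into a list and stack every box vertically. No real engine works this simply.

```python
from dataclasses import dataclass, field


@dataclass
class DomNode:
    tag: str
    children: list = field(default_factory=list)


@dataclass
class Box:
    tag: str
    color: str
    height: int
    x: int = 0
    y: int = 0


# Hypothetical style rules, keyed by tag name.
STYLES = {
    "h1": {"color": "navy", "height": 40},
    "p": {"color": "black", "height": 20},
    "script": {"display": "none"},
}


def build_render_list(node):
    """Step 1: combine DOM nodes with their styles, skipping hidden nodes."""
    style = STYLES.get(node.tag, {})
    if style.get("display") == "none":
        return []  # hidden nodes and their children are left out
    boxes = []
    if "height" in style:
        boxes.append(Box(node.tag, style["color"], style["height"]))
    for child in node.children:
        boxes.extend(build_render_list(child))
    return boxes


def layout(boxes):
    """Step 2: give each box coordinates, stacking them top to bottom."""
    y = 0
    for box in boxes:
        box.x, box.y = 0, y
        y += box.height


def paint(boxes):
    """Step 3: walk the boxes in order and produce drawing commands."""
    return [f"paint {b.tag} in {b.color} at ({b.x}, {b.y})" for b in boxes]


dom = DomNode("body", [DomNode("h1"), DomNode("script"), DomNode("p"), DomNode("p")])
boxes = build_render_list(dom)
layout(boxes)
commands = paint(boxes)
print("\n".join(commands))
```

Notice that the script node is in the DOM but never produces a box, which reflects the difference between the DOM tree and the render tree: only things that are actually drawn make it into the second one.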

When the parsing process has finished, the browser marks the document as interactive, allowing you, the user, to interact with the nodes on the page.

