Ali Naqvi

Learning C as the first programming language

Fri, 24 Oct 2025 00:00:00 GMT

The question of which programming language is the best beginner language is a hotly debated one. When I was researching which language I should learn, the best advice I came across was from David Kopec, who advises thinking about the following three things, in this order of importance:

More important than the choice of language is to pick a language and stick with it for a while.
Choose a language that is popular enough and has a welcoming community so that it has a lot of resources available to learn from.
The last thing to think about is to pick a language that is popular in the niche that you want to eventually develop for (i.e., if you already have a specialty in mind, e.g., web, mobile, ML, OS, etc.).

Popular languages

Let’s first look at the popularity of the top programming languages according to GitHub’s Octoverse 2024:

Different measures of popularity will give different rankings, but this picture gives us a great overview of the past 10 years in programming languages. I think five of the most commonly suggested programming languages to learn when you start out are:

Python
JavaScript
Java
C
C++

Looking at the above picture, it’s no surprise that the most common suggestion these days for the best first language is Python. It has a relatively simple syntax, can be used for a variety of different applications, and is widely and increasingly used in the industry. Python is really not a bad choice at all and I almost went for it, especially because I was starting out during the GenAI hype, which massively boosted the Python hype. Even before the latest AI boom, Python’s popularity was growing. But now it has overtaken JavaScript as the most used language on GitHub.

JavaScript has a massive developer base and has been the most popular programming language for many years, mainly thanks to it being the language of web browsers. JavaScript also gets a fair bit of criticism due to its perceived fundamental design flaws that go back to its history of originally being developed in just 10 days. Also, because its syntax is so simple and flexible, it is very easy to write bad JavaScript code, and there is in fact a lot of bad JavaScript code out there. This is another major reason it is criticized so much: it is accused of promoting bad coding practices.

Java was once the rising star of programming languages due to it being platform-independent and its use in the Android ecosystem. Its popularity has taken a hit in recent years, mainly because Kotlin has become the preferred language for Android app development. Java is still a very popular and very important programming language, especially because many large corporations have built their backend systems in it.

C is the oldest programming language in our list (and in most lists of programming languages these days). More than 50 years after it was first created, it continues to rank among the top 10 most popular programming languages. C still powers an enormous amount of infrastructure in the world, ranging from the largest supercomputers to the smallest microcontrollers and embedded systems. It is a relatively low-level programming language. That means C gets you as close to the hardware as possible, short of using assembly language. It also means it leaves a lot to the programmer instead of automatically taking care of basic things like memory management for you. This makes it a somewhat “difficult” language in the world of today’s higher-level programming languages. Even though the language itself is quite small, C makes it very easy to make serious mistakes with consequences like crashing the system. So one always needs to be extra careful when programming in it. This is probably the main reason C has recently been falling out of favor, as newer languages like Go and Rust aim to offer much of the performance and power of C with fewer of its drawbacks. But there is only so much you can expect from a 50-year-old programming language. Still, the case for C as the first programming language is strong due to its unparalleled educational value. More on this in a bit.

C++ was created to be an improvement upon C and to be its object-oriented version. When C++ was originally developed, all C code could be seamlessly run with C++, but that is no longer the case. The two languages have diverged quite a bit in the ensuing decades, even though they are still often mentioned as C/C++, as if they are almost the same. Unlike C, C++ has evolved to become a behemoth of a language, with so many features that it can be overwhelming. So you will often hear things like “C++ is a kitchen sink language” or that “nobody really is an expert in C++”. It gets a lot of flak for this and many other reasons. For example, Linus Torvalds (the creator of Linux and Git) famously said:

C++ is a horrible language… the choice of C is the only sane choice… C++ leads to really really bad design choices.

In any case, C++ is one of the most popular languages today, finding applications in a variety of niches including systems development, desktop applications, video games, embedded systems, servers, and more. C++ provides a combination of:

C-like performance and low-level access to hardware, and
non-C-like higher-level features and ease of use.

More on C

Due to both historical accidents and C’s merits, C has stood the test of time. Despite being so powerful, C is in fact a very small language, and it is easy to get up to speed with it quickly (it has only a few dozen keywords: 32 in C89 and 44 in C11).

Famously, C makes it very easy to shoot yourself in the foot because it leaves all the onus of higher-level functionality on the programmer. To quote directly from the legendary book The C Programming Language by Brian Kernighan and Dennis Ritchie (the latter being the creator of C):

C provides no operations to deal directly with composite objects such as character strings, sets, lists or arrays. There are no operations that manipulate an entire array or string, although structures may be copied as a unit. The language does not define any storage allocation facility other than static definition and the stack discipline provided by the local variables of functions; there is no heap or garbage collection. All of these higher-level mechanisms must be provided by explicitly called functions. Most C implementations have included a reasonably standard collection of such functions.

Although the absence of some of these features may seem like a grave deficiency, ("You mean I have to call a function to compare two character strings?"), keeping the language down to modest size has real benefits. Since C is relatively small, it can be described in small space, and learned quickly. A programmer can reasonably expect to know and understand and indeed regularly use the entire language.

The lack of garbage collection in C is the most infamous of these deficits. Dr. Charles Severance (better known as Dr. Chuck) notes that:

The lack of garbage collection feature in C is both one of the great strengths of the language and at the same time is likely the reason that the average programmer will never develop or maintain a major C application during their career.

Even though these things were left out deliberately, they are more than just frustrating for the modern programmer. They can actually lead to security vulnerabilities and issues with system stability if not handled correctly. So there is a good reason C is being gradually replaced by other languages that came after it.

Why C

Despite the problems mentioned above, C turned out to be the perfect language for me to learn as my first language. Its educational value is unparalleled:

Almost all modern programming languages are directly or indirectly influenced by C, so learning C gives you a huge leg up if you want to be generally good at programming.
When you learn a “real life” language (that you will be using day to day) after learning C, you will be in a much better position to grasp the nuances of the said language.
Learning C helps you deeply understand how computers work under the hood.
Learning C helps you internalize good coding practices.

Let’s look at each of these points in more detail.

The mother of modern programming languages

C is the de facto lingua franca of programming languages. Due to the success and ubiquity of C, most programming languages today are either directly or indirectly inspired by it. The procedural, imperative style of C, with its curly braces, semicolons, and familiar control structures (if, for, while, switch), is the direct ancestor of the syntax used in a massive family of popular languages. The list of C-family programming languages contains over 70 programming languages that share significant features with it!

Here are some prominent modern programming languages that borrow aspects of C:

C++: A direct descendant of C (originally called "C with Classes").
Java: Heavily borrowed C/C++ syntax to make it familiar to C programmers.
C#: Microsoft's answer to Java. It adopted the C-family syntax that Java popularized but is built on its own .NET framework.
Go: A spiritual successor to C, developed at Google. It aims to capture C's simplicity and performance for systems programming but modernizes it.
Rust: A modern systems language designed to be a "safer C."
JavaScript: Adopted C-style syntax (curly braces, semicolons, for loops) to have a familiar feel.
PHP: A web-focused scripting language whose syntax is heavily modeled on C.
Swift: Apple's successor to Objective-C (which is a strict superset of C). Swift can call most C libraries directly.

In addition, languages like Python and Ruby, which have a very different syntax from C, have their primary implementations written in C. This allows them to easily interface with high-performance libraries written in C.

Preparation for learning a “real” language

If you want to be a language-agnostic programmer, learning most languages becomes dramatically easier if you know C, as you can focus on their unique features (like classes in C++, the garbage collector in Java, or async/await in JavaScript) instead of wrestling with basic syntax.

While C is very much a "real" language used in critical systems, the spirit of this point is that it prepares you for the languages you'll likely use for application development (e.g., C++, Java, C#, Go, Rust, JavaScript). Learning C first is like learning Latin to understand Romance languages; you grasp the roots from which so much has grown.

When you move from C to a language like Rust, you'll have a profound appreciation for its "borrow checker" because you've personally experienced the memory-related bugs it prevents. When you use a garbage-collected language like Go or Java, you'll understand the trade-offs being made — convenience at the cost of some performance overhead and non-deterministic cleanup — because you've had to manage memory manually. This deep context allows you to grasp the why behind a language's design, not just the how of its syntax.

Seeing under the hood

Modern programming languages and frameworks are built on layers of abstraction designed to maximize developer productivity. While this is powerful, it can leave programmers with a fragile, incomplete mental model of how a computer actually works. C tears away these layers.

Direct memory management is the single most important concept C teaches. You are responsible for every byte of memory you allocate on the heap using malloc() and for releasing it with free(). Forgetting to free() memory leads to memory leaks. Using memory after it has been freed is undefined behavior, which typically shows up as crashes or silently corrupted data. This forces you to understand the fundamental difference between the stack (for local variables, managed automatically) and the heap (for dynamically allocated data, managed by you). This knowledge is invaluable for debugging complex performance issues in any language.

Pointers are arguably the most feared but also the most powerful feature in C. A pointer is simply a variable that holds a memory address. By using pointers, you are no longer working with abstract data but with the actual locations in memory where that data lives. This teaches you: how data is laid out in memory; the difference between passing data "by value" (a copy) and "by reference" (a pointer to the original); and how to build complex data structures like linked lists, trees, and hash tables from scratch, giving you a visceral understanding of how they work.

C is often called a "high-level assembly language." It provides very little abstraction over the underlying hardware. When you are working with text (strings) in C, you are forced to consider how letters and words are stored in memory: the characters of a string occupy consecutive bytes of memory, followed by a special character (the null terminator) that signals the end of that string. When you are working with numbers, C forces you to think about how much space (in bytes) each number takes in memory, with different implications depending on how large you want the value of the number to be. When you work with arrays, you understand they are just contiguous blocks of memory. This proximity to the metal gives you an intuition for how your code will be executed by the CPU, making you a far more effective performance tuner and low-level debugger.

Internalizing good coding practices

Starting with a language that holds your hand can instill habits that are inefficient or unsafe in the long run. C, by contrast, has no safety net. It is a strict but fair teacher that forces you to become a disciplined, mindful programmer.

For example, in Python, you can create a list and append() to it indefinitely. The interpreter handles all the memory allocation, resizing, and garbage collection behind the scenes. This is convenient but hides the computational cost. A new programmer might not realize that appending to a list a million times could trigger multiple expensive reallocations and copies of the entire data structure.

To do the same in C, you must manage a dynamic array yourself. You have to track its size and capacity. When size equals capacity, you must request a larger block of memory with realloc(), which copies the old data over and frees the old block if the array has to move. This process forces you to be mindful of resources from day one. You learn to think about efficiency not as a premature optimization but as a core part of writing good code.

In C, many standard library functions signal errors by returning NULL or -1. If you try to open a file and it doesn't exist, fopen() returns NULL. If you don't explicitly check for this NULL value and try to use the file pointer, your program will most likely crash with a segmentation fault. This forces you into the habit of defensive programming: always check return values, anticipate failure modes, and handle errors gracefully. This is a hallmark of professional, production-ready code that is often neglected when a language's exception-handling system makes it easy to be lazy.

Because nothing is done for you automatically, every line of C code is more deliberate. You choose unsigned int over int for a reason. You decide whether a function parameter should be a pointer or a copy. This constant need to make low-level decisions builds a powerful mental muscle, leading to code that is more precise, efficient, and intentional, regardless of the language you ultimately work in.

Conclusion

In conclusion, while you may not write C every day in a web or mobile development job, the lessons it imparts are universal and timeless. Learning C is an investment in your fundamental understanding of computation itself. It makes you a better problem-solver, a more insightful debugger, and a more disciplined engineer — the kind of programmer who doesn't just know how to use a tool but understands how the tool works.

Writing an Nginx-like web server from scratch in C++

Thu, 21 Aug 2025 00:00:00 GMT

Overview of HTTP

When you enter a website name in a browser's address bar, it first gets translated into an IP address via a process called DNS resolution. This IP address denotes the public address of a computer. A public IP address is the internet equivalent of the address of a physical building — anybody can send letters to this address. The browser then sends a message to this IP address asking for a website, i.e., asking for the resource at that location (URL) — the resource is usually an HTML document. A computer can only receive messages on an IP address if a program is "listening" for them. That program is the web server. A server is simply a computer program whose job is to continuously listen for incoming requests, "understand" (parse) those requests, and respond accordingly. In common usage, the word "server" usually refers to the computer (hardware) because that computer's entire purpose in life is to respond to requests from the web. But it is the server program/application (the software) that truly deserves the name "server" because any computer can be a server, including your laptop and your phone, and you can still do other things on that computer while that server runs in the background.

In our example of entering a website name in the browser, the server may simply send back a default file (called an index file, usually an HTML file), which we can say is the homepage of that website. This is just the beginning of the conversation between your browser and the server. When the browser parses that HTML file to display it to you, it usually finds other resources mentioned in the HTML (e.g., images, CSS files, JavaScript), so it sends individual requests to the server for each of those resources (files). Then, as the server sends the requested files one by one, the browser displays them on the page as they are received. These days, all of this (and more!) happens in a fraction of a second. Your interaction with the page will then determine the next cycle of messages being sent back and forth between the browser and the server. In most of the messages that the browser sends, it is requesting a file (e.g., an HTML file or an image), but the browser can also send data for the server to save, e.g., when you upload an image or fill out a form. In addition to simply serving files saved on the server (called static files), it is also possible for a server to generate files on the fly based on some pre-programmed logic, e.g., based on user interaction with the website. A server can also delegate some of its tasks to other programs running on the same machine. For example, it can hand a received request to a Python program and then send the user the output of that program.

The "language" that the browser and the server use to communicate with each other is called HTTP, which stands for Hypertext Transfer Protocol. Just like any other communication protocol, it defines the rules of the communication. A human example (somewhat dated) is the protocol of saying "over" after finishing sending a message on a radio transceiver, such as in military radio communication or on walkie-talkies, if you are old enough to remember those! HTTP was invented in the early 1990s by Tim Berners-Lee, who is the inventor of the broader World Wide Web and HTML (he did so while working at CERN, a European nuclear research agency!). Today, the Internet Engineering Task Force (IETF) is the body that is responsible for standardizing the protocol. The legendary HTTP/1.1, the specific version of HTTP first published in 1997, remained the most widely used version of HTTP on the web for almost two decades. HTTP/2, published in 2015, has now largely overtaken it, and HTTP/3 is quickly gaining support, thanks to the performance benefits of each subsequent generation. HTTP/1.1 uses a simple plain-text format, which makes it very human-readable. This is somewhat surprising, because it could easily have been more cryptic, similar to assembly language or even binary (HTTP/2 and HTTP/3 utilize binary encoding for efficiency).

A concrete example of HTTP communication

Let's take a concrete example to better understand HTTP. Let us visit info.cern.ch, the first website ever created!

When you enter info.cern.ch into the browser, a DNS lookup takes place to resolve info.cern.ch into an IP address, which happens to be 188.184.67.127 for this website (this is an IPv4 address). The browser now knows which building (as it were) to send the message to. It actually sends a message to the IP:port combination 188.184.67.127:443 — the part after the : is called the port. If the IP address were an address of a building, the port would be the specific apartment number within the building. 443 is the standard port used for HTTPS requests — the 'S' stands for Secure and signifies that the data sent back and forth will be encrypted for added security. In the past, using the less secure version of HTTP, the browser would have used 188.184.67.127:80, where 80 denotes the standard HTTP port.

The exact HTTP request that the browser sends depends on things like your specific browser and operating system, but it would be something like this when you visit info.cern.ch:

GET / HTTP/1.1
Host: info.cern.ch
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:140.0) Gecko/20100101 Firefox/140.0
Accept: */*

The first line is the most important: It uses one of the HTTP methods, GET, to request the resource at location / (denoting the root of the domain).

Here is what the server responds with:

HTTP/1.1 200 OK
Date: Tue, 15 Jul 2025 19:09:50 GMT
Server: Apache
Last-Modified: Wed, 05 Feb 2014 16:00:31 GMT
ETag: "286-4f1aadb3105c0"
Accept-Ranges: bytes
Content-Length: 646
Connection: close
Content-Type: text/html

<html><head></head><body><header>
<title>http://info.cern.ch</title>
</header>

<h1>http://info.cern.ch - home of the first website</h1>
<p>From here you can:</p>
<ul>
<li><a href="http://info.cern.ch/hypertext/WWW/TheProject.html">Browse the first website</a></li>
<li><a href="http://line-mode.cern.ch/www/hypertext/WWW/TheProject.html">Browse the first website using the line-mode browser simulator</a></li>
<li><a href="http://home.web.cern.ch/topics/birth-web">Learn about the birth of the web</a></li>
<li><a href="http://home.web.cern.ch/about">Learn about CERN, the physics laboratory where the web was born</a></li>
</ul>
</body></html>

The first line contains 200 OK, denoting a successful response. 200 is one of the HTTP response status codes, which are sent by the server to succinctly communicate what happened to each request. After the first line, all lines before the first empty line are the HTTP headers, and everything after that is the response body (or payload). The response body, in this case, is the contents of a very simple HTML file and is what your browser will render.

On this page (https://info.cern.ch), if you click on another link such as "Browse the first website", the browser will send another request to the server, very similar to the previous one except that the / will be replaced by a different resource location (specified in the URL that we just clicked):

GET /hypertext/WWW/TheProject.html HTTP/1.1
Host: info.cern.ch
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:140.0) Gecko/20100101 Firefox/140.0
Accept: */*

The server again sends the 200 OK response along with the contents of the file at the requested location. And so the communication between the browser and the server continues.

Nginx

So, to deploy/host a website, one needs to use a server that will serve the files related to that website. Today, the majority of the world's websites are served using one of two open-source server applications — Nginx (pronounced "engine x") and Apache. According to W3Techs, as of the date of writing, Nginx is used by 33.8% of all websites, while Apache is used by 27.6%. These days, many websites are being deployed "serverless", which is a misnomer because servers are, of course, still involved, just abstracted away and managed by large cloud service providers (hyperscalers) such as AWS, instead of being directly managed by the website owner.

In order to understand how a server works, one of the best ways to get started is to get familiar with one of these commonly used open-source web server applications. Setting up an Nginx or Apache web server is rather straightforward. You can turn any computer (that is running a supported operating system) into a server. For the base case, Nginx and Apache don't differ much, so for simplicity, let's focus on one of them: Nginx. You need to run only a couple of commands to get a "Hello World" server up and running.

The following should work on any Debian-based Linux distribution, such as Ubuntu. To install Nginx:

sudo apt install nginx

Start the Nginx service (it runs in the background):

sudo service nginx start

The Nginx server is now running and serving a default HTML file.

At any time, you can check the status with:

sudo service nginx status

Go to http://localhost/ in your browser, and you should see the "Welcome to nginx!" default page. This is simply the contents of this HTML file: /var/www/html/index.nginx-debian.html.

You can add an HTML file named index.html to the same directory, and it will be served instead of the default file:

sudo chown -R $USER:$USER /var/www/html # Give yourself ownership of the directory to prevent "permission denied" messages
echo "<h1>Hello world</h1>" > /var/www/html/index.html # Create an index.html with the contents "<h1>Hello world</h1>"
sudo service nginx restart # Optional here: Nginx picks up changes to static files without a restart (a restart or reload is needed after config changes)

Now http://localhost/ should be serving the "Hello world" page. You can change the file to any HTML content, and that is what will be served to the browser. localhost is an alias for the "loopback" IP address (usually 127.0.0.1). When you visit localhost, it is as if you were visiting your site from another computer by typing in the public IP address and port of the computer on which the server is listening (the latter requires messing with various router and firewall settings and can quickly lead to security issues).

Nginx uses a configuration file to control all aspects of how the server will behave and what it will serve. By default, the configuration file being used is /etc/nginx/sites-enabled/default. Here is a simplified version of it:

server {
    listen 80; # Listening on port 80 on all IP addresses of the computer

    root /var/www/html; # The directory in which all the website files will live

    index index.html index.htm index.nginx-debian.html; # The file to serve (first one available in this order) when someone requests the "homepage"

    server_name _; # Sets names of a virtual server such as "example.com". This name is matched with the `Host` header in incoming HTTP requests
}

Writing the server in C++

Now that we have an understanding of HTTP as well as the behavior of Nginx, the most popular server application, we can start writing our own server. We will implement HTTP/1.1.

The C++ program will take a configuration file as an argument (an example compatible with our program is provided here). This config file contains all the settings of our server.

Here is approximately what our C++ program will need to do:

Read and parse the config file and check that it is valid.
Save the input configuration in suitable data structures for easy retrieval as requests arrive.
Indefinitely wait for new connections to arrive on the IP:port combination(s) specified in the config file.
When new connections arrive, queue them suitably and accept as many as the capacity allows.
For each accepted connection that has sent a request, parse the message and prepare a suitable response.
Send the response and wait for further messages and further connections.

Overall, in our code, we aimed to follow the best practices of modern C++ (we used the C++17 standard, since that is the latest one allowed at our campus). We never performed any manual memory allocation using new/delete, but instead used smart pointers (std::unique_ptr). Since lower-level file operations were inevitable (modern C++ does not provide an alternative to C networking APIs yet), we had to use manual open() and close() for file descriptors, but we made sure to use RAII principles (the object destructor is responsible for closing file descriptors). Other modern features include the use of the Filesystem library (std::filesystem) for performing file system operations such as path joining, checking the existence of a file or a directory, creating a directory, changing working directory, etc.

Config file

Our program accepts a simplified version of the Nginx config file.

Various directives are allowed in the configuration file. Directives are divided into simple directives and block directives. A simple directive consists of the name and parameters separated by spaces and ends with a semicolon (;). A block directive has the same structure as a simple directive, but instead of a semicolon, it ends with a set of additional instructions surrounded by braces ({ and }). If a block directive can have other directives inside braces, it is called a context (examples: server and location contexts).

Directives placed in the configuration file outside of any contexts are considered to be in the "global" context. Unlike Nginx, there is no http context for our program. server directives must be in the global context. Comments are allowed: the rest of a line after the # sign is considered a comment.

Here is a more detailed description of the directives allowed in each of the three allowed contexts:

1. "Global" context:

At least one server block is mandatory in the global context.

Allowed directives inside global context:

Directive	Multiple allowed	Duplicates allowed	Is optional	Default value	Number of arguments
`server`	yes	yes (ignore)	no	-	n/a (block)
`root`	no	no	yes	inherited or "html"	1
`index`	yes	yes	yes	index.html	no limit
`error_page`	yes	yes	yes	inherited or none	no limit (at least 2, last is URI)
`autoindex`	no	no	yes	inherited or off	1 (on/off)
`client_max_body_size`	no	no	yes	inherited or 1m	1

2. server context:

The first server block defined for a given IP:port combination becomes the default server for that IP:port. Multiple servers with the same server_name are allowed, but a warning is issued and they are treated as one server (i.e., all subsequent duplicates are ignored).

Allowed directives inside server context:

Directive	Multiple allowed	Duplicates allowed	Is optional	Default value	Number of arguments
`listen`	yes	no	yes	INADDR_ANY:http	1
`server_name`	yes	no	yes	""	no limit
`root`	no	no	yes	inherited or "html"	1
`index`	yes	yes	yes	index.html	no limit
`error_page`	yes	yes	yes	inherited or none	no limit (at least 2, last is URI)
`location`	yes	no	yes	/ {}	block (see below)
`autoindex`	no	no	yes	inherited or off	1 (on/off)
`client_max_body_size`	no	no	yes	inherited or 1m	1
`cgi_handler`	yes	no	yes	-	2 (ext and interpreter)

3. location context:

Allowed directives inside location context:

Directive	Multiple allowed	Duplicates allowed	Is optional	Default value	Number of arguments
`root`	no	no	yes	inherited or "html"	1
`autoindex`	no	no	yes	inherited or off	1 (on/off)
`client_max_body_size`	no	no	yes	inherited or 1m	1
`index`	yes	yes	yes	inherited or index.html	no limit
`error_page`	yes	yes	yes	inherited or none	no limit (at least 2, last is URI)
`cgi_handler`	yes	no	yes	inherited or none	2 (ext and interpreter)
`limit_except`	no	no	yes	GET POST	no limit
`upload_store`	no	no	yes	-	1
`return`	no	no	yes	-	1 or 2 (code URL)

In terms of C++, each of the above 3 contexts is its own class: GlobalConfig, ServerConfig, and LocationConfig. There is only one GlobalConfig object since only one config file is allowed. Each GlobalConfig object can have multiple ServerConfig objects, and each ServerConfig object can have multiple LocationConfig objects.

Socket programming

Even in modern C++, network programming requires dealing with low-level C libraries and APIs that were designed at the dawn of the Internet. This is one of the frequently cited limitations of C++. Various proposals to modernize this aspect of the language have been raised and are currently under consideration by the C++ standards committee.

So, in order to write a web server in C++, if we don't want to use third-party libraries, we must get ready to do some good old C programming! One of the main concepts to grasp is that of a socket.

In UNIX-like systems, a socket is a software abstraction that serves as an endpoint for sending and receiving data. The Berkeley Sockets API, which standardized this concept, was designed to be protocol-agnostic from its inception, supporting different communication domains.

Two common domains are:

Internet Sockets (AF_INET/AF_INET6): Used for communication over a network. These sockets are identified by an IP address and a port number. They do not have a visible representation in the file system.
Unix Domain Sockets (AF_UNIX): Used for inter-process communication on the same host. These are more efficient than network sockets for local communication and are represented by a special file in the filesystem.

Regardless of the type, when a socket is created, the kernel returns a file descriptor — a small integer — that the process uses to read from, write to, and manage the socket. This exemplifies the "everything is a file" philosophy in UNIX-like systems.

There are plenty of tutorials available out there that guide you through the basics of socket programming, so I won't rehash them here. For example, see this article.

Here are the key C library functions that we will need, categorized by the header in which they are defined:

#include <netdb.h>:
- getaddrinfo() for converting the human-readable host:port combinations (provided in the config file) into C-ready struct addrinfo data structures, to be used by the functions below
#include <sys/socket.h>:
- socket() for creating a TCP socket
- bind() for assigning the socket to an IP address and a port (“naming the socket”)
- listen() for listening to incoming connections and creating a queue
- accept() for accepting a connection in the queue
#include <fcntl.h>:
- fcntl() for setting the sockets to non-blocking mode
#include <poll.h>:
- poll() for asking the OS kernel which fds are "ready" to be processed
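
As a rough illustration of how these calls chain together, here is a minimal sketch that turns a host and a port into a non-blocking listening socket. The function name createListeningSocket() is hypothetical and this is not the server's actual code; error handling is reduced to exceptions.

```cpp
#include <cassert>
#include <fcntl.h>
#include <netdb.h>
#include <stdexcept>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Create a non-blocking TCP listening socket for host:port (hypothetical helper).
int createListeningSocket(const std::string &host, const std::string &port)
{
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;     // accept IPv4 or IPv6 results
    hints.ai_socktype = SOCK_STREAM; // TCP

    addrinfo *results = nullptr;
    int rc = getaddrinfo(host.c_str(), port.c_str(), &hints, &results);
    if (rc != 0)
        throw std::runtime_error(std::string{"getaddrinfo: "} + gai_strerror(rc));

    // Try each candidate address until socket(), bind(), and listen() all succeed.
    int fd = -1;
    for (addrinfo *ai = results; ai != nullptr; ai = ai->ai_next)
    {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) == 0 && listen(fd, SOMAXCONN) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(results);
    if (fd == -1)
        throw std::runtime_error("could not listen on " + host + ":" + port);

    // Switch the socket to non-blocking mode, keeping its other flags.
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
    {
        close(fd);
        throw std::runtime_error("fcntl failed");
    }
    return fd;
}
```

The returned descriptor is what would later be registered with poll(); accept() is then called on it whenever poll() reports it as readable.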

The main server loop (`poll`)

poll() is one of the most important functions for the web server. It is a system call that allows the program to monitor multiple file descriptors to see if they are ready for input/output operations (reading or writing) without blocking.

The heart of the server is an infinite loop (the "event loop") that calls poll() on every iteration. By passing certain parameters to poll(), we are basically telling the kernel (of the operating system) which file descriptors we are interested in, and for what "events" — we tell poll() whether we are interested in reading from (POLLIN) or writing to (POLLOUT) a file descriptor, or both (e.g., "tell me when file descriptor listen_fd_1 has a new connection," "tell me when file descriptor connection_fd_1 has data to read," or "tell me when file descriptor connection_fd_2 is ready to accept more data to write").

When poll() is called, our server program sleeps (does not consume CPU) until one or more of the registered events occur, or a specified timeout expires.

When poll() returns, it tells us which file descriptors (sockets) are "ready" and for which events. "Ready" means that we can now read from or write to that file descriptor without worrying that the operation might "block", i.e., freeze the program (as happens, e.g., while waiting for user input on stdin). We then iterate through these ready file descriptors and perform the non-blocking input/output operations (like accept() to accept new connections, read()/recv() to read incoming messages, and write()/send() to send back HTTP responses).

This is what allows the server to constantly be "listening" without consuming excess resources. poll() lets us pass responsibility from our program to the OS kernel. The kernel knows how to do this job better than almost any code that we can write. Kernels have had this kind of capability (checking the status of files to see which ones are "ready") since before the invention of the web.
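
Here is a minimal sketch of a single iteration of such a loop. The names pollOnce() and pipeBecomesReadable() are made up for illustration, and a pipe stands in for a client connection so that the example does not need a network.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <stdexcept>
#include <unistd.h>
#include <vector>

// One iteration of an event loop: ask the kernel which descriptors are ready.
// Returns the entries that reported at least one event.
std::vector<pollfd> pollOnce(std::vector<pollfd> &watched, int timeoutMs)
{
    std::vector<pollfd> ready;
    int n = poll(watched.data(), watched.size(), timeoutMs);
    if (n == -1)
    {
        if (errno == EINTR) // interrupted by a signal: not an error
            return ready;
        throw std::runtime_error("poll failed");
    }
    for (const pollfd &p : watched)
        if (p.revents != 0) // POLLIN, POLLOUT, POLLHUP, POLLERR, ...
            ready.push_back(p);
    return ready;
}

// Small demonstration: a pipe's read end becomes "ready" once data is written.
bool pipeBecomesReadable()
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;
    std::vector<pollfd> watched(1);
    watched[0].fd = fds[0];
    watched[0].events = POLLIN;
    bool idleBefore = pollOnce(watched, 0).empty(); // nothing written yet
    bool wrote = write(fds[1], "x", 1) == 1;
    bool readyAfter = pollOnce(watched, 1000).size() == 1;
    close(fds[0]);
    close(fds[1]);
    return idleBefore && wrote && readyAfter;
}
```

In a real server, the loop would call something like pollOnce() repeatedly and dispatch on each ready entry: accept() for listening sockets, recv() or send() for client connections.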

This event loop continues indefinitely until a UNIX signal (SIGINT) is received. A signal handler ensures that the server shuts down gracefully after performing any necessary cleanup of resources (closing file descriptors, freeing memory, etc.).
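
A common way to wire this up, shown here as a sketch rather than as the server's actual code, is a handler that only sets a flag, which the event loop checks on every iteration. The names g_stopRequested, handleSigint(), and installSigintHandler() are hypothetical.

```cpp
#include <cassert>
#include <signal.h>

// Set by the handler, checked by the event loop on every iteration.
volatile sig_atomic_t g_stopRequested = 0;

extern "C" void handleSigint(int)
{
    g_stopRequested = 1; // only async-signal-safe work belongs in a handler
}

// Install the handler; returns false if sigaction() fails.
bool installSigintHandler()
{
    struct sigaction sa{};
    sa.sa_handler = handleSigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, nullptr) == 0;
}
```

The loop condition then becomes something like while (!g_stopRequested), and the cleanup code runs after the loop exits, outside the handler.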

HTTP messages

Parsing of incoming request messages:

When a request arrives on our server, the first thing the server must do is to parse it and make sure that it is a valid HTTP request (follows the rules of the HTTP protocol, only version HTTP/1.1 in our case). Again, I won't go through the details here myself since there are already great guides out there. MDN in particular is a fantastic resource with very readable details on the HTTP protocol (and much more regarding the web). We must keep in mind things like case (in)sensitivity (e.g., header name is case-insensitive), that line separator is CRLF (\r\n, not \n), and that the headers are separated from the body by a blank line (\r\n\r\n). Here is an image from MDN that I find very helpful:
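
To illustrate the rules just mentioned (CRLF line endings, case-insensitive header names, a blank line before the body), here is a minimal parsing sketch. RequestHead and parseRequestHead() are hypothetical names, and the parser is more lenient than the HTTP specification in places (for example, it tolerates repeated spaces in the request line).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

struct RequestHead
{
    std::string method, target, version;
    std::map<std::string, std::string> headers; // names stored in lowercase
};

// Parse everything up to the blank line that separates headers from the body.
RequestHead parseRequestHead(const std::string &raw)
{
    const std::size_t headEnd = raw.find("\r\n\r\n");
    if (headEnd == std::string::npos || headEnd == 0)
        throw std::runtime_error("incomplete request head");

    RequestHead req;
    std::size_t pos = 0;
    bool firstLine = true;
    while (pos < headEnd)
    {
        std::size_t eol = raw.find("\r\n", pos); // always found, at headEnd at the latest
        std::string line = raw.substr(pos, eol - pos);
        pos = eol + 2;
        if (firstLine)
        {
            std::istringstream in(line);
            if (!(in >> req.method >> req.target >> req.version) || req.version != "HTTP/1.1")
                throw std::runtime_error("bad request line");
            firstLine = false;
            continue;
        }
        std::size_t colon = line.find(':');
        if (colon == std::string::npos || colon == 0)
            throw std::runtime_error("bad header line");
        std::string name = line.substr(0, colon);
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        // Strip optional whitespace around the value.
        std::size_t start = line.find_first_not_of(" \t", colon + 1);
        std::string value = (start == std::string::npos) ? std::string{} : line.substr(start);
        value.erase(value.find_last_not_of(" \t") + 1);
        req.headers[name] = value;
    }
    return req;
}
```

Storing header names in lowercase means later lookups such as headers.at("content-length") work no matter how the client capitalized the name.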

Our web server supports three HTTP methods: GET, POST, and DELETE. HTTP methods indicate the purpose of the request and what is expected if the request is successful:

GET: Tells the server to retrieve a resource at a specified location.
POST: Submits some data to the specified resource, often causing a change in state on the server.
DELETE: Deletes the specified resource.

If the requested method (always in the first line of an HTTP request) is none of these three, the request is considered invalid by our server, and an error response is sent back, e.g.:

HTTP/1.1 501 Not Implemented
Content-Type: text/html
Content-Length: 158
Server: Webserv
Date: Sun, 17 Aug 2025 21:31:41 GMT

<html>
  <head>
    <title>501 Not Implemented</title>
  </head>
  <body>
    <h1>501 Not Implemented</h1>
    <p>The server does not support the facility required.</p>
  </body>
</html>

The server also checks whether the value of the Content-Length header matches the actual size of the content received. The incoming request may instead be "chunked": chunked encoding is a mechanism in HTTP/1.1 that allows the sender to transfer a request or response body as a series of "chunks" without knowing the total size in advance. Instead of Content-Length, the sender includes the header Transfer-Encoding: chunked, which signals to the receiver (our server) that the body will not be a single block of data of a known size, but will follow a special format: each chunk is preceded by its size in hexadecimal, and a zero-sized chunk marks the end. The server must then "unchunk" the body, i.e., reassemble the original data from the chunks.
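
As an illustration of the chunked format, here is a minimal decoding sketch. It assumes the whole chunked body is already in memory and ignores chunk extensions and trailers; the function name unchunk() is hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Decode a complete chunked body of the form
//   <hex size>CRLF<data>CRLF ... 0CRLF CRLF
// Chunk extensions and trailers are not supported in this sketch.
std::string unchunk(const std::string &chunked)
{
    std::string body;
    std::size_t pos = 0;
    while (true)
    {
        std::size_t eol = chunked.find("\r\n", pos);
        if (eol == std::string::npos)
            throw std::runtime_error("missing chunk-size line");
        unsigned long size = 0;
        try
        {
            size = std::stoul(chunked.substr(pos, eol - pos), nullptr, 16);
        }
        catch (const std::exception &)
        {
            throw std::runtime_error("bad chunk size");
        }
        pos = eol + 2;
        if (size == 0) // the zero-sized chunk ends the body
            return body;
        const std::size_t remaining = chunked.size() - pos;
        if (size > remaining || remaining - size < 2 || chunked.compare(pos + size, 2, "\r\n") != 0)
            throw std::runtime_error("truncated chunk");
        body.append(chunked, pos, size);
        pos += size + 2; // skip the data and its trailing CRLF
    }
}
```

For example, the input "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n" decodes to "Wikipedia". The size of the decoded body is also the value that would be passed on as CONTENT_LENGTH to a CGI program.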

Generating response messages:

After successfully parsing and validating a request, the server starts working on generating an appropriate response. In our C++ code, this is done by separate classes for GET, POST, and DELETE (in addition to a class for handling unknown/bad requests), which all inherit from an HTTPRequest abstract base class with the main pure virtual function being generateResponse().

The server must never do a read or a write operation without going through poll() (to be sure that the operation will be non-blocking). So generateResponse() acts like a state machine. Each time generateResponse() needs to read from or write to a file, it does some work, saves its current working state at the point where it needs to read or write, registers the read or write file descriptor with poll, and returns. When poll() says the file descriptor is ready for reading/writing, the generateResponse() function will be called again, recognize (based on the previously saved state) that this is not the first time it is being called, and continue where it left off last time. Every time it returns, the main server loop checks if the response is ready, so that it can be sent to the client.
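
The resumable behavior described above could look roughly like the following sketch. HTTPRequest and generateResponse() are the names used in the project; the ResponseState enum, the GetRequest class, and onFileData() are assumptions made up for illustration, and the interaction with poll() is only simulated.

```cpp
#include <cassert>
#include <string>

// Hypothetical states; the real server may track more of them.
enum class ResponseState { Start, WaitingForFile, Done };

class HTTPRequest
{
public:
    virtual ~HTTPRequest() = default;
    // Does as much work as possible without blocking, then returns.
    virtual void generateResponse() = 0;
    bool responseReady() const { return _state == ResponseState::Done; }
    const std::string &response() const { return _response; }

protected:
    ResponseState _state = ResponseState::Start;
    std::string _response;
};

class GetRequest : public HTTPRequest
{
public:
    // Called by the event loop once poll() reported the file as readable
    // and its contents have been read.
    void onFileData(const std::string &data)
    {
        _fileData = data;
        _dataArrived = true;
    }

    void generateResponse() override
    {
        switch (_state)
        {
        case ResponseState::Start:
            // The real server would open the file here and register its fd with poll().
            _state = ResponseState::WaitingForFile;
            return; // hand control back to the event loop
        case ResponseState::WaitingForFile:
            if (!_dataArrived)
                return; // not ready yet; the loop will call again later
            _response = "HTTP/1.1 200 OK\r\nContent-Length: " + std::to_string(_fileData.size()) +
                        "\r\n\r\n" + _fileData;
            _state = ResponseState::Done;
            return;
        case ResponseState::Done:
            return;
        }
    }

private:
    std::string _fileData;
    bool _dataArrived = false;
};
```

Each call either advances the saved state or returns immediately, so the event loop can call generateResponse() as often as it likes and only needs to check responseReady() afterwards.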

The requested resource path is always built by adding the URI to the root. E.g., if the first line of the request is GET /some_file_or_dir HTTP/1.1 and the root (set in the configuration file) is /var/www/my_website/, the resource path is root + URI = /var/www/my_website/some_file_or_dir. And if the first line of the request is GET / HTTP/1.1, the resource path is root + URI = /var/www/my_website/ (same as root).
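
With std::filesystem, the root + URI concatenation can be sketched as below. buildResourcePath() is a hypothetical helper, and the URI is assumed to be already validated and stripped of its query string.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// root + URI without producing a double slash.
std::filesystem::path buildResourcePath(const std::filesystem::path &root, const std::string &uri)
{
    // relative_path() drops the leading "/", so operator/ appends to root
    // instead of replacing it with an absolute path.
    return root / std::filesystem::path(uri).relative_path();
}
```

With the root /var/www/my_website/, the URI /some_file_or_dir gives /var/www/my_website/some_file_or_dir, and the URI / gives the root itself.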

For GET responses, the generateResponse() function first checks whether the requested path (root + URI) exists; if it does not, the response is a 404 error. If the requested path is a file (except for CGI files), the server reads the file and sends its contents as the body of the response. If the requested resource is a directory, the server checks whether an index file has been set in the server configuration file. If such a file is found in the directory, it is chosen for the response; otherwise, a 404 error is returned.
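
The decision logic for GET can be sketched as follows. resolveGetTarget() and readWholeFile() are hypothetical helpers; the file is read with a plain blocking stream here purely for brevity, whereas the real server goes through poll().

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns the file to serve, or std::nullopt when the answer is 404.
std::optional<fs::path> resolveGetTarget(const fs::path &resourcePath,
                                         const std::vector<std::string> &indexFiles)
{
    std::error_code ec; // non-throwing overloads: a missing path is simply "false"
    if (fs::is_regular_file(resourcePath, ec))
        return resourcePath;
    if (fs::is_directory(resourcePath, ec))
        for (const std::string &name : indexFiles)
            if (fs::is_regular_file(resourcePath / name, ec))
                return resourcePath / name;
    return std::nullopt;
}

// Blocking read, for brevity only.
std::string readWholeFile(const fs::path &file)
{
    std::ifstream in(file, std::ios::binary);
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}
```

The index list is tried in order, which matches a directive that accepts several file names.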

For POST, the URI, instead of being a file or directory, represents an endpoint that will process the data the client is sending. POST /submit-form HTTP/1.1 doesn't mean there's a file or directory named submit-form. It means there is a piece of logic on the server, associated with the /submit-form path, that knows what to do with the incoming form data. For form data, POST usually hands over the task to a CGI program. If there is no CGI program set up at the requested URI, it is considered to be a request to upload data to a file on the server. In our particular case, uploads are only allowed if a location block is provided in the configuration file with an upload_store directive. This will allow uploads for that particular location (URI/endpoint) only. For example,

   location /upload { 
       upload_store /path/to/upload/directory;
   }

This will allow uploads only for requests such as POST /upload HTTP/1.1; other requests, like POST /some-uri HTTP/1.1, will get an error. The request body for POST requests is usually not plain text that can be written to a file as-is; the body is usually encoded. Once a valid upload request is received, the POST version of generateResponse() must parse the encoded request body. Uploads via POST requests almost always use the multipart/form-data encoding, identified by the Content-Type header (see RFC for more details on the syntax). For simple form data, the application/x-www-form-urlencoded encoding may be used. Less commonly, for file uploads, application/octet-stream might be used to send raw binary data. It is also possible to see a non-encoded media type, such as Content-Type: text/plain, Content-Type: text/html, or Content-Type: application/json, in which case the body's raw data is saved in a file with an appropriate extension. If a file upload is successful, a 201 Created response is sent; otherwise, an appropriate error response, such as 415 Unsupported Media Type, is sent.

For DELETE, the URI refers to the file or directory that should be deleted. In our server, DELETE is disallowed by default in all locations, and must be enabled manually for individual locations. For example, if DELETE is enabled in the /uploads location, then for a request DELETE /uploads/photo123.jpg HTTP/1.1, the server will delete the file if it exists and send a 200 OK response (202 Accepted or 204 No Content can also be sent). If the file does not exist, the response is 404 Not Found. If the resource is not inside the upload_store directory (e.g., DELETE /index.html HTTP/1.1), the response is 403 Forbidden.

Regardless of the method requested, redirections in our server are configured using a return directive in the config file. For example, the config file might contain:

    location /redirect-demo {
        return 301 "https://www.youtube.com/watch?v=dQw4w9WgXcQ";
    }

Then any request to the URI /redirect-demo will be redirected to the URL https://www.youtube.com/watch?v=dQw4w9WgXcQ. A redirect response might be:

HTTP/1.1 301 Moved Permanently
Content-Type: text/html
Location: https://www.youtube.com/watch?v=dQw4w9WgXcQ
Content-Length: 105
Server: Webserv
Date: Mon, 18 Aug 2025 14:15:11 GMT

<html>
  <head>
    <title>301 Moved Permanently</title>
  </head>
  <body>
    <h1>301 Moved Permanently</h1>
  </body>
</html>

CGI

The Common Gateway Interface (CGI) is a standard protocol that allows a web server to execute external programs (CGI scripts) to process and generate content for web requests. Just like HTTP/1.1, the CGI/1.1 specification is described in an RFC. Legacy CGI is not very commonly used nowadays, as much more performant and feature-rich alternatives now exist, such as FastCGI, servlets, and web frameworks. Our server uses CGI/1.1 for educational purposes.

Our config file allows a cgi_handler directive, which takes two arguments: a file extension and the path to the interpreter that handles files with that extension. The interpreter can be for any language, e.g., Perl, PHP, or Python. For example, the following can be specified in the config file:

server {
    listen localhost:9743;
    server_name localhost;
    root ./assets/default_website;
    cgi_handler .py /usr/bin/python3;
    location /cgi-demo {
        index hello.py;
    }
}

Here, the cgi_handler is provided in the server context and will be inherited by any location blocks within that server block. What this means is that any request for a resource name ending with .py will be considered a CGI request within this server block. In addition, requests to the /cgi-demo location will also be considered CGI requests since its index file ends with .py.

The general idea is that the server:

Identifies a requested resource as a CGI script (e.g., /cgi-demo/hello.py).
Prepares an environment for the script, packing all the request details (method, headers, query string, etc.) into environment variables.
Executes the script using the configured interpreter (e.g., /usr/bin/python3).
Pipes the request body (if any, like from a POST) to the script's standard input (stdin).
Reads the script's standard output (stdout). This output contains the response headers and body that the server should send back to the client.
Waits for the script to terminate and cleans up.

Just like network programming, modern C++ is annoyingly lacking in features to create, manage, and communicate with sub-processes (child processes). So we must again go back to C APIs. We will use fork() to create a child process, execve() to execute the CGI program, and pipe() (with the help of dup2()) to communicate with the child process. waitpid() is used to check the status of the child process, and kill() can be used to terminate the process if needed.

The CGI protocol has very specific rules (just like the HTTP protocol) about how a request is passed to the CGI process. The most critical part is the preparation of environment variables that must be passed to the CGI program (via the execve system call). Here are the essential ones based on the CGI/1.1 specification:

Variable	Description	Example Value
`GATEWAY_INTERFACE`	The CGI version.	`CGI/1.1`
`SERVER_PROTOCOL`	The protocol of the request.	`HTTP/1.1`
`REQUEST_METHOD`	The HTTP method.	`GET`, `POST`, etc.
`REQUEST_URI`	The full, original request URI.	`/path/to/script.py?user=test`
`SCRIPT_FILENAME`	The absolute filesystem path to the script.	`/var/www/html/path/to/script.py`
`SCRIPT_NAME`	The virtual path to the script.	`/path/to/script.py`
`QUERY_STRING`	The part of the URI after the `?`.	`user=test`
`CONTENT_TYPE`	The `Content-Type` header from the client request.	`application/x-www-form-urlencoded`
`CONTENT_LENGTH`	The length of the request body. If the request was chunked, this should be the total size of the unchunked body.	`15`
`SERVER_NAME`	The server's hostname or IP.	`example.com`
`SERVER_PORT`	The port the server received the request on.	`80`
`REMOTE_ADDR`	The IP address of the client.	`192.168.1.10`

All other headers from the request must also be passed as environment variables. The convention is to prefix them with HTTP_, convert the header name to uppercase, and replace hyphens (-) with underscores (_). For example, User-Agent becomes HTTP_USER_AGENT and Accept-Language becomes HTTP_ACCEPT_LANGUAGE.
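
The naming convention is mechanical enough to capture in a few lines. This is a minimal sketch with a hypothetical function name, toCgiVariableName().

```cpp
#include <cassert>
#include <cctype>
#include <string>

// "User-Agent" -> "HTTP_USER_AGENT"
std::string toCgiVariableName(const std::string &headerName)
{
    std::string name = "HTTP_";
    for (unsigned char c : headerName)
        name += (c == '-') ? '_' : static_cast<char>(std::toupper(c));
    return name;
}
```

Each resulting NAME=value string would then be added to the environment array that is handed to execve().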

We then use the standard UNIX way to create the child process. Two pipes are needed to communicate with it: one for our server to write the request body to the CGI's stdin, and the other for our server to read the response from the CGI's stdout. We use a CGISubprocess class for this:

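CGISubprocess::CGISubprocess()
{
    if (pipe(_pipe_to_cgi) != 0)
        throw std::runtime_error("Failed to create pipe to CGI: " + std::string{strerror(errno)});
    if (pipe(_pipe_from_cgi) != 0)
    {
        const int err = errno; // save errno before close() can overwrite it
        close(_pipe_to_cgi[0]);
        close(_pipe_to_cgi[1]);
        throw std::runtime_error("Failed to create pipe from CGI: " + std::string{strerror(err)});
    }
    // Only the parent's ends go through poll(), so only they are made non-blocking.
    // O_NONBLOCK set on the child's ends would carry over to the CGI program's
    // stdin/stdout, which most scripts expect to be blocking.
    setNonBlocking(_pipe_to_cgi[1]);   // parent writes the request body here
    setNonBlocking(_pipe_from_cgi[0]); // parent reads the CGI output here
}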

void CGISubprocess::createSubprocess(const std::filesystem::path &filePathAbs, const std::string &interpreter)
{
    // fork (create a duplicate of the current process)
    _pid = fork();
    if (_pid == -1)
        throw std::runtime_error("Failed to create fork for CGI: " + std::string{strerror(errno)});

    // in child
    else if (_pid == 0)
    {
        // change current working directory to script directory
        std::filesystem::current_path(filePathAbs.parent_path());

        close(_pipe_to_cgi[1]); // Close write end of the pipe (parent will write to it)
        dup2(_pipe_to_cgi[0], STDIN_FILENO); // redirect stdin to read end of pipe_to_cgi
        close(_pipe_to_cgi[0]); // close redirected fd
        close(_pipe_from_cgi[0]); // close read end of the pipe (parent will read from it)
        dup2(_pipe_from_cgi[1], STDOUT_FILENO); // redirect stdout to write end of pipe_from_cgi
        close(_pipe_from_cgi[1]); // close redirected fd

        // prepare args for execve (the interpreter is the program and the script file is its argument, like running `python3 hello.py`)
        char *args[] = {const_cast<char *>(interpreter.c_str()), const_cast<char *>(filePathAbs.c_str()), nullptr};

        // execve replaces the process image, so it returns only if it failed
        execve(args[0], args, _envp.data());
        // std::exit would also work here, but it would not run the destructors of local objects
        throw std::runtime_error("execve failed: " + std::string{strerror(errno)});
    }
    // in parent
    else if (_pid > 0)
    {
        _subprocessStarted = true;
        // close unneeded pipes
        close(_pipe_to_cgi[0]);
        close(_pipe_from_cgi[1]);
    }
}

We then register the two remaining open file descriptors in the parent process with poll() (one to write to the CGI and one to read from it), which will tell us whether the file descriptors are ready for reading/writing or not. After writing to the child and reading back from it successfully, all that remains to be done is to send a response to the client. The response we read from the CGI is almost a full response, but not quite. For example, it can contain the header Status: 200 OK, which we should convert to a proper start line, HTTP/1.1 200 OK. After this conversion, the response is complete and can finally be sent to the client.
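
The final conversion step might look like the sketch below. It assumes that the CGI output uses CRLF line endings and a case-sensitive "Status:" header, which real scripts do not always respect; cgiOutputToResponse() is a hypothetical name. When the script sends no Status header, the CGI/1.1 specification says that 200 OK is assumed.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Turn raw CGI output into an HTTP response by replacing the optional
// "Status:" header with a proper start line.
std::string cgiOutputToResponse(const std::string &cgiOutput)
{
    const std::size_t headEnd = cgiOutput.find("\r\n\r\n");
    if (headEnd == std::string::npos)
        throw std::runtime_error("CGI output has no header/body separator");

    std::string status = "200 OK"; // assumed when the script sends no Status header
    std::string headers;
    std::size_t pos = 0;
    while (pos < headEnd)
    {
        std::size_t eol = cgiOutput.find("\r\n", pos); // always found, at headEnd at the latest
        std::string line = cgiOutput.substr(pos, eol - pos);
        pos = eol + 2;
        if (line.compare(0, 7, "Status:") == 0)
        {
            std::size_t start = line.find_first_not_of(' ', 7);
            if (start != std::string::npos)
                status = line.substr(start);
        }
        else
            headers += line + "\r\n"; // every other header is passed through unchanged
    }
    return "HTTP/1.1 " + status + "\r\n" + headers + "\r\n" + cgiOutput.substr(headEnd + 4);
}
```

A fuller version would also add headers the script did not provide, such as Content-Length, Server, and Date.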

Closing remarks

So hopefully, this provides some insight into the inner workings of a web server, a technology that lies at the heart of the web that we all interact with every day. Admittedly, it can be frustrating to go back to understand old-school technologies and use verbose C/C++ code to implement what modern languages and frameworks can achieve at the snap of a finger. But modern frameworks can only do that because somebody else has already done the difficult work, notwithstanding the fact that a surprisingly large amount of infrastructure in the present day still runs on decades-old technologies. I, for one, learned a great deal by working on this project and expect these lessons to be useful in a wide range of development tasks.

Podcasts that I recommend

Wed, 23 Apr 2025 00:00:00 GMT

I listen to a lot of podcasts and have learned a great deal from them. Here I will share a categorized list of the shows as well as episodes I found the most useful. Hopefully it can help you discover and learn something useful.

They are listed roughly in descending order of when I listened to them. So the ones at the top of the table reflect my latest thinking, while the ones at the bottom could be things that I was interested in or agreed with in the past, but possibly not anymore.

Shows

Honorable mentions

Deep Questions: Cal Newport’s approach to productivity (detailed in his book Deep Work) has been an important influence on me. On the Deep Questions podcast, Cal answers questions and shares case studies about putting these ideas into practice in the real world.
Astral Codex Ten Podcast: Audio version of one of my favorite blogs, Astral Codex Ten.
Revolutions: A history podcast covering the American, French, Russian, and other revolutions, by Mike Duncan of The History of Rome.

Episodes

You might also be interested in my post about detailed testing and comparison of podcast apps.

Understanding the universality of computation

Tue, 08 Jul 2025 00:00:00 GMT

The Turing principle is so fundamental to our understanding of reality that David Deutsch considers it to be one of the four main constituents of his "theory of everything" (the other three being quantum theory, neo-Darwinian theory of evolution, and Popper's theory of epistemology).

The Church-Turing Conjecture

In 1936, mathematicians Alan Turing, Alonzo Church, and Emil Post — trying to understand the precise nature of "computing" (or "calculating" or "proving") — independently conjectured that what can or cannot be computed does not depend on things like the design of the computer, but is universal. This is now known as the Church-Turing conjecture (or hypothesis or thesis) and can be stated as follows:

Everything that is computable can be computed by a Turing machine.

The Turing machine is not a very complex object. The example that Alan Turing gave was that of a long strip of paper with a mechanism to read, write, and erase symbols on it, and move to other parts of the strip depending on the symbols. But this is just one implementation of the Turing machine, which is an abstract concept. It's irrelevant what material the computer is made of — it just needs to satisfy the read-write-erase-move criteria (i.e., be Turing-complete). This makes the Turing machine a universal computer (an abstract one), a computer that can do what any other computer can do.
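
To show how little machinery the read-write-erase-move procedure needs, here is a minimal simulator sketch in C++. The names Rule, Program, runTuringMachine(), and binaryIncrement() are made up for illustration, as are the state names "start", "carry", and "halt". The example program adds 1 to a binary number written on the tape.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// A transition: on (state, symbol), write a symbol, move the head, change state.
struct Rule
{
    char write;
    int move; // -1 = left, 0 = stay, +1 = right
    std::string next;
};
using Program = std::map<std::pair<std::string, char>, Rule>;

// Run from state "start" until "halt" (or maxSteps) and return the tape
// without trailing blanks. '_' is the blank symbol; the tape grows on demand.
// A missing rule makes Program::at throw std::out_of_range.
std::string runTuringMachine(const Program &program, std::string tape, std::size_t maxSteps = 10000)
{
    std::string state = "start";
    std::size_t head = 0;
    for (std::size_t step = 0; step < maxSteps && state != "halt"; ++step)
    {
        if (head >= tape.size())
            tape.resize(head + 1, '_'); // extend the tape to the right
        const Rule &rule = program.at({state, tape[head]});
        tape[head] = rule.write;
        if (rule.move > 0)
            ++head;
        else if (rule.move < 0)
        {
            if (head == 0)
                tape.insert(tape.begin(), '_'); // extend the tape to the left
            else
                --head;
        }
        state = rule.next;
    }
    tape.erase(tape.find_last_not_of('_') + 1);
    return tape;
}

// Walk to the right end of the number, then propagate the carry leftwards.
Program binaryIncrement()
{
    return {
        {{"start", '0'}, {'0', +1, "start"}},
        {{"start", '1'}, {'1', +1, "start"}},
        {{"start", '_'}, {'_', -1, "carry"}},
        {{"carry", '1'}, {'0', -1, "carry"}},
        {{"carry", '0'}, {'1', 0, "halt"}},
        {{"carry", '_'}, {'1', 0, "halt"}},
    };
}
```

The simulator itself never changes; only the table of rules does. That separation between a fixed mechanism and a replaceable program is the point of the construction.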

The epoch-making genius of Alan Turing, and why he deserves special mention among the other mathematicians who came at this, was that his model was the closest to being physical (as opposed to abstract mathematical) and he fully understood and articulated this universality of computation: the fact that there is nothing special about the hardware of a computer — the nature of computation does not depend on which physical object is performing it. He conjectured that one device as simple as the Turing machine can perform any possible computation — it just needs to be provided with the right algorithm (software) to do so. If something is computable, a Turing machine can compute it; if something is uncomputable, no other possible object can compute it.

Remember that this was before the "age of computing". Turing later went on to propose, quite correctly, that a universal computer such as a Turing machine could even be made to "think", i.e., become intelligent, if it is provided with the right "programming". This is why Alan Turing is considered one of the "founding fathers" of the fields of computer science as well as artificial intelligence. (Interestingly, Ada Lovelace, almost a hundred years before Turing, came close to understanding this universality, suggesting that computers could be made to do much more than number-crunching, such as music generation, but she erroneously ruled out the possibility of their originating autonomous thought.)

It might seem like an abrupt jump to go from "being able to compute everything that is computable" to "being able to simulate a mind". I will explain the connection shortly.

But first, I hasten to add that it is now known that the above-stated Church-Turing hypothesis could at best only be true of classical computation, because quantum computers are capable of performing computations that no classical computer can (a Turing machine is a classical computer). One example of such a computation is the generation of true random numbers. But that is not a fatal blow to the hypothesis. As I mentioned, the Turing machine is just one example that Turing used to prove that it is possible for one machine to perform any possible computation. To make the Turing machine truly universal, we just need to modify its design to make it a quantum computer. So, restating the Church-Turing hypothesis to take that fact into account:

There exists an abstract universal computer that can compute everything that is computable.

Stating the hypothesis this way also takes the focus off a single design of a computer — the Turing machine. I think many people learning about this focus too much on the Turing machine itself and miss the truly astonishing part: the nature of reality is such that one computer, no matter where it is in the universe and no matter what it is made of, can compute what anything else in the universe can compute.

Computation connects mathematics to physics

Turing did not explicitly state it, but it is implicit in his work that "everything that is computable" refers to everything that is computable in physical reality, because the laws of physics determine what is and is not computable.

To compute means to perform a calculation. It refers to the physical application of some mathematical theory (rules and formulae). Given a (physical) input, a computer processes it (using some physical medium) and provides a certain (physical) output. Humans have been building computing devices since prehistoric times. Using tally marks or even your fingers to do counting is like using a crude computer. The abacus was a more sophisticated early computer. The ancient Greeks built devices to predict eclipses, e.g., the Antikythera mechanism. The modern word "computer" has come to mean a programmable computer, meaning we can relatively easily change the algorithm/software it uses to convert an input into an output without changing anything about the hardware. But these are just a special case of a "computer" in the deeper sense.

So computing is how mathematics (which is abstract) interacts with physical reality. Even if a mathematician is working on "pure mathematics" without using a conventional computer, she is still using some physical system, mainly her brain along with maybe pen and paper, to perform computations. Mathematically proving a theorem is exactly equivalent to performing some computation (usually using the computer known as the human brain).

When we re-examine the Church-Turing conjecture while keeping this physical definition of computation in mind, it has been suggested that it be called the "Turing principle", not a hypothesis or a conjecture, to make it clear that it is a physical principle, comparable to, for example, the laws of thermodynamics or the gravitational equivalence principle. So the Turing principle can be stated as follows:

There exists an abstract universal computer that can perform any computation that any physical object can perform.

We are still talking about an abstract universal computer. However, note that the only thing that makes the Turing machine abstract instead of real is that it requires unlimited hardware resources: unlimited memory (i.e., an infinitely long paper tape), unlimited running time (no limit to how much time it can take to perform computations), and an unlimited energy supply. This is needed to remove the limitation of computations that can be performed in principle but are not practicable because they require infeasible amounts of memory and/or time. In order to translate the abstraction of the Turing machine into a real physical device, all we need to say is that such a physically possible device can compute anything that is computable, as long as we provide it with enough paper tape, time, and energy. Similarly, the quantum equivalent of the Turing machine can also be conceived, and has been described by David Deutsch.

Clearly, given the above, it must be possible to build a physical universal computer. It will only be limited by the hardware that it has access to. And we can continue to supply it with resources (energy and memory) until we run out of matter in the universe. So we can do away with talking about abstract computers and talk about physically possible computers:

It is possible to build a universal computer, a physical object that can perform any computation that any other physical object can perform.

Simulating anything in the universe

Now, why should "everything that is computable" be of interest to anybody other than mathematicians and theoretical computer scientists? Why should the Turing principle deserve to be considered a part of any viable "theory of everything"?

The answer: Because computers can simulate physical reality. It turns out that simulating physical processes using a computer has also been common since prehistoric times. Consider a goat herder putting tally marks on a wooden stick to count his goats. As the goats leave the pen, he puts a tally mark on a stick for each goat. When the goats return, he checks whether the number of tally marks is equal to the number of goats. If they are not, either some goats are missing or he has got somebody else’s goats (assuming his calculation is accurate). So one physical system, the goats, is being simulated by a completely different physical system: tally marks on a wooden stick. Note that in order to do this, the goat herder must assume that the laws of physics are such that it is irrelevant what physical system he uses to simulate the count of goats. Of course, this example is that of a very crude simulation, and we know that we can do much better.

Simulating physical processes using computers is nowadays known as generating virtual reality. The hardware that exists today can already generate sounds indistinguishable from natural sounds to the human ear. Indeed, it's meaningless to ask whether a sound is "real" or "artificial" if it is causing the exact same effect on the listener's ear and brain. The technology of computer-generated graphics is rapidly improving, and those for other senses like touch, smell, and taste are also not difficult to foresee. To create a perfect simulation of reality, the hardware advances that need to be made are relatively trivial compared to the software advances. The software advances in question are no different from advances in understanding the physics of the real world. Thus, it is in principle possible to create a virtual reality rendering that is indistinguishable from actual reality, if we knew the real laws of physics to arbitrary accuracy. To take the converse of that: Computers can perfectly simulate (a given part of) reality because reality is rather like a computer; the initial conditions are the input, the laws are the software/program, and the final conditions are the output. This breathtaking insight is due to David Deutsch, who has stated the strongest, all-embracing, physical form of the Turing principle, known as the Church–Turing–Deutsch principle:

Every finite physical process can be perfectly simulated by a finite universal computer.

These days we take for granted the ability of computers to simulate everything from the aerodynamics of a vehicle to the collision of galaxies. The Turing principle (in its strongest form that Deutsch has argued for) makes it clear that our computers can simulate any part of reality with perfect accuracy, given the right program. This is a profound fact about reality that has far-reaching consequences.

David Deutsch uses the term "self-similarity" for this remarkable property of reality:

... physical reality is self-similar on several levels: among the stupendous complexities of the universe and multiverse, some patterns are nevertheless endlessly repeated. Earth and Jupiter are in many ways dramatically dissimilar planets, but they both move in ellipses, and they are made of the same set of a hundred or so chemical elements (albeit in different proportions), and so are their parallel-universe counterparts. The evidence that so impressed Galileo and his contemporaries also exists on other planets and in distant galaxies. The evidence being considered at this moment by physicists and astronomers would also have been available a billion years ago, and will still be available a billion years hence. The very existence of general, explanatory theories implies that disparate objects and events are physically alike in some ways. The light reaching us from distant galaxies is, after all, only light, but it looks to us like galaxies. Thus reality contains not only evidence, but also the means (such as our minds, and our artefacts) of understanding it.

It is the presence of self-similarity that makes the world comprehensible to beings like us. The Turing principle shows one of the ways in which reality is self-similar. There is something exceptionally computation-friendly about the laws of physics as we find them.

Virtual reality is how we understand

The human brain is a physical object, a universal computer — at least a classical universal computer since it is Turing-complete (it can mimic a Turing machine, e.g., by using the read-write-erase-move procedure mentioned earlier). Our experience of the world is literally virtual reality. We never see what is "really out there". We don't even see what is in our brains: electrochemical phenomena. What we actually experience is the (clearly inaccurate) virtual reality rendering of that which is out there. The program/software that dictates how the raw sensory input is processed into the virtual reality that we end up experiencing is partly inborn and partly acquired throughout life.

To improve our understanding of the world, we habitually update the program of the brain by acquiring new theories. The Turing principle implies that what makes us intelligent, and able to comprehend the world better than any other animal, could not be any magical property of the hardware of our brain. The hardware is irrelevant as long as it is a universal computer, which is a very low bar since something as simple as a Turing machine (or more accurately its quantum counterpart) is a universal computer. Our personal computers and most programming languages are also Turing-complete and have been for decades. What makes us intelligent is the programming of our brain and the fact that we are able to update that programming. Not only does that make us "more intelligent than other animals", but it makes us qualitatively different from them: Creating and updating virtual reality is our ecological niche. We are the only (remaining) species whose members rely mainly on updating their virtual reality rendering (i.e., creating knowledge) to survive.

We understand the world by creating explanations and updating our inborn assumptions. For example, there is nothing in our genes that tells us how to make fire. Or how to make arrows and other sharp objects to hunt or fend off animals much bigger than us. Once someone has created a theory, the same ability to generate virtual reality also allows us to pass on that information to other members of our species, so they don't have to "rediscover fire" or "reinvent the wheel" (meme replication requires that knowledge be encoded in some medium, such as speech or writing, and then decoded by another individual creatively, i.e., by creating a virtual reality rendering of the encoded message). The virtual reality renderings in the brains of other animals, insofar as there are any, are defined by their genes. They cannot update the program in any significant way in their lifetime, but we can. This fact has incidentally given us the ability to reason about everything — the Turing principle necessitates that there is no limit to how accurate our rendering of reality can become. When our intuitions tell us that the Earth is flat, we can reason by creating explanations and testing them against reality. No matter how strong our intuitions are (it really, really feels like the Earth is flat), we have the ability to see past them and get closer to reality.

I will leave you with a quote by David Deutsch that summarizes the significance of the Turing principle better than I ever can:

This is the strongest form of the Turing principle. It not only tells us that various parts of reality can resemble one another. It tells us that a single physical object, buildable once and for all (apart from maintenance and a supply of additional memory when needed), can perform with unlimited accuracy the task of describing or mimicking any other part of the multiverse. The set of all behaviours and responses of that one object exactly mirrors the set of all behaviours and responses of all other physically possible objects and processes.

This is just the sort of self-similarity that is necessary if ... the fabric of reality is to be truly unified and comprehensible. If the laws of physics as they apply to any physical object or process are to be comprehensible, they must be capable of being embodied in another physical object – the knower. It is also necessary that processes capable of creating such knowledge be physically possible. Such processes are called science. Science depends on experimental testing, which means physically rendering a law’s predictions and comparing it with (a rendering of) reality. It also depends on explanation, and that requires the abstract laws themselves, not merely their predictive content, to be capable of being rendered in virtual reality. This is a tall order, but reality does meet it. That is to say, the laws of physics meet it. The laws of physics, by conforming to the Turing principle, make it physically possible for those same laws to become known to physical objects. Thus, the laws of physics may be said to mandate their own comprehensibility.

My quest for the best podcast app

Sun, 16 Mar 2025 00:00:00 GMT

The Problem

I listen to 10-20 hours of audio content per week. So, it's only reasonable that I invest time in finding a good app that makes the listening experience better and helps me be more productive.

I have long been a user of Pocket Casts (with its Plus plan). The main selling point of Pocket Casts for me was its folders and bookmarks. Overall it is a very decent app, but it lacks some features, such as the ability to search for a keyword in all the episodes of subscribed shows. You can only search show by show, and given that I am subscribed to over 230 shows, that’s just not feasible. It provides a transcript feature that is based on the Podcasting 2.0 Transcript tag, meaning it relies on the creators manually providing a transcript, which most creators do not. So the vast majority of podcasts currently don’t have transcripts available on Pocket Casts; automatic generation of transcripts is not available.

Here is a list of features I wanted in my ideal podcast app:

Organize subscribed shows into categories (e.g., using folders or tags).
Search for episode keywords within a specific show.
- I assumed this was a standard feature, but Apple Podcasts proved me wrong.
Search for episode keywords across all subscribed shows.
Clip-sharing with others, even if they don’t use the same app.
Import podcasts easily from another player (e.g., via OPML).
- Again, I was surprised that Apple Podcasts lacks standard OPML import/export functionality.
Support for multiple queues or playlists, beyond just a single "up next" list.
Bookmarks for marking interesting points in an episode, ideally with notes/comments.
Episode transcripts (either via Podcasting 2.0 tag or auto-generated).
Detailed listening statistics (e.g., listening time per week, most listened-to shows per month).
(Price is, of course, always a factor, but given how much value I get from podcasts—and since most apps offer free tiers with optional premium features—this was a secondary consideration.)

Clearly, this is a demanding list, and not every feature is a must-have. I also haven't listed the very basic features that virtually every podcast app includes, like custom playback speed or adding shows via RSS URL. Below, I will discuss the relative importance (weight) I assigned to each desired feature.

Testing Setup

So, in early 2025, about a month before my Pocket Casts Plus yearly subscription was due to renew, I went on a hunt for the best podcast-listening app out there for iOS. To my surprise, there aren't that many contenders.

On Reddit (across different subreddits), there seems to be a strong consensus that Podcast Addict is the best podcast app overall. Unfortunately, it is Android-only, so it wasn't an option for me. Beyond that, opinions (on Reddit and elsewhere) are very mixed. I had to test the apps myself to find out.

I selected various apps based on online recommendations and App Store searches. I initially downloaded and briefly evaluated (at least) the following apps:

Pocket Casts (My baseline app)
Overcast
Castro
Downcast
Snipd
Apple Podcasts
Spotify
Audible
Amazon Music
RSSRadio
Fountain
Podurama
Podcat
Podimo
Castbox
Podbean
Podger
Kasey Podcast
Luminary
Podcast Republic

Testing Details

Some apps from the initial list were quickly eliminated due to obvious omissions of key features (a lack of folders/tags was often the easiest deal-breaker to spot). After this first pass, I shortlisted the following apps for more detailed testing:

Pocket Casts
Overcast
Apple Podcasts
RSSRadio
Fountain
Podurama

I used each of these six apps as my primary podcast player for at least a week. I imported all my (230+) podcast subscriptions via OPML into each app, where supported. (As mentioned, Apple Podcasts lacks OPML import, so I manually added only my most frequently listened-to shows, not the full 230+ list).

An obvious winner didn’t immediately emerge. Each app had significant strengths and weaknesses relative to my criteria. It became clear I likely wouldn't find one perfect app with every feature exactly as I wanted. Therefore, I decided to perform a quantitative comparison.

For each desired feature, I rated its implementation in each app on a scale of 0-10 (where 0 means absent or unusable, and 10 means perfectly implemented for my needs). I also assigned a weight to each feature (0.0-1.0 scale) based on its importance to me.

Feature	Feature importance (weight) (0.0-1.0)	Pocket Casts (0-10)	Fountain (0-10)	Apple Podcasts (0-10)	Podurama (0-10)	RSS Radio (0-10)	Overcast (0-10)
Folders/Organization	0.8	8	8	0	9	8	0
Playlists / Multiple Queues	0.5	0	8	0	10	5	10
Clip-sharing	0.5	8	10	0	0	0	5
Bookmarks	0.7	10	3	0	10	0	0
Transcripts	0.4	0	9	10	0	0	0
Search within one show	1.0	10	5	0	10	10	10
Search within all subscribed shows	0.8	0	4	10	0	4	10
Listening stats	0.2	3	0	0	3	3	10
Premium price (annual value)	0.2	2	3	10	3	9	7
Total weighted score	-	28.4	29.9	14.0	30.4	24.5	28.9
Total unweighted score	-	41	50	30	45	39	52

"Total unweighted score" is the simple sum of the scores for each feature (maximum possible is 90 based on 9 features listed). "Total weighted score" is the more important metric for my decision; it's calculated by multiplying each feature’s score by its assigned importance weight, and then summing these products.

While this isn't the most rigorous quantitative analysis possible, it served its purpose well: it gave me a structured comparison based on my priorities for relatively little time invested. The scores and weights are subjective; I experimented with adjusting them slightly based on my perceptions, but the overall ranking didn't change significantly.
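As a minimal sketch of how the two totals are computed, the snippet below copies the weights and ratings from the table and recomputes both scores for every app. It is written in TypeScript for illustration only; the variable and function names (`weights`, `ratings`, `weightedTotal`, and so on) are made up for this example and do not come from any tool used in the comparison.

```typescript
// Weights and 0-10 ratings copied from the comparison table above.
// Row order: folders, playlists, clip-sharing, bookmarks, transcripts,
// search in one show, search across shows, listening stats, price.
const weights: number[] = [0.8, 0.5, 0.5, 0.7, 0.4, 1.0, 0.8, 0.2, 0.2];

const ratings: Record<string, number[]> = {
  "Pocket Casts": [8, 0, 8, 10, 0, 10, 0, 3, 2],
  "Fountain": [8, 8, 10, 3, 9, 5, 4, 0, 3],
  "Apple Podcasts": [0, 0, 0, 0, 10, 0, 10, 0, 10],
  "Podurama": [9, 10, 0, 10, 0, 10, 0, 3, 3],
  "RSS Radio": [8, 5, 0, 0, 0, 10, 4, 3, 9],
  "Overcast": [0, 10, 5, 0, 0, 10, 10, 10, 7],
};

// Round to one decimal so floating-point noise (e.g. 28.400000000000002)
// does not leak into the totals.
function roundToTenth(value: number): number {
  return Math.round(value * 10) / 10;
}

// Simple sum of the ratings.
function unweightedTotal(appRatings: number[]): number {
  return appRatings.reduce((sum, rating) => sum + rating, 0);
}

// Sum of rating * weight, feature by feature.
function weightedTotal(appRatings: number[], featureWeights: number[]): number {
  const total = appRatings.reduce(
    (sum, rating, i) => sum + rating * (featureWeights[i] ?? 0),
    0,
  );
  return roundToTenth(total);
}

// Rank apps from highest to lowest weighted total.
const ranking = Object.entries(ratings)
  .map(([app, appRatings]) => ({
    app,
    weighted: weightedTotal(appRatings, weights),
    unweighted: unweightedTotal(appRatings),
  }))
  .sort((a, b) => b.weighted - a.weighted);

for (const row of ranking) {
  console.log(`${row.app}: weighted ${row.weighted.toFixed(1)}, unweighted ${row.unweighted}`);
}
```

Since the weights sum to 5.1, the highest possible weighted total is 51. Keeping the weights in one array makes it easy to change a single weight and rerun the ranking.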

Based on the weighted scores, Podurama comes out on top for my specific needs:

Podurama (30.4)
Fountain (29.9)
Overcast (28.9)
Pocket Casts (28.4)
RSS Radio (24.5)
Apple Podcasts (14.0)

However, the scores for the top four apps are very close to each other, indicating that the decision wasn't clear-cut and involved trade-offs.

Podurama

Main Positives:

Offers a ton of options and settings: highly customizable launch screen, ability to set different settings per show, bulk editing of episodes, etc.
Provides multiple ways to learn about an episode before listening (AI-generated insights, chapters, snippets). The "Popular" episodes section within shows can sometimes aid discovery.
Includes a web app (I haven't used it extensively, but its availability is a plus for some users).

Main Negatives:

No clip-sharing functionality.
Does not support searching within episodes across all subscribed shows.

Podurama is big on machine learning; it offers various “AI features” such as podcast summaries, AI-generated chapters, and episode trailers/snippets. However, I personally don't find such features particularly useful.

Fountain

This was probably the least-known app among my shortlist.

Main Positives:

The clip-sharing feature is outstanding (rated 10/10). It's easy to use and share, even with people not using Fountain.
Transcripts are generally very good; they seem to be generated by an advanced ML model.

Main Negatives:

Transcript text cannot be copied (a drawback for quoting or saving snippets).
Search within a show is limited to episode titles only; it does not search descriptions or show notes. This is a considerable deficit for my search-heavy workflow.
No dedicated bookmarking feature. While creating clips serves as a workaround (saved clips act like bookmarks), it's slower and less convenient than a dedicated bookmark button (hence the 3/10 score).
Occasional performance issues (freezing/slowing during search), though infrequent enough not to be a deal-breaker.
Timestamps in show notes are not clickable. This is a major nuisance, as many podcasts use show note timestamps for navigation instead of formal chapter markers.
Lacks customization for the order of shows/episodes on the main screen (shows are sorted alphabetically, episodes by recency), rendering the launch screen less useful for prioritizing content.
Incomplete show notes observed for some episodes (text gets truncated).
Lacks advanced playback features like silence skipping ("Smart Speed" in Overcast) and volume boost.
Sometimes fails to pause playback via headphone controls.
No OPML export option, making it difficult to switch away from Fountain in the future.
No web app.

Brief Notes on the Remaining Runners-Up

Based on my criteria:

Pocket Casts: Its primary strength remains its intuitive show organization via folders. The smooth drag-and-drop interface allows full customization of show order within folders. Its bookmarking is excellent (10/10). However, its lack of global search and auto-generated transcripts were major drawbacks for me.
Overcast: Stands out for its minimalistic design, high performance, excellent global search (10/10), and valuable playback features like Smart Speed and Voice Boost (reflected partly in the Price/Value score, as these are free). Its lack of folder organization (0/10) and basic bookmarking were key weaknesses for me. Its playlist implementation is strong (10/10).
Apple Podcasts: It is, well, Apple Podcasts. It is a first-party app that comes pre-installed on Apple devices. If you like Apple’s approach to doing things (privacy, etc.), this app might be for you. It offers excellent global search (10/10) and widely available transcripts (10/10). However, it severely lacks organizational features (no folders - 0/10), offers no OPML import/export, and has minimal queue/playlist management (0/10).

Conclusion

After this extensive testing, I have switched to Podurama as my primary podcast app. However, the fact that this detailed analysis was necessary, and the closeness of the scores, highlight that there is no perfect podcast app, at least for my specific needs.

I've actually settled on a multi-app system, keeping several apps on my phone to leverage the strengths of each:

Podurama: My main app for daily listening, chosen for its excellent organization (folders - 9/10), good bookmarking (10/10), and customization options.
Fountain: Used specifically when I want to create and share audio clips with others.
Overcast: Kept primarily for its powerful keyword search across all episodes of my subscribed shows (10/10), a feature Podurama lacks. Its Smart Speed is also beneficial.
Apple Podcasts: Used mainly for its high-quality transcript feature (10/10) or for browsing Apple's unique podcast charts and recommendations.

This fragmented approach isn't ideal, but it's the best way I've found to access the features I value most. Maybe one of these apps will improve to combine all these strengths one day. Maybe none will, so I might even consider building my own podcast app at some point to finally get everything I want in one place.

How I built this website starting with no web dev experience

Sun, 06 Apr 2025 00:00:00 GMT

The idea of having a website has long enticed me. I take a lot of notes and take great care to keep them well-organized. Writing useful ideas down has the obvious upside of reliable retrieval of information compared to human memory. I also find that writing helps me sharpen my thinking on any given topic. I largely agree with Holden Karnofsky's Learning by Writing. It is so often the case that an idea makes perfect sense in my head, but when I start writing it down, there turn out to be glaring holes in my understanding. Or at least, it reveals various assumptions that had been implicit.

So having personal notes is great. But many of them deserve to be out in the open, for anyone to discover, learn, and criticize. This is how knowledge grows.

One common idea I encountered and sometimes believed was that having your own website is akin to reinventing the wheel. Why not use already established platforms like Medium or Substack? The answer is that while those platforms have their use cases, having your own website gives you extremely fine-grained control over everything. Creating your own website from scratch might not be for everyone. But there are gradations of the level of control that one may want. There are countless services available these days that can help you "spin up" and host a website.

When one thinks of starting a blog, often the first thing that comes to mind is WordPress. It was no different for me. But the option that comes to mind first is hardly ever the best option. So I started researching my options. While my coding knowledge is rapidly growing, especially in lower-level programming, I have to be strategic about what I spend my time learning.

The full spectrum of blog-building tools from low-level to high-level

Level 1: Pure Code

At the fundamental level, all websites are HTML, which gives structure to the content you see, styled with CSS. JavaScript is used to manipulate the HTML and CSS to add interactivity and dynamic features. So HTML, CSS, and JavaScript are sent by the server and received by the browser, which renders the website that you see. The JavaScript sent to the browser is called "client-side JavaScript" (JavaScript has applications far wider than just this). It is only needed when something interactive or "dynamic" has to happen on the website without reloading the page, i.e., without asking the server to send the HTML of the whole page again. For example, the top bar of this website is dynamic: it appears when you scroll up and disappears when you scroll down. Here's some client-side JavaScript for you to play with:

I am dynamic!
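To make the scroll-driven top bar concrete, here is a minimal sketch of the decision logic in TypeScript. This is not this site's actual code: the function name `shouldShowTopBar`, the `header` selector, and the `hidden` class are all assumptions made for illustration. The decision is kept in a pure function so that it can be run and checked outside a browser; the browser wiring is shown only as a comment.

```typescript
// Decide whether the top bar should be visible, given the previous and
// current vertical scroll offsets in pixels. All names are illustrative.
function shouldShowTopBar(
  previousY: number,
  currentY: number,
  wasVisible: boolean,
): boolean {
  if (currentY <= 0) return true;         // at the very top: always show
  if (currentY < previousY) return true;  // scrolling up: reveal the bar
  if (currentY > previousY) return false; // scrolling down: hide the bar
  return wasVisible;                      // no movement: keep the current state
}

// Browser wiring, commented out so the sketch also runs outside a browser.
// It assumes a <header> element and a CSS class "hidden" that hides it.
//
// let lastY = window.scrollY;
// let visible = true;
// window.addEventListener("scroll", () => {
//   visible = shouldShowTopBar(lastY, window.scrollY, visible);
//   document.querySelector("header")?.classList.toggle("hidden", !visible);
//   lastY = window.scrollY;
// });

console.log(shouldShowTopBar(100, 200, true));  // scrolled down
console.log(shouldShowTopBar(200, 100, false)); // scrolled up
```

The server is not involved in any of this: the browser runs the code on every scroll event and only toggles a class on an element it already has.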

One can theoretically build a website from scratch with just vanilla HTML, CSS, and JavaScript. However, in practice, building something similar to this website would be extremely time-consuming. It would be difficult to manage content and it would be hard to scale (every post is a new HTML file or requires complex JavaScript). There are some limited use cases for this approach: learning web fundamentals, very small personal sites where tinkering is the goal, and digital gardens built piece-by-piece.

Building one's website from scratch with this approach would really be reinventing the wheel. In programming and computer science, layers of abstraction exist because the lower level functionality becomes standardized. Unless there is a specific need for lower-level control, we abstract away (automate) the repetitive stuff so that we can build larger, more complex things.

Level 2: Frontend Libraries/Frameworks

The next layer is to use libraries and frameworks to structure your code. With frameworks like React.js, Vue.js, Svelte, and Angular, you can create reusable components. For example, you can write HTML for a "button" and create a Button component. This way, every time you want to insert a button in your web page, you just use the Button component instead of repeating all the HTML that went into creating the button. You can even pass it some properties so that one Button component can render different kinds of buttons (different colors, sizes, text, etc.). This provides much better code organization.

Creating a component is just like writing a function in a programming language once, and then simply calling that function (with specific parameters if needed) to perform that action as many times as needed, without needing to worry about the code and inner workings of the function. While this significantly reduces the development time of the frontend, you still need to build the "blog" functionality yourself (routing, fetching data, rendering Markdown, etc.). Use cases for this level include building highly custom web applications where a blog is just one part, or as the foundation for tools in the next level.
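The function analogy can be taken literally. The sketch below is plain TypeScript, not React, Vue, Svelte, or Angular syntax: a hypothetical `Button` function takes properties and returns an HTML string. The property names, the CSS class names, and the `escapeHtml` helper are all invented for this example.

```typescript
// Properties a caller may pass; all names here are hypothetical.
type ButtonProps = {
  label: string;
  color?: "blue" | "green" | "red";
  size?: "small" | "large";
};

// Escape characters that have special meaning in HTML so labels are safe to embed.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The "component": written once, reused with different properties.
function Button({ label, color = "blue", size = "small" }: ButtonProps): string {
  return `<button class="btn btn-${color} btn-${size}">${escapeHtml(label)}</button>`;
}

// Two different buttons from the same component.
const buttonsHtml = [
  Button({ label: "Subscribe" }),
  Button({ label: "Delete", color: "red", size: "large" }),
].join("\n");

console.log(buttonsHtml);
```

Real frameworks add much more on top of this (state, events, efficient updates), but the core idea of "define once, call with parameters" is the same.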

Level 3: Static Site Generators (SSGs) & Meta-Frameworks

These tools take your content (often Markdown files), apply templates, and generate static HTML, CSS, and JavaScript. This way, your focus shifts more to content, instead of the HTML, CSS, and JavaScript code. You still have a lot of control over how the translation from Markdown to HTML (and CSS and JavaScript) takes place. Tools in this layer include Next.js, Astro, Gatsby, Eleventy (11ty), Hugo, and Jekyll.
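To illustrate the "content plus template equals static HTML" pipeline, here is a toy sketch in TypeScript. It is not how Hugo, Astro, or any of the tools above are implemented: the functions `markdownToHtml` and `applyTemplate` are hypothetical, and the converter handles only `# ` headings and plain paragraphs, with no HTML escaping.

```typescript
// A toy "static site generator": Markdown-like text in, full HTML page out.
// Only "# " headings and plain paragraphs are handled; real tools do far more.
function markdownToHtml(markdown: string): string {
  return markdown
    .split(/\n\s*\n/) // blank lines separate blocks
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) =>
      block.startsWith("# ")
        ? `<h1>${block.slice(2)}</h1>`
        : `<p>${block.replace(/\n/g, " ")}</p>`,
    )
    .join("\n");
}

// The template wraps every post in the same page shell.
function applyTemplate(title: string, bodyHtml: string): string {
  return [
    "<!doctype html>",
    `<html><head><title>${title}</title></head>`,
    `<body>\n${bodyHtml}\n</body></html>`,
  ].join("\n");
}

// Build step: run once, write the result to disk, serve it as a static file.
const samplePost = "# Hello\n\nThis is my first post.\nIt spans two lines.";
const samplePage = applyTemplate("Hello", markdownToHtml(samplePost));

console.log(samplePage);
```

The point of the split is that writing a new post only touches the Markdown input, while changing the look of every page only touches the template.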

These tools are mostly developer-focused, but they differ vastly from each other and their learning curves vary. For example, it is possible to quickly spin up a website with Hugo (using a theme that someone else built) with very little coding knowledge. Next.js, on the other hand, is a very developer-focused but extremely powerful framework built on top of React.js (previous level) that provides not only static site generation (SSG) but also server-side rendering (SSR) capabilities. Next.js has quickly become the go-to choice for building modern websites that require a range of complex features. However, it can be overkill for mostly-static blogs.

In this level, you can write your content directly into text files (most often Markdown files). But since this can be inconvenient and lacks word-processing features, a headless CMS is a popular choice to manage content. These systems manage your content (posts, authors, categories) via a user-friendly interface and expose it to the site generators through an API, decoupling content management from the website's frontend. Popular choices for a headless CMS are Strapi, Contentful, Sanity.io, Payload CMS, and WordPress (used headlessly).

Level 4: Traditional/Monolithic CMS

These are all-in-one systems that provide an admin interface for content management and render the frontend of the website using themes and templates. They are often database-driven. Examples include WordPress.org (self-hosted), Drupal, Joomla, and Ghost (self-hosted option). Such systems require minimal coding and provide a lot of features out-of-the-box (or via plugins). Performance and level of customization can vary. Plugin/theme bloat or security issues are a risk if not managed appropriately.

Level 5: Hosted Platforms & Website Builders

These are commercial services that bundle the software, hosting, maintenance, security, and often a visual drag-and-drop editor into one package. This is the easiest way to get started with minimal technical knowledge. Examples include WordPress.com (hosted), Wix, Squarespace, Blogger (Google), Ghost (Pro-hosted), Medium, and Substack. The obvious downside is limited design/customization beyond the provided templates. Vendor lock-in can also be a big issue (you might not fully "own" your site's underlying code or have full data portability). These services can also become expensive with higher tiers/features.

Learning web fundamentals

Before I could choose a tool to build my blog, one thing was obvious to me: I should get more familiar with the fundamentals of the web, both because it would help me build a better website and because it would be invaluable in my software engineering career, no matter which specialization I end up choosing. So I dove into learning the basics, such as the syntax and usage of HTML, CSS, and JavaScript. But I promised myself not to get so bogged down in the fundamentals that it would prevent me from building any higher-level things (thankfully I was already familiar with the phenomenon of Tutorial Hell).

With that in mind, I went through a set of tutorials, including freeCodeCamp's Responsive Web Design course to get familiar with HTML and CSS, javascript.info to learn JavaScript, and various parts of the MDN Core Modules to sharpen concepts about frontend development. I went through these at varying speeds, often skipping or abandoning material when I felt bogged down or the learning slowed down. The most important thing in my view is to get up to speed with enough of the basics to start building something. After that, building something is a better teacher than passively consuming a large number of tutorials. And by basics, I don't necessarily mean that you have to start with the lowest-level explanation in terms of the hierarchy of complexity. After all, we don't need to know all the facts about the microelectronic structure underlying our computers in order to use the computers proficiently and productively. To "understand" something is very different from merely learning facts about something. "Understanding is about coherence, elegance, and simplicity, as opposed to arbitrariness and complexity," as David Deutsch put it. It is about making meaningful connections between the different levels of abstraction, and being able to answer why things are the way they are, and the kinds of things that must be happening, without necessarily being able to recite all the facts.

In this case, this process consisted of understanding what websites really are (renderings of HTML, CSS, and JavaScript), how they are served (sent from the server to the browser), how they are hosted (choosing where to serve from), what kinds of tools are available to create different kinds of websites, which of those tools are most widely used and why, etc.

Choosing a website-building tool

After gaining a reasonable understanding of the fundamentals (how the web works) and thinking through the full spectrum of tools as laid out above, it was time for me to choose. The full spectrum didn't simply lay itself out in my head as it is listed here. There were naturally a lot of uncertainties and gaps in my understanding of the landscape. Not to mention that there was a lot of conflicting advice out there.

The best way to eliminate those uncertainties was to experiment. Actually getting hands-on, as opposed to reading people's opinions about it, is often the best way for me to learn about and understand a topic. I went through official tutorials / quick start guides of various tools/frameworks mentioned above (e.g., React, Next.js, Astro, Hugo).

I seriously considered all levels from level 2 and up since they all have their pros and cons. I wanted to have as much control (ownership of data as well as design) over my website as possible. But I wanted to balance that against the time it would take to learn and manage the chosen tool. Also, I know that my needs will change in the future and my knowledge will grow, so I needed scalability and portability (ability of the website to grow and transform, or even move to another framework if needed).

I created a list of features that I would ideally like in my website (not necessarily immediately, but at some point in the future). I wanted the freedom to be able to implement features like customizable dynamic table of contents, progress bar that updates on scroll, system for searching and filtering posts, AI narration of articles, etc.

For these reasons, the tools in levels 4 and 5 were much less attractive. React and Next.js were the most powerful options, and they have become the leading choice among professional web developers. If I were sure I wanted to become a web developer, or if my needs were more complicated than a blog, I would certainly choose the React/Next.js stack. However, the learning curve for them is notoriously steep. For someone completely new to web development like me, there would be all sorts of bugs, performance issues, and possibly even security vulnerabilities in the website at the start. Also, Next.js shines when it comes to dynamic websites with complex features. It was designed for building full-blown web applications, not with a focus on static site generation, which is where I wanted to start.

So Next.js is something I will keep an eye on in the longer term. But for the time being, I turned my attention to tools that focus mostly on static site generation. Hugo and Astro are by far the most sophisticated and modern in this respect (largely superseding the pioneering tools like Jekyll and Gatsby, although they still have their idiosyncrasies and use cases). Hugo's primary selling point is its superior "build speeds" (the time it takes to generate the HTML from the input Markdown files and configuration files) as it is written in Go. But I didn't understand why that aspect is so hyped. Maybe build speed is important when you have hundreds or thousands of pages on your website (like documentation sites). To me, it is a very minor aspect in the broader decision, when there are so many other deciding factors involved. Hugo mainly excels at building static sites, and does not make it straightforward to add client-side JavaScript for dynamic features. It does have numerous themes available to start with. But customizing them is not easy, as you're forced to write complex Go templates or hacky JavaScript workarounds. Hugo simply does not have "highly customizable websites" as its selling point. But that's exactly what Astro offers.

Astro is a relatively new player, but one that has quickly won the hearts of many, due to the high utility-to-complexity ratio that it offers. Astro, like React/Next.js, is component-based, but unlike Next.js, it is "framework-agnostic" (not tied to a single framework), i.e., it allows you to use components from any popular framework (React, Vue, Svelte, etc.) or no framework at all. It also ships zero JavaScript to the client by default, but this behavior can be altered in a piecemeal way. So by default, all the "input files" like Markdown files and code/configuration files are used to generate purely static HTML files, but "partial hydration" can be introduced, such that there are "islands" of dynamic content (client-side JavaScript), while the rest of the site remains static. This paradigm offers an incredible combination of performance and flexibility. The learning curve is also not as steep as that of Next.js. For beginners starting with any of these frameworks, it is usually a good idea to start with a theme (or "starter template") that someone has designed using that framework, and then modify that theme to suit your needs. All the mentioned frameworks have many themes available (Astro, Hugo, Next.js), thanks to the generous contributions of the community members. I also found Astro themes to be much easier to heavily modify (and even mix and match) than others (especially Hugo).
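The "islands" idea can be modeled in a few lines. The sketch below is a toy model in TypeScript and is not Astro's API: the types `PageComponent` and `BuiltPage`, the function `buildPage`, and the file name `search.js` are all invented for illustration. (In Astro itself, the opt-in is expressed with `client:*` directives such as `client:load` on a component.) The model only shows the principle that every component contributes static HTML, while JavaScript is shipped only for the components that ask for it.

```typescript
// Toy model of the "islands" idea; this is not Astro's API.
type PageComponent = {
  id: string;
  html: string; // markup rendered at build time
  clientScript?: string; // present only if the component opts in to hydration
};

type BuiltPage = { html: string; scripts: string[] };

// Build step: every component contributes static HTML, but only the
// components that opted in contribute JavaScript for the browser.
function buildPage(components: PageComponent[]): BuiltPage {
  const scripts: string[] = [];
  for (const component of components) {
    if (component.clientScript !== undefined) {
      scripts.push(component.clientScript);
    }
  }
  return {
    html: components.map((component) => component.html).join("\n"),
    scripts,
  };
}

const builtPage = buildPage([
  { id: "article", html: "<article>Post body</article>" },
  { id: "search", html: '<input id="search">', clientScript: "search.js" },
  { id: "footer", html: "<footer>Footer</footer>" },
]);

// Only the search "island" ships JavaScript; the rest of the page is static.
console.log(builtPage.scripts);
```

A page with no opted-in components yields an empty `scripts` list, which corresponds to the zero-JavaScript default described above.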

Learning and navigating through Astro

Once I had chosen Astro, it was time to start building. As mentioned, choosing a theme and building upon it is usually a great way to kick-start this process. I scoured through the hundreds of themes that are available, to find something that has some of the features and aesthetic that I want, so I can build upon it. Some of the themes that fit the bill were Astro Micro, Astro Pure, Nordlys, and Webtrotion. After trying all of them out and testing and tinkering with them, I decided to use Astro Pure as the base of my website, and then heavily modify it. The generous creator of the Pure theme has added lots of useful components to it (e.g., MediumZoom, QRCode, Aside, Tabs, etc.), that can be used not only in the theme, but also in other Astro projects.

One important criterion for making such decisions is the breadth and depth of the available documentation. Astro itself naturally has extensive and high-quality documentation. But the Astro Pure theme is also reasonably well-documented.

Just as I had first worked to understand the fundamentals of the web (see above), I then set out to understand how Astro works. I learned about the syntax of .astro files, project and directory structure, configuration files, deployment options, Markdown and content collections, etc. At the same time, I kept deepening my understanding of HTML and how to style the HTML using CSS (especially learning Tailwind CSS) and to manipulate its structure using JavaScript and TypeScript.

I made various changes to the Astro Pure theme. This included changing the overall aesthetic by using different colors, fonts, roundedness of edges, etc. I also created a system for "topics" instead of "tags" and made the topics page feel much more dynamic (while remaining fully static). That said, as it stands, the structure of my blog is still mostly based on the original codebase of the theme, instead of my modifications to it. The blog will naturally continue to evolve and new features will be added as time goes on.

Choosing a content management system

Once I was happy (for the time being) with the structure and look of the website, I focused my attention on thinking about the content management system that I will use to write my content. In the past, I used many note management systems such as Notion, Obsidian, OneNote, and Evernote. Notion provides a very user-friendly yet powerful interface to create complex pages, which can be easily exported as Markdown files. Notion can technically be used as a backend to generate the Markdown files. But it would add a lot of overhead and intricate setup to get it to work seamlessly, since Notion is cloud-first and all your notes are saved first and foremost on the cloud instead of in your local file system. Obsidian, on the other hand, is slightly less user-friendly, but works mainly with your local file system. It also has a thriving plugin ecosystem that provides powerful tools such as Git integration, frontmatter support, AI-powered spell & grammar checking, etc. What's more, Obsidian has a great mobile app with which you can easily make and sync changes on the go.

Obsidian seemed great for almost all my needs, except there was no way for me to seamlessly integrate it into my workflow. I do almost all my development on VS Code running on WSL (Windows Subsystem for Linux). So the local copy of all my Astro code (and blog posts as Markdown files) is located in the WSL file system instead of that of Windows. Obsidian installed on Windows unfortunately does not integrate well with WSL (and installing the Linux version of Obsidian inside WSL results in poor performance). So if I were to use Obsidian, I would have to set up additional syncing mechanisms (e.g., Git, Dropbox, or symbolic links) to synchronize between Obsidian on Windows and Astro on WSL. Additionally, Obsidian's support for MDX is limited, although this was not a big factor in my decision for my choice of content management system.

By the way, I love VS Code. I think it is one of the best pieces of software ever developed. It is extremely versatile and works with so many use cases and development styles (I apologize to Neovim or JetBrains users who are clenching their fists right now). So why not try writing Markdown directly in VS Code? I decided to give this a try after I read that quite a few people are doing this. I was surprised by how powerful a Markdown editor it can become with the right setup. VS Code has some built-in support for Markdown and offers features like dynamic previews and document outline.

But as always, there is an extension for everything in VS Code. The Markdown All in One extension is a must-have. It adds a ton of features like keyboard shortcuts (bold, italics, lists, etc.) and convenient commands such as inserting tables, links, images, etc. Markdown Table is another key extension that makes it very easy to edit and navigate tables (Tab to move around and commands for inserting and moving rows/columns, etc.). Front Matter CMS extension provides a powerful CMS without ever having to leave VS Code (I mainly use it to manage my required and optional frontmatter fields, such as tags, preview images, publish date, etc.). Code Spell Checker provides basic spell checking, and this is useful generally when coding, not just with Markdown content. I am still experimenting to find the best tool for high-quality spell and grammar checking.

When I need to edit some content on my iPhone, things get a little less seamless. I use the extremely well-designed app Working Copy as a Git client. It has a decent built-in file editor too. If at some point in the future I need to do heavy editing on the phone and need a more user-friendly interface, Obsidian for iPhone can also be used in conjunction with Working Copy.

So here is a brief summary of how Obsidian and VS Code compare as CMS for my purposes:

Feature	Obsidian	VS Code	Winner
Frontmatter Management	✅ Excellent	✅ Excellent	🟦 VS Code (slight edge)
Spell & Grammar Checking (AI)	✅ Good	✅ Excellent	🟦 VS Code
MDX Support	❌ Limited	✅ Excellent	🟦 VS Code
Rich Markdown Editing & Tables	✅ Excellent	✅ Good	🟪 Obsidian
Workflow & Syncing Ease (WSL)	❌ Friction	✅ Excellent	🟦 VS Code

Concluding thoughts

All in all, it took me just a month or two to go from "HTML and JavaScript are nothing more than words to me" all the way to getting this website up and running, as I wanted it to be. I learned so much in the process and will continue to do so. It would not have been possible at all, if not for the wonderful and generous people who have developed these amazing tools (free or paid) to make others' lives easier. Shout-out to the creators/developers of all the tools and tutorials that I mentioned (Astro, Pure theme, VS Code extensions, JavaScript.info, and so much more) or neglected to mention. This culture of standing with each other as we solve intellectual challenges and make collective progress is one of the main reasons I decided to pursue a career in software engineering.


Writing a bash-like shell in C

Sun, 16 Mar 2025 00:00:00 GMT

Many people working in or interested in tech rely on a shell, the most popular one being bash, followed by zsh, to interact with their computer. What better way to deeply understand the shell than coding one yourself?

The “minishell” project is often cited as one of the trickiest projects in the entire 42 curriculum. Here’s a list of the main things I learned while working on it:

Writing a garbage collector (and understanding its pros and cons)
How the shell takes input from the command line, parses it, and executes it
How command pipelines and file redirections work
Process creation, management, and communication
The logic behind shell exit statuses
How the shell handles signals, e.g., SIGINT sent with Ctrl-C and SIGQUIT sent with Ctrl-\
Effective collaboration using Notion and GitHub
Good documentation of code

Setup

This was a group project, and I was honored to partner with Robert. We used Notion as our primary collaboration tool. While GitHub does offer project management features, Notion seemed better suited in this particular case. Notion was where all the relevant brainstorming notes went. We created a detailed project overview, listing step by step the overall logic of the program, what the different parts of the program would be, and what each part would achieve. Here is what the Project Outline page on Notion looked like (each section is expandable in the original Notion page):

Using this, we came up with a task list. This is what it looked like:

In addition, we had a Notes section on Notion where we put any other ideas and thoughts of interest.

GitHub is where our code lived, so we did make use of GitHub’s collaboration features. E.g., we set up rules in the repository’s settings so that changes couldn’t be directly pushed to the main branch; all work had to be done on separate branches, and then a pull request created to merge with main. Since we decided to look over each other’s work before it was merged, we also enabled the rule to require approval before merging, ensuring the person who created the pull request couldn't merge it directly. And, of course, it's important to require branches to be up-to-date before merging.

Below I will explain the key features of our work. The code can be seen in full here.

The program

Overall logic

We implemented the core parts of bash, without recreating its entire functionality. For example, we implemented the handling of file redirections (<, <<, >, >>), environment variables (accessed with $ and the env command), and any number of pipes (|). But we did not implement background processes (&), wildcards (*), or other command chaining and grouping functionality (e.g., ;, &&, ||, ()).

Here is how our shell program works at a high level:

Input reader: The user is given a prompt to enter commands; the command history is saved and can be accessed with arrow keys.
Tokenizer: The string (line of text) inputted by the user is split by spaces, except for parts within quotes (single or double). $ followed by an environment variable name or ? (last exit status) is expanded within double quotes or unquoted strings.
Parser: The parser’s job is to understand what each token means: Is it a command name, command argument, filename, special character, etc.?
Execution: Child processes are created to execute each command. File redirections and command pipelines are handled. The exit status of the final command in a pipeline is set accordingly.
Signals: The user can send SIGINT (Ctrl-C) or SIGQUIT (Ctrl-\) at any time, and the shell reacts appropriately depending on its current state.

All of this was done without using any global variables, except one for the very specific task of indicating the current signal (SIGINT or SIGQUIT), if received.

External functions used

A limited number of external/library functions were permitted and used. Most of the functionality was coded using our own functions. Here is a table of library functions used, along with the associated library.

Library	Functions from this library used in our program
GNU Readline Library - `readline.h`	readline, rl_clear_history, rl_on_new_line, rl_replace_line, rl_redisplay, add_history
Standard Library - `stdlib.h`	malloc, free, exit, getenv
UNIX Standard - `unistd.h`	write, access, close, read, fork, execve, pipe, dup2, unlink, chdir, getcwd
Standard Input/Output - `stdio.h`	printf, perror
Signal Handling - `signal.h`	sigaction, sigemptyset, sigaddset
Process Control - `sys/wait.h`	waitpid
Directory Operations - `dirent.h`	opendir, readdir, closedir
File Control - `fcntl.h`	open

Writing a garbage collector

C is notorious for its cumbersome memory management. I decided to create a garbage collector, which helped reduce the headache of manual memory tracking and allowed us to focus more on the shell's logic. At the same time, it was a great learning opportunity to think through the process of automating memory management, often taken for granted in higher-level languages. This garbage collector was designed to have minimal performance overhead in our context, and I believe its advantages far outweighed its costs.

The advantages of the garbage collector in our case were:

Eliminating the risk of double free (a common and dangerous error).
Eliminating the risk of memory leaks.
Removing the need to manually check for malloc failure after every call.
Eliminating the risk of double close for opened file descriptors.
Eliminating the risk of forgetting to close file descriptors.

This helped ensure that our program cleaned up after itself properly, rather than relying solely on the operating system reclaiming resources upon exit (which can mask leaks during development, even if valgrind catches them later).

The main idea is this: Two linked lists are maintained—one containing all pointers allocated with malloc (that must eventually be freed) and the other containing all file descriptors opened (that must eventually be closed).

The key functions I wrote for this purpose are gc_malloc, gc_free, gc_open, gc_close, and gc_exit. For cases where an external function returns a dynamically allocated (malloc’d) pointer (e.g., readline()), there are also gc_add_to_allocs and similarly gc_add_to_open_fds (for FDs obtained outside gc_open).

So, throughout the program, instead of directly using malloc, we used gc_malloc, which performs these steps:

Call malloc.
Check for malloc failure: If malloc returns NULL, free all previously tracked memory, close all tracked file descriptors using the garbage collector's cleanup mechanism, and exit the program gracefully (as memory allocation failure often indicates a critical issue).
Add the allocated pointer to a linked list: This list tracks dynamically allocated memory that hasn't yet been freed via the garbage collector.

Anytime we were done with some allocated memory and wanted to free it, instead of using free, we used gc_free. This function looks for the given pointer in the linked list, frees the pointer using free, and then removes the corresponding node from the list. This prevents double free errors, even if gc_free is accidentally called again on the same pointer.

gc_open and gc_close follow a very similar logic to gc_malloc and gc_free but deal with file descriptors (integers) instead of pointers. Interestingly, the same linked list structure could be used for both, storing file descriptors cast to void* and casting them back to int before calling close.
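As a rough illustration of that cast (not the project's actual code), going through intptr_t keeps the int-to-pointer round trip explicit and avoids size-mismatch warnings on 64-bit systems. The helper names fd_to_ptr and ptr_to_fd are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a file descriptor into the void * slot of a generic list node. */
void *fd_to_ptr(int fd)
{
    assert(fd >= 0);                 /* only valid descriptors are tracked */
    return (void *)(intptr_t)fd;
}

/* Recover the descriptor before handing it to close(). */
int ptr_to_fd(void *ptr)
{
    return (int)(intptr_t)ptr;
}
```

The stored value is never dereferenced; the pointer is only used as an integer-sized container.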

The gc_exit function is called whenever we intend to exit the program. This function ensures that all pointers and file descriptors tracked by our garbage collector lists are cleaned up (freed and closed, respectively) before the program terminates using the standard exit function with the appropriate exit status.
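The allocation side of such a collector could look roughly like the sketch below. It is a simplified illustration, not the project's code: the t_gc and t_gc_node types, the gc_free_all helper, the return value of gc_free, and the choice to pass the tracker as a parameter (to avoid a global) are all assumptions, and file descriptor tracking is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* One node per tracked allocation (hypothetical layout). */
typedef struct s_gc_node {
    void             *ptr;
    struct s_gc_node *next;
} t_gc_node;

/* The tracker is passed around explicitly instead of living in a global. */
typedef struct s_gc {
    t_gc_node *allocs;
} t_gc;

/* Free every tracked pointer and empty the list. */
void gc_free_all(t_gc *gc)
{
    t_gc_node *next;

    assert(gc != NULL);
    while (gc->allocs) {
        next = gc->allocs->next;
        free(gc->allocs->ptr);
        free(gc->allocs);
        gc->allocs = next;
    }
}

/* Clean up everything that is still tracked, then terminate. */
void gc_exit(t_gc *gc, int status)
{
    gc_free_all(gc);
    exit(status);
}

/* malloc wrapper: on failure clean up and exit, otherwise track the pointer. */
void *gc_malloc(t_gc *gc, size_t size)
{
    void      *ptr = malloc(size);
    t_gc_node *node = malloc(sizeof(*node));

    if (!ptr || !node) {
        free(ptr);
        free(node);
        gc_exit(gc, EXIT_FAILURE);
    }
    node->ptr = ptr;
    node->next = gc->allocs;
    gc->allocs = node;
    return ptr;
}

/* Free ptr only if it is still tracked. Returns 1 if freed, 0 otherwise. */
int gc_free(t_gc *gc, void *ptr)
{
    t_gc_node **link = &gc->allocs;
    t_gc_node  *node;

    while (*link) {
        if ((*link)->ptr == ptr) {
            node = *link;
            *link = node->next;      /* unlink before freeing */
            free(node->ptr);
            free(node);
            return 1;
        }
        link = &(*link)->next;
    }
    return 0;                        /* unknown or already freed: do nothing */
}
```

Calling gc_free twice on the same pointer is harmless here because the second lookup finds nothing. The trade-off is a linear search on every free, which is acceptable for the small number of allocations a shell makes per command line.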

Reading user input

User input is read and history maintained using the GNU Readline library, which is also used by bash itself. We use the library’s readline function to display a custom prompt ("minishell$"), after which the user can enter their command line. Standard Emacs-like keybindings are available (e.g., Ctrl-A, Ctrl-E, Ctrl-W, Ctrl-U, Ctrl-L). The readline library’s add_history function is called each time a valid line from the user is received, allowing command history access via the up and down arrow keys.

Since the line returned by readline is dynamically allocated, we add it to our garbage collector's list of allocated pointers using gc_add_to_allocs immediately after receiving it.

Tokenizing user input

The tokenizer takes the line from the readline function and splits it into tokens (saved as a NULL-terminated array of strings, char **). Generally, a token is delimited by whitespace. However, sections enclosed in single (') or double (") quotes are treated as single tokens. Within double quotes and in unquoted text, $ followed by an environment variable name or ? is expanded; within single quotes, no expansion occurs. For example, the input < file1 cat | grep -i "hello, $USER" > file2 might become {"<", "file1", "cat", "|", "grep", "-i", "hello, username", ">", "file2", NULL} (assuming $USER is "username").

Note that with nested quotes (only possible with alternating single and double quotes), only the outer quotes determining the token boundary are removed. For example, awk '{count++} END {print count}' becomes {"awk", "{count++} END {print count}", NULL}. However, awk "'{count++} END {print count}'" becomes {"awk", "'{count++} END {print count}'", NULL} (inner single quotes preserved).

Following bash, redirection operators (<, <<, >, >>) act as delimiters even without surrounding spaces. All these are equivalent: echo hello > outfile, echo hello>outfile, echo hello >outfile, echo hello> outfile, and are tokenized as {"echo", "hello", ">", "outfile", NULL}.

Also following bash, if quoted sections directly abut unquoted text or other quoted sections without spaces, they are concatenated into a single token after quote removal. E.g., grep "hello"world becomes {"grep", "helloworld", NULL}.

If the last token is a pipe (|), our shell throws a syntax error, unlike bash which prompts for more input. Other syntax errors (like missing filenames after redirection) are typically handled by the parser.
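The splitting rules above can be sketched in a few dozen lines of C. This is a simplified illustration rather than the project's tokenizer: it uses the standard library instead of the garbage collector, performs no $ expansion, and the function names (op_len, read_word, tokenize, free_tokens) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the operator at s ("<<", ">>", "<", ">", "|"), or 0 if none. */
static size_t op_len(const char *s)
{
    if ((s[0] == '<' || s[0] == '>') && s[1] == s[0])
        return 2;
    if (s[0] == '<' || s[0] == '>' || s[0] == '|')
        return 1;
    return 0;
}

static int is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/* Free a NULL-terminated token array. */
void free_tokens(char **tokens)
{
    size_t i;

    for (i = 0; tokens && tokens[i]; i++)
        free(tokens[i]);
    free(tokens);
}

/*
 * Copy one word starting at *s, dropping the outer quotes and joining
 * adjacent quoted and unquoted parts. Advances *s past the word.
 * Returns NULL on an unclosed quote or allocation failure.
 */
static char *read_word(const char **s)
{
    char  *word = malloc(strlen(*s) + 1);
    size_t len = 0;
    char   quote = 0;  /* 0 outside quotes, else the opening quote character */

    if (!word)
        return NULL;
    while (**s && (quote || (!is_blank(**s) && !op_len(*s)))) {
        if (quote && **s == quote)
            quote = 0;                 /* closing quote: drop it */
        else if (!quote && (**s == '\'' || **s == '"'))
            quote = **s;               /* opening quote: drop it */
        else
            word[len++] = **s;         /* ordinary character: keep it */
        (*s)++;
    }
    word[len] = '\0';
    if (quote) {
        free(word);
        return NULL;
    }
    return word;
}

/* Split line into a NULL-terminated token array; NULL on error. */
char **tokenize(const char *line)
{
    char  **tokens;
    size_t  count = 0;
    size_t  n;

    assert(line != NULL);
    /* Every token consumes at least one character, so this is enough room. */
    tokens = calloc(strlen(line) + 1, sizeof(*tokens));
    if (!tokens)
        return NULL;
    while (*line) {
        if (is_blank(*line)) {
            line++;
            continue;
        }
        n = op_len(line);
        if (n) {
            tokens[count] = calloc(n + 1, 1);
            if (tokens[count])
                memcpy(tokens[count], line, n);
            line += n;
        } else {
            tokens[count] = read_word(&line);
        }
        if (!tokens[count]) {
            free_tokens(tokens);
            return NULL;
        }
        count++;
    }
    return tokens;
}
```

With this sketch, echo hello>outfile yields four tokens, grep "hello"world yields grep and helloworld, and an unclosed quote makes tokenize return NULL. One limitation is that a quoted ">" becomes indistinguishable from the operator after quote removal, so a real tokenizer would also record each token's type.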

Parsing the tokens and preparing for execution

The parser takes the char ** token array produced by the tokenizer and builds a structure representing the command(s) to be executed. We used a linked list of "command groups", where each group corresponds to a single command between pipes (or the only command if no pipes exist). Each command group contains all information needed for execution:

The command name and its arguments (char **cmd_args, where cmd_args[0] is the command name).
Whether the command is a built-in or an external program (t_cmd_type).
The file descriptor for input (int in_fd), defaulting to stdin but potentially redirected to a file or the read-end of a pipe.
The file descriptor for output (int out_fd), defaulting to stdout but potentially redirected to a file or the write-end of a pipe.
Pointers to the previous and next command groups in the pipeline (if any).

Our t_cmd_grp struct looked something like this:

typedef struct s_cmd_grp
{
  char              *cmd_name;  // Command name (e.g., "ls", "grep")
  char              **cmd_args; // NULL-terminated array of args (cmd_args[0] is cmd_name)
  t_cmd_type        cmd_type;   // enum (BUILTIN or EXTERNAL)
  int               in_fd;      // Input file descriptor (0=stdin, pipe-read, file)
  int               out_fd;     // Output file descriptor (1=stdout, pipe-write, file)
  struct s_cmd_grp  *previous;  // Previous command group in pipeline (or NULL)
  struct s_cmd_grp  *next;      // Next command group in pipeline (or NULL)
} t_cmd_grp;

Here’s a high-level view of how the parser processes the tokens:

It identifies the command name (usually the first token unless it's a redirection).
It collects subsequent tokens as arguments until a metacharacter (|, <, >, >>, <<) or the end of the token list is reached.
When a redirection operator (<, >, >>) is encountered, the next token is treated as a filename. The file is opened (using gc_open), and the in_fd or out_fd of the current command group is updated. Error handling occurs if the file cannot be opened or if the filename token is missing.
When a heredoc operator (<<) is encountered followed by a delimiter token, the shell prompts the user for input line by line until the delimiter is entered on a line by itself. This input is typically stored in a temporary file (e.g., /tmp/minishell_heredoc). This temp file is opened (using gc_open), its file descriptor is set as the in_fd for the command group, and crucially, unlink is called on the temp file immediately after opening. This makes the filename disappear from the directory listing, but the open file descriptor remains valid until closed, ensuring automatic cleanup even if the shell crashes.
If a pipe (|) is encountered, a pipe is created (using pipe()), the out_fd of the current command group is set to the write-end of the pipe, and parsing continues for the next command group, whose in_fd will be set to the read-end of the pipe.

Throughout this process, syntax errors are detected (e.g., > > file, | |, redirection without filename) and reported to the user, preventing execution.
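The unlink-after-open trick from the heredoc step can be shown in isolation. The sketch below is an illustration under assumptions: the function name heredoc_fd is hypothetical, it uses plain open and close instead of gc_open, and it takes the heredoc body as a ready-made string instead of prompting line by line.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write body to a temporary file, reopen it read-only, and unlink the name.
 * Returns a descriptor positioned at the start of the data, or -1 on error.
 */
int heredoc_fd(const char *path, const char *body)
{
    int    fd;
    size_t len;

    assert(path != NULL && body != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    len = strlen(body);
    if (write(fd, body, len) != (ssize_t)len) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);
    fd = open(path, O_RDONLY);
    unlink(path);   /* the name disappears; the data lives until fd is closed */
    return fd;
}
```

After the call, the path no longer exists in the directory, yet reading from the returned descriptor still yields the full body. The kernel frees the file's storage only when the last descriptor referring to it is closed.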

Writing the built-in commands

Our minishell included the following built-in commands:

echo with the -n option
cd with only a relative or absolute path
pwd with no options
export with arguments (but without support for marking shell variables for export, only environment variables)
unset with arguments
env with no options or arguments
exit with an optional numeric status

Being "built-in" means that the shell's own code executes these commands directly, instead of searching for and executing an external program from the PATH.

Each built-in corresponds to a function in our program (e.g., ft_echo, ft_cd, ft_pwd, etc., where the ft prefix stands for "forty-two", following the 42 naming convention).

ft_echo: Prints its arguments separated by spaces. If the first argument is -n, no trailing newline is printed; otherwise, a newline is added.
ft_cd: Uses the chdir function to change the current working directory. It updates the PWD and OLDPWD environment variables accordingly. Throws a "too many arguments" error if more than one path is given. If no arguments are supplied, it attempts to change to the directory specified by the HOME environment variable.
ft_pwd: Uses the getcwd function to get and print the current working directory path. (Alternatively, it could print the PWD environment variable, but getcwd is generally more reliable).
ft_export: Without arguments, it prints the list of environment variables in a specific format (similar to declare -x in bash, but possibly simpler in our case, perhaps matching env). With arguments, it attempts to add or update environment variables. Arguments should be in the format NAME=value. It performs validity checks on NAME (must start with a letter or underscore, followed by letters, numbers, or underscores). An error is reported for invalid names or formats. If NAME is valid but =value is missing, we reported an error, as our minishell didn't support shell-local variables that could be marked for export later.
ft_unset: Takes variable names as arguments and removes the corresponding environment variables. Performs validity checks on the names.
ft_env: Prints the list of current environment variables (name=value pairs, one per line).
ft_exit: Causes the minishell to terminate. If a numeric argument is provided (e.g., exit 1), the shell exits with that status code (modulo 256). If no argument is given, the shell exits with the status code of the last executed command. Handles non-numeric arguments appropriately (prints an error and exits with a specific status like 2 or 1).
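Two of the rules above are small enough to sketch directly: the name check used by export and unset, and the reduction of an exit argument modulo 256. The function names are hypothetical, and the project's own versions may differ in detail.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if name matches [A-Za-z_][A-Za-z0-9_]*, else 0. */
int is_valid_env_name(const char *name)
{
    size_t i;

    assert(name != NULL);
    if (!isalpha((unsigned char)name[0]) && name[0] != '_')
        return 0;
    for (i = 1; name[i]; i++) {
        if (!isalnum((unsigned char)name[i]) && name[i] != '_')
            return 0;
    }
    return 1;
}

/*
 * Reduce an arbitrary integer to the 0..255 range a process can report.
 * Conversion to unsigned char is defined as arithmetic modulo 256,
 * so negative values wrap around (-1 becomes 255).
 */
int exit_status_from(long value)
{
    return (int)(unsigned char)value;
}
```

For example, exit 300 would report status 44, and exit -1 would report 255.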

Executing the commands

Once the parser has built the linked list of command groups, the execution phase begins. For each command group, a child process is typically created using fork() (with exceptions for certain built-ins, see below). Within each child process:

Redirection: If cmd_grp.in_fd is not STDIN_FILENO (0) or cmd_grp.out_fd is not STDOUT_FILENO (1), the dup2 function is used to duplicate the appropriate file descriptor (in_fd or out_fd) onto standard input (0) or standard output (1), respectively. Any original file descriptors used for redirection (from files or pipes) that are no longer needed in the child are closed.
Execution:
- If the command is a built-in, the corresponding ft_ function (like ft_echo, ft_pwd, ft_env) is called directly within the child process. After the built-in completes, the child process exits with an appropriate status (usually 0 for success, non-zero for failure).
- If the command is external, the execve function is called. This function attempts to replace the current process image (the child process) with the specified external program (found via the PATH environment variable). Arguments (cmd_grp.cmd_args) and the current environment variables are passed to execve. If execve fails (e.g., command not found), an error message is printed, and the child exits with a specific failure status (e.g., 127).

Important Caveat for Built-ins: Four built-ins (cd, export, unset, exit) modify the state of the shell process itself (current directory, environment variables, or termination). Running these in a child process would have no effect on the parent shell. Therefore:

If the user enters a command line containing only one of these four built-ins (no pipes), the corresponding ft_ function is executed directly in the parent shell process. No child process is created for this command.
If these built-ins appear within a pipeline (e.g., export VAR=1 | grep VAR), they are executed in child processes, meaning their effect will be lost once the child exits. This matches the behavior of standard shells like bash. (The exit command in a pipeline would just terminate that child process).
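The reason for this caveat is easy to demonstrate: a working-directory change made in a forked child never reaches the parent. The small experiment below is an illustration only (the function name is hypothetical, and it uses _exit, which was not among the functions permitted in the project).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that runs chdir("/"), then compare the parent's working
 * directory before and after. Returns 1 if unchanged, 0 if changed,
 * -1 on error.
 */
int parent_cwd_survives_child_chdir(void)
{
    char  before[PATH_MAX];
    char  after[PATH_MAX];
    int   status;
    pid_t pid;

    if (!getcwd(before, sizeof(before)))
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(chdir("/") != 0);      /* child: change directory, then exit */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    assert(WIFEXITED(status));
    if (!getcwd(after, sizeof(after)))
        return -1;
    return strcmp(before, after) == 0;
}
```

The same isolation applies to environment variables, which is why export and unset must also run in the parent when they are meant to have a lasting effect.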

After potentially forking child processes, the parent shell process must:

Close any pipe file descriptors it doesn't need (e.g., the write-end of a pipe after forking the left-side child, the read-end after forking the right-side child). This is crucial for EOF to propagate correctly through pipelines.
Wait for all child processes in the pipeline to terminate using waitpid. It waits for the last command in the pipeline specifically to retrieve its exit status.
Save the exit status of the last command to make it available via $?.
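Put together, the child-side redirection and the parent-side bookkeeping for a two-command pipeline could look like the sketch below. It is a simplified illustration, not the project's executor: it handles exactly two commands, uses execvp for brevity (the project used execve with its own PATH lookup), and the names spawn and run_pipeline are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child with in_fd/out_fd wired onto stdin/stdout; returns its pid. */
static pid_t spawn(char *const argv[], int in_fd, int out_fd, int unused_fd)
{
    pid_t pid = fork();

    if (pid != 0)
        return pid;                  /* parent, or fork failure (-1) */
    if (in_fd != STDIN_FILENO) {
        dup2(in_fd, STDIN_FILENO);
        close(in_fd);
    }
    if (out_fd != STDOUT_FILENO) {
        dup2(out_fd, STDOUT_FILENO);
        close(out_fd);
    }
    if (unused_fd >= 0)
        close(unused_fd);            /* pipe end this child does not use */
    execvp(argv[0], argv);
    perror(argv[0]);                 /* only reached if exec failed */
    _exit(127);
}

/* Run "left | right" and return the exit status of right, as $? would. */
int run_pipeline(char *const left[], char *const right[])
{
    int   fds[2];
    int   status = 0;
    pid_t left_pid;
    pid_t right_pid;

    assert(left && left[0] && right && right[0]);
    if (pipe(fds) < 0)
        return -1;
    left_pid = spawn(left, STDIN_FILENO, fds[1], fds[0]);
    close(fds[1]);   /* parent drops the write end, or right never sees EOF */
    right_pid = spawn(right, fds[0], STDOUT_FILENO, -1);
    close(fds[0]);
    if (left_pid > 0)
        waitpid(left_pid, NULL, 0);
    if (right_pid < 0 || waitpid(right_pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);   /* shell convention for signals */
    return -1;
}
```

Running echo hello into grep -q hello returns 0, while a right-hand command that does not exist returns 127. If the parent kept the write end open, the right-hand command would block forever waiting for more input.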

Handling Signals

Our shell needed to handle user-generated signals gracefully, primarily SIGINT (Ctrl-C) and SIGQUIT (Ctrl-\). We also naturally handled EOF (Ctrl-D), although it's technically an end-of-file condition on input, not a signal.

The shell's response to SIGINT and SIGQUIT depends on its current state. We used the sigaction function to set up signal handlers – functions that are called when a specific signal is received. Our set_signal_handler function was called at different points to switch between signal handling modes:

Interactive Mode: When the shell is displaying the prompt (minishell$) and waiting for user input:
- SIGINT (Ctrl-C): Abort the current input line, print a newline, and display a fresh prompt. Do not exit the shell.
- SIGQUIT (Ctrl-\): Ignored. Does nothing.
Heredoc Mode: When the shell is reading input for a heredoc (showing a > prompt):
- SIGINT (Ctrl-C): Abort the heredoc input, cancel the entire command line that contained the heredoc, print a newline, and display a fresh minishell$ prompt.
- SIGQUIT (Ctrl-\): Ignored. Does nothing.
Non-interactive Mode (Child Process Running): When the shell has launched one or more child processes to execute a command pipeline:
- SIGINT (Ctrl-C): The parent shell ignores SIGINT itself (so Ctrl-C doesn't kill the shell). The terminal delivers the signal to every process in the foreground process group, so the children receive it as well, and the default action for SIGINT terminates them. The parent shell then waits for the children and eventually displays a new prompt. A newline might be printed by the parent for cleaner output.
- SIGQUIT (Ctrl-\): Similar to SIGINT: the parent ignores it, and the children receive it. The default action for SIGQUIT terminates the process and may dump core. The shell prints "Quit (core dumped)" (or similar) after the child terminates and displays a new prompt.

Setting signal handlers appropriately (e.g., using SIG_DFL for default behavior in children, or custom handler functions) at the right times (before readline, before fork, after fork in parent/child) is key to achieving this behavior.
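A mode-switching helper along these lines can be sketched with sigaction. This is an illustration under assumptions, not the project's set_signal_handler: the enum, the names set_signal_mode and g_signal, and the handler that only records the signal number are hypothetical, heredoc mode is omitted, and the real interactive handler would also redraw the prompt through readline.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* The single global: the last signal received, or 0 if none. */
volatile sig_atomic_t g_signal = 0;

typedef enum e_sig_mode {
    MODE_INTERACTIVE,      /* prompt is shown, waiting for input */
    MODE_PARENT_WAITING,   /* children are running in the foreground */
    MODE_CHILD             /* inside a child, before exec */
} t_sig_mode;

/* Handlers should do as little as possible: just record the signal. */
static void record_signal(int signo)
{
    g_signal = signo;
}

/* Install SIGINT and SIGQUIT dispositions for the given mode; 0 on success. */
int set_signal_mode(t_sig_mode mode)
{
    struct sigaction sa_int;
    struct sigaction sa_quit;

    assert(mode == MODE_INTERACTIVE || mode == MODE_PARENT_WAITING
        || mode == MODE_CHILD);
    memset(&sa_int, 0, sizeof(sa_int));
    memset(&sa_quit, 0, sizeof(sa_quit));
    sigemptyset(&sa_int.sa_mask);
    sigemptyset(&sa_quit.sa_mask);
    sa_int.sa_handler = SIG_DFL;     /* MODE_CHILD: default behavior */
    sa_quit.sa_handler = SIG_DFL;
    if (mode == MODE_INTERACTIVE) {
        sa_int.sa_handler = record_signal;
        sa_quit.sa_handler = SIG_IGN;
    } else if (mode == MODE_PARENT_WAITING) {
        sa_int.sa_handler = SIG_IGN;
        sa_quit.sa_handler = SIG_IGN;
    }
    if (sigaction(SIGINT, &sa_int, NULL) < 0)
        return -1;
    return sigaction(SIGQUIT, &sa_quit, NULL);
}
```

The shell would call set_signal_mode(MODE_INTERACTIVE) before each prompt, MODE_PARENT_WAITING in the parent around fork and waitpid, and MODE_CHILD in each child before exec. Keeping the handler down to a single assignment to a sig_atomic_t variable avoids calling functions that are not async-signal-safe.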

Closing remarks

Before starting the project, it naturally felt intimidating. I had never previously programmed something of this complexity. After finishing this project, it now feels like a much more manageable challenge, albeit a complex one. The process demystified many aspects of how shells operate. I am now ready for bigger challenges, which might also initially seem intimidating or even insurmountable. Problems are inevitable, but they are also soluble.
