A Martian's guide to networking.
"While writing the previous post on how to set up a web server on a Raspberry Pi, I realised that there were still quite a lot of things in the setup process that I didn't understand well. I got the setup to work, but I hardly had any idea what was going on behind the scenes. Having no academic background in networking, I just trusted Google and Stack Overflow to guide me correctly. Turns out that this isn't a good strategy when you want to explain the same stuff to someone else. You surely won't go about teaching someone how to prepare fried chicken just because you have tasted it (FYI, I can prepare fried chicken and I'm a vegetarian, so I haven't tasted it).
So, I rummaged through different resources over the past few days to get briefed on how the current networking situation came to be. This post tries to sum up some of the things I learned. I might have got some of the dates wrong, or contorted some facts while trying to connect various sources from the internet, but I assure you the general flow of events is pretty much the same. There will be a lot of terms that you might not know (or probably know a lot about), but they eventually reveal themselves by the end. And obviously, there's always a lot more to learn every day.
As for the title of this post, eh, I couldn't think of anything catchy, and "Brief History of the Internet" was already taken by the internet's founding fathers. Just assume that the Martians are advanced enough to access the internet and comprehend English with technical jargon, but stupid enough, like me, to not know how the internet works... yet."
Short answer: It is an interconnected network of networks.
The United States Department of Defense funded a project called ARPANET (Advanced Research Projects Agency Network), a communications network that would enable a user to access different computers from a single terminal. This was not easily achievable in those days.
Early 1960s:
Paul Baran joined the RAND (Research and Development) Corporation. This organization helped keep the US informed about issues like the space race, the US-Soviet nuclear arms race, health care, the digital revolution and a lot more. Paul was working on a communication network that could survive even if one of the nodes (computers) in the network was dead. It was a system of distributed adaptive message block switching.
Second half of the 1960s:
Similar independent research was conducted by Donald Davies at the UK's National Physical Laboratory. He coined the term packet switching.
ARPANET followed the concepts of packet switching as suggested by Donald Davies.
Robert Kahn joined Vinton Cerf, who was already working on ARPANET at this time. They came up with a protocol that would let any network following it become a part of ARPANET.
This was the beginning of the internet...
In the words of Vinton Cerf:
"The Internet is a design philosophy and architecture expressed in a set of protocols which makes it easier for it to adopt and absorb new communication technologies."
You might already know this, but as a quick brush-up: a server is an entity that serves some information or data. A client is an entity that asks for or receives that information or data. In our case, these entities are computers.
It's not that computers were unable to communicate with each other before the 1970s; the system just wasn't robust and 'connected' enough. In simplified, crude terms, you can picture an older centralized network versus a newer distributed network. The centralized network had all computers connected directly to the computers that served the needed information. There were limited ways to get from one computer to another, and the most severe drawback was that all the data followed a single path from a given client to a given server. If this path was damaged, the data and the communication link would be lost, and the message had to be sent again via a different path. This was the circuit-switching mechanism.
The newer packet-switching model had many nodes connected to each other, or reused the existing connections along with the then-new packet-switching mechanism. In this mechanism, the data to be transmitted was broken up into chunks called packets and sent simultaneously across many possible paths from that node, instead of along just one path to the destination. The allowed size of each packet has a range that is determined by the network protocols. These packets would then be recombined at the destination to form the complete message. Since the packets took different paths, even if one of the paths was blocked, the packet on that path would take a different one and would ultimately reach the destination. Having various path options for the message to reach the destination provided resilience to a few dead nodes or even some broken paths.
This mechanism also ensured that even if a segment of the network went down, the other parts were still functioning.
Take a look at this example. Assume a message P is to be sent from the green computer on the left to the one on the right. In the packet-switching model, P is broken into parts p1, p2 and p3, which are sent along different paths with information about the final destination attached to each packet. If any of the paths is broken, the computer just before the break sends the packet along a different path based on the packet's destination information. The selection of this new path is done by a routing algorithm.
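To make the "pick a different path" step a bit more concrete, here is a minimal sketch. The five-node topology and the names `LINKS` and `find_path` are made up for illustration, and breadth-first search stands in for a routing algorithm; real routers use far more elaborate protocols.

```python
from collections import deque

def find_path(links, source, destination, broken=frozenset()):
    """Breadth-first search for a shortest path that avoids broken links."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            return path
        for neighbour in links.get(node, []):
            link = frozenset((node, neighbour))
            if neighbour not in visited and link not in broken:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no route left to the destination

# Hypothetical topology: A is the sender, E is the receiver.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(find_path(LINKS, "A", "E"))
# Same request after the A-B link has gone down.
print(find_path(LINKS, "A", "E", broken={frozenset(("A", "B"))}))
```

With the A-B link marked as broken, the search simply comes back with the route through C instead.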
The packets are supplied with information such as the sender's address and the destination address, just like a postcard. But in this case, the postcard is split up into different pieces, each carrying the addresses; the pieces are carried along various paths and later joined back together at the destination. The information supplied along with the data is called the packet header and is used to rebuild the original message from the packets. Hence packets can be regarded as the basic units of information over the internet.
The packets need not arrive in the same order as they were sent; the headers have the information required to recombine the packets. Another advantage of packet headers is that messages from various sources and to various destinations can travel in the same channel/path without worrying about data mix-ups, as the headers contain all the information needed to tell the data apart.
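Here is a toy version of the postcard idea. The header fields (`src`, `dst`, `seq`, `total`) and the 8-character packet size are my own simplifications, not the layout of a real IP or TCP header.

```python
import random

def to_packets(message, source, destination, size=8):
    """Split a message into packets, each carrying a small header."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        {"src": source, "dst": destination, "seq": seq,
         "total": len(chunks), "payload": chunk}
        for seq, chunk in enumerate(chunks)
    ]

def reassemble(packets):
    """Rebuild the message from packets that may arrive in any order."""
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    return "".join(packet["payload"] for packet in ordered)

packets = to_packets("Hello from Mars, how is Earth?", "10.0.0.1", "10.0.0.2")
random.shuffle(packets)  # simulate out-of-order arrival
print(reassemble(packets))
```

No matter how the packets get shuffled on the way, the sequence number in each header is enough to put the message back together.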
To decide on the superiority (or inferiority) of packet-switching, we need to compare it against its predecessor mechanism: circuit-switching. This type of model was used in earlier telephone networks.
|Circuit Switching|Packet Switching|
|A 'dedicated' connection is required to connect two devices. The route mostly has to be pre-established before initializing connections.|Since packets "hop" off various nodes in the network, dedicated connections are not required.|
|Once a message is sent, its path is fixed. If there is a break in the path, the message has to be re-sent.|The route is not predetermined and mostly cannot be predicted. A routing algorithm decides the optimal route each packet should take to reach the destination, even if some paths are blocked.|
|A particular line/channel/wire is occupied while a message is travelling through it and cannot be used by other computers to send messages.|Multiple packets from different sources can travel simultaneously through the same channel.|
|Pros of Packet Switching|
|Better use of network bandwidth (bits that can be conveyed or processed in unit time) due to smaller packets and shared channels|
|If some packets don't arrive as expected, the receiving computer (specifically, its TCP implementation) detects the missing packets so that they can be resent.|
|If a channel or node fails, routing algorithms route the packets to another appropriate node in the network.|
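The "detect what is missing and get it resent" idea can be sketched in a few lines. This is a heavy simplification: real TCP works with byte-based sequence numbers, acknowledgements and timers, and the function names here are hypothetical.

```python
def missing_sequence_numbers(received_seqs, total):
    """Return the sequence numbers that never arrived, in ascending order."""
    return sorted(set(range(total)) - set(received_seqs))

def receive(sent, arrived_seqs):
    """Fill the gaps by 'asking' the sender for each missing packet."""
    buffer = {seq: sent[seq] for seq in arrived_seqs}
    for seq in missing_sequence_numbers(buffer, len(sent)):
        buffer[seq] = sent[seq]  # stands in for "sender retransmits packet seq"
    return "".join(buffer[seq] for seq in range(len(sent)))

sent = ["Hel", "lo ", "Mar", "s!"]
print(missing_sequence_numbers([0, 3], len(sent)))  # packets 1 and 2 got lost
print(receive(sent, [0, 3]))
```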
There are also some minor drawbacks to the packet-switching mechanism. Packaging and routing packets may take up some extra time, though efficient algorithms curb this to a large extent.
If some packets do not follow the protocols, they could prove to be a security risk for other packets travelling in the same channel, but that's altogether a separate topic: network security.
TCP/IP: The language of the Packets
Now that we have established how the "Internet" was a different model of networks than its predecessors, let's look a bit more into what governs the movement of packets inside this vast network. Since we have called TCP/IP the language of the packets (and of the internet), there need to be some rules and protocols, as in any other language. A protocol can be regarded as a set of agreed-upon rules that just makes life easier for everyone involved.
Recall Vinton Cerf's words: "The Internet is a design philosophy and architecture expressed in a set of protocols which makes it easier for it to adopt and absorb new communication technologies." Even though ARPANET was leaps and bounds better than the previous model, its main idea, that "the internet was not designed for just one application, but as a general infrastructure on which new applications could be conceived", could be realised only when TCP/IP replaced its predecessor protocol on the ARPANET in 1983.
TCP/IP (Transmission Control Protocol/Internet Protocol) was one such pair of protocols, developed by Robert Kahn, Vinton Cerf and their team.
TCP/IP itself is a suite of protocols. Its functionality can be divided into four layers, each of which has various protocols. This is one good link that succinctly discusses the TCP/IP model: Microsoft TechNet. Discussing all of these here would make this already long post even more boring, so I'll just list them out:
Network Interface Layer
- Standards for physical devices like wires, cables, radio frequencies.
- Protocols for Ethernet connections.
- MAC (Media Access Control) addresses.
- and a lot more..
Internet Layer
- Connects hosts across networks.
- IP (Internet Protocol), which takes care of IP addressing and routing.
- IP basically defines how computers send packets to each other.
- ARP, ICMP and IGMP are some of the other protocols in this layer.
Transport Layer
- Handles host-to-host connections.
- This layer uses TCP as the standard for establishing and maintaining a network connection until the data exchange is complete, and also deals with breaking the data up and reassembling it.
- This layer provides sessions and data to the last layer.
- UDP is another protocol, used for smaller data that usually fits in a single packet.
Application Layer
- The protocols of this layer include HTTP (Hypertext Transfer Protocol), FTP (File Transfer Protocol) and SMTP (Simple Mail Transfer Protocol).
- This layer helps the previous layers by using protocols like DNS (Domain Name System) to resolve host names to IP addresses, RIP (Routing Information Protocol) used by routers, etc.
This image, taken from the ElectronicDesign blog, gives a nice visualisation of how the data is turned into packets by adding information at each layer of the OSI model (another model similar to the TCP/IP model). Do read their entire post for a better overview of TCP/IP and OSI.
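The "each layer adds its own information" idea can be mimicked with nested dictionaries. Everything below (field names, addresses, port numbers) is invented for illustration and is not the real header layout of any of these protocols.

```python
def encapsulate(data):
    """Wrap application data in one made-up header per lower layer."""
    segment = {"layer": "transport", "src_port": 50000, "dst_port": 80,
               "payload": data}
    packet = {"layer": "internet", "src_ip": "192.168.0.10",
              "dst_ip": "203.0.113.5", "payload": segment}
    frame = {"layer": "network interface", "src_mac": "aa:bb:cc:00:00:01",
             "dst_mac": "aa:bb:cc:00:00:02", "payload": packet}
    return frame

def decapsulate(frame):
    """Peel the headers off again, outermost layer first."""
    layers = []
    unit = frame
    while isinstance(unit, dict):
        layers.append(unit["layer"])
        unit = unit["payload"]
    return layers, unit

layers, data = decapsulate(encapsulate("GET / HTTP/1.1"))
print(layers)
print(data)
```

The sender wraps from the application layer downwards, and the receiver unwraps in the opposite order until only the original data is left.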
Rewinding... The internet is a network of networks. Data is sent in packets. Packets carry the addresses of where the data is sent from and where it is going. So how, why and what are these addresses?
The IP (Internet Protocol) part of TCP/IP deals with addresses as unique identifiers for computers in a network. At the dawn of the internet era, the Internet Protocol in use was IPv4. This protocol uses a 32-bit number to assign an address to each computer.
An IPv4 address is a 32-bit number, usually represented by 4 "octets", each an integer from 0 to 255. The different octets represent different parts of the address, mainly the Network Number and the Host Number. The Network Number was assigned by the InterNIC (Network Information Center), an organization responsible for DNS (Domain Name System) domain name allocations. Later this task was taken up by ICANN (Internet Corporation for Assigned Names and Numbers). The Host Number (sometimes called a local or machine address) is assigned by the local network administrator. The division between the Network and Host parts of the address is determined by the class of the IPv4 address (A, B, C). The answers to the question here explain a lot more about the IP address and how to go about dissecting it.
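If you want to dissect an address yourself, Python's standard `ipaddress` module does most of the work. The address below is just an example, and the split into a 24-bit network part and an 8-bit host part is an assumption matching the old "class C" layout.

```python
import ipaddress

address = ipaddress.IPv4Address("192.168.0.1")

# The four octets, and the same address as one 32-bit number.
octets = [int(part) for part in str(address).split(".")]
as_integer = int(address)
as_bits = format(as_integer, "032b")

# Assumption: first three octets are the network, last octet is the host.
network = ipaddress.ip_network("192.168.0.1/24", strict=False)
host_number = as_integer - int(network.network_address)

print(octets)       # [192, 168, 0, 1]
print(as_integer)   # 3232235521
print(as_bits)
print(network.network_address, host_number)
```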
If you do the math: an IPv4 address is a 32-bit number, so in theory there can be 2^32 (4,294,967,296) different IPv4 addresses. But that does not represent the total number of IP addresses that can be assigned to devices, because certain IP addresses are set aside for special purposes. A list of reserved IP addresses can be found here.
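The same `ipaddress` module also knows about several of the special-purpose ranges. The helper below (`describe` is my own name) only checks a handful of flags, so treat it as a rough illustration and not a complete list of reserved ranges.

```python
import ipaddress

def describe(text):
    """Label an address using a few of the flags the ipaddress module exposes."""
    address = ipaddress.ip_address(text)
    if address.is_loopback:
        return "loopback"
    if address.is_link_local:
        return "link-local"
    if address.is_multicast:
        return "multicast"
    if address.is_reserved:
        return "reserved"
    if address.is_private:
        return "private"
    return "no special-purpose flag"

for text in ["127.0.0.1", "169.254.10.20", "224.0.0.1", "10.1.2.3", "8.8.8.8"]:
    print(text, "->", describe(text))
```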
The number of available IPv4 addresses per person is less than 1 if we go by the current population. Obviously, the number of networking devices will be far more than the number of people. So how do we deal with the limited supply of addresses? Enter IPv6. This new address format uses 128 bits (represented as hexadecimal). That's 2^128 addresses, which is far more than a billion addresses for every person on earth! The transition from IPv4 to IPv6 got a public trial run on June 8, 2011 (World IPv6 Day); it is gradually under way and will take a long, long time.
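The arithmetic is easy to check. The population figure below is my own round assumption of about 8 billion people.

```python
WORLD_POPULATION = 8_000_000_000  # assumption: roughly 8 billion people

ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

print(ipv4_total)                                # 4294967296
print(round(ipv4_total / WORLD_POPULATION, 2))   # IPv4 addresses per person
print(f"{ipv6_total // WORLD_POPULATION:.2e}")   # IPv6 addresses per person
```

So IPv4 gives each of us roughly half an address, while "more than a billion per person" for IPv6 turns out to be a massive understatement.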
As mentioned earlier, the internet is called a network of networks. But since all the computers are connected, isn't it just one large network? The answer is yes, and a bit of no.
Again refreshing some terminology before moving ahead:
Bandwidth can be regarded as a metric for how much data (how many data packets) can move from one location to another in a given amount of time. A higher bandwidth means more data packets getting through per second, which results in faster downloads from the internet.
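A quick back-of-the-envelope calculation shows what this means for downloads. The file size and link speeds are arbitrary examples, and the sketch ignores protocol overhead and congestion; note that file sizes are usually quoted in bytes while bandwidth is quoted in bits.

```python
def download_seconds(file_megabytes, bandwidth_megabits_per_second):
    """Ideal transfer time: convert megabytes to megabits, then divide."""
    file_megabits = file_megabytes * 8  # 1 byte = 8 bits
    return file_megabits / bandwidth_megabits_per_second

print(download_seconds(700, 10))    # 700 MB over a 10 Mbps link
print(download_seconds(700, 100))   # same file over a 100 Mbps link
```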
Routers are computers in a network that connect multiple computers (or networks) together. There can be various types of routers depending on the network. We'll discuss more about these a bit later.
Modems, in simple terms, can be thought of as devices that help computers send and receive data over a communication channel. Modern routers can come with built-in modems.
LAN (Local Area Network)
If you are (or know someone who is) into gaming, you might have heard of people playing Counter Strike or *insert-any-multiplayer-game-you-like* on LAN for lower latency. A LAN is a local area network that interconnects a group of computers within a small area like a building. These connections can be wired or over a Wireless Local Area Network (WLAN). People often misuse the term "hotspot" in place of WLAN. Hotspot just refers to the physical region around a wireless router in which you have connectivity.
A LAN in itself does not have access to the computers in the outside world.
ISP (Internet Service Provider)
When the router is coupled with a modem, it can establish a connection to your Internet Service Provider (ISP). The modem has to be compatible with the ISP's infrastructure. To reduce the hassle (and to make some extra money), ISPs nowadays provide their own routers with built-in modems.
But how are ISPs connected to the internet?
ISPs are divided into different categories or Tiers.
Tier 1 ISPs are internet providers who exchange internet traffic (the always-moving data packets) among themselves. Tier 1 service providers are the ones who enable us to have intra-continental as well as inter-continental connectivity. Cogent Communications is an example of a Tier 1 ISP. They provide connectivity on a scale that ranges from countries to continents. Since these have to deal with large volumes of traffic that literally support the entire internet, they are sometimes also referred to as backbone internet providers. The traffic is exchanged between them under peering agreements (agreements between two large internet providers that need to exchange traffic without paying exorbitant fees to do so).
Tier 2 ISPs connect Tier 1 and Tier 3 ISPs. These are companies smaller than Tier 1 ISPs that find it easier to purchase internet transit from Tier 1 ISPs than to deal with the large hardware setups and peering agreements. Vodafone is one such example. They might sometimes also enter into peering agreements with Tier 1 ISPs.
Tier 3 ISPs are those who only purchase internet transit. These are the ISPs that provide internet services to households and businesses. Comcast is a Tier 3 ISP. Since these ISPs are the last connection to your device, they are also called Last Mile internet providers. What we as customers pay these ISPs for is bandwidth: higher bandwidth equals higher download speeds (equals higher bills).
It's not always necessary for a packet to travel through all these tiers, since the destination will not necessarily be on some other continent.
ISPs can also be classified on the basis of the task they do: Access Provider ISPs (provide customers with internet access, like Comcast), Hosting ISPs (these can also host your web servers, emails, or online storage), Transit ISPs (the different tiers of ISPs are a classification of transit ISPs), etc.
Routers are small computers that help in connecting computers over a network, their main task being 'routing' packets to the correct destination. Just like your computer, a router has a small CPU and some memory. This memory is used mainly to store the operating system and the 'routing table', which has information about all the devices that this router connects. Routers can be as small as the smallest router I ever owned, a TP-Link Nano WR702N, or as large as Cisco's CRS-X Router Pack.
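A routing table lookup can be sketched as "find the most specific entry that contains the destination". The three entries and their labels below are hypothetical, and picking the longest matching prefix is the usual rule for IP routing, not something specific to any router mentioned here.

```python
import ipaddress

# Hypothetical routing table: (destination network, where to send the packet).
ROUTING_TABLE = [
    (ipaddress.ip_network("192.168.0.0/24"), "LAN port"),
    (ipaddress.ip_network("10.0.0.0/8"), "VPN tunnel"),
    (ipaddress.ip_network("0.0.0.0/0"), "WAN port (default route)"),
]

def next_hop(destination):
    """Pick the matching entry with the longest prefix (most specific network)."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTING_TABLE if address in net]
    best_net, best_hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return best_hop

print(next_hop("192.168.0.42"))
print(next_hop("8.8.8.8"))
```

Anything that is not on a known network falls through to the default route, which on a home router is the cable going to your ISP.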
Consumer-grade routers handle bandwidths ranging from a few megabits per second (like my old WR702N) up to a few gigabits per second. These can be as small as a donut.
The larger router systems (like the CRS-X) may occupy whole rooms and can handle much larger traffic, on the order of petabits per second. Such large routers are mostly used by backbone ISPs.
As mentioned earlier, modern routers come with the modem built into the same hardware. This wasn't the case during the early days of the internet. One had to get a separate router, a modem and a lot of other devices!
A consumer-grade router is a network router that connects the internet to your local area network. The ISP provides a dynamic IP address to your router, and this is shared by all the household devices connected to the router. A physical WAN (Wide Area Network) port on the router is where the cable coming from your ISP (the internet) physically connects to your network. LAN ports help in connecting all your other devices to this router. A typical modern-day consumer-grade router comprises four components:
NAT (Network Address Translator)
Network address translation is something that can be done in hardware as well as in software. In consumer routers it's mostly software based. NAT was introduced to conserve the limited supply of IP addresses. We as consumers have more than one device connected to our personal router. Assigning a dedicated IP to every device we have would exhaust the IP addresses quicker than we can imagine. Also, not all devices are going to be connected to the internet all the time, so it makes no sense to have a permanent IP assigned to them. What NAT does is help us represent all our devices as just one single IP to the internet. The entire private network on our LAN is represented as one single IP address (unless the configuration of the router is tampered with). This is one of the many functions of NAT.
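Here is a toy model of how many private devices can hide behind one public address. It sketches the port-based flavour of NAT that home routers typically use; the class name, the addresses and the starting port are all made up.

```python
class Nat:
    """Toy port-based NAT: many private (ip, port) pairs share one public IP."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outgoing(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet to the public address."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_incoming(self, public_port):
        """Find which private device a reply belongs to (None if unknown)."""
        return self.inbound.get(public_port)

nat = Nat("203.0.113.7")
print(nat.translate_outgoing("192.168.0.10", 51000))  # laptop
print(nat.translate_outgoing("192.168.0.11", 51000))  # phone, same private port
print(nat.translate_incoming(40001))                  # reply goes to the phone
```

From the outside, both devices appear as 203.0.113.7; only the router's table knows which reply belongs to whom.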
DHCP (Dynamic Host Configuration Protocol)
DHCP is a client/server protocol that provides an IP address to devices connected to a network. It automatically leases an IP address for a particular time period to any new device that connects to the network. If the device is no longer connected to the network, DHCP removes the IP address from the list of connected devices and returns it to a pool so that it can be allocated to new devices.
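The lease-and-return cycle can be modelled with a small pool. This is only a sketch of the bookkeeping, not the actual DHCP message exchange; the class, the two-address pool and the 60-second lease are assumptions for the example.

```python
class DhcpPool:
    """Toy lease pool: hands out addresses and reclaims expired ones."""

    def __init__(self, addresses, lease_seconds=3600):
        self.free = list(addresses)
        self.lease_seconds = lease_seconds
        self.leases = {}  # device MAC -> (ip, expiry time)

    def expire(self, now):
        """Return addresses whose lease has run out to the free pool."""
        for mac, (ip, expires_at) in list(self.leases.items()):
            if expires_at <= now:
                del self.leases[mac]
                self.free.append(ip)

    def request(self, mac, now):
        """Lease an address to a device, or renew the one it already holds."""
        self.expire(now)
        if mac in self.leases:
            ip, _ = self.leases[mac]
        elif self.free:
            ip = self.free.pop(0)
        else:
            return None  # pool exhausted
        self.leases[mac] = (ip, now + self.lease_seconds)
        return ip

pool = DhcpPool(["192.168.0.100", "192.168.0.101"], lease_seconds=60)
print(pool.request("laptop", now=0))
print(pool.request("phone", now=10))
print(pool.request("tablet", now=20))   # nothing free yet
print(pool.request("tablet", now=61))   # the laptop's lease has expired
```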
Firewall
In computing, a firewall is a network security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules. A firewall typically establishes a barrier between a trusted internal network and untrusted outside network, such as the Internet.
-Wikipedia (this was the simplest one)
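"Predetermined security rules" can be as simple as an ordered list that is checked from top to bottom. The three rules below are hypothetical and only look at direction and port; real firewalls match on many more fields.

```python
# Hypothetical rule list; the first matching rule wins.
RULES = [
    {"action": "allow", "direction": "out", "port": None},  # LAN devices may reach anything
    {"action": "allow", "direction": "in", "port": 443},    # e.g. an HTTPS server on the LAN
    {"action": "deny", "direction": "in", "port": None},    # drop all other inbound traffic
]

def decide(direction, port):
    """Return the action of the first rule that matches the traffic."""
    for rule in RULES:
        if rule["direction"] == direction and rule["port"] in (None, port):
            return rule["action"]
    return "deny"  # nothing matched: stay on the safe side

print(decide("out", 80))
print(decide("in", 443))
print(decide("in", 22))
```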
Wireless
Almost all routers these days have wireless capabilities. Earlier, devices could be connected locally only using Ethernet cables. With the advent of WiFi, most new routers have built-in wireless connection capability.
You can see most of these configurations on your home router by visiting its administration portal. For TP-Link routers, just open a web browser on a device connected to the router and visit 192.168.0.1. This IP may vary depending upon the vendor.