Introduction
A protocol is a set of rules that two parties follow in order to communicate with each other. HTTP is used by web browsers (Chrome, Internet Explorer, etc.), mobile apps running on Android or iOS, and many other devices. For example, when we visit a Wikipedia article or purchase a book on Amazon.in, HTTP is the protocol that tells the browser how to retrieve information from the servers.
Importance of HTTP Protocol
If you are a developer, a solid understanding of HTTP principles can help you build better applications and web services. Knowing HTTP terminology helps you take full advantage of the protocol, and it also helps you build more robust and secure applications.
1. Resource Locator
URL
Various components of a URL
1. URL scheme: the part before :// is termed the URL scheme. Besides HTTP, there are other schemes such as HTTPS, FTP, etc.
2. Host: google.com is a host, as it serves the content of the page. Host names are translated to IP addresses via the Domain Name System (DNS), so the browser knows where to fetch the content from.
Things to remember
URL is defined as:
<scheme>://<host>:<port>/<path>?<query>#<fragment>
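The components in the template above can be pulled apart with Python's standard urllib.parse module. This is a minimal sketch; the URL is a made-up example chosen so that every component is present.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL (not from the article) that exercises every component.
url = "https://example.com:8080/books/search?q=http&page=2#results"

parts = urlsplit(url)
components = {
    "scheme": parts.scheme,            # part before ://
    "host": parts.hostname,            # resolved to an IP address via DNS
    "port": parts.port,                # None when the URL omits it
    "path": parts.path,
    "query": parse_qs(parts.query),    # query string as a dict of lists
    "fragment": parts.fragment,        # used by the client, not sent to the server
}

for name, value in components.items():
    print(f"{name}: {value}")
```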
URL Encoding
There are certain formatting rules to follow when constructing a URL. Some characters, such as the space and ^, are considered unsafe and must be percent-encoded. A space, for example, is generally encoded as %20 (where 20 is the hexadecimal value of the US-ASCII space character).
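As a small illustration, Python's urllib.parse.quote applies this percent-encoding and unquote reverses it. The file name below is a hypothetical value containing the two unsafe characters mentioned above.

```python
from urllib.parse import quote, unquote

# Hypothetical path segment containing a space and a caret.
raw = "my file^v2.txt"

encoded = quote(raw)        # unsafe characters become %XX escapes
decoded = unquote(encoded)  # decoding restores the original text

print(encoded)
print(decoded)
```

Here the space becomes %20 and the caret becomes %5E, its hexadecimal US-ASCII value.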
Resources and Media Types
There are many different types of resources on the web, and in order to display them a web browser must be able to identify each one correctly. Examples of resources include Microsoft Word documents, PDFs, images, and videos.
Whenever a host sends a response, it adds the appropriate Content-Type header to help the browser identify the type of resource. Content-Type values in HTTP are based on the Multipurpose Internet Mail Extensions (MIME) standards. Examples of Content-Type values are “text/html” and “image/jpeg”.
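The sketch below shows one way a server might pick a Content-Type from a file name, and how a client might read the header back. It uses only the Python standard library; the file names and the header value are illustrative assumptions.

```python
import mimetypes
from email.message import Message

# Server side: guess the media type from the file extension.
html_type, _ = mimetypes.guess_type("index.html")
jpeg_type, _ = mimetypes.guess_type("photo.jpeg")
print(html_type, jpeg_type)

# Client side: parse a Content-Type header value, including its charset parameter.
# HTTP headers share the MIME header syntax, so the email package can parse them.
header = Message()
header["Content-Type"] = "text/html; charset=utf-8"
media_type = header.get_content_type()
charset = header.get_content_charset()
print(media_type, charset)
```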
Content Type Negotiation
In many cases, a resource is available in multiple formats or languages. The client uses the Accept header (and Accept-Language for languages) to ask for a particular representation, and it is up to the server whether to take these preferences into consideration when fulfilling the request. The server reports the format it actually chose in the Content-Type header of the response. Example media types are application/json and application/xml.
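Negotiation is driven by the client's Accept header, where each media type may carry a quality value q between 0 and 1. The following is a simplified sketch of how a server could choose a representation. The function names are hypothetical, and the matching ignores the finer specificity rules of the HTTP specification.

```python
def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending preference."""
    prefs = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def matches(pattern, media_type):
    """True if an Accept pattern such as text/* covers the media type."""
    if pattern == "*/*" or pattern == media_type:
        return True
    pattern_type, _, pattern_subtype = pattern.partition("/")
    return pattern_subtype == "*" and pattern_type == media_type.partition("/")[0]


def choose_representation(accept_header, available):
    """Pick the first available media type the client is willing to accept."""
    for pattern, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(pattern, media_type):
                return media_type
    return None


available = ["application/xml", "application/json"]
print(choose_representation("application/xml;q=0.5, application/json", available))
```

A server that finds no acceptable representation could either fall back to a default format or reject the request; the sketch simply returns None and leaves that decision to the caller.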
HTTP Messages
When a web browser asks a host for a web page, the information it sends is termed an HTTP request message.
The information the host sends back in reply to that request is termed an HTTP response message.
HTTP/1.1 is supported by virtually all browsers and servers, although newer versions such as HTTP/2 are also in wide use.
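In HTTP/1.1 both kinds of message are plain text: a start line, header lines, a blank line, and an optional body. The sketch below builds a request message and parses a response message by hand. The helper names, host, and sample response are assumptions made for illustration, and no network connection is involved.

```python
def build_request(method, path, host, headers=None):
    """Assemble the text of an HTTP/1.1 request message without a body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(raw):
    """Split a response message into status line fields, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


request_text = build_request("GET", "/wiki/HTTP", "example.org", {"Accept": "text/html"})
print(request_text)

# Hand-written sample of what a host might send back.
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<p>Hello</p>\n"
)
print(parse_response(sample_response))
```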
HTTP Request Methods
Method | Description
GET | Retrieves a resource
PUT | Updates a resource
DELETE | Removes a resource
POST | Stores a resource
GET is termed a safe method because it does not alter any information on the server. It is also idempotent: carrying out the same request multiple times has the same effect as carrying it out once.
POST is generally used for submitting a form, for example a college counseling application form that takes multiple inputs from a user.
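To make the difference between the methods concrete, here is a toy in-memory model of a server's resources. The ResourceStore class is hypothetical and only mimics the usual semantics of each method; it is not an HTTP server.

```python
class ResourceStore:
    """Toy stand-in for the resources held by a server."""

    def __init__(self):
        self._items = {}
        self._next_id = 1

    def get(self, key):
        # Safe: reading never changes the stored state.
        return self._items.get(key)

    def put(self, key, value):
        # Idempotent: repeating the same PUT leaves the same state.
        self._items[key] = value

    def delete(self, key):
        # Idempotent: deleting twice has the same effect as deleting once.
        self._items.pop(key, None)

    def post(self, value):
        # Not idempotent: every POST stores a new resource under a new key.
        key = str(self._next_id)
        self._next_id += 1
        self._items[key] = value
        return key

    def count(self):
        return len(self._items)


store = ResourceStore()
store.put("profile", "v1")
store.put("profile", "v1")        # repeated PUT, still one resource
first = store.post("form data")
second = store.post("form data")  # repeated POST, two separate resources
print(store.count(), first, second)
```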
HTTP Request Header
A header carries information that helps a server process a request and send the appropriate response. For example, the “Accept-Language” header lets the client request the page in a preferred language.
Header | Description
Referer | When the user clicks on a link, the client can send the URL of the referring page in this header.
User-Agent | Information about the user agent (the software) making the request.
Accept | Describes the media types the user agent is willing to accept.
Accept-Language | Describes the languages the user agent prefers.
Cookie | Contains cookie information.
If-Modified-Since | Contains the date on which the user agent last retrieved (and cached) the resource.
Below is an example of request headers which were part of a request.
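As an illustration, the following sketch assembles a request that carries each of the headers from the table. All values (host, paths, user agent string, cookie, and date) are made up for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# HTTP dates use a fixed GMT format, which format_datetime can produce.
last_fetched = datetime(2022, 1, 1, tzinfo=timezone.utc)

# Illustrative values only; a real browser sends its own.
request_headers = {
    "Host": "example.com",
    "User-Agent": "ExampleBrowser/1.0",
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.8",
    "Referer": "https://example.com/index.html",
    "Cookie": "session_id=abc123",
    "If-Modified-Since": format_datetime(last_fetched, usegmt=True),
}

header_block = "".join(f"{name}: {value}\r\n" for name, value in request_headers.items())
request_text = "GET /articles/http HTTP/1.1\r\n" + header_block + "\r\n"
print(request_text)
```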
HTTP Response
An HTTP response mainly contains a status code, HTTP response headers, and, if applicable for the request, the resource itself.
Status Codes
These codes tell the client the outcome of a request. They are divided into 5 categories, as shown below.
Range | Category
100-199 | Informational
200-299 | Successful
300-399 | Redirection
400-499 | Client Error
500-599 | Server Error
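Because the category depends only on the first digit, a client can classify any status code with integer division. The sketch below does that and also looks up the standard reason phrase through Python's http.HTTPStatus enum; the function name status_category is an assumption for this example.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_category(code):
    """Map a status code to its category using its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return CATEGORIES[code // 100]


for code in (200, 302, 404, 503):
    # HTTPStatus(code).phrase is the standard reason phrase for the code.
    print(code, HTTPStatus(code).phrase, "->", status_category(code))
```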
Some specific HTTP status codes are used especially often.
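Code | Type | Description
200 | OK | The request succeeded.
301 | Moved Permanently | The resource has a new permanent URL, given in the Location header.
302 | Found | The resource is temporarily available at a different URL.
304 | Not Modified | The copy cached by the client is still valid.
400 | Bad Request | The server could not understand the request.
403 | Forbidden | The server refuses to authorize the request.
404 | Not Found | The server could not find the requested resource.
500 | Internal Server Error | The server encountered an unexpected error.
503 | Service Unavailable | The server is temporarily unable to handle the request.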
Response Headers
These are similar to request headers, but they contain detailed information about the requested resource and the host.
Proxies
A proxy server is a computer that sits between a client and a server. An end user is often unaware that he/she is using a proxy. Generally, companies use proxies to monitor, alter, or filter the internet traffic going through their network.
Forward Proxy: A proxy that is closer to the client than to the server. When an organization uses a proxy to filter and block certain websites for its employees, it is termed a forward proxy.
Reverse Proxy: A proxy server that is closer to the server than to the client; again, it is transparent to the client. Organizations can deploy a reverse proxy to hide their technology stack from the outside world. Other use cases for a reverse proxy are load balancing, SSL encryption, gzip compression, etc.
HTTP Security
HTTP is a stateless protocol, which means that each request-response pair is independent of every other pair. The protocol itself does not require a server to retain any information from one request for use in the next.
This statelessness has both pros and cons. On the cons side, the protocol alone does not let us identify users, which hampers building personalized web applications.
However, most of the web applications we build are highly stateful. Examples are banking, social networking, and messaging applications.
In order to maintain state across subsequent request-response pairs, we generally use sessions, cookies, etc.
Cookies
A cookie is a collection of key-value pairs stored on the client side that carries certain information about the client. When a user visits a website for the first time, the website can set a cookie using the Set-Cookie header of the response message. The browser then sends this cookie back in the Cookie header of every subsequent request to that website.
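The exchange can be sketched with Python's standard http.cookies module. The cookie names and values below are hypothetical; the first half plays the role of the server building a Set-Cookie header, and the second half parses the Cookie header a browser would send back.

```python
from http.cookies import SimpleCookie

# Server side: build the header that hands the cookie to the browser.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"      # hypothetical session identifier
server_cookie["session_id"]["path"] = "/"
server_cookie["session_id"]["httponly"] = True
set_cookie_header = server_cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# Later request: the browser returns its cookies in a single Cookie header.
incoming = SimpleCookie()
incoming.load("session_id=abc123; theme=dark")
session_id = incoming["session_id"].value
theme = incoming["theme"].value
print(session_id, theme)
```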
Authentication
It is the process through which a user proves his/her identity, typically by entering a username and password or a PIN.
Types of authentication
Basic Authentication
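When a user visits a protected page, the server responds with a 401 status code and a WWW-Authenticate header. The browser then asks for credentials and resends the request with an Authorization header. It is highly unsafe over plain HTTP because the value in that header is just the base64 encoding of the username and password, which anyone who intercepts the request can decode.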
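The weakness is easy to demonstrate: base64 is an encoding, not encryption, so the header can be reversed without any secret. The helper names in this sketch are hypothetical, and the credentials are sample values.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value used by Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value):
    """Recover the credentials from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header_value = basic_auth_header("Aladdin", "open sesame")
print(header_value)
print(decode_basic_auth(header_value))  # anyone who sees the header can do this
```

This is why Basic authentication should only be used over an encrypted connection such as HTTPS.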
Digest Authentication
It is an improved version of basic authentication, as it doesn't send the user's password in base64 encoding. Instead, the client sends a digest of the password. The client computes the digest using the MD5 hashing algorithm together with a nonce the server provides during the authentication challenge. (A nonce is a number used only once, which helps prevent replay attacks.)
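A minimal sketch of the computation follows, assuming the simplest form of the scheme in which no qop (quality of protection) option is negotiated; with qop the final hash also includes a request counter and a client nonce. The function names and all credential values are hypothetical.

```python
import hashlib


def md5_hex(text):
    """Hex-encoded MD5 hash of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the digest the client sends instead of the password (no qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identifies the user
    ha2 = md5_hex(f"{method}:{uri}")                 # identifies the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server nonce


# The same credentials produce a different digest for each nonce,
# so a captured digest cannot simply be replayed against a fresh challenge.
first = digest_response("alice", "example", "s3cret", "GET", "/inbox", "nonce-1")
second = digest_response("alice", "example", "s3cret", "GET", "/inbox", "nonce-2")
print(first)
print(second)
```

The server performs the same calculation with its own record of the user and compares the results, so the password itself never travels over the wire.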
Windows Authentication
It is popular among Microsoft products and servers, and it is also supported by most popular browsers. It uses the Negotiate protocol, which lets the client and server select either Kerberos or NTLM. When a user visits the website for the first time, the server challenges the request with a WWW-Authenticate header whose value is Negotiate.
Forms-based Authentication
It is the most popular authentication method on the internet. When a user tries to visit a protected page, he or she is redirected to a login page. The user submits his/her credentials, and the server sets a cookie indicating that the user is now authenticated. Generally, this cookie is encrypted and hashed to prevent tampering.
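One common way to make such a cookie tamper-evident is to append a keyed hash (HMAC) that only the server can compute. The sketch below covers only that signing step, not encryption; the secret key, function names, and cookie format are assumptions for illustration, not what any particular framework does.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real one would be random and kept private.
SECRET_KEY = b"replace-with-a-random-server-secret"


def _mac(value):
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_cookie(value):
    """Return the cookie value with its HMAC appended."""
    return f"{value}.{_mac(value)}"


def verify_cookie(cookie_value):
    """Return the original value if the signature is valid, otherwise None."""
    value, separator, mac = cookie_value.rpartition(".")
    if not separator:
        return None
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(mac.encode("utf-8"), _mac(value).encode("utf-8")):
        return value
    return None


signed = sign_cookie("user=alice")
tampered = signed.replace("alice", "admin")
print(verify_cookie(signed))    # original value is accepted
print(verify_cookie(tampered))  # modified value is rejected
```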
OpenID
Forms-based authentication gives a web application a good amount of control and prevents unauthorized access, but every application must then manage its own user credentials. OpenID is an open standard for decentralized authentication. With OpenID, a user proves his/her identity through an OpenID provider such as Google or Yahoo. Once the user has been identified, he/she is redirected back to the web application.