HTTP Overview
1. URL
Uniform Resource Locator (URL)
Syntax
URLs are a subset of URIs, so every HTTP URL conforms to the generic URI syntax:
scheme:[//authority]path[?query][#fragment]
where the authority component divides into three subcomponents:
authority = [userinfo@]host[:port]
Example:
https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top
- scheme: https
- authority: john.doe@www.example.com:123
  - userinfo: john.doe
  - host: www.example.com
  - port: 123
- path: /forum/questions/
- query: tag=networking&order=newest
- fragment: top
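Python's standard `urllib.parse.urlsplit` splits the example URL above into the same components, which is a quick way to check a decomposition by hand. `parse_qs` additionally turns the query string into a dictionary.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # john.doe@www.example.com:123  (the whole authority)
print(parts.username)  # john.doe  (userinfo)
print(parts.hostname)  # www.example.com
print(parts.port)      # 123  (an int, not a string)
print(parts.path)      # /forum/questions/
print(parts.query)     # tag=networking&order=newest
print(parts.fragment)  # top
print(parse_qs(parts.query))  # {'tag': ['networking'], 'order': ['newest']}
```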
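URLs are only partly case-insensitive: the scheme and host are case-insensitive (HTTPS://WWW.EXAMPLE.COM/ names the same server as https://www.example.com/), but the path, query, and fragment are case-sensitive in general, although some servers happen to treat paths case-insensitively.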
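In practice only the scheme and host of a URL are case-insensitive. A minimal sketch of comparing two URLs with that rule, assuming a hypothetical helper `same_url`: it lowercases the scheme and host and leaves every other component exactly as written. It deliberately skips the rest of RFC 3986 normalization (default ports, percent-encoding, `.` and `..` segments).

```python
from urllib.parse import urlsplit

def url_key(url: str) -> tuple:
    """Comparison key: scheme and host lowercased, everything else kept as written."""
    parts = urlsplit(url)
    return (
        parts.scheme.lower(),
        parts.username,    # userinfo is case-sensitive
        parts.hostname,    # urllib already returns the host in lowercase
        parts.port,
        parts.path,        # path, query and fragment are case-sensitive
        parts.query,
        parts.fragment,
    )

def same_url(a: str, b: str) -> bool:
    return url_key(a) == url_key(b)

print(same_url("HTTPS://WWW.Example.COM/Forum", "https://www.example.com/Forum"))  # True
print(same_url("https://www.example.com/Forum", "https://www.example.com/forum"))  # False
```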
2. HTTP Session
Using HTTP/1.0 as an example:
When we want to visit http://www.ese.upenn.edu/about-people/index.php:
- The browser sends a DNS query to resolve the IP address of www.ese.upenn.edu.
- DNS resolves www.ese.upenn.edu to 158.130.68.91.
- The browser establishes a TCP connection with the server (on the server side, the IP is 158.130.68.91 and the port number is 80).
- The browser sends an HTTP GET request: GET /about-people/index.php HTTP/1.0
- The server responds with the page generated from index.php.
- The TCP connection is released.
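The six steps above can be sketched with Python's `socket` module. This is a minimal illustration, not how a browser is implemented: the helper names `build_get_request` and `http10_get` are hypothetical, a `Host` header is added (optional in HTTP/1.0 but widely expected), and the real site may now answer differently (for example with a redirect), so no particular response is claimed.

```python
import socket

def build_get_request(host: str, path: str) -> bytes:
    """Request line and Host header, ended by an empty line (CRLF CRLF)."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")

def http10_get(host: str, path: str, port: int = 80) -> bytes:
    # Steps 1-2: resolve the host name to an IPv4 address via DNS.
    ip = socket.gethostbyname(host)
    # Step 3: open a TCP connection to the server (port 80 for plain HTTP).
    with socket.create_connection((ip, port), timeout=10) as sock:
        # Step 4: send the GET request.
        sock.sendall(build_get_request(host, path))
        # Step 5: read the response; an HTTP/1.0 server normally closes the
        # connection once it has sent everything, so recv() then returns b"".
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    # Step 6: leaving the with-block closes our end of the TCP connection.
    return b"".join(chunks)

# Usage (needs network access):
# response = http10_get("www.ese.upenn.edu", "/about-people/index.php")
# print(response.split(b"\r\n", 1)[0])  # the status line
```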
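Features:
- Connectionless: each request/response pair uses its own TCP connection, which is closed afterwards, even though TCP itself is connection-oriented.
- Stateless: the server keeps no information about a client's previous requests.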
3. HTTP/1.0 vs HTTP/1.1
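The drawback of HTTP/1.0 is that it has to set up and release a new TCP connection for every GET request the browser sends.
HTTP/1.1 solves this with persistent connections: the connection stays open for a while after a response, so later requests to the same server can reuse it. There are two kinds of persistent connection:
- Without pipelining: the client sends the next request only after the previous response has arrived, like a stop-and-wait protocol.
- With pipelining: the client sends several requests back to back without waiting for the responses (the server still answers them in order), like a sliding-window protocol.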
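A back-of-the-envelope comparison of the three cases, counted in round-trip times (RTTs). This is a simplified model and an assumption, not a measurement: a page is one base file referencing `n_refs` objects on the same server, opening a TCP connection costs 1 RTT, each request/response exchange costs 1 RTT, and transmission time, server processing, TCP slow start and parallel connections are ignored. The function name `page_load_rtts` is hypothetical.

```python
def page_load_rtts(n_refs: int, mode: str) -> int:
    """RTTs to fetch a base page plus n_refs referenced objects (simplified model)."""
    if n_refs < 0:
        raise ValueError("n_refs must be non-negative")
    if mode == "non-persistent":
        # HTTP/1.0: every object needs its own connection (1 RTT) and exchange (1 RTT).
        return 2 * (1 + n_refs)
    if mode == "persistent":
        # One connection; each request waits for the previous response (stop-and-wait).
        return 1 + 1 + n_refs
    if mode == "pipelined":
        # One connection; the base page must arrive first (it names the objects),
        # then all referenced objects are requested back to back: about 1 more RTT.
        return 1 + 1 + (1 if n_refs > 0 else 0)
    raise ValueError(f"unknown mode: {mode}")

for mode in ("non-persistent", "persistent", "pipelined"):
    print(mode, page_load_rtts(10, mode))
# non-persistent 22
# persistent 12
# pipelined 3
```

Under this model the savings grow with the number of referenced objects; with a single file and no references all three cases cost 2 RTTs.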
4. Proxy Server (Web Cache)
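A proxy server (web cache) stores recently requested responses, so a repeated request can be answered from the cache instead of going back to the origin server. Proxies can be deployed on the client side, on the server side, or at intermediate points in between.
Browsers also keep their own caches, which serve a similar purpose.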
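The core idea of keeping recent responses can be sketched as a small least-recently-used (LRU) cache keyed by URL. The class `WebCache` and its `fetch_from_origin` callback are hypothetical; a real proxy also checks whether a cached copy is still fresh (for example via `Cache-Control` or a conditional GET), which this sketch leaves out.

```python
from collections import OrderedDict
from typing import Callable

class WebCache:
    """Hypothetical LRU web cache: keeps responses for the most recently used URLs."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], capacity: int = 100):
        self.fetch_from_origin = fetch_from_origin
        self.capacity = capacity
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            self.hits += 1
            self.store.move_to_end(url)        # mark as most recently used
            return self.store[url]
        self.misses += 1
        body = self.fetch_from_origin(url)     # forward the request to the origin server
        self.store[url] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)     # evict the least recently used entry
        return body

cache = WebCache(lambda url: f"<page {url}>".encode(), capacity=2)
for url in ("http://a/", "http://a/", "http://b/", "http://c/"):
    cache.get(url)
print(cache.hits, cache.misses, list(cache.store))  # 1 3 ['http://b/', 'http://c/']
```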
5. Message Format
- ▢: space
- CRLF: carriage return followed by line feed (\r\n), which ends each line
| Request Message | | |
|---|---|---|
| Request Line | Method▢URL▢Version | CRLF |
| Header | Header 1:▢Value 1 | CRLF |
| | Header 2:▢Value 2 | CRLF |
| | … | |
| | (empty line) | CRLF |
| Body | Body (often empty, e.g. for GET; used by POST and PUT) | |
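The request layout above can be read back with a few string splits. A minimal sketch under stated assumptions: the hypothetical `parse_request` receives the complete, well-formed message in memory (it does not use `Content-Length` or handle chunked bodies), and it keeps header names as written even though HTTP treats them case-insensitively.

```python
from typing import Dict, Tuple

def parse_request(raw: bytes) -> Tuple[str, str, str, Dict[str, str], bytes]:
    """Split a raw request into (method, url, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")      # the empty line ends the header section
    lines = head.decode("iso-8859-1").split("\r\n")
    method, url, version = lines[0].split(" ")       # request line: Method▢URL▢Version
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")         # header line: Name:▢Value
        headers[name] = value.strip()
    return method, url, version, headers, body

raw = b"GET /about-people/index.php HTTP/1.1\r\nHost: www.ese.upenn.edu\r\nAccept: text/html\r\n\r\n"
print(parse_request(raw))
```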
Request Methods:
- GET: Requests a representation of the specified resource.
- POST: Submits data (e.g. a filled-in form) to the specified resource for processing.
- PUT: Stores a document under the supplied URL.
- DELETE: Deletes the specified resource.
- HEAD: Asks for a response identical to that of a GET request, but without the response body.
- TRACE: Echoes the received request so that a client can see what (if any) changes or additions have been made by intermediate servers.
- CONNECT: Converts the request connection to a transparent TCP/IP tunnel, usually to facilitate SSL-encrypted communication (HTTPS) through an unencrypted HTTP proxy.
- OPTIONS: Returns the HTTP methods that the server supports for the specified URL. This can be used to check the functionality of a web server by requesting ‘*’ instead of a specific resource.
- PATCH: Applies partial modifications to a resource.
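The difference between GET, HEAD and OPTIONS can be observed without any external network access by starting a throwaway server on 127.0.0.1 with the standard `http.server` module and querying it with `http.client`. The handler `DemoHandler`, the page it serves, and the helper `run_demo` are hypothetical; the expectation is that GET returns the page, HEAD returns the same status and headers with an empty body, and OPTIONS reports the supported methods in an `Allow` header.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<h1>hello</h1>"  # hypothetical resource served at "/"

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler supporting GET, HEAD and OPTIONS on a single page."""

    def _send_page_headers(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self) -> None:
        self._send_page_headers()
        self.wfile.write(PAGE)          # GET: headers followed by the body

    def do_HEAD(self) -> None:
        self._send_page_headers()       # HEAD: same headers as GET, no body

    def do_OPTIONS(self) -> None:
        self.send_response(204)         # 204 No Content
        self.send_header("Allow", "GET, HEAD, OPTIONS")  # methods this server supports
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass                            # silence the default request log

def run_demo() -> dict:
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: the OS picks a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method in ("GET", "HEAD", "OPTIONS"):
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request(method, "/")
            resp = conn.getresponse()
            results[method] = (resp.status, resp.getheader("Allow"), resp.read())
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results

for method, (status, allow, body) in run_demo().items():
    print(method, status, allow, body)
```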
| Response Message | | |
|---|---|---|
| Status Line | Version▢Status Code▢Reason Phrase | CRLF |
| Header | Header 1:▢Value 1 | CRLF |
| | Header 2:▢Value 2 | CRLF |
| | … | |
| | (empty line) | CRLF |
| Body | Body (absent in some responses, e.g. 204, 304, or any response to HEAD) | |
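Going the other way, a response can be assembled line by line from the layout above. A minimal sketch with a hypothetical helper `build_response`; the reason phrase is looked up in the standard `http.HTTPStatus` enum, and the caller is assumed to supply correct headers such as `Content-Length`.

```python
from http import HTTPStatus

def build_response(status: int, headers: dict, body: bytes = b"", version: str = "HTTP/1.1") -> bytes:
    """Assemble the status line, header lines, the empty line, and the body."""
    reason = HTTPStatus(status).phrase                  # e.g. 200 -> "OK"
    lines = [f"{version} {status} {reason}"]            # Version▢Status Code▢Reason Phrase
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"              # each line ends in CRLF, then an empty line
    return head.encode("iso-8859-1") + body

print(build_response(200, {"Content-Type": "text/html", "Content-Length": "2"}, b"hi"))
print(build_response(304, {}))  # no body
```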
Status Code:
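- 1XX: Informational; the request was received and processing continues.
- 2XX: Success; the request was received, understood, and accepted (e.g. 200 OK).
- 3XX: Redirection; the client must take further action (e.g. 301 Moved Permanently, 304 Not Modified).
- 4XX: Client errors (e.g. 404 Not Found).
- 5XX: Server errors (e.g. 500 Internal Server Error).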
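Since the first digit alone determines the class, classifying a status code is a single integer division. A small sketch with a hypothetical `describe` helper; the reason phrase comes from the standard `http.HTTPStatus` enum, which raises `ValueError` for codes it does not know.

```python
from http import HTTPStatus

CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def describe(code: int) -> str:
    """Return e.g. '404 Not Found (4XX: Client error)'."""
    phrase = HTTPStatus(code).phrase
    return f"{code} {phrase} ({code // 100}XX: {CLASSES[code // 100]})"

for code in (200, 301, 404, 500):
    print(describe(code))
```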
6. HTTP Cookie
HTTP is stateless, but sometimes servers want to remember clients.
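When a client visits a website that uses cookies, the server assigns the client an ID and saves it in its database. When the server sends back the HTTP response, it adds a header called _Set-Cookie_ (shown here in simplified form; a real header carries a name=value pair whose name the site chooses):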
Set-cookie: 12345678
When the browser receives this response, it adds an entry with the host name and the ID to its cookie file. Every time the client visits this website afterwards, the browser adds a header to the request:
Cookie: 12345678
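The cookie exchange described above can be simulated in a few lines with the standard `http.cookies.SimpleCookie` class. Everything else is hypothetical: the `Server` and `Browser` classes, the cookie name `id`, and a dictionary standing in for the server's database. Real browsers also honor attributes such as `Path` and `Expires`, which this sketch does not set.

```python
import itertools
from http.cookies import SimpleCookie
from typing import Dict

class Server:
    """Hypothetical server that recognizes returning clients by a cookie named "id"."""

    def __init__(self) -> None:
        self._next_id = itertools.count(12345678)   # ID generator (illustrative values)
        self.visits: Dict[str, int] = {}            # stands in for the server's database

    def handle(self, request_headers: Dict[str, str]) -> Dict[str, str]:
        cookie = SimpleCookie(request_headers.get("Cookie", ""))
        if "id" in cookie and cookie["id"].value in self.visits:
            self.visits[cookie["id"].value] += 1    # known client: update its record
            return {}
        client_id = str(next(self._next_id))         # new client: assign an ID
        self.visits[client_id] = 1
        out = SimpleCookie()
        out["id"] = client_id
        return {"Set-Cookie": out["id"].OutputString()}  # e.g. "id=12345678"

class Browser:
    """Hypothetical client with a per-host cookie store (the "cookie file")."""

    def __init__(self) -> None:
        self.cookie_file: Dict[str, str] = {}

    def visit(self, host: str, server: Server) -> None:
        headers = {}
        if host in self.cookie_file:
            headers["Cookie"] = self.cookie_file[host]  # send the stored cookie back
        response_headers = server.handle(headers)
        if "Set-Cookie" in response_headers:
            self.cookie_file[host] = response_headers["Set-Cookie"]

server, browser = Server(), Browser()
browser.visit("www.example.com", server)   # first visit: server sets the cookie
browser.visit("www.example.com", server)   # second visit: browser sends it back
print(browser.cookie_file, server.visits)
```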
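Common misconceptions about cookies:
- Virus: a cookie is a small piece of text, not an executable program, so it cannot run code on the client.
- Privacy: whether cookies threaten privacy depends on how websites use them, for example to track a user across visits.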
Check cookies on macOS: Safari → Preferences → Privacy → Manage Website Data.