The HTTP protocol defines a specific format for the message a client sends to request information from a web server. A simple static page is retrieved with a GET request. Dynamic page requests that require a small amount of data to be sent as part of the request also use the GET request and embed the data in the URL. A zip code or a part number are examples of the type of data that might be embedded inside a GET request. When a larger amount of data is sent to the server, such as when a form is filled out or a file is uploaded, a POST request is sent.
With HTTP, the client sends a message requesting data, which may be a static page or a page that the server will dynamically generate. The server then sends data back, usually in the form of an HTML, XHTML or similar document. HTTP is a stateless, connectionless protocol. Both of these terms relate to the one-request, one-reply nature of HTTP.
With most protocols, the client and server send several messages back and forth, so the server can keep track of the state of the overall conversation for each client. This is not the case with HTTP, which is why it is called stateless. Each client request stands on its own as a request for information. Web servers often host server-side applications, such as a store front, which treat the sequence of messages to and from each client as a session and thus track the state of the clients. However, we are just talking about the web server proper, which uses the HTTP protocol.
Connectionless has a very similar meaning to stateless. When you connect to an ssh, ftp or telnet server, you have an ongoing connection (session) to the server. With HTTP as originally designed (HTTP/1.0), as soon as the request is received and the reply sent, the socket connection is closed. So if you are using a web-based application, such as web-mail to read your e-mail, then the overall session with the server-side application actually consists of many distinct socket connections.
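The one-connection-per-request behavior can be observed without touching the network. The following is a minimal sketch, assuming Python 3 and a throwaway server on the local machine built with the standard `http.server` module, which speaks HTTP/1.0 by default. The handler class `HelloHandler`, the helper `fetch_once`, and the paths `/a.html` and `/b.html` are hypothetical names chosen for illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with the same tiny page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_once(port, path):
    """Open a new socket, send one request, and read until the server closes."""
    s = socket.create_connection(("127.0.0.1", port))
    s.sendall(b"GET " + path.encode("ascii") + b" HTTP/1.0\r\n\r\n")
    chunks = []
    while True:
        data = s.recv(1024)
        if not data:  # empty read: the server has closed the connection
            break
        chunks.append(data)
    s.close()
    return b"".join(chunks)


def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Two page requests require two separate socket connections.
        return [fetch_once(port, "/a.html"), fetch_once(port, "/b.html")]
    finally:
        server.shutdown()
        server.server_close()


replies = demo()
```

Each call to `fetch_once` ends only when `recv` returns an empty result, which is how the client learns that the server has closed its end of the socket after sending the single reply.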
HTTP was really designed for simple web page retrieval, not ongoing interactions with a server-side application. For this reason, some have questioned whether HTTP is really the protocol that should be used for such activity. However, it seems to work well: it is designed for the simplest case, yet in conjunction with other technologies it is applicable to more complex applications.
Here is how to retrieve a simple web page using socket programming. Notice that we have to concern ourselves not only with the socket connection, but also with the syntax of the HTTP protocol.
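    import socket

    # Open a TCP connection to the web server on port 80.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('www.sal.ksu.edu', 80))

    # Each header line ends with CRLF; an empty line ends the request.
    # Sockets carry bytes, so the request is written as a bytes literal.
    request = (b"GET /faculty/tim/index.html HTTP/1.0\r\n"
               b"Host: www.sal.ksu.edu\r\n"
               b"From: email@example.com\r\n"
               b"User-Agent: Python\r\n"
               b"\r\n")
    s.sendall(request)

    # Save the raw reply (status line, headers and page) until the
    # server closes the connection, which recv() reports as empty data.
    with open("index.html", "wb") as fp:
        while True:
            data = s.recv(1024)
            if not data:
                break
            fp.write(data)
    s.close()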
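Because the program above writes everything it receives to the file, the saved data begins with the server's status line and headers, followed by a blank line and then the page itself. The following is a minimal sketch of separating those parts, assuming Python 3. The function name `split_response` and the sample reply are hypothetical and are not taken from a real server.

```python
def split_response(raw):
    """Split a raw HTTP reply into (status_code, headers, body)."""
    # A blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.0 200 OK
    version, code, *reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


# A made-up reply, used only to exercise the function.
sample = (b"HTTP/1.0 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n"
          b"\r\n"
          b"<html></html>")
status, headers, body = split_response(sample)
```

Header names are stored in lower case here because HTTP header names are not case sensitive. A fuller client would also handle repeated headers, which this sketch overwrites.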
A GET request with data embedded in the URL uses a question mark symbol (?) to separate the web address from the data in the URL. Using a web browser, you can often see URLs that send information as part of the URL.
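As an illustration of data embedded after the question mark, the following sketch builds such a URL and then splits the data back out using Python's standard `urllib.parse` module. The address `www.example.com/lookup` and the field name `zip` are assumptions made up for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical lookup page and field name, used only for illustration.
base = "http://www.example.com/lookup"
url = base + "?" + urlencode({"zip": "67401"})

# The request line sent to the server carries the path and the data together.
parts = urlsplit(url)
request_line = "GET " + parts.path + "?" + parts.query + " HTTP/1.0"

# A server-side program would split the data back out of the URL.
data = parse_qs(parts.query)
```

`urlencode` also escapes characters that are not allowed in a URL, such as spaces, so it is safer than joining the strings by hand.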
The POST request is used when additional information needs to be sent as part of the request, but the volume of the data is too large to be included as part of the URL. The POST request is used when you complete a form on a web page and then click on a “submit” button.
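In a POST request the data travels in the body of the message, after the blank line that ends the headers, rather than in the URL. The following is a minimal sketch of how such a request could be assembled by hand, assuming Python 3 and the common form encoding `application/x-www-form-urlencoded`. The function `build_post_request`, the host, the path `/order`, and the field names are hypothetical.

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Return the bytes of an HTTP/1.0 POST carrying form fields in its body."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # Content-Length tells the server how many bytes of body follow.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# Made-up form data, standing in for what a browser would submit.
request = build_post_request(
    "www.example.com", "/order", {"part": "A-1138", "quantity": "2"})
```

The resulting bytes could be passed to a socket's `sendall` in the same way as the GET request shown earlier.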