In many cases, the purpose of an HTML Form is to send data to a server. The server processes the data and then sends a response to the user. This seems simple, but it's important to keep a few things in mind to ensure that the data does not damage your server or cause trouble for your users.
Where does the data go?
About client/server architecture
The web is based on a very basic client/server architecture that can be summarized as follows: a client (usually a Web browser) sends a request to a server (most of the time a web server like Apache, Nginx, IIS, Tomcat, etc.), using the HTTP protocol. The server answers the request using the same protocol.
On the client side, an HTML form is nothing more than a convenient user-friendly way to configure an HTTP request to send data to a server. This enables the user to provide information to be delivered in the HTTP request.
On the client side: defining how to send the data
The <form>
element defines how the data will be sent. All of its attributes are designed to let you configure the request to be sent when a user hits a submit button. The two most important attributes are action
and method
.
The action
attribute
This attribute defines where the data gets sent. Its value must be a valid URL. If this attribute isn't provided, the data will be sent to the URL of the page containing the form.
Examples
In this example, the data is sent to https://foo.com:
<form action="https://foo.com">
Here, the data is sent to the same server that hosts the form's page, but to a different URL on the server:
<form action="/somewhere_else">
When specified with no attributes, as below, the <form>
attribute causes the data to be sent to the page that includes the form:
<form>
Many older pages use the following notation to indicate that the data should be sent to the same page that contains the form; this was required because until HTML5, the action
attribute was required. This is no longer needed.
<form action="#">
Note: It's possible to specify a URL that uses the HTTPS (secure HTTP) protocol. When you do this, the data is encrypted along with the rest of the request, even if the form itself is hosted on an insecure page accessed using HTTP. On the other hand, if the form is hosted on secure page but you specify an insecure HTTP URL with the action
attribute, all browsers display a security warning to the user each time they try to send data because the data will not be encrypted.
The method
attribute
This attribute defines how data is sent. The HTTP protocol provides several ways to perform a request; HTML forms data can be sent through at least two of them: the GET
method and the POST
method.
To understand the difference between those two methods, let's step back and examine how HTTP works. Each time you want to reach a resource on the Web, the browser sends a request to a URL. An HTTP request consists of two parts: a header that contains a set of global metadata about the browser's capabilities, and a body that can contain information necessary to the server to process the specific request.
The GET method
The GET
method is the method used by the browser to ask the server to send back a given resource: "Hey server, I want to get this resource." In this case, the browser sends an empty body. Because the body is empty, if a form is sent using this method, the data sent to the server is appended to the URL.
Example
Consider the following form:
<form action="https://foo.com" method="get"> <input name="say" value="Hi"> <input name="to" value="Mom"> <button>Send my greetings</button> </form>
With the GET
method, the HTTP request looks like this:
GET /?say=Hi&to=Mom HTTP/1.1 Host: foo.com
The POST method
The POST
method is a little different. It's the method the browser sends the server to ask for a response that takes into account the data provided in the body of the HTTP request: "Hey server, take a look at this data and send me back an appropriate result." If a form is sent using this method, the data is appended to the body of the HTTP request.
Example
Consider this form (the same one as above):
<form action="https://foo.com" method="post"> <input name="say" value="Hi"> <input name="to" value="Mom"> <button>Send my greetings</button> </form>
When sent using the POST
method, the HTTP request looks like this:
POST / HTTP/1.1 Host: foo.com Content-Type: application/x-www-form-urlencoded Content-Length: 13 say=Hi&to=Mom
The Content-Length
header indicates the size of the body, and the Content-Type
header indicates the type of resource sent to the server. We'll discuss these headers in a bit.
Of course, HTTP requests are never displayed to the user (if you want to see them, you need to use tools such as the Firefox Web Console or the Chrome Developer Tools). The only thing displayed to the user is the URL called. So with a GET
request, the user will see the data in their URL bar, but with a POST
request, they won't. This can be very important for two reasons:
- If you need to send a password (or any sensitive piece of data), never use the
GET
method or you risk displaying it in the URL bar. - If you need to send a large amount of data, the
POST
method is preferred because some browsers limit the sizes of URLs. In addition, many servers limit the length of URLs they accept.
On the server side: retrieving the data
Whichever HTTP method you choose, the server receives a string that will be parsed in order to get the data as a list of key/value pairs. The way you access this list depends on the development platform you use and on any specific frameworks you may be using with it. The technology you use also determines how duplicate keys are handled; often, the most recently received value for a given key is given priority.
Example: Raw PHP
PHP offers some global objects to access the data. Assuming you've used the POST
method, the following example just takes the data and displays it to the user. Of course, what you do with the data is up to you. You might display it, store it into a database, send them by email, or process it in some other way.
<?php // The global $_POST variable allows you to access the data sent with the POST method // To access the data sent with the GET method, you can use $_GET $say = htmlspecialchars($_POST['say']); $to = htmlspecialchars($_POST['to']); echo $say, ' ', $to;
This example displays a page with the data we sent. In our example from before, the output would be:
Hi Mom
Example: Raw Python
This example uses Python to do the same thing--display the provided data on a web page. It uses the CGI Python package to access the form data.
#!/usr/bin/env python import html import cgi import cgitb; cgitb.enable() # for troubleshooting print("Content-Type: text/html") # HTTP header to say HTML is following print() # blank line, end of headers form = cgi.FieldStorage() say = html.escape(form["say"].value); to = html.escape(form["to"].value); print(say, " ", to)
The result is the same as with PHP:
Hi Mom
Other languages and frameworks
There are many other server-side technologies you can use for form handling, including Perl, Java, .Net, Ruby, etc. Just pick the one you like best. That said, it's worth noting that it's very uncommon to use these technologies directly because this can be tricky. It's more common to use one of the many nice frameworks that make handling forms easier, such as:
- Symfony for PHP
- Django for Python
- Ruby On Rails for Ruby
- Grails for Java
- etc.
It's worth noting that even using these frameworks, working with forms isn't necessarily easy. But it's much better, and will save you a lot of time.
A special case: sending files
Files are a special case with HTML forms. They are binary data—or considered as such—where all other data is text data. Because HTTP is a text protocol, there are special requirements to handle binary data.
The enctype
attribute
This attribute lets you specify the value of the Content-Type
HTTP header. This header is very important because it tells the server what kind of data is being sent. By default, its value is application/x-www-form-urlencoded
. In human terms, this means: "This is form data that has been encoded into URL form."
But if you want to send files, you need to do two things:
- Set the
method
attribute toPOST
because file content can't be put inside a URL parameter using a form. - Set the value of
enctype
tomultipart/form-data
because the data will be split into multiple parts, one for each file plus one for the text of the form body that may be sent with them.
For example:
<form method="post" enctype="multipart/form-data"> <input type="file" name="myFile"> <button>Send the file</button> </form>
Note: Some browsers support the multiple
attribute on the <input>
element in order to send more than one file with only one input element. How the server handles those files really depends on the technology used on the server. As mentioned previously, using a framework will make your life a lot easier.
Warning: Many servers are configured with a size limit for files and HTTP requests in order to prevent abuse. It's important to check this limit with the server administrator before sending a file.
Security concerns
Each time you send data to a server, you need to consider security. HTML forms are one of the first attack vectors against servers. The problems never come from the HTML forms themselves; they come from how the server handles data.
Common security flaws
Depending on what you're doing, there are some very well-known security issues:
XSS and CSRF
Cross-Site Scripting (XSS) and Cross-Site Request Forgery (CSRF) are common types of attacks that occur when you display data sent by a user back to the user or to another user.
XSS lets attackers inject client-side script into Web pages viewed by other users. A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same origin policy. The effect of these attacks may range from a petty nuisance to a significant security risk.
CSRF are similar to XSS attacks in that they start the same way—by injecting client-side script into Web pages—but their target is different. CSRF attackers try to escalate privileges to those of a higher-privileged user (such as a site administrator) to perform an action it shouldn't be able to do (for example, sending data to an untrusted user).
XSS attacks exploit the trust a user has for a web site, while CSRF attacks exploit the trust a web site has for its users.
To prevent these attacks, you should always check the data a user sends to your server and (if you need to display it) try not to display the HTML content as provided by the user. Intead, you should process the user-provided data so you don't display it verbatim. Almost all frameworks on the market today implement a minimal filter that removes the HTML <script>
, <iframe>
and <object>
elements from data sent by any user. This helps to mitigate the risk, but doesn't necessarily eradicate it.
SQL injection
SQL injection is a type of attack that tries to perform actions on a database used by the target web site. This typically involves sending an SQL request in the hope that the server will execute it (many times when the application server tries to store the data). This is actually one of the main vector attacks against web sites.
The consequences can be terrible, ranging from data loss to access to a whole infrastructure by using privilege escalation. This is a very serious threat and you should never store data sent by a user without performing some sanitization (for example, by using mysql_real_escape_string()
on a PHP/MySQL infrastructure).
HTTP header injection and email injection
These kinds of attacks can occur when your application builds HTTP headers or emails based on the data input by a user on a form. These won't directly damage your server or affect your users, but they are an open door to deeper problems such as session hijacking or phishing attacks.
These attacks are mostly silent, and can turn your server into a zombie.
Be paranoid: Never trust your users
So, how do you fight these threats? This is a topic far beyond this guide, but there are a few rules to keep in mind. The most important rule is: never ever trust your users, including yourself; even a trusted user could have been hijacked.
All data that comes to your server must be checked and sanitized. Always. No exception.
- Escape potentially dangerous characters. The specific characters you should be cautious with vary depending on the context in which the data is used and the server platform you employ, but all server-side languages have functions for this.
- Limit the incoming amount of data to allow only what's necessary.
- Sandbox uploaded files (store them on a different server and allow access to the file only through a different subdomain or even better through a fully different domain name).
You should avoid many/most problems if you follow these three rules, but it's always a good idea to get a security review performed by a competent third party. Don't assume that you've seen all the possible problems.
Conclusion
As you can see, sending form data is easy, but securing an application can be tricky. Just remember that a front-end developer is not the one who should define the security model of the data. Yes, as we'll see, it's possible to perform client side data validation but the server can't trust this validation because it has no way to truly know what really happens on the client side.
See also
If you want to learn more about securing a web application, you can dig into these resources: