Tuesday, May 22, 2012

A Brief Introduction to Input Validation


Input validation should never be overlooked as a means of improving both the stability and security of your application, because it helps ensure that the application only processes inputs it was designed to process.  Ideally, any input supplied to your application should be treated as untrusted and passed through a validation routine to confirm that the data is of the type and format your application requires.  This short tutorial looks at two basic approaches to input validation: whitelisting and blacklisting.

First we will consider a whitelist approach, in which the whitelist consists of all of the inputs that are allowed to pass through to the application.  In other words, if the supplied input matches the criteria laid out in the whitelist, it will be treated as valid and will be processed by the application.  If the input does not meet those criteria, it will be considered invalid and will result in an error message rather than further processing.  As an example, let’s consider the following code snippet, which uses a regular expression to whitelist valid U.S. phone numbers:

use strict;
use warnings;

my @inputs = ('(555) 555-5555', '555-555-5555', '555 555-5555', '(xyz) abc-defg');
foreach my $input (@inputs) {
  # whitelist: accept only input that contains a U.S. phone number pattern
  if ($input =~ /(\s?\(?\d{3}\)?[-\s.]?\d{3}[-.]\d{4})/) {
    print "$1 is a valid phone number\n";
    # execute code requiring phone number
  }
  else {
    print "invalid input\n";
  }
}

If we were to execute this code snippet, we would see that the first three values of @inputs pass the whitelist filter and could thus be used for further processing by the application, whereas the fourth value does not match the whitelist criteria and instead results in an “invalid input” error message.  Whitelisting valid inputs is the preferred way to perform input validation and should be used wherever it is feasible to define the allowable inputs according to a precise set of values or a precise pattern, since it gives the application author very fine-grained control over which inputs are considered valid and which are not.
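Because the pattern above is not anchored, it accepts any string that merely contains a phone number somewhere inside it.  A stricter whitelist can require the whole input to match.  The following is a minimal sketch of that idea, not a definitive filter: the helper name is_valid_phone is hypothetical, and the optional leading space from the pattern above has been dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 only if the ENTIRE input is a phone number.
# \A and \z anchor the match to the very start and very end of the string,
# so surrounding text such as 'garbage (555) 555-5555' is rejected.
sub is_valid_phone {
    my ($input) = @_;
    return $input =~ /\A\(?\d{3}\)?[-\s.]?\d{3}[-.]\d{4}\z/ ? 1 : 0;
}

my @inputs = ('(555) 555-5555', '555-555-5555', 'garbage (555) 555-5555', "555-555-5555\n");
foreach my $input (@inputs) {
    (my $shown = $input) =~ s/\n/\\n/g;   # make a trailing newline visible
    if (is_valid_phone($input)) {
        print "'$shown' is a valid phone number\n";
    }
    else {
        print "'$shown' is invalid input\n";
    }
}
```

In this sketch \z is used instead of $ because $ also matches just before a trailing newline, which would let the last test value through.  The pattern is still loose in other respects (it does not require the parentheses to be balanced, for example), so it should be tightened to suit the formats your application actually expects.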

While whitelisting is generally the more secure way of doing things, it is not always feasible, since the possible inputs may be too varied to be readily captured by a predefined set of values or a predefined pattern.  In cases where whitelisting is not practical, blacklisting can be used as an alternative approach.  Blacklisting does not focus on listing valid (allowable) inputs, but rather on listing inputs that should be considered invalid.  In a blacklisting approach, any input that matches the criteria laid out in the blacklist is considered invalid and will result in an error message, while any input that does not match the blacklist criteria is treated as valid and passed through for further processing.

Let’s take the hypothetical example of writing a piece of software that serves as the backend for a Web-based forum.  Because of the wide variety of content that we may want to allow users to post, it may be difficult to whitelist valid forum posts.  We may, however, want to blacklist certain types of content to prevent our forum from turning into a potential XSS attack vector.  In this case, we may consider employing something like the following code snippet, which blacklists any tagged content:

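use strict;
use warnings;

my @inputs = ('123', 'abc def ghi', '<script>123</script>');
foreach my $input (@inputs) {
  # blacklist: reject anything containing a tag, literal or percent-encoded;
  # /i also catches lowercase encodings such as %3c
  if ($input =~ /((\%3C)|<).*?((\%3E)|>)/i) {
    print "invalid input\n";
  }
  else {
    print "$input is valid\n";
    # allow use of input
  }
}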

Note how the untagged content is considered valid, but the tagged content is considered invalid.
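Blacklists tend to fail on inputs their author did not anticipate.  As one illustration, the hex digits in a percent-encoded character may be written in either case, so a pattern that only looks for %3C would let %3c through.  The sketch below compares a case-sensitive and a case-insensitive version of the tag pattern.  The helper name is_blacklisted and the test strings are hypothetical, chosen only to show the difference.

```perl
use strict;
use warnings;

# Two versions of the tag blacklist: one case-sensitive, one with /i.
my $case_sensitive   = qr/(?:%3C|<).*?(?:%3E|>)/;
my $case_insensitive = qr/(?:%3C|<).*?(?:%3E|>)/i;

# Hypothetical helper: returns 1 if the input matches the given blacklist pattern.
sub is_blacklisted {
    my ($input, $pattern) = @_;
    return $input =~ $pattern ? 1 : 0;
}

my @abuse_cases = (
    '<script>123</script>',
    '%3Cscript%3E123%3C/script%3E',
    '%3cscript%3e123%3c/script%3e',   # same payload, lowercase hex digits
);
foreach my $input (@abuse_cases) {
    printf "%-32s case-sensitive: %s  case-insensitive: %s\n",
        $input,
        is_blacklisted($input, $case_sensitive)   ? 'blocked' : 'ALLOWED',
        is_blacklisted($input, $case_insensitive) ? 'blocked' : 'ALLOWED';
}
```

Only the last test string separates the two patterns: the case-sensitive version allows it, while the case-insensitive version blocks it.  Even the case-insensitive pattern covers only the encodings it lists, which is one reason whitelisting remains the preferred approach wherever it is feasible.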


3 comments:

Anonymous said...

In general I think especially when white-listing one should probably anchor the regex to both the beginning and the end of the string using ^ and $. In this case the input string 'garbage (555) 555-5555' is accepted.

Though in this case $1 will only hold ' (555) 555-5555' I think it might be better to emphasize this.

cfrenz said...

Definitely a good point and a practice that I agree with. I was just trying to demonstrate the difference between a blacklisting and a whitelisting approach more so than creating the “best possible” filter, but it really always pays to be rigorous. Your post also demonstrates why it is important to test any security feature not just with typical use cases, but also with some “abuse” cases that attempt to bypass those security features.

Unknown said...

This is a great point to understand and execute easily.