Big Data/Analytics Zone is brought to you in partnership with:

Alan is what they call passionate about technology and software development. In his time he's done consulting, worked for large corporates, done product development, been part of an R&D/labs team and even done the startup thing. He's work in small teams and large, using methodologies that were Agile and no-so-Agile. You can always read his latest thoughts on his blog - skorks.com and he even puts in the occassional appearance on Twitter as @skorks. Alan is a DZone MVB and is not an employee of DZone and has posted 6 posts at DZone. You can read more from them at their website. View Full User Profile

Even Boring Form Data Can Be Interesting (For A Developer)

09.24.2012
| 4030 views |
  • submit to reddit
What could be more boring than capturing credit card data on a form? Well, it's actually not that boring -- since you may want to encrypt this particular data, which presents its own set of challenges. Nevertheless, it's still a textbox which takes digits that you store in a database – whoopty doo – not exactly rocket surgery. Well, I've got a piece of data that's got the credit card beat for sheer mundanity – the ABN. If you're an Australian you know all about this. For everybody else, it stands for Australian Business Number which is an 11 digit number, provided by the government to every company. It's not secret (you can look them up online), so you don't even need to encrypt it – difficult to get excited about that. Of course if that was the end of the story, this wouldn't be much of a blog post, so – as you might imagine – things are not as bland as they appear.

The Interesting Thing About A Credit Card Number…

At CrowdHired, we don't tend to deal much with credit card numbers, but ABNs are another matter entirely – since companies are one of the two types of users we have in the system (by the way – as you may have deduced – I've been working for a startup for the last few months, I should really talk about how that happened, it's an interesting story). Just like any piece of data, you want to validate the user input if you possibly can. When I started looking into this for ABNs I discovered that they had an interesting trait, it is a trait which credit card numbers share. You see, both credit cards and ABNs are self-verifying numbers.

I've been doing web development for many years now, but had no idea this was the case. So naturally – being the curious developer that I am – I had to dig a little further. It turns out that these kinds of numbers are quite common, with other well-known examples being ISBNs, UPCs and VINs. Most of these use a variation of a check digit-based algorithm for both validation and generation. Probably the most well-known of these algorithms is the Luhn algorithm which is what credit cards use. So, we'll use a credit card as an example.

Validating And Generating Credit Cards

Let us say we have the following credit card number:

4870696871788604

It is 16 digits (Visa and MasterCard are usually 16, but Amex is 15). This number is broken down in the following way:

Issuer Number | Account Number | Check Digit
487069        | 687178860      | 4

You can read lots more about the anatomy of a credit card, but all we want to do is apply the Luhn algorithm to check if this credit card is valid. It goes something like this:

 

1. Starting from the back, double every second digit

  4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 0 | 4
  8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 |00 | 4
 

2. If the doubled numbers form a double digit number, add the two digits

  4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 0 | 4
  8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 |00 | 4
  8 | 8 | 5 | 0 | 3 | 9 | 3 | 8 | 5 | 1 | 5 | 8 | 7 | 6 | 0 | 4
 

3. Sum up all the digits of this new number

  8+8+5+0+3+9+3+8+5+1+5+8+7+6+0+4 = 80
 

4. If the number is perfectly divisible by 10 it is a valid credit card number. Which in our case it is.

You can see how we can use the same algorithm to generate a valid credit card number. All we have to do is set the check digit value to X and then perform all the same steps. During the final step we simply pick our check digit in such a way as to make sure the sum of all the digits is divisible by 10. Let's do this for a slightly altered version of our previous credit card number (we simply set the digit before the check digit to 1 making the credit card number invalid).

  4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 1 | X
  8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 | 2 | X
  8 | 8 | 5 | 0 | 3 | 9 | 3 | 8 | 5 | 1 | 5 | 8 | 7 | 6 | 2 | X
  8+8+5+0+3+9+3+8+5+1+5+8+7+6+2+X = 78+X
  X = (78%10 == 0) ? 0 : 10 - 78%10
  X=2

As you can see no matter what the other 15 digits are, we'll always be able to pick a check digit between 0 and 9 that will make the credit card number valid.

Of course not every self-verifying number uses the Luhn algorithm, most don't use mod(10) to work out what the check digit should be, and for some numbers like the IBAN, the check digit actually consists of 2 digits. And yet, the most curious self-verifying number of the lot is the first one I learned about – the ABN. This is because, for the life of me, I couldn't work out what the check digit of the ABN could be.

The Curious Case Of The ABN

Curious

Australia is certainly is not averse to using check digit-based algorithms. The Australian Tax File Number (TFN) and the Australian Company Number (ACN) are just two examples, but the ABN seems to be different. At first glance the ABN validation algorithm is just more of the same, it just has a larger than normal "mod" step at the end (mod(89)).

  • Subtract 1 from the first (left) digit to give a new eleven digit number
  • Multiply each of the digits in this new number by its weighting factor
  • Sum the resulting 11 products
  • Divide the total by 89, noting the remainder
  • If the remainder is zero the number is valid

In-fact, here is some ruby code to validate an ABN which I appropriated from the Ruby ABN gem (and then rolled it into a nice Rails 3, ActiveRecord validator so we could do validates_abn_format_of in all out models :)) :

  def is_integer?(number)
    Integer(number)
    true
  rescue
    false
  end
 
  def abn_valid?(number)
    raw_number = number
    number = number.to_s.tr ' ',''
    return false unless is_integer?(number) && number.length == 11
 
    weights = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
    sum = 0
    (0..10).each do |i|
      c = number[i,1]
      digit = c.to_i - (i.zero? ? 1 : 0)
      sum += weights[i] * digit
    end
 
    sum % 89 == 0 ? true : false
  end

But, while validating ABNs is easy, generating them is a whole other matter. As we've seen, with a check digit-based algorithm, generating the number is the same as validating the number, except we pick the digit in such a way as to make our 'mod' step evaluate to zero. But with a number such as the ABN, where there is no apparent check digit (perhaps I am just having a bout of stupid, so if you can see an obvious check digit with ABNs do let me know), how do you easily generate a valid number? In-fact, why would you want to generate these numbers in the first place, isn't being able to validate them enough?

Well, in the case of CrowdHired, we tend to create object trees that are quite deep, so we build an maintain some infrastructure code to allow us to create fake data for use during development (another interesting thing to talk about at a later date). Before we started using the self-validating properties of ABNs we simply generated any old 11 digit number as fake data for ABN fields, but once the validations started kicking in this was no longer an option. Being the pragmatic developers that we are (even if we do say so ourselves), we took some real ABNs (like our own) chucked them into an array and randomly picked from there. But, this offended the developer gods, or my developer pride – whichever, so one Saturday I decided to take a couple of hours to generate some truly random ABNs that were still valid. Here is the code I came up with (it is now a proud part of our fake data generation script):

  def random_abn
    weights = [10,1,3,5,7,9,11,13,15,17,19]
    reversed_weights = weights.reverse
    initial_numbers = []
    final_numbers = []
    9.times {initial_numbers << rand(9)+1}
    initial_numbers = [rand(8)+1, rand(7)+2] + initial_numbers
    products = []
    weights.each_with_index do |weight, index|
      products << weight * initial_numbers[index]
    end
    product_sum = products.inject(0){|sum, value| sum + value}
    remainder = product_sum % 89
    if remainder == 0
      final_numbers = initial_numbers
    else
      current_remainder = remainder
      reversed_numbers = initial_numbers.reverse
      reversed_weights.each_with_index do |weight, index|
        next if weight > current_remainder
        if reversed_numbers[index] > 0
          reversed_numbers[index] -= 1
          current_remainder -= weight
          if current_remainder < reversed_weights[index+1]
            redo
          end
        end
      end
      final_numbers = reversed_numbers.reverse
    end
    final_numbers[0] += 1
    final_numbers.join
  end

The idea is pretty simple. Let's go through an example to demonstrate:

1. Firstly we randomly generate 11 digits between 0 and 9 to make up our probably ABN (they are actually not all between 0 and 9 but more on that shortly)
  7 5 8 9 8 7 3 4 1 5 3
2. We then perform the validation steps on that number
  • multiply the digits by their weights to get weight-digit products 

      7x10=70
      5x1=5
      8x3=24
      9x5=45
      8x7=56
      7x9=63
      3x11=33
      4x13=52
      1x15=15
      5x17=85
      3x19=57
    
  • sum the products
    70+5+24+45+56+63+33+52+15+85+57 = 505
  • divide by 89 to get the remainder
    505 mod 89 = 60
3. Since we do mod(89) at worst we'll be off by 88 (although if we get 0 as the remainder we lucked out with a valid ABN straight away), we now use the weight-digit products to "give change", subtracting from the remainder as we go until we hit zero.

We start with the last digit where the weight is 19. We subtract 1 from this digit, which means we can subtract 19 from our remainder. We then move on to the next digit until the remainder hits zero

  Initial | Change  | Remainder
  -------------------------------
  7x10=70 | 7x10=70 | 0
  5x1=5   | 5x1=5   | 0
  8x3=24  | 8x3=24  | 0
  9x5=45  | 9x5=45  | 0
  8x7=56  | 8x7=56  | 0
  7x9=63  | 6x9=63  | 0
  3x11=33 | 3x11=33 | 9
  4x13=52 | 4x13=52 | 9
  1x15=15 | 0x15=0  | 9
  5x17=85 | 4x17=68 | 24
  3x19=57 | 2x19=38 | 41
4. This gives us our new number
7 5 8 9 8 6 3 4 0 4 2
5. Now we just need to add 1 to the very first number (as per the ABN validation steps) and we have our valid ABN
85898634042

There are a couple of nuances to those steps.

  • None of our initial generated digits can be zero. Since we "give change" by subtracting 1 from each digit we need to ensure we can actually subtract 1 (otherwise things become much more complex). So we ensure our digits are randomly generated between 1 and 9 rather tahn between 0 and 9.
  • Even when all our initial digits are at least 1 we may still fail to "give change" for some of the remainders, the easiest example of this is when our remainder is 2. The only digit when can use to give change is the one with the weight of 1 (i.e. the second digit of the ABN). If this digit is initially generated to be 1, we can only give change once at which point we end up with a remainder of 1 and there is nothing more we can do about it. In-fact this exact scenario can happen for a few other remainders namely 86, 77, 66, 53, 38, 21. The easies way to overcome this problem is to ensure that the digit that has a weight of 1, is always randomly generated to have a value of at least 2. This way we can use it to give change twice and our problematic remainders are covered.
  • Lastly, since we have to add 1 to the very first digit as our final step, we need to make sure that this digit is not already equal to 9, so we generate that one to be between 1 and 8.

Given these nuances, this algorithm won't generate every possible ABN, but it will give you a large percentage of possible ABNs which is good enough for our needs. It took about an hour to get that working (we won't mention the little bug where I forgot the remainder could be zero from the start, which caused much grief to our random data generator :)), but it was a fun little exercise – time well spent as far as I am concerned. And to think, all this learning about self-validating numbers and algorithmic coding fun was triggered by trying to capture the most mundane piece of data on a form. It just goes to show that you can learn and grow no matter where you are and what you're doing, you just need to see the opportunities for what they are.

 

Published at DZone with permission of Alan Skorkin, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)