11 comments on “1.2bn population of India to be given biometric ID cards”

  1. Tim,

    This is not in fact correct… “Given the problems with false positives (as Alex has been trying to tell us for years) I’m not sure there’s enough computing power in the entire world to do this.”

    Alex cites David Moss who says ….. “Suppose that there were 60 million UK ID cardholders. To prove that each person is represented by a unique electronic identity on the population register, each biometric would have to be compared with all the rest. That would involve making 1.8 x 10^15 comparisons.”

    A programmer would
    1: hash the data (a type of data compression)
    2: index the data (put into alphabetical order)
    3: as each record is found to be unique it can be removed from the list, so the 60,000,000-record set reduces linearly as we work through the data.

    Basically, put the data in alphabetical order, then you only need to compare with adjacent records .. I could do it on a PC in minutes, though perhaps some months to write the prog (to commercial standard).
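    The three steps above can be sketched in Python. Note the load-bearing assumption, which later comments dispute: that each biometric reduces to a single fixed byte string, so that equality is exact. The record values here are purely illustrative.

```python
# Sketch of the exact-match approach described above: hash each record,
# sort (index) the hashes, then compare only adjacent entries.
# Assumes each biometric is one fixed byte string -- the disputed premise.
import hashlib

def find_duplicates(records):
    """Return the records that appear more than once."""
    # Step 1: hash each record (compresses variable-length data to 32 bytes).
    hashed = [(hashlib.sha256(r).digest(), r) for r in records]
    # Step 2: index the data (sorting puts equal hashes next to each other).
    hashed.sort()
    # Step 3: one linear pass; after sorting, only adjacent entries can match.
    dups = set()
    for (h1, r1), (_h2, r2) in zip(hashed, hashed[1:]):
        if h1 == _h2:
            dups.add(r1)
            dups.add(r2)
    return dups

records = [b"alice-template", b"bob-template", b"alice-template", b"carol-template"]
print(find_duplicates(records))   # {b'alice-template'}
```

    Once the list is sorted, duplicates can only sit next to each other, so the final pass is a single O(n) scan.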

    He’s wrong, way, way out: it would take only hours on a PC.

    Give me 60 million biometrics and I’ll tell which ones are and are not unique very quickly, in fact I could probably write the prog using free software (mysql) without the hashing and it would still run in a few hours on my PC.

    Whatever objections there may be to ID cards, this is not one of them.

    More interesting to discuss is how quickly 60 million biometrics could be scanned for duplicates. I reckon a few hours on a PC, almost all of it to build the indexes, then a few seconds/minutes to do the actual data run and comparisons.

    Anybody got a suggestion for how to do it even more quickly?

  2. I can see many reasons why the Indians might benefit from their government knowing who is who, I recommend that they do this scheme.

    It is only me and my government that I worry about, I don’t want those cunts knowing anything about me.

  3. Concerning Jonny Bonk’s comment starting: “This is not in fact correct… ”

    Actually, that is not correct either, on 2 counts.

    The name of the game being discussed is detection of multiple applications: ie someone trying to set up more than one identity, presumably with the intention of committing crimes in a false identity, and having them traced to their true identity (registration of which is necessary to lead an ordinary sort of life).

    Firstly, the checks that need to be done arise for each new registration. The new applicant needs to have their biometric(s) compared to all those of currently registered people. The computational load is linearly proportional to the number of persons currently registered; towards the end of a ‘country’ registering everyone, that is pretty much the whole (adult) population. The cumulative number of matches is that stated by David Moss: n*(n-1)/2. However, and more importantly, the additional number of matches for the final applicant (‘n’) needs to be computed within a reasonable period, and preferably before the applicant leaves the registration site (after which he won’t be around to be prosecuted for false registration).
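    The arithmetic on the quoted figures is easy to check:

```python
# Sanity check on the figures above: cumulative pairwise comparisons for
# n enrolled people is n*(n-1)/2, while each NEW applicant only triggers
# n-1 comparisons against those already registered.
n = 60_000_000
total_pairs = n * (n - 1) // 2
print(f"{total_pairs:.2e}")   # 1.80e+15, matching Moss's quoted 1.8 x 10^15
print(n - 1)                  # comparisons needed for the final applicant alone
```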

    Secondly, Jonny is assuming that there is a single fixed symbol string produced for each biometric registration (and would be right on all his hashing assumptions, if that were so). However, biometric measurements are subject to error on a continuous (analogue) scale of measurement. If you measure the same biometric from the same person twice, you do not get the same measurement. An example would be assuming you can measure someone’s height to the nearest micro-metre and get the same value every time, or that quantising the measure to the nearest 5mm is OK (and getting millions of false matches, each totally indistinguishable from a good match).
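    The height example can be made concrete with a toy sketch (the numbers are invented for illustration): quantising an analogue measurement into fixed bins produces both failure modes at once.

```python
# Toy illustration of the quantisation problem described above.
# Heights in mm, measured with small sensor noise; quantised to 5 mm bins.
def bin5(x):
    """Quantise a measurement (mm) to the nearest 5 mm bin."""
    return round(x / 5) * 5

# Same person measured twice, true height ~1752.5 mm, tiny noise:
print(bin5(1752.4), bin5(1752.6))   # 1750 vs 1755 -- a false NON-match at a bin edge

# Two different people who happen to land in the same bin:
print(bin5(1748.1), bin5(1751.9))   # 1750 and 1750 -- a false match
```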

    More information on detection of multiple applications can be found here: http://www.camalg.co.uk/s03017_pr1/s03017_pr1.pdf (slide 5), and here: http://www.camalg.co.uk/nids_040116a/NIdS_A031219a_v2.pdf (point 10 and points 15 to 18).

    I have expressed concern over the ability to detect multiple applications within the UK population; India has a population around 15 times larger. They would need to use more biometric modalities (or many more instances; ie of fingers and/or irises) to get equivalent discrimination, and so (probably, though there are some interesting computational tricks – especially if you ignore artefact attacks) more than 15 times the computation, to get similarly reliable detection of multiple applications.

    Best regards

  4. @3 “Secondly, Jonny is assuming that there is a single fixed symbol string produced for each biometric registration “, yes there is that implicit assumption and it may not be correct in the real world, however genetic biometrics are certainly moving in that direction ..

    I must however disagree with you about the time taken to compare all records as “n*(n-1)/2”: once the data is indexed the time taken is trivially small; even a modest home-spec PC will find duplicates in a fraction of a second once the data has been indexed. IIRC it’s “log n” – or some such (much quicker than n*n); other bloggers (compsci types) may wish to edify us more precisely.

    That’s the power of indexes/indices, once you’ve built the index then you can match up from the most enormous data sets very quickly.

    Consider the telephone directory, if you want to see if “Tim Worstall” is in it you will work alphabetically and it hardly takes longer to find a match (or not) in the UK phone book as in the village phone book.
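    The phone-book lookup is binary search: each step halves the remaining candidates, so the cost grows as log n rather than n. A sketch using Python’s bisect module (the directory contents are made up):

```python
# Binary search over a sorted directory, as in the phone-book analogy above.
import bisect

directory = sorted(["Alice Aardvark", "Jonny Bonk", "Tim Worstall", "Zoe Zed"])

def in_directory(name):
    i = bisect.bisect_left(directory, name)   # binary search: O(log n)
    return i < len(directory) and directory[i] == name

print(in_directory("Tim Worstall"))   # True
print(in_directory("John Doe"))       # False
```

    Doubling the directory adds just one more halving step, which is why the national phone book takes barely longer to search than the village one.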

    You are correct that the data is not a nice convenient fixed datum, but the power of indices still applies. If the tech is done correctly then johnny fraudster will be detected instantly – or at least flagged as suspicious.

    The very fact that the govt CAN acquire this data power over us is indeed the worry with ID schemes and biometrics etc, if they couldn’t then we would not have to worry about their plans.

    but the problem with the data is that it is not reliable – there will be so many false matches and non-matches that resources will be hard-pressed to cope, according to the article that Tim references

  6. First a correction; in my 3rd paragraph @3 above, I meant: “and AVOIDING having them traced to their true identity”, which I hope was obvious.

    Jonny Bonk @4 writes: “once the data is indexed the time taken is trivially small”

    This is where we disagree, as I wrote previously. Biometric data cannot be reduced to a single string without quantising it and, through that, significantly reducing performance. Currently, biometric performance is seriously struggling to be good enough for detection of multiple applications; therefore it is not appropriate to reduce the performance, even if it reduces the cost of the computation. Roughly: knowing, quickly and cheaply, just about sod all is actually sod all use.

    Now, just in case any good mathematicians (possibly including Jonny) are reading here, one of the better prospects for improving this situation is (well, as far as I am concerned, possibly is) using the theory of Voronoi (or Dirichlet) tessellations, perhaps together with the associated technique of kd-trees. These techniques are potentially useful in reducing the search time for pattern matching by (k-) nearest neighbour search in multi-dimensional spaces. However, there are various issues that make this difficult for biometrics. I don’t really think Tim’s blog is a good place to go into these; also I’m not aware of any serious reports of experimental success using them in large-scale biometric detection of multiple applications or biometric identification. However, if anyone disagrees on either of those, perhaps they could just drop in the odd reference to appropriate journal or (reputable technical) conference papers, and I’ll look and come back on the issue.
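    For the curious, here is a minimal kd-tree nearest-neighbour search in Python, as a sketch of the technique only, with made-up 2-D points. The caveat above is real: in the high-dimensional spaces typical of biometric templates, kd-tree pruning degrades towards brute force.

```python
# Toy kd-tree: split on alternating axes at the median; prune branches
# that cannot contain a closer point than the best found so far.
import math

def build(points, depth=0):
    """Build a kd-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return the stored point closest (Euclidean) to target."""
    if node is None:
        return best
    point, left, right = node
    if best is None or math.dist(point, target) < math.dist(best, target):
        best = point
    axis = depth % len(target)
    near, far = (left, right) if target[axis] < point[axis] else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Cross the splitting plane only if a closer point could lie beyond it.
    if abs(target[axis] - point[axis]) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(nearest(tree, (9, 2)))   # (8, 1)
```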

    Best regards

    You only get a log₂ n search cost if the data is well-ordered. If you cannot impose a well-ordering criterion on your data then amortised search cost becomes O(n), and thus O(n²) in the system as a whole. Into which category biometric data falls I do not know.

    but the fundamental problem remains… the data is bad… matching physical measurements leads to unpredictable results, with lots of cases to query by other methods… and that is even if the computing bit worked OK… the error rate in the underlying data is simply too large to be practical…

  9. Arguing about the database programming from a self-taught Visual Basic programmer’s perspective does not shed light on the problem.

    The most workable of the biometrics is iris scanning. Go and read a paper on how it actually works and how the false positive problem arises. Then you’ll see this is nothing to do with the speed of computation and all to do with the human logistics of resolving the inevitable multiple matches on registration.

  10. @Kay Tie re @9: On this one, I think you are shooting from the hip, by missing out on the benefits of multi-modal biometric fusion, and how each new modality does contribute to better discrimination. Thus, even though a single biometric modality cannot solve the problem of detection of multiple applications in large populations (on the basis of current knowledge, and very likely unavoidably), using a multiplicity of biometrics might be able to do that.

    Either that or I’ve totally missed your point.

    Clearly there does need to be an operational procedure to deal with false matches. How often that happens matters, as well as how long it takes to resolve the conflict by non-biometric means. I’ve considered what would be a tolerable rate for false alarms (1/10th of applicants), and worked out approximately what biometric performance is required to obtain it for a particular size of population.
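    A back-of-envelope illustration of why these numbers are so unforgiving (the rates below are assumed for illustration, not taken from the linked papers): with n people already enrolled, each new applicant triggers roughly n comparisons, so the expected false matches per applicant is about n times the false match rate; AND-fusion of genuinely independent modalities multiplies the rates together.

```python
# Illustrative figures only -- the FMRs here are assumptions, not measured.
n = 60_000_000               # enrolled population (UK-scale figure from the thread)
fmr_single = 1e-6            # assumed false match rate for one modality
fmr_fused = fmr_single ** 2  # two independent modalities, idealised AND rule

print(n * fmr_single)   # ~60 false matches per new applicant: unworkable
print(n * fmr_fused)    # ~6e-05: manageable, IF the independence assumption holds
```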

    Anyway, my technical papers and presentations ( http://www.camalg.co.uk/pubs_biometrics.htm ) might help you evaluate prior knowledge on biometrics; the most recent one includes work on iris recognition and the oldest but one explains about multi-modal biometric fusion.

    Best regards

  11. Nigel Sedgwick: I would note that even if there is a Voronoi tessellation on the measurement space, unless the space admits a Euclidean metric this does not guarantee the uniqueness of the tessellation. Further, even if non-Euclidean, a useful metric implies and is implied by a well-ordering criterion. I confess this is out of my field of expertise but these are general observations that are necessarily true.
