More More Data!

Can someone technically literate explain this to me?

The child benefits records scandal could have been avoided if Customs officials had spent £5,000 on removing bank account details from the computer discs that later went missing, The Daily Telegraph can disclose.

Emails showed that HMRC officials were concerned about paying to remove unnecessary information such as account details from the discs before they were sent to London.

Cutting the files would have cost as little as £5,000, experts said yesterday, compared with the £200m cost that could result from the scandal, even if no fraud is committed.

It’s a long, long time since I used a database application so I’m a little in the dark here. When you download a database you tell it which parts you want to download, don’t you? You can set the filters to download it all, or this bit, or that bit. So why would it cost £5,000 to download only part of it rather than all of it?

I ask because I can see a defence being prepared here. Instead of setting that £5k against the huge costs incurred, they’re going to set the multiples of £5k, the cost of filtering the data every time they’re asked for it, against the damage done by the failure. And they’d be right to do so, of course.

But only if it does in fact cost that £5k each time. So, can anyone tell me? I’m assuming that the cost difference between downloading the whole database and a partial one is in fact zero. Am I correct in that assumption?
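A sketch of the intuition behind that assumption, using Python’s built-in sqlite3 module and an invented toy table (the real HMRC schema obviously isn’t public): selecting a subset of columns is just a shorter SELECT list, not extra work for the database.

```python
import sqlite3

# Toy stand-in for a benefits database; table and column names are invented.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE claimants (name TEXT, nino TEXT, sort_code TEXT, account_no TEXT)"
)
con.execute(
    "INSERT INTO claimants VALUES ('A. Smith', 'QQ123456C', '01-02-03', '12345678')"
)

# Full dump: every column, bank details included.
full = con.execute("SELECT * FROM claimants").fetchall()

# Partial dump: exactly the same query shape, just a shorter column list.
partial = con.execute("SELECT name, nino FROM claimants").fetchall()

print(full)     # includes sort code and account number
print(partial)  # bank details never leave the database
```

Whether the live system actually allowed an ad-hoc query like this is, of course, exactly what the comments below argue about.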

16 thoughts on “More More Data!”

  1. I would be very surprised if the data for the disks had been obtained by somebody performing a direct query against the real child benefit database…. 25 million records is a big result set.

    Probably more likely is that there is a mechanism in place to perform a bulk extract of the data (maybe for backups, maybe to allow analysis in other tools, whatever) and this extract is what was taken.

    If this is the case then removing the non-essential information would, indeed, have required extra work.

    That is not to say it shouldn’t have been done but it probably is true to say that it would have required a little more work.

  2. I’m baffled. Surely when you set up a large “data base”, you automatically set it up so that you can extract any desired subset of the data? If not, why not?

  3. Richard is essentially correct, but I have to say that even on a dataset of that size, £5k for an extract is steep.

    The other pertinent question, however, is that of why data extracts for audit are a chargeable extra rather than part of the core contract?

  4. Well, even if we are not talking about adjusting a SELECT query directly from the database (which indeed would cost nothing), but rather, as RGB says, going through a bulk data dump, even the resulting text file must have been formatted in some way (most likely as a comma-delimited file). In which case deleting the bank account fields is a matter of two or three lines of Perl.

    For 5 grand (or – since I have to get the business – £4,995) I should be delighted to write those lines for HMRC….
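For what it’s worth, here is roughly what those few lines would look like, in Python rather than Perl, and with invented column names, since the real extract’s layout was never published:

```python
import csv, io

# Invented layout standing in for the extract file; the real format isn't public.
raw = io.StringIO(
    "name,nino,sort_code,account_no\n"
    "A. Smith,QQ123456C,01-02-03,12345678\n"
)

reader = csv.DictReader(raw)
# Keep every column except the bank-detail fields.
keep = [f for f in reader.fieldnames if f not in ("sort_code", "account_no")]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore")
writer.writeheader()
for row in reader:
    writer.writerow(row)  # extra fields are silently dropped

print(out.getvalue())
```

The point stands whatever the actual file format was: stripping named fields from a delimited dump is a trivial filter, not a development project.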

5. Without an understanding of the design and location of the data it’s all rather speculative. There’s no real hope of finding this out, and I would imagine those who are privy to this information wouldn’t have a clue what it means, and those that do will be silenced. It is in the interests of the government and EDS to play up the complexity of the system to justify the fee.

    That said, £5K is most likely an absolute rip-off for a bit of data querying. How do I get a job for HMRC? I could do it for a cut-price £4K 🙂

  6. Agreeing with Roger – it is likely that you would not run the extraction query against the live database (a mistake could tie up processor power for days) but against a backup or full copy.

    As others have pointed out, writing the query, running it and validating it is work, but not a huge amount. Call it half-a-day, with lots of tea-breaks. And, of course, burning the CDs, putting them into the jiffy bag and handing them to TNT.

    £5k sounds awfully like a fixed charge for doing something ex-contract below a certain resource requirement – otherwise a change order would have to be raised, negotiated and approved, which would cost far more to administrate.

    To answer Unity’s point: because EDS are better at negotiating contracts than HMRC, and these non-headline £5k deals probably have quite a significant impact on margin, in aggregate.

7. Ah, but my dears, you are forgetting that the IT system at HMRC is run by an external contractor called EDS, under a strictly detailed contract. My suspicion is that the contract failed to stipulate that selected data sets might need to be downloaded, so that any such request is then additional to the contract. EDS charge £5k (kerching).

Another example of the ‘simple shopper’ analogy promulgated by BOM at his web site.

  8. I understand they still run ICL/Fujitsu VME Mainframes running with COBOL.

    So at worst, not quite as easy as changing a SQL SELECT query, but it’s perhaps a change of a dozen lines of code.

    £5K for that? You’re having a laugh. No wonder they are, as all the lefties claim, understaffed. They’re spending Le Manoir prices to eat at Burger King.

9. +1 for Arthur’s opinion. There’ll be a set of standard reports coming from the dataset (for “security” and costing purposes). Anything else incurs a standard charge. Nice. Is there no limit to the Government’s negotiating ability? 😀

10. Arthur. You mean, seriously, they are still running massive, important systems like this on proprietary 1970s technology? Shit, it’s worse than I thought.

Presumably, though, as previously pointed out, the OUTPUT of this data dump MUST have been formatted in some way to make the data usable, whether CSV, XML or whatever. Hence, as also pointed out, a couple of lines of Perl would have been able to parse this and filter it quite easily.

11. A request for data by auditors like this would be a one-off; a developer would have to go in, pull out the data, check that it looked right, format it, and break it up to get it on more than one disc. 2 CDs are 1.4GB; for 25 million records that’s 50-60 bytes per record, so it’s obviously been compressed with something. The whole faff could take a developer a day, possibly even two, assuming that the data produced turned out to be readable and correct first time (right format, right character set, right record endings, right compression). Adding encryption would be one more thing that could go wrong, so better to skip that. A day at contractor rates would be getting on for a grand, so with a “government contractor” markup, five grand is not that surprising.
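Comment 11’s back-of-envelope sum does check out, at least on the assumption of two standard 700 MB CDs:

```python
# Sanity check of the bytes-per-record estimate; assumes standard 700 MB discs.
capacity_bytes = 2 * 700 * 1_000_000   # 1.4 GB across two CDs
records = 25_000_000                   # 25 million child benefit records

bytes_per_record = capacity_bytes / records
print(bytes_per_record)  # 56.0, squarely inside the 50-60 byte range quoted
```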

  12. Zorro,

    “Arthur. You mean, seriously, they are still running massive, important systems like this on proprietary 1970s technology? Shit it’s worse than I thought.”

    In my experience, the VME technology was very reliable, and COBOL is open. It’s probably the least of the problems (except that all the programmers are getting old).
