Guardian’s investigations suggest bank’s problems began on Tuesday night when it updated key piece of software called CA-7
NatWest has admitted that it could not say exactly how much money should be in individual accounts as the crisis caused by a failed software update last week spiralled out of control for days.
The bank was quick to deny claims by the Unite union that the “offshoring” of IT jobs to locations in India had led to the problems, which appeared on Tuesday night, paralysed its systems through to Friday, and have not yet been fixed.
However, a number of programmers and experts who have worked on or with NatWest systems told the Guardian that they could not imagine the problem happening before experienced staff began to be made redundant in 2010.
“[NatWest owner] Royal Bank of Scotland has 40 years’ experience running these systems and banks as a rule don’t drop the ball like this,” one said. “Somebody somewhere made a decision that has led to this.”
The Guardian’s investigations suggest that NatWest’s problems began on Tuesday night when it updated a key piece of software – CA-7, which controls the batch processing systems that deal with retail banking transactions – ahead of the regular nightly run.
RBS/NatWest has not said what went wrong, though one programmer who has worked on RBS/NatWest’s systems told the Guardian: “CA-7 is a very common and reliable product used to automate large sequences of batch mainframe work [which are usually referred to as 'jobs']. It will start jobs, wait for them to run, then start other jobs dependent on the first ones completing, and so on. RBS processes accounts overnight via thousands of jobs.”
The jobs take transactions from various places, such as ATM withdrawals, bank-to-bank salary payments, and so on, and finish by updating the master copy of the account – in a system known as Caustic – with the definitive balance.
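The dependency-driven scheduling the programmer describes can be illustrated with a minimal Python sketch. It is not CA-7 and says nothing about how RBS configures it; the job names and the `run_schedule` helper are hypothetical, and the "schedule" is simply a table recording which jobs must finish before each job may start.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schedule: each job maps to the set of jobs it must wait for.
dependencies = {
    "collect_atm_withdrawals": set(),
    "collect_salary_payments": set(),
    "apply_transactions": {"collect_atm_withdrawals", "collect_salary_payments"},
    "update_master_balances": {"apply_transactions"},
}

def run_schedule(deps):
    """Return the order in which jobs ran, honouring the dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    completed = []
    while sorter.is_active():
        # get_ready() yields only jobs whose prerequisites are all done.
        for job in sorted(sorter.get_ready()):
            completed.append(job)  # stand-in for actually running the job
            sorter.done(job)       # completion unlocks dependent jobs
    return completed

order = run_schedule(dependencies)
nothing_ran = run_schedule({})  # an empty schedule: no jobs are started
```

In this toy model the balance update always runs last, and if the schedule table is empty nothing runs at all, which is one way to picture the effect of a deleted schedule file.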
“It seems whoever made the update to CA-7 managed to delete or corrupt the files which hold the schedule for the overnight jobs, so they did not run, or ran incorrectly,” the programmer told the Guardian. “They have backed out from this change, but now are trying to play catch-up, and have been doing so for a few days.”
The batch processing system, which reconciles the movement of money in and out of more than 10m NatWest and Ulster Bank accounts, did not run correctly for three nights – meaning that millions of transactions were not processed until it did begin running correctly on Friday. Even once it had been fixed, the batches of transactions had to be re-run in order, beginning with Tuesday’s, so that nobody’s account wrongly went into overdraft.
A NatWest spokesperson, asked whether it knew how much money people had at any time, said: “All the money is safe in the bank. It’s being applied to people’s accounts. We can show people statements on screens if they come into branches.” The bank is offering special opening hours, extending to 6pm, this week. NatWest also ran extra batches to catch up with the transaction backlog over the weekend.
NatWest, like many other banks, uses the CA-7 software and attendant files to fit its own custom needs. The problem only surfaced once the batch run was underway in the early hours of Wednesday. RBS appears to have advertised for specialists in CA-7 in February in India – to which a number of its IT jobs were moved after 2010. “Looking for candidates having 4-7 years of experience in Batch Administration using CA7 tool,” the advert read. “Urgent Requirement by RBS.”
The Unite union has criticised RBS management for cutting jobs in the UK and shifting a number of them offshore. Since 2010, hundreds of IT jobs have been cut from RBS’s Edinburgh headquarters and shifted abroad. RBS/NatWest has denied that it made any difference.
But some observers strongly disagree. “This was not inevitable – you can always avoid problems like this if you test sufficiently,” said David Silverstone, delivery and solutions manager for NMQA, which provides automated testing software to a number of banks, though not RBS/NatWest. “But unless you keep an army of people who know exactly how the system works, there may be problems maintaining it.”
One programmer who worked on the RBS/NatWest systems during the takeover in 2001-02 said that the latest problems suggested a paucity of staff on the spot with experience of what to do. “The people in India will have done their darndest to do a good job, but without the knowledge of the overall system that you get from years of experience on the ground, it’s easier to see how you get a big operational failure.”
Banks have for decades used huge mainframe systems to process payments such as cheques and to update customers’ accounts; the transactions for each day are collected and are then run in a single gigantic batch overnight, so that accounts have been credited and debited with the correct amounts by the morning. That is why internet banking transactions are not processed if you carry them out after certain times: the banks’ systems simply don’t add them into the queue for that night’s batch.
Sources familiar with NatWest’s systems, and who have also spoken to staff there, explained that the problems with the update surfaced during the batch run. NatWest confirmed on Monday that the problem first surfaced on Tuesday, and that “we confirmed the fix on Friday”.
The problems with the upgrade were spotted during the overnight run ahead of Wednesday morning. “We have guardian systems which spot when things go wrong,” a NatWest spokesperson said.
But by Friday, when the fix was implemented, three sets of batch runs had failed. If a batch fails badly – as here – then all of the transactions, including the payments in and out of accounts, are “rolled back” to the starting point, as if it had never run. Wednesday’s transactions were then added to the pending list, and the batch was attempted again in the early hours of Thursday; that too failed. By the time the fix had been done, there were three days’ worth of unimplemented transactions queued up.
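A rough sketch of that all-or-nothing behaviour, in Python, may help. It is an illustration only: the `nightly_run` function, the `scheduler_ok` flag, the account name and the amounts are all invented for the example and do not describe NatWest's actual systems.

```python
def nightly_run(balances, pending, scheduler_ok):
    """Apply every pending transaction, or change nothing at all."""
    working = dict(balances)  # work on a copy; the original is the rollback point
    try:
        if not scheduler_ok:
            raise RuntimeError("batch schedule missing or corrupt")
        for account, amount in pending:
            working[account] += amount
    except (RuntimeError, KeyError):
        return balances, pending  # rolled back: backlog carried forward
    return working, []            # committed: backlog cleared

balances = {"alice": 100}   # hypothetical account
pending = []
for night_ok in (False, False, False):      # three failed nightly runs
    pending = pending + [("alice", -10)]    # that day's transactions join the queue
    balances, pending = nightly_run(balances, pending, night_ok)

backlog_before_fix = len(pending)
balance_before_fix = balances["alice"]
balances, pending = nightly_run(balances, pending, True)  # run after the fix
```

After three failed nights the balance in the sketch is unchanged and three days of transactions are queued; only the successful run applies them and empties the queue.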
Richard Price, a Norwich-based systems developer who has worked on banking systems that linked into NatWest’s, explains: “Banking systems are like a huge game of Jenga [the tower game played with interlaced blocks of wood]. Two unrelated transactions might not look related now, but 500,000 transactions from now they might have a huge relation. So everything needs to be processed in order.” Thus Tuesday’s batch must run before Wednesday’s or Thursday’s to avoid, for example, penalising someone who has a large sum of money leave their account on Thursday that might put them in debt but which would be covered by money arriving on Wednesday.
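The ordering point can be shown with a small, hypothetical example in Python. The `replay` helper and the sums involved are assumptions made for illustration: an account starts at zero, receives 500 on Wednesday and pays out 400 on Thursday.

```python
def replay(opening_balance, batches):
    """Apply batches in the order given.

    Returns the closing balance and whether it ever dipped below zero.
    """
    balance = opening_balance
    overdrawn = False
    for _day, amounts in batches:
        for amount in amounts:
            balance += amount
            if balance < 0:
                overdrawn = True  # would trigger an overdraft penalty
    return balance, overdrawn

wednesday = ("Wed", [500])    # money arriving
thursday = ("Thu", [-400])    # money leaving

in_order = replay(0, [wednesday, thursday])
out_of_order = replay(0, [thursday, wednesday])
```

Both replays finish with the same balance of 100, but only the out-of-order one passes through a negative balance on the way, which is why the backlog has to be worked through day by day.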
Price said that any software update would first have been subjected to quality assurance and user acceptance testing before being implemented.
CA-7 is familiar to many in the banking industry: it was originally released in 1980 by Uccel – which was then taken over by Computer Associates, which provides key software for scores of banks. Computer Associates told the Guardian: “RBS is a valued CA Technologies customer, we are offering all assistance possible to help them resolve their technical issues which are highly unique to their environment. We do not comment on customer confidential issues.” However, it declined to say whether CA-7 lay at the heart of the problems.