Tuesday, 10 February 2015

The Missing Sundays Mystery

One of our key customers sends us a datafile every day, and it automatically gets loaded.  Except sometimes - it doesn't...

<Cue theme from "The Twilight Zone">

So I wrote some code to see when data was being loaded.  Sure enough, it worked every day of the week, but on the Sunday between Christmas and New Year - it failed to load.  The following Sunday - it worked fine!  But since then, it has failed every Sunday.  Only on Sundays.  Then on Monday, a double dose of data gets loaded.
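For the curious, a check like that can be sketched in a few lines of Python. This is not the code I actually ran, just a minimal illustration: the function name `load_report` and the sample dates are made up, and it assumes you can already get hold of one date per load that happened.

```python
from collections import Counter
from datetime import date, timedelta


def load_report(load_dates, start, end):
    """Return (missing_days, doubled_days) for the inclusive range start..end."""
    counts = Counter(load_dates)
    missing, doubled = [], []
    day = start
    while day <= end:
        loads_that_day = counts.get(day, 0)
        if loads_that_day == 0:
            missing.append(day)      # nothing was loaded on this day
        elif loads_that_day > 1:
            doubled.append(day)      # more than one file was loaded on this day
        day += timedelta(days=1)
    return missing, doubled


# Hypothetical load dates: nothing on Sunday 1 Feb 2015, two loads on Monday 2 Feb.
loads = [
    date(2015, 1, 30),
    date(2015, 1, 31),
    date(2015, 2, 2),
    date(2015, 2, 2),
    date(2015, 2, 3),
]
missing, doubled = load_report(loads, date(2015, 1, 30), date(2015, 2, 3))
for d in missing:
    print(d.strftime("%A %d %B %Y"), "- nothing loaded")
for d in doubled:
    print(d.strftime("%A %d %B %Y"), "- double dose")
```

Printing the day name next to each gap is what makes a pattern like "only on Sundays" jump out.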

Now - I didn't tinker with the database - it's an Oracle system, run by a Cron job under Unix.  So a long way over to the Dark Side.  I don't even have access to the box it runs on - so even if I weren't afraid to touch it, I couldn't tinker.

And the developers swear blind that they haven't mucked about with the application for months...

I looked on the FTP site - there was a litter of .tmp files, apparently some sort of by-product of the loading, renaming and moving process.  No clues there though, and removing them made no difference.

Luckily the client was able to send us a log showing which files were sent, and on what date and at what time.  Helpfully, he highlighted the missing Sundays in yellow, and all became clear.  The ones that worked were sent to us at various times ranging from about 0500 to 0700.  The ones that failed were sent to us at various times between 0700 and 0800.
Guess when the Cron job runs?

The job to send us the data is automated - but runs when other things finish, hence the variety of times.  And Sundays?  "That's the day we bounce our servers..."
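The logic of the failure boils down to one comparison. Here is a rough sketch of it, assuming the Cron job kicks off at 0700 (my reading of the log, not something I have seen in the crontab) and using made-up send times; `will_load_same_day` is just an illustrative name.

```python
from datetime import time

# Assumption: the Cron job starts at 07:00. The client's log suggests this,
# but the actual schedule is not confirmed.
CRON_START = time(7, 0)


def will_load_same_day(sent_at, cron_start=CRON_START):
    """A file is picked up the same day only if it arrives before the job starts."""
    return sent_at < cron_start


# Hypothetical send times, in the style of the client's log.
for sent in (time(5, 12), time(6, 48), time(7, 25)):
    status = "loaded today" if will_load_same_day(sent) else "missed, loads tomorrow"
    print(sent.strftime("%H%M"), status)
```

A file that misses the run just sits there until the next morning, which is why Monday gets a double dose.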

1 comment:

  1. We could change this, but we would need to set up an Oracle development environment, plus ideally a test environment. We would have to purchase new licences for Oracle on those servers, and employ an Oracle DBA to set it up and run it.

    I estimate a cost of approximately $1,000,000.

    We could ask the client if they would like to consider this, but I suspect that they will prefer to go back to sending the data before 0700, as they did throughout 2014.
