Coder war stories

Lads, I thought it would be entertaining if we posted stories from our careers in tech. Not exclusively coding, just any kind of interesting tech situation/dilemma/disaster/intrigue or whatever.

I will get the ball rolling with one of mine.

I work at a [REDACTED] as a programmer/general dogsbody of IT. I was on call one day back in 2015 when I got a call about 7.30pm on a Tuesday night saying that an electronic manifest for a train had not transmitted correctly. I logged in expecting to see some kind of comms issue, or perhaps the file had not generated properly. I started looking through the folder that contained the application and its files. First thing I noticed was that all the files now contained gibberish. Also, every directory in the file system now had two new files in it: one a text file, the other an HTML one. I opened the text one in a plain text editor. It was a demand for Bitcoin - we had been hit by a cryptolocker virus that had encrypted every single data file and script. I raised the alarm around 8pm, and by 9pm we had the whole team on site.

There was a funny moment when our [REDACTED] arrived on site and pompously declared that we needed a situation desk. He sets up a desk and, while we are all standing around watching him, he produces a pad ready to write on and says “I need a pencil.” We all stand there blankly and next thing he shouts “Guys! Snap to it! Paper - and - pencil!” So someone goes off and gets him a pencil. Later on we are all talking to each other and wondering what the hell that was all about with the paper and pencil.

Then he made the call to close the entire [REDACTED] - because at that stage we did not know the extent of the infection, or if it was going to get worse. I remember calling one of our inland facilities and telling them the system was going bye-bye for a while. He pauses for a bit after I tell him the good news then says “Fuck off – we got all our truck lanes backed up to the main road and a train is currently being deramped. Why?” I wasn't allowed to tell him because we didn't want the media getting wind of it so I just said “emergency maintenance. Something has fucked up.” I believe that was my exact technical explanation. “When will it be back up?” he says. “Dunno at this point.” I ring off.

There was a moment where I thought - Shit, is this something I have let in? I did a review of my desktop after physically disconnecting it from the network and disconnecting any shares to servers I had set up (in those days we were allowed to just connect shares to whatever dev servers we were working with - or rather, there was no rule against it and your network permissions allowed it, so have at it).

After satisfying myself as best I could that there was nothing going on with my machine, I experienced a brief period of euphoria. It was actually kind of exciting. I spent the next 8 or so hours helping to scan all our VMs before bringing them back online. We made copies of the two servers that had been infected and then restored them.

We ended up pulling an all-nighter to get things under control. As it turned out only two servers were infected, both of them report servers. At that time we had a McAfee solution for network security, and we ended up sending McAfee a hard drive with the virus along with the VMs it had infected. They said it was a new one to them and they wanted a copy of it.

Looking back on it now, 5 years later, I believe the attack was a blessing in disguise. It exposed a lot of vulnerabilities that have since been addressed, and the company is on a better war footing now. Before it happened we were proactive with security and had independent audits that had found tonnes of lax practices, but when the enemy came it presented a whole new vector of attack that had not been fully understood before. Experience is a bitch but it really is the only true teacher. Nowadays the network is very restrictive and a pain in the ass to work with, but I understand why it has to be this way. Management at my company are now super paranoid about the threat of lax security, and it is really only because of The Great CryptoLocker virus of 2015.

Recently I developed a web service, and the hell we had to go through to get it approved by the company and put into place added literally weeks to the project. As a programmer it leaves you with a lot of time where the code is essentially ready to deploy but you are waiting on infrastructure and management to sort these issues out. The company needs to become more agile about these decisions, but the pleasant side effect is that we use this time for more extensive testing. Rapid application development methodologies tend to release code that is not well tested, I believe, and you end up having to swat a lot of bugs after delivery, which is not ideal as any programmer will tell you. When my web service finally went live I had just spent two weeks swatting bugs that our UAT testers uncovered, but since it went live I have not had to look at it again.

And that is as satisfying as it gets for a programmer, at least in my opinion.


So have at it lads, I know there must be thousands of similar stories out there.
 
Just switched over to SAP at a billion-dollar-a-year manufacturing plant. One of the systems that supported getting off the old mainframe was a paperless shop-floor information system. Back then, the data was in SQL Server and presented via ASP.NET. Most of the data we needed we snagged from SAP when it passed data down to the automation system. But some data we needed straight from SAP. So we decided to have one of the consultants (from a 3-letter consultancy everyone has heard of) onsite use Winshuttle to query the data from SAP into SQL for us and have that update daily. He worked on it, and we had data, so good, right? Turned out the data was not correct. Upon investigation, he had pulled the data from the dev environment instead of prod. Asked him about it and the reply was, "but I could not get it to work in prod." So there you go, just use test or dev if prod won't work for a live production application.

At another company, corporate IT had what they called a data warehouse in SQL that was data replicated from RMS flat files from a business system on a VAX. It was just copies of tables, not star-schema, etc., so calling it a data warehouse was a stretch. Anyway, it was useful to write reports against as the mainframe was not easy to use. But we kept on having problems with the database not being available some mornings. Made some inquiries, and it turned out they did not know how to do a MERGE or otherwise append new, update changed, and delete old data, so they just truncated every one of the 100+ tables in the SQL database and then inserted all the data from the mainframe going back a few years, every single morning. Eventually at the plant we learned how to query the mainframe directly and we secretly copied data over every day to create tables of critical business data (open orders, customer addresses, product specifications, etc.) on our own server in case their system died--that was how much confidence we had in them.
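
For anyone who has not run into it, the incremental load they were missing looks roughly like this. This is only a sketch in Java/JDBC against SQL Server; the connection string, table names, columns, and the staging table holding the nightly VAX extract are all invented for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class IncrementalLoad {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string for the reporting SQL Server.
        String url = "jdbc:sqlserver://warehouse;databaseName=Reporting;integratedSecurity=true";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // SQL Server MERGE: insert new rows, update changed ones, and
            // delete rows no longer present in the latest extract - instead
            // of truncating and reloading the whole table every morning.
            stmt.executeUpdate(
                "MERGE dbo.OpenOrders AS target " +
                "USING staging.OpenOrders AS source " +
                "  ON target.OrderId = source.OrderId " +
                "WHEN MATCHED AND target.Qty <> source.Qty THEN " +
                "  UPDATE SET target.Qty = source.Qty " +
                "WHEN NOT MATCHED BY TARGET THEN " +
                "  INSERT (OrderId, Qty) VALUES (source.OrderId, source.Qty) " +
                "WHEN NOT MATCHED BY SOURCE THEN " +
                "  DELETE;"
            );
        }
    }
}

Run that per table off the nightly extract and the database stays online instead of sitting empty while a few years of history get re-inserted.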

At one company, had a system that used a database the IT group ran. That group really did not want people to know what they were up to--very protective of their turf. The IT manager used to be an electronics technician; he mainly got excited about pulling cat-5 cable. Anyway, IT set up the database, then I added the tables and such and built the application that used it. Worked fine, but had a complaint that it was slow around 1AM every morning. So stayed up one night and sure enough, it was slow. Pried into the situation, and it turns out their "DBA" did not know how to back up SQL databases, so he was making a disk image of the whole server every night at 1AM. So the log files kept on getting huge, and it's not as though restoring an image would have reliably restored the database anyway. They called in a consultant to tell them how to back up a database (I guess he wrote "BACKUP DATABASE XYZ TO DISK = 'ABC';" on a whiteboard, then sent them a bill), then the 1AM performance issue went away.
 

nathan

Sparrow
In my consulting days, I took over a project for a friend of a friend who runs a weather company. The previous dev wrote the bulk of the system and then moved on to a new job. When I took over, it was clear that the original dev was totally new to development. As a dev, I never want to talk crap about another dev because I know how it is, but this was legit the worst codebase I had ever seen. To give you an idea: instead of storing a week's worth of daily data in an array or any other sensible data structure, he created a separate variable for each day, each with one more letter tacked onto the name. So the rain value for 0 days ago would be named rain, the value for 1 day ago would be named rainn, and so on until the value for 7 days ago was named rainnnnnnnn. I also remember refactoring his 4000-line PHP file for the front end into 500 lines, mostly by just applying DRY principles, which he had apparently never heard of. There was also a session problem with his code, and rather than fixing it, he just set up a cron job that restarted the server every day, often in the middle of whatever processes happened to be running.
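
For the non-devs reading, the fix is about as basic as it gets. A throwaway sketch (his code was PHP; Java here just for illustration, and the names are made up):

public class RainDays {
    public static void main(String[] args) {
        // One array indexed by "days ago" replaces rain, rainn, ... rainnnnnnnn.
        double[] rain = new double[8];   // rain[0] = today, rain[7] = 7 days ago
        rain[3] = 12.5;                  // instead of a variable named "rainnnn"

        double total = 0.0;
        for (double dailyRain : rain) {
            total += dailyRain;          // loops, lookups, etc. all come for free
        }
        System.out.println("total over the window: " + total);
    }
}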

Point is, this stuff was horribly written. Now here's the kicker: my business partner and I took on this assignment in 2016. The owner told us his original dev got a job at... Boeing. To write their pilot software... To which my business partner said at the time, "well, looks like we shouldn't fly in any Boeing planes anymore". Now lo and behold, in case you didn't know, last year Boeing found itself in sort of an existential crisis due to faulty flight software crashing two of their newer planes. :hmm:
 

This is what happens when you skimp out on good codemonkeys and go to the lowest bidder
 
Was working at a rolling mill in the metals industry. There was a computer in the server room always spinning away, calculating some data points to set up the rolling mill so it could pre-adjust to get on gauge and shape faster when it got to particular coils of metal. It usually ran behind. The person who had built that whole system was no longer there. I figured, well, I'll calculate what it is worth to have the data all the time instead of only part of the time, to justify a project to do something about it. So I did a matched-pair study looking for coils that had run back to back off the same caster (i.e., as close to each other as possible) where one had the data and one did not. So how much worse was the scrap on the coils without the data? Actually much better. Coils with the data had worse scrap than those without.

Looked at the source code, which was in Java because I guess that was the hot thing to learn back then. Nothing else at the plant was in Java, only that application. What was supposed to happen was that it would calculate 5 averages out of thousands of data points that were fed to it in text files. The way the calculation was coded was RESULT = (RESULT + NEW_DATA_POINT)/2. The comment next to that line was something like /* CAN NOT FIGURE OUT HOW TO WORK WITH LARGE NUMBERS, SO DOING IT THIS WAY */. So instead of averaging thousands of data points, the result was 50% the very last data point, 25% the one before it, and so on. The rolling mill was basically being fed random numbers, and his comments in the code said as much. The controls engineer in that area just plugged in long-term known averages and the scrap went down immediately. We ended up just unplugging the computer. I heard that guy went on to be a VP in the IT department of a credit card company. That is a credit card I do not use.
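
For the curious, here is a rough sketch (not the plant's actual code, just made-up numbers) of why that line gives an exponentially weighted value rather than a mean, and how a running mean sidesteps the "large numbers" problem he was worried about:

public class RunningMean {
    public static void main(String[] args) {
        // Made-up data: four ordinary points and one outlier at the end.
        double[] points = {10.0, 10.0, 10.0, 10.0, 1000.0};

        // The buggy version: RESULT = (RESULT + NEW_DATA_POINT) / 2.
        // The newest point gets weight 1/2, the one before it 1/4, and so on,
        // so the "average" is dominated by whatever arrived last.
        double buggy = 0.0;
        for (double x : points) {
            buggy = (buggy + x) / 2.0;
        }

        // An incremental mean: mean += (x - mean) / n.
        // Mathematically the true average, but no huge running sum is ever
        // formed, which was his stated reason for the hack.
        double mean = 0.0;
        int n = 0;
        for (double x : points) {
            n++;
            mean += (x - mean) / n;
        }

        // Prints: buggy = 504.69, true mean = 208.00
        System.out.printf("buggy = %.2f, true mean = %.2f%n", buggy, mean);
    }
}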
 

ArizonaGuy

Newbie
Took a cab to the front gate of an industrial park outside of Tokyo on a snowy Sunday, having just flown in to Japan from California the day before. A huge rebar bender wasn't making the correct angles, even accounting for temperature and spring-back of the steel. The program that controlled the hydraulic bending head was written in C and ran under DOS. About three hours into it I realized the original programmer had not used the sin() call from the math library but had done the Taylor expansion by hand - and got the sign of one term backwards. Since the long segments amplified the angle error, the Japanese construction guys could see it. Easy fix.
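
To give a feel for how one flipped sign turns into something a construction crew can see with the naked eye: the original was C under DOS, but here is an illustrative sketch in Java, with the flipped term, the bend angle, and the segment length all my own guesses.

public class BentAngle {
    // Truncated Taylor series with correct signs: sin(x) ~ x - x^3/3! + x^5/5! - x^7/7!
    static double sinTaylor(double x) {
        return x - Math.pow(x, 3) / 6.0 + Math.pow(x, 5) / 120.0 - Math.pow(x, 7) / 5040.0;
    }

    // Same series with the x^5 term's sign flipped - the kind of slip he made.
    static double sinTaylorFlipped(double x) {
        return x - Math.pow(x, 3) / 6.0 - Math.pow(x, 5) / 120.0 - Math.pow(x, 7) / 5040.0;
    }

    public static void main(String[] args) {
        double x = Math.toRadians(45.0);   // commanded bend angle
        double segmentMm = 6000.0;         // a long rebar segment

        // A rough proxy for the offset at the far end of the segment: the sine
        // error scaled by the segment length. A tiny per-term mistake becomes
        // a visible gap on long bars.
        double truncationOffset = segmentMm * Math.abs(Math.sin(x) - sinTaylor(x));
        double flippedOffset    = segmentMm * Math.abs(Math.sin(x) - sinTaylorFlipped(x));

        // Roughly 0.002 mm from truncation alone vs roughly 30 mm with the flipped sign.
        System.out.printf("truncation only: %.3f mm, flipped sign: %.1f mm%n",
                truncationOffset, flippedOffset);
    }
}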

A good programming moral here: don't reinvent the wheel when you can pay for the smarter wheel engineer who came before you.
 

ArizonaGuy

Newbie
Looked at the source code, which was in Java because I guess that was the hot thing to learn back then.
My one experience using Java was in '96 or '97 and I thought "What idiot decided not to include unsigned integers in a language? How am I supposed to construct UDP packets?" Little did I know back then this was just the beginning of the Promethean ideals around which Silicon Valley was based: we know better than you stupid dirt people; trust our goatees, pink hair, and guys calling themselves Ada. Rust is even worse with its implied exceptionalism. I'll just stick with C and ASM to play in the box of razor blades that I know oh so well.
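
For anyone who never had to fight it, the standard Java workaround is to carry unsigned fields in wider signed types and mask on the way back out. A little sketch (the field layout is made up):

import java.nio.ByteBuffer;

public class UnsignedFields {
    public static void main(String[] args) {
        int port = 53210;   // needs an unsigned 16-bit field; too big for a signed short
        int ttl  = 200;     // needs an unsigned 8-bit field; too big for a signed byte

        // Pack: the narrowing casts keep the low bits, sign be damned.
        ByteBuffer packet = ByteBuffer.allocate(3);
        packet.putShort((short) port);
        packet.put((byte) ttl);
        packet.flip();

        // Unpack: mask to undo the sign extension.
        int portAgain = packet.getShort() & 0xFFFF;
        int ttlAgain  = packet.get() & 0xFF;

        // Prints 53210 and 200; the buffer's backing array could then go into
        // a java.net.DatagramPacket as the UDP payload.
        System.out.println(portAgain + " " + ttlAgain);
    }
}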
 