
Don’t blame the OS for hardware problems

Hardware problems are often the real cause of problems that get blamed on the OS. It’s very easy to blame an OS, and it’s fashionable to blame Microsoft, particularly Vista, when something goes wrong. Often it’s not the OS at all but an application or driver; that’s a subject for another post. Today I’m going to talk about another cause that is often mistakenly blamed on a flaky OS: hardware errors.

Computers are very complex systems. A motherboard must be manufactured to very close tolerances, and a minuscule bit of solder or a bad trace can change the capacitance of a circuit enough to cause seemingly random errors. PSUs (power supply units) are another source of random hardware errors. In Windows, these errors show up as random BSODs and lockups.

Testing the hardware with software running on that same hardware rarely finds problems like these. To find them reliably you would need very expensive test equipment and the knowledge to interpret its results. Testing RAM runs into the same problem. I like memtest86+, but even if memtest86+ passes all its tests after several hours of running, that doesn’t mean the RAM is OK. Even a memtest86+ failure is only an indication that the RAM is bad; the actual problem may be the motherboard, CPU, or power supply.

Does this mean that running software that tests the hardware is useless? No, it doesn’t. I use several different software tests when diagnosing computer problems, and they can be very useful for narrowing down the problem. If a software test fails, you know you have a hardware problem and can be pretty sure which component is causing it. If a software test passes, there is a reasonable chance that there are no hardware problems related to that test, but you can’t be sure.

I was recently working on a computer that illustrates this. It was a new computer running XP Home SP3, only a couple of weeks old, and it was having intermittent problems with Internet connections. Programs would quit with the infamous “This program has experienced a serious error”. The event logs showed several seemingly unrelated errors. I tried changing the AV program, updating the BIOS, making sure all the latest drivers were installed, yada, yada, yada. Everything would be fine for a few days or even a week, then something different would happen. None of the errors were repeatable.

I ran memtest86+ for six hours with no errors. I ran several hard drive testing programs with no errors. I swapped out the PSU. At this point many people would have said that’s just the way Windows works, live with it. If Vista had been on the computer, I’m sure many would have blamed it. Instead, I replaced the RAM, even though it had been tested many times for many hours. The computer has been running trouble-free ever since, and the old RAM is now in a different computer, also running trouble-free.

Who knows what the exact cause of the problem was? My guess is that mass-producing things like motherboards and RAM to a price point means corners get cut. The moral of this story? Diagnosing computer problems is as much an art as a science.