Tips and Tricks for SUSE Troubleshooting
Hey everyone, my name is Colin Hamilton and I’m one of the Linux Support Engineers on the SUSE team. I troubleshoot SUSE every day, so I’ve got a few tips and tricks I highly recommend using in your everyday, SUSE-tech-filled life.
1. Have a test system for exploring
I have over 30 virtual machines with various configurations and releases of the SUSE Linux Enterprise OS. Each virtual machine also has snapshots so that I can quickly mirror issues someone might be having and then revert back afterward. This also makes it very easy to test new configurations to see how they might affect a server. I know that, at least for me, hands-on experience ends up being the most effective training.
You wanna know more about PAM and what security measures you can implement? Your test system can help with that.
You’ve always wanted to know what could be done if an entire lib directory was lost? Your test system can help with that.
You’ve always wanted to try out that cool new technology SUSE added to their repositories? Bam. Your test system can help with that.
Exploring is really the key here. A test system is only as good as the user makes it. These are the opportunities to set things up, break things down, and build them back up again. Then when a real issue occurs, or a real change is wanted on the production system, you’ll be more prepared for it.
2. Learn the process step by step
A great example of this is the boot process of Linux. If the system is giving you a GRUB prompt, but you don’t even know what GRUB is or where its CLI might be in the process of booting, things are going to get sticky. If the system says it can’t find the OS, if it’s hanging on a particular message on the screen, or if it’s kicked you into emergency mode, you’re going to have a much harder time troubleshooting the issue if you don’t know the process well.
Take the time to learn a process step by step so that when it breaks you can quickly determine where the problem is coming from. You may not know what the problem is exactly yet, but you will have ruled out 100 other possibilities right from the get-go. If you’re like me, and in your heart you’re as lazy as sin, don’t make yourself work harder than you have to. Time spent preparing early on makes for quick problem solving later so that we can return to our regular showing of #nerdlife. (What fictional character bobble-head should I get for my desk this time? Hmmmm.)
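To make the "rule out the other possibilities" idea concrete, here is a minimal sketch in Python. The stage list is a simplified model of a typical Linux boot, and the symptom names and their mapping are my own illustrative assumptions rather than an official SUSE reference. A real hang or emergency shell can come from more than one stage, so treat the result as the first place to look, not a diagnosis.

```python
# Simplified boot order on a typical Linux system (illustrative model only).
BOOT_STAGES = [
    "firmware (BIOS/UEFI)",
    "boot loader (GRUB)",
    "kernel and initrd",
    "systemd and services",
]

# Hypothetical symptom names mapped to the index of the stage to inspect first.
FIRST_SUSPECT = {
    "no operating system found": 0,
    "grub prompt": 1,
    "hangs on a boot message": 2,
    "emergency mode": 3,
}


def triage(symptom):
    """Return (stage to inspect first, earlier stages that evidently ran)."""
    index = FIRST_SUSPECT[symptom.lower()]
    return BOOT_STAGES[index], BOOT_STAGES[:index]


stage, completed = triage("GRUB prompt")
print("Start with:", stage)
print("Already ran:", ", ".join(completed) or "nothing yet")
```

Knowing the order is what does the work here: once you can place a symptom in the sequence, everything earlier in the list is probably not your problem.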
3. Read the documentation
I like to try and see the good in people… I really do. But I can’t tell you how many times I’ve had a problem come in that came from simply not reading the documentation. “Remember that big square that was in red and said IMPORTANT? You didn’t do that.” Now, we’ve all had that time where we accidentally glossed over part of the documentation, but in the end it all comes down to that smart laziness factor. Better troubleshooting makes for better laziness later.
How much time did that person spend trying to fix a self-inflicted problem? If they had just read the documentation, they would have made much more time for laziness later. I prefer to maximize my laziness. So do yourself a favor and read the documentation.
4. Use your Google-fu
Whenever I’m doing a tech interview I always make sure I have a browser window open and available to the interviewee. I let them know, “Googling is not looked down on here. In fact, Google-fu is an important skill we want you to have. Feel free to use it as needed.”
This doesn’t mean I’m not expecting them to have a strong foundation of knowledge in Linux already, but Linux is an immense platform with hundreds and hundreds of different services. It’s silly to think one person can know them all perfectly. Learn all that you can, constantly build on your knowledge, and supplement the rest with Google prowess.
In summary: Google is your friend. Don’t snub your friend. (Especially when this friend has literally fountains of knowledge that could help you.)
5. Be on the latest patches
This one is of particular value. Now, I know that in a real production system we can’t always go around patching all the time willy-nilly. However, if you’re running into what is starting to look like a bug, odds are the bug is already fixed in the patches you haven’t applied. Now is the time to have that handy-dandy test system I told you about ready to replicate the issue. You can patch and see if the issue is gone, and make sure all the services your production system uses still work normally.
If the test system, fully patched, is still having the problem, that is a good time to get us Support Engineers on it so that we can verify whether it is a bug, create a bug report, and give it the love and attention it deserves. Best of all, since you’re on the latest code, we won’t have to question whether the issue was already fixed in a patch. This means that if it is an unresolved bug, we’ll be able to get it taken care of that much faster.
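If it helps to see that workflow as a checklist, here is a rough sketch in Python. The function name, its three yes/no inputs, and the wording of each outcome are my own assumptions for illustration. It is not a SUSE tool, and it doesn't run any patching commands for you.

```python
def next_step(reproduced_on_test, fixed_by_patches, services_ok_after_patching):
    """Suggest what to do next after trying the issue on a test system.

    All three arguments are plain booleans that you fill in by hand
    after replicating the issue and patching the test system.
    """
    if not reproduced_on_test:
        # The test system does not mirror production closely enough yet.
        return "keep replicating: make the test system match production"
    if not fixed_by_patches:
        # Fully patched and still failing: possibly an unresolved bug.
        return "contact support: fully patched and still failing"
    if not services_ok_after_patching:
        # The patches help, but something production depends on broke.
        return "hold off: patches fix the issue but a service misbehaves"
    return "plan the patches for production"


print(next_step(True, False, True))
```

The order of the checks matters: there is no point judging the patches until the test system actually shows the same problem production does.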
I hope I’ve changed all of your lives forever and that this paradigm shift in your lives will mean less work for me. Now back to finding a new toy for my desk…