
Ranjib is a system administrator at Google. Prior to Google, Ranjib was a senior consultant with ThoughtWorks. He works on private cloud implementation strategies, cloud adoption, system automation, etc. He has worked on both application development and system administration for the past 6 years. Prior to ThoughtWorks, Ranjib worked with Persistent Systems. Ranjib has a bachelor's degree in life science and a master's in bioinformatics. Ranjib is a staunch FOSS supporter. Ranjib is a DZone MVB and is not an employee of DZone and has posted 13 posts at DZone. You can read more from them at their website.

Generic Linux System Debugging

12.18.2012

Following is a list of commands I use for day-to-day system debugging. These are very generic commands. They assume nothing about what the server is running, whether it's a database server, a web server, a backup server, etc. I also use almost the same set of tools to monitor most of our servers.

I am assuming we are on RHEL or a RHEL derivative, although the same or similar tools are available on Debian-based systems as well.

First, install some necessary packages (you should remove some of these once the troubleshooting session is over):

yum install htop traceroute screen telnet sysstat iptraf-ng

Check global server health

Check system resources:

Check CPU load:

w
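The load averages that w prints come from /proc/loadavg. A load average only means something relative to the number of CPUs. The minimal sketch below divides the 1-minute load by the CPU count. The function name load_per_cpu is hypothetical, and the sketch assumes nproc from coreutils is installed. Keep in mind that Linux also counts processes blocked in uninterruptible I/O wait in the load, so a high value does not always mean the CPUs are busy.

```shell
#!/bin/sh
# load_per_cpu: hypothetical helper reading the same load averages that `w` shows
load_per_cpu() {
  cpus=$(nproc)
  # The first three fields of /proc/loadavg are the 1-, 5- and 15-minute load averages
  read -r load1 load5 load15 rest < /proc/loadavg
  awk -v l1="$load1" -v l5="$load5" -v l15="$load15" -v c="$cpus" 'BEGIN {
    printf "load1=%s load5=%s load15=%s cpus=%d load1_per_cpu=%.2f\n", l1, l5, l15, c, l1 / c
  }'
}

load_per_cpu
```

A load1_per_cpu value that stays well above 1.0 is usually worth a closer look with vmstat and ps.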

Check memory:

free -m
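free -m reads its numbers from /proc/meminfo. The sketch below pulls the two numbers that matter most during triage straight from that file; mem_summary_mb is a hypothetical name. It prefers MemAvailable, the kernel's estimate of memory that can be handed to new workloads without swapping. On kernels older than 3.14, which lack that field, it falls back to MemFree.

```shell
#!/bin/sh
# mem_summary_mb: hypothetical helper reading the same source as `free -m`
mem_summary_mb() {
  awk '
    /^MemTotal:/     { total = $2 }   # values in /proc/meminfo are in kB
    /^MemFree:/      { free = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      if (avail == "") avail = free   # kernels before 3.14 have no MemAvailable
      printf "total=%d MiB available=%d MiB\n", total / 1024, avail / 1024
    }' /proc/meminfo
}

mem_summary_mb
```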

Count the total number of processes:

ps aux | wc -l
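ps aux | wc -l also counts the header line, so its result is one higher than the number of processes. With the procps version of ps, ps -e --no-headers | wc -l avoids this. The sketch below counts the numeric directories under /proc instead, one per process, so it works even where ps is missing; count_processes is a hypothetical name. Like the ps pipeline, its count may include its own short-lived ls and wc processes.

```shell
#!/bin/sh
# count_processes: hypothetical helper; every process has a numeric directory in /proc
count_processes() {
  ls -d /proc/[0-9]* | wc -l
}

count_processes
```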

Check free disk space:

df -h
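During triage it is often quicker to list only the filesystems that are nearly full. The sketch below uses df -P, which guarantees one line per filesystem, and prints the mount points at or above a threshold. The name check_disk_usage and its default threshold of 90% are hypothetical choices. The sketch assumes mount points contain no spaces.

```shell
#!/bin/sh
# check_disk_usage: hypothetical helper; prints filesystems at or above a usage threshold
check_disk_usage() {
  limit="${1:-90}"   # percent, hypothetical default
  # -P: POSIX output, one line per filesystem; column 5 is Use%, column 6 the mount point
  df -P 2>/dev/null | awk -v limit="$limit" 'NR > 1 {
    use = $5
    sub(/%/, "", use)               # "87%" -> "87"; pseudo filesystems show "-" and count as 0
    if (use + 0 >= limit) printf "%s is %s%% full\n", $6, use
  }'
}

check_disk_usage 90
```

A full inode table produces the same "No space left on device" errors as full blocks, so df -i is worth a look too.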

Look for changes in patterns: context switches, I/O rates and so on. You should know the output of vmstat very well.

vmstat 1 5 
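The cs column in vmstat is the number of context switches per second. Also remember that the first line of vmstat output reports averages since boot, while the following lines cover each interval. To show where the cs number comes from, the sketch below samples the cumulative ctxt counter in /proc/stat twice and divides the difference by the interval. The name ctxt_per_sec is hypothetical, and the interval must be a whole number of seconds.

```shell
#!/bin/sh
# ctxt_per_sec: hypothetical helper approximating vmstat's "cs" column
ctxt_per_sec() {
  interval="${1:-1}"   # whole seconds
  # The "ctxt" line in /proc/stat is the total number of context switches since boot
  before=$(awk '/^ctxt / { print $2 }' /proc/stat)
  sleep "$interval"
  after=$(awk '/^ctxt / { print $2 }' /proc/stat)
  echo $(( (after - before) / interval ))
}

ctxt_per_sec 1
```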

Check disk usage of a file/directory

du -sh  /*
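du -sh /* prints entries in directory order, and a plain sort cannot order its human-readable sizes correctly. The sketch below lists the biggest entries first. It uses KiB sizes so that sort -n works. The name largest_entries and its defaults are hypothetical.

```shell
#!/bin/sh
# largest_entries: hypothetical helper; the N biggest entries under a directory
largest_entries() {
  dir="${1:-/}"
  n="${2:-10}"
  # -s: one total per argument; -k: sizes in KiB so a numeric sort orders them correctly
  du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}
```

For example, largest_entries /var 5 shows the five largest entries under /var. With GNU coreutils, du -sh /* | sort -rh gives a similar result with human-readable sizes.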

To find out which ports are open as seen from outside the server, run this from another machine:

nmap -P0 <ip>

To check TCP network reachability (a traceroute that uses TCP SYN packets instead of UDP or ICMP):

tcptraceroute <ip>

To check bandwidth usage by host or by traffic type, use these tools:

iptraf-ng

tcpdump

Wondering what a file is? What type it is? Where it came from?

To check the type of a file execute this:

file <filename>
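file identifies a file mostly by its content rather than its name, in particular by well-known "magic" bytes at the start. Here is a tiny illustration of that idea for a single format. ELF executables and shared libraries on Linux begin with the bytes 0x7F 'E' 'L' 'F'. The helper is_elf is hypothetical and recognizes nothing else.

```shell
#!/bin/sh
# is_elf: hypothetical helper; succeeds if the file starts with the ELF magic bytes
is_elf() {
  # Dump the first 4 bytes as hex without offsets, then strip spaces and newlines
  magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "7f454c46" ]   # 0x7F 'E' 'L' 'F'
}

if is_elf /bin/sh; then
  echo "/bin/sh is an ELF file"
else
  echo "/bin/sh is not an ELF file"
fi
```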

To check which package has installed this file, execute this:

rpm -qf <path to a file>

To list every file that package installed, execute this:

rpm -ql <name of package>

In my experience, 99% of problems come down to a resource crunch (disk, memory, CPU, I/O, etc.) caused by one or more processes.

The following commands can help you nail down the culprit process.

To list all the programs listening on a TCP or UDP port, execute this:

netstat -tulpn
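On newer systems, ss -tulpn gives the same view as netstat -tulpn. On Linux, netstat reads its TCP socket list from /proc/net/tcp and /proc/net/tcp6. The sketch below reads those files directly and prints the listening TCP ports; list_listening_tcp_ports is a hypothetical name. In these files, local_address has the form HEXIP:HEXPORT, and a state (st column) of 0A means LISTEN. Unlike netstat -tulpn, the sketch shows neither UDP sockets nor the owning processes.

```shell
#!/bin/sh
# list_listening_tcp_ports: hypothetical helper; TCP only, no process information
list_listening_tcp_ports() {
  for f in /proc/net/tcp /proc/net/tcp6; do
    [ -r "$f" ] || continue
    # Skip the header; field 2 is local_address (HEXIP:HEXPORT), field 4 the state
    awk 'FNR > 1 && $4 == "0A" { n = split($2, a, ":"); print a[n] }' "$f"
  done | while read -r hexport; do
    echo $((0x$hexport))   # hex port -> decimal
  done | sort -n -u
}

list_listening_tcp_ports
```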

To nail down a process from its behavior:

To find the process bound to a port, execute any of these:

lsof -i :<port>

netstat -tulpn | grep <port>

fuser <port>/<protocol>
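On Linux, tools like lsof and fuser can map a port to a process in two steps. First they find the socket's inode number in /proc/net/tcp. Then they look for a process with a file descriptor under /proc/<pid>/fd that points at socket:[inode]. The sketch below performs that lookup for listening TCP sockets; pids_on_tcp_port is a hypothetical name. Run it as root to see processes owned by other users.

```shell
#!/bin/sh
# pids_on_tcp_port: hypothetical helper; listening TCP sockets only
pids_on_tcp_port() {
  # /proc/net/tcp stores ports as 4-digit uppercase hex
  hexport=$(printf '%04X' "$1")
  inodes=$(for f in /proc/net/tcp /proc/net/tcp6; do
    [ -r "$f" ] || continue
    # Field 4 is the state (0A = LISTEN), field 10 is the socket inode
    awk -v p="$hexport" 'FNR > 1 && $4 == "0A" { n = split($2, a, ":"); if (a[n] == p) print $10 }' "$f"
  done)
  [ -n "$inodes" ] || return 0
  for fd in /proc/[0-9]*/fd/*; do
    # File descriptors we cannot read (other users' processes without root) are skipped
    link=$(readlink "$fd" 2>/dev/null) || continue
    for ino in $inodes; do
      if [ "$link" = "socket:[$ino]" ]; then
        pid=${fd#/proc/}
        echo "${pid%%/*}"
      fi
    done
  done | sort -n -u
}
```

For example, pids_on_tcp_port 80 prints the PIDs of processes listening on TCP port 80, if any.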

To find the process that is using a file, execute this:

fuser <filename>
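fuser <filename> works in a similar spirit: it reports which processes are using the file. The rough sketch below compares each /proc/<pid>/fd link with the file's resolved path; find_pids_using_file is a hypothetical name. It only covers open file descriptors. fuser also reports other kinds of use, such as memory-mapped files, current directories and running executables.

```shell
#!/bin/sh
# find_pids_using_file: hypothetical helper; open file descriptors only
find_pids_using_file() {
  # Resolve symlinks so the path matches what the kernel shows in /proc/<pid>/fd
  target=$(readlink -f "$1") || return 1
  for fd in /proc/[0-9]*/fd/*; do
    link=$(readlink "$fd" 2>/dev/null) || continue
    if [ "$link" = "$target" ]; then
      pid=${fd#/proc/}
      echo "${pid%%/*}"
    fi
  done | sort -n -u
}
```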

Once you know which process is causing the crunch:

To list resource usage of an individual process, execute this:

ps -p <pid> -o comm,args,pcpu,pmem,rss
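The rss column from ps is the resident set size in KiB. The kernel also exposes this value as VmRSS in /proc/<pid>/status. Sampling it repeatedly is a cheap way to see whether a process is growing steadily. The helper name proc_rss_kb below is hypothetical.

```shell
#!/bin/sh
# proc_rss_kb: hypothetical helper; prints VmRSS (resident set size, kB) of a PID
proc_rss_kb() {
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

proc_rss_kb $$
```

For example, while sleep 5; do proc_rss_kb 1234; done prints the RSS of PID 1234 (a placeholder) every five seconds until interrupted.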

To get a syscall profile (call counts, errors and time spent per system call) for a program:

strace -c <executable file name>

To attach to a running process and trace its system calls:

strace -p <pid>

Published at DZone with permission of Ranjib Dey, author and DZone MVB. (source)
