Linux Admin & Troubleshooting Checklists

This page helps you troubleshoot Linux systems in a clear and structured way.
Start at the top if the issue is not obvious, or jump to a specific section if you already know where the problem might be.
Follow the checks in order and confirm each step before moving ahead.

General Linux Troubleshooting Checklist

Use this checklist when the system is slow, unstable, or behaving unexpectedly and the root cause is not yet clear. This is the starting point when symptoms are vague or multiple components seem affected.

Initial System Health

Check system uptime and recent reboots to understand whether the issue started after a restart or built up over a long-running session
Review CPU load and overall system usage to see if the system is under constant pressure or experiencing spikes
Verify memory and swap usage to identify memory exhaustion or excessive swapping
Check disk usage on all mounted filesystems to rule out storage-related failures
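
As a rough illustration of the uptime check above, the following Python sketch reads /proc/uptime and estimates the last boot time. It assumes a Linux system that exposes /proc; the helper names parse_uptime and format_uptime are made up for this example.

```python
from datetime import datetime, timedelta


def parse_uptime(text):
    """Return uptime in seconds from the contents of /proc/uptime."""
    # The first field is seconds since boot; the second is aggregate idle time.
    return float(text.split()[0])


def format_uptime(seconds):
    """Render a number of seconds as days, hours, and minutes."""
    minutes = int(seconds) // 60
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours}h {mins}m"


if __name__ == "__main__":
    try:
        with open("/proc/uptime") as f:
            up = parse_uptime(f.read())
    except OSError:
        print("/proc/uptime is not available on this system")
    else:
        boot = datetime.now() - timedelta(seconds=up)
        print("up", format_uptime(up), "| booted around", boot.strftime("%Y-%m-%d %H:%M"))
```

A boot time that lines up with the first reports of trouble suggests the problem arrived with the restart. A very long uptime points more toward gradual resource exhaustion.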

Running Processes and Services

Identify processes consuming unusually high CPU or memory over time
Check the status of critical system and application services
Look for services that are frequently restarting or failing silently
Confirm that expected background jobs and schedulers are running

Logs and Errors

Review system logs for recent errors, warnings, or failures
Look for repeated log messages that indicate persistent issues
Note timestamps and correlate them with reported problems
Pay attention to permission-, resource-, and timeout-related errors

Network Basics

Confirm network interfaces are up and correctly configured
Verify basic connectivity to the local network and external hosts
Check DNS resolution to rule out name resolution issues
Ensure there are no obvious routing or gateway problems

Disk Space and Filesystem Troubleshooting

Use this section when a filesystem is full, applications fail due to storage issues, or disk usage grows unexpectedly over time.

Filesystem Usage

Check overall disk usage by mount point to identify affected filesystems
Confirm which filesystem is approaching capacity
Verify mount options, such as an unexpected read-only mount, and check that reported free space matches expectations
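
One possible way to script these checks is sketched below in Python. It parses text in /proc/mounts format and uses shutil.disk_usage for each mount point. The 90 percent threshold and the function names are assumptions for illustration. The percentage is used/total, which can differ slightly from what df prints because df excludes blocks reserved for root.

```python
import shutil


def parse_mounts(text):
    """Return (mount_point, options) for device-backed entries in /proc/mounts format."""
    result = []
    for line in text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[0].startswith("/dev/"):
            result.append((fields[1], fields[3].split(",")))
    return result


def percent_used(total, used):
    """Used space as a percentage of total, guarding against empty filesystems."""
    return 0.0 if total == 0 else round(100.0 * used / total, 1)


def usage_report(mount_points, threshold=90.0):
    """Return (mount_point, percent_used, over_threshold) for each readable mount."""
    rows = []
    for mp in mount_points:
        try:
            u = shutil.disk_usage(mp)
        except OSError:
            continue  # mount point vanished or is not accessible
        pct = percent_used(u.total, u.used)
        rows.append((mp, pct, pct >= threshold))
    return rows


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            mounts = parse_mounts(f.read())
    except OSError:
        mounts = [("/", [])]
    for (mp, opts), (_, pct, full) in zip(mounts, usage_report([m for m, _ in mounts])):
        flags = ("read-only " if "ro" in opts else "") + ("NEAR FULL" if full else "")
        print(f"{mp:<30} {pct:>5}% {flags}")
```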

Finding Space Consumers

Locate directories consuming the most disk space
Identify large files that may have grown unexpectedly
Check application data directories for abnormal growth
Look for temporary or cache directories that are not being cleaned
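
The search for space consumers can be approximated with a short Python sketch like the one below. It ranks the immediate children of a directory by total size, in the spirit of a du summary. The function names are illustrative, and the sizes are apparent file sizes (st_size), so sparse files may look larger than the disk space they actually occupy.

```python
import os


def tree_size(path):
    """Sum the apparent size of all files under path, skipping unreadable entries."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file disappeared or permission denied
    return total


def largest_children(path, top=5):
    """Return up to `top` (size_in_bytes, name) pairs for entries directly under path."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    size = tree_size(entry.path)
                else:
                    size = entry.stat(follow_symlinks=False).st_size
            except OSError:
                continue
            sizes.append((size, entry.name))
    sizes.sort(reverse=True)
    return sizes[:top]
```

Calling largest_children on a mount point and then descending into the biggest entry narrows the search one level at a time without scanning everything by hand.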

Common Disk Issues

Log files growing continuously due to missing or broken rotation
Temporary files accumulating over time
Deleted files still held open by running processes
Inode exhaustion even when disk space appears available
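
The last two issues are easy to miss because ordinary disk usage listings do not show them. The Python sketch below is one way to check for both, assuming a Linux system with /proc mounted. It reads inode counts with os.statvfs and looks for file descriptors whose link target ends in " (deleted)". Function names are illustrative, and seeing other users' processes generally requires root.

```python
import os


def inode_usage(path):
    """Return percent of inodes in use on the filesystem holding path, or None."""
    st = os.statvfs(path)
    if st.f_files == 0:
        return None  # some filesystems do not report a fixed inode count
    used = st.f_files - st.f_ffree
    return round(100.0 * used / st.f_files, 1)


def is_deleted_target(link_target):
    """The kernel appends ' (deleted)' to the fd link of an unlinked file."""
    return link_target.endswith(" (deleted)")


def deleted_open_files(proc_root="/proc"):
    """Return (pid, target) pairs for open descriptors that point at deleted files."""
    found = []
    try:
        pids = [p for p in os.listdir(proc_root) if p.isdigit()]
    except OSError:
        return found  # no /proc on this system
    for pid in pids:
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if is_deleted_target(target):
                found.append((int(pid), target))
    return found


if __name__ == "__main__":
    print("inode usage on /:", inode_usage("/"))
    for pid, target in deleted_open_files():
        print(pid, target)
```

Space held by a deleted file is released only when the owning process closes it, which is why restarting the relevant service often frees the space.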

Safety Notes

Avoid deleting files without understanding their purpose
Always confirm ownership and usage before removing data
Prefer stopping services before cleaning related files

Memory and Performance Checks

Use this checklist when the system is slow, unresponsive, or terminating processes unexpectedly.

Memory Checks

Review total memory usage and available free memory
Check swap usage and determine whether the system is actively swapping
Identify processes consuming excessive memory
Look for memory usage that grows steadily over time
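
A minimal Python sketch of the first two checks follows, assuming the standard /proc/meminfo field names (MemTotal, MemAvailable, SwapTotal, SwapFree). The function names and the shape of the summary are choices made for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer value."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])  # most values are in kB
    return info


def memory_summary(info):
    """Return memory and swap usage as percentages."""
    total = info["MemTotal"]
    # MemAvailable estimates memory usable without swapping; fall back to MemFree.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "mem_used_pct": round(100.0 * (total - available) / total, 1),
        "swap_used_pct": round(100.0 * swap_used / swap_total, 1) if swap_total else 0.0,
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(memory_summary(parse_meminfo(f.read())))
    except OSError:
        print("/proc/meminfo is not available on this system")
```

High swap usage alone does not prove the system is swapping right now. Sampling the pswpin and pswpout counters in /proc/vmstat twice, a few seconds apart, shows whether pages are actively moving. Recording the summary at intervals is also a simple way to spot usage that climbs steadily.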

CPU and Load

Compare load average with the number of CPU cores
Identify processes causing sustained high CPU usage
Check for short spikes versus constant overload
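
The comparison between load average and core count can be sketched in Python as below. The threshold of 1.0 per core and the three labels are assumptions for illustration, not fixed rules. The sketch treats the 1-minute average as a view of recent spikes and the 15-minute average as a view of sustained pressure.

```python
import os


def load_per_core(load, cores):
    """Normalize a load average by the number of CPU cores."""
    return round(load / max(cores, 1), 2)


def classify_load(one_min, fifteen_min, cores):
    """Label the load as sustained overload, recent spike, or within capacity."""
    if load_per_core(fifteen_min, cores) >= 1.0:
        return "sustained overload"
    if load_per_core(one_min, cores) >= 1.0:
        return "recent spike"
    return "within capacity"


if __name__ == "__main__":
    try:
        one, _five, fifteen = os.getloadavg()
    except (OSError, AttributeError):
        print("load average is not available on this system")
    else:
        cores = os.cpu_count() or 1
        print(f"load {one:.2f}/{fifteen:.2f} on {cores} cores:",
              classify_load(one, fifteen, cores))
```

On Linux the load average also counts tasks in uninterruptible sleep, typically waiting on disk I/O, so a high value with idle CPUs usually points at storage rather than computation.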

System Stability

Look for out-of-memory (OOM) events and kernel warnings
Check whether services are being killed or restarted automatically
Identify performance issues tied to specific workloads or schedules
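
To illustrate the out-of-memory check, here is a small Python sketch that scans kernel log lines for OOM kills and counts the victims by process name. The exact wording of these messages varies between kernel versions, so the pattern below is an assumption that matches the common "Killed process PID (name)" form and may need adjusting.

```python
import re
from collections import Counter

# Matches e.g. "Out of memory: Killed process 4321 (java) ..." and the
# "Memory cgroup out of memory: Killed process ..." variant.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(lines):
    """Return (pid, process_name) for every OOM kill found in the given log lines."""
    victims = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims


def victims_by_name(lines):
    """Count OOM kills per process name, most frequent first."""
    return Counter(name for _pid, name in oom_victims(lines)).most_common()
```

The input can be any iterable of lines, for example the saved output of dmesg or journalctl -k. A process name that appears repeatedly is a good candidate for a memory limit review or a leak investigation.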

Service and Application Troubleshooting

Use this section when an application fails to start, crashes repeatedly, or behaves inconsistently.

Service Status

Check whether the service is running as expected
Identify restart loops or repeated failures
Review service startup logs and exit codes

Configuration and Dependencies

Verify configuration files for syntax or permission issues
Ensure all dependent services are running
Confirm required ports are open and listening
Check environment variables and file paths used by the service
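
Two of these checks lend themselves to a quick script. The Python sketch below tests whether a TCP port accepts connections and reports which required environment variables are missing or empty. The function names are hypothetical, the port check covers IPv4 TCP only, and the variable names in the usage note are placeholders.

```python
import os
import socket


def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure.
        return sock.connect_ex((host, port)) == 0


def missing_env(required, env=None):
    """Return the names in `required` that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

For example, port_is_listening("127.0.0.1", 5432) checks a local database port, and missing_env(["APP_CONFIG", "APP_DATA_DIR"]) lists the variables a service would not find. Keep in mind that a service started by an init system may see a different environment than an interactive shell.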

Log Analysis Checklist

Logs often reveal the root cause when symptoms are unclear or inconsistent.

What to Review

System logs for errors and warnings
Application-specific logs related to the affected service
Authentication- and permission-related logs
Startup and shutdown logs for recent changes

How to Read Logs

Focus on recent entries first to narrow the timeframe
Look for repeated messages or patterns
Correlate timestamps with user reports or monitoring alerts
Ignore noise and focus on errors that align with symptoms
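
Spotting repeated messages by eye is hard when each entry carries a different PID, address, or duration. One possible approach, sketched here in Python, is to mask the variable parts of each message and count what remains. The masking rules (hex values and digit runs) are a simplification chosen for this example, and real logs may need more.

```python
import re
from collections import Counter


def normalize(message):
    """Replace hex values and digit runs with placeholders so similar lines match."""
    msg = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # mask hex before plain digits
    msg = re.sub(r"\d+", "<n>", msg)
    return msg.strip()


def top_repeats(messages, limit=5, min_count=2):
    """Return the most frequent normalized messages that occur at least min_count times."""
    counts = Counter(normalize(m) for m in messages)
    return [(msg, n) for msg, n in counts.most_common(limit) if n >= min_count]
```

Feeding the recent portion of a log through top_repeats surfaces the patterns that dominate it, which can then be matched against the time of the reported problem.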

Network Connectivity Troubleshooting

Use this checklist when the system cannot reach the network, resolve DNS, or access services over specific ports.

Interface and IP

Verify network interfaces are up and correctly configured
Confirm IP address assignment and subnet settings
Check for duplicate or conflicting addresses

Connectivity and Routing

Test connectivity to the local gateway
Test external connectivity to rule out routing issues
Review routing configuration for incorrect or missing routes
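
As a sketch of the routing review, the Python code below extracts IPv4 default gateways from text in /proc/net/route format. It assumes a little-endian host, where the hex addresses in that file are byte-swapped, and checks the gateway flag bit (0x2). The function names are illustrative.

```python
import socket
import struct

RTF_GATEWAY = 0x2  # route flag: destination is reached through a gateway


def hex_to_ip(hex_le):
    """Convert a little-endian hex address from /proc/net/route to dotted form."""
    return socket.inet_ntoa(struct.pack("<I", int(hex_le, 16)))


def default_gateways(route_text):
    """Return (interface, gateway_ip) for each IPv4 default route."""
    gateways = []
    for line in route_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        flags = int(fields[3], 16)
        if dest == "00000000" and mask == "00000000" and flags & RTF_GATEWAY:
            gateways.append((iface, hex_to_ip(gateway)))
    return gateways


if __name__ == "__main__":
    try:
        with open("/proc/net/route") as f:
            found = default_gateways(f.read())
    except OSError:
        print("/proc/net/route is not available on this system")
    else:
        print(found or "no IPv4 default route found")
```

An empty result means there is no IPv4 default route, which would explain working local connectivity alongside failing external connectivity.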

DNS and Firewall

Confirm DNS resolution works as expected
Check firewall rules for blocked traffic
Verify required ports are open and accessible
Ensure security rules match the intended network behavior
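
When a remote service is unreachable, it helps to separate a name resolution failure from a blocked or closed port. The Python sketch below is one way to do that using only the standard socket module. The result labels are invented for this example, and the reading of each outcome is a rule of thumb rather than a certainty.

```python
import socket


def resolve(hostname):
    """Return the sorted list of addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        return []
    return sorted({info[4][0] for info in infos})


def check_endpoint(hostname, port, timeout=2.0):
    """Classify a TCP endpoint as dns-failure, reachable, refused, timed-out, or unreachable."""
    if not resolve(hostname):
        return "dns-failure"
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return "reachable"
    except ConnectionRefusedError:
        return "refused"      # host answered: nothing listening, or a reject rule
    except socket.timeout:
        return "timed-out"    # no answer: often a firewall silently dropping packets
    except OSError:
        return "unreachable"  # e.g. no route to host
```

A "refused" result usually means the host is up but the port is closed, while "timed-out" more often indicates filtering somewhere along the path.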

Why These Checklists Matter

Commands and tools change over time, but troubleshooting logic stays the same.
These checklists help you develop a repeatable way of thinking so you can diagnose problems methodically instead of guessing under pressure.
