Linux Admin & Troubleshooting Checklists
This page helps you troubleshoot Linux systems in a clear and structured way.
Start at the top if the issue is not obvious, or jump to a specific section if you already know where the problem might be.
Follow the checks in order and confirm each step before moving ahead.
General Linux Troubleshooting Checklist
Use this checklist when the system is slow, unstable, or behaving unexpectedly and the root cause is not yet clear. This is the starting point when symptoms are vague or multiple components seem affected.
Initial System Health
Check system uptime and recent reboots to understand whether the issue started after a restart or developed during a long-running session
Review CPU load and overall system usage to see if the system is under constant pressure or experiencing spikes
Verify memory and swap usage to identify memory exhaustion or excessive swapping
Check disk usage on all mounted filesystems to rule out storage-related failures
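The uptime check can be scripted. Below is a minimal Python sketch. It assumes a Linux host where `/proc/uptime` exists, with seconds since boot in the first field. The helper names `parse_uptime`, `format_duration`, and `boot_summary` are hypothetical.

```python
import time
from datetime import datetime

def parse_uptime(text):
    """Return seconds since boot from the contents of /proc/uptime."""
    # The file holds two numbers: total uptime and aggregate idle time, in seconds
    return float(text.split()[0])

def format_duration(seconds):
    """Render a duration in seconds as 'Xd Yh Zm'."""
    minutes = int(seconds) // 60
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days}d {hours}h {minutes}m"

def boot_summary(path="/proc/uptime"):
    """Describe how long the host has been up and when it last booted."""
    with open(path) as f:
        uptime = parse_uptime(f.read())
    booted_at = datetime.fromtimestamp(time.time() - uptime)
    return f"up {format_duration(uptime)}, last boot {booted_at:%Y-%m-%d %H:%M}"
```

An unexpectedly recent boot time suggests a crash or an unplanned restart. On systems with a persistent journal, `journalctl -b -1` shows the logs from the previous boot.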
Running Processes and Services
Identify processes consuming unusually high CPU or memory over time
Check the status of critical system and application services
Look for services that are frequently restarting or failing silently
Confirm that expected background jobs and schedulers are running
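To find processes with unusually high memory use, a minimal Python sketch can rank processes by resident memory (`VmRSS`) read from `/proc/<pid>/status`. It assumes a Linux `/proc` filesystem, and the function names are hypothetical. CPU consumers are not covered here, because measuring CPU share requires sampling `/proc/<pid>/stat` over an interval.

```python
import os

def parse_status(text):
    """Return (name, rss_kib) from /proc/<pid>/status text."""
    name, rss = "?", 0  # kernel threads have no VmRSS line, so default to 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss = int(line.split()[1])  # e.g. "VmRSS:    5120 kB"
    return name, rss

def top_memory_processes(limit=10, proc="/proc"):
    """Return up to `limit` (rss_kib, pid, name) tuples, largest first."""
    results = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                name, rss = parse_status(f.read())
        except OSError:
            continue  # process exited in the meantime, or access was denied
        results.append((rss, int(entry), name))
    return sorted(results, reverse=True)[:limit]
```

Running it several times a few minutes apart shows whether a process's memory keeps growing or only spikes once.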
Logs and Errors
Review system logs for recent errors, warnings, or failures
Look for repeated log messages that indicate persistent issues
Note timestamps and correlate them with reported problems
Pay attention to permission-, resource-, and timeout-related errors
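Sorting error lines into the permission, resource, and timeout categories can be sketched in Python with keyword patterns. The patterns in `CATEGORIES` are illustrative assumptions drawn from common error strings. Real applications word their messages differently, so expect to extend them. `categorize` is a hypothetical helper.

```python
import re
from collections import Counter

# Illustrative patterns only; real messages vary between programs
CATEGORIES = {
    "permission": re.compile(r"permission denied|operation not permitted|access denied", re.I),
    "resource": re.compile(
        r"no space left on device|out of memory|too many open files|cannot allocate memory", re.I
    ),
    "timeout": re.compile(r"timed out|timeout", re.I),
}

def categorize(lines):
    """Count log lines per category; a single line may match several categories."""
    counts = Counter()
    for line in lines:
        for name, pattern in CATEGORIES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

You could feed it any log text, for example the output of `journalctl -p warning --since today`.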
Network Basics
Confirm network interfaces are up and correctly configured
Verify basic connectivity to the local network and external hosts
Check DNS resolution to rule out name resolution issues
Ensure there are no obvious routing or gateway problems
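The DNS check needs only the Python standard library. `socket.getaddrinfo` goes through the system resolver, so it reflects what applications on the host see. This is a minimal sketch, and `resolve` is a hypothetical helper name.

```python
import socket

def resolve(host):
    """Return the sorted unique addresses a name resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each result's sockaddr tuple starts with the address, for IPv4 and IPv6 alike
    return sorted({info[4][0] for info in infos})
```

If a public hostname returns an empty list while connections to a known IP address succeed, the fault is more likely DNS than routing.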
Disk Space and Filesystem Troubleshooting
Use this section when a filesystem is full, applications fail due to storage issues, or disk usage grows unexpectedly over time.
Filesystem Usage
Check overall disk usage by mount point to identify affected filesystems
Confirm which filesystem is approaching capacity
Verify mount options (for example, whether a filesystem is mounted read-only) and confirm that the reported free space matches expectations
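The filesystem checks can be combined into one pass over `/proc/mounts`. This Python sketch assumes a Linux host. The `PSEUDO_FS` set, the 90% threshold, and all function names are illustrative assumptions rather than fixed rules.

```python
import shutil

# Pseudo filesystems that never hold user data; illustrative, not exhaustive
PSEUDO_FS = {"proc", "sysfs", "devpts", "cgroup", "cgroup2", "securityfs",
             "debugfs", "tracefs", "mqueue", "configfs", "fusectl", "bpf"}

def parse_mounts(text):
    """Yield (mount_point, fstype, options) from /proc/mounts text, skipping pseudo filesystems."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] in PSEUDO_FS:
            continue
        mount_point = fields[1].replace("\\040", " ")  # spaces in paths are escaped as \040
        yield mount_point, fields[2], fields[3].split(",")

def use_percent(used, free):
    """Usage roughly as df reports it: root-reserved blocks count as neither used nor free."""
    return used / (used + free) * 100 if used + free else 0.0

def check_filesystems(threshold=90.0, path="/proc/mounts"):
    """Return (mount_point, percent, flags) for mounts that are nearly full or read-only."""
    with open(path) as f:
        mounts = list(parse_mounts(f.read()))
    findings = []
    for mount_point, _fstype, options in mounts:
        try:
            usage = shutil.disk_usage(mount_point)
        except OSError:
            continue  # mount point vanished or is not accessible
        percent = use_percent(usage.used, usage.free)
        flags = []
        if percent >= threshold:
            flags.append("nearly full")
        if "ro" in options:
            flags.append("read-only")
        if flags:
            findings.append((mount_point, round(percent, 1), flags))
    return findings
```

Because reserved blocks are excluded, the percentage should closely track the `Use%` column of `df`.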
Finding Space Consumers
Locate directories consuming the most disk space
Identify large files that may have grown unexpectedly
Check application data directories for abnormal growth
Look for temporary or cache directories that are not being cleaned
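Locating the largest directories and files is essentially what `du` does. A minimal Python sketch of the same idea walks a tree once and credits each file's size to every parent directory. The function names are hypothetical.

```python
import os
from collections import defaultdict

def directory_sizes(root):
    """Map each directory under root to the apparent size (bytes) of all files beneath it."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):  # symlinked dirs are not followed
        size = 0
        for name in filenames:
            try:
                size += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file removed or unreadable
        # Credit these bytes to this directory and every ancestor up to root
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        for depth in range(len(parts) + 1):
            totals[os.path.join(root, *parts[:depth])] += size
    return dict(totals)

def largest(totals, n=10):
    """Return the n largest (path, bytes) pairs."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

def largest_files(root, n=10):
    """Return the n largest (bytes, path) files under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                found.append((os.lstat(path).st_size, path))
            except OSError:
                continue
    return sorted(found, reverse=True)[:n]
```

Sizes here are apparent sizes (`st_size`), so sparse files can look larger than `du` reports, and hard-linked files are counted once per link. Pointing it at one application data or cache directory at a time keeps the scan fast.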
Common Disk Issues
Log files growing continuously due to missing or broken rotation
Temporary files accumulating over time
Deleted files still held open by running processes
Inode exhaustion even when disk space appears available
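Two of these issues are hard to spot with a plain usage check. The Python sketch below assumes a Linux `/proc` filesystem and uses hypothetical function names. It reports inode usage with `os.statvfs` and lists file descriptors whose target the kernel marks with a ` (deleted)` suffix.

```python
import os

def inode_usage_percent(path="/"):
    """Percentage of inodes in use on the filesystem holding path, or None if not reported."""
    st = os.statvfs(path)
    if st.f_files == 0:
        return None  # some filesystems do not report a fixed inode count
    return (st.f_files - st.f_ffree) / st.f_files * 100

def deleted_open_files(proc="/proc"):
    """Yield (pid, fd_path, target) for open descriptors that point at deleted files."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or no permission to inspect it
        for fd in fds:
            fd_path = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(fd_path)
            except OSError:
                continue
            if target.endswith(" (deleted)"):
                yield int(pid), fd_path, target
```

Space held by a deleted file is released only when the owning process closes the descriptor, usually by restarting the service. Run the scan as root to see descriptors of processes owned by other users.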
Safety Notes
Avoid deleting files without understanding their purpose
Always confirm ownership and usage before removing data
Prefer stopping services before cleaning related files
Memory and Performance Checks
Use this checklist when the system is slow, unresponsive, or terminating processes unexpectedly.
Memory Checks
Review total memory usage and available free memory
Check swap usage and determine whether the system is actively swapping
Identify processes consuming excessive memory
Look for memory usage that grows steadily over time
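The memory checks can be sketched in Python, assuming Linux `/proc/meminfo` and `/proc/vmstat`. The function names are hypothetical. Swap that is merely occupied is often harmless, so `swap_rate` samples the `pswpin` and `pswpout` counters to see whether pages are actively moving.

```python
import time

def parse_meminfo(text):
    """Return /proc/meminfo fields as integers (kB for most fields, counts for HugePages_*)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values

def memory_summary(info):
    """Percent of RAM still available and percent of swap in use."""
    # MemAvailable (kernel 3.14+) estimates memory usable without swapping
    available_pct = info["MemAvailable"] / info["MemTotal"] * 100
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    swap_used_pct = swap_used / swap_total * 100 if swap_total else 0.0
    return {"available_pct": available_pct, "swap_used_pct": swap_used_pct}

def swap_rate(interval=1.0, path="/proc/vmstat"):
    """Pages swapped in and out per second, sampled over `interval` seconds."""
    def read_counters():
        with open(path) as f:
            counters = dict(line.split() for line in f)  # lines look like "pswpin 1234"
        return int(counters["pswpin"]), int(counters["pswpout"])
    in_before, out_before = read_counters()
    time.sleep(interval)
    in_after, out_after = read_counters()
    return (in_after - in_before) / interval, (out_after - out_before) / interval
```

Sustained non-zero swap-in and swap-out rates indicate real memory pressure. Logging `memory_summary` at regular intervals reveals steady growth.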
CPU and Load
Compare the load average with the number of CPU cores; a load that stays above the core count means tasks are waiting
Identify processes causing sustained high CPU usage
Distinguish short spikes from constant overload
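The load comparison fits in a small Python function. This sketch treats a load of 1.0 per core as the threshold, which is an illustrative assumption. It uses the 1-minute and 15-minute averages to separate spikes from sustained overload. The function names are hypothetical.

```python
import os

def classify_load(load1, load15, cores):
    """Compare 1- and 15-minute load averages with the core count (1.0 per core is illustrative)."""
    recent, longer = load1 / cores, load15 / cores
    if recent > 1.0 and longer > 1.0:
        return "sustained overload"
    if recent > 1.0:
        return "short spike"
    if longer > 1.0:
        return "recovering from overload"
    return "normal"

def current_load_state():
    """Classify this host's load using os.getloadavg() (Unix only)."""
    load1, _load5, load15 = os.getloadavg()
    return classify_load(load1, load15, os.cpu_count() or 1)
```

On Linux the load average also counts tasks in uninterruptible sleep (typically waiting on disks or network filesystems). High load with idle CPUs therefore often points to I/O rather than computation.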
System Stability
Look for out-of-memory (OOM) events and kernel warnings
Check whether services are being killed or restarted automatically
Identify performance issues tied to specific workloads or schedules
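Out-of-memory kills leave a recognizable kernel message. This Python sketch assumes three message forms: `Out of memory: Killed process <pid> (<name>)` on newer kernels, `Kill process` on older ones, and the `Memory cgroup out of memory:` variant. The regex and `find_oom_kills` are illustrative, so verify them against your own kernel log.

```python
import re

# Case-insensitive so it also matches "Memory cgroup out of memory: Killed process ..."
OOM_RE = re.compile(r"out of memory: Kill(?:ed)? process (\d+) \(([^)]*)\)", re.IGNORECASE)

def find_oom_kills(lines):
    """Return (pid, process name) for each OOM-killer line in kernel log text."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

Feed it the kernel log, for instance the lines printed by `dmesg` or `journalctl -k`.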
Service and Application Troubleshooting
Use this section when an application fails to start, crashes repeatedly, or behaves inconsistently.
Service Status
Check whether the service is running as expected
Identify restart loops or repeated failures
Review service startup logs and exit codes
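On systemd hosts, `systemctl show` exposes the state needed for these checks. The Python sketch below assumes systemd is present. It also assumes the unit has the `ActiveState`, `SubState`, `Result`, `NRestarts` (systemd 235 and later), and `ExecMainStatus` properties. `looks_unhealthy` is a hypothetical heuristic, not a systemd feature.

```python
import subprocess

PROPERTIES = ["ActiveState", "SubState", "Result", "NRestarts", "ExecMainStatus"]

def parse_show(output):
    """Turn `systemctl show` KEY=VALUE lines into a dict of strings."""
    return dict(line.split("=", 1) for line in output.splitlines() if "=" in line)

def service_state(unit):
    """Query systemd for a unit's state; requires systemctl on the host."""
    cmd = ["systemctl", "show", unit] + [f"--property={p}" for p in PROPERTIES]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_show(out)

def looks_unhealthy(state):
    """Heuristic: not active, a non-success result, or any automatic restarts."""
    return (state.get("ActiveState") != "active"
            or state.get("Result", "success") != "success"
            or int(state.get("NRestarts", "0")) > 0)
```

When a unit looks unhealthy, `journalctl -u <unit>` shows its startup output, and `ExecMainStatus` holds the last exit code of the main process.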
Configuration and Dependencies
Verify configuration files for syntax or permission issues
Ensure all dependent services are running
Confirm required ports are open and listening
Check environment variables and file paths used by the service
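For the port and path checks, a Python sketch can read the kernel's socket tables directly. It assumes Linux `/proc/net/tcp` and `/proc/net/tcp6`, where state `0A` means LISTEN and ports are hexadecimal. The function names are hypothetical.

```python
import os

def listening_ports(text):
    """Return local TCP ports in LISTEN state from /proc/net/tcp or tcp6 text."""
    ports = set()
    for line in text.splitlines()[1:]:  # the first line is a column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == "0A":  # 0A is TCP_LISTEN
            ports.add(int(local_address.rsplit(":", 1)[1], 16))  # port is hex
    return ports

def missing_ports(required, tables=("/proc/net/tcp", "/proc/net/tcp6")):
    """Return the required ports that nothing is listening on."""
    found = set()
    for table in tables:
        try:
            with open(table) as f:
                found |= listening_ports(f.read())
        except FileNotFoundError:
            continue  # e.g. IPv6 disabled
    return sorted(set(required) - found)

def unreadable_paths(paths):
    """Return paths that do not exist or are not readable by the current user."""
    return [p for p in paths if not os.access(p, os.R_OK)]
```

`os.access` answers for the user running the script, so run it as the service's user to check that account's access. `/proc/net/tcp` only covers the current network namespace, so a service inside a container may not appear.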
Log Analysis Checklist
Logs often reveal the root cause when symptoms are unclear or inconsistent.
What to Review
System logs for errors and warnings
Application-specific logs related to the affected service
Authentication and permission-related logs
Startup and shutdown logs for recent changes
How to Read Logs
Focus on recent entries first to narrow the timeframe
Look for repeated messages or patterns
Correlate timestamps with user reports or monitoring alerts
Ignore noise and focus on errors that align with symptoms
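Repeated messages are easier to group once volatile details are masked. A minimal Python sketch replaces hexadecimal values and numbers with placeholders and counts the resulting patterns. The function names are hypothetical.

```python
import re
from collections import Counter

def normalize(message):
    """Mask volatile details (hex values, numbers) so similar messages group together."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    return re.sub(r"\d+", "<n>", message)

def top_patterns(messages, n=5):
    """Return the n most frequent normalized messages with their counts."""
    return Counter(normalize(m) for m in messages).most_common(n)
```

Messages that differ only in timestamps, IDs, or addresses collapse into a single pattern with a count.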
Network Connectivity Troubleshooting
Use this checklist when the system cannot reach the network, resolve DNS, or access services over specific ports.
Interface and IP
Verify network interfaces are up and correctly configured
Confirm IP address assignment and subnet settings
Check for duplicate or conflicting addresses
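Subnet and duplicate-address checks can be scripted with Python's standard `ipaddress` module. This sketch assumes you already have each interface's address in CIDR form, for example from `ip -brief address`. `address_conflicts` and `gateway_in_subnet` are hypothetical helpers.

```python
import ipaddress
from itertools import combinations

def address_conflicts(interfaces):
    """Given {name: "addr/prefix"}, report duplicate addresses and overlapping subnets."""
    parsed = {name: ipaddress.ip_interface(cidr) for name, cidr in interfaces.items()}
    problems = []
    for (a, ia), (b, ib) in combinations(parsed.items(), 2):
        if ia.ip == ib.ip:
            problems.append(f"duplicate address {ia.ip} on {a} and {b}")
        elif ia.version == ib.version and ia.network.overlaps(ib.network):
            problems.append(f"overlapping subnets {ia.network} ({a}) and {ib.network} ({b})")
    return problems

def gateway_in_subnet(cidr, gateway):
    """True if the gateway address lies inside the interface's subnet."""
    return ipaddress.ip_address(gateway) in ipaddress.ip_interface(cidr).network
```

This only detects conflicts among addresses configured on this host. Detecting another machine that uses the same address requires probing the network, for example with `arping -D` from iputils.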
Connectivity and Routing
Test connectivity to the local gateway
Test external connectivity to rule out routing issues
Review routing configuration for incorrect or missing routes
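The default gateway can be read from `/proc/net/route`. This Python sketch assumes Linux and covers IPv4 only. Addresses in that file are hexadecimal dumps in host byte order, which the code decodes in native order. `default_gateways` is a hypothetical name.

```python
import socket
import struct

def default_gateways(text):
    """Return (interface, gateway IP) pairs for default routes in /proc/net/route text."""
    routes = []
    for line in text.splitlines()[1:]:  # the first line is a column header
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, destination, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        if destination == "00000000" and mask == "00000000":
            # The hex value is the kernel's in-memory word, so packing it in
            # native byte order ("=I") restores the original address bytes
            ip = socket.inet_ntoa(struct.pack("=I", int(gateway, 16)))
            routes.append((iface, ip))
    return routes
```

With the gateway known, `ping -c 3 <gateway>` tests the local hop before you test external hosts.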
DNS and Firewall
Confirm DNS resolution works as expected
Check firewall rules for blocked traffic
Verify required ports are open and accessible
Ensure security rules match the intended network behavior
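A TCP connection attempt helps distinguish a missing listener from traffic that a firewall drops. This Python sketch assumes TCP services. `probe` is a hypothetical helper, and the mapping of outcomes to causes is a common rule of thumb rather than a guarantee.

```python
import socket

def probe(host, port, timeout=3.0):
    """Attempt a TCP connection and report the outcome as a short label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # host answered: nothing listening, or a rule rejected it
    except socket.timeout:
        return "timeout"  # no answer: often packets silently dropped by a firewall
    except OSError as exc:
        return f"error: {exc}"  # e.g. name resolution failure or no route to host
```

A `refused` result usually means the host is reachable but nothing accepts connections on that port, while `timeout` often means packets are dropped somewhere along the path.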
Why These Checklists Matter
Commands and tools change over time, but troubleshooting logic stays the same.
These checklists help you develop a repeatable way of thinking so you can diagnose problems methodically instead of guessing under pressure.
