Troubleshooting Exit Code Errors in Kubernetes

Kubernetes is an open-source platform that automates the deployment, scaling, and management of containerized applications. One of its advantages is the wide availability of support and tooling. Before proceeding, you should be familiar with a few key Kubernetes terms: cluster, node, container, pod, and deployment.

Kubernetes troubleshooting is the process of resolving issues in Kubernetes clusters, nodes, containers, or pods.

In this article, we will look at a few exit code errors and learn how to troubleshoot them. Exit codes reported for a container, which you can view with kubectl (the command-line tool for Kubernetes), fall into the following ranges:

  • Exit Code 0 – The container exited normally; hence, no troubleshooting is required.
  • Exit Codes 1-128 – The container terminated due to an internal error.
  • Exit Codes 129-255 – The container was stopped as a result of an operating system signal, such as SIGKILL or SIGINT.
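
The ranges above can be turned into a small lookup helper. The following is a minimal sketch, not an official Kubernetes API: the function name describe_exit_code and the KNOWN_CODES table are hypothetical, and the table simply repeats the codes discussed in this article. For codes above 128, it assumes the common convention that the exit code is 128 plus the signal number.

```python
import signal

# Hypothetical lookup table that mirrors the exit codes covered in this article.
KNOWN_CODES = {
    0: "exited normally",
    1: "application error",
    125: "container failed to run",
    126: "command invoke error",
    127: "file or directory cannot be found",
    128: "invalid exit code",
    134: "abnormal termination (SIGABRT)",
    137: "immediate termination (SIGKILL)",
}


def describe_exit_code(code: int) -> str:
    """Return a short description for a container exit code (0-255)."""
    if not 0 <= code <= 255:
        raise ValueError(f"exit code out of range: {code}")
    if code in KNOWN_CODES:
        return KNOWN_CODES[code]
    if code > 128:
        # Assumed convention: exit code = 128 + signal number.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "internal error"
```

For example, describe_exit_code(143) reports a termination by SIGTERM, because 143 - 128 = 15 is the signal number of SIGTERM.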

Exit Code 1: Application Error

Exit Code 1 tells us that the container was stopped due to an application error. Typical causes are a programming error in the code, such as dividing by zero, or an image specification that refers to files that do not exist.
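
To see how a programming error becomes Exit Code 1, here is a small sketch that runs Python snippets in a child process and returns their exit codes. The helper name run_snippet is hypothetical, and the example assumes the application is a Python program; other runtimes may map an unhandled error to a different code.

```python
import subprocess
import sys


def run_snippet(source: str) -> int:
    """Run a Python snippet in a child process and return its exit code."""
    completed = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # keep the child's traceback out of our output
        text=True,
    )
    return completed.returncode


# An unhandled ZeroDivisionError makes the interpreter exit with code 1.
crashing_code = run_snippet("print(1 / 0)")

# The same operation with error handling exits normally with code 0.
handled_code = run_snippet(
    "try:\n"
    "    print(1 / 0)\n"
    "except ZeroDivisionError:\n"
    "    print('cannot divide by zero')\n"
)
```

If this snippet were the main process of a container, the first variant is the one that would surface as Exit Code 1.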

Start troubleshooting by diagnosing and recognizing where the issue is.

  • Check the container logs to see whether any of the files listed in the image specification could not be found. If so, correct the image specification so that it points to the correct path and filename.
  • If the container logs show an application error instead, try debugging the library that caused the error.
  • An application that has been running for a long time can move into a broken state that can only be resolved by restarting it. To detect and resolve such situations, you can configure a Kubernetes liveness probe.
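
A liveness probe needs something in the application to check. The following is a minimal sketch of an HTTP health endpoint that such a probe could call, assuming an HTTP-based probe. The /healthz path, port 8080, the HealthHandler class, and the healthy flag are all illustrative assumptions, not names required by Kubernetes. The probe itself is configured separately in the pod specification.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical flag: the application sets it to False when it
    # detects that it has entered a broken state.
    healthy = True

    def do_GET(self):
        if self.path != "/healthz":
            status, body = 404, b"not found"
        elif HealthHandler.healthy:
            status, body = 200, b"ok"
        else:
            status, body = 503, b"unhealthy"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe requests out of the application logs


def start_health_server(port=8080):
    """Serve the health endpoint in a background thread."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

An HTTP liveness probe treats a response status of at least 200 and below 400 as success, so returning 503 from a broken application would lead Kubernetes to restart the container.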

Exit Code 125: Container Failed to Run

Exit Code 125 refers to an error in which the docker run command did not execute successfully. This might happen due to the use of an undefined flag in the command or incompatibility between the container engine and the host operating system or hardware.

To troubleshoot this, make sure the command uses the proper syntax and that the user running the container has sufficient permissions to create containers on the host. Next, try alternate commands. For example, in Docker, instead of using docker run, you can try docker start. You can also try running other containers on the host system with the same username or context to check whether the problem is specific to this container. Finally, you can resolve a compatibility issue by reinstalling the container engine.

Exit Code 126: Command Invoke Error

Exit Code 126 means that a command used in the container specification could not be invoked. This often happens because of a missing dependency or an error in a continuous integration script.

So, how can we troubleshoot it? First, try to find the command that could not be invoked by checking the container logs. Next, make sure you are using the correct syntax and that all dependencies are available. Finally, after correcting the container specification, retry running the container.

Exit Code 127: File or Directory Cannot Be Found

Exit Code 127 means that a command specified in the container specification refers to a file or directory that does not exist.

Troubleshooting Exit Code 127 follows the same steps as troubleshooting Exit Code 126. In addition, make sure that the filename and file path the command refers to are available within the container image.
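
The difference between Exit Codes 126 and 127 can be checked before the container is started. The sketch below assumes the shell convention that a command that exists but cannot be executed yields 126, while a command that cannot be found yields 127. The function name predict_invoke_exit_code is hypothetical, and the check only handles explicit file paths, not commands looked up on PATH.

```python
import os


def predict_invoke_exit_code(command_path):
    """Guess the exit code a shell would report for this command path.

    Returns 127 if the path does not exist, 126 if it exists but cannot
    be executed, and None if no invoke problem is expected.
    """
    if not os.path.exists(command_path):
        return 127  # file or directory cannot be found
    if os.path.isdir(command_path) or not os.access(command_path, os.X_OK):
        return 126  # found, but not something that can be invoked
    return None
```

Running a check like this inside the container image, for example against the entrypoint script, can show whether the fix is a missing file (127) or a missing execute permission (126).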

Exit Code 128: Invalid Exit Code

Exit Code 128 happens when the code triggers an exit command without giving a valid exit code.

So, how can we troubleshoot Exit Code 128? Check the container logs to find the library that caused the container to exit. Once you have identified the library, correct it so that it provides a valid exit code.
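
One way to make sure code always exits with a valid value is to validate the exit code before using it. This is a minimal sketch, assuming that valid exit codes are integers from 0 to 255, which is the range the operating system can report on POSIX systems. The helper name checked_exit_code is hypothetical.

```python
import sys


def checked_exit_code(code):
    """Return code unchanged if it is a valid exit code, else raise."""
    if isinstance(code, bool) or not isinstance(code, int):
        raise TypeError(f"exit code must be an integer, got {code!r}")
    if not 0 <= code <= 255:
        raise ValueError(f"exit code must be between 0 and 255, got {code}")
    return code


def exit_with(code):
    """Exit the process, refusing values that are not valid exit codes."""
    sys.exit(checked_exit_code(code))
```

Calling exit_with(3) ends the process with exit code 3, while exit_with(300) or exit_with("done") raises an error that points at the faulty call site instead of producing a misleading exit code.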

Exit Code 134: Abnormal Termination (SIGABRT)

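Exit Code 134 happens when the container has terminated abnormally with the SIGABRT signal (134 is 128 plus 6, the signal number of SIGABRT). SIGABRT is commonly raised in two ways: by the abort() function, which a program typically calls when it detects an internal error, and by a failed assert() macro, which is used for debugging.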

To troubleshoot Exit Code 134, check the container logs to find the library that triggered the SIGABRT signal, put that library in debug mode, and modify it to avoid the abort.
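
The relationship between SIGABRT and Exit Code 134 can be reproduced locally. The sketch below runs a child process that calls os.abort(), which raises SIGABRT, and converts the result to the 128-plus-signal-number form. The helper names are hypothetical, and the example assumes a POSIX system, where Python's subprocess module reports a child killed by signal N as the return code -N. Note that this applies to abort() and the C assert() macro; a failed Python assert statement raises an exception instead.

```python
import subprocess
import sys


def container_style_exit_code(returncode: int) -> int:
    """Convert a subprocess return code to the 128 + signal convention."""
    if returncode < 0:
        # Negative values mean the child was terminated by a signal.
        return 128 + (-returncode)
    return returncode


def run_and_report(source: str) -> int:
    """Run a Python snippet in a child process and report its exit code."""
    completed = subprocess.run([sys.executable, "-c", source], capture_output=True)
    return container_style_exit_code(completed.returncode)


# os.abort() raises SIGABRT (signal 6) in the child process.
abort_code = run_and_report("import os; os.abort()")
```

Here abort_code is 128 + 6 = 134, the same value reported for a container whose main process aborts.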

Exit Code 137: Immediate Termination (SIGKILL)

Exit Code 137 happens when a container is terminated immediately via the SIGKILL signal. This can happen when someone runs the docker kill command. It can also be triggered automatically by the host when the container runs out of memory.

So, how do we troubleshoot Exit Code 137? First, check whether the container received a SIGTERM signal (graceful termination) before receiving SIGKILL. Second, check whether the container is able to terminate gracefully in response to SIGTERM. Finally, if the container reported an OOMKilled error, troubleshoot memory issues on the host.
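
If the container does not shut down in response to SIGTERM, it is eventually stopped with SIGKILL. The following is a minimal sketch of graceful SIGTERM handling, assuming the application is a Python worker loop. The names shutdown_requested, on_sigterm, install_sigterm_handler, and run_worker are hypothetical.

```python
import signal
import threading

# Set once a SIGTERM has been received.
shutdown_requested = threading.Event()


def on_sigterm(signum, frame):
    """Signal handler: remember that a graceful shutdown was requested."""
    shutdown_requested.set()


def install_sigterm_handler():
    """Register the handler; call this once from the main thread."""
    signal.signal(signal.SIGTERM, on_sigterm)


def run_worker(do_work, poll_seconds=1.0):
    """Call do_work() repeatedly until shutdown is requested.

    Returns the number of completed iterations so the caller can log it.
    """
    iterations = 0
    while not shutdown_requested.is_set():
        do_work()
        iterations += 1
        # Sleep between iterations, but wake up early on shutdown.
        shutdown_requested.wait(poll_seconds)
    return iterations
```

An application would call install_sigterm_handler() at startup and then run_worker(...) with its own unit of work. Because the loop finishes the current iteration and then returns, the process can exit on its own before the grace period ends and SIGKILL is sent.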

Conclusion

In this article, we have looked at seven exit codes and how to troubleshoot them. Kubernetes is popular because of its flexibility: it makes scaling applications easier and development pipelines more successful.

Troubleshooting Kubernetes can be very complex. It’s important to first understand where and why the error is happening.

Always remember the three aspects of troubleshooting effectively: try to understand the problem, solve the problem, and prevent the problem from recurring.