This issue occurs when the application triggers an OOM (out of memory) condition; the resulting kill signal causes the pod to terminate.
If a container is no longer running, use the following command to find the status of the container:
docker container ls -a
This article explains possible reasons for the following exit code:
"task: non-zero exit (137)"
With exit code 137, you might also notice a status of Shutdown or a failed message such as:
Failed 42 hours ago
The "task: non-zero exit (137)" message is effectively the result of a kill -9 (137 = 128 + 9). This can be due to a couple of possibilities (seen most often with Java applications):
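The 128 + 9 arithmetic can be verified from any shell: a process killed with SIGKILL (signal 9) reports exit status 137.

```shell
# Kill a background process with SIGKILL and read back its exit status.
sleep 30 &
pid=$!
kill -9 "$pid"
wait "$pid"
echo $?    # prints 137 (128 + 9)
```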
The container received a docker stop, and the application didn't gracefully handle SIGTERM (kill -15). Whenever a SIGTERM is issued, the Docker daemon waits 10 seconds and then issues a SIGKILL (kill -9) to guarantee the shutdown. To test whether your containerized application correctly handles SIGTERM, issue a docker stop against the container ID and check whether you get the "task: non-zero exit (137)" message. This is not something to test in a production environment, as you can expect at least a brief interruption of service; best practice is to test in a development or test Docker environment.
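As a minimal sketch (not from the original article, and with a hypothetical entrypoint script), a container's main process can trap SIGTERM so that docker stop ends in a clean exit rather than escalating to SIGKILL:

```shell
#!/bin/sh
# Hypothetical entrypoint sketch: exit cleanly on SIGTERM so that
# `docker stop` does not escalate to SIGKILL (exit code 137).
cleanup() {
    echo "SIGTERM received, shutting down gracefully"
    exit 0
}
trap cleanup TERM

# Background the child and `wait` on it so the trap fires immediately;
# a foreground `sleep` would delay signal delivery until it finished.
while true; do
    sleep 1 &
    wait $!
done
```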
The application hit an OOM (out of memory) condition. To validate whether this occurred, review the node's kernel logs. This requires knowing which node the failed container was running on; otherwise, check all nodes. Run something like the following on your node(s) to identify whether a container hit an OOM condition:
journalctl -k | grep -i -e memory -e oom
Another option would be to inspect the (failed) container:
docker inspect <container ID>
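A quicker check than reading the full inspect output is the container's OOMKilled state flag, which Docker sets when the kernel OOM killer terminated the container's process (the container ID placeholder is left for you to fill in):

```shell
# Prints "true" if the container was OOM-killed, "false" otherwise.
docker inspect --format '{{.State.OOMKilled}}' <container ID>
```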
Review the application's memory requirements and ensure that the container it's running in has sufficient memory. Conversely, set a limit on the container's memory to ensure that wherever it runs, it does not consume memory to the detriment of the node.
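For example (a hypothetical sketch; the image name and limit value are placeholders), a hard memory limit can be set when starting the container:

```shell
# Cap the container at 256 MiB of RAM; exceeding it triggers the
# kernel OOM killer for this container only, not the whole node.
docker run --memory=256m my-image
```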
If the application is Java-based, you may want to review the maximum memory configuration settings.
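For example (flag values and app.jar are illustrative placeholders), the JVM's maximum heap can be capped explicitly, or sized relative to the container's memory limit on JDK 10 and later:

```shell
# Explicit heap cap:
java -Xmx512m -jar app.jar
# Or let the JVM derive the heap size from the container's memory limit:
java -XX:MaxRAMPercentage=75.0 -jar app.jar
```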