
Recover CNPG App after Reboot

Apps with a PostgreSQL database that were updated to the new CNPG common sometimes don't survive a reboot of TrueNAS SCALE. The app then hangs on DEPLOYING and its pods are in state Completed or TaintToleration.

Best Effort Policy

This guide has been written with the best efforts of the staff and tested as thoroughly as possible. We are not responsible if it doesn't work for every scenario or user situation, or if you suffer data loss as a result. This guide has been tested with TrueNAS SCALE 22.12.4.2, Cobia beta, CNPG 1.20.2_2.0.3 and HomeAssistant 2023.10.3_20.0.12.

Symptoms

If you have rebooted and your apps are hanging on DEPLOYING, check whether there are pods in state Completed or TaintToleration and whether the app's main pod is in state Init, using the command k3s kubectl get all -n ix-<app-name>.

Examples:

k3s kubectl get all -n ix-home-assistant
NAME READY STATUS RESTARTS AGE
pod/home-assistant-cnpg-main-1 0/1 TaintToleration 0 12h
pod/home-assistant-cnpg-main-2 0/1 TaintToleration 0 12h
pod/home-assistant-85865456d5-tc8h4 0/1 TaintToleration 0 12h
pod/home-assistant-85865456d5-kl96x 0/1 Init:0/2 0 12h

k3s kubectl get all -n ix-home-assistant
NAME READY STATUS RESTARTS AGE
pod/home-assistant-cnpg-main-2 0/1 Completed 0 22m
pod/home-assistant-cnpg-main-rw-df9bcbccc-s8z2n 0/1 Completed 0 23m
pod/home-assistant-cnpg-main-rw-df9bcbccc-ptltn 0/1 Completed 0 23m
pod/home-assistant-cnpg-main-rw-df9bcbccc-jbbcj 1/1 Running 0 12m
pod/home-assistant-5867d984d9-gfznd 0/1 Completed 0 23m
pod/home-assistant-cnpg-main-1 0/1 Completed 0 23m
pod/home-assistant-cnpg-main-rw-df9bcbccc-q2w2d 1/1 Running 0 12m
pod/home-assistant-5867d984d9-vcp6x 0/1 Init:0/2 0 12m
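
To pick out the affected pods from a longer listing, the check above can be scripted. The following is a minimal sketch, not part of k3s or HeavyScript: find_stuck_pods is a hypothetical helper name, and it assumes the default column layout of kubectl get pods (NAME READY STATUS RESTARTS AGE) as shown in the examples.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints the
# names of pods whose STATUS column (third field) is Completed or TaintToleration.
# The header line and pods in other states (Running, Init:0/2, ...) are skipped.
find_stuck_pods() {
    awk '$3 == "Completed" || $3 == "TaintToleration" { print $1 }'
}

# Usage on the TrueNAS SCALE host (replace home-assistant with your app name):
#   k3s kubectl get pods -n ix-home-assistant | find_stuck_pods
```

The helper only reads and filters text; it does not change anything in the cluster.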

Logs from the cnpg-wait container in the main app pod show something like this:

Testing database on url:  home-assistant-cnpg-main-rw
home-assistant-cnpg-main-rw:5432 - no response
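
The log check can be sketched in shell as well. This is an illustration under assumptions: db_unreachable is a hypothetical helper name, the exact container name varies per app and is shown as a placeholder, and cnpg-wait is assumed to run as an init container (which the Init:0/2 status above suggests).

```shell
#!/bin/sh
# Hypothetical helper: reads container logs on stdin and succeeds (exit 0) if
# any line ends with "- no response", as in the sample log output above.
db_unreachable() {
    grep -q -- '- no response$'
}

# Usage on the TrueNAS SCALE host; <pod-name> and <cnpg-wait-container> are
# placeholders you need to fill in yourself.
#
# List the init container names of the main app pod:
#   k3s kubectl get pod <pod-name> -n ix-<app-name> \
#     -o jsonpath='{.spec.initContainers[*].name}'
#
# Then check the logs of the cnpg-wait container:
#   k3s kubectl logs <pod-name> -n ix-<app-name> -c <cnpg-wait-container> \
#     | db_unreachable && echo "database is not answering"
```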

Recovery Steps

To recover your app, you need to first stop it (do not click the Stop button!), delete the hanging pods and then restart the app.

  1. Stop the app, either by checking "Stop All" in the app settings or with HeavyScript:
heavyscript app --stop <app-name>
  2. Wait 2-3 minutes.
  3. Delete any pods that are still hanging:
k3s kubectl delete pods -n ix-<app-name> <pod-name>
e.g. k3s kubectl delete pods -n ix-home-assistant home-assistant-85865456d5-tc8h4
  4. Start the app, either by unchecking "Stop All" in the app settings or with HeavyScript:
heavyscript app --start <app-name>
  5. If you unchecked "Stop All", you might have to click the Start button in the GUI (Start is safe, Stop is NOT).
    A task might also get stuck in TrueNAS under Jobs (top right). You can clear stuck jobs by restarting the TrueNAS GUI with
systemctl restart middlewared
  6. Wait 2-3 minutes.
  7. Check that the app and all of its pods are running. In the third block of the output, no deployment.apps should show 0 under AVAILABLE.
Example (cnpg-main-rw is still scaled down to 0):
k3s kubectl get all -n ix-home-assistant
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/home-assistant-cnpg-main-rw 0/0 0 0 14h
deployment.apps/home-assistant 1/1 1 1 14h
  8. If a deployment shows 0 AVAILABLE, scale it up manually to 1 replica; for a cnpg-main-rw deployment you might want 2 replicas:
k3s kubectl scale deploy <deployment.apps-name> -n ix-<app-name> --replicas=1
e.g. k3s kubectl scale deploy home-assistant-cnpg-main-rw -n ix-home-assistant --replicas=2
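
The final check and scale-up step can be sketched as a dry run. This is a hedged illustration, not an official tool: print_scale_commands is a hypothetical helper name, and it assumes the default column layout of kubectl get deploy (NAME READY UP-TO-DATE AVAILABLE AGE). It only prints the scale commands so you can review them before running any of them.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get deploy` output on stdin and prints a
# `k3s kubectl scale` command for every deployment whose AVAILABLE column
# (fourth field) is 0. Nothing is executed; the commands are only printed.
#   $1 = namespace, e.g. ix-home-assistant
#   $2 = desired replica count (optional, defaults to 1)
print_scale_commands() {
    namespace=$1
    replicas=${2:-1}
    awk -v ns="$namespace" -v n="$replicas" '
        $1 != "NAME" && $4 == 0 {
            printf "k3s kubectl scale deploy %s -n %s --replicas=%s\n", $1, ns, n
        }'
}

# Usage on the TrueNAS SCALE host (replace home-assistant with your app name):
#   k3s kubectl get deploy -n ix-home-assistant | print_scale_commands ix-home-assistant 1
```

Printing instead of executing is a deliberate choice here: the replica count differs per deployment (1 for most, possibly 2 for cnpg-main-rw), so each command should be checked by hand.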

Safe Reboot

You have recovered your apps but need to reboot again? There is a safe and faster way by unsetting the app pool before rebooting:

  1. Apps --> Settings --> Unset Pool
  2. Wait for apps to scale down. No installed apps will be shown, but they are not deleted.
  3. Reboot
  4. Apps --> Settings --> Choose Pool --> select your app pool
  5. Wait for apps to scale up

Note

There is already an issue for this problem, and the CNPG code is currently being reworked to address it. Please do not create additional issues or +1 comments; the devs are aware of this.

Credit

Thanks to Zasx from the TrueCharts team for the steps used to create this guide.