TrueCharts News

New MetalLB chart and our own operator charts.

Introduction: Our own Operator Charts

Over the last few months, we’ve experimented with injecting so-called “operators” into the cluster directly when using our charts. Manifests for things like MetalLB, Cert-Manager and CNPG were always loaded. While this system guaranteed users were always running the latest operator versions, we’ve also encountered some downsides. Primarily:

  • Loading manifests from the web is a security issue
  • Loading manifests required a pre-install job with full-cluster permissions, which is also a security issue
  • Mistakes in the manifests directly affect all users, regardless of version
  • It requires creating namespaces outside of the ix-something style. While not a technical issue, it is something iX developers voiced annoyance with
  • It lacks any configurability for users who need customization
  • It prevents users from using these operators outside of the TrueCharts scope on non-SCALE systems

Fixing all of these issues has been quite the challenge. First off, we needed to figure out a way of preventing users from installing multiple instances of the same operator. But we also needed to ensure that users always have the correct operators installed for the charts they want to install.

By now we’ve designed industry-leading Helm logic that scans your cluster for references to installed operators and compares those to the required operators.
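The comparison described above can be sketched in a few lines of Python. This is an illustration only, not the actual Helm template logic: the function names and the operator names used as sample data are assumptions.

```python
def missing_operators(installed, required):
    """Return required operators that are not yet installed, sorted by name."""
    return sorted(set(required) - set(installed))


def may_install_operator(name, installed):
    """Allow installing an operator only if no instance of it exists yet."""
    return name not in set(installed)


# Hypothetical cluster state: only the metallb operator is present.
installed = ["metallb"]

# A chart that needs both metallb and cnpg cannot be installed yet.
print(missing_operators(installed, ["metallb", "cnpg"]))  # ['cnpg']

# A second metallb instance is refused, a first cnpg instance is allowed.
print(may_install_operator("metallb", installed))  # False
print(may_install_operator("cnpg", installed))     # True
```

Both checks are plain set operations, which is why they can run before anything is deployed.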

Besides this logic, we also need to write the Helm charts ourselves. This is a lot of work, as operators are notoriously complex to write Helm charts for. Luckily, we have enough experienced Kubernetes developers that we’re certain to pull this off!

First chart: MetalLB

As a first example of our new logic, we’re super happy to introduce our first self-built operator Helm chart: MetalLB. It will be completely self-contained within its own namespace, will not load dynamic manifests from the web and does not contain risky security practices.

Obviously this chart, in the operators train, has a naming conflict with the old metallb chart in the enterprise train, so the latter has been renamed to metallb-config, which requires a reinstall. We want to point out that only the new metallb-config chart is compatible with the new self-built metallb operator.

We are very happy to also announce that the metallb-config chart is fully compatible with both our old and new ways of installing and managing metallb. However, new installs using the old way of handling metallb (without the chart from the operators train) will be actively disabled from now on.

To use MetalLB on new installs, one needs to install both metallb and metallb-config, in that order.

Updating to the new MetalLB helm chart

We want to point out, though, that users should update to the new MetalLB Helm chart as soon as possible. To update a current install using MetalLB to the new system, the following procedure can be used:

  • remove the old metallb chart coming from the enterprise train
  • run this in a root shell: k3s kubectl delete --grace-period 30 --v=4 -k https://github.com/truecharts/manifests/delete
  • install the new metallb chart from the operators train
  • wait a few minutes
  • install or update metallb-config to the latest version
  • wait a few minutes
  • hit edit on metallb-config and save without changes if you were already on the latest version or it isn’t working yet
  • wait a few minutes

If you run into additional issues, please file a ticket with our dedicated support staff via the #support channel of our Discord as normal.

Traefik Changes

BLUF: Traefik (Stable) is Deprecated. Users need to add the Enterprise channel and install Traefik. See how to get started

The use of TrueNAS Scale Certificates is also deprecated and you must migrate to Clusterissuer. (note: Clusterissuer replaced Cert-Manager)

As some of you might’ve noticed, Traefik has been a bit outdated the last few weeks. The reason is that a multitude of potentially breaking to-dos were left, and we don’t want to bother users with continuous manual intervention on breaking changes. By now we’ve fixed the remaining issues and will soon release a breaking-change release for Traefik and a patch for all the charts.

In short, we’ve ensured that we only use our signature “tc-system” namespace for storing configuration and middlewares for Traefik. This ensures consistent behavior for users using ingressClasses and allowed us to cleanly fix the known bug where a port got appended to the TrueNAS SCALE “portal” button.

This also means that charts that do not get patches because they are not ported to the new common, most notably Nextcloud, will inherently not work anymore. Though users would’ve been ill-advised to use it at all currently… due to the big ongoing Nextcloud rework.

Unrelated new issue

We also got the request from iX-systems staff a while ago to limit our use of non-ix-prefixed namespaces on Kubernetes. While the rest of that effort requires a lot of work building our own operator Helm charts, these Traefik changes are the initial step to comply with those wishes. The “low hanging fruit”.

We’re working hard on building separate operator Helm charts, instead of handling operators in the background. This work leads to an unrelated temporary issue, which has been created on purpose: CNPG will currently only be installed on new systems if one of our “enterprise” charts is being installed. More news about this will be released later.

For any help, please file a ticket with our dedicated support staff via the #support channel of our Discord as normal.

Introducing: TrueCharts Stop-All

Previously we’ve warned users against using the stop button on TrueNAS SCALE. At the same time, we also understand that users expect platform uniformity between Helm and SCALE. That’s why we’re happy to announce our own solution to stop our charts: TrueCharts Stop-All!

About that stop button

First off, we would like to go into a bit more depth about the design issues of the TrueNAS SCALE “Stop” Button. We’ve hinted at it previously, but it’s always good to explain why we need to step in ourselves.

The idea with Kubernetes is that only one tool at a time should be managing the deployment of objects, often indicated by a managed-by label on said objects. With TrueNAS SCALE, the middleware triggers a management tool called “Helm” to deploy objects. So far so good, a GUI isn’t magically able to trigger other software, after all.

However, with the stop button, iX decided to also start editing some of those objects themselves. Specifically “Deployments” and “StatefulSets”, setting them to 0 “replicas”, meaning “run nothing”. That sounds completely fine. However, in these cases “Helm” is the actual management tool for those objects, so every time a Helm action is triggered, those modifications are instantly removed.

That’s where the problems start to become bigger and bigger, because helm actions are triggered more often than people realize. For example: A reboot also triggers helm, requiring the same “hacks” to put things “back to sleep” again.
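The conflict between the stop button and Helm can be modelled with a small Python simulation. It is a deliberately simplified sketch: real Helm computes a patch against the live object rather than replacing it wholesale, and the manifest values, function names and app name below are made up for illustration.

```python
import copy

# Desired state as rendered by the chart (hypothetical values).
chart_manifest = {"kind": "Deployment", "name": "example-app", "replicas": 1}


def helm_apply(manifest):
    """Simulate a Helm action: the live object is reset to what the chart renders."""
    return copy.deepcopy(manifest)


def stop_button(live_object):
    """Simulate the out-of-band edit: scale the live object to zero replicas."""
    stopped = copy.deepcopy(live_object)
    stopped["replicas"] = 0
    return stopped


live = helm_apply(chart_manifest)   # initial install
live = stop_button(live)            # user presses "Stop"
print(live["replicas"])             # 0

live = helm_apply(chart_manifest)   # any later Helm action, e.g. after a reboot
print(live["replicas"])             # 1, the stop has been undone
```

Because the chart itself never learns about the stop, every Helm action restores the replica count it renders.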

iX also decided not to include all default objects that are technically “running”, like DaemonSets, Jobs and CronJobs. This leads to issues with breaking jobs, or with DaemonSets and Jobs locking access to PVCs. From there it becomes more complicated, as Kubernetes does not only consist of those “default” objects. There are also “Custom Resources”, objects that are defined by other charts, and other management tools, like operators, are able to add objects as well.

When making such changes through Helm, it would be relatively easy for Chart/App developers to mitigate this. However, iX decided not to and does not even expose the “stop” button state to the Chart/App developers, leaving us completely without tools to mitigate these design flaws.

And that still leaves out how the stop button can get stuck in a near-endless state of limbo if not all running objects are stopped correctly… Putting the cherry on top.

Looking for a better way forward

With all that concluded, we decided to look into “what needs to be done” to get “stop” button functionality back for all our charts. It’s clear that the stop button, even with little fixes, isn’t going to be a future-proof design. It needs to be completely redesigned, including all its backend logic. Sadly enough, refactors of said scale (pun not even intended) are currently not a priority for iXsystems, so not something we can rely on for our users.

We concluded that the only way to do so reliably, is through Helm itself. We know which objects we have, how they need to be stopped and can do so reliably through Helm. Which means: Do it ourselves!

The solution: TrueCharts Stop-All

With the most recent updates, we’ve introduced a new option: TrueCharts Stop-All. This option will cause all your running objects to slowly shut themselves down or, in the case of our PostgreSQL backend (CNPG), “hibernate”.

It’s designed to support all default Kubernetes objects deployed using our common chart: DaemonSets, Deployments, StatefulSets, Jobs and CronJobs. On top of that, we can easily expand it in the future to cover any operator-based objects, like CNPG, that need to be shut down as well!
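To make the idea of stopping each kind of object more concrete, the Python sketch below maps each kind to a change that would stop it using standard Kubernetes fields. This mapping is an assumption for illustration and not necessarily what the common chart does internally; the function name stop_patch and the nodeSelector label are hypothetical.

```python
def stop_patch(kind):
    """Return a partial object (patch) that would stop a workload of this kind.

    Assumed mapping for illustration only, not taken from the common chart.
    """
    if kind in ("Deployment", "StatefulSet"):
        # Scaling to zero replicas removes all pods.
        return {"spec": {"replicas": 0}}
    if kind in ("Job", "CronJob"):
        # Both kinds have a spec.suspend field.
        return {"spec": {"suspend": True}}
    if kind == "DaemonSet":
        # DaemonSets have no replica count; a node selector that matches
        # no node (hypothetical label) keeps pods from being scheduled.
        selector = {"example.invalid/stopped": "true"}
        return {"spec": {"template": {"spec": {"nodeSelector": selector}}}}
    if kind == "Cluster":
        # CNPG clusters can be hibernated through an annotation.
        return {"metadata": {"annotations": {"cnpg.io/hibernation": "on"}}}
    raise ValueError(f"no stop strategy known for kind {kind!r}")


for kind in ("Deployment", "CronJob", "Cluster"):
    print(kind, stop_patch(kind))
```

The point of the sketch is that each kind needs its own stop strategy, which is why a single replica edit on Deployments and StatefulSets cannot cover everything.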

How To Use Stop-All

On SCALE

On SCALE this is a little checkbox when editing the App. Check it and it’s done.

NOTE: Do not forget to uncheck the “Stop-All” checkbox to start the App again.

Using Helm

On native Helm, the same functionality is also available. Simply set the following in your values.yaml file:

global:
  stopAll: true

Updates resumed, common-migration mostly done

We’re glad to finally announce the end of our code-freeze. A few days ago we re-enabled our automatic updates, and within a few weeks everything should balance out again automatically!

At the same time, we’ve not completely finished porting all stable-train charts to the new common: 65 are still missing. But we will clearly label those updates as breaking in the changelog when they come in. Most of those are charts that have more complications than anticipated, so they need a little quality time with our maintainers, which takes a while.

Known Issues

Now that we’re mostly done, we also need to report a few known issues with the new backend:

  1. DO NOT USE THE STOP BUTTON

The Stop button should not be used on any TrueNAS SCALE Apps that use PostgreSQL. Due to severe design mistakes by iXsystems, it will get into an endless loop and never finish. We’ve reported the issue to iXsystems and they are not interested in fixing this.

  2. PostgreSQL breaking on reboot

We’ve seen some edge cases where the new database backend breaks after a reboot, often after the Stop button was used, though we cannot trace the issue back to the use of the Stop button itself. These issues have been reported to the folks over at CNPG and we’ve also sent them an email to discuss whether we can fund them to fix these issues.

  3. hostNetworking changes

After much R&D, our staff have discovered quite a few nasty Kubernetes-level bugs with hostNetworking. As a result, we’ve decided to never enable it by default anymore on any of our charts/apps, as we cannot guarantee its stability. For some charts that often require this setting (like Tailscale), users will have to manually and explicitly enable it from now on.

The setting has also moved in the GUI.

  4. Deprecated certificate system and you

With most charts ported, we want to highlight the fact that the “TrueNAS SCALE (Deprecated)” certificate option should not be used anymore. We cannot guarantee its stability, nor can we do anything at all to help out. It will also be removed as an option in the future, though that will be months rather than weeks.

The future

With the charts slowly all being ported, we can start working on our long-term plans again. One of those plans is a renewed focus on native Helm Charts.

For May and June, we’re planning to go all-in on improving documentation for use of our charts as normal Helm charts. At the same time we’re going to work on ensuring all our SCALE specific tricks (of which only a few are left, luckily), will have automatic alternatives for normal kubernetes clusters.

To highlight this, we’ve asked Artifact Hub to highlight our Common-Library chart as an “official” TrueCharts Helm chart. All users of Helm should be able to use the power of this advanced common-library to build the Helm charts they please… Without even relying on TrueCharts to host their charts for them!

Check it out here and also check out the docs as always.

*Arr revert

While most of our migration to the new common worked out reasonably well, we’ve received many issues with regards to another change. Our change for the “Arr” Apps, like Radarr and Prowlarr, to their new PostgreSQL backend ended up terribly.

We did not correctly anticipate how hard that migration was going to be for our users and also encountered a number of bugs and design mistakes for those Apps. After long consideration and attempted bug-fixing, we’ve decided to revert the move to PostgreSQL for the “Arr” Apps, back to SQLite.

This also means that after the next change (which will be flagged as breaking because it moves the database back) you will also be able to neatly import your “Arr” App backups from old common again.

We’re very sorry for this revert and we completely understand that we should’ve done considerably more research before implementing this move to a different database backend. The revert should be made available shortly, within 24 hours.