Phoeβe, where AI meets Linux
Phoeβe (/ˈfiːbi/) wants to add basic artificial intelligence capabilities to the Linux OS.
System-level tuning is a complex activity that requires knowledge of several, if not all, of the layers that compose the system, and of how they interact with each other. Quite often, it also requires intimate knowledge of how those layers are implemented.
Failure is a common aspect of running systems, and learning how to deal with it is critical. This is less about a machine catching fire and more about an overloaded system, caused by misconfiguration, which could lead to starvation of the available resources.
In many circumstances, operators are used to dealing with telemetry, live charts, alerts, etc., which help them identify the offending machine(s) and (re)act to fix any potential issues.
However, one question comes to mind: wouldn’t it be awesome if the machine could auto-tune itself and provide a self-healing capability to the user? Well, if that is enough to trigger your interest then this is what Phoeβe aims to provide.
Phoeβe uses system telemetry as the input to its brain and produces settings which get applied to the running system. The decisions made by the brain are continuously reevaluated (subject to the grace_period setting) to eventually offer the best possible setup.
Phoeβe is designed with a plugin architecture in mind, providing an interface for new functionality to be added with ease.
The code allows for both training and inference. All the knobs which can modify the run-time behavior of the implementation are configurable via the settings.json file, where each parameter is explained in detail.
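As an illustration of how such a settings file might be consumed, here is a minimal Python sketch. It is not Phoeβe's own loader. The three key names come from this text, while the flat layout, the default values, and the load_settings helper are assumptions made for the example; the real settings.json may nest or name things differently.

```python
import json

# Hypothetical defaults (in seconds); the real settings.json may differ.
DEFAULTS = {
    "inference_loop_period": 1,
    "stats_collection_period": 1,
    "grace_period": 10,
}


def load_settings(text):
    """Parse a JSON document and fill in defaults for any missing key."""
    settings = dict(DEFAULTS)
    settings.update(json.loads(text))
    for key in DEFAULTS:
        value = settings[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value <= 0:
            raise ValueError(f"{key} must be a positive number, got {value!r}")
    return settings


# Only two of the three knobs are given; the third falls back to its default.
settings = load_settings('{"inference_loop_period": 2, "grace_period": 30}')
```

Validating the periods up front keeps a typo in the file from turning into a busy loop or a negative sleep later on.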
In the inference case, when a match is found, the identified kernel parameters are applied to the system.
The inference loop runs every N seconds, where N is configurable via inference_loop_period. A smaller value makes the system react more quickly to a change in conditions; a larger value makes it react more slowly.
The code has a dedicated stats collection thread which periodically collects system statistics and populates the structures used by the inference loop. The statistics are collected every N seconds, where N is configurable via stats_collection_period. Depending on the overall network demands, a larger stats_collection_period makes the system react more slowly to network events, and a smaller one makes it react more quickly.
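The relationship between the collection thread and the inference loop can be sketched in Python as follows. This is only an illustration under stated assumptions, not Phoeβe's implementation: StatsCollector, inference_loop, sample_fn and on_sample are hypothetical names, and a single "latest sample" slot stands in for whatever structures the real code populates.

```python
import threading
import time


class StatsCollector:
    """Samples a statistic every `period` seconds on a background thread."""

    def __init__(self, sample_fn, period):
        self._sample_fn = sample_fn      # hypothetical callable returning one sample
        self._period = period            # plays the role of stats_collection_period
        self._lock = threading.Lock()
        self._latest = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            value = self._sample_fn()
            with self._lock:
                self._latest = value
            # Event.wait returns early once stop() sets the event.
            self._stop.wait(self._period)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def latest(self):
        with self._lock:
            return self._latest


def inference_loop(collector, period, iterations, on_sample):
    """Reads the latest sample every `period` seconds (inference_loop_period)."""
    for _ in range(iterations):
        sample = collector.latest()
        if sample is not None:
            on_sample(sample)
        time.sleep(period)
```

Keeping the two periods independent means statistics can be refreshed often while decisions are made less frequently, or the other way round.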
If a high traffic rate is seen on the network and a matching entry is found, the code will not consider any lower values for a certain period of time, which is configurable via grace_period in the settings.json file.
This behavior has been implemented to avoid excessive reconfiguration of the system and to prevent sudden reconfiguration caused by network spikes.
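One way to picture the grace period is as a small gate in front of the code that applies settings. The sketch below is an assumption-laden illustration, not the project's logic: GraceGate and should_apply are hypothetical names, rates are plain numbers, and a match at the same rate is simply treated as "nothing to do" without restarting the timer.

```python
class GraceGate:
    """Blocks moves to a lower rate until grace_period seconds have elapsed."""

    def __init__(self, grace_period):
        self.grace_period = grace_period
        self.applied_rate = None   # rate of the entry currently applied
        self.applied_at = None     # timestamp of the last accepted change

    def should_apply(self, matched_rate, now):
        if self.applied_rate is None:
            accept = True                      # first match is always applied
        elif matched_rate > self.applied_rate:
            accept = True                      # higher traffic is acted on at once
        elif matched_rate < self.applied_rate:
            # Lower values are only considered after the grace period.
            accept = now - self.applied_at >= self.grace_period
        else:
            accept = False                     # same rate: assumption, no re-apply
        if accept:
            self.applied_rate = matched_rate
            self.applied_at = now
        return accept


gate = GraceGate(grace_period=10)
first = gate.should_apply(1000, now=0)    # accepted: nothing applied yet
spike_end = gate.should_apply(100, now=5) # rejected: still inside the grace period
```

Passing the timestamp in explicitly keeps the example deterministic; a long-running process would typically pass time.monotonic() instead.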
The code also supports a few approximation functions, which are also configurable via the settings.json file.
The approximation functions adjust the tolerance value, which is calculated at runtime, to let the user fine-tune the matching criteria. Depending on the approximation function, the matching criteria can be narrower or broader.
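To make the effect of an approximation function on matching concrete, here is a small Python sketch. Everything in it is hypothetical: the function names ("identity", "half", "double"), the idea of a base tolerance expressed in the same unit as the rate, and the find_match helper are assumptions for illustration and do not reflect the functions Phoeβe actually ships.

```python
# Hypothetical approximation functions: each maps the runtime-calculated
# tolerance to an adjusted tolerance.
APPROX_FUNCTIONS = {
    "identity": lambda tol: tol,        # keep the tolerance as calculated
    "half": lambda tol: tol / 2.0,      # narrower matching
    "double": lambda tol: tol * 2.0,    # broader matching
}


def find_match(observed_rate, trained_rates, base_tolerance, approx="identity"):
    """Return the trained rate closest to observed_rate within tolerance, or None."""
    tolerance = APPROX_FUNCTIONS[approx](base_tolerance)
    candidates = [r for r in trained_rates if abs(r - observed_rate) <= tolerance]
    if not candidates:
        return None
    return min(candidates, key=lambda r: abs(r - observed_rate))


trained = [1000, 5000, 10000]
exact = find_match(5300, trained, base_tolerance=400)             # within 400 of 5000
narrow = find_match(5300, trained, base_tolerance=400, approx="half")
broad = find_match(5700, trained, base_tolerance=400, approx="double")
```

With the same observed rate and base tolerance, swapping the approximation function alone decides whether an entry matches, which is the narrowing or broadening described above.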
The code is fully publicly available on GitHub at https://github.com/SUSE/phoebe/, where more information about setting up Phoeβe and about its code can be found.