RL Model Versioning#

OctaiPipe has introduced a framework for versioning Reinforcement Learning (RL) models, allowing users to manage different versions of a model effectively. This feature is particularly useful for tracking changes to model configurations, hyperparameters, and training environments over time.

Model versions follow a two-number major.minor scheme similar to semantic versioning, e.g. 1.1, 1.2, or 2.0.

Two models with the same major version, e.g. 1.1 and 1.2, are considered compatible, meaning they can be swapped at inference time and used for the same use case. A change in the major version, e.g. from 1.x to 2.0, indicates a breaking change, meaning the models are not compatible and cannot be used interchangeably.
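The rule boils down to comparing major version numbers. As a purely illustrative sketch (not part of the OctaiPipe API), the check could be written as:

def versions_compatible(version_a: str, version_b: str) -> bool:
    # Two model versions are compatible when their major numbers match.
    # Illustrative helper only; this is not OctaiPipe code.
    major_a = int(version_a.split(".")[0])
    major_b = int(version_b.split(".")[0])
    return major_a == major_b

print(versions_compatible("1.1", "1.2"))  # True: same major version, interchangeable
print(versions_compatible("1.2", "2.0"))  # False: breaking change, not interchangeable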

Models are considered the same when:#

  • The model name is the same

  • The policy is the same

  • The action space inputs are the same, i.e. the same actions can be taken

  • The reward function inputs are the same, i.e. the same reward parameters are used

Things that do not affect model versioning include:#

  • Using a different environment

  • Using a different observation space

  • Changing hyperparameters of the policy
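
Taken together, the two lists mean that a model's identity is determined by its name, its policy, its action space inputs, and its reward function inputs, while everything else may change freely. The sketch below illustrates this idea with a hypothetical helper that is not part of OctaiPipe:

def model_identity(name, policy_name, action_space_args, reward_func_args):
    # Build an identity key from the four fields that define a model.
    # The environment, observation space, and policy hyperparameters are
    # deliberately excluded, matching the rules listed above.
    # Hypothetical helper for illustration only.
    return (
        name,
        policy_name,
        tuple(sorted(action_space_args)),
        tuple(sorted(reward_func_args)),
    )

same_model = model_identity("test_model", "PPO", ["action1", "action2"], ["reward_param1"])
breaking = model_identity("test_model", "PPO", ["action1", "action2", "action3"], ["reward_param1"])
print(same_model == breaking)  # False: a changed action space means the models are no longer the same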

Setting up versioning#

RL model versioning requires the user to specify which parameters define the action space and the reward function inputs. This is done by adding a versioning section under model_params in the model configuration file, as shown below:

model_specs:
  type: FRL
  load_existing: false
  name: test_model
  model_params:
    policy:
      name: PPO
      params: {}
    env:
      path: ./path/to/env_file.py
      params:
        init_states_path: ./path/to/file.csv
    versioning:
      action_space_args: ["action1", "action2"]
      reward_func_args: ["reward_param1", "reward_param2"]

If the versioning section is not provided, OctaiPipe defaults to empty lists for both action_space_args and reward_func_args.
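
For illustration, this is roughly what that fallback amounts to if the configuration were read with PyYAML; the file name is hypothetical and the snippet is not OctaiPipe's own loading code:

import yaml

# Hypothetical file name; substitute the path to your own configuration file.
with open("model_config.yml") as f:
    config = yaml.safe_load(f)

versioning = config["model_specs"]["model_params"].get("versioning", {})
action_space_args = versioning.get("action_space_args", [])  # [] when not provided
reward_func_args = versioning.get("reward_func_args", [])    # [] when not provided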