Quickstart
If you just want to run every step, from training a model to using that model to obtain predictions, follow the steps in this section. You may be able to skip some of them if the data has already been processed.
Each of the Python files can use an input configuration .yml file, which contains all of the inputs to be used alongside descriptions of each input. This file, referred to here as input_config.yml, is passed when calling either pipeline (it is discussed further below).
Each Python file can also accept values passed through command-line arguments. All values except the configuration file are optional. The configuration file must therefore still be passed, but CLI arguments take priority over the values it contains, so a mix of command-line arguments and configuration file values can be used.
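The precedence rule above can be illustrated with a minimal sketch. It assumes the configuration file has already been loaded into a dictionary (for example with PyYAML's yaml.safe_load) and that CLI options default to None when they are not supplied. The helper merge_inputs and the options --epochs and --output_path are hypothetical names used only for illustration, not the project's actual options.

```python
import argparse


def merge_inputs(config_values: dict, cli_args: argparse.Namespace) -> dict:
    """Combine config-file values with CLI values, letting CLI values win.

    Hypothetical helper: assumes CLI options default to None when not supplied.
    """
    merged = dict(config_values)
    for key, value in vars(cli_args).items():
        if value is not None:  # only override when the option was actually passed
            merged[key] = value
    return merged


parser = argparse.ArgumentParser()
parser.add_argument("config_file")  # positional, as in "python train_model_pipeline.py input_config.yml"
parser.add_argument("--epochs", type=int, default=None)  # hypothetical option
parser.add_argument("--output_path", default=None)  # hypothetical option

args = parser.parse_args(["input_config.yml", "--epochs", "50"])
config = {"epochs": 10, "output_path": "output/"}  # stand-in for values loaded from the .yml file
merged = merge_inputs(config, args)
# merged == {"epochs": 50, "output_path": "output/", "config_file": "input_config.yml"}
```

Defaulting every option to None is what lets the merge tell "not passed" apart from a real value, so configuration values survive unless the user explicitly overrides them.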
Note
The scripts for the project are in the src/ folder of the repository. All CLI parameters can be viewed by calling a Python file with the --help argument (example: python train_model_pipeline.py --help).
Note
Every file can be run independently, but the recommended approach is to run one of the two pipelines described below.
Warning
Depending on the size of the input files, the amount of RAM needed can be very high. If RAM usage is too high, split the inputs into smaller files as needed. Future work could add functionality to further train an existing model.
Training a model
The first step is to generate a trained Machine Learning model, which is later used to obtain energy or costing predictions.
A trained model can be obtained by calling the train_model_pipeline.py file with the appropriate arguments.
Assuming that only the input_config.yml file is used and that it is located in the current directory, the command to run is:
python train_model_pipeline.py input_config.yml
This preprocesses all input files along with any corresponding weather files that need to be retrieved (outputting various intermediate preprocessed files, which can be removed afterwards), performs feature selection (outputting a .json file), and trains the model (outputting a trained .h5 model, two .pkl files, a .json file, and a .csv file). These steps are run in sequence for both energy and costing.
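The order of operations described above can be outlined as follows. This is only a sketch: the stage functions are placeholders that record what each stage outputs, and the assumption that each target runs through every stage before the next target starts (energy first) is illustrative rather than taken from the project's code.

```python
# Hypothetical outline of the stage order in train_model_pipeline.py.
# The stage functions are placeholders, not the project's real functions.


def preprocess(target: str) -> str:
    return f"{target}: preprocess input and weather files"


def select_features(target: str) -> str:
    return f"{target}: feature selection (.json)"


def train(target: str) -> str:
    return f"{target}: train model (.h5, two .pkl, .json, .csv)"


def run_training_pipeline(targets=("energy", "costing")) -> list:
    steps = []
    for target in targets:  # assumed: every stage runs for one target before the next target
        steps.append(preprocess(target))
        steps.append(select_features(target))
        steps.append(train(target))
    return steps


for step in run_training_pipeline():
    print(step)
```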
Getting model predictions
With a trained Machine Learning model in the form of a .h5 file, the model is used to obtain energy predictions for a batch of building files within a specified directory: the predicted daily energy use in megajoules per square meter (MJ/m²) and the predicted aggregated energy use in gigajoules per square meter (GJ/m²) over a specified timeframe. Costing predictions are then output. Depending on which of the available Docker images is used, the outputs are either the total energy/costing values or breakdowns of those totals.
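The relationship between the two energy outputs can be shown with a worked conversion. This sketch assumes the aggregated value is the plain sum of the daily predictions over the timeframe, converted with 1 GJ = 1000 MJ; the function aggregate_gj_per_m2 and the sample values are hypothetical.

```python
def aggregate_gj_per_m2(daily_mj_per_m2):
    """Sum daily MJ/m^2 predictions over a timeframe and convert to GJ/m^2 (1 GJ = 1000 MJ)."""
    return sum(daily_mj_per_m2) / 1000.0


# Hypothetical predictions: 2.5 MJ/m^2 per day for every day of a 365-day year
daily = [2.5] * 365
print(aggregate_gj_per_m2(daily))  # 912.5 MJ/m^2 in total, i.e. about 0.9125 GJ/m^2
```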
Predictions can be obtained from a model by calling the run_model.py file with the appropriate arguments.
Assuming that only the input_config.yml file is used and that it is located in the current directory, the command to run is:
python run_model.py input_config.yml
This preprocesses all input files and any corresponding weather files that need to be retrieved, then obtains predictions for all input data. It outputs two energy .csv files and one costing .csv file, which contain the predictions alongside an identifier for each building in the form building_file_name/building_index.
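When post-processing the output .csv files, the building_file_name/building_index identifiers can be split back into their two parts. The sketch below assumes the index is an integer and splits only on the last "/"; the helper names and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_building_id(identifier: str) -> Tuple[str, int]:
    """Split a 'building_file_name/building_index' identifier into its two parts."""
    # Split on the last "/" only, so the file name part stays intact.
    file_name, index = identifier.rsplit("/", 1)
    return file_name, int(index)  # assumes the index is an integer


def group_by_file(identifiers: List[str]) -> Dict[str, List[int]]:
    """Group building indices by the building file they came from."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for identifier in identifiers:
        file_name, index = split_building_id(identifier)
        groups[file_name].append(index)
    return dict(groups)


# Hypothetical identifiers as they might appear in an output .csv file
ids = ["batch_a.xlsx/0", "batch_a.xlsx/1", "batch_b.xlsx/0"]
print(group_by_file(ids))  # {'batch_a.xlsx': [0, 1], 'batch_b.xlsx': [0]}
```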
Input configuration file (input_config.yml)
When providing inputs to either of the pipelines above, command-line arguments, the input configuration file, or a combination of the two can be used.
The template for the input_config.yml file can be found here.
This configuration file contains the fields required for training and/or running. Each section is clearly labelled with comments describing what each input is used for and whether it is mandatory. Any mandatory input not provided in this file must be provided through the CLI.
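A check for missing mandatory inputs, applied after the configuration file and CLI values have been combined, might look like the following sketch. The function check_required and the field names in REQUIRED are hypothetical; the real mandatory fields are the ones marked in the input_config.yml template.

```python
def check_required(inputs: dict, required: list) -> None:
    """Raise an error listing every mandatory input that is still missing."""
    missing = [key for key in required if inputs.get(key) in (None, "")]
    if missing:
        raise ValueError(
            "Missing mandatory inputs (set them in input_config.yml or pass them on the CLI): "
            + ", ".join(missing)
        )


# Hypothetical field names; see the input_config.yml template for the real ones
REQUIRED = ["building_params_file", "output_path"]

# Passes silently because both mandatory inputs are present
check_required(
    {"building_params_file": "electricity_building_params.xlsx", "output_path": "output"},
    REQUIRED,
)
```

Reporting every missing field at once, rather than failing on the first one, saves repeated runs when several inputs are absent.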
Note that any input locations specified in this file or provided through the CLI are modified within the code. Specifically, the Docker container's input path is prepended to any input filename or directory where needed.
For example, if a file named electricity_building_params.xlsx is in the root folder that has been linked to the Docker container's input path (/home/btap_ml/input), only the value electricity_building_params.xlsx should be used in the input_config.yml file. When running outside of Docker, the input and output paths can be adjusted within the config.py file.
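The path handling above can be sketched as follows, assuming the rule is simply that relative paths receive the Docker input path as a prefix. The helper resolve_input_path is hypothetical, and the project's own logic may decide differently when a prefix is "needed".

```python
import posixpath

# Docker input path given in the documentation
DOCKER_INPUT_PATH = "/home/btap_ml/input"


def resolve_input_path(value: str, input_root: str = DOCKER_INPUT_PATH) -> str:
    """Prefix a relative input filename or directory with the container's input path.

    posixpath.join discards input_root when value is already absolute, so absolute
    paths (including ones already under input_root) are returned unchanged.
    """
    return posixpath.join(input_root, value)


print(resolve_input_path("electricity_building_params.xlsx"))
# /home/btap_ml/input/electricity_building_params.xlsx
print(resolve_input_path("input_config.yml"))
# /home/btap_ml/input/input_config.yml
```

The same rule explains why only input_config.yml needs to be passed on the CLI when the file sits at the root of the linked input folder.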
Similarly, when passing the input_config.yml file to the pipelines through the CLI, if the configuration file is at the root of the input folder linked to the Docker container, only input_config.yml needs to be passed as an argument. More information on the Docker image is available on DockerHub.