Forecasting Solar Radiation using DataRobot to Optimize Power Generation

This article was written by Abdul Khader Jilani and originally appeared on the DataRobot Blog.


“I’d put my money on the sun and solar energy,” said Thomas Edison to Henry Ford and Harvey Firestone. Indeed, the race to renewable power generation is gathering pace, and solar power is one of the cleanest generation techniques in the renewable energy space.

As with any method of power generation, solar power should be consumed without waste; however, the availability of sunlight is limited, so supply must be optimized to meet demand. One way to determine whether supply can meet demand is to forecast in advance how much solar power can be generated.

How much solar power will be generated depends directly on the available solar radiation, or sunlight in layman's terms. Although sunlight sounds like a simple term, the available solar radiation is measured as irradiance, specifically Direct Normal Irradiance, Diffuse Horizontal Irradiance, and Global Horizontal Irradiance. Predicting these irradiance metrics across a day allows us to assess the amount of solar power that can be generated.

Direct Normal Irradiance (DNI) is the amount of solar radiation received per unit area by a surface that is always held perpendicular (or normal) to the rays that come in a straight line from the direction of the sun at its current position in the sky.

Diffuse Horizontal Irradiance (DHI) is the amount of radiation received per unit area by a surface (not subject to any shade or shadow) that does not arrive on a direct path from the sun, but has been scattered by molecules and particles in the atmosphere and comes equally from all directions.

Global Horizontal Irradiance (GHI) is the total amount of shortwave radiation received from above by a surface horizontal to the ground. This value is of particular interest to photovoltaic installations and includes both DNI and DHI.

Global Horizontal (GHI) = Direct Normal (DNI) × cos(θ) + Diffuse Horizontal (DHI), where θ is the solar zenith angle.


Measuring irradiance allows us to estimate the solar power reaching the surface. Conversion models for solar panels or power plants then let us estimate the amount of solar power generated by a given facility. One such traditional method is mentioned in this paper.

Now that we understand how the available solar energy is measured, we can use DataRobot to apply the latest machine learning techniques to forecast the solar energy available for producing solar-based electricity. Supervised machine learning models need historical data for training before they can make forecasts for the future. Solar radiation measurement initiatives have been available for a while, and with the advent of advanced sensors and satellite imagery they are improving at a fast pace. The National Solar Radiation Database (NSRDB) is one such comprehensive database; it monitors and stores temporal and spatial solar radiation information from many locations across the globe. This information is currently measured using geostationary satellites and was earlier measured using geo-sensors at airports.

National Solar Radiation Database (NSRDB)


The dataset for this exercise can be downloaded using the NSRDB Data Viewer.

NSRDB Data Viewer


From the dataset we can observe the targets and the input features. The targets are Clearsky GHI, Clearsky DNI, and Clearsky DHI, and their units are watts per square meter. Input features include Year, Month, Day, Hour, Minute, Cloud Type, Dew Point, Temperature, Pressure, Relative Humidity, Solar Zenith Angle, Precipitable Water, Wind Direction, Wind Speed, and Fill Flag. The data is available at half-hour intervals. We will add a new column, “Time”, which is not explicitly available in the dataset. This will allow us to leverage DataRobot’s Automated Time Series models.
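One way to build the “Time” column is sketched below with pandas. This is only an illustration: the column names follow the feature list above, the sample rows are made up, and the real NSRDB export may need its header rows handled before this step.

```python
import pandas as pd

# Two illustrative rows using the NSRDB column names listed above;
# the values are made up for the example.
df = pd.DataFrame(
    {
        "Year": [2019, 2019],
        "Month": [1, 1],
        "Day": [1, 1],
        "Hour": [0, 0],
        "Minute": [0, 30],
        "Clearsky DHI": [0, 0],
    }
)

# pandas can assemble timestamps from columns named year, month, day,
# hour, and minute, so lower-case the names before passing them in.
parts = df[["Year", "Month", "Day", "Hour", "Minute"]].rename(columns=str.lower)
df["Time"] = pd.to_datetime(parts)

# Sort chronologically so the series is ordered for time-aware modeling.
df = df.sort_values("Time").reset_index(drop=True)
print(df[["Time", "Clearsky DHI"]])
```

The resulting column holds one timestamp per half-hour row and can serve as the date/time feature when setting up a time-aware project.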

We can now start building models to forecast DHI and DNI, and then calculate GHI from the two forecasts. We will first build models to forecast DHI for the next 12 hours, as can be seen in the following project settings.

DataRobot Time Aware Modeling


DataRobot automatically determines the best backtest methodology for the dataset; however, we can customize it further.
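For readers unfamiliar with backtesting, the idea can be sketched in plain Python. This is a generic rolling-origin scheme, not DataRobot's own partitioning logic; the function name `rolling_backtests` and the fold sizes are hypothetical choices made for the illustration.

```python
def rolling_backtests(n_rows, n_backtests, validation_size):
    """Return (train_rows, validation_rows) index ranges, newest fold first.

    Each fold validates on a block of `validation_size` consecutive rows and
    trains only on rows that come before that block, so no future data leaks
    into training.
    """
    folds = []
    for k in range(n_backtests):
        val_end = n_rows - k * validation_size
        val_start = val_end - validation_size
        if val_start <= 0:
            raise ValueError("not enough rows for the requested backtests")
        folds.append((range(0, val_start), range(val_start, val_end)))
    return folds


# One week of half-hourly rows (48 per day), validating on one day per fold.
folds = rolling_backtests(n_rows=7 * 48, n_backtests=3, validation_size=48)
for train, val in folds:
    print(f"train rows 0-{train[-1]}, validate rows {val[0]}-{val[-1]}")
```

Each fold moves the validation block one day further back in time, which is what makes it possible to check whether a model is stable across different periods.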

DataRobot automatically determines the best backtest methodology


DataRobot starts modeling after we enable some additional settings, such as advanced ensembling and blueprints. Once the models are trained and DataRobot has recommended one, we can explore model performance.

DataRobot’s Automated Time Series feature automatically generates time-aware features from this dataset. In one iteration, DataRobot generated 247 time-aware features from the 19 input features and determined that 44 features were enough for an accurate and fast model.
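To give a feel for what a time-aware feature is, here is a minimal pandas sketch of two common kinds, a lag and a rolling mean. It is an illustration only and does not reproduce the features DataRobot derives; the derived column names and the sample values are made up.

```python
import pandas as pd

# Hypothetical half-hourly series; the values are made up for illustration.
times = pd.date_range("2019-06-01 10:00", periods=6, freq="30min")
df = pd.DataFrame(
    {"Time": times, "Temperature": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]}
)

# Lag feature: the value observed one step (30 minutes) earlier.
df["Temperature_lag_1"] = df["Temperature"].shift(1)

# Rolling feature: mean of the three previous observations. Shifting first
# keeps the current row's value out of its own feature.
df["Temperature_roll_mean_3"] = (
    df["Temperature"].shift(1).rolling(window=3).mean()
)

print(df)
```

Repeating this kind of derivation over many inputs, lags, and window lengths is how a small set of raw columns can grow into a few hundred candidate features.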

The recommended model is quite stable across backtests and holdout.


Looking at the feature impact, we can understand which factors determine the amount of solar radiation available.

DataRobot Feature Impact


Given that all this is possible with just a few clicks, experimenting with different ideas is a breeze. We tried a few experiments to evaluate whether they improve model performance: increasing the feature derivation window, modeling with only recent data versus all available data, and comparing leap years with non-leap years.


Now that we are able to forecast DHI, we can repeat the above steps, either through the platform interface or through the Python API, to model DNI. GHI can then be calculated from the predicted DHI and DNI using this formula:

Global Horizontal (GHI) = Direct Normal (DNI) × cos(θ) + Diffuse Horizontal (DHI)
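The formula can be sketched as a small Python function. It assumes θ is the solar zenith angle given in degrees, as in the dataset's Solar Zenith Angle feature; the function name and the input values are made up for illustration, and clamping the cosine at zero for a sun below the horizon is an added assumption.

```python
import math


def ghi_from_components(dni, dhi, zenith_deg):
    """Combine DNI and DHI (W/m^2) into GHI using the zenith angle in degrees."""
    cos_zenith = math.cos(math.radians(zenith_deg))
    # Assumption: when the sun is below the horizon (zenith > 90 degrees) the
    # direct beam contributes nothing, so the cosine is clamped at zero.
    return dni * max(cos_zenith, 0.0) + dhi


# Made-up values: DNI 800 W/m^2, DHI 100 W/m^2, sun 60 degrees from vertical.
# cos(60 degrees) = 0.5, so GHI is 800 * 0.5 + 100.
print(round(ghi_from_components(800.0, 100.0, 60.0), 1))
```

Applying the function row by row to the predicted DNI and DHI, together with the zenith angle for each timestamp, gives a GHI forecast at the same half-hour resolution.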

Once GHI is forecast, we can use mathematical formulations to convert irradiance, which is in watts per square meter, into the energy produced by a solar plant in kilowatt-hours.
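As a rough illustration of such a conversion, the sketch below multiplies each forecast GHI value by a panel area and a conversion efficiency and sums the result over time. It is a deliberately simplified model, not the method referenced in this article: it ignores panel tilt, temperature effects, and inverter losses, and the function name, area, and efficiency are hypothetical.

```python
def energy_kwh(ghi_forecast, panel_area_m2, efficiency, interval_hours=0.5):
    """Estimate energy in kWh from a series of GHI values in W/m^2."""
    # Power per interval in watts = irradiance * area * conversion efficiency.
    watts = [ghi * panel_area_m2 * efficiency for ghi in ghi_forecast]
    # Energy = power * time; divide by 1000 to convert Wh to kWh.
    return sum(w * interval_hours for w in watts) / 1000.0


# Made-up half-hourly GHI forecast and hypothetical plant parameters.
forecast = [200.0, 400.0, 600.0, 400.0]
print(energy_kwh(forecast, panel_area_m2=100.0, efficiency=0.2))
```

With these made-up inputs the estimate comes to roughly 16 kWh over the two hours covered by the four half-hour values.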