
# Autoencoder for wind power prediction

*Renewables: Wind, Water, and Solar* **volume 4**, Article number: 6 (2017)

## Abstract

Successful integration of renewable energy sources like wind power into smart grids largely depends on accurate prediction of power from these intermittent sources. Production of wind power cannot be controlled because wind speed varies with weather conditions. Accurate prediction of wind power can assist a smart grid that intelligently decides on the usage of alternative power sources based on demand forecasts. Time series wind speed data are normally used for wind power prediction. In this paper, we investigate the usage of a set of secondary features obtained using deep learning for wind power prediction. Deep learning is a special form of neural network that is capable of capturing the structural properties of time series data in terms of a set of numeric features. More precisely, we have designed a two-stage autoencoder (a particular type of deep learning) and incorporated the structural features into a prediction framework. Using the structural features, we have achieved up to 12.63% better prediction accuracy than with traditionally used statistical features.

## Introduction

Renewable energy sources like wind are becoming an integral part of modern power systems. As reported in IRENA (2017), renewable energy accounts for around 22% of global power generation. This share is expected to double in the next 15 years, owing to the rapid growth of variable renewable energy from sources like wind and solar photovoltaic (IRENA 2017). Renewable energy offers several advantages such as easy availability, applicability, and environmental friendliness. The application of smart grids to renewable energy makes it even more promising. Smart grid engineering is the key to a beneficial use of widespread energy resources. This fusion of smart grid and renewable energy enables the efficient use of such sources.

Alongside these opportunities, integration of renewable energy like wind power into smart grids is not without challenges. The key issue is the intermittent nature of wind power. The wind speed varies, and so does the power produced by a wind-driven power station. Also, the produced energy needs to be consumed immediately unless it is stored at additional cost. It is thus highly beneficial to know in advance the amount of wind power that can be expected. It is also important from a demand management point of view. Fossil fuel supplies for power generation can be adjusted based on expected demand. That is, however, not possible for wind energy sources for the reasons explained above. Prediction/forecasting of wind power is thus a necessity for integrating wind energy into smart grids.

Wind power prediction methods are developed to deal with this problem and aim to predict generated power based on historical weather/wind data by utilising data mining methods (Wang et al. 2011, 2016; Colak et al. 2012; Soman et al. 2010; Zhao et al. 2016; Jiang et al. 2017). In general, historical wind data obtained from weather stations are used by data mining algorithms to make the predictions. Wind data over time is time series data. Traditional data mining approaches model predicted wind power as a function of raw wind data over a period of time. A wind power prediction method was previously attempted in Tasnim et al. (2014) by modelling predictions as a function of statistical features extracted from raw time series data. Promising results were reported when the ensemble feature-based prediction framework was adopted.

The trend of investigating new feature representations for day-ahead wind power prediction is continued in the research presented in this paper. In this work, feature representations are learnt using a particular kind of deep learning algorithm called stacked autoencoders (Ng et al. 2016; Bengio et al. 2007; Shin et al. 2014). Autoencoders generate a compressed low-dimensional structural representation of the time series (Bengio et al. 2007). A stacked autoencoder obtains structural representations (i.e. features) at multiple stages by repeated application of autoencoders on the compressed feature space. Supervised learning algorithms are then trained on the compressed feature space. State-of-the-art learning performance was achieved by stacked autoencoders on images (Vincent et al. 2010; Gehrig et al. 2013), speech (Gehrig et al. 2013), agricultural applications (Rahman et al. 2016), and other structured time series signals (Shin et al. 2011). This paper investigates whether the stacked autoencoder provides an effective representation for wind power prediction. In previous studies (Tasnim et al. 2014), an ensemble framework was considered for wind power prediction. For the sake of completeness, we also embedded the autoencoder features in the cluster-based ensemble framework of Rahman and Verma (2011) and Rahman et al. (2010) and investigated their effectiveness as part of that framework too.

To the best of our knowledge, incorporation of autoencoder features in a day-ahead wind power prediction framework is a novel idea, and we consider this the key contribution of this research. We have investigated the following research questions in this paper: (1) how effective are autoencoder features for wind power prediction, (2) how does the performance of autoencoder features compare with that of statistical features for wind power prediction, and (3) how much improvement do we achieve by embedding the autoencoder features in an ensemble framework. Experimental results reveal that we achieved up to 12.63% improvement by using autoencoder features over statistical features.

## Proposed prediction framework

The prediction framework normally has two components: a training module and a prediction module. During training, historical time series data are split into small time windows and prediction targets are set for each window. Feature vectors are computed from each time window. This produces a 2D (*two-dimensional*) matrix where each row represents a feature vector. The targets are presented in a column vector where the *i*th entry is the target for the *i*th row in the 2D matrix. A regression algorithm is trained on these matrices to produce a model that can reproduce the targets (with minimum error) given the input vectors from the 2D matrix. During prediction, the most recent available data are windowed and presented to the regression model to produce predictions for the future. In this paper, we have investigated autoencoder features and also their effectiveness as part of cluster-based ensemble learning algorithms. We present both in this section. For the sake of completeness, we present the statistical features as well.
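The windowing step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_windows`, the one-day stride between windows, and the toy series are assumptions made for the example.

```python
from typing import List, Tuple


def make_windows(
    series: List[float],
    targets: List[float],
    window: int = 30,
    horizon: int = 1,
) -> Tuple[List[List[float]], List[float]]:
    """Split a daily series into fixed-length windows with a look-ahead target.

    Row i holds series[i : i + window]; its target is the entry of `targets`
    that lies `horizon` days after the last day of that window.
    """
    if len(series) != len(targets):
        raise ValueError("series and targets must have the same length")
    rows, y = [], []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        end = start + window                 # one past the last day in the window
        rows.append(series[start:end])
        y.append(targets[end + horizon - 1])
    return rows, y


# Toy data (not real measurements): 40 days of "speed" and a made-up "power".
speeds = [float(v) for v in range(40)]
power = [10.0 * v for v in speeds]
X, y = make_windows(speeds, power, window=30, horizon=1)
```

Each row of `X` would then be passed through a feature extractor (statistical or autoencoder) before the regression model is trained on the resulting matrix and `y`.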

### Statistical features

We need to specify the structure of the input vector and target for training the regression models. We have used wind power as the target that needs to be predicted. Let \(\varvec{ws} = ({\text{ws}}_{0}\), \({\text{ws}}_{1}\), …, \({\text{ws}}_{n - 1} )\) be the vector representing the wind speed over \(n\) consecutive days. A set of \(m\) statistical features \(\varvec{s} = (s_{1}\), \(s_{2}\), …, \(s_{m})\) is computed from the wind speed vector \({\mathbf{ws}}\), and the vector \(\varvec{s}\) is used as the input vector for the regression algorithm. The features were computed from the time and frequency domain representations of the wind speed vector \({\mathbf{ws}}\). *Discrete Fourier transformation* (DFT) was applied on \({\mathbf{ws}}\) to obtain the frequency domain representation of the time series data. Let \({\text{ws}}_{t}\) be the *t*th element of the time series. The *j*th element of the frequency domain representation is obtained as:

\[f_{j} = \sum\limits_{t = 0}^{n - 1} {\text{ws}}_{t} \, e^{ - 2\pi \mathrm{i} jt/n} ,\quad j = 0, 1, \ldots , n - 1\]

where \(n\) is the length of the vector and \(\mathrm{i}\) is the imaginary unit. Here \(\varvec{ws}\) represents the wind speed at various points in time and \(\varvec{f}\) represents the signal strength at various frequencies. We have used the DC (*direct current*) component of the DFT (\(f_{0}\): component corresponding to 0 frequency) as a feature. A set of statistical features is then computed from the remaining high-frequency (> 0) spectrum of \(\varvec{f}\). The following statistical features are computed: mean, standard deviation, skewness, and kurtosis. We also used the minimum and maximum of the series \(\varvec{ws}\) and \(\varvec{f}\) as features. The standard deviation, minimum and maximum features were used to represent the intensity. A total of 13 statistical features were computed from \(\varvec{ws}\) and \(\varvec{f}\).
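As a rough sketch of this feature extraction, the snippet below computes the DFT in its standard form and derives 13 features. The exact composition of the 13 features is not itemised in the text, so the split used here is only one plausible reading: six time-domain statistics, the DC component, and six statistics of the non-DC spectrum. Using the magnitude of each DFT coefficient as the "signal strength" is likewise an assumption, as are all function names.

```python
import cmath
import math
from typing import Dict, List


def dft_magnitudes(ws: List[float]) -> List[float]:
    """Magnitude of each DFT coefficient f_j = sum_t ws_t * exp(-2*pi*i*j*t/n)."""
    n = len(ws)
    return [
        abs(sum(ws[t] * cmath.exp(-2j * math.pi * j * t / n) for t in range(n)))
        for j in range(n)
    ]


def moments(x: List[float]) -> Dict[str, float]:
    """Mean, population standard deviation, skewness, kurtosis, min and max."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    # Standardised third and fourth moments (kurtosis is not "excess" kurtosis);
    # both are defined as 0 here for a constant series.
    skew = sum((v - mean) ** 3 for v in x) / (n * std ** 3) if std > 0 else 0.0
    kurt = sum((v - mean) ** 4 for v in x) / (n * std ** 4) if std > 0 else 0.0
    return {"mean": mean, "std": std, "skew": skew, "kurt": kurt,
            "min": min(x), "max": max(x)}


def statistical_features(ws: List[float]) -> List[float]:
    """Assumed layout: 6 time-domain stats, DC component, 6 spectrum stats."""
    spectrum = dft_magnitudes(ws)
    time_stats = moments(ws)                # from the wind speed window
    freq_stats = moments(spectrum[1:])      # from the non-DC part of the spectrum
    return list(time_stats.values()) + [spectrum[0]] + list(freq_stats.values())


# Synthetic 30-day window (not real wind data).
window = [5.0 + math.sin(0.4 * d) for d in range(30)]
features = statistical_features(window)
```

With this layout the DC component sits at index 6 and equals the sum of the window, which is a quick sanity check on the transform.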

### Autoencoder features

An *autoencoder* (AE) is one form of deep learning algorithm (Ng et al. 2016; Bengio et al. 2007; Shin et al. 2014). AE can be considered as an unsupervised variant of a neural network with one hidden layer where the target vector is set to be equal to the input vector. AE thus tries to learn an identity function. However, by reducing the number of nodes (compared to input) in the hidden layer, interesting structural features can be learned (Bengio et al. 2007). Normally a backpropagation algorithm is applied to learn the weights in the network.

For the wind power prediction problem, the AE will try to learn a function \(I_{\theta ,b}\) such that \(I_{\theta ,b} \left( {\varvec{ws}} \right) \approx \varvec{ws}\), where \(\varvec{ws}\) is the wind speed vector, and \(\theta\) and \(b\) are the network and bias node weights, respectively. In other words, it tries to learn an approximate identity function such that the output \(\widehat{{\varvec{ws}}}\) is similar to \(\varvec{ws}\). A *stacked autoencoder* (SAE) (Bengio et al. 2007) is a neural network consisting of multiple layers of AE. In an SAE, the outputs of one stage become the input to the successive stage. The parameters of each stage of the SAE are learned independently in a greedy fashion. In the first stage of the SAE, \(\varvec{ws}\) is converted to a new feature vector \(\varvec{h}_{\text{ws}}^{1}\) that represents the output of the hidden units. \(\varvec{h}_{\text{ws}}^{1}\) is the first set of structural features. In the second stage of the SAE, \(\varvec{h}_{\text{ws}}^{1}\) is set as the input and target of the AE, and a new set of structural features \(\varvec{h}_{\text{ws}}^{2}\) is learned. The process is repeated to obtain structural features \(\varvec{h}_{\text{ws}}^{N}\) at SAE stage \(N\). The best number of layers is decided by trial and error.
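The greedy stage-by-stage training can be illustrated with a small NumPy sketch. It is not the MATLAB implementation used in the experiments. The linear output layer, plain batch gradient descent, the learning rate, the weight initialisation, the column standardisation and the synthetic series are all assumptions chosen to keep the example short. Only the layer sizes (25 and 13) and iteration counts (100 and 50) come from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, n_iter=100, lr=0.05, seed=0):
    """Fit a one-hidden-layer autoencoder by batch gradient descent.

    Hidden layer: sigmoid; output layer: linear (an assumption); loss: mean
    over samples of half the squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    W1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_in))
    b2 = np.zeros(n_in)
    losses = []
    for _ in range(n_iter):
        H = sigmoid(X @ W1 + b1)            # hidden activations (the features)
        X_hat = H @ W2 + b2                 # reconstruction of the input
        err = X_hat - X
        losses.append(float(np.mean(np.sum(err ** 2, axis=1)) / 2.0))
        # Backpropagate the reconstruction error through both layers.
        d_out = err / n_samples
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2, "losses": losses}


def encode(X, params):
    """Map inputs to the hidden-unit activations of a trained stage."""
    return sigmoid(X @ params["W1"] + params["b1"])


def train_stacked(X, hidden_sizes=(25, 13), n_iters=(100, 50)):
    """Greedy layer-wise training: each stage is fitted on the previous stage's features."""
    stages, features = [], X
    for n_hidden, n_iter in zip(hidden_sizes, n_iters):
        params = train_autoencoder(features, n_hidden, n_iter=n_iter)
        stages.append(params)
        features = encode(features, params)  # input (and target) of the next stage
    return stages


# Synthetic stand-in for wind-speed windows (not real data).
rng = np.random.default_rng(1)
series = 6.0 + 2.0 * np.sin(np.arange(400) / 7.0) + rng.normal(0.0, 0.3, 400)
X = np.stack([series[i:i + 30] for i in range(len(series) - 30)])
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardise each column

stages = train_stacked(X)
h1 = encode(X, stages[0])                    # stage 1 features, shape (370, 25)
h2 = encode(h1, stages[1])                   # stage 2 features, shape (370, 13)
```

Either `h1` or `h2` can then replace the raw window as the input matrix of the regression model.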

We have utilised an SAE to obtain the structural features from the different stages, which are then used as the features in the wind power prediction framework. The SAE feature-based wind power prediction framework is presented in Fig. 1.

### Cluster-based ensemble prediction

Previous studies (Tasnim et al. 2014) indicate that ensemble learning can improve prediction performance. In addition to the SAE features on their own, we thus investigated how the AE features perform in combination with ensemble learning. The data suggest the existence of natural clusters within wind data, and we thus investigated cluster-based ensemble learning in this regard. Cluster-based classification was investigated previously in Verma and Rahman (2012), and later adopted in Tasnim et al. (2017) as cluster-based regression. The training and test workflows of *cluster-based ensemble regression* (CBER) for wind power prediction are presented in Figs. 2 and 3, respectively. Training data (built on a feature representation of wind speed data) are clustered first, and regression models (mapping functions) are trained on each cluster separately. When predicting for a test/new sample, the appropriate (nearest) cluster is first found, and the regression model corresponding to that cluster maps the test sample to wind power. We have investigated the influence of incorporating SAE features into the CBER framework.
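The train-then-route workflow of CBER can be sketched in a few lines of NumPy. This is an illustrative reduction, assuming a plain Lloyd's k-means, ordinary least squares as the per-cluster regressor, and a global model as fallback for an empty cluster. The function names and the two-regime toy data are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns centroids and the label of each row."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):          # keep the old centroid if a cluster empties
                centroids[c] = X[labels == c].mean(axis=0)
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dist.argmin(axis=1)


def fit_linear(X, y):
    """Least-squares linear model with an intercept term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def fit_cber(X, y, k=3):
    """Cluster the training features, then fit one regression model per cluster."""
    centroids, labels = kmeans(X, k)
    global_model = fit_linear(X, y)          # fallback for an empty cluster
    models = [
        fit_linear(X[labels == c], y[labels == c]) if np.any(labels == c) else global_model
        for c in range(k)
    ]
    return centroids, models


def predict_cber(X, centroids, models):
    """Route each sample to its nearest centroid and apply that cluster's model."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.array([A[i] @ models[c] for i, c in enumerate(nearest)])


# Toy data with two regimes that follow different linear laws (not wind data).
rng = np.random.default_rng(0)
low = rng.normal(0.0, 0.5, size=(100, 2))
high = rng.normal(10.0, 0.5, size=(100, 2))
X = np.vstack([low, high])
y = np.concatenate([2.0 * low[:, 0] + 1.0, -3.0 * high[:, 0] + 50.0])

centroids, models = fit_cber(X, y, k=2)
pred = predict_cber(X, centroids, models)
max_abs_error = float(np.max(np.abs(pred - y)))
```

On such piecewise-linear data a single global linear model cannot fit both regimes, whereas the per-cluster models can, which is the intuition behind the framework.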

## Experimental setup

We have obtained historical wind speed data from the *Bureau of Meteorology* (BoM) at 70 different stations across Australia. A total of ten locations were selected from each state randomly, and daily wind speed data were collected for each location. The duration of the time period varies between stations. A time window of 30 days was used to extract wind speed records from historical data, and 13 statistical features (“Statistical features” section) were computed on each of these time windows. The target for the vector was set to be day-ahead wind power. We did not have any historical record of wind power production. However, a power curve associated with a turbine can provide a nonlinear transformation from wind speed to power. We have utilised the power curve of the Siemens SWT–2.3 82 turbine (Staffell 2017), as used in Tasnim et al. (2014), to produce the corresponding power a day ahead. The power curve for this turbine is presented in Fig. 4. We used 80% of the data for training and 20% for testing. The best learning models were obtained from the training data and applied on test data to compute prediction accuracy. The performance reported here is based on test set errors.
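The conversion from wind speed to power through a power curve can be sketched as a piecewise-linear lookup. The curve points below are invented for illustration and are not the Siemens SWT–2.3 82 data. The interpolation scheme and the treatment of speeds outside the tabulated range are also assumptions.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Hypothetical (wind speed in m/s, power in kW) points; NOT the real turbine curve.
ILLUSTRATIVE_CURVE: List[Tuple[float, float]] = [
    (0.0, 0.0), (3.0, 0.0), (6.0, 400.0), (9.0, 1400.0),
    (12.0, 2200.0), (13.0, 2300.0), (25.0, 2300.0),
]


def speed_to_power(
    speed: float,
    curve: Sequence[Tuple[float, float]] = ILLUSTRATIVE_CURVE,
) -> float:
    """Piecewise-linear lookup of power for a given wind speed.

    Speeds outside the tabulated range (below the first point or above the
    last one, treated here as the cut-out speed) map to zero power.
    """
    if speed < curve[0][0] or speed > curve[-1][0]:
        return 0.0
    speeds = [s for s, _ in curve]
    i = bisect_right(speeds, speed)          # index of first point strictly above `speed`
    if i == len(curve):                      # exactly at the last tabulated speed
        return curve[-1][1]
    (s0, p0), (s1, p1) = curve[i - 1], curve[i]
    return p0 + (p1 - p0) * (speed - s0) / (s1 - s0)


# Made-up daily mean wind speeds and the power they map to.
daily_speeds = [2.0, 4.5, 9.0, 13.0, 30.0]
daily_power = [speed_to_power(s) for s in daily_speeds]
```

The day-ahead target for a 30-day window would then be the converted power for the day following the window.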

Given a window size of 30 for the time series data, we developed a two-stage autoencoder with 25 hidden nodes at stage 1 and 13 nodes at stage 2. Thus, the numbers of features (i.e. activations) at stages 1 and 2 are 25 and 13, respectively. The networks were trained for 100 and 50 iterations at stage 1 and stage 2, respectively. The desired average activations were set to 0.01. The weight decay parameter and sparsity penalty terms were set to zero. The autoencoder was designed using the guidelines from the UFLDL Tutorial (2016). We have conducted the experiments in MATLAB. We have utilised the linear regression implementation in MATLAB and the LibSVM (Chang and Lin 2011) implementation of nonlinear SVM (*support vector machine*) regression. We have used the \(k\)-means clustering algorithm for the CBER framework. We prepared the data set to forecast one-day-ahead power. We assume that all the features are equally important, unlike feature selection methods (Rahman and Murshed 2004).
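To see how these hyperparameters interact, it helps to write out the sparse autoencoder cost in the form given in the UFLDL tutorial. The paper does not print its cost function, so treating this as the objective that was optimised is an assumption. Here \(M\) is the number of training windows, \(\lambda\) the weight decay parameter, \(\beta\) the sparsity penalty weight, \(\rho\) the desired average activation, and \(\hat{\rho }_{j}\) the average activation of hidden unit \(j\):

\[J(W,b) = \frac{1}{M}\sum\limits_{i = 1}^{M} \frac{1}{2}\left\| \widehat{{\varvec{ws}}}^{(i)} - \varvec{ws}^{(i)} \right\|^{2} + \frac{\lambda }{2}\sum\limits_{l} \left\| W^{(l)} \right\|_{F}^{2} + \beta \sum\limits_{j} \left[ \rho \log \frac{\rho }{\hat{\rho }_{j}} + (1 - \rho )\log \frac{1 - \rho }{1 - \hat{\rho }_{j}} \right]\]

Under this reading, setting \(\lambda = 0\) and \(\beta = 0\) leaves only the reconstruction term, so the desired average activation \(\rho = 0.01\) would not influence training.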

## Results and discussion

The analysis of the performance of the SAE features for wind power prediction is presented in this section. We first compare the performance of SAE features at different stages. Given that the length of the input feature space is only 30, we designed a two-stage SAE with the first stage producing 25 (around 80% of the length of the input vector) and the second stage producing 13 (around 50% of the length of the stage 1 SAE vector) structural features. If additional stages are added with further reduction in hidden units, the feature space will be too small to represent anything meaningful. Hence, we designed the two-stage SAE. The regression errors produced on the 70 stations with SAE stage 1 and stage 2 structural features using *linear regression* (LR) and SVM regression methods are presented in Fig. 5. Stage 1 SAE features perform better than stage 2 SAE features on 68 out of 70 stations using both LR and SVM. This suggests that stage 1 SAE features capture the structure of the underlying time series better than stage 2 SAE features. On average, the regression error with stage 1 SAE features is 4.91% lower than that with stage 2 SAE features.

The prediction performance of SAE features is compared with that of statistical features next in Figs. 6 and 7. *Stage* 1 (S1) and *stage* 2 (S2) SAE features perform better than statistical features on 59 and 52 stations, respectively, using LR. Similarly, SAE S1 and S2 features outperform statistical features on 68 and 61 stations, respectively, using SVM regression. This implies that SAE structural features are more suitable for regression than statistical features. On average, SAE S1 features perform 8.57 and 12.63% better than statistical features using LR and SVM regression, respectively. Similarly, SAE S2 features perform better than statistical features by 3.66 and 6.05% using LR and SVM regression, respectively.
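The text does not spell out how the percentage improvements and station counts are computed. One plausible reading, sketched below, is the average per-station relative reduction in regression error, together with a count of stations where the candidate features give a lower error. The function names and error values are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def relative_improvement(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Mean percentage reduction in error of `candidate` relative to `baseline`."""
    if len(baseline) != len(candidate):
        raise ValueError("both error lists must cover the same stations")
    gains = [100.0 * (b - c) / b for b, c in zip(baseline, candidate)]
    return sum(gains) / len(gains)


def wins(baseline: Sequence[float], candidate: Sequence[float]) -> int:
    """Number of stations where the candidate error is strictly lower."""
    return sum(c < b for b, c in zip(baseline, candidate))


# Made-up per-station errors (not values from the paper).
stat_err = [0.20, 0.25, 0.10, 0.40]
sae_err = [0.18, 0.20, 0.11, 0.30]

avg_gain = relative_improvement(stat_err, sae_err)
n_wins = wins(stat_err, sae_err)
```

An alternative convention would compare the mean errors across stations first and then take a single percentage. The two conventions generally give different numbers.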

Next, we investigated the performance of SAE features as part of the CBER framework discussed in the “Cluster-based ensemble prediction” section. CBER is formulated on incorporating the natural groups within the data into the learning process. Figure 8 presents the performance of SAE features as part of the CBER (ensemble) framework and when used individually without any ensemble model. CBER performs better than or equal to the individual learner at all 70 stations using LR and SVM regression (SVR). On average, CBER performs 1.20 and 0.53% better than the base (i.e. individual) learner using LR and SVR, respectively. The improvement, however, is very small, which implies that the CBER framework has little influence on the performance of SAE features. Finally, we compared the performance of LR and SVR with SAE features in Fig. 9. LR performs better than SVR on 37 occasions. Realistically, there is no significant difference between them. This implies that linear regression (LR) suits some stations well, whereas nonlinear regression (SVR) suits others.

## Conclusion

In this paper, an algorithm for wind power prediction using an autoencoder is presented. A two-stage stacked autoencoder (a particular type of deep learning) is designed to produce structural features, which are incorporated into different learning frameworks for predicting wind power. The performance of SAE features is also compared with commonly used statistical features. We then investigated how well the SAE features integrate with cluster-based ensemble regression. Experiments were conducted on 70 sites across the different states of Australia. The findings are as follows: (1) stage 1 SAE features perform better than those of the following stage, because the small number of features at later stages fails to appropriately capture the structure of the data; (2) SAE features perform up to 12.63% better than statistical features, although the performance depends on the underlying learning algorithm; (3) incorporation of SAE features in the CBER framework improves the prediction performance, but the improvement is very small; and (4) the choice of linear or nonlinear regression algorithm with SAE features depends on the data characteristics of the station, as there was no clear winner. In future, we aim to investigate other variants of deep learning algorithms to improve the prediction accuracy of wind power.

## References

Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In B. Scholkopf, J. Platt, & T. Hoffman (Eds.), *Advances in neural information processing systems 19 (NIPS’06)* (pp. 153–160). Cambridge: MIT Press.

Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. *ACM Transactions on Intelligent Systems and Technology, 2,* 27:1–27:27.

Colak, I., Sagiroglu, S., & Yesilbudak, M. (2012). Data mining and wind power prediction: A literature review. *Renewable Energy, 46,* 241–247.

Gehrig, J., Miao, Y., Metze, F., & Waibel, A. (2013). Extracting deep bottleneck features using stacked auto-encoders. In *Proceedings of IEEE international conference on acoustics, audio and speech* (pp. 3371–3381). Vancouver.

IRENA. (2017). *Renewable energy integration in power grids*. http://www.irena.org/menu/index.aspx?mnu=Subcat&PriMenuID=36&CatID=141&SubcatID=644. Last accessed August 2017.

Jiang, P., Liu, F., & Song, Y. (2017). A hybrid forecasting model based on date-framework strategy and improved feature selection technology for short-term load forecasting. *Energy 119*, 694–709, ISSN 0360-5442. https://doi.org/10.1016/j.energy.2016.11.034.

Ng, A. (2016). *Sparse autoencoder* (pp. 1–19). http://web.stanford.edu/class/cs294a/sae/sparseAutoencoderNotes.pdf. Last accessed January 2016.

Rahman, A., & Murshed, M. (2004). Feature weighting methods for abstract features applicable to motion based video indexing. In *IEEE international conference on information technology: Coding and computing (ITCC)* (Vol. 1, pp. 676–680).

Rahman, A., Smith, D., Hills, J., Bishop-Hurley, G., Henry, D., & Rawnsley, R. (2016). A comparison of autoencoder and statistical features for cattle behaviour classification. In *2016 international joint conference on neural networks (IJCNN)* (pp. 2954–2960). Vancouver. https://doi.org/10.1109/ijcnn.2016.7727573.

Rahman, A., & Verma, B. (2010). A novel ensemble classifier approach using weak classifier learning on overlapping clusters. In *The 2010 international joint conference on neural networks (IJCNN)* (pp. 1–7). Barcelona. https://doi.org/10.1109/ijcnn.2010.5596332.

Rahman, A., & Verma, B. (2011). Novel layered clustering-based approach for generating ensemble of classifiers. *IEEE Transactions on Neural Networks, 22*(5), 781–792. https://doi.org/10.1109/TNN.2011.2118765.

Shin, H., Orton, M., Collins, D. J., Doran, S., & Leach, M. O. (2011). Autoencoder in time-series analysis for unsupervised tissues characterisation in a large unlabelled medical image data set. In *Proceedings of IEEE international conference on machine learning and application* (pp. 259–264).

Shin, H.-C., Orton, M. R., Collins, D. J., Doran, S. J., & Leach, M. O. (2014). Stacked autoencoders for unsupervised feature learning and multiple organ detection in a pilot study using 4D patient data. *IEEE Transactions on Pattern Analysis and Machine Intelligence, 8*(35), 1930–1943.

Soman, S. S., Zareipour, H., Malik, O., & Mandal, P. (2010). A review of wind power and wind speed forecasting methods with different time horizons. *North American Power Symposium (NAPS), 2010,* 1–8. https://doi.org/10.1109/NAPS.2010.5619586.

Staffell. (2017). *Wind turbine power curves*. http://www.academia.edu/1489838/Wind_Turbine_Power_Curves. Last accessed August 2017.

Tasnim, S., Rahman, A., Shafiullah, G. M., Oo, A. M. T., & Stojcevski, A. (2014). A time series ensemble method to predict wind power. In *2014 IEEE symposium on computational intelligence applications in smart grid (CIASG)* (pp. 1–5), Orlando, FL. https://doi.org/10.1109/ciasg.2014.7011544.

Tasnim, S., Rahman, A., Oo, A. M. T., & Haque, M. E. (2017). Wind power prediction using cluster based ensemble regression. *International Journal of Computational Intelligence and Applications*. https://doi.org/10.1142/S1469026817500262.

UFLDL Tutorial. (2016). *Sparse autoencoder*. http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial. Last accessed January 2016.

Verma, B., & Rahman, A. (2012). Cluster-oriented ensemble classifier: Impact of multicluster characterization on ensemble classifier learning. *IEEE Transactions on Knowledge and Data Engineering, 24*(4), 605–618. https://doi.org/10.1109/TKDE.2011.28.

Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manjagol, P. A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criteria. *Journal of Machine Learning Research, 11,* 3371–3408.

Wang, X., Guo, P., & Huang, X. (2011). A review of wind power forecasting models. *Energy Procedia, 12,* 770–778.

Wang, J., Song, Y., Liu, F., & Hou, R. (2016). Analysis and application of forecasting models in wind power integration: A review of multi-step-ahead wind speed forecasting models. *Renewable and Sustainable Energy Reviews 60*, 960–981, ISSN 1364-0321. https://doi.org/10.1016/j.rser.2016.01.114.

Zhao, J., Guo, Z.-H., Su, Z.-Y., Zhao, Z.-Y., Xiao, X., & Liu, F. (2016). An improved multi-step forecasting model based on WRF ensembles and creative fuzzy systems for wind speed. *Applied Energy 162*, 808–826, ISSN 0306-2619. https://doi.org/10.1016/j.apenergy.2015.10.145.

## Authors’ contributions

All the authors made their contributions to the research and paper. The ordering of the authors is as follows based on their contribution: ST, AR, AMTO, and MEH. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Availability of data and materials

The data are all publicly available.

### Consent for publication

Not applicable.

### Ethics approval and consent to participate

Not applicable.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
