time series forecasting, LSTM network, LSTM layer size, forecasting accuracy, root-mean-square error, maximum absolute error


Background. LSTM neural networks are a promising tool for time series analysis and forecasting. However, like neural networks in other fields and applications, LSTM networks come in many architecture versions and have many training parameters and hyperparameters, whose inappropriate selection may lead to unacceptably poor performance (poor or badly unreliable forecasts). Thus, optimization of LSTM networks is still an open question.

Objective. The goal is to ascertain whether the best forecasting accuracy is achieved when the number of LSTM layer neurons is determined by the time series lag.

Methods. To achieve this goal, a set of benchmark time series for testing the forecasting accuracy is presented. Then, the set-up of a computational study for various versions of the LSTM network is defined. Finally, the results of the computational study are visualized and discussed.
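
To make the role of the lag concrete, the following minimal sketch shows one common way of preparing a series for an LSTM forecaster: each input is a window of `lag` consecutive observations and the target is the next value. This is an illustration only. The abstract does not specify the windowing, so the name `make_windows`, the toy series, and the reading of the lag as the input window length are all assumptions.

```python
def make_windows(series, lag):
    """Split a series into (input window, next value) pairs.

    Assumption: `lag` is the number of past observations fed to the network.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    inputs, targets = [], []
    for start in range(len(series) - lag):
        # Window of `lag` consecutive observations.
        inputs.append(list(series[start:start + lag]))
        # The value immediately after the window is the forecasting target.
        targets.append(series[start + lag])
    return inputs, targets


# Toy series with a linear trend (illustrative values, not the paper's benchmarks).
toy_series = [0.5 * t for t in range(8)]
windows, next_values = make_windows(toy_series, lag=3)
```

Under this reading, a series of length N yields N - lag training pairs, and the lag fixes the length of every input sequence seen by the first LSTM layer.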

Results. Time series with a linear trend are forecast worst; defining the LSTM layer size by the time series lag does not help much. The best-forecast time series are those with only repeated random subsequences, seasonality, or exponential growth. Compared to the network with a single LSTM layer, the forecasting accuracy is improved by 15 % to 19 % by using a network with two LSTM layers.
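
The two accuracy measures named in the keywords, the root-mean-square error and the maximum absolute error, and a relative improvement of the kind reported here can be computed as in the sketch below. The function names and the toy numbers are assumptions made for illustration; in particular, the abstract does not state which error measure the percentage improvement refers to.

```python
import math


def rmse(actual, forecast):
    """Root-mean-square error between two equally long sequences."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def max_abs_error(actual, forecast):
    """Largest absolute deviation between the forecast and the actual values."""
    return max(abs(a - f) for a, f in zip(actual, forecast))


def relative_improvement(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical values, not results from the study.
actual = [1.0, 2.0, 3.0, 4.0]
forecast_one_layer = [1.0, 2.0, 3.0, 6.0]
forecast_two_layers = [1.0, 2.0, 3.0, 5.0]

gain = relative_improvement(
    rmse(actual, forecast_one_layer),
    rmse(actual, forecast_two_layers),
)
```

Reporting both measures is useful because the root-mean-square error summarizes the average deviation, while the maximum absolute error exposes the single worst forecast point.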

Conclusions. Approximately the best forecasting accuracy can be expected when the number of LSTM layer neurons is set to the time series lag. However, the best forecasting accuracy cannot be guaranteed. LSTM networks for time series forecasting can be optimized by using only two LSTM layers whose size is set to the time series lag. Some discrepancy is still acceptable, though. The size of the second LSTM layer should not be less than the size of the first layer.
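
The sizing rule in these conclusions can be read as a small search space for a two-layer network, sketched below. The function name `candidate_layer_sizes` and the `tolerance` parameter (how far a layer size may deviate from the lag) are hypothetical; the abstract only says that some discrepancy is acceptable and that the second layer should not be smaller than the first.

```python
def candidate_layer_sizes(lag, tolerance=0):
    """List (first_layer, second_layer) sizes near the lag.

    Assumptions: both sizes stay within `tolerance` neurons of the lag,
    and the second layer is never smaller than the first.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    low = max(1, lag - tolerance)
    high = lag + tolerance
    return [
        (first, second)
        for first in range(low, high + 1)
        for second in range(first, high + 1)  # enforces second >= first
    ]


# With zero tolerance the only candidate sets both layers to the lag.
exact = candidate_layer_sizes(12)
nearby = candidate_layer_sizes(12, tolerance=1)
```

Each pair could then be passed to whatever LSTM implementation is in use, with forecasting accuracy compared across the candidates.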

