Exchange Rate Forecasting in Iran Using Data Fusion and a Comprehensive Machine Learning-Based Model (Scientific Article, Ministry of Science)
Scientific rank: Scientific journal (Ministry of Science)
Abstract
The exchange rate has always been one of the most important economic indicators, and various factors play a role in determining it. Some of these factors appear as economic variables, while others are reflected in political and economic news. An important question that has not yet been answered precisely is whether a comprehensive and extensible model can be built for modeling and forecasting the exchange rate, one that covers all influential variables and factors. As an answer to this question, this study uses machine learning and a data fusion approach to present a comprehensive deep learning-based model that supports multiple data types. To train the model, news affecting the exchange rate was collected from 10 major domestic and foreign sources over the period 1393 to 1402 (Iranian calendar) and fed directly to the model together with exchange rate data and other economic indicators. To find the best model, 8 machine learning models, 2 statistical models, and one large language model were trained and tested in both regression and classification settings. To avoid bias and chance results, cross-validation that respects temporal order was used, together with repeated training and testing of the models under different random initial values. The results indicate that the proposed comprehensive and extensible approach, which directly takes all influential factors into account, performs considerably better than previous approaches.
Exchange Rate Forecasting in Iran Using Data Fusion and a Comprehensive Machine Learning Model
The exchange rate is recognized as a key economic indicator influenced by multiple factors. Some of these factors manifest as measurable economic variables, while others are reflected in political and financial news. A central, unresolved question is whether it is possible to develop a comprehensive and scalable model for exchange rate modeling and forecasting that accounts for all relevant variables and factors. Using a data fusion approach, the present study proposed a comprehensive deep learning–based model supporting multiple data types. To train the model, exchange rate–related news was collected from ten major national and international sources covering the period from 2014 to 2023 (1393–1402 in the Iranian calendar). The data was then combined with exchange rate figures and other economic indicators. To identify the best model, eight machine learning models, two statistical models, and one large language model were trained and evaluated under both regression and classification settings. To mitigate bias and random effects, the study applied time series–aware cross-validation along with repeated training and testing using different random initializations. The results demonstrated that the proposed approach, which directly incorporates all influential factors, significantly outperforms existing methods.
Introduction
Exchange rate fluctuations represent one of the most complex challenges in modern economic analysis, shaped by a dynamic interplay of macroeconomic fundamentals, policy decisions, and informational signals disseminated through the media. Traditional econometric approaches often fail to capture these multidimensional interactions, as they rely primarily on quantitative variables and lagged historical data. As a result, they tend to overlook the qualitative influence of news, market sentiment, and expectations that often precede measurable economic changes.
Recent advances in artificial intelligence and machine learning have introduced powerful tools for integrating diverse forms of data, both numerical and textual, into unified predictive systems. The present research proposes a comprehensive and extensible model for forecasting exchange rates in Iran, combining structured economic indicators with unstructured news data through a data fusion approach.
Materials and Methods
This study employed a quantitative and applied methodology based on supervised machine learning techniques. The dataset spans the period from April 2014 to March 2023 (1393–1402 in the Iranian calendar). Daily free-market exchange rates were obtained from three verified sources: the National Exchange website, the Gold and Currency Information Network, and the Bonbast platform. Additionally, key macroeconomic indicators (GDP growth, inflation rate, unemployment rate, trade balance, public debt, foreign reserves, and oil prices) were collected from official statistical repositories. To incorporate qualitative dimensions, news articles related to exchange rate dynamics were gathered from ten major national and international media outlets, including Donya-e-Eqtesad, San’at-Madan-Tijarat, Asia Daily, ISNA, Khabaronline, Tabnak, BBC Persian, and Voice of America Persian. Each news item was labeled according to the contemporaneous changes in exchange rates.
Data preprocessing involved normalization, outlier removal, and interpolation of missing values for numerical data. Textual data underwent cleaning, tokenization, and embedding using the ParsBERT model (Farahani et al., 2021), which was fine-tuned on domain-specific economic texts to improve contextual representation. Following preprocessing, approximately 388,354 fused samples were constructed.
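The numerical preprocessing and news-labeling steps described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`interpolate_missing`, `min_max_normalize`, `label_news`), the toy rates and headlines, the choice of linear interpolation and min-max scaling, and the labeling rule (direction of the same-day rate change) are all hypothetical. The source states only that missing values were interpolated, numerical data were normalized, and news items were labeled by contemporaneous exchange rate changes. Outlier removal and ParsBERT embedding are omitted.

```python
from typing import Dict, List, Optional, Tuple


def interpolate_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None gaps linearly; gaps at the edges copy the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            weight = (i - left) / (right - left)
            filled.append(values[left] + weight * (values[right] - values[left]))
    return filled


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_news(
    news_by_day: Dict[int, List[str]], rates: List[float]
) -> List[Tuple[str, int, float]]:
    """Pair each news item with (label, change): label is 1 if the rate rose
    that day relative to the previous day, else 0 (assumed labeling rule)."""
    samples: List[Tuple[str, int, float]] = []
    for day, items in sorted(news_by_day.items()):
        if day == 0 or day >= len(rates):
            continue  # no previous day, or no rate observed for this day
        change = rates[day] - rates[day - 1]
        label = 1 if change > 0 else 0
        for text in items:
            samples.append((text, label, change))
    return samples


# Toy data (hypothetical): daily rates with one missing day, plus dated headlines.
raw_rates = [100.0, None, 104.0, 103.0]
rates = interpolate_missing(raw_rates)
scaled = min_max_normalize(rates)
news = {1: ["headline A"], 3: ["headline B", "headline C"]}
samples = label_news(news, rates)
```

In a fuller pipeline, each labeled headline would be replaced by its text embedding and concatenated with the scaled numerical features for the same day to form one fused sample.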
Eight machine learning models (Random Forest, XGBoost, LightGBM, CNN-LSTM, GRU, Bi-GRU, LSTM, and Bi-LSTM), two statistical models (ARIMA and Prophet), and one large language model (GPT-4) were trained and compared under both regression and classification settings. Model evaluation was conducted through time-series–aware cross-validation and repeated random initialization to minimize bias. Performance metrics included Mean Absolute Error (MAE), Mean Squared Error (MSE), Accuracy, and F1-score.
Results and Discussion
The results revealed that models integrating textual and numerical data substantially outperform those trained solely on numerical inputs. Specifically, the inclusion of news embeddings reduced forecasting error by more than 5% across most deep learning architectures. Among the evaluated models, the fine-tuned GPT-4 achieved the highest overall accuracy and the lowest error metrics in both regression and classification tasks. However, considering constraints on interpretability and data security, the Bidirectional Gated Recurrent Unit (Bi-GRU) model was identified as the optimal choice for practical implementation. The Bi-GRU model exhibited strong learning capability in capturing temporal dependencies and contextual relationships between macroeconomic variables and market sentiment. In classification mode, it achieved an F1-score of 0.84 and an accuracy of 0.86 when textual data were incorporated. In contrast, traditional statistical models such as ARIMA and Prophet showed limited capacity to reflect short-term market shocks influenced by real-time news. The findings highlighted the importance of data fusion in financial forecasting. Textual news data provide early signals of market sentiment that often precede observable changes in economic variables.
By integrating these heterogeneous data sources, the proposed model can offer a more dynamic and responsive forecasting framework, particularly suited to volatile markets such as Iran’s foreign exchange sector.
Conclusion
This study proposed a comprehensive machine learning–based model that successfully integrates textual and numerical data for exchange rate forecasting in Iran. The results confirmed that data fusion enhances predictive accuracy and robustness, outperforming both conventional econometric methods and single-modality deep learning models. Among the evaluated architectures, Bi-GRU offered the most practical balance between performance, interpretability, and computational efficiency. The findings underscored that incorporating news-driven sentiment and contextual information provides a timely advantage for policy formulation and risk management. Moreover, the modular structure of the proposed model allows for future extensions to other economic domains such as stock market analysis and inflation forecasting. Future studies are recommended to expand the dataset to include social media sentiment and to adopt explainable AI (XAI) techniques to improve interpretability and transparency.
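For readers who want a concrete picture of the evaluation protocol named under Materials and Methods (time-series-aware cross-validation scored with MAE, MSE, Accuracy, and F1-score), a minimal sketch follows. It rests on assumptions: the source does not specify the fold scheme, so an expanding-window split is used here as one common choice, and the function names, the toy series, and the persistence forecaster (predict yesterday's value) are hypothetical stand-ins for the trained models. Repeating each run with different random initializations is not shown.

```python
from typing import Iterator, List, Sequence, Tuple


def expanding_window_splits(n_samples: int, n_folds: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges where every test block lies strictly
    after its training block in time; the last fold absorbs leftover samples."""
    fold_size = n_samples // (n_folds + 1)
    if fold_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield range(0, train_end), range(train_end, test_end)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of labels predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy series (hypothetical values) scored with a persistence baseline per fold.
series = [10.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0, 16.0, 15.0, 17.0]
fold_mae: List[float] = []
for train_idx, test_idx in expanding_window_splits(len(series), n_folds=3):
    actual = [series[i] for i in test_idx]
    predicted = [series[i - 1] for i in test_idx]  # yesterday's value as forecast
    fold_mae.append(mae(actual, predicted))
```

Unlike shuffled k-fold cross-validation, this split never trains on observations that come after the test block, which avoids leaking future information into the model.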








