Articles related to the keyword

Machine Learning


41.

Comparative Analysis of Missing Values Imputation Methods: A Case Study in Financial Series (S&P500 and Bitcoin Value Data Sets) (Ministry of Science scientific article)

Keywords: Missing values Imputation Machine Learning Statistical methods Finance Data S&P 500 Bitcoin time series analysis

Views: 121 Downloads: 101
The accurate imputation of missing values in time series data is paramount for maintaining the integrity and reliability of analyses and predictions. This article investigates the efficacy of various missing values imputation methods, encompassing well-known machine learning and statistical techniques. For a better understanding, the methods were applied to two daily financial time series, the S&P 500 and Bitcoin markets, spanning 2016 to 2023. Starting from complete datasets, controlled missingness was introduced by randomly removing 45 data points. Multiple imputation strategies were then applied to estimate and substitute these missing values. Experimental evaluation yielded insightful findings regarding the performance of the different methods. The examined machine learning methods, including k-Nearest Neighbors (k-NN), Random Forest, Deep Learning, and Decision Trees, consistently outperformed their statistical counterparts, such as Mean Imputation, Regression Imputation, Hot-Deck Imputation, and Expectation-Maximization Imputation. Notably, Random Forest emerged as the most effective method, showing superior accuracy and robustness. Conversely, the Mean Imputation method exhibited comparatively inferior outcomes, suggesting its limited suitability for financial time series data. This research contributes to the ongoing discourse on data integrity within finance analytics and serves as a comprehensive guide for practitioners seeking optimal missing values imputation methods. The empirical evidence provided herein advances the understanding of imputation techniques' relative performance and their application in financial data, facilitating enhanced decision-making processes and yielding more reliable predictions.
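The evaluation protocol in this abstract (start from a complete series, remove 45 points at random, impute, compare against the held-back truth) can be sketched as follows. This is a minimal illustration, not the authors' code: the series is a synthetic random walk rather than S&P 500 or Bitcoin data, the k-NN variant uses only the time index as its feature, and RMSE is an assumed error measure.

```python
import math
import random

def make_series(n=400, seed=0):
    """Synthetic random-walk 'price' series (a stand-in for real market data)."""
    rng = random.Random(seed)
    values = [100.0]
    for _ in range(n - 1):
        values.append(values[-1] + rng.gauss(0, 1))
    return values

def remove_points(series, k=45, seed=1):
    """Blank out k randomly chosen interior points; return the copy and the indices."""
    rng = random.Random(seed)
    missing = sorted(rng.sample(range(1, len(series) - 1), k))
    damaged = list(series)
    for i in missing:
        damaged[i] = None
    return damaged, missing

def mean_impute(damaged):
    """Replace every gap with the mean of all observed values."""
    observed = [v for v in damaged if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in damaged]

def knn_time_impute(damaged, k=4):
    """Replace each gap with the mean of the k observed points nearest in time."""
    observed = [(i, v) for i, v in enumerate(damaged) if v is not None]
    filled = list(damaged)
    for i, v in enumerate(damaged):
        if v is None:
            nearest = sorted(observed, key=lambda p: abs(p[0] - i))[:k]
            filled[i] = sum(val for _, val in nearest) / len(nearest)
    return filled

def rmse(truth, filled, missing):
    """Root-mean-square error, measured only at the removed positions."""
    return math.sqrt(sum((truth[i] - filled[i]) ** 2 for i in missing) / len(missing))

series = make_series()
damaged, missing = remove_points(series)
rmse_mean = rmse(series, mean_impute(damaged), missing)
rmse_knn = rmse(series, knn_time_impute(damaged), missing)
print(f"mean imputation RMSE: {rmse_mean:.3f}")
print(f"k-NN (time) RMSE:     {rmse_knn:.3f}")
```

Because the error is computed only at the removed positions, any other imputer can be dropped into the same harness and compared on equal terms.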
42.

The Influence of Predictive Maintenance Technologies on Operational Efficiency in Manufacturing Startups

Keywords: Predictive maintenance Operational Efficiency manufacturing startups Data analytics Machine Learning Internet of Things

Views: 174 Downloads: 110
The objective of this study is to explore the influence of predictive maintenance technologies on operational efficiency in manufacturing startups, focusing on implementation processes, operational impacts, and the challenges encountered. This qualitative study employed semi-structured interviews to gather data from key stakeholders in manufacturing startups, including founders, operations managers, and maintenance engineers. A total of 22 participants were interviewed, with the sample size determined by theoretical saturation. The interviews were transcribed verbatim and analyzed using NVivo software. Thematic analysis was conducted to identify and categorize key themes and subthemes related to the implementation and impact of predictive maintenance technologies. The analysis revealed three main themes: Implementation Process, Operational Impact, and Challenges and Barriers. Within these themes, several categories and concepts emerged. The Implementation Process theme highlighted the importance of planning, technology selection, system integration, employee involvement, pilot testing, change management, and post-implementation review. The Operational Impact theme identified efficiency gains, predictive analytics, maintenance scheduling, resource optimization, and quality improvement as significant outcomes. The Challenges and Barriers theme underscored technological challenges, financial constraints, organizational resistance, skill gaps, data management issues, and the necessity of vendor support. The findings indicate that predictive maintenance technologies significantly enhance operational efficiency in manufacturing startups by reducing downtime, increasing productivity, and optimizing resource utilization.
43.

Authentic and Fake Reviews Recognition on E-Commerce Websites through Sentiment Analysis and Machine Learning Techniques (Ministry of Science scientific article)

Views: 104 Downloads: 92
The proliferation of e-commerce has led to an overwhelming volume of customer reviews, posing challenges for consumers who seek reliable product evaluations and for businesses concerned with the integrity of their online reputation. This study addresses the critical problem of detecting fake reviews by developing a comprehensive framework that integrates Natural Language Processing (NLP) and machine learning techniques. Our methodology centers on sentiment analysis to discern the emotional valence of reviews, coupled with Part-of-Speech (PoS) tagging to analyze linguistic patterns that may signal deception. We meticulously extract a rich set of textual and statistical features, providing a robust basis for our predictive models. To enhance classification performance, we strategically employ both traditional machine learning algorithms and powerful ensemble techniques. Experimental results underscore the efficacy of our approach in detecting fraudulent reviews. We achieved a notable F1-Score of 82.9% and an accuracy of 82.6%, demonstrating the potential to safeguard consumers from misleading information and protect businesses from unfair practices.
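For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and F1-score are derived from predictions. The labels and predictions are made-up toy values, not data from the study, and treating "fake" as the positive class is an assumption.

```python
def classification_metrics(y_true, y_pred, positive="fake"):
    """Accuracy, precision, recall and F1 for a binary labelling task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example (hypothetical labels).
y_true = ["fake", "fake", "real", "real", "fake", "real"]
y_pred = ["fake", "real", "real", "fake", "fake", "real"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```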
44.

Tools for Consumer Preference Analysis Based in Machine Learning (Ministry of Science scientific article)

Keywords: Machine Learning Data Analysis Pandas Data set

Views: 111 Downloads: 70
Today, users increasingly generate data on the Internet when choosing a product or service. This produces data about the purchases and services of various consumers. In addition, consumers often leave feedback about their purchases and discuss their attitudes toward goods and services on social networks, messengers, thematic sites, etc. This leads to large volumes of data that contain useful information about various manufacturers of goods and services. Such information can be useful to both ordinary users and large companies. However, it is practically impossible to use this information because it is scattered across different places and has a raw, unstructured character. Moreover, depending on the target group of users, not the entire data set is needed, but a specific target sample. Solving this problem requires a tool for structuring information arrays and analyzing them according to the set goal. This can be done with various frameworks that use machine learning methods and work with data. This work addresses the problem of creating means for evaluating consumer preferences based on the analysis of large volumes of data for further use by the target audience. The goal of developing big data analysis systems is to obtain new, previously unknown information. The methodology applies algorithms for working with large data sets and machine learning methods, namely the pandas library for operations on a data set and logistic regression for information classification. As a result, a system was built that analyzes lexical information, translates it into numerical format, and creates the necessary statistical samples on this basis. The originality of the work lies in the use of specialized data processing and machine learning libraries to create data analysis systems.
The practical value of the work lies in the possibility of creating data analysis systems built using specialized machine learning libraries.
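The pipeline outlined above (lexical information converted to numbers, then classified with logistic regression) can be illustrated with a small self-contained sketch. It deliberately uses only the Python standard library instead of pandas, so it is a stand-in for the idea rather than the system the authors built; the reviews, labels, learning rate and epoch count are all invented for illustration.

```python
import math

# Hypothetical feedback texts with labels (1 = positive, 0 = negative).
reviews = [
    ("great product fast delivery", 1),
    ("excellent quality great price", 1),
    ("terrible quality broke quickly", 0),
    ("slow delivery terrible support", 0),
]

# Vocabulary: every distinct word gets one numeric column.
vocab = sorted({word for text, _ in reviews for word in text.split()})

def vectorize(text):
    """Bag-of-words: count how often each vocabulary word occurs in the text."""
    words = text.split()
    return [words.count(w) for w in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=500, lr=0.5):
    """Logistic regression fitted with plain stochastic gradient descent."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = vectorize(text)
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = pred - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(text, weights, bias):
    """Probability that the text belongs to the positive class."""
    x = vectorize(text)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

weights, bias = train(reviews)
print(predict("great price", weights, bias))
```

In a pandas-based version, the count vectors would typically live in a DataFrame with one column per vocabulary word, which makes it straightforward to select the target sample mentioned in the abstract.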
45.

Developing Financial Distress Prediction Models Based on Imbalanced Dataset: Random Undersampling and Clustering Based Undersampling Approaches (Ministry of Science scientific article)

Keywords: Imbalanced datasets Undersampling Machine Learning Financial distress prediction models Financial Ratios

Views: 92 Downloads: 66
So far, distress prediction models have been based on balanced samples; such sampling is not consistent with the reality of the statistical population of companies. If the data are balanced, the bias in sample selection may lead to an underestimation of the type I error and an overestimation of the type II error of models. Although imbalanced data-based models are compatible with reality, they have a higher type I error than balanced data-based models. The cost of a type I error is more important to beneficiaries than the cost of a type II error. In this study, random and clustering-based undersampling were used to reduce the type I error of imbalanced data-based models. The tested data included 760 companies since 2007-2007 with 4 different degrees, and the results of testing H1 to H3 reflect them. In all cases, the type I error of balanced data-based models was lower and their type II error higher than those of imbalanced data-based models; in most cases, the geometric mean of balanced data-based models was also higher. The results of testing H4 to H6 show that in most cases the type I error of models based on modified imbalanced data was lower, and their type II error and geometric mean higher, than those of models based on imbalanced data. In other words, applying undersampling methods to imbalanced training data decreased the type I error and increased the type II error and the geometric mean. As a result, using models based on modified imbalanced data is suggested to beneficiaries.
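A minimal sketch of random undersampling and of the three evaluation criteria named in this abstract follows. Two points are assumptions, since the abstract does not define them: a type I error is taken to mean classifying a distressed company as healthy, and the geometric mean is taken as the square root of the product of the two per-class accuracies. The sample counts are invented.

```python
import math
import random

def random_undersample(samples, seed=0):
    """Keep all distressed cases (label 1) and an equal-sized random subset
    of healthy cases (label 0). Assumes distressed is the minority class."""
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == 1]
    majority = [s for s in samples if s[1] == 0]
    return minority + rng.sample(majority, len(minority))

def error_rates(y_true, y_pred):
    """Type I error, type II error and geometric mean (see assumptions above)."""
    distressed = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    healthy = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    type1 = sum(1 for _, p in distressed if p == 0) / len(distressed)
    type2 = sum(1 for _, p in healthy if p == 1) / len(healthy)
    gmean = math.sqrt((1 - type1) * (1 - type2))
    return type1, type2, gmean

# Hypothetical imbalanced sample: 90 healthy and 10 distressed companies.
samples = [((i,), 0) for i in range(90)] + [((i,), 1) for i in range(10)]
balanced = random_undersample(samples)
print(len(balanced), sum(label for _, label in balanced))

type1, type2, gmean = error_rates([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(type1, type2, round(gmean, 4))
```

Clustering-based undersampling would replace the random draw with a selection of majority cases guided by clusters; that step is not shown here.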
46.

Early Warning Model for Solvency of Insurance Companies Using Machine Learning: Case Study of Iranian Insurance Companies (Ministry of Science scientific article)

Keywords: Insurance Solvency Early Warning Model Machine Learning Financial Ratio Analysis

Views: 106 Downloads: 78
Stakeholders of an organization avoid undesirable outcomes caused by ignoring the risks. Various models and tools can be used to predict future outcomes, aiming to avoid the undesirable ones. Early warning models are one of the approaches that could help them in doing so. This study focuses on developing an early warning system using machine learning algorithms for predicting solvency in the insurance industry. This study analyses 23 financial ratios from Iranian general insurance companies listed on the Tehran Stock Exchange between 2015 and 2020. The model uses Decision Tree, Random Forest, Artificial Neural Networks, Gradient Boosting Machine and XGBoost algorithms, with Boruta as a feature selection method. The dependent variable is the solvency margin ratio, and the other 22 ratios are the independent variables, which Boruta reduces to 7 variables. Firstly, the performance of the machine learning models on two datasets, one with 22 independent variables and one with 7, is compared based on RMSE values. The XGBoost algorithm performs the best on both data sets. Additionally, the study predicts the 2020 values for 19 insurance companies, performs stage classifications, and compares actual stages to predicted stages. In this analysis, Random Forest has the best estimate accuracy on both data sets, while Gradient Boosting Machine has the best estimate accuracy on the Boruta data set. Finally, the study compares the machine learning models' results in terms of capital adequacy classification, where Random Forest performs the best on both data sets, and Gradient Boosting Machine on the Boruta data set.
47.

Examining Financial Performance and Corporate Governance in Tehran Stock Exchange: A Hybrid Machine Learning and Data Envelopment Analysis Approach (Ministry of Science scientific article)

Keywords: Financial performance Corporate Governance Machine Learning Human capital

Views: 93 Downloads: 70
In the backdrop of an ever-evolving global business landscape and intense market competition, companies are faced with the imperative of strategically managing factors that influence their financial performance. This research delves into the intricate relationship between financial performance enhancement and corporate governance, with particular attention to the mediating role of human capital. The study centers its investigation on companies listed on the Tehran Stock Exchange and comprises a comprehensive sample of 140 top-level managers. A composite sampling approach, comprising a simple random sampling technique and Morgan's table, was employed to judiciously select a representative cohort of 103 participants. In the pursuit of rigorous academic analysis, the research leverages a goal-oriented, applied methodology, employing a descriptive survey design and a quantitative approach. The primary data for the study were methodically collected through rigorously designed and standardized questionnaires. Subsequent to data acquisition, a meticulous analytical process was undertaken using the Partial Least Squares (PLS) software, aligning with the latest developments in quantitative research techniques. The results stemming from hypothesis testing offer compelling insights into the dynamic relationship between corporate governance, human capital, and financial performance enhancement. Our findings convincingly demonstrate a significant positive impact of both corporate governance and human capital on the enhancement of financial performance in the context of Tehran Stock Exchange's listed companies. Furthermore, the empirical evidence strongly suggests that human capital plays a pivotal mediating role in the relationship between corporate governance practices and financial performance improvements. 
This study, in its pursuit of academic rigor, underscores the effectiveness of a novel hybrid approach, thoughtfully integrating machine learning and data envelopment analysis, to comprehensively examine the intricate interplay between financial performance enhancement and corporate governance within the context of the Tehran Stock Exchange's listed companies. The study contributes to the evolving body of literature in this domain and provides valuable insights for practitioners, policymakers, and researchers.
48.

Designing a Trading Strategy to Buy and Sell the Stock of Companies Listed on the New York Stock Exchange Based on Classification Learning Algorithms (Ministry of Science scientific article)

Keywords: Trading Strategy Machine Learning Classification Algorithms

Views: 91 Downloads: 72
This research investigated the development of a stock trading strategy for companies on the New York Stock Exchange (NYSE), a prominent global market. Data was acquired from established libraries and the Yahoo Finance database. The model employed technical analysis indicators and oscillators as input features. Machine learning classification algorithms were used to design trading strategies, and the optimal model was identified based on statistical performance metrics. Accuracy, recall, and F-measure were utilized to evaluate the classification algorithms. Additionally, advanced statistical methods and various software tools were used, including Python, Spyder, SPSS, and Excel. The Kruskal-Wallis test was employed to assess the statistical differences between the designed strategies. A sample of 41 actively traded NYSE companies across diverse sectors such as financial services, healthcare, technology, communication services, consumer cyclicals, consumer staples, and energy was chosen using a filter-based approach on June 28th, 2021. The selection criteria included a market capitalization exceeding $200 billion and an average daily trading volume surpassing 1 million shares. Evaluation metrics revealed that the designed random forest trading strategy achieved a good fit with the data and exhibited statistically significant differences from other strategies based on classification learning algorithms.
49.

Enhancing Oncological Diagnosis by Single-Cell ATAC-seq Data for Internet of Medical Things (Ministry of Science scientific article)

Views: 145 Downloads: 45
Early cancer detection is crucial for improving patient survival rates, as timely intervention greatly enhances treatment efficacy. One promising method for early detection is identifying cancerous cells through the detection of protein-level modifications, which serve as early indicators of malignancy. These protein modifications often result from complex biochemical processes that occur before visible cellular abnormalities, making them critical targets for diagnostic technologies. In recent years, wireless biomedical sensors have advanced significantly, enabling precise detection of these protein-level changes. These sensors have the potential to detect cancer at its earliest stages by monitoring the subtle alterations in protein structures and functions that distinguish healthy cells from cancerous ones. As the costs of genetic analysis continue to decrease, the development of Medical Internet of Things (MIoT) devices has become increasingly feasible. These devices are designed to perform real-time analyses of biological specimens, such as blood and urine, by detecting protein-level changes indicative of cancer. In this paper, a new machine learning method based on Extreme Randomized Trees (ERT) is developed to increase the speed of classifying cancerous cells from single-cell Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) data. The proposed method speeds up classification of the limited and noisy ATAC-seq data because it requires less computation to determine the best splits at each node of the decision trees. This method can significantly improve near real-time cancer risk assessment using samples collected by MIoT. Our proposed method achieves classification accuracy comparable to state-of-the-art single-cell ATAC-seq data analysis techniques while reducing processing time by 259% when challenged by various low-data scenarios. This approach presents an efficient solution for rapid cancer monitoring within the MIoT framework.
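The computational saving mentioned in this abstract comes from how randomized trees pick splits: rather than scanning every possible threshold, each candidate feature gets one threshold drawn at random, and the best of those few candidates is kept. The sketch below illustrates that node-level step only. It is a generic rendering of the extremely randomized trees idea, not the authors' ERT method; the feature values are invented, and Gini impurity is an assumed split criterion.

```python
import random

def gini(labels):
    """Gini impurity for binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_gain(rows, labels, feature, threshold):
    """Impurity reduction obtained by splitting on feature < threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] < threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] >= threshold]
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)

def random_split(rows, labels, n_features_tried, rng):
    """Draw one uniform random threshold per candidate feature; keep the best."""
    candidates = rng.sample(range(len(rows[0])), n_features_tried)
    best = None
    for feature in candidates:
        lo = min(r[feature] for r in rows)
        hi = max(r[feature] for r in rows)
        if lo == hi:
            continue  # constant feature, nothing to split on
        threshold = rng.uniform(lo, hi)
        gain = split_gain(rows, labels, feature, threshold)
        if best is None or gain > best[0]:
            best = (gain, feature, threshold)
    return best

# Hypothetical accessibility scores for four cells (1 = cancerous).
rng = random.Random(42)
rows = [[0.1, 0.9, 0.3], [0.2, 0.8, 0.4], [0.8, 0.2, 0.5], [0.9, 0.1, 0.6]]
labels = [0, 0, 1, 1]
gain, feature, threshold = random_split(rows, labels, n_features_tried=2, rng=rng)
print(gain, feature, threshold)
```

Each node evaluates only as many thresholds as features tried, which is where the reduced computation relative to an exhaustive threshold search comes from.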
50.

A Combined Approach Of Adasyn And Tomeklink For Anomaly Network Intrusion Detection System Using Some Selected Machine Learning Algorithms (Ministry of Science scientific article)

Views: 107 Downloads: 41
Securing computer networks against malicious attacks requires an efficient Network Intrusion Detection System (IDS). While machine learning techniques are commonly used for anomaly-based intrusion detection, data imbalance challenges conventional algorithms, leading to biased predictions and reduced accuracy. This study introduces a novel approach that combines ADASYN and Tomek links to address this issue, along with specific machine learning algorithms. ADASYN generates synthetic samples for the minority class to achieve dataset balance, and Tomek links eliminate redundant instances from the majority class. Four supervised machine learning algorithms (Random Forest, J48, Multilayer Perceptron, and Bagging) were assessed on both imbalanced and balanced datasets. Results show Random Forest exhibited 99.67% accuracy, while J48 and Bagging yielded 99.30%, and MLP recorded 98.53%. Notably, Random Forest emerges as a highly effective algorithm for Intrusion Detection, demonstrating flawless accuracy with balanced data. These outcomes highlight the proposed approach's ability to enhance prediction accuracy in network intrusion detection compared to imbalanced datasets, validated through a comparative analysis with state-of-the-art solutions.
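The Tomek link cleaning step can be sketched in a few lines. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour; removing the majority-class member of each pair thins the class boundary. The code below covers only this cleaning step (the ADASYN oversampling step is not shown), uses Euclidean distance as an assumption, and runs on invented two-dimensional points rather than network traffic features.

```python
import math

def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding itself."""
    best_j, best_d = None, math.inf
    for j, q in enumerate(points):
        if j != i:
            d = math.dist(points[i], q)
            if d < best_d:
                best_j, best_d = j, d
    return best_j

def tomek_links(points, labels):
    """Pairs (i, j) that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]

def drop_majority_in_links(points, labels, majority):
    """Remove the majority-class member of every Tomek link."""
    to_drop = {
        i
        for pair in tomek_links(points, labels)
        for i in pair
        if labels[i] == majority
    }
    return [(p, y) for k, (p, y) in enumerate(zip(points, labels)) if k not in to_drop]

# Hypothetical samples: label 0 = normal traffic (majority), 1 = attack.
points = [(0, 0), (0.1, 0), (1, 1), (1.1, 1), (5, 5), (5.1, 5.1)]
labels = [0, 1, 0, 0, 0, 0]
links = tomek_links(points, labels)
cleaned = drop_majority_in_links(points, labels, majority=0)
print(links, len(cleaned))
```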
51.

A Data Mining Approach to Consumers’ Choice of Retail Market: The Case of Urban Retail Markets in Iran (Ministry of Science scientific article)

Keywords: Consumer Behavior Data mining Decision tree Machine Learning

Views: 113 Downloads: 45
Urban retail markets are state-owned retail markets that were recently established in Iran to increase the welfare of consumers and producers. To achieve this goal and expand their presence in the Iranian retail sector, it is essential to gain a comprehensive understanding of consumer behavior within these markets. This study examines the various socio-economic factors influencing consumers' decisions in the retail market by using the C4.5 algorithm. The data were collected using a random sampling method through a survey of 189 consumers, focusing on the population of Mashhad, Iran, during 2019-2020. Results revealed that awareness of available discounts significantly drives consumer choices in urban retail markets. Despite existing discounts, awareness among consumers remains low, suggesting a need to review promotional strategies within the marketing mix. The study also identifies previous purchases from urban markets, household income, and education as influential factors. Findings offer valuable insights for policymakers, market strategists, and stakeholders seeking to enhance the effectiveness of local retail markets in Iran. By leveraging insights into consumer behavior and market dynamics, these markets can thrive, benefiting Iran's retail sector and overall economy. Following the study, recommendations such as enhanced promotional campaigns, education-oriented strategies, loyalty programs, collaborations with local producers, and inclusive marketing policies were made with the aim of improving access to urban retail markets for all consumers.
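C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below computes that quantity for categorical attributes. The four survey rows and the attribute names are hypothetical and constructed so that discount awareness separates the classes perfectly; they are not taken from the study's 189 responses.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attribute):
    """C4.5 gain ratio of a categorical attribute."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical survey rows.
rows = [
    {"aware_of_discounts": "yes", "income": "low"},
    {"aware_of_discounts": "yes", "income": "high"},
    {"aware_of_discounts": "no", "income": "low"},
    {"aware_of_discounts": "no", "income": "high"},
]
labels = ["urban market", "urban market", "other", "other"]

ratio_aware = gain_ratio(rows, labels, "aware_of_discounts")
ratio_income = gain_ratio(rows, labels, "income")
print(ratio_aware, ratio_income)
```

In this toy data the tree would split on discount awareness first, since its gain ratio is the larger of the two.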
52.

Predicting the trend of the total index of the Tehran Stock Exchange using an image processing technique (Ministry of Science scientific article)

Keywords: Tehran Stock Exchange image processing Market trend prediction Machine Learning

Views: 188 Downloads: 84
This study explores the considerable significance of candlestick chart patterns as a foundational asset within the realm of stock market analysis and prediction. As a graphical representation of historical price movements and patterns, Candlestick charts offer a distinct and valuable perspective for understanding how the financial market operates. This perspective assists us in accurately pinpointing the most advantageous times for making decisions to buy or sell financial securities, such as stocks or bonds. These charts provide insights into market trends and potential trading opportunities. We adopt an innovative approach by harnessing image processing techniques to extract and analyze patterns from Candlestick charts systematically. Our findings underscore the pivotal role of visual data in financial analysis, particularly in times of market volatility and uncertainty. Investors often resort to technical analysis strategies when confronted with erratic market trends, often relying on insights derived from chart-based analysis to guide their decision-making processes. By meticulously extracting essential insights from candlestick charts, our study aims to provide investors with more efficient and less error-prone tools. Ultimately, this endeavor contributes to the enhancement of decision-making precision and the mitigation of risks inherent in participating in the dynamic stock market landscape.
53.

Comparing the Prediction Power of Logit Regression Model and LightGBM Algorithm in Credit Card Fraud Detection (Ministry of Science scientific article)

Keywords: fraud detection Financial Institution Credit card Logit LightGBM Machine Learning

Views: 183 Downloads: 45
Relying on the Area Under the Curve (AUC) measure, we compare the performance of the Logit regression model and the LightGBM algorithm. Despite these methods being common in the literature, our study emphasizes the role of statistical inference to evaluate and compare the results comprehensively. We use the training set of the Vesta (2018) dataset, provided by Vesta—a global fraud prevention company headquartered in the United States specializing in payment solutions and risk management. Originally released as part of a Kaggle competition focused on credit card fraud detection, this dataset comprises diverse transaction records, representing a rich source for exploring advanced fraud detection methods. Our analysis reveals that while the LightGBM algorithm generally yields higher predictive accuracy, the differences between the calculated AUCs of the two methods are not statistically significant. This underscores the importance of using inferential techniques to validate model performance differences in fraud detection.
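The AUC used for the comparison has a direct interpretation: it is the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one, with ties counted as one half. The sketch below computes it that way for two sets of scores. The scores and labels are invented and the model names are placeholders; nothing here comes from the Vesta dataset.

```python
def auc(scores, labels):
    """AUC as the share of (fraud, legitimate) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical fraud labels (1 = fraud) and scores from two models.
labels = [0, 0, 1, 1, 0, 1]
scores_model_a = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
scores_model_b = [0.10, 0.30, 0.60, 0.90, 0.20, 0.80]

auc_a = auc(scores_model_a, labels)
auc_b = auc(scores_model_b, labels)
print(round(auc_a, 4), round(auc_b, 4))
```

A gap between two AUC values like these is only a point estimate. The abstract's argument is that an inferential step is needed on top before one model is declared better; it does not name the test used, so none is shown here.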
54.

Performance Evaluation and Accuracy Improvement in Individual Record Linking Problems Using Decision Tree Algorithm in Machine Learning (Ministry of Science scientific article)

Keywords: Machine Learning Record Linkage Decision tree Performance Evaluation

Views: 57 Downloads: 41
Record linkage is vital for consolidating data from different sources, particularly in Persian records where diverse data structures and formats present challenges. To tackle these complexities, an expert system with decision tree algorithms is crucial for ensuring precise record linkage and data aggregation. Adaptation operations are created based on predefined rules by incorporating decision trees into an expert system framework, simplifying the aggregation of disparate data sources. This method surpasses traditional approaches like IF-THEN rules in effectiveness and ease of use and improves accessibility for non-technical users due to its intuitive nature. Integrating probabilistic record linkage results into the decision tree model within the expert system automates the linkage process, allowing users to customize string metrics and thresholds for optimal outcomes. The model’s accuracy rate of over 95% on test data highlights its effectiveness in predicting and adjusting to data variations, confirming its reliability in various record linkage scenarios. The innovative utilization of machine learning decision trees alongside probabilistic record linkage in an expert system represents a significant advancement in the field, providing a robust solution for data aggregation in intricate environments and large-scale projects involving Persian records. Combining decision tree algorithms and probabilistic record linkage within an expert system offers a powerful tool for handling complex data integration tasks. This approach not only streamlines the process of consolidating diverse data sources but also enhances the accuracy and efficiency of record linkage operations. By leveraging machine learning techniques and automated decision-making processes, organizations can achieve significant improvements in data quality and consistency, paving the way for more reliable and insightful analytical results in implementing statistical registers.
In conclusion, integrating decision trees and probabilistic record linkage in an expert system represents a cutting-edge solution for addressing data aggregation challenges in Persian records and beyond.
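The customizable string metrics and thresholds mentioned above can be pictured with a small sketch: two records are compared field by field, the similarities are averaged, and two cut-offs sort the pair into match, possible match, or non-match. This only illustrates the threshold idea, not the authors' expert system or its decision tree. The metric (`difflib.SequenceMatcher` from the Python standard library), the field names, the records and both threshold values are assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify_pair(rec_a, rec_b, fields, match_at=0.85, review_at=0.60):
    """Average the per-field similarities and apply two user-set thresholds."""
    scores = [similarity(rec_a[f], rec_b[f]) for f in fields]
    average = sum(scores) / len(scores)
    if average >= match_at:
        return "match", average
    if average >= review_at:
        return "possible match", average
    return "non-match", average

# Hypothetical records from two sources.
fields = ["name", "city"]
source_a = {"name": "Sara Ahmadi", "city": "Tehran"}
source_b = {"name": "sara ahmadi", "city": "TEHRAN"}
source_c = {"name": "qqqq", "city": "zzzz"}

same = classify_pair(source_a, source_b, fields)
different = classify_pair(source_a, source_c, fields)
print(same, different)
```

Raising `match_at` trades missed links for fewer false links, which is the kind of tuning the abstract says users can perform.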
55.

A Comprehensive Multidimensional Analysis of Mental Health Challenges in the Digital Age (Ministry of Science scientific article)

Views: 47 Downloads: 25
The digital era has introduced mental health challenges, especially for youth. Despite increasing awareness, comprehensive analyses of these challenges remain limited. This study collects and examines the prevalence of 15 key mental health challenges related to digital engagement, based on a sample of 555 participants. The prevalence of these challenges varied, with pressures related to parenting, hoarding, and inappropriate content being the most common, affecting 60.13%, 52.76%, and 45.39% of the participants, respectively. The research also highlights gender and age differences, noting that males report higher levels of issues like FOMO and Nomophobia compared to females. Adults (18+) face more severe challenges, such as memory decline, while younger individuals report fewer problems. Correlation analysis revealed significant relationships between several mental health challenges, such as Nomophobia and TAD (r = 0.68) and FOMO and TAD (r = 0.50), indicating that individuals experiencing one challenge are likely to face others. A decision tree analysis was used to predict mental health challenges by examining the relationships between different mental health conditions, uncovering specific patterns and rules associated with the occurrence of these challenges. Additionally, cluster analysis in this study identified distinct population segments, with 21% of individuals falling into a cluster that experiences severe mental health challenges. The findings suggest that a significant portion of the population is at risk for severe mental health issues, highlighting the need for targeted interventions.
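The r values quoted in this abstract are Pearson correlation coefficients. As a reminder of what that number measures, here is a minimal sketch that computes it from paired scores. The two score lists are invented questionnaire totals, not the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical per-participant scores on two scales.
nomophobia = [12, 18, 25, 30, 34, 41]
tad = [10, 15, 22, 24, 33, 35]
r = pearson_r(nomophobia, tad)
print(round(r, 2))
```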
56.

Analysis and Optimization of Customer Lifetime Value Prediction using Machine Learning and Deep Learning Models by RFM Techniques (Ministry of Science scientific article)

Views: 55 Downloads: 25
In today’s data-driven hospitality sector, customer interactions increasingly occur through digital platforms, generating extensive behavioral and transactional information. This study analyzes the prediction of Customer Lifetime Value (CLV) using three machine learning models (Linear Regression, Random Forest, and LightGBM) trained on features derived from hotel website interactions and booking records. After comprehensive data preprocessing, the models were evaluated using MAE, RMSE, and R² metrics. LightGBM achieved the highest predictive performance (R² = 0.504), followed by Random Forest (R² = 0.497), while Linear Regression underperformed (R² = 0.386), highlighting the advantages of non-linear models in modeling intricate customer patterns. Residual analyses confirmed LightGBM's stability and low bias across diverse customer profiles. Apart from prediction, the study applies Recency-Frequency-Monetary (RFM) analysis to segment customers into distinct value-based groups. These segments form the basis for tailored marketing strategies, allowing hotels to allocate resources more efficiently, enhance customer retention, and develop targeted campaigns aligned with customer potential. By integrating web-derived behavioral data with advanced modeling and segmentation, this research offers hotel managers practical tools for strategic planning in customer relationship management.
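RFM analysis reduces each customer's booking history to three numbers: days since the last booking (recency), number of bookings (frequency), and total spend (monetary). The sketch below builds that table and applies a simple two-way segmentation. The bookings, the reference date and the segmentation rule are all hypothetical; the abstract does not state how the study defined its segments.

```python
from datetime import date

# Hypothetical booking records: (customer id, booking date, amount spent).
bookings = [
    ("c1", date(2024, 1, 10), 120.0),
    ("c1", date(2024, 3, 1), 200.0),
    ("c2", date(2023, 6, 15), 80.0),
]

def rfm_table(bookings, today):
    """Recency (days), frequency (count) and monetary (total) per customer."""
    raw = {}
    for customer, day, amount in bookings:
        rec = raw.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)
        rec["frequency"] += 1
        rec["monetary"] += amount
    return {
        customer: {
            "recency": (today - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for customer, rec in raw.items()
    }

def segment(row, max_recency=60, min_frequency=2):
    """Illustrative rule with assumed thresholds: recent repeat guests are high value."""
    if row["recency"] <= max_recency and row["frequency"] >= min_frequency:
        return "high value"
    return "at risk"

table = rfm_table(bookings, today=date(2024, 3, 31))
segments = {customer: segment(row) for customer, row in table.items()}
print(table)
print(segments)
```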