The increasing adoption of large language models (LLMs) has intensified concerns about hallucinations: outputs that are syntactically fluent but factually incorrect. In this paper, we propose a method for detecting such hallucinations by evaluating the consistency of a model's responses to paraphrased versions of the same question. The underlying assumption is that if a model produces consistent answers across different paraphrases, its output is more likely to be accurate. To test this method, we developed a system that generates multiple paraphrases of each question and analyzes the consistency of the corresponding responses. Experiments were conducted with two LLMs, GPT-4o and LLaMA 3-70B Chat, on both Persian and English datasets. The method achieved an average accuracy of 99.5% for GPT-4o and 98% for LLaMA 3-70B, indicating that our approach is effective at identifying hallucination-free outputs across languages. Furthermore, by automating the consistency evaluation with an instruction-tuned language model, we enabled scalable and unbiased detection of semantic agreement across paraphrased responses.
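To make the pipeline concrete, the following is a minimal sketch of the paraphrase-consistency idea, not the paper's actual implementation. All function names (`generate_paraphrases`, `answer`, `same_meaning`), the number of paraphrases, and the decision threshold are assumptions introduced for illustration; in the paper the semantic-agreement judgment is delegated to an instruction-tuned language model.

```python
from typing import Callable, List

def consistency_score(
    question: str,
    generate_paraphrases: Callable[[str, int], List[str]],  # hypothetical: produces n paraphrases of the question
    answer: Callable[[str], str],                            # hypothetical: queries the LLM under test
    same_meaning: Callable[[str, str], bool],                # hypothetical: judge model decides semantic agreement
    n_paraphrases: int = 5,                                  # assumed value, not taken from the paper
) -> float:
    """Return the fraction of paraphrase answers that semantically agree with the original answer."""
    reference = answer(question)
    paraphrases = generate_paraphrases(question, n_paraphrases)
    agreements = [same_meaning(reference, answer(p)) for p in paraphrases]
    return sum(agreements) / len(agreements) if agreements else 0.0

def is_likely_hallucination(
    question: str,
    generate_paraphrases: Callable[[str, int], List[str]],
    answer: Callable[[str], str],
    same_meaning: Callable[[str, str], bool],
    threshold: float = 0.8,                                  # assumed cutoff, not taken from the paper
) -> bool:
    """Flag the response as a likely hallucination when agreement falls below the threshold."""
    score = consistency_score(question, generate_paraphrases, answer, same_meaning)
    return score < threshold
```

In this sketch the three callables stand in for the paraphrase generator, the model under evaluation, and the automated consistency judge; a low agreement score is treated as evidence of hallucination, mirroring the assumption stated above that consistent answers across paraphrases indicate a more reliable output.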