This technical report examines how machine translation (MT) quality is evaluated for LLM-based MT systems. It explains why automated quality metrics such as BLEU and COMET are needed: they provide rapid feedback during the development of MT engines. Although these metrics are essential for quick assessment, their scores often diverge from human evaluations, particularly for LLM output. The report details the strengths and weaknesses of COMET, noting its semantic focus as well as its shortcomings, such as penalizing creative paraphrases and failing to capture broader aspects of translation quality. It stresses the importance of human assessment alongside automated metrics, especially as the industry recognizes the limitations of current evaluation methods. The report concludes with insights from a discussion with COMET's author, which suggest that a comprehensive understanding of translation quality requires combining automated metrics with human evaluation.