Emotion plays a vital role in human communication, shaping not only language but also vocal tone, facial expression, and body posture. In the context of emotionally expressive text generation, the lack of reliable evaluation metrics remains a key challenge. This paper introduces a two-step evaluation framework that uses embedding analogy-based metrics to assess the emotional expressiveness of large language models (LLMs). In the first step, we evaluate a model’s ability to neutralize the emotional content of a given text while preserving its semantic meaning. In the second step, we test the model’s capacity to re-inject the intended emotion into the neutralized text. Our experiments demonstrate that GPT-4.1 outperforms other models in both semantic retention and emotional reconstruction, while llama-3.3-70b-instruct performs best among open-source models. This work lays the foundation for future research on cross-modal affective computing, aiming to build emotionally intelligent agents capable of nuanced and empathetic communication across text, speech, and video.
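
To make the two-step idea concrete, the sketch below shows one plausible way such embedding analogy-based scores could be computed; it is an illustrative assumption rather than the exact metric used in this work. The embedding model name (all-MiniLM-L6-v2), the sentence-transformers dependency, and the specific score definitions (cosine similarity for semantic retention, and an analogy-style comparison of emotion-shift vectors for emotional reconstruction) are all assumptions for illustration.

```python
# Illustrative sketch (not the paper's exact metrics): score the two evaluation
# steps with sentence embeddings and cosine similarity.
from sentence_transformers import SentenceTransformer  # assumed dependency
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def semantic_retention(original: str, neutralized: str) -> float:
    """Step 1: how much of the original meaning the neutralized text keeps."""
    e_orig, e_neut = model.encode([original, neutralized])
    return cosine(e_orig, e_neut)


def emotional_reconstruction(original: str, neutralized: str, regenerated: str) -> float:
    """Step 2 (analogy-style): compare the emotion shift re-injected during
    regeneration with the shift removed during neutralization."""
    e_orig, e_neut, e_regen = model.encode([original, neutralized, regenerated])
    removed_shift = e_orig - e_neut    # emotion direction stripped in step 1
    restored_shift = e_regen - e_neut  # emotion direction added back in step 2
    return cosine(removed_shift, restored_shift)


# Toy usage with hypothetical model outputs
original = "I am absolutely thrilled about the results!"
neutralized = "I received the results."
regenerated = "I am so excited about the results!"
print(semantic_retention(original, neutralized))
print(emotional_reconstruction(original, neutralized, regenerated))
```

Under this reading, a high semantic-retention score rewards neutralizations that stay close to the original meaning, while the reconstruction score rewards regenerated text whose emotional shift points in the same direction as the emotion that was originally removed.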