Multimodal sentiment analysis (MSA) traditionally assumes a unified emotional signal across modalities such as text, audio, and video. However, recent findings suggest that each modality may convey a distinct affective perspective. Motivated by perspectivist theories from cognitive science and natural language processing, this paper introduces Label Divergence Weighting (LDW), a modality-weighting strategy that dynamically adjusts trust in each modality based on its alignment with the overall sentiment label. The LDW framework leverages training-time supervision from the divergence between unimodal and multimodal sentiment annotations to learn modality reliability, and applies the learned weighting to unseen data without requiring unimodal labels at inference time. Integrated into a multitask variant of the Tensor Fusion Network (MTFN), the proposed LDW-MTFN model achieves state-of-the-art results on both the acted Chinese dataset CH-SIMS and the authentic English dataset UniC. Extensive experiments and ablation studies demonstrate the robustness and generalizability of LDW across datasets with different cultural, linguistic, and environmental characteristics.
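The core weighting idea can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact formulation: the exponential decay, the `temperature` parameter, and the function name `ldw_weights` are assumptions introduced here, and the actual LDW model learns these weights rather than computing them in closed form.

```python
import math

def ldw_weights(unimodal_labels, multimodal_label, temperature=1.0):
    """Hypothetical sketch of Label Divergence Weighting.

    Each modality receives a trust weight that decays with the absolute
    divergence between its unimodal sentiment label and the overall
    multimodal label, then weights are normalized to sum to one.
    """
    # Exponential decay in the label divergence (illustrative choice).
    scores = {
        modality: math.exp(-abs(y_m - multimodal_label) / temperature)
        for modality, y_m in unimodal_labels.items()
    }
    total = sum(scores.values())
    return {modality: s / total for modality, s in scores.items()}

# Training-time example with sentiment scores in [-1, 1]: the text label is
# closest to the overall label, so text receives the largest weight.
weights = ldw_weights(
    {"text": 0.8, "audio": -0.2, "video": 0.3},
    multimodal_label=0.7,
)
```

At inference time, per the abstract, no unimodal labels are available; in the full model the reliability estimates are learned during training and applied directly to new inputs.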