In this research, we present pilot experiments to distil monolingual models from a jointly trained model for 102 languages (mBERT). We demonstrate that it is possible for the target language to outperform the original model, even with a basic distillation setup. We evaluate our methodology for 6 languages with varying amounts of resources and belonging to different language families.