Kyndryl interview question

Can we use a CNN in a multimodal architecture for image processing?