Multimodal Deep Learning in the Real World
Many real-world business problems are multimodal in nature and would benefit from combining text, imagery, audio, and numerical data. Recently, there has been a surge in powerful deep learning models that fuse multiple modalities of data; however, fine-tuning, deploying, and versioning these models remains challenging for most companies. This tutorial will discuss some of the latest research in the field and then walk through several real-world examples of fine-tuning, deploying, and serving multimodal deep learning models using open-source frameworks like HuggingFace, Kubeflow, and Django.