Powering Multimodal Intelligence for Video Search - Enggist