# MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

> Research article (2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)) · cited 78× · AI/ML

**Wikidata**: [openalex:W4386071468](https://www.wikidata.org/wiki/openalex:W4386071468)  
**Source**: https://4ort.xyz/entity/mist-multi-modal-iterative-spatial-temporal-transformer-for-long-form-video-question-answering