
Transformer and MLP Paper Reading


There are currently three mainstream mechanisms for vision: CNNs, attention, and MLP-Mixer. The latter two, both proposed by Google, are Transformer-style, token-based architectures, where a token is the vector obtained by linearly projecting a flattened image patch. How to design the token-mixing operation is the key question for the various MLP-based architectures. Moreover, MLP-Mixer can be seen as a simplified form of attention: it uses plain MLPs to perform the cross-token mixing that attention would otherwise provide.
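The patch-to-token and token-mixing ideas above can be sketched in a few lines. This is a minimal NumPy illustration of one Mixer-style block (LayerNorm and GELU are omitted for brevity, and all weight names are illustrative, not from any paper's code): the token-mixing MLP acts across the patch dimension, and the channel-mixing MLP acts across the channel dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, channels, hidden = 16, 32, 64

def mlp(x, w1, w2):
    # two-layer MLP with ReLU (the paper uses GELU; ReLU keeps the sketch simple)
    return np.maximum(x @ w1, 0) @ w2

def mixer_block(x, w_tok1, w_tok2, w_ch1, w_ch2):
    # x: (num_patches, channels) — one token per image patch
    # token-mixing: the same MLP is shared across channels and mixes patches,
    # so we transpose, apply the MLP over the patch dimension, transpose back
    x = x + mlp(x.T, w_tok1, w_tok2).T
    # channel-mixing: the same MLP is shared across patches and mixes channels
    return x + mlp(x, w_ch1, w_ch2)

# tokens as they would come out of the patch-embedding linear projection
x = rng.standard_normal((num_patches, channels))
w_tok1 = rng.standard_normal((num_patches, hidden)) * 0.1
w_tok2 = rng.standard_normal((hidden, num_patches)) * 0.1
w_ch1 = rng.standard_normal((channels, hidden)) * 0.1
w_ch2 = rng.standard_normal((hidden, channels)) * 0.1

out = mixer_block(x, w_tok1, w_tok2, w_ch1, w_ch2)
print(out.shape)  # (16, 32): shape is preserved, like a Transformer block
```

Note that, unlike attention, the token-mixing weights here are fixed after training and tied to a specific number of patches; the MLP-based papers listed below largely differ in how they relax or restructure this token-mixing step.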

[Paper Reading] End-to-End Object Detection with Transformers

[Paper Reading] Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

[Paper Reading] Efficient Transformers: A Survey

[Paper Reading] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Vision Transformer

[Paper Reading] A Survey on Visual Transformer

[Paper Reading] DeepViT: Towards Deeper Vision Transformer

[Paper Reading] ViViT: A Video Vision Transformer

[Paper Reading] MLP-Mixer: An all-MLP Architecture for Vision

[Paper Reading] Pay Attention to MLPs

[Paper Reading] A Survey of Transformers

[Paper Reading] AS-MLP: An Axial Shifted MLP Architecture for Vision

[Paper Reading] CycleMLP: A MLP-Like Architecture for Dense Prediction

[Paper Reading] ConvMLP: Hierarchical Convolutional MLPs for Vision

[Paper Reading] A Survey of Visual Transformers

[Paper Reading] Attention Mechanisms in Computer Vision: A Survey


Author: JiJunhao
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit JiJunhao when reposting!