LBRY Block Explorer

LBRY Claims • involution-inverting-the-inherence-of

Claim ID: e5e0d5ea2dab1585a5911571e2c9d65b6196cfd7
Created On: 8 May 2021 10:21:39 UTC
Cost: Free
Safe for Work: Yes
Involution: Inverting the Inherence of Convolution for Visual Recognition (Research Paper Explained)
#involution #computervision #attention

Convolutional Neural Networks (CNNs) have dominated computer vision for almost a decade by applying two fundamental principles: spatial agnosticism and channel-specific computation. Involution inverts these principles and presents a spatial-specific computation that is also channel-agnostic. The resulting involution operator and the RedNet architecture built on it are a compromise between classic convolutions and the newer local self-attention architectures, and compare favorably to both in terms of the computation-accuracy tradeoff.
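The inversion described above can be made concrete in a few lines. Below is a minimal NumPy sketch of the involution idea, not the paper's implementation: the kernel applied at each pixel is generated from the feature vector at that pixel (spatial-specific), and that same kernel is shared across all channels (channel-agnostic). The `kernel_gen` argument is a hypothetical stand-in for the paper's kernel-generation function, which in RedNet is a small bottleneck network.

```python
import numpy as np

def involution(x, kernel_gen, K=3):
    """Toy involution over a (C, H, W) feature map.

    kernel_gen: maps a length-C feature vector to K*K kernel weights;
    it stands in for the learned kernel-generation network in RedNet.
    """
    C, H, W = x.shape
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            # Spatial-specific: the kernel is conditioned on the
            # input feature at this very location (i, j).
            k = np.asarray(kernel_gen(x[:, i, j])).reshape(K, K)
            patch = xp[:, i:i + K, j:j + K]  # (C, K, K) neighborhood
            # Channel-agnostic: one kernel is applied to every channel.
            out[:, i, j] = (patch * k).sum(axis=(1, 2))
    return out
```

A standard convolution does the opposite on both axes: its kernel is the same at every location but differs per channel. The paper additionally generalizes this to channel groups, which the sketch omits for brevity.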

OUTLINE:
0:00 - Intro & Overview
3:00 - Principles of Convolution
10:50 - Towards spatial-specific computations
17:00 - The Involution Operator
20:00 - Comparison to Self-Attention
25:15 - Experimental Results
30:30 - Comments & Conclusion

Paper: https://arxiv.org/abs/2103.06255
Code: https://github.com/d-li14/involution

Abstract:
Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision. In this work, we rethink the inherent principles of standard convolution for vision tasks, specifically spatial-agnostic and channel-specific. Instead, we present a novel atomic operation for deep neural networks by inverting the aforementioned design principles of convolution, coined as involution. We additionally demystify the recent popular self-attention operator and subsume it into our involution family as an over-complicated instantiation. The proposed involution operator could be leveraged as fundamental bricks to build the new generation of neural networks for visual recognition, powering different deep learning models on several prevalent benchmarks, including ImageNet classification, COCO detection and segmentation, together with Cityscapes segmentation. Our involution-based models improve the performance of convolutional baselines using ResNet-50 by up to 1.6% top-1 accuracy, 2.5% and 2.4% bounding box AP, and 4.7% mean IoU absolutely while compressing the computational cost to 66%, 65%, 72%, and 57% on the above benchmarks, respectively. Code and pre-trained models for all the tasks are available at this https URL.

Authors: Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the
...
https://www.youtube.com/watch?v=pH2jZun8MoY
Content Type: video/mp4
Language: English
