Multi-Task Learning can be very challenging when the gradients of different tasks differ severely in magnitude or point in conflicting directions. PCGrad mitigates this problem by projecting conflicting gradients onto one another's normal planes while still retaining optimality guarantees.
Abstract: While deep learning and deep reinforcement learning (RL) systems have demonstrated impressive results in domains such as image classification, game playing, and robotic control, data efficiency remains a major challenge. Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks to enable more efficient learning. However, the multi-task setting presents a number of optimization challenges, making it difficult to realize large efficiency gains compared to learning tasks independently. The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood. In this work, we identify a set of three conditions of the multi-task optimization landscape that cause detrimental gradient interference, and develop a simple yet general approach for avoiding such interference between task gradients. We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient. On a series of challenging multi-task supervised and multi-task RL problems, this approach leads to substantial gains in efficiency and performance. Further, it is model-agnostic and can be combined with previously-proposed multi-task architectures for enhanced performance.
Authors: Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
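The projection step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under my own naming, not the authors' released code: for each pair of task gradients with a negative inner product, the first gradient is projected onto the normal plane of the second.

```python
import numpy as np

def pcgrad(task_grads):
    """Minimal sketch of the PCGrad projection step.

    task_grads: list of 1-D gradient vectors, one per task.
    Returns the summed update after projecting away conflicts."""
    projected = [g.astype(float).copy() for g in task_grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(task_grads):
            if i == j:
                continue
            dot = g_i @ g_j
            if dot < 0:  # conflicting gradients: project g_i onto
                g_i -= dot / (g_j @ g_j) * g_j  # the normal plane of g_j
    return sum(projected)
```

For two conflicting gradients such as [1, 0] and [-1, 1], each projected gradient ends up orthogonal to the task it conflicted with, so neither task's update is actively undone by the other.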
Yann LeCun points out an instance of dataset bias and proposes a sensible solution. People are not happy about it.
Original Tweet: https://twitter.com/ylecun/status/1274782757907030016
ERRATA:
- My specific example of the L1 regularizer with respect to Porsches and Ferraris does not actually work in this particular case. What I meant was a general sparsity-inducing regularizer.
- When I claim that an L1 regularizer would make the problem worse, this only holds in certain circumstances, for example when the data is Gaussian iid.
Thumbnail: https://commons.wikimedia.org/wiki/File:Yann_LeCun_-_2018_(cropped).jpg by Jérémy Barande / Ecole polytechnique Université Paris-Saclay / CC BY-SA 2.0
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
...
https://www.youtube.com/watch?v=n1SXlK5rhR8
#ai #meta #languagemodel
LLaMA is a series of large language models from 7B to 65B parameters, trained by Meta AI. The models are trained for longer on more data, showing that something like GPT-3 can be outperformed by significantly smaller models when trained this way. Meta also releases the trained models to the research community.
OUTLINE:
0:00 - Introduction & Paper Overview
4:30 - Rant on Open-Sourcing
8:05 - Training Data
12:40 - Training Hyperparameters
14:50 - Architecture Modifications
17:10 - Optimizer
19:40 - Efficient Implementation
26:15 - Main Results
38:00 - More Completions
40:00 - Conclusion
Paper: https://arxiv.org/abs/2302.13971
Website: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
Abstract:
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
...
https://www.youtube.com/watch?v=E5OnoYF2oAk
Google turned the anti-bias dial up to 11 on their new Gemini Pro model.
References:
https://developers.googleblog.com/2024/02/gemini-15-available-for-private-preview-in-google-ai-studio.html
https://blog.google/technology/developers/gemma-open-models/?utm_source=tw
https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf
https://twitter.com/ClementDelangue/status/1760324815888486668?t=spXd7Oq_cSrRN2A-3r6gnQ&s=09
https://twitter.com/paulg/status/1760078920135872716?t=PVZkHQA_p7GxmeUX0hcZ_Q&s=09
https://twitter.com/yoavgo/status/1760445342691016811/photo/3
https://twitter.com/alex_peys/status/1760327435890135279/photo/2
https://twitter.com/woke8yearold/status/1760310705142558781/photo/1
https://twitter.com/stratejake/status/1760333904857497650?t=Z3BZOBaLI1EYAJ-CBAMNEg&s=09
https://twitter.com/JohnLu0x/status/1760066875583816003?t=Z3BZOBaLI1EYAJ-CBAMNEg&s=09
https://twitter.com/IMAO_/status/1760093853430710557?t=0eNmoTuvYZl9HQRaUBOKNw&s=09
https://twitter.com/WallStreetSilv/status/1760474958151426340?t=6k4VwKFvciw2VoDc70Tl2A&s=09
https://twitter.com/JackK/status/1760334258722250785
https://twitter.com/TRHLofficial/status/1760485063941149100?t=hx48DQd64JbVxZ3OzhD0wg&s=09
https://twitter.com/gordic_aleksa/status/1760266452475494828?t=VZ2lX_v-KrY4Thu4FvDh4w&s=09
https://twitter.com/benthompson/status/1760452419627233610?t=qR9D9KDC1axOx3gDBKKc2Q&s=09
https://twitter.com/altryne/status/1760358916624719938?t=PVZkHQA_p7GxmeUX0hcZ_Q&s=09
https://twitter.com/pmarca/status/1760503344035180601?t=6k4VwKFvciw2VoDc70Tl2A&s=09
...
https://www.youtube.com/watch?v=Fr6Teh_ox-8
#stablediffusion #ai #stabilityai
OUTLINE:
0:00 - Intro
1:30 - What is Stability AI?
3:45 - Where does the money come from?
5:20 - Is this the CERN of AI?
6:15 - Who gets access to the resources?
8:00 - What is Stable Diffusion?
11:40 - What if your model produces bad outputs?
14:20 - Do you employ people?
16:35 - Can you prevent the corruption of profit?
19:50 - How can people find you?
22:45 - Final thoughts, let's destroy PowerPoint
...
https://www.youtube.com/watch?v=YQ2QtKcK2dA
We present a stochastic non-autoregressive RNN that does not require teacher-forcing for training. The content is based on our 2018 NeurIPS paper:
Deep State Space Models for Unconditional Word Generation
https://arxiv.org/abs/1806.04550
...
https://www.youtube.com/watch?v=_PyusGsbBPY
Many object detectors focus on locating the center of the object they want to find. However, this leaves them with the secondary problem of determining the extent of the bounding box, leading to undesirable workarounds like anchor boxes. This paper instead directly detects the top-left and the bottom-right corners of objects independently, along with embeddings that allow the two to be matched later into a complete bounding box. For this, a new pooling method, called corner pooling, is introduced.
OUTLINE:
0:00 - Intro & High-Level Overview
1:40 - Object Detection
2:40 - Pipeline I - Hourglass
4:00 - Heatmap & Embedding Outputs
8:40 - Heatmap Loss
10:55 - Embedding Loss
14:35 - Corner Pooling
20:40 - Experiments
Paper: https://arxiv.org/abs/1808.01244
Code: https://github.com/princeton-vl/CornerNet
Abstract:
We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing one-stage detectors.
Authors: Hei Law, Jia Deng
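As a rough sketch of the corner-pooling idea for the top-left corner (a hypothetical NumPy version, not the paper's implementation): each location takes the maximum over all features to its right and the maximum over all features below it, and sums the two. This lets a corner location accumulate evidence from object boundaries that are far away from the corner itself.

```python
import numpy as np

def top_left_corner_pool(x):
    """Corner pooling for top-left corners on an (H, W) feature map:
    running max right-to-left along each row, plus running max
    bottom-to-top along each column."""
    horiz = np.maximum.accumulate(x[:, ::-1], axis=1)[:, ::-1]
    vert = np.maximum.accumulate(x[::-1, :], axis=0)[::-1, :]
    return horiz + vert
```

The bottom-right variant scans in the opposite directions (left-to-right along rows and top-to-bottom along columns).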
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
...
https://www.youtube.com/watch?v=CA8JPbJ75tY
My thoughts on the let-the-young-get-infected argument.
https://medium.com/amnon-shashua/can-we-contain-covid-19-without-locking-down-the-economy-2a134a71873f
Abstract:
In this article, we present an analysis of a risk-based selective quarantine model where the population is divided into low and high-risk groups. The high-risk group is quarantined until the low-risk group achieves herd-immunity. We tackle the question of whether this model is safe, in the sense that the health system can contain the number of low-risk people that require severe ICU care (such as life support systems).
Authors: Shai Shalev-Shwartz, Amnon Shashua
...
https://www.youtube.com/watch?v=XdpF9ZixIbI
#chai #mlnews #nvidia
Follow Saynam here:
YouTube: https://www.youtube.com/c/ChaiTimeDataScience
Twitter: https://twitter.com/bhutanisanyam1
Apple Podcasts: https://podcasts.apple.com/us/podcast/chai-time-data-science/id1473685440?uo=4
LinkedIn: https://www.linkedin.com/in/sanyambhutani/
Spotify: https://open.spotify.com/show/7IbEWJjeimwddhOZqWe0G1
Anchor.fm RSS: https://anchor.fm/s/c19772c/podcast/rss
Outline:
0:00 - Intro & Overview
1:30 - Amazon's MMO may destroy gaming GPUs
2:40 - OpenAI pivots away from Robotics
3:35 - Google parent Alphabet launches Intrinsic
4:55 - AI learns how vegetables taste
5:55 - NASA uses AI to better understand the sun
6:50 - Man used AI to bring back deceased fiancee
7:45 - Robot collision sparks warehouse fire
8:20 - AI deduces patients' racial identities from medical records
9:40 - AlphaFold protein structure database
10:15 - ICCV BEHAVIOR challenge
11:05 - IBM, MIT, Harvard release Common Sense database
11:35 - High quality image generation using diffusion models
12:50 - Conclusion
References:
1 Amazon’s new MMO may be bricking Nvidia 3090s
https://www.theverge.com/2021/7/21/22587616/amazon-games-new-world-nvidia-rtx-3090-bricked-evga-closed-beta
https://www.youtube.com/watch?v=KLyNFrKyG74
2 OpenAI pivots away from robotics
https://venturebeat.com/2021/07/23/ai-weekly-openais-pivot-from-robotics-acknowledges-the-power-of-simulation/
3 Google parent Alphabet launches Intrinsic: a new company to build software for industrial robots
https://www.theverge.com/2021/7/23/22590109/google-intrinsic-industrial-robotics-company-software
Introducing Intrinsic
https://blog.x.company/introducing-intrinsic-1cf35b87651
https://x.company/projects/intrinsic/
4 AI Is Learning To Understand How Vegetables Taste
https://www.forbes.com/sites/jenniferhicks/2021/07/20/ai-is-learning-to-understand-how-vegetables-taste/?sh=73e6f646e1b2
5 Artificial Intelligence Helps Improve NASA’s Eyes on the Sun
https://www.nasa.gov/feature/goddard/2021/artificial-intelligence-helps-improve-nasa-s-eyes-on-the-sun
6 A man used AI to bring back his deceased fiancé. But the creators of the tech warn it could be dangerous
https://www.businessinsider.co.za/man-used-ai-to-talk-to-late-fiance-experts-warn-tech-could-be-misused-2021-7
7 Robot collision at Ocado warehouse near London sparks fire, delaying customer orders
https://www.theverge.com/2021/7/18/22582454/robot-collision-ocado-warehouse-england-fire-delayed-orders
8 Reading Race: AI Recognizes Patient’s Racial Identity In Medical Images
https://arxiv.org/pdf/2107.10356.pdf
9 AlphaFold Protein Structure Database
https://alphafold.ebi.ac.uk
https://www.theverge.com/2021/7/22/22586578/deepmind-alphafold-ai-protein-folding-human-proteome-released-for-free
10 Behavior Challenge
http://svl.stanford.edu/behavior/challenge.html
11 Researchers from
...
https://www.youtube.com/watch?v=4xklF7PZ-BY
VAEs have traditionally been hard to train at high resolutions and unstable when going deep with many layers. In addition, VAE samples are often blurrier and less crisp than those from GANs. This paper details all the engineering choices necessary to successfully train a deep hierarchical VAE that exhibits global consistency and astounding sharpness at high resolutions.
OUTLINE:
0:00 - Intro & Overview
1:55 - Variational Autoencoders
8:25 - Hierarchical VAE Decoder
12:45 - Output Samples
15:00 - Hierarchical VAE Encoder
17:20 - Engineering Decisions
22:10 - KL from Deltas
26:40 - Experimental Results
28:40 - Appendix
33:00 - Conclusion
Paper: https://arxiv.org/abs/2007.03898
Abstract:
Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ as shown in Fig. 1. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256×256 pixels.
Authors: Arash Vahdat, Jan Kautz
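The "KL from Deltas" point in the outline, which follows from the residual parameterization of Normal distributions mentioned in the abstract, can be illustrated with a small sketch (my own rendition under the assumption that the posterior is parameterized relative to the prior; the function name is hypothetical, not from the official code): when the posterior is N(mu_p + dmu, (sigma_p * dsigma)^2) and the prior is N(mu_p, sigma_p^2), the KL divergence depends only on the deltas and the prior's sigma.

```python
import numpy as np

def kl_from_deltas(delta_mu, log_delta_sigma, prior_sigma):
    """KL( N(mu_p + dmu, (sigma_p * dsigma)^2) || N(mu_p, sigma_p^2) ).

    Only the residual (delta) parameters and the prior's sigma appear;
    the prior mean mu_p cancels out entirely."""
    delta_sigma = np.exp(log_delta_sigma)
    return 0.5 * (delta_mu ** 2 / prior_sigma ** 2
                  + delta_sigma ** 2
                  - 2.0 * log_delta_sigma
                  - 1.0)
```

When the deltas vanish (dmu = 0, dsigma = 1), the posterior equals the prior and the KL is exactly zero, which makes this residual form well behaved early in training.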
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
...
https://www.youtube.com/watch?v=x6T1zMSE4Ts