Multi-Task Learning can be very challenging when the gradients of different tasks differ severely in magnitude or point in conflicting directions. PCGrad mitigates this problem by projecting conflicting gradients onto one another's normal planes while still retaining optimality guarantees.
Abstract: While deep learning and deep reinforcement learning (RL) systems have demonstrated impressive results in domains such as image classification, game playing, and robotic control, data efficiency remains a major challenge. Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks to enable more efficient learning. However, the multi-task setting presents a number of optimization challenges, making it difficult to realize large efficiency gains compared to learning tasks independently. The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood. In this work, we identify a set of three conditions of the multi-task optimization landscape that cause detrimental gradient interference, and develop a simple yet general approach for avoiding such interference between task gradients. We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient. On a series of challenging multi-task supervised and multi-task RL problems, this approach leads to substantial gains in efficiency and performance. Further, it is model-agnostic and can be combined with previously-proposed multi-task architectures for enhanced performance.
Authors: Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
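The projection step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under my own naming, not the authors' released code: for each pair of task gradients with a negative inner product, the first gradient is projected onto the normal plane of the second.

```python
import numpy as np

def pcgrad(task_grads):
    """Minimal sketch of the PCGrad projection step.

    task_grads: list of 1-D gradient vectors, one per task.
    Returns the summed update after projecting away conflicts."""
    projected = [g.astype(float).copy() for g in task_grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(task_grads):
            if i == j:
                continue
            dot = g_i @ g_j
            if dot < 0:  # conflicting gradients: project g_i onto
                g_i -= dot / (g_j @ g_j) * g_j  # the normal plane of g_j
    return sum(projected)
```

For two conflicting gradients such as [1, 0] and [-1, 1], each projected gradient ends up orthogonal to the task it conflicted with, so neither task's update is actively undone by the other.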
Yann LeCun points out an instance of dataset bias and proposes a sensible solution. People are not happy about it.
Original Tweet: https://twitter.com/ylecun/status/1274782757907030016
ERRATA:
- My specific example of the L1 regularizer with respect to Porsches and Ferraris does not actually work in this particular case. What I meant was a general sparsity-inducing regularizer.
- When I claim that an L1 regularizer would make the problem worse, this only holds in certain circumstances, for example when the data is Gaussian iid.
Thumbnail: https://commons.wikimedia.org/wiki/File:Yann_LeCun_-_2018_(cropped).jpg by Jérémy Barande / Ecole polytechnique Université Paris-Saclay / CC BY-SA 2.0
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
...
https://www.youtube.com/watch?v=n1SXlK5rhR8
#ai #meta #languagemodel
LLaMA is a series of large language models from 7B to 65B parameters, trained by Meta AI. The models are trained for longer on more data, showing that something like GPT-3 can be outperformed by significantly smaller models when trained this way. Meta also releases the trained models to the research community.
OUTLINE:
0:00 - Introduction & Paper Overview
4:30 - Rant on Open-Sourcing
8:05 - Training Data
12:40 - Training Hyperparameters
14:50 - Architecture Modifications
17:10 - Optimizer
19:40 - Efficient Implementation
26:15 - Main Results
38:00 - More Completions
40:00 - Conclusion
Paper: https://arxiv.org/abs/2302.13971
Website: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
Abstract:
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
...
https://www.youtube.com/watch?v=E5OnoYF2oAk
Google turned the anti-bias dial up to 11 on their new Gemini Pro model.
References:
https://developers.googleblog.com/2024/02/gemini-15-available-for-private-preview-in-google-ai-studio.html
https://blog.google/technology/developers/gemma-open-models/?utm_source=tw
https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf
https://twitter.com/ClementDelangue/status/1760324815888486668?t=spXd7Oq_cSrRN2A-3r6gnQ&s=09
https://twitter.com/paulg/status/1760078920135872716?t=PVZkHQA_p7GxmeUX0hcZ_Q&s=09
https://twitter.com/yoavgo/status/1760445342691016811/photo/3
https://twitter.com/alex_peys/status/1760327435890135279/photo/2
https://twitter.com/woke8yearold/status/1760310705142558781/photo/1
https://twitter.com/stratejake/status/1760333904857497650?t=Z3BZOBaLI1EYAJ-CBAMNEg&s=09
https://twitter.com/JohnLu0x/status/1760066875583816003?t=Z3BZOBaLI1EYAJ-CBAMNEg&s=09
https://twitter.com/IMAO_/status/1760093853430710557?t=0eNmoTuvYZl9HQRaUBOKNw&s=09
https://twitter.com/WallStreetSilv/status/1760474958151426340?t=6k4VwKFvciw2VoDc70Tl2A&s=09
https://twitter.com/JackK/status/1760334258722250785
https://twitter.com/TRHLofficial/status/1760485063941149100?t=hx48DQd64JbVxZ3OzhD0wg&s=09
https://twitter.com/gordic_aleksa/status/1760266452475494828?t=VZ2lX_v-KrY4Thu4FvDh4w&s=09
https://twitter.com/benthompson/status/1760452419627233610?t=qR9D9KDC1axOx3gDBKKc2Q&s=09
https://twitter.com/altryne/status/1760358916624719938?t=PVZkHQA_p7GxmeUX0hcZ_Q&s=09
https://twitter.com/pmarca/status/1760503344035180601?t=6k4VwKFvciw2VoDc70Tl2A&s=09
...
https://www.youtube.com/watch?v=Fr6Teh_ox-8
#stablediffusion #ai #stabilityai
OUTLINE:
0:00 - Intro
1:30 - What is Stability AI?
3:45 - Where does the money come from?
5:20 - Is this the CERN of AI?
6:15 - Who gets access to the resources?
8:00 - What is Stable Diffusion?
11:40 - What if your model produces bad outputs?
14:20 - Do you employ people?
16:35 - Can you prevent the corruption of profit?
19:50 - How can people find you?
22:45 - Final thoughts, let's destroy PowerPoint
...
https://www.youtube.com/watch?v=YQ2QtKcK2dA
We present a stochastic non-autoregressive RNN that does not require teacher-forcing for training. The content is based on our 2018 NeurIPS paper:
Deep State Space Models for Unconditional Word Generation
https://arxiv.org/abs/1806.04550
...
https://www.youtube.com/watch?v=_PyusGsbBPY
Many object detectors focus on locating the center of the object they want to find. However, this leaves them with the secondary problem of determining the extent of the bounding box, leading to undesirable workarounds like anchor boxes. This paper instead directly detects the top-left and the bottom-right corners of objects independently, along with embeddings that allow the two to be matched later into a complete bounding box. For this, a new pooling method, called corner pooling, is introduced.
OUTLINE:
0:00 - Intro & High-Level Overview
1:40 - Object Detection
2:40 - Pipeline I - Hourglass
4:00 - Heatmap & Embedding Outputs
8:40 - Heatmap Loss
10:55 - Embedding Loss
14:35 - Corner Pooling
20:40 - Experiments
Paper: https://arxiv.org/abs/1808.01244
Code: https://github.com/princeton-vl/CornerNet
Abstract:
We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing one-stage detectors.
Authors: Hei Law, Jia Deng
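As a rough sketch of the corner-pooling idea for the top-left corner (a hypothetical NumPy version, not the paper's implementation): each location takes the maximum over all features to its right and the maximum over all features below it, and sums the two. This lets a corner location accumulate evidence from object boundaries that are far away from the corner itself.

```python
import numpy as np

def top_left_corner_pool(x):
    """Corner pooling for top-left corners on an (H, W) feature map:
    running max right-to-left along each row, plus running max
    bottom-to-top along each column."""
    horiz = np.maximum.accumulate(x[:, ::-1], axis=1)[:, ::-1]
    vert = np.maximum.accumulate(x[::-1, :], axis=0)[::-1, :]
    return horiz + vert
```

The bottom-right variant scans in the opposite directions (left-to-right along rows and top-to-bottom along columns).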
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
...
https://www.youtube.com/watch?v=CA8JPbJ75tY
My thoughts on the let-the-young-get-infected argument.
https://medium.com/amnon-shashua/can-we-contain-covid-19-without-locking-down-the-economy-2a134a71873f
Abstract:
In this article, we present an analysis of a risk-based selective quarantine model where the population is divided into low and high-risk groups. The high-risk group is quarantined until the low-risk group achieves herd-immunity. We tackle the question of whether this model is safe, in the sense that the health system can contain the number of low-risk people that require severe ICU care (such as life support systems).
Authors: Shai Shalev-Shwartz, Amnon Shashua
...
https://www.youtube.com/watch?v=XdpF9ZixIbI
#chai #mlnews #nvidia
Follow Saynam here:
YouTube: https://www.youtube.com/c/ChaiTimeDataScience
Twitter: https://twitter.com/bhutanisanyam1
Apple Podcasts: https://podcasts.apple.com/us/podcast/chai-time-data-science/id1473685440?uo=4
LinkedIn: https://www.linkedin.com/in/sanyambhutani/
Spotify: https://open.spotify.com/show/7IbEWJjeimwddhOZqWe0G1
Anchor.fm RSS: https://anchor.fm/s/c19772c/podcast/rss
Outline:
0:00 - Intro & Overview
1:30 - Amazon's MMO may destroy gaming GPUs
2:40 - OpenAI pivots away from Robotics
3:35 - Google parent Alphabet launches Intrinsic
4:55 - AI learns how vegetables taste
5:55 - NASA uses AI to better understand the sun
6:50 - Man used AI to bring back deceased fiancee
7:45 - Robot collision sparks warehouse fire
8:20 - AI deduces patients' racial identities from medical records
9:40 - AlphaFold protein structure database
10:15 - ICCV BEHAVIOR challenge
11:05 - IBM, MIT, Harvard release Common Sense database
11:35 - High quality image generation using diffusion models
12:50 - Conclusion
References:
1 Amazon’s new MMO may be bricking Nvidia 3090s
https://www.theverge.com/2021/7/21/22587616/amazon-games-new-world-nvidia-rtx-3090-bricked-evga-closed-beta
https://www.youtube.com/watch?v=KLyNFrKyG74
2 OpenAI pivots away from robotics
https://venturebeat.com/2021/07/23/ai-weekly-openais-pivot-from-robotics-acknowledges-the-power-of-simulation/
3 Google parent Alphabet launches Intrinsic: a new company to build software for industrial robots
https://www.theverge.com/2021/7/23/22590109/google-intrinsic-industrial-robotics-company-software
Introducing Intrinsic
https://blog.x.company/introducing-intrinsic-1cf35b87651
https://x.company/projects/intrinsic/
4 AI Is Learning To Understand How Vegetables Taste
https://www.forbes.com/sites/jenniferhicks/2021/07/20/ai-is-learning-to-understand-how-vegetables-taste/?sh=73e6f646e1b2
5 Artificial Intelligence Helps Improve NASA’s Eyes on the Sun
https://www.nasa.gov/feature/goddard/2021/artificial-intelligence-helps-improve-nasa-s-eyes-on-the-sun
6 A man used AI to bring back his deceased fiancé. But the creators of the tech warn it could be dangerous
https://www.businessinsider.co.za/man-used-ai-to-talk-to-late-fiance-experts-warn-tech-could-be-misused-2021-7
7 Robot collision at Ocado warehouse near London sparks fire, delaying customer orders
https://www.theverge.com/2021/7/18/22582454/robot-collision-ocado-warehouse-england-fire-delayed-orders
8 Reading Race: AI Recognizes Patient’s Racial Identity In Medical Images
https://arxiv.org/pdf/2107.10356.pdf
9 AlphaFold Protein Structure Database
https://alphafold.ebi.ac.uk
https://www.theverge.com/2021/7/22/22586578/deepmind-alphafold-ai-protein-folding-human-proteome-released-for-free
10 Behavior Challenge
http://svl.stanford.edu/behavior/challenge.html
11 Researchers from
...
https://www.youtube.com/watch?v=4xklF7PZ-BY
VAEs have traditionally been hard to train at high resolutions and unstable when going deep with many layers. In addition, VAE samples are often blurrier and less crisp than those from GANs. This paper details all the engineering choices necessary to successfully train a deep hierarchical VAE that exhibits global consistency and astounding sharpness at high resolutions.
OUTLINE:
0:00 - Intro & Overview
1:55 - Variational Autoencoders
8:25 - Hierarchical VAE Decoder
12:45 - Output Samples
15:00 - Hierarchical VAE Encoder
17:20 - Engineering Decisions
22:10 - KL from Deltas
26:40 - Experimental Results
28:40 - Appendix
33:00 - Conclusion
Paper: https://arxiv.org/abs/2007.03898
Abstract:
Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ as shown in Fig. 1. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256×256 pixels.
Authors: Arash Vahdat, Jan Kautz
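The "KL from Deltas" point in the outline, which follows from the residual parameterization of Normal distributions mentioned in the abstract, can be illustrated with a small sketch (my own rendition under the assumption that the posterior is parameterized relative to the prior; the function name is hypothetical, not from the official code): when the posterior is N(mu_p + dmu, (sigma_p * dsigma)^2) and the prior is N(mu_p, sigma_p^2), the KL divergence depends only on the deltas and the prior's sigma.

```python
import numpy as np

def kl_from_deltas(delta_mu, log_delta_sigma, prior_sigma):
    """KL( N(mu_p + dmu, (sigma_p * dsigma)^2) || N(mu_p, sigma_p^2) ).

    Only the residual (delta) parameters and the prior's sigma appear;
    the prior mean mu_p cancels out entirely."""
    delta_sigma = np.exp(log_delta_sigma)
    return 0.5 * (delta_mu ** 2 / prior_sigma ** 2
                  + delta_sigma ** 2
                  - 2.0 * log_delta_sigma
                  - 1.0)
```

When the deltas vanish (dmu = 0, dsigma = 1), the posterior equals the prior and the KL is exactly zero, which makes this residual form well behaved early in training.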
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
...
https://www.youtube.com/watch?v=x6T1zMSE4Ts