LBRY Block Explorer • Claim • weight-standardization-paper-explained

LBRY Claims • weight-standardization-paper-explained

f13bc4d1b8958c167572ff0024f2480c72c3152a

Published By

@yannickilcher

Created On

1 Mar 2021 12:15:43 UTC

Transaction ID

9bf32e44ad20af295b780aecc187d842b849997f8ab3f0ee8e79149cb355e249

Cost

Safe for Work

Free

Yes

Weight Standardization (Paper Explained)

It's common for neural networks to include data normalization such as BatchNorm or GroupNorm. This paper extends the normalization to also include the weights of the network. This surprisingly simple change leads to a boost in performance and - combined with GroupNorm - new state-of-the-art results.

https://arxiv.org/abs/1903.10520

Abstract:
In this paper, we propose Weight Standardization (WS) to accelerate deep network training. WS is targeted at the micro-batch training setting where each GPU typically has only 1-2 images for training. The micro-batch training setting is hard because small batch sizes are not enough for training networks with Batch Normalization (BN), while other normalization methods that do not rely on batch knowledge still have difficulty matching the performances of BN in large-batch training. Our WS ends this problem because when used with Group Normalization and trained with 1 image/GPU, WS is able to match or outperform the performances of BN trained with large batch sizes with only 2 more lines of code. In micro-batch training, WS significantly outperforms other normalization methods. WS achieves these superior results by standardizing the weights in the convolutional layers, which we show is able to smooth the loss landscape by reducing the Lipschitz constants of the loss and the gradients. The effectiveness of WS is verified on many tasks, including image classification, object detection, instance segmentation, video recognition, semantic segmentation, and point cloud recognition. The code is available here: this https URL.

Authors: Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
...
https://www.youtube.com/watch?v=p-zOeQCoG9c

Author

Content Type

Unspecified

video/mp4

Language

English

Open in LBRY

More from the publisher

Controlling

VIDEO

WORLD

world-models

lbry://@yannickilcher/world-models

Authors: David Ha, Jürgen Schmidhuber Abstract: We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. https://arxiv.org/abs/1803.10122 ... https://www.youtube.com/watch?v=dPsXxLyqpfs

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

ROBER

roberta-a-robustly-optimized-bert

lbry://@yannickilcher/roberta-a-robustly-optimized-bert

This paper shows that the original BERT model, if trained correctly, can outperform all of the improvements that have been proposed lately, raising questions about the necessity and reasoning behind these. Abstract: Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code. Authors: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov https://arxiv.org/abs/1907.11692 YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Minds: https://www.minds.com/ykilcher BitChute: https://www.bitchute.com/channel/10a5ui845DOJ/ ... https://www.youtube.com/watch?v=-MCYbmU9kfg

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

NEURA

neural-architecture-search-without

lbry://@yannickilcher/neural-architecture-search-without

#ai #research #machinelearning Neural Architecture Search is typically very slow and resource-intensive. A meta-controller has to train many hundreds or thousands of different models to find a suitable building plan. This paper proposes to use statistics of the Jacobian around data points to estimate the performance of proposed architectures at initialization. This method does not require training and speeds up NAS by orders of magnitude. OUTLINE: 0:00 - Intro & Overview 0:50 - Neural Architecture Search 4:15 - Controller-based NAS 7:35 - Architecture Search Without Training 9:30 - Linearization Around Datapoints 14:10 - Linearization Statistics 19:00 - NAS-201 Benchmark 20:15 - Experiments 34:15 - Conclusion & Comments Paper: https://arxiv.org/abs/2006.04647 Code: https://github.com/BayesWatch/nas-without-training Abstract: The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be extremely slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be remedied if we could infer a network's trained accuracy from its initial state. In this work, we examine how the linear maps induced by data points correlate for untrained network architectures in the NAS-Bench-201 search space, and motivate how this can be used to give a measure of modelling flexibility which is highly indicative of a network's trained performance. We incorporate this measure into a simple algorithm that allows us to search for powerful networks without any training in a matter of seconds on a single GPU. Code to reproduce our experiments is available at this https URL. Authors: Joseph Mellor, Jack Turner, Amos Storkey, Elliot J. Crowley Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yannic-kilcher Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/ If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar (preferred to Patreon): https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n ... https://www.youtube.com/watch?v=a6v92P0EbJc

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

[ML N

ml-news-de-biasing-gpt-3-rl-cracks-chip

lbry://@yannickilcher/ml-news-de-biasing-gpt-3-rl-cracks-chip

OUTLINE: 0:00 - Intro 0:30 - Google RL creates next-gen TPUs 2:15 - Facebook launches NetHack challenge 3:50 - OpenAI mitigates bias by fine-tuning 9:05 - Google AI releases browseable reconstruction of human cortex 9:50 - GPT-J 6B Transformer in JAX 12:00 - Tensorflow launches Forum 13:50 - Text style transfer from a single word 15:45 - ALiEn artificial life simulator My Video on Chip Placement: https://youtu.be/PDRtyrVskMU References: RL creates next-gen TPUs https://www.nature.com/articles/s41586-021-03544-w https://www.youtube.com/watch?v=PDRtyrVskMU Facebook launches NetHack challenge https://ai.facebook.com/blog/launching-the-nethack-challenge-at-neurips-2021/ Mitigating bias by fine-tuning https://openai.com/blog/improving-language-model-behavior/?s=09 Human Cortex 3D Reconstruction https://ai.googleblog.com/2021/06/a-browsable-petascale-reconstruction-of.html GPT-J: An open-source 6B transformer https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/ https://6b.eleuther.ai/ https://github.com/kingoflolz/mesh-transformer-jax/#gpt-j-6b Tensorflow launches "Forum" https://discuss.tensorflow.org/ Text style transfer from single word https://ai.facebook.com/blog/ai-can-now-emulate-text-style-in-images-in-one-shot-using-just-a-single-word/ ALiEn Life Simulator https://github.com/chrxh/alien Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yannic-kilcher Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/ BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n ... https://www.youtube.com/watch?v=Ihg4XDWOy68

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

LISTE

listening-to-you!-channel-update-(author

lbry://@yannickilcher/listening-to-you!-channel-update-(author

#mlnews #kilcher #withtheauthors Many of you have given me feedback on what you did and didn't like about the recent "with the authors" videos. Here's the result of that feedback and an outlook into the future. Merch: http://store.ykilcher.com Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yannic-kilcher LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n ... https://www.youtube.com/watch?v=cO1nSnsH_CQ

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

TALKI

talking-to-companies-at-icml19

lbry://@yannickilcher/talking-to-companies-at-icml19

A short rant on sponsor companies at ICML and how to talk to them. ... https://www.youtube.com/watch?v=hkw-WDBipgo

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

SUPER

supervised-contrastive-learning-2

lbry://@yannickilcher/supervised-contrastive-learning-2

The cross-entropy loss has been the default in deep learning for the last few years for supervised learning. This paper proposes a new loss, the supervised contrastive loss, and uses it to pre-train the network in a supervised fashion. The resulting model, when fine-tuned to ImageNet, achieves new state-of-the-art. https://arxiv.org/abs/2004.11362 Abstract: Cross entropy is the most widely used loss function for supervised training of image classification models. In this paper, we propose a novel training methodology that consistently outperforms cross entropy on supervised learning tasks across different architectures and data augmentations. We modify the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting. We are thus able to leverage label information more effectively than cross entropy. Clusters of points belonging to the same class are pulled together in embedding space, while simultaneously pushing apart clusters of samples from different classes. In addition to this, we leverage key ingredients such as large batch sizes and normalized embeddings, which have been shown to benefit self-supervised learning. On both ResNet-50 and ResNet-200, we outperform cross entropy by over 1%, setting a new state of the art number of 78.8% among methods that use AutoAugment data augmentation. The loss also shows clear benefits for robustness to natural corruptions on standard benchmarks on both calibration and accuracy. Compared to cross entropy, our supervised contrastive loss is more stable to hyperparameter settings such as optimizers or data augmentations. Authors: Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://www.bitchute.com/channel/yannic-kilcher Minds: https://www.minds.com/ykilcher ... https://www.youtube.com/watch?v=MpdbFLXOOIw

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

HOW C

how-cyber-criminals-are-using-chatgpt-(w

lbry://@yannickilcher/how-cyber-criminals-are-using-chatgpt-(w

#cybercrime #chatgpt #security An interview with Sergey Shykevich, Threat Intelligence Group Manager at Check Point, about how models like ChatGPT have impacted the realm of cyber crime. https://threatmap.checkpoint.com/ Links: Homepage: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n ... https://www.youtube.com/watch?v=10nEx2-8J0M

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English

VIDEO

BERT:

bert-pre-training-of-deep-bidirectional

lbry://@yannickilcher/bert-pre-training-of-deep-bidirectional

https://arxiv.org/abs/1810.04805 Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%. Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova ... https://www.youtube.com/watch?v=-9evrZnBorM

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English