Mysterious Emergent Abilities of Large Language Models
Dive into the fascinating world of large language models and their emergent abilities in this insightful video. We discuss the unpredictable phenomenon of emergent abilities, which are present in larger models but not in smaller ones. Learn about the relationship between scaling up language models and the qualitative changes in their behavior. This video covers various aspects of emergent abilities, including few-shot prompting, augmented prompting strategies, and the potential for further scaling to expand the range of language model capabilities. Join us as we explore this exciting research area at the intersection of artificial intelligence and natural language processing.
This video explains an algorithm for meta-learning that is model-agnostic: it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems.
0:00 - Intro
2:29 - Human Intelligence
4:07 - The goal of meta-learning
5:56 - Model-agnostic meta-learning
10:17 - Step 1 - standard learning
12:04 - Step 2 - meta learning
15:59 - Algorithm
18:25 - Experiment setup
19:54 - Omniglot data
22:17 - MiniImagenet data
23:08 - Recap
Related Video:
Can Machines Learn Like Humans - In-context Learning / Meta-Learning
https://youtu.be/no5P_0ZYoOw
Paper:
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
https://arxiv.org/abs/1703.03400
Code:
https://github.com/cbfinn/maml
Abstract:
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.
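To make the meta-objective concrete, here is a minimal PyTorch sketch of the MAML inner/outer loop on a toy sine-regression family. The task distribution, network sizes, and learning rates are illustrative assumptions, not the paper's exact configuration.
```python
import torch

def sample_task():
    # Toy few-shot regression family: y = a*sin(x + b) with random a, b.
    a = torch.rand(1) * 4 + 1
    b = torch.rand(1) * 3
    def sample(n):
        x = torch.rand(n, 1) * 10 - 5
        return x, a * torch.sin(x + b)
    return sample

model = torch.nn.Sequential(
    torch.nn.Linear(1, 40), torch.nn.ReLU(), torch.nn.Linear(40, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr, loss_fn = 0.01, torch.nn.MSELoss()

for step in range(1000):
    meta_opt.zero_grad()
    for _ in range(4):                      # meta-batch of tasks
        task = sample_task()
        x_tr, y_tr = task(10)               # support set
        x_te, y_te = task(10)               # query set
        # Inner step: one gradient step on the support set, keeping the
        # graph so the meta-gradient can flow through the adaptation.
        params = dict(model.named_parameters())
        grads = torch.autograd.grad(
            loss_fn(torch.func.functional_call(model, params, x_tr), y_tr),
            params.values(), create_graph=True)
        adapted = {name: p - inner_lr * g
                   for (name, p), g in zip(params.items(), grads)}
        # Outer objective: post-adaptation loss on the query set.
        loss_fn(torch.func.functional_call(model, adapted, x_te),
                y_te).backward()
    meta_opt.step()
```
The key point, matching the abstract, is that the outer update optimizes the initialization itself so that a single inner gradient step already generalizes well on a new task.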
...
https://www.youtube.com/watch?v=tGTNplKgt6Q
This video explains a legendary paper, BERT. It leverages the Transformer encoder and comes up with an innovative way to pre-train language models (masked language modeling). BERT has had a significant influence on how people approach NLP problems and has inspired many follow-up studies and BERT variants.
0:00 - Intro
1:32 - Transformer vs. LSTMs
3:34 - Pre-BERT times
8:22 - Model architecture
9:46 - WordPiece embeddings
14:25 - Special tokens
16:42 - Input representations
18:15 - Masked language modeling
20:03 - Mismatch between pre-training and fine-tuning
23:21 - Next sentence prediction
26:28 - Pre-training data
30:57 - End-to-end fine-tuning
34:45 - SQuAD
36:57 - Ablation over pre-training tasks
41:37 - Ablation over model size
43:17 - Feature-based approach with BERT
Related Videos:
Transformer explained
https://youtu.be/ELTGIye424E
Introduction of GPT-3: The Most Powerful Language Model Ever
https://youtu.be/Rv5SeM7LxLQ
Paper
https://arxiv.org/abs/1810.04805
Code
https://github.com/google-research/bert (TensorFlow)
https://github.com/huggingface/transformers (PyTorch)
Connect
Twitter https://twitter.com/home
email edwindeeplearning@gmail.com
Abstract
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
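As a concrete illustration of masked language modeling, here is a small sketch of BERT's 80/10/10 masking rule (the rates come from the paper; the toy vocabulary constants are placeholders, not real tokenizer values).
```python
import random

# Toy constants for illustration; real BERT uses a ~30k WordPiece vocab.
MASK_ID, VOCAB_SIZE = 103, 30522

def mask_tokens(token_ids, mask_prob=0.15):
    """BERT-style masking: pick ~15% of tokens as prediction targets;
    of those, 80% become [MASK], 10% a random token, 10% unchanged."""
    inputs = list(token_ids)
    labels = [-100] * len(inputs)           # -100 = ignore in the loss
    for i in range(len(inputs)):
        if random.random() < mask_prob:
            labels[i] = inputs[i]           # model must recover the original
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                       # replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)  # random token
            # else: keep the original token unchanged
    return inputs, labels
```
Keeping 10% of targets unchanged and corrupting 10% randomly is exactly the paper's mitigation for the pre-training/fine-tuning mismatch discussed at 20:03.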
...
https://www.youtube.com/watch?v=j9toSIRf4RI
Can zero-shot generalization instead be directly induced by explicit multitask learning? Watch the video to find out!
0:00 - Intro
2:14 - Prompted training format
5:52 - Measuring generalization to unseen tasks
8:45 - Held-out tasks
10:45 - The future of NLP
11:48 - Model
12:17 - Experiment results
Connect
Linkedin https://www.linkedin.com/in/xue-yong-fu-955723a6/
Twitter https://twitter.com/home
email edwindeeplearning@gmail.com
Paper
https://arxiv.org/abs/2110.08207
Code
https://github.com/bigscience-workshop/promptsource/
Abstract
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks. It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts using varying natural language. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models 6x its size.
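To show what mapping a task into a "human-readable prompted form" looks like, here is a tiny sketch in the spirit of promptsource; the NLI example and template wording are invented for illustration, not templates from the paper.
```python
# One supervised NLI example rendered with two hypothetical prompt templates.
example = {"premise": "A cat sits on the mat.",
           "hypothesis": "An animal is resting.",
           "label": "entailment"}

templates = [
    ("Given that \"{premise}\", is it true that \"{hypothesis}\"? "
     "Answer entailment, neutral, or contradiction.",
     "{label}"),
    ("{premise}\nQuestion: Does this imply \"{hypothesis}\"?",
     "{label}"),
]

# Each template turns the same labeled example into a text-to-text pair,
# so one dataset yields many differently worded training prompts.
for input_tpl, target_tpl in templates:
    print("INPUT: ", input_tpl.format(**example))
    print("TARGET:", target_tpl.format(**example))
```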
...
https://www.youtube.com/watch?v=YToXXfrIu6w
It's a super cool paper that invents "vokenization" to generate large amounts of visually-grounded language data and train visually-grounded models on it.
Most language models are trained on pure text data. Although this has achieved significant success in recent years, it is not how humans acquire language. That raises an interesting question: can language models achieve a high level of language understanding by reading text input alone? The answer is probably "no".
To push the boundary of language models, adding other learning signals to the training process is key to success, and the first that comes to mind is vision (visual cues). However, existing visually-grounded datasets are an order of magnitude smaller than pure-text ones. This paper proposes the "vokenization" method to overcome this problem and uses the newly generated data to train visually-supervised language models.
More importantly, the visually-supervised models show significant improvements over text-only models.
0:00 - How did you learn your first language
1:00 - What's special about this paper
2:56 - How humans learn a language
5:23 - Visual pointing
6:07 - Challenge to visually-grounded supervision
9:58 - Token-image matching
11:53 - Vokenization
18:40 - Vokenizer training
23:39 - Visually-supervised language models
25:48 - Voken classification tasks
27:24 - Loss function
28:37 - Implication of voken classification
31:56 - Fine-tuning results
35:22 - Conventional visually-grounded corpora are very different
37:51 - Sentence-level vs. token-level
41:45 - Summary
Paper
https://arxiv.org/abs/2010.06775
Code
https://github.com/airsplay/vokenization
Abstract
Humans learn language by listening, speaking, writing, reading, and also, via interaction with the multimodal real world. Existing language pre-training frameworks show the effectiveness of text-only self-supervision while we explore the idea of a visually-supervised language model in this paper. We find that the main reason hindering this exploration is the large divergence in magnitude and distributions between the visually-grounded language datasets and pure-language corpora. Therefore, we develop a technique named "vokenization" that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images (which we call "vokens"). The "vokenizer" is trained on relatively small image captioning datasets and we then apply it to generate vokens for large language corpora. Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks such as GLUE, SQuAD, and SWAG.
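As a rough sketch of the two pieces: a vokenizer that assigns each contextualized token its most relevant image ("voken"), here scored by cosine similarity as a stand-in for the paper's learned relevance score, and the extra voken-classification head trained alongside the language model. All sizes are placeholders.
```python
import torch
import torch.nn.functional as F

def vokenize(token_embs, image_embs):
    """Assign each contextualized token the index of its most relevant
    image, scoring relevance by cosine similarity (a simplification)."""
    t = F.normalize(token_embs, dim=-1)      # (seq_len, d)
    v = F.normalize(image_embs, dim=-1)      # (num_images, d)
    return (t @ v.T).argmax(dim=-1)          # (seq_len,) voken ids

# Voken classification: predict each token's voken id from the LM's
# hidden states, as an extra loss next to masked language modeling.
seq_len, d, num_images = 12, 64, 1000
hidden = torch.randn(seq_len, d)             # stand-in for BERT hidden states
voken_head = torch.nn.Linear(d, num_images)  # extra classification head
voken_ids = vokenize(torch.randn(seq_len, d), torch.randn(num_images, d))
voken_loss = F.cross_entropy(voken_head(hidden), voken_ids)
```
Because the vokens are generated automatically for any text corpus, the visual supervision scales far beyond the small captioning datasets the vokenizer was trained on.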
Connect
Twitter https://twitter.com/home
...
https://www.youtube.com/watch?v=4T1u3Z2DaZA
Asymptomatic people infected with COVID-19, by definition, don't have any symptoms, so we shouldn't be able to tell them apart from healthy people.
Yet the AI system built by an MIT team can detect COVID-19 from cough recordings with 97% accuracy, and, more interestingly, it detects asymptomatic cases with 100% sensitivity. The proposed model comprises four biomarkers (three ResNet models and a Poisson mask), each representing a hypothesis about the respiratory disease.
Caveat: more replication is needed. Clinical trials are ongoing at Mount Sinai and White Plains Hospital in the US, the Catalan Health Institute in Catalonia, Hospitales Civiles de Guadalajara in Mexico, and Ospedale Luigi Sacco in Italy.
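For orientation, here is a heavily simplified structural sketch of the described ensemble: parallel ResNet branches over MFCC "images" of the cough audio, fused by a final classifier. The resnet18 branches, feature sizes, and input shape are assumptions, and the Poisson biomarker mask preprocessing is omitted.
```python
import torch
import torchvision.models as models

class CoughNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Three parallel biomarker branches, each a ResNet over MFCCs.
        self.branches = torch.nn.ModuleList(
            [models.resnet18(num_classes=128) for _ in range(3)])
        self.head = torch.nn.Linear(3 * 128, 2)  # COVID vs. non-COVID

    def forward(self, mfcc):                 # mfcc: (batch, 1, freq, time)
        x = mfcc.repeat(1, 3, 1, 1)          # ResNets expect 3 channels
        feats = [branch(x) for branch in self.branches]
        return self.head(torch.cat(feats, dim=-1))

logits = CoughNet()(torch.randn(2, 1, 128, 128))  # toy MFCC batch
```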
0:00 - Intro
4:30 - Are the asymptomatics free of change
6:04 - COVID-19 cough dataset
7:41 - Model architecture
11:43 - Muscular degradation
13:13 - Vocal cords
14:46 - Sentiment
15:47 - Lungs and Respiratory Tract
19:48 - Results
22:18 - How many layers to fine-tune
25:33 - Explainable deep learning
28:12 - Summary
Paper:
COVID-19 Artificial Intelligence Diagnosis using only Cough Recordings
https://ieeexplore.ieee.org/document/9208795
Connect
Twitter https://twitter.com/home
email edwindeeplearning@gmail.com
...
https://www.youtube.com/watch?v=J_OmBva8_RA
Full end-to-end entity linking has long been a challenging problem in NLP. The typical approach is to use one model to detect entities and then employ another model to perform entity disambiguation. This paper beautifully formulates these two steps as a single neural network model.
0:00 - Ya ya ya
0:56 - What's special about this paper
2:10 - System overview
3:29 - Question & entities
6:19 - Mention detection
9:06 - Entity disambiguation
11:46 - Mention detection loss
14:09 - Entity disambiguation loss
15:45 - Datasets
16:24 - Results & discussion
22:55 - Runtime comparison
23:10 - Proof of concept
25:10 - Summary
Connect
Twitter https://twitter.com/home
Email edwindeeplearning@gmail.com
Related videos:
REALM: Retrieval-Augmented Language Model
https://youtu.be/JQ-bxQT5Qsw
Question and Answer Test-Train Overlap in Open Domain QA
https://youtu.be/Cb5sj4_Ztfo
Paper
Efficient One Pass End to End Entity Linking for Questions
https://arxiv.org/abs/2010.02413
Code
https://github.com/facebookresearch/BLINK/tree/master/elq
Abstract
We present ELQ, a fast end-to-end entity linking model for questions, which uses a biencoder to jointly perform mention detection and linking in one pass. Evaluated on WebQSP and GraphQuestions with extended annotations that cover multiple entities per question, ELQ outperforms the previous state of the art by a large margin of +12.7% and +19.6% F1, respectively. With a very fast inference time (1.57 examples/s on a single CPU), ELQ can be useful for downstream question answering systems. In a proof-of-concept experiment, we demonstrate that using ELQ significantly improves the downstream QA performance of GraphRetriever.
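To make the one-pass idea concrete, here is a heavily simplified sketch of the biencoder: a single encoding of the question yields token embeddings, mention spans are scored from their boundary tokens, and a detected span is linked to the nearest precomputed entity embedding by inner product. All shapes and scoring heads are illustrative, not ELQ's exact parameterization.
```python
import torch

seq_len, d, num_entities = 10, 64, 500
q = torch.randn(seq_len, d)              # question token embeddings (one pass)
entities = torch.randn(num_entities, d)  # precomputed entity embeddings

# Mention detection: score every (start, end) span from boundary tokens.
w_start, w_end = torch.randn(d), torch.randn(d)
span_scores = (q @ w_start)[:, None] + (q @ w_end)[None, :]   # (seq, seq)
valid = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool))
span_scores = span_scores.masked_fill(~valid, float("-inf"))  # start <= end

# Entity disambiguation: mean-pool the best span, link by inner product.
flat = span_scores.argmax()
s, e = flat // seq_len, flat % seq_len
mention_emb = q[s:e + 1].mean(dim=0)
entity_id = (entities @ mention_emb).argmax()
print(int(s), int(e), int(entity_id))
```
Because the entity embeddings are precomputed, both detection and linking reuse the same single pass over the question, which is where the speed comes from.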
...
https://www.youtube.com/watch?v=eXN7Bu06RjI
This video walks you through the paper "Quantifying Attention Flow In Transformers", which proposes a simple yet effective method to better analyze transformer-based models' attention weights.
Link to the paper: https://arxiv.org/abs/2005.00928
(Quantifying Attention Flow In Transformers)
The official code implementation of the paper:
https://github.com/samiraabnar/attention_flow
Relevant video:
Revealing Dark Secrets of BERT (Analysis of BERT's Attention Heads) - Paper Explained
https://youtu.be/mnU9ILoDH68
Abstract of the paper:
In the Transformer model, "self-attention" combines information from attended embeddings into the representation of the focal embedding in the next layer. Thus, across layers of the Transformer, information originating from different tokens gets increasingly mixed. This makes attention weights unreliable as explanations probes. In this paper, we consider the problem of quantifying this flow of information through self-attention. We propose two methods for approximating the attention to input tokens given attention weights, attention rollout and attention flow, as post hoc methods when we use attention weights as the relative relevance of the input tokens. We show that these methods give complementary views on the flow of information, and compared to raw attention, both yield higher correlations with importance scores of input tokens obtained using an ablation method and input gradients.
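For reference, here is a compact numpy sketch of attention rollout as it is commonly implemented: average attention over heads, approximate residual connections by mixing in the identity, renormalize rows, then multiply the per-layer matrices together. The 0.5/0.5 residual mixing is the usual simplifying assumption.
```python
import numpy as np

def attention_rollout(attentions):
    """attentions: list of per-layer attention arrays, each of shape
    (num_heads, seq_len, seq_len). Returns a (seq_len, seq_len) rollout."""
    rollout = None
    for layer_att in attentions:
        a = layer_att.mean(axis=0)                  # average over heads
        a = 0.5 * a + 0.5 * np.eye(a.shape[-1])     # add residual connection
        a = a / a.sum(axis=-1, keepdims=True)       # renormalize rows
        rollout = a if rollout is None else a @ rollout
    return rollout  # rollout[i, j]: attention of token i to input token j

# Toy usage: random attention maps for 4 layers, 8 heads, 12 tokens.
rng = np.random.default_rng(0)
atts = [rng.dirichlet(np.ones(12), size=(8, 12)) for _ in range(4)]
print(attention_rollout(atts).shape)  # (12, 12)
```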
...
https://www.youtube.com/watch?v=3Q0ZXqVaQPo
A groundbreaking way to do self-supervision on videos and text. I would say it's the BERT moment for video-text understanding.
#videoclip #contrastivelearning #videotransformer
0:00 - Intro
3:31 - Retrieval augmented training
5:07 - Video and text encoding
8:48 - Contrastive loss
12:09 - Zero-shot transfer to end tasks
14:05 - Experiment results
18:09 - What did we learn
Paper
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
https://arxiv.org/abs/2109.14084
Connect
Twitter https://twitter.com/home
Linkedin https://www.linkedin.com/in/xue-yong-fu-955723a6/
email edwindeeplearning@gmail.com
Abstract
We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks. VideoCLIP trains a transformer for video and text by contrasting temporally overlapping positive video-text pairs with hard negatives from nearest neighbor retrieval. Our experiments on a diverse series of downstream tasks, including sequence-level text-video retrieval, VideoQA, token-level action localization, and action segmentation reveal state-of-the-art performance, surpassing prior work, and in some cases even outperforming supervised approaches.
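Here is a minimal sketch of the symmetric contrastive (InfoNCE) objective over paired video and text embeddings; the temperature, batch size, and embedding size are placeholders, and the paper's overlapping-clip sampling and retrieval-based hard negatives are omitted.
```python
import torch
import torch.nn.functional as F

def clip_style_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (video, text) pairs are positives,
    and every other pairing in the batch serves as a negative."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature            # (batch, batch) similarities
    labels = torch.arange(len(v))             # positives on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

loss = clip_style_loss(torch.randn(8, 256), torch.randn(8, 256))
```
The paper's contribution is mainly in how the positive pairs (temporally overlapping clips) and negatives (nearest-neighbor retrieval) are chosen, not in the loss itself.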
...
https://www.youtube.com/watch?v=vqMZjsIKUoQ
It proposes a data sampling technique and a two-stage fine-tuning approach that let us sample more training data similar to our in-domain ASR transcripts and improve model performance.
0:00 - How to make a model more accurate
1:02 - I published a paper
3:05 - Punctuation restoration
5:32 - In-domain data
7:29 - Annotated data is expensive
8:47 - Opensubtitles
10:04 - Data sampling via LM
11:34 - Two-stage fine-tuning
14:55 - Layer reduction
16:49 - Takeaway
18:10 - EMNLP 2021
Connect
Linkedin https://www.linkedin.com/in/xue-yong-fu-955723a6/
Twitter https://twitter.com/home
email edwindeeplearning@gmail.com
Paper
Improving Punctuation Restoration for Speech Transcripts via External Data
https://arxiv.org/abs/2110.00560?context=cs
Abstract
Automatic Speech Recognition (ASR) systems generally do not produce punctuated transcripts. To make transcripts more readable and follow the expected input format for downstream language models, it is necessary to add punctuation marks. In this paper, we tackle the punctuation restoration problem specifically for the noisy text (e.g., phone conversation scenarios). To leverage the available written text datasets, we introduce a data sampling technique based on an n-gram language model to sample more training data that are similar to our in-domain data. Moreover, we propose a two-stage fine-tuning approach that utilizes the sampled external data as well as our in-domain dataset for models based on BERT. Extensive experiments show that the proposed approach outperforms the baseline with an improvement of 1.12% F1 score.
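As a toy sketch of the data-selection idea: fit a small n-gram language model on in-domain transcripts and keep the external sentences it scores as most probable. The bigram model with add-one smoothing below is a simplification of the paper's n-gram setup.
```python
import math
from collections import Counter

def bigram_logprob(sentence, unigrams, bigrams, vocab_size):
    """Average per-token log-probability under an add-one-smoothed bigram LM."""
    toks = ["<s>"] + sentence.split()
    lp = sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
             for a, b in zip(toks, toks[1:]))
    return lp / (len(toks) - 1)

# Fit bigram counts on a few toy "in-domain" conversational transcripts.
in_domain = ["hello how are you", "i am fine thanks", "how is it going"]
unigrams, bigrams = Counter(), Counter()
for s in in_domain:
    toks = ["<s>"] + s.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

# Rank external sentences; keep the ones that look most like in-domain speech.
external = ["how are you doing", "the quarterly report is attached"]
V = len(unigrams)
ranked = sorted(external,
                key=lambda s: -bigram_logprob(s, unigrams, bigrams, V))
print(ranked)
```
The selected sentences then feed the first fine-tuning stage, with the true in-domain data reserved for the second stage.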
...
https://www.youtube.com/watch?v=jxOpu4hXPJY