LBRY Block Explorer • Claim • ml-news-grok-1-open-sourced-nvidia-gtc

LBRY Claims • ml-news-grok-1-open-sourced-nvidia-gtc

dd5cb94ecc90f105a14fa0e764e9195b00f3ad61

Published By

@yannickilcher

Created On

26 Mar 2024 21:56:24 UTC

Transaction ID

2cd2216ea4b944bdfbfcbbda902b2ef7a6a46c94b98fd4262a8abcb8a346b9bb

Cost

Safe for Work

Free

Yes

[ML News] Grok-1 open-sourced | Nvidia GTC | OpenAI leaks model names | AI Act

Author

Content Type

Unspecified

video/mp4

Language

English

Open in LBRY

More from the publisher

Controlling

VIDEO

FASTE

faster-neural-network-training-with-data

lbry://@yannickilcher/faster-neural-network-training-with-data

CPUs are often bottlenecks in Machine Learning pipelines. Data fetching, loading, preprocessing and augmentation can be slow to a point where the GPUs are mostly idle. Data Echoing is a technique to re-use data that is already in the pipeline to reclaim this idle time and keep the GPUs busy at all times. https://arxiv.org/abs/1907.05550 Abstract: In the twilight of Moore's law, GPUs and other specialized hardware accelerators have dramatically sped up neural network training. However, earlier stages of the training pipeline, such as disk I/O and data preprocessing, do not run on accelerators. As accelerators continue to improve, these earlier stages will increasingly become the bottleneck. In this paper, we introduce "data echoing," which reduces the total computation used by earlier pipeline stages and speeds up training whenever computation upstream from accelerators dominates the training time. Data echoing reuses (or "echoes") intermediate outputs from earlier pipeline stages in order to reclaim idle capacity. We investigate the behavior of different data echoing algorithms on various workloads, for various amounts of echoing, and for various batch sizes. We find that in all settings, at least one data echoing algorithm can match the baseline's predictive performance using less upstream computation. We measured a factor of 3.25 decrease in wall-clock time for ResNet-50 on ImageNet when reading training data over a network. Authors: Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://www.bitchute.com/channel/yannic-kilcher Minds: https://www.minds.com/ykilcher ... https://www.youtube.com/watch?v=bFn2xcGi1TQ

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

[ML N

ml-news-uber-deep-learning-for-eta

lbry://@yannickilcher/ml-news-uber-deep-learning-for-eta

#mlnews #muzero #nerf Your regularly irregular updates on everything new in the ML world! Merch: http://store.ykilcher.com OUTLINE: 0:00 - Intro 0:15 - Sponsor: Weights & Biases 2:15 - Uber switches from XGBoost to Deep Learning for ETA prediction 5:45 - MuZero advances video compression 10:10 - Learned Soft Prompts can steer large language models 12:45 - Block-NeRF captures entire city blocks 14:15 - Neural Architecture Search considers underlying hardware 16:50 - Mega-Blog on Self-Organizing Agents 18:40 - Know Your Data (for Tensorflow Datasets) 20:30 - Helpful Things Sponsor: Weights & Biases https://wandb.me/yannic References: https://docs.wandb.ai/guides/integrations/other/openai https://colab.research.google.com/github/wandb/examples/blob/master/colabs/openai/Fine_tune_GPT_3_with_Weights_%26_Biases.ipynb#scrollTo=rJdQqrC8Ablo https://wandb.ai/borisd13/GPT-3/reports/Fine-Tuning-Tips-and-Exploration-on-OpenAI-s-GPT-3---VmlldzoxNDYwODA2 Uber switches from XGBoost to Deep Learning for ETA prediction https://eng.uber.com/deepeta-how-uber-predicts-arrival-times/?utm_source=pocket_mylist MuZero advances video compression https://deepmind.com/blog/article/MuZeros-first-step-from-research-into-the-real-world https://storage.googleapis.com/deepmind-media/MuZero/MuZero%20with%20self-competition.pdf Learned Soft Prompts can steer large language models https://ai.googleblog.com/2022/02/guiding-frozen-language-models-with.html https://aclanthology.org/2021.emnlp-main.243/ Block-NeRF captures entire city blocks https://arxiv.org/abs/2202.05263 https://arxiv.org/pdf/2202.05263.pdf https://waymo.com/intl/zh-cn/research/block-nerf/ Neural Architecture Search considers underlying hardware https://ai.googleblog.com/2022/02/unlocking-full-potential-of-datacenter.html https://openaccess.thecvf.com/content/CVPR2021/papers/Li_Searching_for_Fast_Model_Families_on_Datacenter_Accelerators_CVPR_2021_paper.pdf Mega-Blog on Self-Organizing Agents https://developmentalsystems.org/sensorimotor-lenia/ https://flowers.inria.fr/ Know Your Data (for Tensorflow Datasets) https://knowyourdata-tfds.withgoogle.com/#dataset=pass&filters=kyd%2Fcloud_vision%2Fface_probability:9&tab=RELATIONS&item=train%5B89%25%3A91%25%5D_27143&expanded_groups=cloud_vision https://knowyourdata.withgoogle.com/ Helpful Things https://twitter.com/casualganpapers/status/1490318575873241091 https://www.reddit.com/r/MachineLearning/comments/snmtzn/r_phd_thesis_on_neural_differential_equations/ https://arxiv.org/abs/2202.02435 https://github.com/vicariousinc/PGMax https://www.vicarious.com/posts/pgmax-factor-graphs-for-discrete-probabilistic-graphical-models-and-loopy-belief-propagation-in-jax/?utm_content=197542312&utm_medium=social&utm_source=twitter&hss_channel=tw-204185426 https://diambra.ai/to ... https://www.youtube.com/watch?v=fEKZC9mta8w

Transaction

Created

2 weeks ago

Content Type

Language

video/mp4

Controlling

VIDEO

NEURA

neural-architecture-search-without

lbry://@yannickilcher/neural-architecture-search-without

#ai #research #machinelearning Neural Architecture Search is typically very slow and resource-intensive. A meta-controller has to train many hundreds or thousands of different models to find a suitable building plan. This paper proposes to use statistics of the Jacobian around data points to estimate the performance of proposed architectures at initialization. This method does not require training and speeds up NAS by orders of magnitude. OUTLINE: 0:00 - Intro & Overview 0:50 - Neural Architecture Search 4:15 - Controller-based NAS 7:35 - Architecture Search Without Training 9:30 - Linearization Around Datapoints 14:10 - Linearization Statistics 19:00 - NAS-201 Benchmark 20:15 - Experiments 34:15 - Conclusion & Comments Paper: https://arxiv.org/abs/2006.04647 Code: https://github.com/BayesWatch/nas-without-training Abstract: The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be extremely slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be remedied if we could infer a network's trained accuracy from its initial state. In this work, we examine how the linear maps induced by data points correlate for untrained network architectures in the NAS-Bench-201 search space, and motivate how this can be used to give a measure of modelling flexibility which is highly indicative of a network's trained performance. We incorporate this measure into a simple algorithm that allows us to search for powerful networks without any training in a matter of seconds on a single GPU. Code to reproduce our experiments is available at this https URL. Authors: Joseph Mellor, Jack Turner, Amos Storkey, Elliot J. Crowley Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yannic-kilcher Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/ If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar (preferred to Patreon): https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n ... https://www.youtube.com/watch?v=a6v92P0EbJc

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

CELEB

celebrating-100k-subscribers!-(w-channel

lbry://@yannickilcher/celebrating-100k-subscribers!-(w-channel

#yannickilcher #machinelearning #100k OUTLINE: 0:00 - 100k! 1:00 - Announcements & Thanks 3:55 - Channel Statistics Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yannic-kilcher Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/ BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n ... https://www.youtube.com/watch?v=ifBI2jTaAEo

Transaction

Created

2 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

TREE

tree-of-thoughts-deliberate-problem

lbry://@yannickilcher/tree-of-thoughts-deliberate-problem

#gpt4 #ai #prompt Tree-of-Thought improves prompting of large language models (LLMs) by generalizing the concept of Chain-of-Thought prompting and introduces a tree search across language model thoughts, including state evaluation and backtracking. Experiments on toy tasks show large improvements over both classic and Chain-of-Thought prompting. OUTLINE: 0:00 - Introduction 1:20 - From Chain-of-Thought to Tree-of-Thought 11:10 - Formalizing the algorithm 16:00 - Game of 24 & Creative writing 18:30 - Crosswords 23:30 - Is this a general problem solver? 26:50 - Ablation studies 28:55 - Conclusion Paper: https://arxiv.org/abs/2305.10601 Abstract: Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: this https URL. Authors: Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan Links: Homepage: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n ... https://www.youtube.com/watch?v=ut5kp56wW_4

Transaction

Created

2 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

OPENA

openai-suggests-ai-licenses-(us-senate

lbry://@yannickilcher/openai-suggests-ai-licenses-(us-senate

#ai #openai #gpt4 US Senate hearing on AI regulation. MLST video on the hearing: https://www.youtube.com/watch?v=DeSXnESGxr4 Links: Homepage: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n ... https://www.youtube.com/watch?v=I72_GJHzH3Y

Transaction

Created

2 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

NEURI

neurips-live-stream-(vendor-hall)

lbry://@yannickilcher/neurips-live-stream-(vendor-hall)

Links: Homepage: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n ... https://www.youtube.com/watch?v=GGPNdH1lBBc

Transaction

Created

2 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

RWKV:

rwkv-reinventing-rnns-for-the

lbry://@yannickilcher/rwkv-reinventing-rnns-for-the

#gpt4 #rwkv #transformer We take a look at RWKV, a highly scalable architecture between Transformers and RNNs. Fully Connected (June 7th in SF) Promo Link: https://www.fullyconnected.com/?promo=ynnc OUTLINE: 0:00 - Introduction 1:50 - Fully Connected In-Person Conference in SF June 7th 3:00 - Transformers vs RNNs 8:00 - RWKV: Best of both worlds 12:30 - LSTMs 17:15 - Evolution of RWKV's Linear Attention 30:40 - RWKV's Layer Structure 49:15 - Time-Parallel vs Sequence Mode 53:55 - Experimental Results & Limitations 58:00 - Visualizations 1:01:40 - Conclusion Paper: https://arxiv.org/abs/2305.13048 Abstract: Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, which parallelizes computations during training and maintains constant computational and memory complexity during inference, leading to the first non-transformer architecture to be scaled to tens of billions of parameters. Our experiments reveal that RWKV performs on par with similarly sized Transformers, suggesting that future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling the trade-offs between computational efficiency and model performance in sequence processing tasks. Authors: Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Xiangru Tang, Bolun Wang, Johan S. Wind, Stansilaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu Links: Homepage: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n ... https://www.youtube.com/watch?v=x8pW19wKfXQ

Transaction

Created

2 weeks ago

Content Type

Language

video/mp4

English

Controlling

VIDEO

WHEN

when-bert-plays-the-lottery-all-tickets

lbry://@yannickilcher/when-bert-plays-the-lottery-all-tickets

BERT is a giant model. Turns out you can prune away many of its components and it still works. This paper analyzes BERT pruning in light of the Lottery Ticket Hypothesis and finds that even the "bad" lottery tickets can be fine-tuned to good accuracy. OUTLINE: 0:00 - Overview 1:20 - BERT 3:20 - Lottery Ticket Hypothesis 13:00 - Paper Abstract 18:00 - Pruning BERT 23:00 - Experiments 50:00 - Conclusion https://arxiv.org/abs/2005.00561 ML Street Talk Channel: https://www.youtube.com/channel/UCMLtBahI5DMrt0NPvDSoIRQ Abstract: Much of the recent success in NLP is due to the large Transformer-based models such as BERT (Devlin et al, 2019). However, these models have been shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis. For fine-tuned BERT, we show that (a) it is possible to find a subnetwork of elements that achieves performance comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. However, the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in the pre-trained BERT are potentially useful. We also show that the "good" subnetworks vary considerably across GLUE tasks, opening up the possibilities to learn what knowledge BERT actually uses at inference time. Authors: Sai Prasanna, Anna Rogers, Anna Rumshisky Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://www.bitchute.com/channel/yannic-kilcher Minds: https://www.minds.com/ykilcher ... https://www.youtube.com/watch?v=IIebBjbBevs

Transaction

Created

3 weeks ago

Content Type

Language

video/mp4

English