My thoughts on the changes to the paper submission process for NeurIPS 2020.
The main changes are:
1. ACs can desk-reject papers
2. All authors must be available to review if asked
3. Resubmissions from other conferences must be marked, and a summary of changes since the last submission must be provided
4. Broader societal / ethical impact must be discussed
5. Upon acceptance, all papers must link to an explanatory video and the PDFs for slides and poster
https://neurips.cc/Conferences/2020/CallForPapers
https://youtu.be/361h6lHZGDg
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
...
https://www.youtube.com/watch?v=JPX_jSZtszY
#ai #privacy #tech
This paper demonstrates a method to extract verbatim pieces of the training data from a trained language model. Moreover, some of the extracted pieces only appear a handful of times in the dataset. This points to serious security and privacy implications for models like GPT-3. The authors discuss the risks and propose mitigation strategies.
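The core scoring trick behind the attack (perplexity ratios between the target model and a reference) can be sketched in a few lines. This is a toy illustration, not the paper's exact pipeline: `membership_score` and the hard-coded token log-probabilities are invented for demonstration.

```python
import math

def perplexity(logprobs):
    # Perplexity = exp(-mean log-probability) of a token sequence.
    return math.exp(-sum(logprobs) / len(logprobs))

def membership_score(lm_logprobs, ref_logprobs):
    # Ratio of a reference model's perplexity to the target model's:
    # a high ratio means the target model is "surprisingly confident"
    # about the sequence relative to a baseline -- a memorization signal,
    # which filters out text that is merely easy for all models.
    return perplexity(ref_logprobs) / perplexity(lm_logprobs)

# Toy example: the target model assigns unusually high probability
# (log-probs near 0) to a sequence the reference model finds unlikely.
memorized = membership_score([-0.1, -0.2, -0.1], [-3.0, -2.5, -2.8])
ordinary = membership_score([-2.0, -2.2, -1.9], [-2.1, -2.0, -2.2])
print(memorized > ordinary)  # True: the memorized sequence scores higher
```

In the paper's setting, the reference can be a smaller model, a zlib compressor, or the same model on lowercased text; the sketch above only shows the ratio idea itself.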
OUTLINE:
0:00 - Intro & Overview
9:15 - Personal Data Example
12:30 - Eidetic Memorization & Language Models
19:50 - Adversary's Objective & Outlier Data
24:45 - Ethical Hedging
26:55 - Two-Step Method Overview
28:20 - Perplexity Baseline
30:30 - Improvement via Perplexity Ratios
37:25 - Weights for Patterns & Weights for Memorization
43:40 - Analysis of Main Results
1:00:30 - Mitigation Strategies
1:01:40 - Conclusion & Comments
Paper: https://arxiv.org/abs/2012.07805
Abstract:
It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model.
We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences is included in just one document in the training data.
We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. For example, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.
Authors: Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy
...
https://www.youtube.com/watch?v=plK2WVdLTOY
#mlnews #wudao #academicfraud
OUTLINE:
0:00 - Intro
0:25 - EU seeks to regulate AI
2:45 - AI COVID detection systems are all flawed
5:05 - Chinese lab trains model 10x GPT-3 size
6:55 - Google error identifies "ugliest" language
9:45 - McDonald's learns about AI buzzwords
11:25 - AI predicts cryptocurrency prices
12:00 - Unreal Engine hack for CLIP
12:35 - Please commit more academic fraud
References:
https://www.lawfareblog.com/artificial-intelligence-act-what-european-approach-ai
https://blogs.sciencemag.org/pipeline/archives/2021/06/02/machine-learning-deserves-better-than-this
https://www.nature.com/articles/s42256-021-00307-0
https://en.pingwest.com/a/8693
https://arxiv.org/pdf/2104.12369.pdf
https://www.bbc.com/news/world-asia-india-57355011
https://www.zdnet.com/article/mcdonalds-wants-to-democratise-machine-learning-for-all-users-across-its-operations/
https://www.analyticsinsight.net/ai-is-helping-you-make-profits-by-predicting-cryptocurrency-prices/
https://twitter.com/arankomatsuzaki/status/1399471244760649729
https://jacobbuckman.com/2021-05-29-please-commit-more-blatant-academic-fraud/
...
https://www.youtube.com/watch?v=bw1kiLMQFKU
#apple #icloud #neuralhash
Send your Apple fanboy friends to prison with this one simple trick ;) We break Apple's NeuralHash algorithm used to detect CSAM in iCloud photos. I show how it's possible to craft arbitrary hash collisions from any source / target image pair using an adversarial example attack. This can be used for many purposes, such as evading detection or forging false positives that trigger manual reviews.
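The attack idea can be sketched without the real model: replace the hard sign() in a hash with a differentiable surrogate and gradient-descend on the input until its bits match a target. Everything below is a toy stand-in, not Apple's NeuralHash: the random projection `W`, the hinge surrogate, and the step sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a perceptual hash (NOT the real NeuralHash network):
# project a flattened 256-dim "image" with a fixed random matrix and take
# the sign of each of 96 projections, mimicking NeuralHash's 96-bit output.
W = rng.standard_normal((96, 256))

def hash_bits(x):
    return (W @ x > 0).astype(int)

def collide(source, target_bits, steps=500, lr=0.01):
    # Adversarial-example attack: gradient-descend on the input so every
    # soft hash bit matches target_bits with a margin, using a hinge loss
    # as a differentiable surrogate for the hard sign().  A real attack
    # would also penalize visible distortion of the source image.
    x = source.copy()
    t = 2.0 * target_bits - 1.0           # bits -> {-1, +1}
    for _ in range(steps):
        z = W @ x
        violated = t * z < 1.0             # bits not yet matched with margin
        x += lr * (W.T @ (t * violated))   # push violated bits past the margin
    return x

source = rng.standard_normal(256)             # "source image"
target = hash_bits(rng.standard_normal(256))  # hash of an unrelated image
adv = collide(source, target)
print((hash_bits(adv) == target).mean())      # fraction of matching bits
```

The same loop works against the extracted real model because ONNX/PyTorch give gradients through the network; only `W @ x` changes to a forward pass.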
OUTLINE:
0:00 - Intro
1:30 - Forced Hash Collisions via Adversarial Attacks
2:30 - My Successful Attack
5:40 - Results
7:15 - Discussion
DISCLAIMER: This is for demonstration and educational purposes only. This is not an endorsement of illegal activity or circumvention of law.
Code: https://github.com/yk/neural_hash_collision
Extract Model: https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX
My Video on NeuralHash: https://youtu.be/z15JLtAuwVI
...
https://www.youtube.com/watch?v=6MUpWGeGMxs
#cm3 #languagemodel #transformer
This video contains a paper explanation and an incredibly informative interview with first author Armen Aghajanyan.
Autoregressive Transformers have come to dominate many fields in Machine Learning, from text generation to image creation and many more. However, there are two problems. First, the collected data is usually scraped from the web, is uni- or bi-modal, and throws away much of the structure of the original websites; second, language modelling losses are uni-directional. CM3 addresses both problems: it operates directly on HTML and includes text, hyperlinks, and even images (via VQGAN tokenization), and can therefore be used in many ways: text generation, captioning, image creation, entity linking, and much more. It also introduces a new training strategy called Causally Masked Language Modelling, which brings a level of bi-directionality into autoregressive language modelling. In the interview after the paper explanation, Armen and I go deep into the how and why of these giant models, go over the stunning results, and make sense of what they mean for the future of universal models.
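The causally masked objective itself is a simple data transformation. Here is a minimal sketch of the single-mask case: cut one contiguous span out of the token stream, leave a sentinel behind, and append the span at the end so the decoder generates it last, with full context from both sides of the original gap. The sentinel name `<mask:0>` and the helper are illustrative, not the paper's exact tokenizer vocabulary.

```python
import random

def causally_mask(tokens, rng, mask_token="<mask:0>"):
    # Causally Masked LM (sketch): pick one contiguous span, replace it
    # with a mask sentinel, and move the span to the end of the sequence.
    # A left-to-right decoder then sees everything around the span before
    # generating it -- bidirectional context inside an autoregressive model.
    i = rng.randrange(len(tokens))
    j = rng.randrange(i + 1, len(tokens) + 1)
    span = tokens[i:j]
    return tokens[:i] + [mask_token] + tokens[j:] + [mask_token] + span

rng = random.Random(0)
toks = ["<html>", "<body>", "hello", "world", "</body>", "</html>"]
print(causally_mask(toks, rng))
```

At inference time you get infilling for free: place the sentinel where you want content (e.g. an `img src` attribute inside HTML) and let the model generate past the second sentinel.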
OUTLINE:
0:00 - Intro & Overview
6:30 - Directly learning the structure of HTML
12:30 - Causally Masked Language Modelling
18:50 - A short look at how to use this model
23:20 - Start of interview
25:30 - Feeding language models with HTML
29:45 - How to get bi-directionality into decoder-only Transformers?
37:00 - Images are just tokens
41:15 - How does one train such giant models?
45:40 - CM3 results are amazing
58:20 - Large-scale dataset collection and content filtering
1:04:40 - More experimental results
1:12:15 - Why don't we use raw HTML?
1:18:20 - Does this paper contain too many things?
Paper: https://arxiv.org/abs/2201.07520
Abstract:
We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens. Our new causally masked approach generates tokens left to right while also masking out a small number of long token spans that are generated at the end of the string, instead of their original positions. The causal masking objective provides a type of hybrid of the more common causal and masked language models, by enabling full generative modeling while also providing bidirectional context when generating the masked spans. We train causally masked language-image models on large-scale web and Wikipedia articles, where each document contains all of the text, hypertext markup, hyperlinks, and image tokens (from a VQVAE-GAN), provided in the order they appear in the original HTML source (before masking). The resulting CM3 models can generate rich structured, multi-modal outputs while conditioning on arbitrary masked
...
https://www.youtube.com/watch?v=qNfCVGbvnJc
OUTLINE:
0:30 - Activity Grammars for Temporal Action Segmentation
8:50 - Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback
17:05 - On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences
21:20 - Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming
27:10 - Equivariant Adaptation of Large Pretrained Models
33:10 - Multi-Head Adapter Routing for Cross-Task Generalization
39:25 - Geometry-Aware Adaptation for Pretrained Models
46:10 - Adversarial Learning for Feature Shift Detection and Correction
Papers:
Title: Activity Grammars for Temporal Action Segmentation
Link: https://arxiv.org/abs/2312.04266
Author:
Dayoung Gong, Joonseok Lee,
Deunsol Jung, Suha Kwak, Minsu Cho
--------
Title: Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback
Link: https://arxiv.org/abs/2311.16102
Author:
Mihir Prabhudesai, Tsung-Wei Ke,
Alexander C. Li, Deepak Pathak, Katerina Fragkiadaki
--------
Title: On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences
Link: https://arxiv.org/abs/2305.18423
Author:
Alireza Fathollah Pour, Hassan Ashtiani
--------
Title: Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming
Link: https://arxiv.org/abs/2310.19068
Author:
Gregory Dexter, Petros Drineas,
David P. Woodruff, Taisuke Yasuda
--------
Title: Equivariant Adaptation of Large Pretrained Models
Link: https://arxiv.org/pdf/2310.01647.pdf
Author:
Arnab Kumar Mondal, Siba Smarak Panigrahi,
Sékou-Oumar Kaba, Sai Rajeswar, Siamak Ravanbakhsh
--------
Title: Multi-Head Adapter Routing for Cross-Task Generalization
Link: https://arxiv.org/abs/2211.03831
Author:
Lucas Caccia, Edoardo Ponti, Zhan Su,
Matheus Pereira, Nicolas Le Roux, Alessandro Sordoni
--------
Title: Geometry-Aware Adaptation for Pretrained Models
Link: https://arxiv.org/abs/2307.12226
Author:
Nicholas Roberts, Xintong Li, Dyah Adila,
Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala
--------
Title: Adversarial Learning for Feature Shift Detection and Correction
Link: https://arxiv.org/abs/2312.04546
Author: Miriam Barrabes, Daniel Mas Montserrat,
Margarita Geleta, Xavier Giro-i-Nieto, Alexander G. Ioannidis
...
https://www.youtube.com/watch?v=cx3bbMf9LRA
#ddpm #diffusionmodels #openai
GANs have dominated the image generation space for the majority of the last decade. This paper shows, for the first time, how a non-GAN model, a DDPM, can be improved to overtake GANs on standard evaluation metrics for image generation. The produced samples look amazing and, unlike GANs, the new model has a formal probabilistic foundation. Is there a future for GANs, or are diffusion models going to overtake them for good?
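The classifier-guidance step covered in the video has a compact form: shift each reverse-step mean by the scaled classifier gradient. Below is a minimal numerical sketch where a Gaussian "classifier" with an analytic gradient stands in for a real noisy-image classifier; the 2-D state, `scale=5.0`, and `class_mean` are illustrative choices, not the paper's settings.

```python
import numpy as np

def classifier_grad(x, class_mean):
    # grad_x log N(x; class_mean, I) = class_mean - x.
    # In the paper this is the input gradient of a classifier
    # trained on noisy images; here it is analytic for clarity.
    return class_mean - x

def guided_mean(mu, sigma, class_mean, scale=5.0):
    # DDPM classifier guidance: shift the reverse-step mean by
    # scale * sigma^2 * grad_x log p(y | x_t), steering each denoising
    # step toward samples the classifier assigns to class y.
    return mu + scale * sigma**2 * classifier_grad(mu, class_mean)

rng = np.random.default_rng(0)
mu = np.zeros(2)                    # unguided reverse-step mean
class_mean = np.array([3.0, 3.0])   # region the classifier labels as y
shifted = guided_mean(mu, 0.5, class_mean)
sample = shifted + 0.5 * rng.standard_normal(2)
print(shifted)  # [3.75 3.75] -- mean pulled toward the target class
```

Larger `scale` trades diversity for fidelity, which is exactly the knob the paper tunes to match and beat BigGAN-deep on FID.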
OUTLINE:
0:00 - Intro & Overview
4:10 - Denoising Diffusion Probabilistic Models
11:30 - Formal derivation of the training loss
23:00 - Training in practice
27:55 - Learning the covariance
31:25 - Improving the noise schedule
33:35 - Reducing the loss gradient noise
40:35 - Classifier guidance
52:50 - Experimental Results
Paper (this): https://arxiv.org/abs/2105.05233
Paper (previous): https://arxiv.org/abs/2102.09672
Code: https://github.com/openai/guided-diffusion
Abstract:
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for sample quality using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.85 on ImageNet 512×512. We release our code at this https URL
Authors: Alex Nichol, Prafulla Dhariwal
...
https://www.youtube.com/watch?v=W-O7AZNzbzQ
Object detection often does not occur in a vacuum. Static cameras, such as wildlife camera traps, collect large amounts of irregularly sampled data over long time frames and often capture repeating or similar events. This model learns to dynamically incorporate other frames taken by the same camera into its object detection pipeline.
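The long-term-memory mechanism is standard attention over a per-camera bank of stored features. Here is a minimal sketch with hand-made 4-dim features; the feature values, the additive fusion, and `attend_memory` itself are illustrative, not the actual Context R-CNN heads.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend_memory(query, memory):
    # Context R-CNN-style aggregation (sketch): a detection feature from
    # the current frame attends over a per-camera long-term memory bank of
    # features from other frames; the attended context is added back in.
    d = query.shape[0]
    weights = softmax(memory @ query / np.sqrt(d))  # one weight per stored frame
    context = weights @ memory                      # weighted sum of memory rows
    return query + context, weights

# Memory bank with three past-frame features; the first resembles the query
# (the same animal reappearing), so it should dominate the attention weights.
query = np.array([1.0, 0.0, 1.0, 0.0])
memory = np.array([
    [0.9, 0.1, 0.9, 0.0],    # similar past detection
    [-1.0, 0.5, -1.0, 0.5],  # dissimilar frame
    [0.0, 0.0, 0.0, 0.0],    # empty frame
])
fused, w = attend_memory(query, memory)
print(w.argmax())  # 0 -- the similar frame gets the most weight
```

Because attention is order-free over the bank, the mechanism is naturally robust to the irregular sampling rates the paper emphasizes.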
OUTLINE:
0:00 - Intro & Overview
1:10 - Problem Formulation
2:10 - Static Camera Data
6:45 - Architecture Overview
10:00 - Short-Term Memory
15:40 - Long-Term Memory
20:10 - Quantitative Results
22:30 - Qualitative Results
30:10 - False Positives
32:50 - Appendix & Conclusion
Paper: https://arxiv.org/abs/1912.03538
My Video On Attention Is All You Need: https://youtu.be/iDulhoQ2pro
Abstract:
In static monitoring cameras, useful contextual information can stretch far beyond the few seconds typical video understanding models might see: subjects may exhibit similar behavior over multiple days, and background objects remain static. Due to power and storage constraints, sampling frequencies are low, often no faster than one frame per second, and sometimes are irregular due to the use of a motion trigger. In order to perform well in this setting, models must be robust to irregular sampling rates. In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera. Specifically, we propose an attention-based approach that allows our model, Context R-CNN, to index into a long term memory bank constructed on a per-camera basis and aggregate contextual features from other frames to boost object detection performance on the current frame.
We apply Context R-CNN to two settings: (1) species detection using camera traps, and (2) vehicle detection in traffic cameras, showing in both settings that Context R-CNN leads to performance gains over strong baselines. Moreover, we show that increasing the contextual time horizon leads to improved results. When applied to camera trap data from the Snapshot Serengeti dataset, Context R-CNN with context from up to a month of images outperforms a single-frame baseline by 17.9% mAP, and outperforms S3D (a 3d convolution based baseline) by 11.2% mAP.
Authors: Sara Beery, Guanhang Wu, Vivek Rathod, Ronny Votel, Jonathan Huang
...
https://www.youtube.com/watch?v=eI8xTdcZ6VY
#openai #math #imo
Formal mathematics is a challenging area for both humans and machines. For humans, formal proofs require very tedious and meticulous specification of every last detail and result in very long, cumbersome, and verbose outputs. For machines, the discreteness and sparse-reward nature of the problem pose a significant challenge, classically tackled by brute-force search guided by a handful of heuristics. Language models have previously been employed to guide these proof searches and delivered significant improvements, but automated systems are still far from usable. This paper introduces another concept: an expert iteration procedure iteratively produces ever more challenging, yet solvable, problems for the machine to train on, resulting in an automated curriculum and a final algorithm that performs well above previous models. OpenAI used this method to solve two problems from the International Mathematical Olympiad, which was previously infeasible for AI systems.
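The expert-iteration loop can be caricatured in a few lines. In this toy sketch each statement is just an integer difficulty, "search" solves anything up to the current skill level, and "training" on each batch of new proofs raises skill; all of that is an illustrative stand-in for the real proof search and fine-tuning.

```python
def expert_iteration(statements, skill=1, rounds=10):
    # Expert iteration (sketch): alternate proof search with learning.
    # Each pass, search solves the statements within reach of the current
    # prover; retraining on those proofs extends its reach -- yielding an
    # automatic curriculum of harder problems, with no ground-truth proofs.
    solved = set()
    for _ in range(rounds):
        new = {s for s in statements if s <= skill and s not in solved}
        if not new:
            break               # search found nothing new; curriculum stalls
        solved |= new           # keep the proofs found by search
        skill += 1              # "training" on them improves the prover
    return solved, skill

solved, skill = expert_iteration({1, 2, 3, 5, 9})
print(sorted(solved), skill)  # [1, 2, 3] 4 -- difficulty 5 stays out of reach
```

Note the failure mode the toy makes visible: if the statement pool has a difficulty gap (nothing between 3 and 5 here), the curriculum stalls, which is why the paper stresses a pool of sufficiently varied difficulty.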
OUTLINE:
0:00 - Intro
2:35 - Paper Overview
5:50 - How do formal proofs work?
9:35 - How expert iteration creates a curriculum
16:50 - Model, data, and training procedure
25:30 - Predicting proof lengths for guiding search
29:10 - Bootstrapping expert iteration
34:10 - Experimental evaluation & scaling properties
40:10 - Results on synthetic data
44:15 - Solving real math problems
47:15 - Discussion & comments
Paper: https://arxiv.org/abs/2202.01344
miniF2F benchmark: https://github.com/openai/miniF2F
Abstract:
We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at the same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve state-of-the-art on the miniF2F benchmark, automatically solving multiple challenging problems drawn from high school olympiads.
Authors: Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, Ilya Sutskever
...
https://www.youtube.com/watch?v=lvYVuOmUVs8