Papers-Notes

  1. DLRM
  2. Matryoshka Representation Learning
  3. Sparse Contrastive Learning for Content-Based Cold Item Recommendation
  4. Efficient Learning of Sparse Representations from Interactions
  5. CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
  6. Diffusion Beats Autoregressive in Data-Constrained Settings
  7. Defeating the Training-Inference Mismatch via FP16
  8. LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework
  9. Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t
  10. The Super Weight in Large Language Models
  11. ImageBind: One Embedding Space To Bind Them All
  12. Scaling (Down) Clip: A Comprehensive Analysis of Data, Architecture, and Training Strategies
  13. Demystifying CLIP data
  14. Learning Transferable Visual Models From Natural Language Supervision
  15. LoRA: Low-Rank Adaptation of Large Language Models
  16. FrugalGPT: How to use LLM while reducing cost and improving performance
  17. Mathematics of Deep Learning
  18. Wasserstein GAN
  19. Why and How of Nonnegative Matrix Factorization
  20. Densely Connected Convolutional Networks
  21. Learning Generative Models with Sinkhorn Divergences
  22. Improving GANs Using Optimal Transport
  23. Mask R-CNN
  24. Fully Convolutional Networks for Semantic Segmentation
  25. Improving Sequence-To-Sequence Learning Via Optimal Transport
  26. Memory-Efficient Implementation of DenseNets
  27. Attention Is All You Need
  28. Analyzing and Improving Representations with the Soft Nearest Neighbor Loss
  29. Optimal Transport for Domain Adaptation
  30. Large Scale Optimal Transport and Mapping Estimation
  31. Autoencoding Variational Bayes
  32. Label Efficient Learning of Transferable Representations across Domains and Tasks
  33. Stacked What-Where Auto-Encoders
  34. Unsupervised Data Augmentation for Consistency Training
  35. Towards Federated Learning at Scale: System Design
  36. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  37. Notification Volume Control and Optimization System at Pinterest
  38. Class-Balanced Loss Based on Effective Number of Samples
  39. Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts

DLRM

References


ImageBind: One Embedding Space To Bind Them All

References


Scaling (Down) Clip: A Comprehensive Analysis of Data, Architecture, and Training Strategies

References


Demystifying CLIP data

References


Learning Transferable Visual Models From Natural Language Supervision

References


LoRA: Low-Rank Adaptation of Large Language Models

References


FrugalGPT: How to use LLM while reducing cost and improving performance

\[\max_{s} \; \mathbb{E}_{(q,a) \in Q \times A} \, [r(a, \hat{a}(s,q))] \quad \text{with} \quad \mathbb{E}_{(q,a) \in Q \times A} \, [c(s,q)] \leq b\]
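One way to read the objective: among candidate strategies \(s\), keep those whose average cost fits the budget \(b\) and pick the one with the highest average reward. The sketch below assumes a small finite set of strategies that were each scored offline on the same queries. The function name, the strategy names, and all numbers are hypothetical and only illustrate the constraint.

```python
from statistics import mean

def best_strategy_under_budget(evals, budget):
    """evals maps a strategy name to a list of (reward, cost) pairs, one per query.

    Returns (name, avg_reward, avg_cost) for the feasible strategy with the
    highest average reward, or None if no strategy fits the budget.
    """
    best = None
    for name, pairs in evals.items():
        avg_reward = mean(r for r, _ in pairs)   # empirical E[r(a, a_hat(s, q))]
        avg_cost = mean(c for _, c in pairs)     # empirical E[c(s, q)]
        if avg_cost <= budget and (best is None or avg_reward > best[1]):
            best = (name, avg_reward, avg_cost)
    return best

# Hypothetical offline evaluation of three strategies on four queries.
evals = {
    "small_only": [(1, 0.5), (0, 0.5), (1, 0.5), (0, 0.5)],
    "large_only": [(1, 4.0), (1, 4.0), (1, 4.0), (0, 4.0)],
    "cascade":    [(1, 0.5), (1, 4.5), (1, 0.5), (0, 4.5)],
}
choice = best_strategy_under_budget(evals, budget=3.0)
```

Here `large_only` is excluded because its average cost exceeds the budget, so the cheaper strategy with the same average reward is selected.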

References


Mathematics of Deep Learning

References


Wasserstein GAN

References


Why and How of Nonnegative Matrix Factorization

\[x_j \approx \sum_{k=1}^{r}\ w_kh_j(k) \;\text{for some weights}\; h_j \in \mathbb{R}^{r}\]
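Stacking the columns \(x_j\) into a matrix \(X\), the basis vectors \(w_k\) into \(W\), and the weights \(h_j\) into \(H\) gives \(X \approx WH\) with nonnegative factors. The sketch below fits such a factorization with Lee-Seung multiplicative updates for the Frobenius error. That solver is a choice made here for brevity, not necessarily the algorithm these notes had in mind, and the function name, iteration count, and test matrix are assumptions.

```python
import numpy as np

def nmf(X, r, n_iter=500, eps=1e-10, seed=0):
    """Approximate a nonnegative X (m x n) as W @ H with W (m x r), H (r x n) >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: every factor is nonnegative, so W and H stay nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical data: a nonnegative matrix that is exactly rank 2.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 8))
W, H = nmf(X, r=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Column `H[:, j]` plays the role of \(h_j\), and column `W[:, k]` plays the role of \(w_k\).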

References


Densely Connected Convolutional Networks

References


Learning Generative Models with Sinkhorn Divergences

\[d^{\lambda}_M(r,c) = \min_{P \in U(r,c)} \sum_{i,j} P_{ij} M_{ij} - \frac{1}{\lambda} h(P)\]
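The minimizer of this entropy-regularized problem has the form \(P = \mathrm{diag}(u)\, K\, \mathrm{diag}(v)\) with \(K = e^{-\lambda M}\), and the scaling vectors can be found by alternately matching the two marginals. A minimal NumPy sketch of those Sinkhorn iterations follows. The function name, the value of `lam`, the iteration count, and the toy marginals are assumptions for illustration.

```python
import numpy as np

def sinkhorn_plan(r, c, M, lam=10.0, n_iter=1000):
    """Return an approximate transport plan P with row sums r and column sums c."""
    K = np.exp(-lam * M)          # Gibbs kernel built from the cost matrix M
    u = np.ones_like(r)
    for _ in range(n_iter):
        v = c / (K.T @ u)         # rescale to match the column marginal c
        u = r / (K @ v)           # rescale to match the row marginal r
    return u[:, None] * K * v[None, :]

# Hypothetical 2x2 problem.
r = np.array([0.5, 0.5])
c = np.array([0.25, 0.75])
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])
P = sinkhorn_plan(r, c, M)
transport_cost = float(np.sum(P * M))   # the <P, M> term of the objective
```

Larger `lam` weakens the entropy term, so the plan moves closer to the unregularized optimal transport solution, at the price of a more ill-conditioned kernel.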

References


Improving GANs Using Optimal Transport

\[\mathcal{D}^2_{MED}(p, g) = 2\mathbb{E}[\mathbb{W}_c(\mathbf{X}, \mathbf{Y})] - \mathbb{E}[\mathbb{W}_c(\mathbf{X}, \mathbf{X'})] - \mathbb{E}[\mathbb{W}_c(\mathbf{Y}, \mathbf{Y'})]\]

where \(\mathbf{X}, \mathbf{X'}\) are independently sampled mini-batches from distribution \(\textit{p}\) and \(\mathbf{Y}, \mathbf{Y'}\) are independent mini-batches from \(\textit{g}\)

References


Mask R-CNN

\[L = L_{cls} + L_{box} + L_{mask}\]

References


Fully Convolutional Networks for Semantic Segmentation

References


Improving Sequence-To-Sequence Learning Via Optimal Transport

\[\mathcal{L} = \mathcal{L}_{MLE} + \gamma_1 \ \mathcal{L}_{copy} + \gamma_2 \ \mathcal{L}_{seq}\]

References


Memory-Efficient Implementation of DenseNets

References


Attention Is All You Need

References


Analyzing and Improving Representations with the Soft Nearest Neighbor Loss

References


Optimal Transport for Domain Adaptation

References


Large Scale Optimal Transport and Mapping Estimation

References


Autoencoding Variational Bayes

References


Label Efficient Learning of Transferable Representations across Domains and Tasks

References


Stacked What-Where Auto-Encoders

References


Unsupervised Data Augmentation for Consistency Training

\[\min_{\theta} \mathcal{J}_{UDA}(\theta) = \mathbb{E}_{x \in U} \, \mathbb{E}_{\hat{x} \sim q(\hat{x} \mid x)} \left[ \mathcal{D}\left( p_{\hat{\theta}}(y \mid x) \,\|\, p_{\theta}(y \mid \hat{x}) \right) \right]\]

where \(q(\hat{x} \mid x)\) is a data augmentation transformation
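As a rough illustration of the consistency term, the sketch below takes \(\mathcal{D}\) to be the KL divergence and computes it between the predictions on clean inputs and on their augmented versions, averaged over an unlabeled batch. The function names and logits are hypothetical. In a real training framework the clean-input distribution would be treated as a constant, with no gradient flowing through it, which plain NumPy cannot express.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def consistency_loss(logits_clean, logits_aug):
    """Mean KL( p(y | x) || p(y | x_hat) ) over a batch of unlabeled examples."""
    log_p = log_softmax(logits_clean)   # predictions on x (fixed target)
    log_q = log_softmax(logits_aug)     # predictions on the augmented x_hat
    kl = np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)
    return float(kl.mean())

# Hypothetical logits for two unlabeled examples and three classes.
logits_clean = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
logits_aug = np.array([[1.0, 0.5, 0.0], [0.0, 0.2, 0.3]])
loss = consistency_loss(logits_clean, logits_aug)
```

The loss is zero when the model predicts the same distribution for an example and its augmentation, and grows as the two predictions diverge.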

References


Towards Federated Learning at Scale: System Design

References


BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

References


Notification Volume Control and Optimization System at Pinterest

References


Class-Balanced Loss Based on Effective Number of Samples

\[E_n = \frac{1 - \beta^n}{1 - \beta}, \quad \text{where} \quad \beta = \frac{N - 1}{N}\]

\[CB(p, y) = \frac{1}{E_{n_y}} \mathcal{L}(p,y) = \frac{1 - \beta}{1 - \beta^{n_y}}\mathcal{L}(p,y)\]
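A small worked example may help: the weight applied to the loss of a sample from class \(y\) is the inverse of the effective number computed from that class's sample count \(n_y\). The class counts and the value of \(N\) below are hypothetical, as are the function names.

```python
def effective_number(n, beta):
    """E_n = (1 - beta^n) / (1 - beta) for a class with n samples."""
    return (1.0 - beta ** n) / (1.0 - beta)

def class_balanced_weights(counts, beta):
    """Per-class loss weights (1 - beta) / (1 - beta^{n_y})."""
    return [1.0 / effective_number(n, beta) for n in counts]

# Hypothetical long-tailed dataset with three classes.
counts = [1000, 100, 10]
N = 1000                      # assumed value, giving beta = 0.999
beta = (N - 1) / N
weights = class_balanced_weights(counts, beta)
```

Rare classes receive larger weights, and because \(E_n\) saturates as \(n\) grows, very frequent classes are not down-weighted as sharply as they would be under plain inverse-frequency weighting.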

References


Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts

\[y_k = h^k (f^k(x)), \quad \text{where} \quad f^k (x) = \sum^n_{i=1} g^k(x)_i \, f_i(x)\]
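The formula says that every task \(k\) mixes the same \(n\) shared experts \(f_i\) with its own softmax gate \(g^k\), then feeds the mixture to its own tower \(h^k\). The NumPy forward pass below sketches this under simplifying assumptions: each expert is a single linear layer with a tanh, and each gate and tower is a single linear layer. All names, shapes, and random weights are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mmoe_forward(x, expert_ws, gate_ws, tower_ws):
    """x: (batch, d). expert_ws: n arrays (d, h). gate_ws: K arrays (d, n).
    tower_ws: K arrays (h, 1). Returns a list of K outputs of shape (batch, 1)."""
    # Shared experts f_i(x), stacked to shape (batch, n, h).
    experts = np.stack([np.tanh(x @ W) for W in expert_ws], axis=1)
    outputs = []
    for W_gate, W_tower in zip(gate_ws, tower_ws):
        g = softmax(x @ W_gate)                        # g^k(x), shape (batch, n)
        f_k = np.einsum("bn,bnh->bh", g, experts)      # sum_i g^k(x)_i * f_i(x)
        outputs.append(f_k @ W_tower)                  # y_k = h^k(f^k(x))
    return outputs

rng = np.random.default_rng(0)
d, h, n, K = 4, 3, 2, 2
x = rng.normal(size=(5, d))
expert_ws = [rng.normal(size=(d, h)) for _ in range(n)]
gate_ws = [rng.normal(size=(d, n)) for _ in range(K)]
tower_ws = [rng.normal(size=(h, 1)) for _ in range(K)]
outputs = mmoe_forward(x, expert_ws, gate_ws, tower_ws)
```

Because each gate's weights sum to one, a task whose gate concentrates on one expert effectively uses that expert alone, while tasks with flatter gates share more of the experts.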

References


Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t

References


The Super Weight in Large Language Models

References


LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework

References


Diffusion Beats Autoregressive in Data-Constrained Settings

References


Defeating the Training-Inference Mismatch via FP16

References


CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

References


Efficient Learning of Sparse Representations from Interactions

References


Sparse Contrastive Learning for Content-Based Cold Item Recommendation

References


Matryoshka Representation Learning