view article Article Token Merging for fast LLM inference : Background and first trials with Mistral By samchain • Apr 30 • 3
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 61
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 93
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Paper • 2406.08973 • Published Jun 13 • 85
Universal token classification Collection Collection of universal token classification (UTC) models capable in prompt-tuned manner to solve many information extraction tasks. • 11 items • Updated 15 days ago • 12
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24 • 43
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published May 24 • 52
view article Article GPU Poor Savior: Revolutionizing Low-Bit Open Source LLMs and Cost-Effective Edge Computing By NicoNico • May 25 • 9
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction Paper • 2403.18795 • Published Mar 27 • 17
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion Paper • 2403.18818 • Published Mar 27 • 24
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27 • 44
MGM Collection Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated May 3 • 46
PERL: Parameter Efficient Reinforcement Learning from Human Feedback Paper • 2403.10704 • Published Mar 15 • 56
MetricX-23 Collection A collection of MetricX-23 models (https://aclanthology.org/2023.wmt-1.63/) • 6 items • Updated Jul 31 • 14
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 71
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14 • 54
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 123
Text-to-Image Base Models Collection All text-to-image open source base models, with their respective license • 28 items • Updated May 10 • 19
DeepMind GraphCast Collection Model weights, normalization statistics, and example input data for DeepMind GraphCast. • 4 items • Updated Dec 29, 2023 • 2
Transformer Arch Collection Checkout: https://bbycroft.net/llm and http://nlp.seas.harvard.edu/2018/04/03/attention.html • 11 items • Updated Mar 15 • 1
Self-Discover: Large Language Models Self-Compose Reasoning Structures Paper • 2402.03620 • Published Feb 6 • 109
Rethinking Optimization and Architecture for Tiny Language Models Paper • 2402.02791 • Published Feb 5 • 12
Shortened LLaMA: A Simple Depth Pruning for Large Language Models Paper • 2402.02834 • Published Feb 5 • 13
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models Paper • 2310.05861 • Published Oct 9, 2023 • 2
OLMo Suite Collection Artifacts for the first set of OLMo models. • 18 items • Updated about 6 hours ago • 50
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation Paper • 2302.09664 • Published Feb 19, 2023 • 2
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29 • 54
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29 • 48
ReGAL: Refactoring Programs to Discover Generalizable Abstractions Paper • 2401.16467 • Published Jan 29 • 8
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8 • 70
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models Paper • 2401.05252 • Published Jan 10 • 45
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models Paper • 2401.06102 • Published Jan 11 • 19
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation Paper • 2401.05675 • Published Jan 11 • 20
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training Paper • 2401.05566 • Published Jan 10 • 25
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11 • 42
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11 • 46
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model Paper • 2401.09417 • Published Jan 17 • 58
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19 • 53
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment Paper • 2401.12474 • Published Jan 23 • 33
Small Language Model Meets with Reinforced Vision Vocabulary Paper • 2401.12503 • Published Jan 23 • 31
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 86
MaLA-500: Massive Language Adaptation of Large Language Models Paper • 2401.13303 • Published Jan 24 • 11
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models Paper • 2401.13919 • Published Jan 25 • 23
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence Paper • 2401.14196 • Published Jan 25 • 46
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty Paper • 2401.15077 • Published Jan 26 • 17