Citaman (Anthonny OLIME)

upvoted an article 5 days ago

Token Merging for fast LLM inference : Background and first trials with Mistral

Article • Apr 30 • 3

upvoted a paper about 2 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 61

upvoted an article 3 months ago

How I train a LoRA: m3lt style training overview

Article • Jul 1 • 45

upvoted 3 papers 3 months ago

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28 • 93

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Paper • 2406.08973 • Published Jun 13 • 85

Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11 • 52

upvoted a collection 4 months ago

Universal token classification

Collection

Collection of universal token classification (UTC) models capable of solving many information extraction tasks in a prompt-tuned manner. • 11 items • Updated 15 days ago • 12

upvoted 3 papers 4 months ago

upvoted an article 4 months ago

GPU Poor Savior: Revolutionizing Low-Bit Open Source LLMs and Cost-Effective Edge Computing

Article • May 25 • 9

upvoted 2 articles 5 months ago

Transformers

Article • Jul 2 • 5

Diffusion Models

Article • May 19 • 13

upvoted 6 papers 6 months ago

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 77

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

Paper • 2403.18795 • Published Mar 27 • 17

Long-form factuality in large language models

Paper • 2403.18802 • Published Mar 27 • 23

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion

Paper • 2403.18818 • Published Mar 27 • 24

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 51

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27 • 44

upvoted a collection 6 months ago

MGM

Collection

Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated May 3 • 46

upvoted a paper 6 months ago

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Paper • 2403.10704 • Published Mar 15 • 56

upvoted a collection 6 months ago

MetricX-23

Collection

A collection of MetricX-23 models (https://aclanthology.org/2023.wmt-1.63/) • 6 items • Updated Jul 31 • 14

upvoted 3 papers 6 months ago

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14 • 71

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Paper • 2403.09029 • Published Mar 14 • 54

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 123

upvoted 2 collections 7 months ago

Text-to-Image Base Models

Collection

All text-to-image open source base models, with their respective license • 28 items • Updated May 10 • 19

DeepMind GraphCast

Collection

Model weights, normalization statistics, and example input data for DeepMind GraphCast. • 4 items • Updated Dec 29, 2023 • 2

upvoted a collection 8 months ago

Transformer Arch

Collection

Check out: https://bbycroft.net/llm and http://nlp.seas.harvard.edu/2018/04/03/attention.html • 11 items • Updated Mar 15 • 1

upvoted 4 papers 8 months ago

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6 • 109

Rethinking Optimization and Architecture for Tiny Language Models

Paper • 2402.02791 • Published Feb 5 • 12

Shortened LLaMA: A Simple Depth Pruning for Large Language Models

Paper • 2402.02834 • Published Feb 5 • 13

Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models

Paper • 2310.05861 • Published Oct 9, 2023 • 2

upvoted a collection 8 months ago

OLMo Suite

Collection

Artifacts for the first set of OLMo models. • 18 items • Updated about 6 hours ago • 50

upvoted a paper 8 months ago

Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation

Paper • 2302.09664 • Published Feb 19, 2023 • 2

upvoted a collection 8 months ago

LLaVA-1.6

Collection

A collection of LLaVA-1.6 checkpoints • 4 items • Updated Jan 31 • 64

upvoted 25 papers 8 months ago

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Paper • 2401.16420 • Published Jan 29 • 54

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29 • 48

ReGAL: Refactoring Programs to Discover Generalizable Abstractions

Paper • 2401.16467 • Published Jan 29 • 8

H2O-Danube-1.8B Technical Report

Paper • 2401.16818 • Published Jan 30 • 16

Weaver: Foundation Models for Creative Writing

Paper • 2401.17268 • Published Jan 30 • 41

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Paper • 2401.04081 • Published Jan 8 • 70

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

Paper • 2401.05252 • Published Jan 10 • 45

LEGO:Language Enhanced Multi-modal Grounding Model

Paper • 2401.06071 • Published Jan 11 • 10

Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models

Paper • 2401.06102 • Published Jan 11 • 19

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Paper • 2401.05675 • Published Jan 11 • 20

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10 • 25

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11 • 42

PALP: Prompt Aligned Personalization of Text-to-Image Models

Paper • 2401.06105 • Published Jan 11 • 46

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17 • 58

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 141

VMamba: Visual State Space Model

Paper • 2401.10166 • Published Jan 18 • 37

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Paper • 2401.10774 • Published Jan 19 • 53

Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment

Paper • 2401.12474 • Published Jan 23 • 33

Small Language Model Meets with Reinforced Vision Vocabulary

Paper • 2401.12503 • Published Jan 23 • 31

Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 86

MaLA-500: Massive Language Adaptation of Large Language Models

Paper • 2401.13303 • Published Jan 24 • 11

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Paper • 2401.13919 • Published Jan 25 • 23

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25 • 46

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Paper • 2401.15077 • Published Jan 26 • 17

Learning Universal Predictors

Paper • 2401.14953 • Published Jan 26 • 18
