Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 88
LLaVA-NeXT Collection LLaVA-NeXT (also known as LLaVA-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets. • 8 items • Updated Jul 19 • 25
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 67
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization Paper • 2402.03161 • Published Feb 5 • 14
Canonical models Collection This collection lists all the historical (pre-"Hub") canonical model checkpoints, i.e. repos that were not under an org or user namespace • 68 items • Updated Feb 13 • 13
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models Paper • 2401.05252 • Published Jan 10 • 45
What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs Paper • 2401.02411 • Published Jan 4 • 12
Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3 • 30
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively Paper • 2401.02955 • Published Jan 5 • 19
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation Paper • 2401.04092 • Published Jan 8 • 20
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8 • 70
Generative Multimodal Models are In-Context Learners Paper • 2312.13286 • Published Dec 20, 2023 • 34
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model Paper • 2312.13252 • Published Dec 20, 2023 • 27
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Paper • 2312.08578 • Published Dec 14, 2023 • 16
LLaMA-VID Collection LLaMA-VID checkpoints. Please refer to the project page for more details: https://llama-vid.github.io/ • 11 items • Updated Dec 3, 2023 • 4
HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image Paper • 2312.04543 • Published Dec 7, 2023 • 21
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators Paper • 2312.03793 • Published Dec 6, 2023 • 17
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator Paper • 2312.04474 • Published Dec 7, 2023 • 29
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Paper • 2312.03818 • Published Dec 6, 2023 • 31
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors Paper • 2312.04963 • Published Dec 7, 2023 • 16
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models Paper • 2312.05107 • Published Dec 8, 2023 • 38
Photorealistic Video Generation with Diffusion Models Paper • 2312.06662 • Published Dec 11, 2023 • 23
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models Paper • 2312.06109 • Published Dec 11, 2023 • 20
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior Paper • 2312.06655 • Published Dec 11, 2023 • 23
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection Paper • 2311.10122 • Published Nov 16, 2023 • 26
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models Paper • 2311.06607 • Published Nov 11, 2023 • 3
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration Paper • 2311.04257 • Published Nov 7, 2023 • 20
LRM: Large Reconstruction Model for Single Image to 3D Paper • 2311.04400 • Published Nov 8, 2023 • 47
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 42
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text Paper • 2311.07446 • Published Nov 13, 2023 • 28
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure Paper • 2311.07590 • Published Nov 9, 2023 • 16
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster Paper • 2311.08263 • Published Nov 14, 2023 • 15
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion Paper • 2311.07885 • Published Nov 14, 2023 • 39
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts Paper • 2309.04354 • Published Sep 8, 2023 • 13
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 239
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding Paper • 2307.16449 • Published Jul 31, 2023 • 15
3D Gaussian Splatting for Real-Time Radiance Field Rendering Paper • 2308.04079 • Published Aug 8, 2023 • 166
Nougat: Neural Optical Understanding for Academic Documents Paper • 2308.13418 • Published Aug 25, 2023 • 34
Doppelgangers: Learning to Disambiguate Images of Similar Structures Paper • 2309.02420 • Published Sep 5, 2023 • 9