ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis Paper • 2409.02048 • Published 21 days ago • 1
Oryx Collection Oryx: One Multi-Modal LLM for On-Demand Spatial-Temporal Understanding • 5 items • Updated 6 days ago • 4
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published 6 days ago • 44
jina-embeddings-v3 Collection Multilingual multi-task general text embedding model • 6 items • Updated 6 days ago • 8
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 6 days ago • 63
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 7 days ago • 178
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 6 days ago • 188
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published 27 days ago • 81
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published 13 days ago • 60
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published 11 days ago • 43
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos Paper • 2409.08353 • Published 12 days ago • 9
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published 22 days ago • 73
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published 12 days ago • 15
Insights from Benchmarking Frontier Language Models on Web App Code Generation Paper • 2409.05177 • Published 16 days ago • 5
Article All LLMs Write Great Code, But Some Make (A Lot) Fewer Mistakes By onekq • 12 days ago • 3
ProteinBench: A Holistic Evaluation of Protein Foundation Models Paper • 2409.06744 • Published 15 days ago • 6
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation Paper • 2409.06820 • Published 14 days ago • 56
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published 14 days ago • 52
dilmash release Collection Dilmash: Karakalpak Machine Translation • 5 items • Updated 15 days ago • 2
Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak Paper • 2409.04269 • Published 19 days ago • 8
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Paper • 2409.04593 • Published 18 days ago • 19
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published 21 days ago • 70
Benchmarking Chinese Knowledge Rectification in Large Language Models Paper • 2409.05806 • Published 15 days ago • 14
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11 • 65
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data Paper • 2409.03810 • Published 19 days ago • 30
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published 21 days ago • 85
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published 21 days ago • 27
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA Paper • 2409.02897 • Published 20 days ago • 43
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071 • Published 23 days ago • 26
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published 26 days ago • 50
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding Paper • 2408.15545 • Published 28 days ago • 32
CSGO: Content-Style Composition in Text-to-Image Generation Paper • 2408.16766 • Published 26 days ago • 17
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published 26 days ago • 26
CogVLM2 Collection This collection hosts the repos of the THUDM's CogVLM2 releases • 8 items • Updated Aug 18 • 17
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published 27 days ago • 56
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Paper • 2408.15664 • Published 28 days ago • 11
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated 7 days ago • 119
Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation Paper • 2408.15991 • Published 27 days ago • 15
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published 28 days ago • 41
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published 29 days ago • 51
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published 29 days ago • 137
Video Generation models Collection The domain of video generation is booming. Here is a list of selected Open Access video generation (T2V) models. • 14 items • Updated 29 days ago • 12
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Paper • 2408.14176 • Published 30 days ago • 58