Collections

Collections including paper arxiv:2309.01826

Papers - Multilingual - Japanese

RakutenAI-7B: Extending Large Language Models for Japanese

Paper • 2403.15484 • Published Mar 21 • 12
One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 31

Papers - Multilingual

A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese

Paper • 2304.08999 • Published Apr 18, 2023 • 2
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 82
Robust Open-Vocabulary Translation from Visual Text Representations

Paper • 2104.08211 • Published Apr 16, 2021 • 1
Poro 34B and the Blessing of Multilinguality

Paper • 2404.01856 • Published Apr 2 • 12

Scaling MLPs: A Tale of Inductive Bias

Paper • 2306.13575 • Published Jun 23, 2023 • 14
Trap of Feature Diversity in the Learning of MLPs

Paper • 2112.00980 • Published Dec 2, 2021 • 1
Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics

Paper • 2301.05816 • Published Jan 14, 2023 • 1
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?

Paper • 2108.04384 • Published Aug 9, 2021 • 1

Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 40
SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks

Paper • 2309.00255 • Published Sep 1, 2023 • 1
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)

Paper • 2309.08968 • Published Sep 16, 2023 • 22
Matryoshka Representation Learning

Paper • 2205.13147 • Published May 26, 2022 • 8

One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 31
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published 22 days ago • 75

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 82
Baichuan 2: Open Large-scale Language Models

Paper • 2309.10305 • Published Sep 19, 2023 • 18
Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 38
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 64

One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 31
Gated recurrent neural networks discover attention

Paper • 2309.01775 • Published Sep 4, 2023 • 7
FLM-101B: An Open LLM and How to Train It with $100K Budget

Paper • 2309.03852 • Published Sep 7, 2023 • 43
Large Language Models as Optimizers

Paper • 2309.03409 • Published Sep 7, 2023 • 75

Model Efficiency

One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 31

Training a large language model from scratch for 1$ on LambdaLabs

TheBirdLegacy/FreeLoaderLM

Text Generation • Updated Sep 9, 2023
CofeAI/FLM-101B

Text Generation • Updated Sep 18, 2023 • 15 • 92
FLM-101B: An Open LLM and How to Train It with $100K Budget

Paper • 2309.03852 • Published Sep 7, 2023 • 43
Composable Function-preserving Expansions for Transformer Architectures

Paper • 2308.06103 • Published Aug 11, 2023 • 19

Large Language Models as Optimizers

Paper • 2309.03409 • Published Sep 7, 2023 • 75
One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 31
Self-Alignment with Instruction Backtranslation

Paper • 2308.06259 • Published Aug 11, 2023 • 40
Shepherd: A Critic for Language Model Generation

Paper • 2308.04592 • Published Aug 8, 2023 • 29
