nbroad's Collections
attention and long context
Efficient Streaming Language Models with Attention Sinks
Paper
•
2309.17453
•
Published
•
13
Effective Long-Context Scaling of Foundation Models
Paper
•
2309.16039
•
Published
•
30
allenai/longformer-base-4096
Updated
•
4.59M
•
161
google/bigbird-roberta-base
Updated
•
59.6k
•
47
uw-madison/yoso-4096
Fill-Mask
•
Updated
•
604
Yukang/Llama-2-7b-longlora-100k-ft
Text Generation
•
Updated
•
660
•
51
mosaicml/mpt-7b-storywriter
Text Generation
•
Updated
•
2.52k
•
814
allenai/led-base-16384
Text2Text Generation
•
Updated
•
19k
•
40
RRWKV: Capturing Long-range Dependencies in RWKV
Paper
•
2306.05176
•
Published
Retentive Network: A Successor to Transformer for Large Language Models
Paper
•
2307.08621
•
Published
•
170
Hyena Hierarchy: Towards Larger Convolutional Language Models
Paper
•
2302.10866
•
Published
•
6
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Paper
•
2306.15794
•
Published
•
17
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Paper
•
2212.14052
•
Published
Ring Attention with Blockwise Transformers for Near-Infinite Context
Paper
•
2310.01889
•
Published
•
9
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper
•
2309.12307
•
Published
•
86
CoLT5: Faster Long-Range Transformers with Conditional Computation
Paper
•
2303.09752
•
Published
•
2
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Paper
•
2112.07916
•
Published
•
2
Investigating Efficiently Extending Transformers for Long Input Summarization
Paper
•
2208.04347
•
Published
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Paper
•
2108.12409
•
Published
•
5
amazon/MistralLite
Text Generation
•
Updated
•
3.28k
•
427
NousResearch/Yarn-Mistral-7b-128k
Text Generation
•
Updated
•
19.9k
•
570
YaRN: Efficient Context Window Extension of Large Language Models
Paper
•
2309.00071
•
Published
•
65
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper
•
2402.13753
•
Published
•
110