[ACL'24] Analysing the Impact of Sequence Composition on Language Model Pre-Training. https://github.com/yuzhaouoe/pretraining-data-packing
Yu Zhao
yuzhaouoe
AI & ML interests
NLP/ML
Collections
1
models
9
yuzhaouoe/IntraDoc-2048 • Text Generation • 1.74k
yuzhaouoe/BM25Chunk-2048 • Text Generation • 1.77k
yuzhaouoe/MixChunk-2048 • Text Generation • 1.76k
yuzhaouoe/UniChunk-2048 • Text Generation • 1.76k
yuzhaouoe/MixChunk • Text Generation • 1.02k
yuzhaouoe/UniChunk • Text Generation • 1.03k
yuzhaouoe/IntraDoc • Text Generation • 279
yuzhaouoe/BM25Chunk • Text Generation • 275
yuzhaouoe/eval_data
datasets
None public yet