🚧 "raw" pretrained smol_llama checkpoints - WIP 🚧
BEEspoke Data
community
AI & ML interests
'an LLM is only as good as the dataset it was trained on' - Sun Tzu
Organization Card
🐝📊💁
Collections: 7
Spaces: 1
Models: 49
BEE-spoke-data/tFINE-900m-instruct-orpo • Text2Text Generation • 11
BEE-spoke-data/smol_llama-220M-openhermes • Text Generation • 641 • 5
BEE-spoke-data/tFINE-900m-e16-d32-instruct • Text2Text Generation • 56
BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e • Text2Text Generation • 6
BEE-spoke-data/tFINE-900m-e16-d32-flan • Text2Text Generation • 62
BEE-spoke-data/slimpajama_tok-48128-BPE-forT5
BEE-spoke-data/claude-tokenizer-forT5
BEE-spoke-data/Meta-Llama-3-8Bee • Text Generation • 39
BEE-spoke-data/MiniTokenizer-20480
BEE-spoke-data/BeeTokenizer • 1
Datasets: 63
BEE-spoke-data/synthsumm-comparisons • 4.67k
BEE-spoke-data/fineweb-cinema-100k • 100k
BEE-spoke-data/aimodels.fyi-papers • 14.8k
BEE-spoke-data/smollm-corpus-python • 12.4M • 149
BEE-spoke-data/flan-v2-hf • 819M • 7
BEE-spoke-data/the-stack-smol-xs-all • 8.7k • 4
BEE-spoke-data/the-stack-smol-xs-scored-and-annotated-python • 100 • 2
BEE-spoke-data/upvoteweb-posts • 45.9M • 8
BEE-spoke-data/napierone-pdf-raw • 18.5k • 5
BEE-spoke-data/fineweb-1000_64k • 2k • 15 • 2