Myself (@Steelskull) and @elinas have been working on a new rendition of the Aethora-15B model. It's built on the Llama 3 architecture and optimized especially for creative writing tasks (both kinds ;D) while maintaining strong general intelligence.
Model: L3-Aethora-15B-V2
ZeusLabs/L3-Aethora-15B-V2
Dataset: Aether-Lite-v1.8.1
TheSkullery/Aether-Lite-v1.8.1
What we've built:
A modified DUS (Depth Up-Scaled) model (originally created by Elinas): a passthrough merge producing a 15B model, with specific adjustments (zeroing of 'o_proj' and 'down_proj') that improve efficiency and reduce perplexity
Trained for 17.5 hours on 4 x A100 GPUs (huge thanks to g4rg for sponsoring the compute!)
Uses our Aether-Lite-V1.8.1 dataset with 125k high-quality samples
Focuses on creative writing and storytelling, with robust general intelligence
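To make the depth up-scaling idea above concrete, here's a toy sketch of the general technique: stack two overlapping layer ranges of the base model, then zero the 'o_proj' and 'down_proj' weights in the duplicated layers so the seam starts out close to a no-op. The layer counts, slice points, and which copy gets zeroed are illustrative assumptions, not the actual L3-Aethora-15B merge recipe (layers are stand-in dicts rather than real tensors):

```python
def depth_up_scale(layers, slice_a, slice_b, zero_names=("o_proj", "down_proj")):
    """Passthrough-style depth up-scaling sketch.

    Concatenates two (overlapping) layer ranges into a deeper stack,
    then zeroes the named projections in the second copy of each
    duplicated layer so the duplicated block initially contributes
    little, keeping perplexity from blowing up at the seam.
    """
    a = [dict(l) for l in layers[slice_a[0]:slice_a[1]]]
    b = [dict(l) for l in layers[slice_b[0]:slice_b[1]]]
    new_layers = a + b

    # Layers in [slice_b[0], slice_a[1]) appear in both slices; their
    # second copies sit at the start of b in the new stack.
    overlap = slice_a[1] - slice_b[0]
    dup_start = len(a)
    for i in range(dup_start, dup_start + overlap):
        for name in zero_names:
            new_layers[i][name] = 0.0  # zero the duplicated projection
    return new_layers


# Illustrative numbers only: 32 base layers up-scaled to 48.
base = [{"o_proj": 1.0, "down_proj": 1.0, "q_proj": 1.0} for _ in range(32)]
merged = depth_up_scale(base, (0, 24), (8, 32))
```

In practice a merge like this is expressed as a mergekit-style config over real checkpoints rather than hand-rolled code; the sketch just shows the slicing-plus-zeroing arithmetic.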
What makes L3-Aethora-15B v2 unique:
Creative Writing: We've really pushed its capabilities in generating engaging narratives and poetry, and in adapting to various writing styles, RP, and genres.
Versatile Intelligence: While we focused on creative tasks, it still handles scientific discussions, problem-solving, and educational content creation like a champ.
Long Context Understanding: Trained on the full sequence length of 8192 tokens, it maintains coherent conversations over extended interactions.
Carefully Curated Dataset: A lot of work was put into Aether-Lite-V1.8.1, our training dataset. It combines creative writing, instructional content, and specialized knowledge from various high-quality sources, all brought together by a custom data pipeline (more information on the process is available on the dataset page).
Open Source: We've made both the model and the full dataset available to the community.
We'd love your ideas and recommendations for further improvements!