Minimum GPU RAM capacity

#77
by bob-sj - opened

My laptop has an RTX 3070 Ti Laptop GPU.
When I tried to run the model, I got a "Killed" error. About 50% of the time it progresses and then stops. What is the minimum GPU memory capacity?

  1. If you want to run the model in 4-bit quantization, you need about 6 GB of GPU memory (see the sketch after this list).
  2. If you want to fine-tune the model in 4-bit quantization, you need at least 15 GB of GPU memory.
  3. If you want to run the full model, you need at least 16 GB of GPU memory.
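
For the first case, a minimal loading sketch, assuming the 3.1 8B Instruct checkpoint and the transformers + bitsandbytes stack (the exact model id is an assumption on my part, adjust it to whatever you're actually running):

```python
# Minimal sketch: loading an 8B model in 4-bit with transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; use your checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights, roughly 6 GB VRAM for an 8B model
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on the available GPU(s)
)
```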

I'm not sure which model you're running; I'll assume it is 3.1 8B Instruct, because this is the community for that.
I haven't set it up on a laptop of any kind,
but I have set it up on a Windows 10 PC using a GeForce GT 1030 GPU with 2 GB of GDDR, and I've set it up on Fedora Server, Fedora Workstation, Linux Mint Cinnamon, and Ubuntu with the same hardware.
It may be too late to suggest this (it's been 27 days), but before you assume you don't have enough memory (from the specs I can find for your computer, it has 8 GB of GDDR6, way more than I had) and before you run it in 4-bit quantization, you should try running on the CPU.
I also don't know what script you're running, but wherever you can find the parameter 'device=' or 'device_map=', change that value to 'cpu' instead of 'auto' or 'cuda', then try running the script and tell me what you see.
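
Something like this, assuming a plain transformers script (the model id and prompt are placeholders, not from your setup):

```python
# Sketch: forcing CPU execution to rule out a VRAM problem.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; use whatever your script loads

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu",            # instead of "auto" or "cuda"
    torch_dtype=torch.bfloat16,  # halves RAM use vs. fp32; switch to float32 if bf16 is slow on your CPU
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")  # tensors stay on the CPU
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```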

I am running on 3 GPUs, each with 12 GB, but I'm still getting the out-of-memory error:
"CUDA out of memory"

Did you try `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the virtual environment or shell you're running it in?
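
If it's easier, the same option can be set from inside the script instead of the shell; this is just the environment-variable approach in Python form:

```python
# Sketch: setting the allocator option from inside the script rather than with `export`.
# It has to be set before the first CUDA allocation, so do it before importing torch.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after the env var so the CUDA allocator picks it up
```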

I think I've finally resolved the issue on my Ubuntu servers. I have one GPU and I've been setting device_map='cuda'; it runs for a while as long as I keep the inferences simple, but anything more than 1400 characters long will crash the session with torch CUDA out-of-memory errors. It always fails at "logits = logits.float()".
I finally set my device_map value to 'auto', and now torch is using the CPU and system memory along with the GPU. GPU memory is steady at 87% while it is processing input, GPU utilization is at 95%, CPU at 101%, and system RAM holds at 6.9 GB.
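
For anyone hitting the same thing, this is roughly the change. The optional max_memory cap makes the CPU offload explicit; the numbers below are illustrative assumptions, not values from my setup:

```python
# Sketch: letting accelerate spill layers to CPU/system RAM when the GPU fills up.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",        # placeholder checkpoint
    device_map="auto",                         # instead of device_map="cuda"
    torch_dtype=torch.bfloat16,
    max_memory={0: "10GiB", "cpu": "24GiB"},   # cap GPU 0 and allow CPU offload (illustrative values)
)
```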
I don't know if it's a fluke. I will update if it ever crashes again.

I spoke too soon. In the script I was running, I already had logic in place to keep the session input at or below 1400 characters. Adding the CPU just raised the character threshold at which a torch out-of-memory error crashes the script. Nevertheless, it is a significant increase, from 1400 characters to around 10,675. Anything over that amount crashes the process with a torch out-of-memory error.
So, it sort of fixes the issue.
