New best story on Hacker News: Show HN: Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU bypassing the CPU

February 22, 2026

358 points by xaskasdf | 93 comments on Hacker News.
Hi everyone, I'm kind of into retrogaming, and while running some experiments I ran into the following question: "Would it be possible to run transformer models bypassing the CPU/RAM, by connecting the GPU directly to the NVMe drive?" This is the result of that question and some weekend vibecoding (the readme also links to the library repository). It seems to work, even on consumer GPUs, though it should work better on professional ones.
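The premise in the title is easier to appreciate with a little arithmetic. The following is a minimal back-of-the-envelope sketch, assuming a nominal 70 billion parameters and a few illustrative weight precisions; the post does not say which precision or quantization the project actually uses. It counts raw weights only (no KV cache or activations) and compares them with the RTX 3090's 24 GB of VRAM. Even at 4 bits per weight the model does not fit on the card, which is presumably why the project streams weights from the NVMe drive.

```python
# Back-of-the-envelope estimate of raw weight size vs. GPU memory.
# ASSUMPTION: nominal parameter count of 70e9 (the real count is slightly higher).
# ASSUMPTION: the bit widths below are illustrative; the post does not state
# which precision or quantization the project uses.
PARAMS = 70e9
VRAM_GB = 24  # the GeForce RTX 3090 ships with 24 GB of GDDR6X


def weight_gb(params: float, bits_per_param: int) -> float:
    """Return the size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_gb(PARAMS, bits)
        # Ratio > 1 means the weights alone exceed the card's VRAM.
        print(f"{bits:>2}-bit weights: {size:6.1f} GB, "
              f"{size / VRAM_GB:4.1f}x the card's VRAM")
```

At 16 bits the weights come to 140 GB, and even 4-bit weights (35 GB) are larger than the card's memory, so some form of out-of-VRAM storage is unavoidable on a single 3090.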
