# @qvac/diffusion-cpp

Text-to-image generation from text prompts.

## Overview

Bare module that adds support for text-to-image generation in QVAC, using qvac-ext-stable-diffusion.cpp as the inference engine.

## Models

Supports the SD1.x, SD2.x, SDXL, SD3/SD3.5, FLUX.1, and FLUX.2-klein model families.
### FLUX.2-klein

Three separate components are required:

- **Diffusion model** (`flux-2-klein-4b-Q8_0.gguf`) — the main image transformer. This GGUF has no SD metadata KV pairs, so it must be loaded internally via `diffusion_model_path`, not `model_path`.
- **Text encoder** (`Qwen3-4B-Q4_K_M.gguf`) — Qwen3 4B in standard GGML Q4_K_M format.
- **VAE** (`flux2-vae.safetensors`) — standard safetensors format, compatible as-is.
### Model file reference — FLUX.2-klein 4B
| Role | File | Source |
|---|---|---|
| Diffusion model | flux-2-klein-4b-Q8_0.gguf | leejet/FLUX.2-klein-4B-GGUF |
| Text encoder | Qwen3-4B-Q4_K_M.gguf | unsloth/Qwen3-4B-GGUF |
| VAE | flux2-vae.safetensors | black-forest-labs/FLUX.2-klein-4B |
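Mapped onto the module's constructor arguments, the three files above each fill a separate slot. A sketch (the `./models` directory is an assumption; any local path works):

```js
// Sketch: mapping the FLUX.2-klein files from the table above onto the
// constructor's args object. The ./models directory is an assumption.
const args = {
  diskPath: './models',
  modelName: 'flux-2-klein-4b-Q8_0.gguf', // diffusion model (diffusion-only GGUF)
  llmModel: 'Qwen3-4B-Q4_K_M.gguf',       // Qwen3 text encoder
  vaeModel: 'flux2-vae.safetensors'       // VAE
}
```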
### Stable Diffusion

- Stable Diffusion 1.x / 2.x — all-in-one checkpoint as a single `*.gguf` file
- Stable Diffusion XL — all-in-one `*.gguf` or split CLIP encoders
- Stable Diffusion 3 — safetensors with separate CLIP encoders
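The difference shows up in the args object: all-in-one checkpoints need only `modelName`, while SD3 adds the separate encoder entries. A hedged sketch — every file name below is a hypothetical placeholder, not a file this package ships:

```js
// Sketch only — all file names below are hypothetical placeholders.
// SD1.x / SD2.x: one all-in-one checkpoint, no separate encoder entries.
const sd15Args = {
  diskPath: './models',
  modelName: 'sd-v1-5-Q8_0.gguf'
}

// SD3: diffusion weights plus separate CLIP-L, CLIP-G and T5-XXL encoders.
const sd3Args = {
  diskPath: './models',
  modelName: 'sd3_medium.safetensors',
  clipLModel: 'clip_l.safetensors',
  clipGModel: 'clip_g.safetensors',
  t5XxlModel: 't5xxl_fp16.safetensors'
}
```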
## Requirements
- Memory: 16 GB unified memory on Apple Silicon, or 8 GB VRAM on GPU.
- Bare v1.24
## Installation

```sh
npm i @qvac/diffusion-cpp
```

## Quickstart
If you don't have the Bare runtime, install it:

```sh
npm i -g bare
```

Create a new project:

```sh
mkdir qvac-diffusion-quickstart
cd qvac-diffusion-quickstart
npm init -y
```

Install dependencies:
```sh
npm i @qvac/diffusion-cpp bare-path bare-process bare-fs
```

Download the FLUX.2 [klein] 4B model files (~6.8 GB total):
```sh
mkdir -p models
curl -L -C - -o models/flux-2-klein-4b-Q8_0.gguf \
  https://huggingface.co/leejet/FLUX.2-klein-4B-GGUF/resolve/main/flux-2-klein-4b-Q8_0.gguf
curl -L -C - -o models/Qwen3-4B-Q4_K_M.gguf \
  https://huggingface.co/unsloth/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-Q4_K_M.gguf
curl -L -C - -o models/flux2-vae.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/resolve/main/flux2-vae.safetensors
```

Create `index.js`:
```js
const path = require('bare-path')
const fs = require('bare-fs')
const process = require('bare-process')
const ImgStableDiffusion = require('@qvac/diffusion-cpp')

async function main () {
  const MODELS_DIR = path.resolve(__dirname, './models')

  const args = {
    logger: console,
    diskPath: MODELS_DIR,
    modelName: 'flux-2-klein-4b-Q8_0.gguf',
    llmModel: 'Qwen3-4B-Q4_K_M.gguf',
    vaeModel: 'flux2-vae.safetensors'
  }

  const config = {
    threads: 8
  }

  const model = new ImgStableDiffusion(args, config)
  await model.load()

  try {
    const images = []

    const response = await model.run({
      prompt: 'a majestic red fox in a snowy forest, golden light, photorealistic',
      steps: 20,
      width: 512,
      height: 512,
      guidance: 3.5,
      seed: 42
    })

    await response
      .onUpdate(data => {
        if (data instanceof Uint8Array) {
          images.push(data)
        } else if (typeof data === 'string') {
          try {
            const tick = JSON.parse(data)
            if ('step' in tick) process.stdout.write(`\rStep ${tick.step}/${tick.total}`)
          } catch (_) {}
        }
      })
      .await()

    console.log('\n')

    if (images.length > 0) {
      fs.writeFileSync('output.png', images[0])
      console.log('Saved → output.png')
    }
  } catch (error) {
    console.error('Error occurred:', error.message || error)
  } finally {
    await model.unload()
  }
}

main().catch(error => {
  console.error('Fatal error:', error.message)
  process.exit(1)
})
```

Run `index.js`:

```sh
bare index.js
```

## Usage
### 1. Import the model class

```js
const ImgStableDiffusion = require('@qvac/diffusion-cpp')
```

### 2. Create the args object
```js
const path = require('bare-path')

const MODELS_DIR = path.resolve(__dirname, './models')

const args = {
  logger: console,
  diskPath: MODELS_DIR,
  modelName: 'flux-2-klein-4b-Q8_0.gguf',
  llmModel: 'Qwen3-4B-Q4_K_M.gguf',
  vaeModel: 'flux2-vae.safetensors'
}
```

| Property | Required | Description |
|---|---|---|
| `diskPath` | ✅ | Local directory where model files are already stored |
| `modelName` | ✅ | Diffusion model file name (all-in-one for SD1.x/2.x; diffusion-only GGUF for FLUX.2) |
| `logger` | — | Logger instance (e.g. `console`) |
| `clipLModel` | — | Separate CLIP-L text encoder (FLUX.1 / SD3) |
| `clipGModel` | — | Separate CLIP-G text encoder (SDXL / SD3) |
| `t5XxlModel` | — | Separate T5-XXL text encoder (FLUX.1 / SD3) |
| `llmModel` | — | Qwen3 LLM text encoder (FLUX.2 [klein]) |
| `vaeModel` | — | Separate VAE file |
### 3. Create the config object
```js
const config = {
  threads: 8 // CPU threads for tensor operations (Metal handles GPU automatically)
}
```

All config values are coerced to strings internally before being passed to the native layer.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `threads` | number | auto | Number of CPU threads for model loading and CPU ops |
| `type` | `'f32'` \| `'f16'` \| `'q4_0'` \| `'q8_0'` \| … | auto | Override weight quantisation type |
| `rng` | `'cpu'` \| `'cuda'` \| `'std_default'` | `'cuda'` | RNG backend (`'cuda'` = philox RNG — not GPU-specific despite the name; recommended) |
| `clip_on_cpu` | boolean | `false` | Force CLIP encoder to run on CPU |
| `vae_on_cpu` | boolean | `false` | Force VAE to run on CPU |
| `flash_attn` | boolean | `false` | Enable flash attention (reduces memory) |
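Several table entries can be combined in one config. An illustrative sketch, not a tuned recommendation for every machine:

```js
// Sketch: a config combining several options from the table above.
// Values shown are illustrative, not universal recommendations.
const config = {
  threads: 8,        // CPU threads for loading and CPU-side ops
  rng: 'cuda',       // philox RNG (not GPU-specific); the recommended backend
  vae_on_cpu: true,  // keep the VAE on CPU to reduce GPU memory pressure
  flash_attn: true   // flash attention lowers peak memory during attention
}
```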
### 4. Create a model instance

```js
const model = new ImgStableDiffusion(args, config)
```

The constructor stores configuration only — no memory is allocated yet.
### 5. Load the model

```js
await model.load()
```

This creates the native `sd_ctx_t` and loads all weights into memory. It can take 10–30 seconds depending on disk speed and model size. All model files must already be present on disk at `diskPath`.
### 6. Run inference

The primary API. Returns a `QvacResponse` that streams step-progress ticks and the final PNG:

```js
const images = []

const response = await model.run({
  prompt: 'a majestic red fox in a snowy forest, golden light, photorealistic',
  steps: 20,
  width: 512,
  height: 512,
  guidance: 3.5,
  seed: 42
})

await response
  .onUpdate(data => {
    if (data instanceof Uint8Array) {
      images.push(data)
    } else if (typeof data === 'string') {
      try {
        const tick = JSON.parse(data)
        if ('step' in tick) process.stdout.write(`\rStep ${tick.step}/${tick.total}`)
      } catch (_) {}
    }
  })
  .await()

require('bare-fs').writeFileSync('output.png', images[0])
```

Generation parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | — | Text prompt |
| `negative_prompt` | string | `''` | Things to avoid in the output |
| `width` | number | 512 | Output width in pixels (multiple of 8) |
| `height` | number | 512 | Output height in pixels (multiple of 8) |
| `steps` | number | 20 | Number of diffusion steps |
| `guidance` | number | 3.5 | Distilled guidance scale (FLUX.2) |
| `cfg_scale` | number | 7.0 | Classifier-free guidance scale (SD1.x / SD2.x) |
| `sampling_method` | string | auto | Sampler name; auto-selects `euler` for FLUX.2, `euler_a` for SD1.x |
| `scheduler` | string | auto | Scheduler; auto-selected per model family |
| `seed` | number | -1 | Random seed (`-1` for random) |
| `batch_count` | number | 1 | Number of images to generate |
| `vae_tiling` | boolean | `false` | Enable VAE tiling (required for large images on 16 GB) |
| `cache_preset` | string | — | Step-caching preset: `slow`, `medium`, `fast`, `ultra` |
> **Warning:** Do not set `sampling_method: 'euler_a'` for FLUX.2 models — it will produce random noise. Leave the field unset to let the library auto-select `euler` for flow-matching models.
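The `onUpdate` branching used above can be factored into a plain helper for reuse or testing. A sketch — `classifyUpdate` is a hypothetical name, not part of the library's API:

```js
// Sketch: a helper factoring out the onUpdate branching shown earlier.
// classifyUpdate is a hypothetical name, not part of this library's API.
// PNG bytes arrive as Uint8Array chunks; progress ticks as JSON strings.
function classifyUpdate (data) {
  if (data instanceof Uint8Array) {
    return { kind: 'image', bytes: data }
  }
  if (typeof data === 'string') {
    try {
      const tick = JSON.parse(data)
      if ('step' in tick) return { kind: 'progress', step: tick.step, total: tick.total }
    } catch (_) {
      // non-JSON strings fall through to 'other'
    }
  }
  return { kind: 'other' }
}
```

A progress tick such as `'{"step":3,"total":20}'` yields `{ kind: 'progress', step: 3, total: 20 }`, while a `Uint8Array` yields the image bytes under `kind: 'image'`.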
### 7. Release resources

```js
await model.unload()
```

`unload()` calls `free_sd_ctx`, which releases all GPU and CPU memory. The JS object can safely be garbage-collected afterwards.