Llama
Self-hosted document generation with Meta Llama and Carbone, never leaves your servers
Self-hosted document generation with Llama and Carbone
Meta Llama gives you the rare combination of frontier-class capability and open weights you can run anywhere. Carbone gives you the same control on the document side, with Carbone Cloud hosted in the EU or Carbone On-Premise running inside your own VPC. Pair Llama with Carbone On-Premise and the entire pipeline from prompt to branded PDF stays on infrastructure you control — no third party ever sees the prompt, the data, or the document.
Why teams pair Llama with Carbone:
- Truly sovereign pipeline. Run Llama 3.3, Llama 3.2, or Code Llama on your own GPUs through vLLM, SGLang, Ollama, or llama.cpp. Pair with Carbone On-Premise inside the same VPC and the document content never crosses a network boundary you do not own.
- Predictable cost at scale. Token cost on a self-hosted Llama is just your GPU cost, so generating ten thousand contracts costs the same as generating ten. Carbone bills a predictable subscription — a document quota, usage-based, or unlimited — never per page or per token, so high-volume workflows stay predictable.
- Templates designed in Word, Excel, or PowerPoint. Your legal, finance, or operations team owns the template. Llama writes the data structure, Carbone fills it in. No proprietary editor.
- 100+ output formats. PDF, DOCX, XLSX, PPTX, ODT, ODS, CSV, HTML, Markdown, JPEG, PNG. Real editable Office files, not screenshots embedded in a DOCX wrapper.
- Strong code generation with Code Llama. Code Llama is purpose-built for code authoring, which makes it an excellent partner for the Carbone Skill: paste an existing Docxtemplater, Jinja, Handlebars, JasperReports, Aspose, or Conga Composer template, Code Llama rewrites it as clean Carbone tags.
- Battle-tested for 7+ years. Carbone powers production document workflows in finance, legal, healthcare, and the public sector.
Install the Carbone Skill Set up the Carbone MCP Deploy Carbone On-Premise

Llama document generation workflows: contracts, reports, and regulated filings
- Regulated finance and insurance. Banks, insurers, and asset managers that have already selected Llama for data-residency reasons can finally finish the loop. Carbone On-Premise generates the credit memo, the KYC pack, the fund factsheet, or the policy schedule entirely inside your VPC.
- Healthcare and public sector. Patient discharge summaries, lab reports, administrative forms, and benefit letters generated from Llama plus Carbone, with no data ever leaving the hospital network or the agency cluster.
- Defence and aerospace primes. Air-gapped Llama deployments need an air-gapped document layer. Carbone On-Premise runs in the same isolated environment and emits the technical reports, configuration items, and acceptance protocols in DOCX or PDF.
- High-volume contract automation. Sales orders, NDAs, service agreements, leases. One Carbone template and a Llama instance on a modest GPU rig produce thousands of personalised, branded contracts per hour at flat infrastructure cost.
- Migration off legacy template engines. Code Llama plus the Carbone Skill is a focused migration engine for porting templates from Docxtemplater, Jinja, Handlebars, JasperReports, Apache POI, Aspose, Conga Composer, PDF Butler, or OpenText Exstream.

Llama to PDF: turn HTML or Markdown into a polished document
Llama's coding variants, including Code Llama and Llama 3.3 in code mode, are strong HTML and Markdown authors, and Carbone renders both as native template formats. Ask one to draft an investor report, a resume, or a product datasheet in HTML/CSS or Markdown, then hand the result to Carbone: one API call returns a print-ready PDF (or DOCX, ODT) with page breaks, headers, footers, loops, conditions, and your brand styles.
Reach for this pattern when you don't need a pre-existing Office template — one-off reports, marketing collateral, or product brochures.
See the HTML templates and Markdown templates guides for the syntax and best practices.

Where AI-native file generation falls short
Letting an AI build the whole file is great for a one-off draft. For production document workflows, three things break, and Carbone fixes each:
✗ Not repeatable. The same prompt yields a different file each run, with layout, fonts, and branding drifting from page to page, so testing, versioning, and automation fall apart.
✓ Carbone renders the same document every time from one template and one JSON payload.
✗ Fragile, locked-down files. AI output can look fine yet refuse to open cleanly, break its own formulas, or flatten tables and text into images you cannot edit.
✓ Carbone fills a real Office template instead of writing the file from scratch, so every result is structurally valid and fully editable.
✗ Expensive at scale. Generating full documents burns tens of thousands of tokens each, and a small edit regenerates large sections.
✓ Carbone uses predictable pricing instead — a fixed document quota, usage-based billing, or a flat rate for unlimited rendering.
Bottom line: Llama helps you draft the template and the wording; Carbone handles the part that has to be exact — merging your data to produce the same document every time.

Carbone Skill and MCP: render documents from a self-hosted Llama
The Carbone Skill is the knowledge layer. Loaded into your Llama agent, it teaches the model the full Carbone templating language: tag syntax, formatters, conditions, loops, :set patterns, HTML and Markdown templates, and best practices. Llama then writes correct Carbone templates on the first try.
The Carbone MCP server is the action layer — a document generation MCP for AI agents. Any MCP-capable Llama client, including Continue, Cline, Aider, Roo, or your own OpenAI-compatible wrapper, can call Carbone directly to upload templates, render documents, convert formats, and manage your template storage.
Use the Skill on its own to design and fix templates. Add the MCP when you want Llama to render the finished document end-to-end, entirely inside your infrastructure.
Learn more about the Carbone Skill Learn more about the Carbone MCP

Document automation at scale
Once your template is ready, plug Carbone into n8n, Make, Zapier, Salesforce, HubSpot, Odoo, or any of our other integrations. Build the template once with Llama, generate documents at scale, automatically and on your own infrastructure.

Carbone works with every AI assistant
Explore the dedicated pages for Claude, ChatGPT, Microsoft Copilot, Vibe by Mistral, Google Gemini, and DeepSeek.

Trusted by 800+ paid customers in 40+ countries














