Plan
Your subscription, GPU allocation, and upgrade options.
Tier 2 — Managed Inference Charter Rate locked forever
Professional-Grade GPU · 20 GB VRAM · Houston, TX · 12-month term · ROS-01-I
$999/mo
Renews Jun 14, 2026 · $0 token fees
What's Included
Base · Hardware
✓ Dedicated GPU — yours alone, no neighbors ✓ IPMI / out-of-band access ✓ OpenAI-compatible endpoint ✓ Your data never leaves Houston TX ✓ Swap API key, drop in replacement ✓ 20 GB VRAM, 1 TB data volume
Tier 1 · Managed Infrastructure
✓ OS lifecycle (Ubuntu 24.04 LTS) ✓ NVIDIA driver + CUDA, version-pinned ✓ 24/7 NOC — hardware alerts & response ✓ PSU, GPU thermal, ECC error monitoring ✓ Daily incremental + weekly full backup ✓ 4-hr on-site break/fix (business hours)
Tier 2 · Managed Inference Platform — Your current tier
✓ vLLM deployed, configured & tuned ✓ Model deployment automation (HF ID or S3) ✓ AWQ/GPTQ/FP8 quantization pipeline ✓ API-key auth + per-key token accounting ✓ Traffic balancing — no agent bottlenecks ✓ P95 TTFT SLO · model rollback on error
Upgrade Options
Regulated Inference
Tier 2 + full compliance stack. HIPAA BAA, SOC 2, audit logs. Required for healthcare, finance, and legal workloads.
✓ HIPAA BAA executed & active ✓ SOC 2 Type II report inheritance ✓ 7-year tamper-evident audit log ✓ Dedicated VLAN + US-persons-only support ✓ Annual third-party pen test report
+$999/mo
Upgrade to Tier 3
Full AI infrastructure partnership. Named TAM, fine-tuning, RAG ops, and quarterly model reviews.
✓ Quarterly model selection review ✓ LoRA/QLoRA fine-tuning support ✓ RAG stack deployed & operated (pgvector / Qdrant) ✓ Drift detection + eval harness ✓ Named TAM + monthly business review
+$1,499/mo
Add GPU
A second dedicated GPU at the same Charter rate. Run 30B+ models or isolate prod and dev inference.
✓ Same professional-grade GPU hardware ✓ Assigned a permanent physical ID — ROS-01-II ✓ Charter rate locked — $999/mo ✓ Runs independently or load-balanced ✓ Can run 30B+ AWQ with 2× 20 GB VRAM
$999/mo per GPU
Billing History
PeriodPlanAmountStatus
May 2026Tier 2 — Managed Inference$999Due Jun 1
Apr 2026Tier 2 — Managed Inference$999Paid
Mar 2026Tier 2 — Managed Inference$999Paid
Overview
May 2026 — your savings at a glance.
This Month — May 2026
Your plan Tier 2 · $999/mo
Anthropic equivalent $3,360
You saved $2,361 70%

560M tokens · 420M input × $3.00/M + 140M output × $15.00/M = $3,360 Anthropic equivalent

Your flat monthly plan: $999 · Savings: $2,361 (70%)

Energy arbitrage is priced into your plan rate — no separate power bill. Also vs. GPT-4o: $2,155 saved

Your Plan
Tier 2 — Managed Inference
Professional-Grade GPU · 20 GB VRAM · Houston TX · ROS-01-I
Charter Rate locked $0 token fees
Model slot capacity (20 GB VRAM)
7B class modelsup to 2 concurrent
14B AWQ models1 at a time
30B+ modelsrequires upgrade
Cost per Million Tokens
$1.61/M
Active Agents
14
↑2 from last week
API Calls This Month
18,432
↑18% WoW
GPU Allocation Used
23% of 1 GPU
77% free
Rate Limit Incidents
0
GPU Status
Operational
24-Hour Token Activity Refreshes every 5 min
0:003:006:009:0012:0015:0018:0021:00
Usage
Token consumption, spend, and agent activity.
Effective Rate This Month
$1.61/M tokens
vs. $6.00/M on Anthropic (blended)
Plan Fee This Month
$999
560M tokens · 18,432 API calls · $0 token fees
Anthropic Equivalent
$3,360
$2,361 saved this month
Daily Token Usage — May 2026
May 1May 7May 14May 21May 28
By Model
ModelTokensCostRatevs. AnthropicSavings
Qwen2.5-14B-Instruct-AWQ448M$734$1.64/M$3,136$2,402
Qwen2.5-7B-Instruct-AWQ112M$169$1.51/M$784$615
By Agent
Agent IDTokensCallsCostRate
agent_01JX4K2P…210M7,200$340$1.62/M
agent_01JX3M8R…168M5,541$272$1.62/M
agent_01JX1Q9T…112M3,686$181$1.62/M
agent_01JWZN5V…70M2,005$110$1.57/M

Rolling 30-day window. Month-to-date totals on "This Month" align with your billing cycle.

By Model — Last 30 Days
ModelTokensCostRateSavings vs. Anthropic
Qwen2.5-14B-Instruct-AWQ512M$838$1.64/M$2,746
Billing History
MonthTokensEffective RatePlan FeeAnthropic Equiv.Savings
May 2026560M$1.61/M$999$3,360$2,361
Apr 2026498M$1.64/M$999$2,934$1,935
Mar 2026445M$1.64/M$999$2,620$1,621

Plan fee is flat $999/mo — $0 per-token fees on all calls regardless of volume.

Models
Models running on your GPU and those available to deploy.
Running Now
Qwen/Qwen2.5-14B-Instruct-AWQ LIVE
14B params  ·  AWQ 4-bit  ·  General purpose
VRAM
18.4 / 20 GB
Model Library — Available to Deploy
Qwen/Qwen2.5-7B-Instruct-AWQ
7B paramsAWQ 4-bit~8GB VRAMFast · General purpose
Available
mistralai/Mistral-7B-Instruct-v0.3
7B paramsFP16 / AWQ~7GB VRAMFast · Instruction following
Available
meta-llama/Llama-3.1-8B-Instruct
8B paramsAWQ 4-bit~8GB VRAMFast · Reasoning
Available
microsoft/Phi-4
14B paramsAWQ 4-bit~10GB VRAMEfficient · STEM / code
Available
google/gemma-2-9b-it
9B paramsAWQ 4-bit~9GB VRAMFast · Multilingual
Available
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
7B paramsAWQ 4-bit~8GB VRAMReasoning · Chain-of-thought
Available
Qwen/Qwen2.5-14B-Instruct-AWQ
14B paramsAWQ 4-bit~18.4GB VRAMGeneral purpose · Currently live
Running
Don't see your model?
Request a custom model
Any model on HuggingFace. We'll evaluate VRAM fit and deploy within 4 business hours if it's compatible with your GPU allocation.
API Keys
Manage keys for authenticating to the BotINFRA API.
Your Keys
NameKeyCreatedLast UsedCalls / Month
Productionbi_live_••••••••Jan 14, 2025May 8, 202614,210
Stagingbi_live_••••••••Mar 3, 2026May 7, 20264,222
Create New Key
Status
Is BotINFRA working for you right now?
All systems operational Last checked: May 8, 2026 at 14:32 UTC
Uptime This Month
99.98%
7 min downtime
Rate Limit Incidents
0
No throttled requests
Scheduled Maintenance
None
No planned outages
Response Time SLA
ModelP95 CommitmentThis Month (avg)Status
Qwen2.5-14B (current)500 ms first tokenMonitoring pendingComing soon
7B class models300 ms first tokenMonitoring pendingComing soon

Latency instrumentation is in development. SLA credits apply retroactively once active.

Incident History — Last 90 Days
DateDescriptionDurationStatus
May 1, 2026Scheduled GPU driver update7 minResolved
No other incidents in the past 90 days.
Compliance
Your data, your control. Proof for legal and IT.
Demo:
All 5 controls active · $999/mo Regulated Inference
🏥
HIPAA BAA
Active
Business Associate Agreement signed Jan 14, 2025. Your agents can process protected health information (PHI).
🔒
SOC 2 Type II
Compliant
Last audited March 2026. Report available on request. Covers security, availability, and confidentiality.
📍
Data Residency
Confirmed
All inference runs on hardware in Houston, TX, USA. Your prompts and outputs never leave this facility. Included on all plans.
🗑️
Zero Data Retention
Active
Inference requests and responses are not stored or logged for model training. No prompts retained after the API call completes. Included on all plans.
📋
Audit Log Retention
7-Year Retention
Every inference call logged with timestamp, agent ID, model, and token count. Tamper-evident storage. Downloadable for legal review.
🔑
Encryption & BYOK
Active
Encryption at rest verified. Customer-managed keys (BYOK) available. US-persons-only support staffing, background-checked.
🛡️
Annual Pen Test
Last: Feb 2026
Annual third-party penetration test. Report available to your security team on request. Next test: Feb 2027.
Audit Log
TimestampAgent IDModelInput TokensOutput Tokens
2026-05-08 14:31:02Zagent_01JX4K2PQwen2.5-14B-AWQ4,218832
2026-05-08 14:29:44Zagent_01JX3M8RQwen2.5-14B-AWQ2,104641
2026-05-08 14:28:11Zagent_01JX4K2PQwen2.5-14B-AWQ6,5401,204
2026-05-08 14:25:58Zagent_01JX1Q9TQwen2.5-14B-AWQ1,882490
2026-05-08 14:22:30Zagent_01JWZN5VQwen2.5-14B-AWQ3,310720

Showing 5 most recent. Export for full log — retained 7 years.

Agents
Live health board — status, errors, idle detection, and spend attribution across your fleet.
Total Agents
14
↑2 from last week
Running Now
12
Idle (>2h)
2
Consider pausing
Avg Error Rate
0.2%
Well within SLO
Calls Today
2,841
↑22% vs yesterday
Cost Today
$34.21
On pace: $1,027/mo
⚠️
2 agents haven't made a call in over 2 hours. Doc Summarizer and Report Builder may be stuck or complete. Review and pause if unneeded.
Agent Fleet — Live Refreshes every 30s
Agent Role Status Last Call Calls Today Error Rate Cost Today WoW Trend
agent_01JX4K2P Outreach Bot Running 2 min ago 2,841 0.0% $14.20 ↑12%
agent_01JX3M8R Research Bot Running 5 min ago 1,892 0.4% $9.18 ↑8%
agent_01JX1Q9T Data Extractor Running 8 min ago 1,210 0.0% $5.92 ↑22%
agent_01JW8K4R Support Triage Running 1 min ago 1,544 0.8% $7.21 ↑5%
agent_01JWZN5V Lead Scorer Running 12 min ago 764 0.0% $3.44 ↓3%
agent_01JW2P4M Intake Classifier Running 3 min ago 488 0.0% $2.14 ↑34%
agent_01JV9K8T Proposal Writer Running 18 min ago 312 0.0% $1.48 ↑17%
agent_01JV5M2T Doc Summarizer Idle 3h 3 hr ago 48 0.0% $0.21 ↓82%
agent_01JV3N8P Report Builder Idle 5h 5 hr ago 12 0.0% $0.05 ↓91%

Showing 9 of 14 active agents. All agents share the flat $999/mo plan — cost attribution is for internal tracking only.

Spend Attribution — This Month
Cost by Agent — May 2026
AgentRoleTokensCalls% of FleetAttributed Cost
agent_01JX4K2POutreach Bot210M7,20037.5%$374
agent_01JX3M8RResearch Bot168M5,54130.0%$300
agent_01JX1Q9TData Extractor112M3,68620.0%$200
agent_01JWZN5VLead Scorer70M2,00512.5%$125

Attributed cost = agent's % of total tokens × $999 plan fee. Useful for internal chargeback or ROI analysis.

Impact
The environmental and research contributions your compute is making.
CO₂e Avoided This Month
19.7
metric tons
vs. drawing equivalent power from ERCOT grid

51,100 kWh × 0.386 kg CO₂e/kWh ÷ 1,000 = 19.72 MT

Source: EPA eGRID 2023, Texas ERCT subregion

Equivalent to removing ~4.3 cars from the road for one year

Lifetime avoided: 118 MT

Health Research Funded This Month
$1,050
contributed
to the American Open-Source Health Research Endowment

35% of your subscription above BotINFRA's operating threshold

This month: $1,050

Lifetime total: $8,925

Funds subsidized compute for American biomedical AI researchers — open-source, publicly accessible results.

Monthly CO₂e Avoided (MT)
JanFebMarAprMay
Research Funding — Lifetime Allocation
Research AreaInstitutionFunded
Cancer genomics computeMD Anderson Cancer Center$2,850
Drug discovery LLMsBaylor College of Medicine$2,310
Radiology AI trainingUTHealth Houston$1,890
Rare disease classificationTexas Children's Hospital$1,050
General endowment reserveAOHRE$825

Allocations are made quarterly by the AOHRE board. All funded research is published open-access.