✓ Dedicated GPU — yours alone, no neighbors✓ IPMI / out-of-band access✓ OpenAI-compatible endpoint✓ Your data never leaves Houston TX✓ Swap API key, drop in replacement✓ 20 GB VRAM, 1 TB data volume
Tier 2 · Managed Inference Platform — Your current tier
✓ vLLM deployed, configured & tuned✓ Model deployment automation (HF ID or S3)✓ AWQ/GPTQ/FP8 quantization pipeline✓ API-key auth + per-key token accounting✓ Traffic balancing — no agent bottlenecks✓ P95 TTFT SLO · model rollback on error
Upgrade Options
Regulated Inference
Tier 2 + full compliance stack. HIPAA BAA, SOC 2, audit logs. Required for healthcare, finance, and legal workloads.
✓ HIPAA BAA executed & active✓ SOC 2 Type II report inheritance✓ 7-year tamper-evident audit log✓ Dedicated VLAN + US-persons-only support✓ Annual third-party pen test report
+$999/mo
Upgrade to Tier 3
Full AI infrastructure partnership. Named TAM, fine-tuning, RAG ops, and quarterly model reviews.
✓ Quarterly model selection review✓ LoRA/QLoRA fine-tuning support✓ RAG stack deployed & operated (pgvector / Qdrant)✓ Drift detection + eval harness✓ Named TAM + monthly business review
+$1,499/mo
Add GPU
A second dedicated GPU at the same Charter rate. Run 30B+ models or isolate prod and dev inference.
✓ Same professional-grade GPU hardware✓ Assigned a permanent physical ID — ROS-01-II✓ Charter rate locked — $999/mo✓ Runs independently or load-balanced✓ Can run 30B+ AWQ with 2× 20 GB VRAM
14B paramsAWQ 4-bit~18.4GB VRAMGeneral purpose · Currently live
Running
Request sent for . Deployment completes within 4 business hours.
Don't see your model?
Request a custom model
Any model on HuggingFace. We'll evaluate VRAM fit and deploy within 4 business hours if it's compatible with your GPU allocation.
Request received
We'll check VRAM compatibility and reply within 4 business hours. If the model doesn't fit your current GPU allocation, we'll let you know what options are available.
API Keys
Manage keys for authenticating to the BotINFRA API.
Key created — copy it now. It will not be shown again.
Your Keys
Name
Key
Created
Last Used
Calls / Month
Production
bi_live_••••••••
Jan 14, 2025
May 8, 2026
14,210
Staging
bi_live_••••••••
Mar 3, 2026
May 7, 2026
4,222
Create New Key
Status
Is BotINFRA working for you right now?
All systems operationalLast checked: May 8, 2026 at 14:32 UTC
Uptime This Month
99.98%
7 min downtime
Rate Limit Incidents
0
No throttled requests
Scheduled Maintenance
None
No planned outages
Response Time SLA
Model
P95 Commitment
This Month (avg)
Status
Qwen2.5-14B (current)
500 ms first token
Monitoring pending
Coming soon
7B class models
300 ms first token
Monitoring pending
Coming soon
Latency instrumentation is in development. SLA credits apply retroactively once active.
Incident History — Last 90 Days
Date
Description
Duration
Status
May 1, 2026
Scheduled GPU driver update
7 min
Resolved
No other incidents in the past 90 days.
Compliance
Your data, your control. Proof for legal and IT.
Demo:
All 5 controls active · $999/mo Regulated Inference
Compliance add-on not active
HIPAA BAA, SOC 2 inheritance, audit logs, dedicated VLAN, and BYOK are all available. Required for healthcare, legal, and financial workloads.
🏥
HIPAA BAA
Active
Business Associate Agreement signed Jan 14, 2025. Your agents can process protected health information (PHI).
🔒
SOC 2 Type II
Compliant
Last audited March 2026. Report available on request. Covers security, availability, and confidentiality.
📍
Data Residency
Confirmed
All inference runs on hardware in Houston, TX, USA. Your prompts and outputs never leave this facility. Included on all plans.
🗑️
Zero Data Retention
Active
Inference requests and responses are not stored or logged for model training. No prompts retained after the API call completes. Included on all plans.
📋
Audit Log Retention
7-Year Retention
Every inference call logged with timestamp, agent ID, model, and token count. Tamper-evident storage. Downloadable for legal review.
🔑
Encryption & BYOK
Active
Encryption at rest verified. Customer-managed keys (BYOK) available. US-persons-only support staffing, background-checked.
🛡️
Annual Pen Test
Last: Feb 2026
Annual third-party penetration test. Report available to your security team on request. Next test: Feb 2027.
Audit Log
Timestamp
Agent ID
Model
Input Tokens
Output Tokens
2026-05-08 14:31:02Z
agent_01JX4K2P
Qwen2.5-14B-AWQ
4,218
832
2026-05-08 14:29:44Z
agent_01JX3M8R
Qwen2.5-14B-AWQ
2,104
641
2026-05-08 14:28:11Z
agent_01JX4K2P
Qwen2.5-14B-AWQ
6,540
1,204
2026-05-08 14:25:58Z
agent_01JX1Q9T
Qwen2.5-14B-AWQ
1,882
490
2026-05-08 14:22:30Z
agent_01JWZN5V
Qwen2.5-14B-AWQ
3,310
720
Showing 5 most recent. Export for full log — retained 7 years.
Audit Log
🔒
Audit log requires the Compliance Add-On
Every inference call — agent ID, model, timestamps, token counts — logged and retained for 7 years.
Agents
Live health board — status, errors, idle detection, and spend attribution across your fleet.
Total Agents
14
↑2 from last week
Running Now
12
Idle (>2h)
2
Consider pausing
Avg Error Rate
0.2%
Well within SLO
Calls Today
2,841
↑22% vs yesterday
Cost Today
$34.21
On pace: $1,027/mo
⚠️
2 agents haven't made a call in over 2 hours.Doc Summarizer and Report Builder may be stuck or complete. Review and pause if unneeded.
Agent Fleet — LiveRefreshes every 30s
Agent
Role
Status
Last Call
Calls Today
Error Rate
Cost Today
WoW Trend
agent_01JX4K2P
Outreach Bot
Running
2 min ago
2,841
0.0%
$14.20
↑12%
agent_01JX3M8R
Research Bot
Running
5 min ago
1,892
0.4%
$9.18
↑8%
agent_01JX1Q9T
Data Extractor
Running
8 min ago
1,210
0.0%
$5.92
↑22%
agent_01JW8K4R
Support Triage
Running
1 min ago
1,544
0.8%↑
$7.21
↑5%
agent_01JWZN5V
Lead Scorer
Running
12 min ago
764
0.0%
$3.44
↓3%
agent_01JW2P4M
Intake Classifier
Running
3 min ago
488
0.0%
$2.14
↑34%
agent_01JV9K8T
Proposal Writer
Running
18 min ago
312
0.0%
$1.48
↑17%
agent_01JV5M2T
Doc Summarizer
Idle 3h
3 hr ago
48
0.0%
$0.21
↓82%
agent_01JV3N8P
Report Builder
Idle 5h
5 hr ago
12
0.0%
$0.05
↓91%
Showing 9 of 14 active agents. All agents share the flat $999/mo plan — cost attribution is for internal tracking only.
Spend Attribution — This Month
Cost by Agent — May 2026
Agent
Role
Tokens
Calls
% of Fleet
Attributed Cost
agent_01JX4K2P
Outreach Bot
210M
7,200
37.5%
$374
agent_01JX3M8R
Research Bot
168M
5,541
30.0%
$300
agent_01JX1Q9T
Data Extractor
112M
3,686
20.0%
$200
agent_01JWZN5V
Lead Scorer
70M
2,005
12.5%
$125
Attributed cost = agent's % of total tokens × $999 plan fee. Useful for internal chargeback or ROI analysis.
Impact
The environmental and research contributions your compute is making.
CO₂e Avoided This Month
19.7
metric tons
vs. drawing equivalent power from ERCOT grid
51,100 kWh × 0.386 kg CO₂e/kWh ÷ 1,000 = 19.72 MT
Source: EPA eGRID 2023, Texas ERCT subregion
Equivalent to removing ~4.3 cars from the road for one year
Lifetime avoided: 118 MT
Health Research Funded This Month
$1,050
contributed
to the American Open-Source Health Research Endowment
35% of your subscription above BotINFRA's operating threshold
This month: $1,050
Lifetime total: $8,925
Funds subsidized compute for American biomedical AI researchers — open-source, publicly accessible results.
Monthly CO₂e Avoided (MT)
JanFebMarAprMay
Research Funding — Lifetime Allocation
Research Area
Institution
Funded
Cancer genomics compute
MD Anderson Cancer Center
$2,850
Drug discovery LLMs
Baylor College of Medicine
$2,310
Radiology AI training
UTHealth Houston
$1,890
Rare disease classification
Texas Children's Hospital
$1,050
General endowment reserve
AOHRE
$825
Allocations are made quarterly by the AOHRE board. All funded research is published open-access.
Our team will reach out within one business day to confirm availability and finalize terms for your Charter account.
Request sent. We'll be in touch within one business day.
Revoke ""?
This key stops working immediately. Cannot be undone.