Inference Tab
We’ve created a new hub where you can launch inference workloads in your own AWS infrastructure with a single click. Models including Deepgram, Llama-3-8B, and Mistral-3-7B are currently supported.
We run each model on GPUs well suited to it, using industry-standard inference engines that deliver high throughput and low latency.
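Once a model is deployed, you can send it requests like any other HTTP service running in your infrastructure. Here is a minimal sketch, assuming the deployment exposes an OpenAI-compatible chat completions endpoint (common for modern inference engines such as vLLM); the host URL and model name below are placeholders, not Porter-specific values:

```python
import requests

# Hypothetical internal endpoint for a deployed Llama-3-8B workload;
# replace with the URL your deployment actually exposes.
ENDPOINT = "http://llama-3-8b.internal.example.com/v1/chat/completions"

resp = requests.post(
    ENDPOINT,
    json={
        "model": "llama-3-8b",
        "messages": [
            {"role": "user", "content": "Summarize our Q3 results in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```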
GPU metric-based Autoscaling
You can configure GPU-enabled applications to autoscale based on GPU and VRAM utilization. This is configured from the Resources tab of your application; note that the application must be bound to a specific GPU node group.
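Utilization-based autoscaling generally follows the same control loop as Kubernetes' HorizontalPodAutoscaler: scale the replica count proportionally to how far the observed utilization is from the target, whether the signal is GPU compute or VRAM. The sketch below illustrates that general calculation; it is not Porter's actual implementation:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule used by Kubernetes' HPA:
    desired = ceil(current * current_utilization / target_utilization),
    clamped to the configured replica bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# Example: 2 replicas averaging 90% GPU utilization against a 60% target
# scale out to 3 replicas, since ceil(2 * 0.90 / 0.60) = 3.
print(desired_replicas(2, 0.90, 0.60))  # -> 3
```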