CASE STUDY

How Nooks uses Porter to achieve a median latency of 250ms for their AI parallel dialer

Shankar Radhakrishnan
December 21, 2023
11 min read

Most of the calls that sales representatives make go to voicemail, wasting hours of their time listening to ringtones instead of talking to potential customers. With sales teams regularly making hundreds of calls per week, Nooks AI Parallel Dialer allows users to call multiple lines at once and 5x their call volume and talk time. Nooks recently raised a Series A round and has $27M in total funding.

In addition to the Dialer, the startup offers several other features, such as a virtual sales floor that lets teams collaborate during calls, and integrations with Customer Relationship Management tools (CRMs) like HubSpot and Sales Engagement Platforms (SEPs) such as Outreach, Salesloft, and Apollo.

We interviewed Sishaar Rao, a Software Engineer at Nooks, to learn more about their experience using Porter to consolidate and manage their infrastructure. 

Serverless, PaaS, or Kubernetes?

Nooks initially hosted their infrastructure across several vendors, evaluating the pros and cons of each. Some of their workloads were on Google’s Cloud Functions, other workloads were on Fly.io, and they used standalone virtual machines (VMs) directly on Google Cloud Platform (GCP) as well. Finally, they used Modal for their Machine Learning (ML) pipelines. 

Cold starts and execution timeouts on Cloud Functions

Cloud Functions is a serverless offering from GCP that Nooks felt allowed them to move fast, scale easily, and not think too much about managing their infrastructure. They chose to host these applications on GCP as they were already using Firebase (Google’s NoSQL real-time database offering), which integrates with Cloud Functions well. For example, one can trigger functions, like sending welcome and farewell emails, when a Firebase user account is created or deleted. 

Cloud Functions’ advantages come from being a serverless platform - it's easy to use since the infrastructure is fully managed; users simply put their code into functions rather than having to configure applications running on containers. Furthermore, code is run in response to events (like the Firebase integration example above) and resources are provisioned in response to events as well, allowing for hassle-free scalability.

The main negatives of Cloud Functions are the cold-start and execution-timeout problems that are inherent to serverless infrastructure. Unlike a container on Kubernetes (K8s), which runs continuously, a serverless function has to create a new instance the first time it is invoked (or after a period of inactivity). This can take a few seconds and delays the execution of the function: the cold start problem, which occurred for around 3% of Nooks’ workloads. The other issue is the maximum timeout duration of serverless functions, which (at least on Cloud Functions) is 540 seconds, or 9 minutes.

Nooks could simply not entertain cold starts due to the nature of their product. For context, a “lead” is a potential buyer of one’s product or service. Most of Nooks’ users are B2B companies that conduct outbound sales (a form of sales whereby the reps reach out to leads to sell to them), often over the phone. Nooks’ parallel dialer calls multiple leads at the same time and connects sales reps to leads that actually pick up, using proprietary ML models that can tell whether the voice comes from a voicemail or is a real person (an extremely difficult task, considering the endless variations of machine- and human-created voicemails).

The Nooks AI parallel dialer and virtual sales floor make collaboration during calls simple and easy for sales teams.

The sales rep then sees a transcription of what was said and is immediately connected to the lead in real-time and as fast as possible, rapidly accelerating the pace of outbound. If a lead picks up the phone, says “Hello?”, and hears no response for three seconds due to a cold start, they’re likely to hang up. This results in a terrible end-user experience for the sales reps and leads. Cold starts occurring 3% of the time may seem like a small percentage, but at scale, this could mean millions of dollars in potential revenue lost.

Nooks’ customers also want to be able to import data from their CRMs and SEPs on-demand at any given time; cold starts are an issue here as well. Furthermore, the time it takes to import data from these tools can be substantial due to rate limits and can vary heavily by the size of the datasets - an import could easily exceed the maximum timeout duration configurable on Cloud Functions. In essence, it wasn’t very feasible to orchestrate these imports on Cloud Functions; Nooks’ engineers had to implement significant workarounds to operate under the constraints that the serverless offering imposed upon them. The cold start and execution timeout problems made it so Nooks could safely rule out serverless as an option for hosting their apps.
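To make the timeout constraint concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a hypothetical CRM API that returns 100 records per page and allows 2 requests per second; those numbers and the function names are illustrative assumptions, not figures from Nooks. Only the 540-second limit comes from the article.

```python
import math

# Maximum Cloud Functions timeout cited in this article.
CLOUD_FUNCTIONS_TIMEOUT_S = 540


def estimated_import_seconds(records: int, page_size: int, requests_per_second: float) -> float:
    """Lower bound on import time when the API rate limit is the only bottleneck."""
    pages = math.ceil(records / page_size)
    return pages / requests_per_second


def fits_in_one_invocation(records: int, page_size: int, requests_per_second: float) -> bool:
    """True if the whole import could finish before the function times out."""
    seconds = estimated_import_seconds(records, page_size, requests_per_second)
    return seconds <= CLOUD_FUNCTIONS_TIMEOUT_S


if __name__ == "__main__":
    # Hypothetical dataset sizes; page size and rate limit are assumptions.
    for records in (50_000, 500_000):
        seconds = estimated_import_seconds(records, page_size=100, requests_per_second=2.0)
        print(records, seconds, fits_in_one_invocation(records, 100, 2.0))
```

Under these assumed limits, a 50,000-record import fits in a single invocation, while a 500,000-record import needs roughly 2,500 seconds and would have to be split across invocations or moved to a long-running worker.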

Opacity and outages on a traditional PaaS

Fly.io is a Platform-as-a-Service (PaaS) that provides a global edge application delivery network, allowing for reduced latency since applications are deployed closer to the end-user. While they enjoyed the developer experience and convenience it provided, the Nooks team ran into different problems with Fly.io. 

The PaaS would have outages that took down the metrics and logs for Nooks’ applications, or even deployments themselves. During these outages, the team had no visibility into the status of their apps and couldn’t push out new code to update them.

Nooks also tried to move to a Virtual Private Cloud (VPC) on the PaaS for their latency-critical applications, such as their parallel dialer. There are multiple components to this application; the front end (a user’s network connection/speed and physical device) is largely out of Nooks’ control. But the backend is within their purview - incoming audio connections to the application result in callouts to various ML models, which then send predictions to an API that gets propagated back up the stack, resulting in the sales rep being connected to the call (or not, in the case of a voicemail). This is not a simple, one-in, one-out equation, but rather a graph of dependent steps, each passing information to the next. If any individual step takes 30 or 50 milliseconds (even 80ms per step can occur on public networking), it delays the entire process and heavily degrades the end-user experience. If even 30ms can be saved per step through hosting on a VPC, Nooks realized it was worth the additional cost and complexity. 
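The way per-step delays compound can be sketched with a small critical-path calculation. The step names, compute times, and per-hop network overheads below are hypothetical and only illustrate the reasoning; the article does not describe Nooks’ actual pipeline in this detail.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step lists the steps it waits on.
DEPENDS_ON = {
    "audio_in": [],
    "voicemail_model": ["audio_in"],
    "transcription": ["audio_in"],
    "decision_api": ["voicemail_model", "transcription"],
    "connect_rep": ["decision_api"],
}

# Hypothetical compute time per step, in milliseconds.
COMPUTE_MS = {
    "audio_in": 10,
    "voicemail_model": 60,
    "transcription": 40,
    "decision_api": 15,
    "connect_rep": 20,
}


def critical_path_ms(compute_ms: dict, depends_on: dict, hop_ms: float) -> float:
    """Time until the last step finishes, if every step pays one network hop."""
    finish = {}
    # static_order() yields each step only after all of its dependencies.
    for step in TopologicalSorter(depends_on).static_order():
        start = max((finish[dep] for dep in depends_on.get(step, ())), default=0.0)
        finish[step] = start + hop_ms + compute_ms[step]
    return max(finish.values())


if __name__ == "__main__":
    public = critical_path_ms(COMPUTE_MS, DEPENDS_ON, hop_ms=50)   # assumed public networking
    private = critical_path_ms(COMPUTE_MS, DEPENDS_ON, hop_ms=20)  # assumed private networking
    print(public, private, public - private)
```

In this toy graph the longest chain has four steps, so saving 30ms per hop shortens the end-to-end connection time by 120ms, which is why small per-step savings matter.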

Fly.io required a lot of custom configuration to deploy to their own VPC and Nooks wasn’t able to get the custom feature working in time. Ultimately, they decided to move off of the PaaS and to their own private cloud to have more control over their infrastructure. 

Managing Kubernetes directly

The AI startup knew they would need to host their application on their own private cloud to avoid the downtime they experienced on a traditional PaaS like Fly.io. They needed private networking and control over their networking, no cold starts or execution timeouts for their latency-critical applications, and GPU support for their ML workloads. Google Kubernetes Engine (GKE) provided all of the above.

However, managing Kubernetes requires a great deal of infrastructure expertise, so they considered GKE Autopilot, a managed offering from GCP. They determined that Autopilot didn’t abstract away the management of Kubernetes enough to provide the time savings and level of convenience they were looking for, especially at its steep price point: GKE Autopilot is double the price of normal GKE. Since Autopilot wasn’t a cost-effective solution either, their final option was to manage their own infrastructure directly on GKE. 

Nooks’ main concern regarding migrating to and managing GKE was the time and expertise required. They had a monolithic codebase that was spread across multiple vendors and offerings (Cloud Functions, raw VMs on GCP, and Fly.io), meaning they would have to spend hundreds of engineering hours refactoring their code to consolidate it all and get it working smoothly on GKE - time spent not building their product. Post-migration, they would have to dedicate engineering bandwidth to managing their Kubernetes cluster. Sishaar had experience using HashiCorp Nomad and Terraform (Infrastructure as Code, or IaC, tools), and was aware of the monumental amount of effort and expertise necessary to manage one’s own infrastructure. He knew it would be a time-consuming and challenging endeavor that would also prevent them from focusing entirely on their product. 

“For a small engineering team, it’s worth putting off having a dedicated infrastructure team for as long as possible, since you essentially lose engineers that could be spending all their time on the product.” - Sishaar Rao, SWE at Nooks

GKE managed by Porter

Porter allows Nooks to leverage all of the advantages of hosting on their own private cloud and Kubernetes cluster while retaining the convenience of a PaaS, saving precious engineering hours and allowing them to migrate over all of their workloads from disparate vendors with ease and speed.

“I firmly believe that startups should optimize their infra stack to have as little infra work as possible so they can focus on product work. It’s very valuable to do that for as long as possible, and Porter lets us do that. I’m the only engineer that works on infrastructure at Nooks and only for a few days at a time for specific projects - other than that, I never even have to think about it.” - Sishaar Rao, SWE at Nooks

Since the Nooks team didn’t have to spend time migrating, consolidating, and managing their infrastructure, they were able to spend some of the time saved refactoring their code to output JSON-formatted logs, building telemetry across services, and standardizing metrics collection, all of which improved their observability significantly. 
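As an illustration of what JSON-formatted logging can look like, here is a minimal sketch using only Python’s standard library. The field names and the `call_id` attribute are assumptions made for the example; the article does not show Nooks’ actual log schema.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical correlation field, passed via logger.info(..., extra={...}).
        call_id = getattr(record, "call_id", None)
        if call_id is not None:
            payload["call_id"] = call_id
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("dialer")
    log.info("call %s connected", "abc", extra={"call_id": "c-1"})
```

Structured lines like these are easier for a log aggregator to index and filter by field than free-form text.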

The AI startup had also been running into a fair number of security concerns regarding middleware authentication from Cloud Functions to other services. These are no longer an issue on their own VPC, as their infrastructure can be configured to operate smoothly with any authentication service.

Finally, they’re able to focus on ensuring their deployments are rolled out safely rather than worrying about whether their infrastructure will work at all. Nooks’ only concern when it comes to building and deploying is application-level errors from their own codebase.

Workloads on Porter

On Porter, Nooks runs two clusters - a beta cluster and a production cluster. They offer two environments for their customers, with the beta environment being more rapidly updated with new features. 

The issue of latency on public networking was solved with Porter as all the clusters provisioned on the platform are in users’ own private cloud. The connection times for calls through Nooks’ parallel dialer now fall under the following distribution: 

  • 50% are under 250ms
  • 75% are under 500ms
  • 90% are under 900ms
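A distribution like this is typically read off as percentiles of measured connection times. The sketch below uses the nearest-rank method on made-up sample data; the samples are purely hypothetical and are not Nooks’ measurements.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical connection times in milliseconds.
CONNECTION_MS = [110, 150, 190, 220, 245, 300, 380, 490, 850, 1200]

if __name__ == "__main__":
    for pct in (50, 75, 90):
        print(f"p{pct}: {percentile(CONNECTION_MS, pct)}ms")
```

The 50th percentile is the median, which is the figure quoted in the title of this case study.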

Another benefit Nooks realized post-migration was a significant amount of cost savings on their infrastructure (nearly 50%). Other than the inherent benefits of Kubernetes clusters, such as better resource utilization and autoscaling, this was due to Cloud Functions having a cyclical invocation problem. That is, when one has a pattern whereby they invoke one API endpoint to another API endpoint, performing this pattern in cycles as Nooks does, a single invocation quickly turns into multiple, resulting in exponential cost increases.
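A simplified model shows why chained invocations get expensive on a pay-per-invocation platform. It assumes every invocation triggers a fixed number of further invocations for a fixed number of hops; the article does not describe Nooks’ actual call pattern, so `fan_out` and `depth` are illustrative parameters only.

```python
def total_invocations(fan_out: int, depth: int) -> int:
    """Count billed invocations when each call triggers `fan_out` more, `depth` hops deep."""
    total = 0
    level = 1  # the original request
    for _ in range(depth + 1):
        total += level
        level *= fan_out
    return total


if __name__ == "__main__":
    # One incoming request, growing with the assumed fan-out and depth.
    for fan_out, depth in ((1, 3), (2, 3), (3, 4)):
        print(fan_out, depth, total_invocations(fan_out, depth))
```

With a fan-out of 1 the cost grows linearly with depth, but any fan-out above 1 makes the invocation count grow geometrically, which matches the exponential cost increase described above.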

Within each cluster, Nooks runs a variety of workloads. These include the aforementioned APIs, WebSocket servers (where the ML models run), preemptible workloads (that lower runtime costs), cron jobs (for scheduled events), and workers that autoscale based on Pub/Sub queue length through Kubernetes Event-driven Autoscaling (KEDA).

Machine Learning pipelines and GPU nodes 

Nooks also needed GPU nodes for their Machine Learning (ML) pipelines. GPUs execute many operations in parallel and process large batches of data at high throughput, which lets companies like Nooks rapidly speed up their ML workloads.

The ML lifecycle involves two phases - training and inference. The former requires the creation of an ML model (code implementing a mathematical algorithm) that is then trained on data examples. The model is then validated against unseen data, testing for efficacy. The latter phase - inference - is when the model is run on real data to generate live outputs or predictions. Initially, they used Modal for training and inference due to its Jupyter Notebook capabilities. Jupyter Notebooks are tools, available to use on a web browser, that show one’s code, its output, and any commentary, allowing users to quickly iterate on their ML models.

The Jupyter Notebook interface on Colab allows users to finetune their ML models.

Since they could use GPU-enabled nodes on Porter for inference and Google Colaboratory (Colab), another Jupyter Notebook service that was native to Google, for training, they moved their ML pipelines off of Modal. 

The GCP marketplace contains a Colab-hosted runner VM that comes in the form of an image. They can configure this image with the resource allocations (CPU, RAM, and GPU) they want, then fine-tune their models, stopping and starting the instance as needed (generally for a day or two at a time as training a model is much more resource-intensive than running it). Finally, they export the trained models from Colab and deploy them on their Porter-managed clusters with GPU nodes; inference occurs indefinitely on the WebSocket servers, and therefore, is the majority of their GPU usage.

An application on Porter using GPUs. 

Configurable application parameters and autoscaling

For Nooks, phone calls come into the WebSocket servers where ML inference occurs through WebSocket connections. If these WebSockets disconnect, additional services like call transcription would break; the parameter that ensures nothing breaks for long-running WebSockets is their termination grace period. Unfortunately, traditional PaaS providers, such as Fly.io, don’t let their users tweak application parameters like termination grace periods, so upon deploying code changes, functionality would break.

Using Porter, Nooks is able to set longer termination grace periods for their WebSocket servers and add a ‘preStop’ hook. Hooks notify Kubernetes containers of events in their lifecycle and let handlers run in response. The ‘preStop’ hook runs before the container is sent the termination signal, so Nooks’ containerized WebSocket servers keep running until the hook handler completes or the termination grace period expires. This allows in-flight connections to finish during deployments rather than being cut off.
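For readers unfamiliar with these settings, the sketch below builds the relevant fragment of a Kubernetes pod spec as a Python dictionary and prints it as JSON. `terminationGracePeriodSeconds` and `lifecycle.preStop` are standard Kubernetes fields; the container name, image, grace period, and drain command are hypothetical, and the article does not show how Nooks or Porter actually configure them.

```python
import json


def websocket_pod_spec(grace_period_s: int, drain_command: list[str]) -> dict:
    """Build a pod spec fragment with a long grace period and a preStop hook."""
    return {
        # Upper bound on how long Kubernetes waits before force-killing the pod.
        # The time spent in the preStop hook counts against this budget.
        "terminationGracePeriodSeconds": grace_period_s,
        "containers": [
            {
                "name": "websocket-server",                  # hypothetical name
                "image": "example/websocket-server:latest",  # hypothetical image
                "lifecycle": {
                    # Runs before the container receives its termination signal.
                    "preStop": {"exec": {"command": drain_command}},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Assumed values: allow up to an hour, and sleep briefly so open calls can finish.
    spec = websocket_pod_spec(3600, ["/bin/sh", "-c", "sleep 30"])
    print(json.dumps(spec, indent=2))
```

A real drain command would usually wait for open connections to close rather than sleep for a fixed time; the sleep is only a placeholder.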

Furthermore, the startup can configure autoscaling for their Python workers based on Pub/Sub queue length, a custom metric. These workers perform lengthy background tasks that Nooks wouldn’t want to use web services for. KEDA acts as a metrics server exposing Pub/Sub queue length to the HorizontalPodAutoscaler (HPA), so if the queue grows rapidly (meaning the number of tasks is suddenly increasing), the workers are automatically scaled as the HPA deploys more pods to handle that increase.
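The scaling decision can be approximated with the HPA’s average-value rule: divide the total metric by the target per replica, round up, and clamp to the configured bounds. The sketch below is a simplification that ignores the HPA’s tolerance and stabilization behavior, and the target and replica limits are assumed values rather than Nooks’ configuration.

```python
import math


def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Replica count that keeps roughly `target_per_replica` queued messages per worker."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Assumed settings: 10 messages per worker, between 1 and 20 workers.
    for queue_length in (0, 95, 1000):
        print(queue_length, desired_replicas(queue_length, 10, 1, 20))
```

An empty queue falls back to the minimum, a backlog of 95 messages asks for 10 workers, and a very large backlog is capped at the maximum.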

Add-ons on and off Porter’s marketplace

Porter also offers observability out of the box - users can view application metrics (CPU usage, memory usage, and network usage) for up to 30 days through a preconfigured, highly available Prometheus instance installed on every Porter-provisioned cluster. However, Nooks needed a more powerful monitoring solution, so they use one of the add-ons that Porter supports - Datadog.

Some of the add-ons Porter offers.

Although it’s not an add-on Porter offers out of the box, Nooks also uses Deepgram, an AI-powered phone call transcription service that transcribes the Nooks users’ phone calls into multiple languages in real-time using Automatic Speech Recognition (ASR).

To maintain ultra-low latency by having ASR colocated with their other applications, Nooks deployed Deepgram on-prem onto their GKE clusters managed by Porter. After fine-tuning the ASR models to be even faster and more accurate, and with some private networking configuration (specifically using an HTTPS endpoint rather than HTTP to overcome an mTLS authentication issue), Sishaar deployed Deepgram’s Docker image using Porter. While Porter abstracts away the underlying infrastructure, if a user wants to go under the hood (to use a third-party service like Deepgram for example), they entirely have the freedom and ability to do so.

Infrastructure you won’t outgrow

Sishaar had experienced the difficulty and pain of an infrastructure re-platforming at a previous company (from another PaaS, Convox, to their own private cloud using HashiCorp Nomad for infrastructure management). This migration went six months over schedule and required a great deal of unforeseen infrastructure work from engineers, tearing them away from product work. 

Fortunately, Porter made the migration from Cloud Functions and Fly.io to GKE easy. More importantly, Nooks is now using the gold standard of container orchestration for the foreseeable future - Kubernetes. 

“Now that we have K8s out of the box, we’ll never need to re-platform our infrastructure; we’re set up for success no matter how much traffic we get.” - Sishaar Rao, SWE at Nooks

Kubernetes boasts a bevy of advantages, such as granular resource allocation, autoscaling based on demand and traffic, and high availability that ensures containers are always available through the automatic distribution of that traffic and automatic restarts of failing containers. Another one of K8s’ benefits is its cloud agnosticism. This means that the concern of vendor lock-in to a cloud provider is effectively mitigated. So, if Nooks ever decided to move their ML model training from GCP to Azure, perhaps for the latter’s extensive GPU offerings, they could easily spin up an Azure instance using Porter for their ML training pipeline. The experience of Porter is the same across cloud providers, allowing companies to take advantage of a multi-cloud strategy and not be affected by vendor lock-in from their cloud provider. 

K8s’ disadvantages are its complexity and the level of knowledge required to take full advantage of its capabilities. Fortunately, Nooks can leverage the benefits of K8s without spending any additional engineering bandwidth on infrastructure management, through Porter, all on their own private cloud. 
