AWS Podcast Episode #724: Accelerated Computing – From Fraud Detection to AI Innovation
Released on June 9, 2025
The 724th episode of the AWS Podcast delves into the transformative world of accelerated computing, exploring its pivotal role in powering advanced AI and machine learning (ML) applications across various industries. Hosted by Shruti Koparkar, the episode features insightful discussions with two AWS experts, Ray and Sudhir Kaldendi, who shed light on the challenges, architectural considerations, and real-world applications of GPU-accelerated computing on AWS.
Introduction to Accelerated Computing
Shruti Koparkar opens the episode by introducing the concept of accelerated computing on AWS, emphasizing its significance in AI and ML workloads. Accelerated computing leverages powerful hardware like Nvidia GPUs and AWS’s proprietary AI chips—Trainium and Inferentia—to enhance computational performance for complex tasks.
Segment 1: Accelerated Computing with Ray
Ray’s Role and Expertise
[00:00] Shruti Koparkar: "Hello everyone and welcome to another episode of the AWS Podcast... Today we are going to dive into a couple of different use cases for accelerated computing."
[01:06] Ray: "I'm a container specialist, solutions architect... My primary role is to now work with customers that are trying to build these massive systems on Kubernetes... especially for machine learning and generative AI solutions."
Ray, an experienced solutions architect at AWS, specializes in container orchestration and helps customers architect scalable ML applications using Kubernetes and Amazon EKS (Elastic Kubernetes Service).
Key Challenges: Maximizing GPU Utilization
A significant challenge Ray highlights is maximizing GPU utilization. GPUs are far more expensive than CPUs, so underutilization translates directly into wasted spend. Ray states:
[05:37] Ray: "The biggest challenge... is that the GPUs... are very expensive. Anytime you're using the GPU you need to ensure that you're maximizing its usage."
He points out that applications often fail to fully utilize GPU capacity, sometimes achieving only 30% utilization, which is inefficient given the high costs associated with GPU resources.
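The economics behind the 30% figure can be made concrete with a simple calculation. This is a hedged illustration only: the hourly rate is an assumed placeholder, not a real AWS price, and `effective_cost_per_useful_hour` is a hypothetical helper, not anything mentioned in the episode.

```python
# Illustrative only: why low GPU utilization is costly.
# The $32/hour rate is an assumed placeholder, not an actual AWS price.
def effective_cost_per_useful_hour(hourly_rate: float, utilization: float) -> float:
    """Cost paid per hour of *useful* GPU work at a given utilization (0-1)."""
    return hourly_rate / utilization

full = effective_cost_per_useful_hour(32.0, 1.0)   # fully utilized
low = effective_cost_per_useful_hour(32.0, 0.30)   # 30% utilization, as in the episode
print(f"${full:.2f} per useful hour at 100% vs ${low:.2f} per useful hour at 30%")
```

At 30% utilization, every useful GPU-hour effectively costs more than three times the list rate, which is the inefficiency Ray is warning about.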
Architectural Considerations for GPU Workloads
Ray discusses critical architectural considerations for deploying GPU-accelerated workloads on EKS:
- Storage: Utilizing fast, distributed storage solutions like Amazon FSx for Lustre to ensure rapid data access and minimize latency.
[13:51] Ray: "FSx for Lustre is a great service for anyone that's looking to do distributed training... It allows hundreds of instances to read simultaneously."
- Networking: Ensuring low-latency connections between GPUs using Elastic Fabric Adapter (EFA) to facilitate efficient data exchange during distributed training.
- Resource Management: Balancing cost and performance by selecting appropriate GPU types based on workload requirements.
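The resource-management point above can be sketched as a selection problem: pick the cheapest instance that satisfies the workload's requirements. This is a minimal illustration; the instance names, memory sizes, and prices below are invented for the example and do not correspond to real AWS instance types.

```python
# Hypothetical sketch: pick the cheapest GPU instance that meets a workload's
# GPU memory requirement. Catalog entries are illustrative, not real AWS SKUs.
CATALOG = [
    {"name": "small-gpu",  "gpu_mem_gib": 16, "hourly": 1.0},
    {"name": "medium-gpu", "gpu_mem_gib": 40, "hourly": 4.0},
    {"name": "large-gpu",  "gpu_mem_gib": 80, "hourly": 12.0},
]

def pick_instance(required_gpu_mem_gib: int):
    """Return the cheapest catalog entry with enough GPU memory, or None."""
    candidates = [i for i in CATALOG if i["gpu_mem_gib"] >= required_gpu_mem_gib]
    return min(candidates, key=lambda i: i["hourly"]) if candidates else None

print(pick_instance(24)["name"])  # a 24 GiB model fits the 40 GiB instance
```

Real placement decisions also weigh interconnect bandwidth, availability, and spot pricing, but the cost-versus-capability trade-off is the core of it.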
Customer Success Story: Rivian
Ray shares how Rivian, an automotive technology company, leverages accelerated computing on AWS:
[15:24] Ray: "Rivian has built a stack on top of AWS Data on EKS... They use the JARK stack—Jupyter, Argo, Ray on Kubernetes—to run distributed jobs efficiently."
Rivian utilizes Argo Workflows and Ray to manage and orchestrate large-scale ML tasks, ensuring high GPU utilization and streamlined workflow management. By optimizing job scheduling and co-locating related tasks, Rivian maximizes the performance and cost-effectiveness of their GPU resources.
Nvidia NIMS and Karpenter Integration
[19:37] Shruti Koparkar: "Nvidia has launched NIMS... How do those work with EKS?"
[20:13] Ray: "NIMS addresses maximizing GPU efficiency by providing optimized containers that adapt to different hardware configurations."
Ray explains that Nvidia Inference Microservices (NIMS) offer pre-packaged, optimized containers for various ML models, simplifying deployment and ensuring optimal performance across diverse hardware setups.
Furthermore, Ray introduces Karpenter, an open-source Kubernetes autoscaler:
[23:02] Ray: "Karpenter optimizes compute scaling by dynamically provisioning the right EC2 instances based on workload demands."
Karpenter automates the scaling process, allowing EKS to adjust resources seamlessly in response to application needs, thereby enhancing resource utilization and reducing costs.
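The core idea behind that just-in-time provisioning can be sketched in a few lines: given the GPUs requested by pending pods, launch the smallest node that fits them. This is a heavily simplified, assumed model; real Karpenter evaluates many more constraints (zones, architectures, taints, consolidation), and the instance names and prices here are made up.

```python
# Simplified sketch of just-in-time node provisioning, in the spirit of
# Karpenter: choose the cheapest instance type that can host all pending
# GPU requests on one node. Instance types and prices are illustrative.
INSTANCE_TYPES = [
    ("g-small", 1, 1.2),    # (name, gpus, $/hr)
    ("g-medium", 4, 4.5),
    ("g-large", 8, 8.8),
]

def provision(pending_gpu_requests):
    """Return the cheapest instance type fitting all requests, else None."""
    needed = sum(pending_gpu_requests)
    fits = [t for t in INSTANCE_TYPES if t[1] >= needed]
    if not fits:
        return None  # would need multiple nodes; out of scope for this sketch
    return min(fits, key=lambda t: t[2])[0]

print(provision([1, 2]))  # three GPUs needed: the 4-GPU type is the best fit
```

The point of the sketch is the "right-sizing" decision: rather than scaling a fixed node group, the autoscaler picks an instance shape to match the pending work.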
Segment 2: Accelerated Computing in Financial Services with Sudhir Kaldendi
Sudhir’s Role and Expertise
Transitioning to the financial sector, Sudhir Kaldendi, a principal solution architect specializing in payments at AWS, discusses the application of accelerated computing in fraud detection.
[27:03] Sudhir Kaldendi: "Financial institutions face the challenge of processing vast amounts of transaction data in real time, requiring robust infrastructure and efficient data processing."
Fraud Detection with Accelerated Computing
Sudhir highlights how financial services leverage GPU-accelerated instances to build sophisticated fraud detection systems:
- Data Processing: Utilizing Nvidia Rapids integrated with Amazon EMR to accelerate data processing pipelines.
[35:55] Sudhir Kaldendi: "Nvidia Rapids speeds up data processing and machine learning pipelines, enabling faster fraud detection and cost savings."
- Machine Learning Pipelines: Implementing frameworks like Nvidia Morpheus and Triton Inference Server for real-time transaction analysis and model deployment.
- Scalability and Cost Efficiency: Combining AWS services like Amazon SageMaker with Nvidia technologies to achieve up to 14 times faster data processing and model inference, while significantly reducing costs.
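A quick break-even calculation shows why a speedup like the ~14x figure above can reduce total cost even on pricier hardware. The price multiple below is an assumption for illustration, not a number from the episode.

```python
# Back-of-the-envelope break-even: a GPU instance costing k times a CPU
# instance per hour, finishing a job s times faster, changes total job cost
# by a factor of k/s. The 5x price multiple is an assumed example value.
def job_cost_ratio(gpu_price_multiple: float, speedup: float) -> float:
    """Ratio of GPU job cost to CPU job cost; below 1.0 means GPU is cheaper."""
    return gpu_price_multiple / speedup

print(job_cost_ratio(5.0, 14.0))  # well under 1.0: the faster run costs less overall
```

With a ~14x speedup, even an instance that costs several times more per hour completes the job for a fraction of the total spend, which is the cost-efficiency argument Sudhir is making.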
Architectural Considerations for Real-Time Fraud Detection
Sudhir outlines key architectural components essential for building real-time fraud detection systems:
- Data Storage and Retrieval: Efficiently storing and accessing vast volumes of historical and real-time transaction data using Amazon S3 data lakes.
- Data Correlation: Integrating historical data with emerging trends to identify patterns indicative of fraudulent activities.
- High Throughput Processing: Leveraging Triton Inference Server to handle up to 350,000 transactions per second, ensuring swift and accurate fraud detection.
[30:19] Sudhir Kaldendi: "Using the Triton inference server, we could process close to 350,000 transactions per second, which is crucial for identifying fraud in real time."
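Throughput numbers of that magnitude come from batched, parallel inference, and the arithmetic is worth seeing. The batch size, latency, and instance count below are assumed values chosen to land near the quoted figure; they are not measured Triton benchmarks.

```python
# Back-of-the-envelope throughput model for a batched inference server.
# All numbers are assumptions, not measured Triton figures: throughput is
# batch_size / per-batch latency, scaled by concurrent model instances.
def throughput_tps(batch_size: int, batch_latency_s: float, instances: int) -> float:
    """Transactions per second for batched inference across parallel instances."""
    return (batch_size / batch_latency_s) * instances

# e.g. 512-transaction batches at 20 ms per batch across 14 parallel instances
tps = throughput_tps(512, 0.020, 14)
print(f"{tps:,.0f} TPS")  # 358,400 TPS, in the ballpark of the quoted 350k
```

The takeaway is that per-request latency alone does not determine throughput; batching and instance-level parallelism multiply it.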
Customer Success Story: Featurespace’s ARIC Risk Hub
Sudhir shares the success of Featurespace and their ARIC Risk Hub:
[33:25] Sudhir Kaldendi: "Featurespace processes over 100 billion events annually, leveraging AWS scalability and Nvidia GPUs to deliver impressive fraud detection rates."
By utilizing AWS’s elastic computing power and Nvidia GPUs, Featurespace has developed a dynamic platform capable of real-time fraud detection with high accuracy, effectively mitigating financial risks.
Convergence of AI Technologies
Sudhir elaborates on the synergy between various AI models in enhancing fraud detection:
[38:26] Sudhir Kaldendi: "Graph neural networks analyze complex transaction patterns, while large language models process unstructured data like invoices and emails. Together, they provide a comprehensive fraud detection system."
The integration of Graph Neural Networks (GNNs), Large Language Models (LLMs), and Large Transaction Models enables financial institutions to detect fraudulent activities more accurately and efficiently, reducing false positives and enhancing customer trust.
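One simple way such signals can be combined is a weighted blend of model scores with a decision threshold. This sketch is an assumption about how an ensemble might work, not a description of any specific production system; the weights and threshold are invented for illustration.

```python
# Hedged sketch of combining model signals: a weighted blend of a graph-model
# score (transaction-pattern risk) and a language-model score (risk extracted
# from unstructured documents). Weights/threshold are illustrative only.
def fraud_score(gnn_score: float, llm_score: float, w_gnn: float = 0.7) -> float:
    """Blend two risk scores in [0, 1] into one combined score."""
    return w_gnn * gnn_score + (1.0 - w_gnn) * llm_score

def is_flagged(gnn_score: float, llm_score: float, threshold: float = 0.8) -> bool:
    return fraud_score(gnn_score, llm_score) >= threshold

print(is_flagged(0.9, 0.95))  # both models agree on high risk: flagged
print(is_flagged(0.3, 0.9))   # weak graph signal keeps the blend below threshold
```

Requiring corroboration across model families in this way is one mechanism for the reduced false positives the summary mentions: a single noisy signal is less likely to push the combined score over the threshold.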
Conclusion
Throughout the episode, Shruti Koparkar facilitates a deep dive into the intricacies of accelerated computing on AWS, highlighting its critical role in driving AI and ML innovations. From optimizing GPU utilization with Kubernetes and EKS to empowering financial institutions with real-time fraud detection capabilities, accelerated computing stands at the forefront of technological advancements.
[26:44] Shruti Koparkar: "Thank you so much, Ray... you provided a great overview of what folks should be thinking about when they are running GPU workloads on EKS."
[40:56] Shruti Koparkar: "That's it for this episode, everyone... until next time, keep on building."
For more insights and updates, listeners are encouraged to connect with Shruti Koparkar on LinkedIn or X, and provide feedback via email at awspodcast@amazon.com.
This episode underscores the pivotal role of accelerated computing in modern AI applications, providing actionable insights for developers and IT professionals aiming to harness the full potential of AWS’s GPU-powered resources.
