Generative artificial intelligence (gen AI) foundation models (FMs) can create new content and ideas, including conversations, stories, images, videos, music, and even software code, in response to a prompt. Gen AI is powered by large-scale FMs that can be trained with up to petabytes of data and supported by AWS infrastructure. As these FMs grow, their parameter counts also increase, reaching upward of trillions of parameters. Even a smaller language model can be trained with a few billion parameters, and depending on the use case, that number can go up to 15 billion.
Organizations are leveraging gen AI to reshape industries, such as healthcare, entertainment, finance, and manufacturing. While most organizations aim to take advantage of gen AI through customized applications using large language models (LLMs) and FMs or fully managed services with industry-leading models, some still want to build and train their own models.
As the training and deployment of these large-scale FMs continue to evolve, organizations need an unprecedented level of high-throughput, low-latency, and secure infrastructure to train these models in a reasonable time and deploy them for inference, all while lowering costs and maintaining the highest possible performance.
This article walks through the key challenges in training models and running inference, and how the right infrastructure can optimize cost, improve performance, and reduce time to market.

Challenges across the AI workflow
The range of infrastructure challenges organizations face is evolving as projects and technologies advance.
Increased interest in rapidly adopting new technologies
Gen AI integrations are being pursued at a fast pace, but organizations still need to thoughtfully address concerns over privacy, security, costs, performance, knowledge and training gaps, and other impacts to avoid risks.
Working with models at any scale
While some FMs have grown astronomically to include trillions of parameters, many organizations use smaller, more finely tuned models for their specific needs. Organizations want to flexibly scale compute, networking, and storage to meet diverse and changing requirements.
Balancing infrastructure costs while maintaining performance
Training, building, and deploying gen AI models requires an unprecedented level of performance and new technologies with budgets that remain similar year over year. This necessitates finding ways to lower costs while maintaining performance. You need a broad set of compute accelerators to meet the demands of any gen AI use case.
Data infrastructure modernization, integration, and scalability
Legacy systems inhibit advanced analytics and AI capabilities and bring substantial capacity constraints, requiring organizations to spearhead transformations that optimize value from the cloud. Plus, integrating gen AI systems into existing infrastructure and workflows can be complex and resource-intensive.
While an initial proof of concept is relatively easy to complete, scaling solutions and systems to handle increasing workloads while ensuring reliability and performance is not. Infrastructure should offer broad and flexible options to fit each scenario.
Data sovereignty, data residency, and regulatory considerations
Organizations in highly regulated industries are especially cautious about data security and privacy for gen AI applications, including concerns like exposure of intellectual property (IP) or code, governance, and compliance. They must navigate complex, uncertain, and ambiguous regulatory landscapes and ensure compliance with relevant laws and guidelines while exploring cloud infrastructure solutions.
To navigate these complexities, it’s essential to separate hype from reality. Our expert assessment of market predictions helps business leaders understand which AI trends deliver true value—and where to focus resources for long-term impact.

Developing your generative AI solution on AWS Infrastructure
- Data collection and preparation
After you’ve identified a use case and set objectives, you will typically need to source large datasets, cleanse the data, and, in some cases, reprocess it. You will also need scalable tools to make data preparation efficient and manageable.
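As a rough illustration of this step, the snippet below pulls a raw export from Amazon S3, applies basic cleansing with pandas, and stages the result for training. The bucket names, object keys, and column names are hypothetical; production pipelines would more likely use a managed option such as SageMaker Processing or AWS Glue.

```python
import boto3
import pandas as pd

# Hypothetical bucket and key names -- replace with your own.
RAW_BUCKET = "my-raw-data-bucket"
CLEAN_BUCKET = "my-curated-data-bucket"

s3 = boto3.client("s3")

# Download a raw export, clean it, and stage it for training.
s3.download_file(RAW_BUCKET, "exports/documents.csv", "/tmp/documents.csv")

df = pd.read_csv("/tmp/documents.csv")
df = df.drop_duplicates()                 # remove duplicate records
df = df.dropna(subset=["text", "label"])  # drop rows missing required fields
df["text"] = df["text"].str.strip()       # basic text normalization

# Parquet is compact and splittable, which suits large training datasets.
df.to_parquet("/tmp/documents.parquet", index=False)
s3.upload_file("/tmp/documents.parquet", CLEAN_BUCKET, "curated/documents.parquet")
```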
- Selecting models and architecture
Pre-built models and solution templates can help data scientists and machine learning (ML) practitioners get started quickly. A wide range of publicly available and fine-tunable FMs for text and image generation are available from libraries such as Hugging Face. Choosing models that work with accelerated compute and tools, such as Amazon SageMaker AI, can help you innovate faster.
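As a minimal sketch of this step, the snippet below loads a publicly available FM from the Hugging Face Hub for a quick evaluation; the model ID is only an example, and larger models typically require a GPU instance and the accelerate library.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example model ID -- any fine-tunable causal LM from the Hugging Face Hub works similarly.
model_id = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Generate a short completion to sanity-check the model before committing to it.
inputs = tokenizer("AWS infrastructure for generative AI", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```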
- Model training
Data is typically split into sets for training, validation, and testing. The model is trained through multiple runs in which weights are adjusted, problems are identified, and tracking metrics, such as model accuracy, are refined. FMs are often trained on petabytes of data and may be too large to fit in a single GPU.
You will need purpose-built ML silicon or GPUs in clusters with up to thousands of nodes. As a result, much of your training budget is likely to be spent on infrastructure. You will also need access to the latest ML frameworks and libraries, along with high-performing, secure technologies that speed up networking.
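A minimal sketch of launching a multi-node training job with the SageMaker Python SDK is shown below. The role ARN, bucket paths, script name, instance type, and framework versions are assumptions to adapt to your account; the training logic itself lives in your own train.py.

```python
from sagemaker.pytorch import PyTorch

# Hypothetical execution role and data locations -- adjust for your account.
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

estimator = PyTorch(
    entry_point="train.py",          # your training script: loss, metrics, checkpoints
    source_dir="src",
    role=role,
    instance_type="ml.p4d.24xlarge", # GPU instances; clusters can span many nodes
    instance_count=4,
    framework_version="2.1",
    py_version="py310",
    distribution={"torch_distributed": {"enabled": True}},  # multi-node data parallelism
    hyperparameters={"epochs": 3, "lr": 2e-5},
)

# Train and validation splits are staged as separate S3 prefixes.
estimator.fit({
    "train": "s3://my-curated-data-bucket/splits/train/",
    "validation": "s3://my-curated-data-bucket/splits/validation/",
})
```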
- Fine-tuning and optimizing models
Your compute capacity and resource needs will vary depending on the type of fine-tuning or optimization you choose, from full fine-tuning to parameter-efficient fine-tuning (PEFT). You will also need access to tools and software that help you maximize performance.
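For example, a parameter-efficient approach such as LoRA trains only a small set of added adapter weights, which sharply reduces the compute and memory footprint compared with full fine-tuning. The sketch below uses the Hugging Face peft library; the base model and adapter settings are illustrative.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical base model -- LoRA adapts a small set of added weights
# instead of updating all model parameters.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the adapters
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```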
- Deployment
As you prepare to deploy FMs for inference, your infrastructure needs will change. Inference can account for a large portion of the total cost of gen AI in production, so you will need infrastructure that reduces inference cost at scale. Compute needs also differ from the training stage because nodes can be distributed rather than clustered. You may find it complex to achieve the low latency needed for real-time inference (required by interactive use cases like chatbots) or the throughput needed for batch inference over large datasets.
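Continuing the training sketch above, the snippet below deploys the trained model to a real-time SageMaker endpoint; the instance type, endpoint name, and payload are assumptions, and batch or asynchronous inference may be more cost-effective for offline workloads.

```python
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Deploy to a smaller GPU instance sized for inference rather than training.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="my-fm-endpoint",   # hypothetical endpoint name
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

response = predictor.predict({"inputs": "Summarize this remittance advice ..."})
print(response)

# Delete the endpoint when it is no longer needed to avoid paying for idle capacity.
predictor.delete_endpoint()
```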
How Dedicatted Delivers Remittance Automation on AWS Infrastructure with GenAI
We start with process diagnostics to uncover workflow bottlenecks and ERP integration gaps. Then, we roll out in phases:
- Foundation Setup – AWS-powered pipelines (S3, Lambda, Step Functions) replace manual uploads with secure automation.
- AI Intelligence – Amazon Bedrock and Amazon Comprehend extract structured data from diverse documents with high accuracy (see the sketch below).
- ERP Integration – Parsed data flows seamlessly into Epicor Prophet, hardened for scalability and compliance.
Our tailored approach ensures precision, compliance, and scalability – with confidence scoring, secure cloud-native tools, and workflows designed for growth.
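As a simplified sketch of the AI Intelligence step, the snippet below sends extracted remittance text to a Bedrock-hosted model through the Converse API and asks for structured fields back. The model ID, prompt, and field names are illustrative rather than the production configuration.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

remittance_text = "..."  # text extracted from an uploaded remittance document

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model choice
    messages=[{
        "role": "user",
        "content": [{
            "text": "Extract the payer, invoice numbers, and amounts from this "
                    f"remittance advice and return them as JSON:\n{remittance_text}"
        }],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0},
)

# Downstream steps would validate this payload, attach a confidence score,
# and push it into the ERP integration.
structured = json.loads(response["output"]["message"]["content"][0]["text"])
print(structured)
```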
The impact
- 40% cost reduction in financial operations
- Faster, error-free remittance processing across 13 branches
- Teams freed to focus on strategic work, not manual tasks
With proven AWS + GenAI expertise, Dedicatted transforms complex financial workflows into scalable, cost-efficient systems.

Minimize latency with optimized networking
- Enable lightning-fast inter-node communication for high-performance AI applications with up to 3,200 gigabits per second (Gbps) of Elastic Fabric Adapter (EFA) networking, providing low-latency, high-bandwidth throughput.
- Reduce latency by 16 percent and support up to 20,000 GPUs with Amazon EC2 UltraClusters 2.0, a flatter and wider network fabric specifically optimized for ML accelerators. It offers up to 10 times more overall bandwidth than alternatives.
- Increase network efficiency and optimize job scheduling with the Amazon EC2 Instance Topology API. With insights into the proximity between your instances, it can help you strategically allocate each job to the instances that best fit your requirements.
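A short sketch of querying the Instance Topology API with boto3 is shown below; the instance IDs are placeholders. Instances that share the same network nodes deeper in the returned hierarchy are physically closer, so co-scheduling tightly coupled jobs on them reduces communication latency.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder instance IDs -- replace with the instances in your training cluster.
resp = ec2.describe_instance_topology(
    InstanceIds=["i-0123456789abcdef0", "i-0fedcba9876543210"]
)

# Each instance reports its position as a list of network nodes from the top
# of the hierarchy down; overlapping nodes indicate physical proximity.
for instance in resp["Instances"]:
    print(instance["InstanceId"], instance["NetworkNodes"])
```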
Optimize storage for throughput, low latency, and reduced costs
AWS offers a comprehensive choice of cloud storage options that meet every need in AI workflows, from delivering the performance to keep accelerators highly utilized to reducing the cost of long-term storage.
- Amazon FSx for Lustre can help you accelerate ML with maximized throughput to compute resources and seamless access to training data stored in Amazon Simple Storage Service (Amazon S3).
- Amazon S3 Express One Zone provides the lowest-latency cloud object storage available, with data access speed up to 10 times faster and request costs up to 50 percent lower than Amazon S3 Standard.
- Amazon S3 is built to retrieve any amount of data from anywhere, offering industry-leading scalability, data availability, security, and performance. Use Amazon S3 to create a centralized repository or data lake that allows you to store all your structured and unstructured data at any scale.
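As a minimal example of that pattern, the snippet below stages dataset versions under separate S3 prefixes so that training jobs (or an FSx for Lustre file system linked to the bucket) can read only what they need; the bucket name and prefix layout are assumptions, not a prescribed standard.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-ml-data-lake"  # hypothetical data lake bucket

# Keep raw, curated, and split data under separate, versioned prefixes.
for local_path, key in [
    ("data/train.parquet", "datasets/remittance/v1/train/train.parquet"),
    ("data/validation.parquet", "datasets/remittance/v1/validation/validation.parquet"),
]:
    s3.upload_file(local_path, bucket, key)

# List what has been staged for a given dataset version.
resp = s3.list_objects_v2(Bucket=bucket, Prefix="datasets/remittance/v1/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```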
Control data and AI infrastructure securely
Built on the foundation of the AWS Nitro System, AWS infrastructure safeguards even your most sensitive data. The Nitro System is designed to enforce restrictions so that nobody, including anyone at AWS, can access your workloads or data running on your accelerated computing EC2 instances or any other Nitro-based EC2 instance. This level of security protection is so critical that we’ve added it to the AWS Service Terms to provide additional assurance to all of our customers, and it has been validated by the NCC Group, an independent cybersecurity firm.