
1.4 - AI App Architecture

Today we explore AI application architecture in detail: how to identify the resources in a design and how the components work together to form an intelligent application. This lesson covers:

  • Common patterns in AI application architecture
  • How to identify and categorize Azure resources in an architecture diagram
  • The Retrieval Augmented Generation (RAG) pattern
  • How data flows through an AI application

Before diving in, review these resources:

  1. 📘 Azure OpenAI Architecture Patterns - Baseline architecture for OpenAI applications
  2. 📘 RAG Pattern Guide - Comprehensive RAG implementation guide
  3. 🔗 Azure Architecture Center - AI - Collection of AI architecture patterns

Let’s deconstruct a typical AI application architecture into its key components:

Frontend Layer

Purpose: How users interact with your application

Common Azure Services:

  • Azure Static Web Apps: For React/Angular/Vue frontends
  • Azure App Service: For server-rendered applications
  • Azure Container Apps: For containerized frontends

What it Does:

  • Renders UI
  • Captures user input
  • Displays AI responses
  • Manages session state

API Gateway Layer

Purpose: Entry point and traffic management

Common Azure Services:

  • Azure API Management: Full-featured API gateway
  • Azure Application Gateway: Load balancing with WAF
  • Azure Front Door: Global routing and CDN

What it Does:

  • Rate limiting
  • Authentication/Authorization
  • Request routing
  • Caching
  • Monitoring
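
Rate limiting is usually configured on the gateway rather than hand-written, but the underlying idea is easy to see in code. The following is a minimal sketch of a token-bucket limiter, one common way to implement it. The `TokenBucket` class and its parameters are illustrative assumptions, not part of any Azure SDK.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would typically keep one bucket per caller (for example per API key) and return HTTP 429 when `allow()` is false.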

Backend Layer

Purpose: Business logic and orchestration

Common Azure Services:

  • Azure Functions: Event-driven, serverless compute
  • Azure Container Apps: Containerized microservices
  • Azure App Service: Full web application hosting
  • Azure Kubernetes Service (AKS): Enterprise container orchestration

What it Does:

  • Process requests
  • Orchestrate AI calls
  • Implement business rules
  • Handle errors and retries
  • Manage workflows
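
Handling errors and retries is one of the backend's most important jobs when it calls an AI service that may throttle or time out. Here is a minimal sketch of retrying with exponential backoff and jitter. `TransientError` and `call_with_retry` are hypothetical names for illustration; a real backend would map its client library's throttling and timeout errors onto something similar.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (throttling, timeouts)."""


def call_with_retry(fn, max_attempts=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponentially growing, jittered delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt; a random point below it
            # ("full jitter") keeps many clients from retrying in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

Only errors that are likely to succeed on a second try should be retried. Retrying a malformed request just adds latency and cost.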

AI Services Layer

Purpose: Intelligence and cognitive capabilities

Common Azure Services:

  • Azure OpenAI Service: GPT models for generation
  • Azure AI Services: Vision, speech, language understanding
  • Azure Machine Learning: Custom model training and deployment

What it Does:

  • Natural language understanding
  • Text generation
  • Embeddings creation
  • Image analysis
  • Speech recognition/synthesis

Data Layer

Purpose: Store and retrieve information

Common Azure Services:

  • Azure AI Search: Vector and hybrid search
  • Azure Cosmos DB: NoSQL database for documents
  • Azure SQL Database: Relational data
  • Azure Storage: Blobs, files, tables
  • Azure Cache for Redis: Fast data caching

What it Does:

  • Store product catalogs
  • Index content for search
  • Cache frequent queries
  • Manage user sessions
  • Store conversation history
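
Caching frequent queries can be illustrated with a small in-memory cache that expires entries after a fixed time. This is a sketch under simple assumptions: `TTLCache` is a hypothetical class, and a production system would more likely keep the entries in a shared store such as Azure Cache for Redis so that every backend instance sees them.

```python
import time


class TTLCache:
    """Cache computed answers per normalized query for `ttl_seconds`."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # normalized query -> (expires_at, value)

    @staticmethod
    def normalize(query: str) -> str:
        # Collapse case and whitespace so trivially different queries share an entry.
        return " ".join(query.lower().split())

    def get_or_compute(self, query: str, compute):
        key = self.normalize(query)
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: skip the expensive call
        value = compute(query)
        self._entries[key] = (now + self.ttl, value)
        return value
```

The time-to-live is the main design choice here: a longer value saves more calls but serves staler answers.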

Monitoring Layer

Purpose: Track performance and issues

Common Azure Services:

  • Azure Monitor: Comprehensive monitoring
  • Application Insights: Application performance
  • Log Analytics: Centralized logging
  • Azure Managed Grafana: Visualization dashboards

What it Does:

  • Track API calls
  • Monitor costs
  • Alert on errors
  • Visualize metrics
  • Trace requests
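
To make "trace requests" concrete, here is a minimal sketch of recording timed spans that share one trace ID. The `Tracer` class is a hypothetical stand-in written for this lesson. A real application would normally rely on a telemetry SDK rather than collecting spans in a list.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Collect timed spans so one request can be followed across steps."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.spans = []  # stand-in for a telemetry backend

    def new_trace_id(self) -> str:
        return uuid.uuid4().hex

    @contextmanager
    def span(self, trace_id: str, name: str):
        start = self.clock()
        status = "ok"
        try:
            yield
        except Exception:
            status = "error"
            raise  # record the failure but do not swallow it
        finally:
            self.spans.append({
                "trace_id": trace_id,
                "name": name,
                "status": status,
                "duration_s": self.clock() - start,
            })
```

Wrapping each step of a request (for example `with tracer.span(trace_id, "vector_search"):`) yields one record per step, all linked by the same trace ID.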

Retrieval Augmented Generation (RAG) is one of the most common patterns for enterprise AI applications. Let’s break it down:

User Query
    ↓
1. Query Processing
    ↓
2. Embedding Generation (Azure OpenAI)
    ↓
3. Vector Search (Azure AI Search)
    ↓
4. Context Retrieval (Top-K relevant documents)
    ↓
5. Prompt Construction (Query + Context)
    ↓
6. LLM Generation (Azure OpenAI GPT)
    ↓
Response to User
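
The retrieval and prompt-construction steps can be sketched without any cloud services. This toy version assumes word counts in place of a real embedding model and a plain Python list in place of a search index. All function names are illustrative, and the final generation call is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: word counts instead of a dense vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (vector search + top-K)."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the user query with the retrieved context (prompt construction)."""
    sources = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(context, start=1))
    return (
        "Answer using only the sources below. If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )
```

Numbering the sources in the prompt is what later makes it possible to show users which documents an answer drew on.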

Without RAG: an LLM knows only what was in its training data, which is limited and may be outdated.

With RAG: the LLM also receives relevant passages from your own data at query time, so its answers can be current and specific to your domain.

Key benefits:

  • Accuracy: Responses are grounded in your data
  • Freshness: Answers reflect up-to-date information
  • Transparency: Source documents can be shown alongside answers
  • Control: You filter what data the model can access
  • Cost: Usually cheaper than fine-tuning for keeping knowledge current

Let’s apply this to our retail assistant:

                [Users]
                   ↓
           [Static Web App]
               Frontend
                   ↓
       [API Management Gateway]
         Auth + Rate Limiting
                   ↓
        [Azure Container Apps]
           Backend Services
    ┌──────────────┴──────────────┐
    ↓                             ↓
[Azure OpenAI Service]     [Azure AI Search]
- Embeddings Model         - Product Catalog Index
- GPT-4 Completion         - Policy Documents Index
    ↓                             ↓
    └──────────────┬──────────────┘
           [Azure Cosmos DB]
         Conversation History

        [Application Insights]
              Monitoring

Request flow:

  1. User asks: “Show me waterproof hiking boots under $150”

  2. Frontend sends request to API Gateway

  3. API Gateway authenticates user, routes to backend

  4. Backend creates embedding of query using Azure OpenAI

  5. AI Search finds relevant products using vector search

  6. Backend constructs prompt with query + search results

  7. Azure OpenAI generates conversational response with product recommendations

  8. Backend stores conversation in Cosmos DB

  9. Frontend displays response to user

  10. Application Insights tracks the entire request flow
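
Steps 4 through 8 of this flow all happen inside the backend, and they can be sketched as a single handler. In this sketch the embedding, search, and generation calls are injected as plain functions, and a list stands in for Cosmos DB. `ChatBackend` and its field names are assumptions made for illustration, not the workshop's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ChatBackend:
    embed: Callable[[str], list]      # step 4: query text -> vector
    search: Callable[[list], list]    # step 5: vector -> product descriptions
    generate: Callable[[str], str]    # step 7: prompt -> answer text
    history: list = field(default_factory=list)  # step 8: stand-in for a conversation store

    def handle(self, user_id: str, query: str) -> str:
        vector = self.embed(query)
        products = self.search(vector)
        # Step 6: the prompt carries both the retrieved products and the user's question.
        prompt = "Products:\n" + "\n".join(f"- {p}" for p in products) + f"\n\nCustomer: {query}"
        answer = self.generate(prompt)
        self.history.append({"user": user_id, "query": query, "answer": answer})
        return answer
```

Passing the three service calls in as functions keeps the orchestration logic testable with stubs, and lets the real clients be swapped in later without touching `handle`.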

When reviewing an architecture diagram, ask the following.

For each resource:

  1. What category? (Compute, Storage, AI, Networking, etc.)
  2. What purpose? (What problem does it solve?)
  3. Why this service? (Why not alternatives?)
  4. How does it connect? (What calls it? What does it call?)

For the architecture as a whole:

  1. What’s the entry point? (Where do requests start?)
  2. What’s the critical path? (Main flow for key scenarios)
  3. Where’s the data? (Storage and retrieval patterns)
  4. What could fail? (Single points of failure)
  5. How does it scale? (Bottlenecks and scaling strategies)

Security:

  • API authentication and authorization
  • Network isolation (VNets, Private Endpoints)
  • Secrets management (Key Vault)
  • Content filtering on AI outputs

Reliability:

  • Retry policies with exponential backoff
  • Circuit breakers for failing services
  • Health checks and monitoring
  • Multi-region deployment (if needed)

Performance:

  • Caching at multiple layers
  • Async processing where possible
  • Connection pooling
  • CDN for static assets

Cost:

  • Right-size resource SKUs
  • Use consumption-based pricing
  • Implement caching to reduce AI calls
  • Monitor and alert on spending
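
Of the items above, the circuit breaker is probably the least familiar, so here is a minimal sketch of one. It assumes a simple consecutive-failure threshold and a fixed cool-off period. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, and production libraries offer more refined policies.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("downstream service is cooling off")
            # Cool-off elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of piling more requests onto a struggling service, which also gives the backend a clear signal to return a fallback response.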

Explore architecture concepts:

  1. “What are the trade-offs between using Azure Functions versus Azure Container Apps for hosting AI application backend services?”
  2. “How do you implement caching effectively in a RAG architecture to reduce costs and improve response times?”
  3. “What are the key security considerations when exposing Azure OpenAI Service through a public-facing API?”

Next: Day 5 - Manual Provisioning

Tomorrow we’ll get hands-on with manual resource provisioning to understand what happens under the hood.