Spandan Mahapatra
ENT-03 | AI for enterprise technology

Mainframe Modernization & AIOps

Modernizing Delta's z/TPF mainframe estate through a hybrid cloud strategy that combines the 20-year Kyndryl partnership, AWS cloud infrastructure, and TCS Ignio AIOps to create a resilient, observable, and progressively autonomous technology foundation.

z/TPF modernization · Kyndryl 20yr partnership · AWS hybrid cloud · TCS Ignio AIOps
-50%
Incident resolution time
99.99%
System availability
-30%
Mainframe MIPS cost

The stakes

Business scale and impact that makes this transformation critical.

z/TPF
Core mainframe platform
1970s-era reservation and crew systems
20 yr
Kyndryl partnership
Strategic infrastructure management
40%
Workloads on AWS
Current cloud migration progress
$400M+
Annual infrastructure cost

Current-state friction

Legacy

z/TPF Mainframe Dependency

Delta's core reservation, crew scheduling, and departure control systems still run on IBM z/TPF — a 1970s-era transaction processing platform. Institutional knowledge is concentrated in a shrinking pool of senior engineers, and the rigid architecture limits the pace of innovation and integration with modern cloud services.

50+ years old, shrinking talent pool
Observability

Limited AIOps Observability

Current infrastructure monitoring is fragmented across mainframe, on-premises, and cloud environments. Without unified AIOps observability, incident correlation is manual, root cause analysis is slow, and predictive failure detection is virtually impossible across the hybrid estate.

3+ monitoring tool silos
Hybrid

Hybrid Cloud Complexity

With 40% of workloads on AWS and critical systems still on z/TPF, Delta operates a complex hybrid environment. Data movement, latency management, and consistent security policies across environments create operational overhead that will only grow as cloud migration accelerates.

Hybrid z/TPF + AWS architecture

Intelligent choices architecture

Four-step agentic decision loop powering autonomous operations.

STEP 01
Sense
What the agents observe
  • z/TPF transaction volumes, response times, and resource utilization metrics
  • AWS CloudWatch and infrastructure metrics across all cloud workloads
  • Application dependency maps spanning mainframe and cloud environments
  • Change management feeds tracking deployments across all environments
TCS Ignio · IBM OMEGAMON · AWS CloudWatch · ServiceNow CMDB
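As a rough illustration of what "sensing" across environments could look like, the sketch below merges separately collected feeds into one time-ordered stream. The `Signal` record, its field names, and the sample values are assumptions made for this example; they are not taken from TCS Ignio, OMEGAMON, CloudWatch, or ServiceNow.

```python
from dataclasses import dataclass
from heapq import merge
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Signal:
    """One normalized observation; field names are illustrative assumptions."""
    timestamp: float  # seconds since epoch
    source: str       # e.g. "ztpf", "aws", "change"
    metric: str       # e.g. "response_ms", "api_calls_per_min", "deployment"
    value: float


def unified_stream(*feeds: Iterable[Signal]) -> Iterator[Signal]:
    """Merge per-source feeds (each already time-ordered) into one timeline."""
    return merge(*feeds, key=lambda s: s.timestamp)


# Made-up sample data for illustration only.
ztpf = [Signal(100.0, "ztpf", "response_ms", 42.0),
        Signal(160.0, "ztpf", "response_ms", 95.0)]
aws = [Signal(130.0, "aws", "api_calls_per_min", 1800.0)]
changes = [Signal(120.0, "change", "deployment", 1.0)]

timeline = list(unified_stream(ztpf, aws, changes))
for s in timeline:
    print(f"{s.timestamp:>6.0f}  {s.source:<6} {s.metric} = {s.value}")
```

Putting every source on one timeline is what lets a later step ask whether a deployment or a cloud-side spike preceded a mainframe slowdown.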
STEP 02
Decide
How the agents reason
  • Anomaly detection correlating signals across mainframe, cloud, and network layers
  • Predictive failure analysis using historical incident patterns and capacity trends
  • Workload migration candidate identification based on coupling analysis and risk scoring
  • Incident priority and routing decisions using business impact assessment
TCS Ignio correlation engine · Predictive failure model · Migration scoring framework · Business impact analyzer
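The cross-layer correlation described in this step can be sketched with a generic technique: flag points that deviate sharply from a trailing window (a rolling z-score), then pair anomalies from two layers that occur close together in time. This is a stand-in for illustration, not the Ignio correlation engine's actual method; the function names, window size, threshold, and sample series are all assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


def correlate(anoms_a, anoms_b, max_lag=2):
    """Pair anomalies from two layers that occur within `max_lag` samples."""
    return [(a, b) for a in anoms_a for b in anoms_b if abs(a - b) <= max_lag]


# Made-up per-minute samples for illustration only.
ztpf_memory_pct = [61, 62, 61, 63, 62, 62, 63, 78, 80, 81]
aws_api_calls = [200, 210, 205, 198, 207, 203, 650, 660, 640, 655]

mem = zscore_anomalies(ztpf_memory_pct)
api = zscore_anomalies(aws_api_calls)
print(mem, api, correlate(mem, api))
```

Only the first sample of each shift is flagged, because the shifted values then enter the trailing window and inflate its standard deviation. A production detector would need a more robust baseline.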
STEP 03
Act
What the agents do
  • Automated incident remediation for known patterns (restart services, clear queues, scale resources)
  • Proactive capacity scaling in AWS based on predicted demand surges
  • Automated runbook execution for standard operational procedures
  • Incident communication and escalation to Kyndryl and internal teams
TCS Ignio automation · AWS Auto Scaling · Runbook automation platform · PagerDuty integration
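A minimal sketch of the Act step, assuming a simple catalogue that maps known incident patterns to ordered runbook steps: known patterns are remediated automatically, and anything else is escalated. The pattern names, step names, and the `Dispatcher` class are hypothetical, and the actions are only recorded in a log rather than executed against any real platform.

```python
from dataclasses import dataclass, field


def default_runbooks() -> dict:
    # Hypothetical catalogue: known pattern -> ordered remediation steps.
    return {
        "queue_backlog": ["clear_queue", "restart_consumer"],
        "service_hung": ["restart_service"],
        "demand_surge": ["scale_out"],
    }


@dataclass
class Dispatcher:
    runbooks: dict = field(default_factory=default_runbooks)
    log: list = field(default_factory=list)

    def handle(self, pattern: str) -> str:
        """Run the runbook for a known pattern, otherwise escalate."""
        steps = self.runbooks.get(pattern)
        if steps is None:
            # Novel incident: hand over to humans with the pattern as context.
            self.log.append(("escalate", pattern))
            return "escalated"
        for step in steps:
            self.log.append(("execute", step))
        return "remediated"


dispatcher = Dispatcher()
print(dispatcher.handle("queue_backlog"))    # known pattern
print(dispatcher.handle("unknown_pattern"))  # novel incident
print(dispatcher.log)
```

Keeping the catalogue as data rather than code means the set of automated actions can be reviewed and approved separately from the agent logic.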
STEP 04
Learn
How the agents improve
  • Incident post-mortem analysis identifying systemic infrastructure weaknesses
  • Mainframe workload profiling for progressive migration planning
  • AIOps model retraining on new incident patterns and resolution outcomes
  • Capacity planning optimization using trend analysis and seasonal modeling
Incident analytics · Workload profiler · MLflow model registry · Capacity planning engine
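For the capacity-planning item, one very simple form of "trend analysis and seasonal modeling" is a seasonal-naive forecast with drift: repeat the most recent season and add the average season-over-season change. This is an assumed technique chosen for brevity, not the method of any capacity planning engine named here, and the daily figures are invented.

```python
def seasonal_forecast(history, season=7):
    """Forecast the next `season` points as the last season's values
    plus the average season-over-season change (drift)."""
    if len(history) < 2 * season:
        raise ValueError("need at least two full seasons of history")
    last = history[-season:]
    prev = history[-2 * season:-season]
    drift = sum(l - p for l, p in zip(last, prev)) / season
    return [value + drift for value in last]


# Two weeks of made-up daily peak load (arbitrary units), Monday to Sunday.
daily_peak = [100, 100, 100, 100, 120, 150, 140,
              110, 110, 110, 110, 130, 160, 150]

print(seasonal_forecast(daily_peak))
```

The forecast keeps the weekly shape (a Friday-to-Sunday peak in this data) while carrying the week-over-week growth forward, which is the minimum needed to schedule scaling ahead of a predictable surge.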
At 11 PM on a Friday, TCS Ignio detects a subtle z/TPF memory allocation anomaly that historically precedes a reservation system degradation within 4 to 6 hours. The AIOps agent correlates it with an AWS batch job that is generating unusual mainframe API call volume, automatically throttles the batch job, initiates a preventive z/TPF memory flush, and pages the Kyndryl on-call team with full context. This prevents a Saturday-morning reservation outage that would have affected 180K bookings.

Human + AI autonomy levels

L1 Tool (CURRENT)
  • Human role: Human as operator
  • AI role: AI as monitoring dashboard
  • Description: Unified observability dashboards combining z/TPF, AWS, and network metrics for infrastructure teams.
  • Team type: Traditional squads
  • Guardrails: Read-only monitoring; all remediation actions performed manually by infrastructure teams

L2 Assistant (TARGET)
  • Human role: Human as decision-maker
  • AI role: AI correlates and recommends
  • Description: TCS Ignio correlates incidents across environments and recommends remediation; engineers validate and execute actions. Kyndryl team manages mainframe operations.
  • Team type: Human-led with AI copilot
  • Guardrails: All remediation requires engineer approval; mainframe changes require Kyndryl review

L3 Supervised agent
  • Human role: Human as supervisor
  • AI role: AI remediates known patterns
  • Description: Agent autonomously handles known incident patterns and routine capacity scaling; escalates novel incidents and mainframe-impacting changes.
  • Team type: AI-led with human oversight
  • Guardrails: Bounded to approved runbooks; mainframe changes require Kyndryl approval; production database changes always human-reviewed

L4 Autonomous agent
  • Human role: Human as exception handler
  • AI role: AI manages infrastructure operations
  • Description: Full AIOps automation across hybrid environment including predictive remediation and proactive scaling with human focus on strategic modernization decisions.
  • Team type: Autonomous with guardrails
  • Guardrails: Critical system changes require human approval; data integrity protections immutable; Kyndryl partnership protocols honored

L5 Agentic workforce
  • Human role: Human as strategist
  • AI role: Self-healing infrastructure
  • Description: Multi-agent self-healing infrastructure coordinating AIOps, security, capacity, and migration agents for continuously optimized hybrid operations.
  • Team type: Agent ecosystem
  • Guardrails: Cross-agent safety protocols; Kyndryl partnership governance; strategic modernization roadmap by CTO
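The guardrails for the first four autonomy levels can be read as a small approval policy. The sketch below encodes one possible interpretation; the `Level` and `Action` types, the approval labels, and the exact mapping are assumptions for illustration. The fifth level is left out because its guardrails are described at the governance level rather than per action.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    TOOL = 1
    ASSISTANT = 2
    SUPERVISED_AGENT = 3
    AUTONOMOUS_AGENT = 4


@dataclass(frozen=True)
class Action:
    in_approved_runbook: bool = False
    touches_mainframe: bool = False
    touches_prod_database: bool = False
    critical_system: bool = False


def required_approvals(level: Level, action: Action) -> set:
    """Return the approvals needed before the agent may execute `action`.
    An empty set means the agent may act on its own."""
    if level == Level.TOOL:
        return {"manual_execution"}  # read-only monitoring, humans act
    needed = set()
    if level == Level.ASSISTANT:
        needed.add("engineer")  # every remediation is engineer-approved
        if action.touches_mainframe:
            needed.add("kyndryl")
    elif level == Level.SUPERVISED_AGENT:
        if not action.in_approved_runbook or action.touches_prod_database:
            needed.add("engineer")
        if action.touches_mainframe:
            needed.add("kyndryl")
    elif level == Level.AUTONOMOUS_AGENT:
        if action.critical_system:
            needed.add("engineer")
    return needed


restart = Action(in_approved_runbook=True)
print(required_approvals(Level.ASSISTANT, restart))
print(required_approvals(Level.SUPERVISED_AGENT, restart))
```

The same runbook action needs an engineer at the assistant level but none at the supervised-agent level, which is the practical difference between the two.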

KPI architecture

Level | KPI | Baseline | Target | Business link
L0 Board | System availability | 99.5% | 99.99% | Business continuity and revenue protection
L1 Exec | Incident resolution time | 4.5 hrs | 2.2 hrs | Operational impact minimization
L2 Ops | Mainframe MIPS efficiency | Baseline | +30% | Infrastructure cost optimization
L3 AI Ops | Automated incident remediation | 10% | 55% | Operations team productivity
L4 AI Decision | Predictive failure detection | N/A | >80% | Proactive outage prevention
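To make the availability and resolution-time targets concrete, the arithmetic below converts the availability percentages into expected downtime per year and checks the size of the resolution-time reduction. The only assumption is a 365-day year; the input figures are the baseline and target values from the KPI table.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_hours(availability_pct: float) -> float:
    """Expected unavailable hours per year at a given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def pct_change(baseline: float, target: float) -> float:
    """Relative change from baseline to target, in percent."""
    return (target - baseline) / baseline * 100


print(f"99.5%  availability -> {downtime_hours(99.5):.1f} hours/year")
print(f"99.99% availability -> {downtime_hours(99.99) * 60:.1f} minutes/year")
print(f"Resolution time 4.5 h -> 2.2 h: {pct_change(4.5, 2.2):.1f}%")
```

Moving from 99.5% to 99.99% cuts expected downtime from about 43.8 hours to under an hour per year, and the 4.5-hour to 2.2-hour target corresponds to roughly a 51% reduction, in line with the headline -50% figure.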

TCS proof points

TCS IP
TCS Ignio AIOps Platform

Enterprise AIOps platform providing unified observability, automated remediation, and predictive analytics across hybrid mainframe-cloud environments for global enterprises.

200+
Enterprise deployments
48%
Incident resolution time reduction
99.98%
Average availability achieved

Quick-win opportunity

TCS Incept.AI Innovation Camp: 4-6 week discovery workshop ($500K-$1M) to assess current state, identify automation opportunities, and deliver a prioritized transformation roadmap with measurable business outcomes.

Expansion path

From discovery to full-scale deployment: Spark.AI for prototyping (8-12 weeks), Realize.AI for production scaling (6-12 months), and ongoing managed services with SLA-based outcomes.

Enterprise Control Plane
How this connects
  • Model orchestration for AIOps anomaly detection and prediction models
  • Governance controls for infrastructure change management compliance
  • Observability tracking system availability, incident metrics, and migration progress
