Open to remote · Enterprise & Scale-up roles

Gaurav Kaushal

DevOps / Platform Engineer / AWS · Kubernetes · CI/CD

I design and operate reliable infrastructure — from CI/CD pipelines to production environments — helping teams ship faster without breaking systems.

Currently working on large-scale on-prem infrastructure and CI/CD automation supporting enterprise production environments at Optum / UHG.


Jaipur, India · Remote-friendly · AWS Certified · 8+ yrs experience

Experience

Senior DevOps Engineer

Artech → Optum / UHG

May 2025 – Present

DevOps Engineer

Nlineaxis IT Solutions

Dec 2018 – Mar 2025

Software Developer

Locus R.A.G.S.

Jan 2018 – Dec 2018

About

Engineering reliability,
not just infrastructure.

I'm a DevOps / Platform Engineer with 8 years in IT, focusing on automation, CI/CD, and infrastructure reliability. My background started in traditional operations, which shaped how I approach production systems today — with a bias toward stability over speed.

I believe infrastructure should be predictable and boring. The goal of good DevOps is not exciting deployments — it's making sure nothing breaks when they happen. If your on-call is busy, the platform hasn't done its job yet.

Currently working with large on-prem and hybrid cloud environments, supporting CI/CD pipelines and infrastructure automation in regulated production systems at Optum / UHG.

Outside of work I enjoy exploring new infrastructure tooling, writing about DevOps on gauravkaushal.tech, and experimenting with automation that removes toil.

Current Role

Senior DevOps Engineer

Current Client

Optum / UHG (via Artech)

Location

Jaipur, Rajasthan · India

Availability

Open to remote roles globally

Certifications

✓ AWS Certified Solutions Architect – Associate

✓ AWS Certified Cloud Practitioner (CLF-C02)

◌ HashiCorp Terraform Associate, in progress

◌ Certified Kubernetes Administrator (CKA), in progress

Technical Skills

Stack & Toolchain

AWS is my primary cloud. Everything listed reflects hands-on production experience.

Infrastructure

Linux (Ubuntu, RHEL) · Terraform · Ansible · Puppet · AWS CloudFormation

Containers

Docker · Kubernetes / EKS · Helm · AKS

CI/CD

Jenkins · GitHub Actions · GitLab CI · ArgoCD · Argo Rollouts

Monitoring

Prometheus · Grafana · ELK / EFK Stack · CloudWatch

Cloud

AWS (primary) · Azure · EC2 · EKS · ECS · S3 · Lambda · Route 53

Security & Scripting

Vault · IAM / least-privilege · SonarQube · Nexus IQ · Python · Bash

Work

Selected Infrastructure Work

Representative work from production environments. Outcomes are real — not estimates.

CI/CD Pipeline Standardisation

40%+ reduction in production defects

Problem: Release pipelines were inconsistent across teams, with no enforced quality gates and frequent production defects from unvalidated deployments.

Action: Designed an organisation-wide GitHub Actions and ArgoCD governance framework with gated DEV → QA → UAT → PROD promotion flows, SonarQube SAST, and Nexus IQ dependency scanning built into every pipeline.

Outcome: Production defects reduced by 40%+. Release process became auditable and consistent across all regulated environments.
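As a rough illustration of the gated promotion idea above, the Python sketch below models a DEV → QA → UAT → PROD flow in which a build advances one stage at a time and only when its quality gates pass. The `Build` class, the `promote` helper, and the gate flags `sast_passed` and `dependency_scan_passed` are hypothetical names chosen for this example; the actual framework is expressed in GitHub Actions and ArgoCD configuration, not in this code.

```python
from dataclasses import dataclass

# Promotion order taken from the text; everything else here is illustrative.
STAGES = ["DEV", "QA", "UAT", "PROD"]


@dataclass
class Build:
    version: str
    stage: str = "DEV"
    sast_passed: bool = False             # hypothetical flag, e.g. a SonarQube quality gate
    dependency_scan_passed: bool = False  # hypothetical flag, e.g. a Nexus IQ policy check


def failed_gates(build: Build) -> list[str]:
    """Return the names of the quality gates this build has not passed."""
    failed = []
    if not build.sast_passed:
        failed.append("sast")
    if not build.dependency_scan_passed:
        failed.append("dependency_scan")
    return failed


def promote(build: Build) -> Build:
    """Move a build to the next stage, refusing if any gate has failed."""
    if build.stage == STAGES[-1]:
        raise ValueError(f"{build.version} is already in {STAGES[-1]}")
    failed = failed_gates(build)
    if failed:
        raise PermissionError(f"{build.version} blocked by gates: {', '.join(failed)}")
    next_stage = STAGES[STAGES.index(build.stage) + 1]
    return Build(build.version, next_stage, build.sast_passed, build.dependency_scan_passed)
```

A build with both flags set moves forward one stage per `promote` call, while a build with any failing gate raises `PermissionError` and stays where it is. Stages cannot be skipped, which is the property that makes the release trail auditable.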

GitHub Actions · ArgoCD · SonarQube · Nexus IQ · Argo Rollouts


Hybrid Cloud Infrastructure & Cost Governance

20% reduction in monthly cloud spend · 99.9% uptime

Problem: No unified networking between 100+ on-premises servers and cloud resources, with growing cloud spend and no visibility into waste.

Action: Built a secure hybrid environment using Terraform and AWS Transit Gateways. Implemented automated S3 lifecycle policies, EC2 rightsizing, and Lambda-driven cost automation. Added cross-region snapshot management for DR.

Outcome: 20% reduction in monthly cloud spend. Unified hybrid environment with 99.9% uptime and automated disaster recovery.
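To make the S3 lifecycle part of this concrete, here is a minimal Python sketch of what one such rule could look like. The prefix, day thresholds, and bucket name are assumptions chosen for illustration, not the values used in the project; the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts.

```python
def lifecycle_rule(prefix: str, ia_after_days: int,
                   glacier_after_days: int, expire_after_days: int) -> dict:
    """Build one S3 lifecycle rule: Standard-IA, then Glacier, then delete."""
    if not 0 < ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("day thresholds must be positive and strictly increasing")
    return {
        "ID": f"tiering-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: tier log objects down over time, delete after a year.
config = {"Rules": [lifecycle_rule("logs/", 30, 90, 365)]}

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Keeping the rule as plain data means it can be reviewed and version-controlled like any other infrastructure change before it is applied.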

Terraform · AWS Transit Gateway · Python / Lambda · Ansible


Kubernetes Platform & Observability Stack

45% faster incident response · Legacy workloads containerised

Problem: Legacy monolithic workloads with no horizontal scalability, and limited visibility into system health — reactive incident response only.

Action: Migrated services to Amazon EKS using Helm charts. Built a full observability stack with Prometheus, Grafana, and EFK, with proactive alerting configured before any workloads went live.

Outcome: 45% reduction in incident response time. Teams moved from reactive to proactive operations.

Kubernetes / EKS · Helm · Prometheus · Grafana · EFK Stack


Infrastructure as Code Migration

40% faster environment setup · Eliminated config drift

Problem: Environments were provisioned manually, leading to configuration drift, slow onboarding, and unreproducible infrastructure states.

Action: Led the full transition to IaC using Terraform and Ansible across AWS and Azure. Standardised all provisioning and configuration workflows with version-controlled, peer-reviewed infrastructure changes.

Outcome: 40% faster environment setup. Drift eliminated. All infrastructure changes became auditable and repeatable.

Terraform · Ansible · Puppet · AWS · Azure


Mindset

How I Work in Production

Principles I follow when designing and operating production systems.

Infrastructure as Code before any manual operation. Every change is version-controlled, reviewed, and repeatable.

Automate repetitive operational tasks to reduce human error and free engineers to focus on higher-value work.

Observability and alerting go in before scaling. You can't operate what you can't see.

Prefer small, reversible deployments over large risky releases. Blue-green and canary strategies exist for a reason.

Document infrastructure decisions so that whole teams, not just the person who built a system, can operate it independently.

Keep production boring. If deployments are exciting, something in the process has already gone wrong.


Let's talk infrastructure.