Open to remote · Enterprise & Scale-up roles

Gaurav Kaushal

DevOps / Platform Engineer / AWS · Kubernetes · CI/CD

I design and operate reliable infrastructure — from CI/CD pipelines to production environments — helping teams ship faster without breaking systems.

Currently working on large-scale on-prem infrastructure and CI/CD automation supporting enterprise production environments at Optum / UHG.

Jaipur, India · Remote-friendly · AWS Certified · 8+ yrs experience

Experience

Senior DevOps Engineer · Artech → Optum / UHG · May 2025 – Present
DevOps Engineer · Nlineaxis IT Solutions · Dec 2018 – Mar 2025
Software Developer · Locus R.A.G.S. · Jan 2018 – Dec 2018

Engineering reliability,
not just infrastructure.

I'm a DevOps / Platform Engineer with 8+ years in IT, focusing on automation, CI/CD, and infrastructure reliability. My background started in traditional operations, which shaped how I approach production systems today, with a bias toward stability over speed.

I believe infrastructure should be predictable and boring. The goal of good DevOps is not exciting deployments — it's making sure nothing breaks when they happen. If your on-call is busy, the platform hasn't done its job yet.

Currently working with large on-prem and hybrid cloud environments, supporting CI/CD pipelines and infrastructure automation in regulated production systems at Optum / UHG.

Outside of work I enjoy exploring new infrastructure tooling, writing about DevOps on gauravkaushal.tech, and experimenting with automation that removes toil.

Current Role: Senior DevOps Engineer
Current Client: Optum / UHG (via Artech)
Location: Jaipur, Rajasthan · India
Availability: Open to remote roles globally
Certifications:
AWS Certified Solutions Architect – Associate
AWS Certified Cloud Practitioner (CLF-C02)
HashiCorp Terraform Associate (in progress)
Certified Kubernetes Administrator (CKA) (in progress)

Stack & Toolchain

AWS is my primary cloud. Everything listed reflects hands-on production experience.

Infrastructure: Linux (Ubuntu, RHEL) · Terraform · Ansible · Puppet · AWS CloudFormation
Containers: Docker · Kubernetes / EKS · Helm · AKS
CI/CD: Jenkins · GitHub Actions · GitLab CI · ArgoCD · Argo Rollouts
Monitoring: Prometheus · Grafana · ELK / EFK Stack · CloudWatch
Cloud: AWS (primary) · Azure · EC2 · EKS · ECS · S3 · Lambda · Route 53
Security & Scripting: Vault · IAM / least-privilege · SonarQube · Nexus IQ · Python · Bash

Selected Infrastructure Work

Representative work from production environments. Outcomes are real — not estimates.

01
CI/CD Pipeline Standardisation
40%+ reduction in production defects
Problem: Release pipelines were inconsistent across teams, with no enforced quality gates and frequent production defects from unvalidated deployments.

Action: Designed an organisation-wide GitHub Actions and ArgoCD governance framework with gated DEV → QA → UAT → PROD promotion flows, SonarQube SAST, and Nexus IQ dependency scanning built into every pipeline.

Outcome: Production defects reduced by 40%+. Release process became auditable and consistent across all regulated environments.
GitHub Actions · ArgoCD · SonarQube · Nexus IQ · Argo Rollouts
02
Hybrid Cloud Infrastructure & Cost Governance
20% reduction in monthly cloud spend · 99.9% uptime
Problem: No unified networking between 100+ on-premises servers and cloud resources, with growing cloud spend and no visibility into waste.

Action: Built a secure hybrid environment using Terraform and AWS Transit Gateways. Implemented automated S3 lifecycle policies, EC2 rightsizing, and Lambda-driven cost automation. Added cross-region snapshot management for DR.

Outcome: 20% reduction in monthly cloud spend. Unified hybrid environment with 99.9% uptime and automated disaster recovery.
Terraform · AWS Transit Gateway · Python / Lambda · Ansible
03
Kubernetes Platform & Observability Stack
45% faster incident response · Legacy workloads containerised
Problem: Legacy monolithic workloads with no horizontal scalability, and limited visibility into system health — reactive incident response only.

Action: Migrated services to AWS EKS using Helm charts. Built a full observability stack with Prometheus, Grafana, and EFK — with proactive alerting configured before any workloads went live.

Outcome: 45% reduction in incident response time. Teams moved from reactive to proactive operations.
Kubernetes / EKS · Helm · Prometheus · Grafana · EFK Stack
04
Infrastructure as Code Migration
40% faster environment setup · Eliminated config drift
Problem: Environments were provisioned manually, leading to configuration drift, slow onboarding, and unreproducible infrastructure states.

Action: Led the full transition to IaC using Terraform and Ansible across AWS and Azure. Standardised all provisioning and configuration workflows around version-controlled, peer-reviewed infrastructure changes.

Outcome: 40% faster environment setup. Drift eliminated. All infrastructure changes became auditable and repeatable.
Terraform · Ansible · Puppet · AWS · Azure
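
To make the gated DEV → QA → UAT → PROD promotion flow from the first case study concrete, here is a minimal Python sketch of the underlying rule: a build may only move to a stage once every earlier stage has passed its checks. This is an illustration under stated assumptions, not the framework described above. Only the stage order comes from the case study; `GateResult`, `can_promote`, and the two boolean checks are hypothetical names standing in for whatever SAST and dependency-scan results a real pipeline would report.

```python
from dataclasses import dataclass

# Stage order taken from the case study; everything else here is hypothetical.
STAGES = ["DEV", "QA", "UAT", "PROD"]


@dataclass(frozen=True)
class GateResult:
    """Checks recorded for one build in one stage (assumed shape)."""
    stage: str
    sast_passed: bool          # stands in for a static-analysis quality gate
    dependencies_clean: bool   # stands in for a dependency-scan policy check


def can_promote(target: str, history: list[GateResult]) -> bool:
    """Allow promotion to `target` only if every earlier stage passed both checks.

    A stage counts as passed if any recorded run for it passed both checks.
    """
    if target not in STAGES:
        raise ValueError(f"unknown stage: {target}")
    passed = {r.stage for r in history if r.sast_passed and r.dependencies_clean}
    required = STAGES[:STAGES.index(target)]
    return all(stage in passed for stage in required)


history = [GateResult("DEV", True, True), GateResult("QA", True, False)]
print(can_promote("QA", history))   # True: DEV passed both checks
print(can_promote("UAT", history))  # False: QA failed the dependency check
```

Keeping the rule as a pure function makes it easy to unit-test separately from the CI system that would call it.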

How I Work in Production

Principles I follow when designing and operating production systems.

01
Infrastructure as Code before any manual operation. Every change is version-controlled, reviewed, and repeatable.
02
Automate repetitive operational tasks to reduce human error and free engineers to focus on higher-value work.
03
Observability and alerting go in before scaling. You can't operate what you can't see.
04
Prefer small, reversible deployments over large risky releases. Blue-green and canary strategies exist for a reason.
05
Document infrastructure decisions so teams can operate systems independently — not just the person who built them.
06
Keep production boring. If deployments are exciting, something in the process has already gone wrong.
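
As an illustration of principle 04, the following Python sketch shows the decision at the heart of a canary rollout: shift traffic in small steps and roll back as soon as the error rate crosses a threshold. It is a simplified model, not the configuration of any specific tool. The step weights, the 1% threshold, and the `next_weight` function are all assumptions chosen for the example.

```python
# Hypothetical traffic steps: percent of requests sent to the new version.
CANARY_STEPS = [5, 25, 50, 100]


def next_weight(current: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Return the next canary traffic weight, or 0 to roll back.

    `error_rate` is the fraction of failed requests observed at the current step.
    """
    if current not in CANARY_STEPS:
        raise ValueError(f"unexpected weight: {current}")
    if error_rate > max_error_rate:
        return 0  # roll back: send all traffic to the stable version
    idx = CANARY_STEPS.index(current)
    # Stay at 100 once fully promoted; otherwise advance one step.
    return CANARY_STEPS[min(idx + 1, len(CANARY_STEPS) - 1)]


print(next_weight(5, 0.001))   # 25: healthy, advance one step
print(next_weight(25, 0.05))   # 0: error rate above threshold, roll back
```

Because each step is small and the rollback path is a single decision, a bad release affects only a fraction of traffic before it is reverted.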

Let's talk infrastructure.

Open to senior DevOps and platform engineering roles — remote, enterprise, or scale-up. If you're building infrastructure or scaling platforms, I'm happy to discuss how I can contribute.

Typically respond within 24 hours.