Francisco Herrera

Senior DevOps Engineer | Cloud Architect | Site Reliability Engineer (SRE)

San Salvador, El Salvador (Remote-friendly) | LinkedIn

Professional Summary

Senior DevOps Engineer, Cloud Architect, and Site Reliability Engineer (SRE) with over 8 years of experience designing, building, and optimizing cloud-native infrastructure for enterprise and startup environments. Technical leader who connects engineering execution to business objectives. Expert in the AWS, GCP, and Kubernetes ecosystems, with advanced proficiency in Infrastructure as Code (IaC) and AI systems orchestration. Track record of leading large-scale cloud migrations, driving cost-optimization initiatives, and establishing GitOps workflows. Experienced in operating high-performance inference engines, enforcing strict security postures, and delivering highly available, fault-tolerant systems while mentoring cross-functional engineering teams.

Technical Skills Dashboard

Cloud & Architecture

AWS (EC2, ASG, Lambda, Fargate) | AWS S3 & RDS | AWS VPC & IAM | GCP Ecosystem | Azure Services | Multi-Cloud Strategy

Containers & Orchestration

Kubernetes (EKS, AKS, GKE) | Docker | Red Hat OpenShift | AWS ECS & ECR | Container Security & Hardening

CI/CD & GitOps

ArgoCD (GitOps core) | Jenkins Pipelines | GitHub Actions | Apache Airflow | Immutable Tagging (Git SHAs)

Infrastructure as Code

Terraform | Terraform Cloud | Pulumi | Packer | AWS CloudFormation

Observability & Monitoring

Grafana | Prometheus | Alertmanager | Datadog | CloudWatch | Loki & ELK Stack | DataHub

AI Infrastructure & Data

AI Agent Orchestration | Model Deployment | LightLLM Inference | AWS Bedrock Integration | Snowflake DW | AWS Glue ETL | LLMOps

Automation & Scripting

Python Development | Bash Scripting | Go (Golang)

Professional Experience

Senior DevOps & Infrastructure Architect

Analytics Services – delivering cutting‑edge data analytics solutions

Jan 2026 – Present
  • Architect and manage the core cloud-native infrastructure for the FireFoundry AI platform, authoring modular Helm charts (firefoundry-core, firefoundry-control-plane) automatically published to a private Helm repository via GitHub Actions pipelines.
  • Engineered declarative GitOps delivery pipelines using ArgoCD, establishing self-healing, automated pruning, and sync waves for deploying core microservices (broker, context-service, secure code sandbox) to Kubernetes (AKS) with zero configuration drift.
  • Designed and deployed a multi-cloud OpenTelemetry (OTel) Collector gateway pattern, routing OTLP traces, metrics, and logs from workloads to Azure Monitor and Google Cloud (Cloud Monitoring/Logging) simultaneously.
  • Scaled hosting and deployment infrastructure for the Model Context Protocol (MCP) Gateway service, enabling autonomous AI agents (leveraging Anthropic specs and Azure OpenAI models) to securely access vector search, RAG, and document processing tools.
  • Ran a FinOps audit of Datadog usage across an inventory of 737 hosts, identifying and implementing changes that delivered significant annual cost savings.
  • Hardened Kubernetes security posture by provisioning custom certificate authorities (CAs) via cert-manager and mkcert, setting up secure ACR image pull secrets, and integrating automated registry vulnerability scanning.
Architecture Diagram (Mermaid source):
graph TD
  subgraph AKS ["Kubernetes AKS Cluster (FireFoundry)"]
    Workload["App Microservices (Broker, Sandbox, Context)"]
    McpGateway["MCP Gateway Service (Node/TS)"]
    OtelCol["OpenTelemetry Collector Gateway"]
    Workload -->|OTLP/HTTP traces, metrics, logs| OtelCol
    Workload -->|SSE / REST| McpGateway
  end
  subgraph CoreServices ["AI Core Services"]
    EntitySvc["Entity Service (Graph DB)"]
    ContextSvc["Context Service (gRPC / RAG)"]
    CodeSandbox["Code Sandbox (Secure Runtime)"]
    McpGateway -->|gRPC / Vector Search| ContextSvc
    McpGateway -->|Graph API| EntitySvc
    McpGateway -->|REST API| CodeSandbox
  end
  subgraph CloudObservability ["Multi-Cloud Telemetry"]
    AzureMon["Azure Monitor Application Insights"]
    GcpLogging["Google Cloud Monitoring & Trace"]
    OtelCol -->|AzureMonitor Exporter| AzureMon
    OtelCol -->|GoogleCloud Exporter| GcpLogging
  end
  subgraph GitOps ["GitOps & CI/CD Control Plane"]
    ArgoCD["ArgoCD Controller"]
    GitRepo["GitHub Repo (ff_infra)"]
    ACR["Azure Container Registry (ACR)"]
    ArgoCD -->|Pulls Charts & Configs| GitRepo
    ArgoCD -->|Syncs Deployments| AKS
    AKS -->|Pulls Secure Images| ACR
  end
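
The sync waves mentioned in the ArgoCD bullet above can be illustrated with a minimal Python sketch. Argo CD reads the `argocd.argoproj.io/sync-wave` annotation (default 0) and applies lower waves first. The manifest names and wave numbers below are assumptions chosen for illustration, not the actual FireFoundry configuration, and the sketch ignores Argo CD's additional ordering by sync phase and resource kind.

```python
from collections import defaultdict

# Annotation key that Argo CD uses to assign a resource to a sync wave.
SYNC_WAVE = "argocd.argoproj.io/sync-wave"


def wave_of(manifest: dict) -> int:
    """Return the manifest's sync wave, defaulting to 0 when unannotated."""
    annotations = manifest.get("metadata", {}).get("annotations") or {}
    return int(annotations.get(SYNC_WAVE, "0"))


def group_by_wave(manifests: list[dict]) -> list[tuple[int, list[str]]]:
    """Group resource names by wave, lowest wave first."""
    waves = defaultdict(list)
    for manifest in manifests:
        waves[wave_of(manifest)].append(manifest["metadata"]["name"])
    return [(wave, sorted(names)) for wave, names in sorted(waves.items())]


# Hypothetical resources: prerequisites in wave -1, services in later waves.
manifests = [
    {"metadata": {"name": "broker", "annotations": {SYNC_WAVE: "1"}}},
    {"metadata": {"name": "context-service", "annotations": {SYNC_WAVE: "1"}}},
    {"metadata": {"name": "namespace-and-secrets", "annotations": {SYNC_WAVE: "-1"}}},
    {"metadata": {"name": "code-sandbox"}},  # no annotation, so wave 0
]

ORDER = group_by_wave(manifests)
for wave, names in ORDER:
    print(wave, names)
```

Placing prerequisites in a negative wave is one common way to make sure namespaces and secrets exist before the services that depend on them are synced.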
                  

Senior DevOps Engineer

Advertising Platform – enabling advanced programmatic advertising solutions

2025 – 2026
  • Architected and managed multi-tenant cloud infrastructure for high-scale advertising platforms on AWS ECS containers and Kubernetes clusters, targeting high throughput and low latency.
  • Implemented large-scale data warehousing and analytics solutions on Snowflake, supporting real-time advertising metrics, BI reporting, and predictive analytics.
  • Integrated AWS Bedrock to power AI/ML optimization engines, improving the accuracy and delivery speed of targeted recommendation systems.
  • Managed infrastructure state with Terraform Cloud, ensuring consistent, repeatable, and secure deployments across separate staging and production environments.
  • Engineered CI/CD pipelines in Jenkins to automate testing, deployment, and rollback for microservices written in PHP, Python, and Java.
  • Orchestrated ETL data workflows with Apache Airflow and AWS Glue jobs, automating transformations for large-scale platform data.
Architecture Diagram (Mermaid source):
graph TD
  User["Ad Client User"] -->|Sends Request| ALB["AWS Application Load Balancer"]
  subgraph ECSCluster ["AWS ECS Fargate Cluster"]
    PHPEngine["PHP Ad Web Engine"]
    JavaEngine["Java Payment Service"]
    PyRecs["Python Recommendation API"]
    ALB -->|Routes Web Traffic| PHPEngine
    ALB -->|Routes Payment API| JavaEngine
    PHPEngine -->|Queries Scores| PyRecs
  end
  subgraph AI ["AI Personalization"]
    Bedrock["AWS Bedrock API"]
    PyRecs -->|Live Request Context| Bedrock
  end
  subgraph DataLake ["Analytics Data Lake"]
    S3Raw["AWS S3 (Raw JSON Event Logs)"]
    Glue["AWS Glue ETL Job"]
    Snowflake["Snowflake Data Warehouse"]
    Airflow["Apache Airflow"]
    PHPEngine -->|Streams Raw Events| S3Raw
    Airflow -->|Triggers Execution| Glue
    S3Raw -->|Extracts Raw Events| Glue
    Glue -->|Populates Tables| Snowflake
  end
                  

Senior DevOps Engineer

Providing strategic advisory services for enterprise infrastructure projects

2020 – 2024
  • Led infrastructure architecture end-to-end for multiple enterprise clients, driving projects from initial requirements gathering and capacity planning through to secure production delivery.
  • Directed large-scale on-premise to AWS cloud migrations, achieving a 40% reduction in overall operating costs post-migration through cost optimization and automation.
  • Architected high-throughput, event-driven data pipelines and real-time integrations using AWS Kinesis, Lambda functions, and Kinesis Data Firehose.
  • Developed a centralized library of reusable Terraform modules, standardizing infrastructure provisioning across the organization and reducing deployment times and manual configuration errors.
  • Deployed end-to-end observability stacks combining Grafana, Prometheus, DataHub, and Great Expectations to monitor data integrity, pipeline health, and system reliability.
  • Provided technical mentorship to engineering teams and established stringent infrastructure, security, and GitOps standards successfully adopted across 15+ concurrent projects.
Architecture Diagram (Mermaid source):
graph TD
  Client["Client On-Premises Servers"] -->|AWS Direct Connect| Kinesis["AWS Kinesis Data Streams"]
  subgraph ServerlessProcessing ["Serverless Stream Processing"]
    Lambda["AWS Lambda (Parser & Enricher)"]
    Firehose["AWS Kinesis Data Firehose"]
    Kinesis -->|Triggers Execution| Lambda
    Lambda -->|Forwards Clean Events| Firehose
  end
  subgraph TargetStorage ["Staging & Data Lakes"]
    RDS["AWS RDS PostgreSQL"]
    S3Lake["AWS S3 Data Lake (Parquet)"]
    Lambda -->|Writes Metadata| RDS
    Firehose -->|Batches & Compresses| S3Lake
  end
  subgraph Observability ["Telemetry & Data Quality"]
    Prometheus["Prometheus Server"]
    Grafana["Grafana Dashboards"]
    GE["Great Expectations (Schema Validation)"]
    Lambda -->|Exposes Metrics| Prometheus
    Prometheus --> Grafana
    S3Lake -->|Triggers Quality Check| GE
  end
                  

DevOps Engineer & SRE

Cloud Services Provider

2018 – 2020
  • Orchestrated the complex migration of 50+ critical business services from legacy on-premise infrastructure in Ireland to AWS Europe, achieving zero downtime for end-users.
  • Eliminated manual overhead by automating infrastructure provisioning and recurring administrative tasks using Terraform, Packer, and custom Bash, Python, and Go scripts.
  • Designed and rolled out extensive proactive monitoring, robust identity security frameworks, and FinOps cost-optimization strategies across diverse multi-service environments.
  • Administered highly available Linux and Windows server fleets. Refined backup strategies, RPO/RTO metrics, and comprehensive disaster recovery protocols to ensure business continuity.
Architecture Diagram (Mermaid source):
graph TD
  User["End User Clients"] -->|DNS Request| Route53["AWS Route 53 DNS Resolver"]
  subgraph IrelandDC ["Legacy Ireland On-Premises Datacenter"]
    OldApp["Legacy Monolith App Servers"]
    OldDB["Primary SQL Server (Source)"]
    OldApp --> OldDB
  end
  subgraph AWSEurope ["AWS Europe Dublin Cloud"]
    NewApp["Auto Scaling EC2 Fleet (Target)"]
    NewDB["AWS RDS SQL Server (Replica)"]
    NewApp --> NewDB
  end
  OldDB -->|Continuous DB Mirroring & Replication| NewDB
  Route53 -->|Canary Weighted: 90%| OldApp
  Route53 -->|Canary Weighted: 10%| NewApp
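
The weighted canary split in the diagram above can be sketched as follows. Route 53 weighted routing sends each record a share of traffic equal to its weight divided by the sum of all weights (weights are integers from 0 to 255). The record names `legacy` and `aws` and the step percentages are assumptions for illustration; the diagram only documents the 90/10 stage.

```python
from fractions import Fraction


def traffic_shares(weights: dict[str, int]) -> dict[str, Fraction]:
    """Return each record's share of traffic: weight / sum of all weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one record needs a positive weight")
    return {name: Fraction(weight, total) for name, weight in weights.items()}


def cutover_plan(new_target_percentages: list[int]) -> list[dict[str, int]]:
    """Turn a list of canary percentages into per-step weight pairs."""
    plan = []
    for pct in new_target_percentages:
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage out of range: {pct}")
        plan.append({"legacy": 100 - pct, "aws": pct})
    return plan


# The 90/10 stage shown in the diagram.
print(traffic_shares({"legacy": 90, "aws": 10}))

# A hypothetical ramp: 10%, then 50%, then full cutover.
for step in cutover_plan([10, 50, 100]):
    print(step)
```

Keeping the two weights summing to 100 makes each weight read directly as a percentage, which keeps a staged cutover easy to audit.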
                  

IT Support Specialist

IT Solutions Provider

2015 – 2018
  • Delivered technical support, scheduled maintenance, and advanced troubleshooting for Windows Server environments and broader enterprise IT infrastructure across multiple organizations.
  • Implemented, audited, and monitored automated backup strategies, maintaining 99.9% availability for key systems and supporting business continuity through rapid incident triage and resolution.
Architecture Diagram (Mermaid source):
graph TD
  subgraph ProductionOffice ["Primary Office Site"]
    DC1["Active Directory Domain Controller"]
    FS1["File & Data Server"]
    Syslog["Syslog Server / Alert Portals"]
    DC1 --> Syslog
    FS1 --> Syslog
  end
  subgraph BackupCenter ["Veeam Backup Infrastructure"]
    Veeam["Veeam Backup & Replication Suite"]
    NasLocal["Local NAS Node (Daily Backups)"]
    FS1 -->|VSS Snapshot Sync| Veeam
    DC1 -->|System State Backup| Veeam
    Veeam -->|Writes Daily Archives| NasLocal
  end
  subgraph OffsiteDR ["Offsite Recovery Location"]
    NasOffsite["Offsite NAS Storage (Weekly Replica)"]
    Glacier["Cloud Archival Storage (Cold Backup)"]
    NasLocal -->|WAN Replication Sync| NasOffsite
    NasLocal -->|Glacier Copy Job| Glacier
  end
                  

Key Professional Projects

Enterprise Observability & FinOps Optimization (2026)

Conducted an infrastructure audit with Datadog, mapping an inventory of 737 hosts and implementing architectural changes that substantially reduced annual cloud spend.

Multi‑Tenant Recruiting & Financial Platform (2025–2026)

Architected and deployed a scalable SaaS application on Microsoft Azure, orchestrating microservices via Azure Kubernetes Service (AKS) with strict multi‑client data isolation. Integrated GPT 5.4 mini for intelligent automation (resume parsing, candidate matching, financial data analysis). Built an observability stack with Prometheus and Grafana, with alerts configured for SLA compliance.

AI Infrastructure & High‑Performance Inference (2025–2026)

Integrated AWS Bedrock to support AI/ML optimization models and deployed LightLLM as the primary high‑performance inference engine for internal machine‑learning workloads, ensuring low latency and high throughput.

Healthcare Web Architecture Deployment (2025)

Developed and deployed an end‑to‑end infrastructure solution for a Texas‑based diagnostics provider, using Pulumi and Terraform to programmatically define cloud resources.

Multi‑Cloud Kubernetes Platform & GitOps Delivery (2024–2025)

Architected resilient microservices environments spanning AWS EKS and Azure AKS, utilizing ArgoCD for GitOps‑based continuous delivery and enforcing immutable image tags via Git SHAs.
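
A minimal Python sketch of what immutable tagging by Git SHA can look like in a pipeline step. The helper names, the 12-character tag length, the list of floating tags, and the `registry.example.com` host are all assumptions for illustration rather than the project's actual tooling.

```python
import re

# Hypothetical policy: tags must be hex commit SHAs, never floating names.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")
SHA_TAG = re.compile(r"[0-9a-f]{7,40}")
MUTABLE_TAGS = {"latest", "stable", "main", "master"}


def image_ref(registry: str, repository: str, git_sha: str, length: int = 12) -> str:
    """Build an image reference tagged with a shortened Git commit SHA."""
    sha = git_sha.strip().lower()
    if not FULL_SHA.match(sha):
        raise ValueError(f"expected a 40-character hex commit SHA, got {git_sha!r}")
    return f"{registry}/{repository}:{sha[:length]}"


def is_immutable_tag(ref: str) -> bool:
    """Return True when the tag looks like a commit SHA, not a floating tag."""
    _, _, tag = ref.rpartition(":")
    if tag in MUTABLE_TAGS:
        return False
    return SHA_TAG.fullmatch(tag) is not None


ref = image_ref("registry.example.com", "broker", "a" * 40)
print(ref, is_immutable_tag(ref))
print(is_immutable_tag("registry.example.com/broker:latest"))
```

Because each commit yields a distinct tag, a GitOps controller such as ArgoCD can roll back by pointing the manifest at an earlier SHA instead of relying on a tag that may have been overwritten.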

GCP Data & Event‑Driven Ecosystem (2023–2024)

Designed a highly available data architecture on GCP using GKE for containerized microservices, BigQuery for large‑scale analytics, and Cloud Pub/Sub for real‑time processing pipelines.

Azure to AWS Infrastructure Migration (2022–2023)

Rebuilt and modernized legacy Azure infrastructure on AWS, improving CI/CD workflows and security. Migrated pipelines from Jenkins to GitHub Actions, preserving deployment history and reducing build times.

Serverless & Containers Integration (2020–2022)

Delivered scalable AWS Amplify and ECS Fargate solutions powered by AWS Lambda, with automated CI/CD via Bitbucket Pipelines, secure authentication, and end‑to‑end observability.

On‑Premise to AWS Migration (2018–2020)

Migrated 50+ core services from on‑premise environments to AWS using Terraform, Packer, and EKS. Established automated EC2 deployments and CI/CD pipelines, improving reliability and reducing operational overhead.