Deploying Elixir: Fly.io vs AWS ECS Fargate

A practical comparison of deploying Elixir applications: Fly.io’s simplicity vs AWS ECS Fargate’s control. Real-world trade-offs, costs, and code examples to help you choose the right platform.

The Elixir Deployment Dilemma

Elixir applications are beautiful to write but can be tricky to deploy. The BEAM VM’s distributed nature, hot code reloading, and clustering capabilities make it different from typical stateless web apps. You need a deployment platform that understands these requirements - or you need to build that understanding yourself.

I’ve deployed production Elixir apps on both Fly.io and AWS ECS Fargate. Here’s what I learned.

Why This Comparison Matters

Fly.io has become the darling of the Elixir community. It’s built with Elixir in mind, offers global edge deployment, and promises clustering that “just works”.

AWS ECS Fargate is the enterprise choice - battle-tested, deeply integrated with AWS services, and offering fine-grained control over everything.

The question isn’t which is “better” - it’s which fits your needs, team, and constraints.

Fly.io: The Developer Experience Champion

What Makes Fly.io Special

Fly.io was designed for applications like Elixir that benefit from being close to users and running in multiple regions. Their infrastructure is built on Firecracker VMs, giving you near-bare-metal performance with container convenience.

Key Features:

  • Automatic WireGuard mesh networking between instances
  • Built-in support for distributed Elixir clustering
  • Global Anycast routing
  • Persistent volumes (each one pinned to a single region and host, so plan replication yourself)
  • Zero-config SSL certificates

Setting Up an Elixir App on Fly.io

Install the Fly CLI:

curl -L https://fly.io/install.sh | sh

Initialize your Elixir app:

fly launch

That’s it. Seriously. Fly detects your Elixir app, generates a fly.toml, creates a Dockerfile, and deploys.

Here’s what a production-ready fly.toml looks like:

app = "my-elixir-app"
primary_region = "iad"

[build]

[deploy]
  release_command = "/app/bin/migrate"

[env]
  PHX_HOST = "my-elixir-app.fly.dev"
  PORT = "8080"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = false
  auto_start_machines = false
  min_machines_running = 2

  [http_service.concurrency]
    type = "connections"
    hard_limit = 1000
    soft_limit = 800

[[vm]]
  memory = "1gb"
  cpu_kind = "shared"
  cpus = 1

[metrics]
  port = 9091
  path = "/metrics"
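
The release_command above assumes the release ships a bin/migrate script. A minimal sketch of the module such a script typically calls, modeled on what mix phx.gen.release generates. The MyApp.Release name and the :my_app application name are assumptions, and the module only runs inside a project that depends on Ecto:

```elixir
defmodule MyApp.Release do
  # Assumed OTP application name; replace with your own.
  @app :my_app

  # Runs all pending migrations for every configured repo.
  def migrate do
    load_app()

    for repo <- repos() do
      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
    end
  end

  # Repos are read from the app's :ecto_repos config key.
  defp repos, do: Application.fetch_env!(@app, :ecto_repos)

  # Loads the application (and its config) without starting it.
  defp load_app, do: Application.load(@app)
end
```

The bin/migrate script would then run something like bin/my_app eval "MyApp.Release.migrate" before the new version starts taking traffic.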

Clustering on Fly.io

Fly.io makes clustering trivial with libcluster. Add to your mix.exs:

{:libcluster, "~> 3.3"}

Configure in config/runtime.exs:

config :libcluster,
  topologies: [
    fly6pn: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        polling_interval: 5_000,
        query: "#{System.get_env("FLY_APP_NAME")}.internal",
        node_basename: System.get_env("FLY_APP_NAME")
      ]
    ]
  ]

Add to your application supervision tree:

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
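      # Start libcluster first so nodes can connect while the rest of the app boots.
      # The [] default keeps boot working when no topologies are configured (dev/test).
      {Cluster.Supervisor,
       [Application.get_env(:libcluster, :topologies, []), [name: MyApp.ClusterSupervisor]]},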
      MyApp.Repo,
      MyAppWeb.Endpoint
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Deploy to multiple regions:

fly scale count 2 --region iad
fly scale count 2 --region fra
fly scale count 2 --region syd

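Provided each release starts with a long node name of the form <node_basename>@<private IPv6 address> and runs distribution over IPv6 (usually set in rel/env.sh.eex through RELEASE_DISTRIBUTION, RELEASE_NODE, and ERL_AFLAGS="-proto_dist inet6_tcp"), your Elixir nodes discover and connect to each other automatically. Phoenix PubSub works across regions. It’s magical.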
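
To confirm that discovery is actually happening, it helps to log nodes as they join and leave. The following is a minimal sketch using only the standard library; MyApp.ClusterMonitor is a hypothetical helper name and is not part of libcluster:

```elixir
defmodule MyApp.ClusterMonitor do
  @moduledoc "Tracks which nodes are currently connected (illustrative helper)."
  use GenServer
  require Logger

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, Keyword.put_new(opts, :name, __MODULE__))
  end

  # Returns the sorted list of nodes this process currently considers connected.
  def nodes(server \\ __MODULE__), do: GenServer.call(server, :nodes)

  @impl true
  def init(:ok) do
    # Ask the runtime to send {:nodeup, node} / {:nodedown, node} to this process.
    :ok = :net_kernel.monitor_nodes(true)
    {:ok, MapSet.new(Node.list())}
  end

  @impl true
  def handle_call(:nodes, _from, nodes), do: {:reply, Enum.sort(nodes), nodes}

  @impl true
  def handle_info({:nodeup, node}, nodes) do
    Logger.info("node joined: #{node}")
    {:noreply, MapSet.put(nodes, node)}
  end

  def handle_info({:nodedown, node}, nodes) do
    Logger.warning("node left: #{node}")
    {:noreply, MapSet.delete(nodes, node)}
  end
end
```

Adding MyApp.ClusterMonitor to the supervision tree after Cluster.Supervisor is enough to get join and leave events in the logs. For a one-off check, Node.list() in a remote IEx session gives the same answer at a point in time.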

The Fly.io Reality Check

Pros:

  • Deployment in under 5 minutes from zero to production
  • Clustering “just works” without VPC peering or service discovery
  • Global deployment is trivial
  • Excellent for Phoenix LiveView (low latency matters)
  • Built-in metrics and logging
  • Low cost of entry for experimentation (free allowances have changed over time, so check current pricing)

Cons:

  • Less control over infrastructure
  • Limited AWS service integration (no VPC peering, no PrivateLink)
  • Smaller ecosystem than AWS
  • Pricing can get expensive at scale (memory is pricey)
  • No managed database options (you run Postgres yourself or use an external provider)
  • Support is community-first (great community, but not enterprise SLAs)

Cost Example:

  • 2x shared-cpu-1x, 1GB RAM: ~$15/month
  • 4x shared-cpu-1x, 1GB RAM across 2 regions: ~$30/month
  • 4x dedicated-cpu-2x, 4GB RAM: ~$280/month

AWS ECS Fargate: The Enterprise Choice

Why ECS Fargate?

Fargate is AWS’s serverless container platform: you define tasks and Fargate runs them. There are no EC2 instances to manage, scaling is configured through ECS service auto scaling, and the AWS integration runs deep.

Key Features:

  • Native integration with ALB, NLB, CloudMap, Secrets Manager, Parameter Store
  • VPC networking with security groups
  • IAM roles for tasks
  • CloudWatch integration
  • Spot pricing for cost savings
  • ECS Exec for debugging

Setting Up Elixir on ECS Fargate

This requires more work. You need:

  1. A container image
  2. Task definition
  3. Service definition
  4. Load balancer
  5. Service discovery (for clustering)

1. Dockerfile for Production

FROM hexpm/elixir:1.16.0-erlang-26.2.1-debian-bookworm-20231009-slim as builder

WORKDIR /app

RUN apt-get update && apt-get install -y build-essential git nodejs npm

ENV MIX_ENV=prod

COPY mix.exs mix.lock ./
RUN mix local.hex --force && mix local.rebar --force
RUN mix deps.get --only prod

# Copy config before compiling deps so they are built with the prod compile-time settings
COPY config config
RUN mix deps.compile

COPY lib lib
COPY assets assets
COPY priv priv

RUN mix assets.deploy
RUN mix compile
RUN mix release

FROM debian:bookworm-slim

RUN apt-get update && apt-get install -y openssl libncurses5 locales curl && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY --from=builder /app/_build/prod/rel/my_app ./

ENV LANG=C.UTF-8

CMD ["/app/bin/my_app", "start"]

Build and push to ECR:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
docker build -t my-app .
docker tag my-app:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/my-app:latest

2. ECS Task Definition (Terraform)

resource "aws_ecs_task_definition" "app" {
  family                   = "my-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "512"
  memory                   = "1024"
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([{
    name  = "app"
    image = "123456789.dkr.ecr.us-east-1.amazonaws.com/my-app:latest"
    
    portMappings = [{
      containerPort = 4000
      protocol      = "tcp"
    }]

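    environment = [
      { name = "PHX_HOST", value = "myapp.example.com" },
      { name = "PORT", value = "4000" },
      # RELEASE_NODE cannot be set here: ECS does not expand variables inside
      # environment values, and Terraform would try to interpolate the
      # placeholder itself. Compute the node name at container start instead,
      # for example in rel/env.sh.eex from `hostname -i`.
      { name = "RELEASE_DISTRIBUTION", value = "name" }
    ]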

    secrets = [
      { name = "DATABASE_URL", valueFrom = aws_secretsmanager_secret.db_url.arn },
      { name = "SECRET_KEY_BASE", valueFrom = aws_secretsmanager_secret.secret_key.arn },
      { name = "RELEASE_COOKIE", valueFrom = aws_secretsmanager_secret.erlang_cookie.arn }
    ]

    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = "/ecs/my-app"
        "awslogs-region"        = "us-east-1"
        "awslogs-stream-prefix" = "ecs"
      }
    }

    healthCheck = {
      command     = ["CMD-SHELL", "curl -f http://localhost:4000/health || exit 1"]
      interval    = 30
      timeout     = 5
      retries     = 3
      startPeriod = 60
    }
  }])
}
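
The task definition hands the application its settings as environment variables, both the plain environment entries and the secrets resolved from Secrets Manager. A sketch of how config/runtime.exs might read them follows. The :my_app, MyApp.Repo, and MyAppWeb.Endpoint names are assumptions, and RELEASE_COOKIE is not shown because the release start script reads it directly:

```elixir
# config/runtime.exs (sketch; application and module names are assumed)
import Config

if config_env() == :prod do
  # Injected by ECS from Secrets Manager via the task definition's `secrets` list.
  database_url = System.fetch_env!("DATABASE_URL")
  secret_key_base = System.fetch_env!("SECRET_KEY_BASE")

  # Plain values from the task definition's `environment` list.
  host = System.get_env("PHX_HOST", "example.com")
  port = String.to_integer(System.get_env("PORT", "4000"))

  config :my_app, MyApp.Repo, url: database_url

  config :my_app, MyAppWeb.Endpoint,
    url: [host: host, port: 443, scheme: "https"],
    http: [ip: {0, 0, 0, 0}, port: port],
    secret_key_base: secret_key_base
end
```

Using System.fetch_env!/1 for the secrets makes a task fail fast at boot if a secret was not wired up, which is usually easier to diagnose than a crash on the first database query.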

3. ECS Service with Service Discovery

resource "aws_service_discovery_private_dns_namespace" "app" {
  name = "myapp.local"
  vpc  = aws_vpc.main.id
}

resource "aws_service_discovery_service" "app" {
  name = "app"

  dns_config {
    namespace_id = aws_service_discovery_private_dns_namespace.app.id
    
    dns_records {
      ttl  = 10
      type = "A"
    }
  }

  health_check_custom_config {
    failure_threshold = 1
  }
}

resource "aws_ecs_service" "app" {
  name            = "my-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 4000
  }

  service_registries {
    registry_arn = aws_service_discovery_service.app.arn
  }

  depends_on = [aws_lb_listener.app]
}

4. Clustering on ECS Fargate

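This is where it gets complex. Nodes need a way to find each other, and the task security group must allow traffic between tasks on epmd (port 4369) and on the Erlang distribution port range. One option is libcluster with the ECS strategy from the libcluster_ecs package: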

config :libcluster,
  topologies: [
    ecs: [
      strategy: ClusterEcs.Strategy,
      config: [
        cluster: "my-cluster",
        service_name: "my-app",
        region: "us-east-1",
        app_prefix: "my_app",
        polling_interval: 10_000
      ]
    ]
  ]

Add {:libcluster_ecs, "~> 0.6"} to your deps.

Your ECS task role needs permissions:

resource "aws_iam_role_policy" "ecs_task_discovery" {
  role = aws_iam_role.ecs_task.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "ecs:ListTasks",
        "ecs:DescribeTasks",
        "ec2:DescribeNetworkInterfaces"
      ]
      Resource = "*"
    }]
  })
}

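Alternatively, use AWS Cloud Map (Service Discovery) with libcluster’s DNS polling strategy. It is simpler and needs no extra IAM permissions, but new tasks are only discovered after the DNS TTL and the polling interval have elapsed.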
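
A sketch of what the Cloud Map variant could look like, reusing the DNSPoll strategy from the Fly.io section. It assumes the service and namespace names from the Terraform above (app and myapp.local) and that every task starts with a node name of the form my_app@<private IP>. The topology key cloud_map is an arbitrary label:

```elixir
# config/runtime.exs (sketch): discover peers through the Cloud Map A records
import Config

config :libcluster,
  topologies: [
    cloud_map: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        polling_interval: 5_000,
        # <service name>.<namespace name> from the service discovery resources
        query: "app.myapp.local",
        # Must match the part before "@" in each node's RELEASE_NODE
        node_basename: "my_app"
      ]
    ]
  ]
```

With this approach the libcluster_ecs dependency and the ECS read permissions on the task role are not needed for discovery, since each node only resolves DNS inside the VPC.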

The ECS Fargate Reality Check

Pros:

  • Complete control over infrastructure
  • Deep AWS integration (RDS, ElastiCache, S3, etc.)
  • VPC networking and security groups
  • IAM-based security
  • CloudWatch metrics and logs
  • Fargate Spot pricing can reduce compute costs by up to 70%
  • Enterprise support available
  • Scales to massive workloads

Cons:

  • Complex setup (infrastructure as code is mandatory)
  • Clustering requires extra work
  • No built-in global deployment (you build multi-region yourself)
  • Slower iteration (build, push, deploy cycle)
  • More moving parts to maintain
  • Cold starts on scale-to-zero (though you typically don’t do this)

Cost Example:

  • 2x Fargate tasks (0.5 vCPU, 1GB): ~$36/month at us-east-1 on-demand rates
  • 4x Fargate tasks (0.5 vCPU, 1GB): ~$72/month at us-east-1 on-demand rates
  • 4x Fargate Spot tasks (1 vCPU, 2GB): ~$45/month
  • Plus ALB ($16/month), NAT Gateway ($32/month), data transfer
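
Fargate bills per vCPU-hour and per GB-hour, so estimates like these are easy to reproduce. A small sketch of the arithmetic follows. The hourly rates are assumptions based on us-east-1 Linux/x86 on-demand pricing and should be checked against the current price list, and FargateCost is a hypothetical helper name:

```elixir
defmodule FargateCost do
  # Assumed us-east-1 Linux/x86 on-demand rates in USD; verify before relying on them.
  @vcpu_hour 0.04048
  @gb_hour 0.004445
  # Average hours in a month (8,760 / 12).
  @hours_per_month 730

  @doc "Estimated monthly compute cost in USD for `count` always-on tasks."
  def monthly(vcpu, memory_gb, count, discount \\ 0.0) do
    hourly = vcpu * @vcpu_hour + memory_gb * @gb_hour
    Float.round(hourly * @hours_per_month * count * (1.0 - discount), 2)
  end
end

IO.inspect(FargateCost.monthly(0.5, 1, 2), label: "2 tasks, 0.5 vCPU / 1GB")
IO.inspect(FargateCost.monthly(1, 2, 4, 0.7), label: "4 Spot tasks, 1 vCPU / 2GB, 70% off")
```

The discount argument is a rough stand-in for Fargate Spot, whose actual price varies. The load balancer, NAT Gateway, and data transfer are billed on top of whatever this returns.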

The Real-World Decision Matrix

Choose Fly.io When:

  • You’re a small team or solo developer
  • You want to ship fast and iterate quickly
  • Global edge deployment is important (Phoenix LiveView apps)
  • You don’t need deep AWS service integration
  • Your app is relatively stateless or uses external databases
  • You value simplicity over control
  • Budget is under $500/month

Perfect For:

  • SaaS MVPs
  • Phoenix LiveView applications
  • Side projects that might scale
  • Apps serving global users
  • Teams without DevOps expertise

Choose ECS Fargate When:

  • You’re already on AWS
  • You need VPC integration with RDS, ElastiCache, etc.
  • Compliance requires specific networking/security controls
  • You have DevOps resources
  • You need enterprise support and SLAs
  • Cost optimization matters at scale (Fargate Spot)
  • You’re building multi-tenant systems with isolation requirements

Perfect For:

  • Enterprise applications
  • Apps with complex AWS integrations
  • Regulated industries (healthcare, finance)
  • High-traffic applications (>1M requests/day)
  • Teams with existing AWS expertise

Hybrid Approach: Best of Both Worlds?

Some teams use both:

  • Fly.io for staging/preview environments: Fast iteration, cheap, disposable
  • ECS Fargate for production: Control, security, AWS integration

Or:

  • Fly.io for the Phoenix app: Global edge, low latency
  • AWS for data layer: RDS and S3 via public endpoints (ElastiCache is reachable only from inside a VPC, so it needs a VPN or similar tunnel)

This works but adds operational complexity. Only do this if you have clear reasons.

Migration Path

Started on Fly.io and need to move to AWS? The path is straightforward:

  1. Containerize properly (you already have a Dockerfile)
  2. Set up ECR and push your image
  3. Create ECS task definition
  4. Configure service discovery for clustering
  5. Set up ALB and target groups
  6. Migrate environment variables to Secrets Manager
  7. Update DNS
  8. Test clustering thoroughly

Budget 2-4 weeks for a proper migration with testing.

My Recommendation

Start with Fly.io unless you have specific reasons not to. The developer experience is unmatched, and you can always migrate later if needed.

Move to ECS Fargate when:

  • You’re spending >$500/month on Fly.io
  • You need AWS service integration
  • Compliance requires it
  • You have DevOps resources

The best platform is the one that lets you focus on building your product, not managing infrastructure.

Conclusion

Fly.io and ECS Fargate represent different philosophies: simplicity vs control. Both can run production Elixir apps successfully.

Fly.io wins on developer experience and time-to-production. ECS Fargate wins on control, integration, and cost at scale.

Choose based on your team, requirements, and constraints - not hype or trends.

And remember: the BEAM is portable. Your Elixir app will run beautifully on either platform. That’s the real win.