Enable TLS for my upstream gRPC cluster in Envoy


I have a grpc-web frontend application that communicates with my gRPC backend. As browsers do not "speak" gRPC, I need an Envoy proxy to translate HTTP requests into actual gRPC and back. Locally, my setup works well, and it looks like:

Browser grpc-web -----> Envoy proxy -------> gRPC backend

I can even dockerize every component and launch them locally and they run fine.

I deployed all three containerized components to Google Cloud Run. Cloud Run handles SSL/TLS by default, wrapping it around the provided container, so I can call both the frontend and the Envoy proxy over https. I can also make gRPC calls to the backend service from Postman, or from a gRPC client I coded myself, as long as I use SSL.

What I cannot do is enable the Envoy proxy to use TLS when initiating connections to the gRPC backend.

The Cloud Run host is my-grpc-server.a.run.app, given by Cloud Run.

Proof that the backend is up and running correctly from Postman: the lock icon at the left of the host signals the use of SSL/TLS, and the protocol is grpc (screenshot omitted).

And using this Golang gRPC code I can call the service correctly too:

    package main

    import (
        "crypto/tls"
        "crypto/x509"
        "fmt"
        "log"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials"
    )

    func main() {
        host := "my-grpc-server.a.run.app"
        port := "443"
        address := fmt.Sprintf("%s:%s", host, port)

        // Trust the system root CAs for the server certificate.
        systemRoots, err := x509.SystemCertPool()
        if err != nil {
            log.Fatalf("Failed to read system root CA certificates: %v", err)
        }
        cred := credentials.NewTLS(&tls.Config{
            RootCAs: systemRoots,
        })

        opts := []grpc.DialOption{
            grpc.WithAuthority(host), // sets the :authority pseudo-header
            grpc.WithTransportCredentials(cred),
        }

        conn, err := grpc.Dial(address, opts...)
        if err != nil {
            log.Fatalf("Failed to connect to %s: %v", address, err)
        }
        defer conn.Close()

        // Build the generated client from conn here, then call client.Check:
        // the health check executes correctly.
    }

I have tried setting up SSL/TLS in my Envoy proxy without success. The non-SSL/TLS configuration is the Envoy gRPC vanilla configuration from the docs with additional CORS configuration, and it looks like:

admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 127.0.0.1, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              typed_per_filter_config:
                envoy.filters.http.cors:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.CorsPolicy
                  allow_origin_string_match:
                  - safe_regex:
                      regex: \*
                  allow_methods: "GET,POST,PUT,PATCH,DELETE,OPTIONS"
                  allow_headers: "DNT,User-Agent,X-User-Agent,X-Grpc-Web,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,Access-Control-Allow-Origin"
                  allow_credentials: true
                  expose_headers: grpc-status,grpc-message
                  max_age: "1728000"
              routes:
              - match: { prefix: "/" }
                route: { cluster: my_grpc_service }
          http_filters:
          - name: envoy.filters.http.cors
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
          - name: envoy.grpc_web
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: my_grpc_service
    connect_timeout: 3.0s
    type: STATIC
    http2_protocol_options: {}
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: my_grpc_service
      endpoints:
        - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: 127.0.0.1 # Localhost when using it locally
                  port_value: 50051 # The port needed locally
    health_checks:
      timeout: 1s
      interval: 10s
      unhealthy_threshold: 2
      healthy_threshold: 2
      grpc_health_check: {}

I have tried adding the following transport_socket block at the same indentation level as load_assignment, and pointing the endpoint to the right host and port:

  # ...
                socket_address:
                  address: my-grpc-server.a.run.app # The Cloud Run host
                  port_value: 443 # The default SSL port
  # ...
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext

But now Envoy seems to stop speaking gRPC and instead forwards the HTTP/2 requests from the frontend as they are received: in the browser, the Grpc-Message response header carries a Google-crafted error message from Cloud Run saying (among other things):

That's an error. The requested URL <code>/some.path.Health/Check</code> was not found on this server. That's all we know.

I have also tried adding my trusted CA file, but the error is the same as the previous one:

    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context: 
          validation_context:
            match_subject_alt_names:
            - exact: "my-grpc-server.a.run.app"
            trusted_ca:
              filename: /path/to/cert.pem

Using sni did not solve the issue either.
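
For reference, here is a minimal sketch of what a TLS-originating cluster with SNI could look like. It is not the exact configuration from the question. The LOGICAL_DNS type is an assumption (the question does not say whether type was changed from STATIC, but the endpoint is now a hostname rather than an IP address), and sni is simply set to the same Cloud Run host. As described above, this alone did not resolve the error.

```yaml
clusters:
- name: my_grpc_service
  connect_timeout: 3.0s
  # Assumption: a DNS-based cluster type, since the endpoint is a hostname.
  type: LOGICAL_DNS
  http2_protocol_options: {}
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: my_grpc_service
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: my-grpc-server.a.run.app
              port_value: 443
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      # Server name sent in the TLS handshake to the upstream.
      sni: my-grpc-server.a.run.app
```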

Using FileAccessLog I can see the gRPC Status is:

12, UNIMPLEMENTED
# When successfully running locally without TLS, the status is:
2, UNKNOWN
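
A gRPC status like the ones above can be surfaced with an access log entry along these lines. This is only a sketch: the output path and the format string are assumptions, not the ones used in the question. The block would sit under the HttpConnectionManager typed_config, next to stat_prefix.

```yaml
access_log:
- name: envoy.access_loggers.file
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
    # Assumed destination; any writable path works.
    path: /dev/stdout
    log_format:
      text_format_source:
        # Logs method, path, HTTP response code, gRPC status and the chosen upstream host.
        inline_string: "%REQ(:METHOD)% %REQ(:PATH)% http=%RESPONSE_CODE% grpc=%GRPC_STATUS% upstream=%UPSTREAM_HOST%\n"
```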

Other resources on the web are confusing because they seem to set up SSL for the Envoy listeners (TLS termination), not for the clusters (TLS origination).

Can someone point me in the right direction?

For some additional context, I do not need mutual authentication, and as shown, the frontend and backend code are most probably correct. The issue seems contained within Envoy configuration.

The tools I am using are:

grpc-web 1.4.2 (npm)
envoy  version: 7bba38b743bb3bca22dffb4a21c38ccc155fbef8/1.27.0/Distribution/RELEASE/BoringSSL
GCloud Run
1 answer

Best answer, by EmmanuelB

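Setting auto_host_rewrite: true on the route action solved the issue. As far as I can tell, Cloud Run routes incoming requests by their Host (:authority) header, and by default Envoy forwards the header it received from the browser, which names the proxy rather than the backend. With auto_host_rewrite, Envoy replaces it with the hostname of the upstream endpoint: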

              - match: { prefix: "/" }  
                route:  
                  cluster: my_grpc_service  
                  auto_host_rewrite: true
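
A possible alternative, not part of the original fix and untested in this setup, is to pin the header explicitly with host_rewrite_literal. The Envoy docs describe auto_host_rewrite as applying to strict_dns or logical_dns clusters, so a literal rewrite may be useful when the cluster type differs. Only one of the two host rewrite options can be set on a route action.

```yaml
routes:
- match: { prefix: "/" }
  route:
    cluster: my_grpc_service
    # Alternative to auto_host_rewrite: set the Host/:authority header to a fixed value.
    host_rewrite_literal: my-grpc-server.a.run.app
```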

Thanks a lot to Josef Gattermayer, whose post contains a fully working Envoy proxy setup for Google Cloud Run: https://www.ackee.agency/blog/how-to-setup-a-grpc-web-backend-on-google-cloud-run-with-envoy-proxy

As of writing this answer in November 2023, it still works.