Go Error Handling Patterns for Microservices Architecture

Go’s simplicity and performance make it an excellent choice for microservices architecture, but handling errors across distributed systems presents unique challenges. Unlike monolithic applications where errors can be handled in a single context, microservices require sophisticated error handling patterns that maintain system resilience while providing meaningful debugging information.

In this post, we’ll explore advanced error handling patterns specifically designed for Go microservices, including error wrapping strategies, context propagation techniques, and distributed tracing integration. These patterns will help you build more robust and maintainable distributed systems.

The Challenge of Error Handling in Microservices

When building microservices with Go, errors don’t just occur within a single service boundary. They propagate across network calls, traverse multiple services, and need to be handled at various levels of your architecture. The challenge is maintaining error context while ensuring each service can make appropriate decisions based on error types.

Traditional Go error handling with simple error returns becomes insufficient when you need to:

Trace errors across service boundaries
Maintain error context through multiple network hops
Differentiate between retryable and non-retryable errors
Provide meaningful error responses to clients while hiding internal details

Structured Error Types for Microservices

The foundation of effective microservices error handling starts with structured error types. Rather than using simple string errors, create error types that carry semantic meaning and can be handled appropriately by different services.

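package errors

import (
    stderrors "errors"
    "fmt"
    "net/http"
)

// As and Is re-export the standard library helpers. Packages that import this
// package under the name "errors" shadow the standard library package, so
// without these wrappers calls such as errors.As(err, &target) in the later
// examples would not compile.
func As(err error, target interface{}) bool { return stderrors.As(err, target) }

func Is(err, target error) bool { return stderrors.Is(err, target) }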

// ErrorCode represents different types of errors in our system
type ErrorCode string

const (
    ErrorCodeValidation    ErrorCode = "VALIDATION_ERROR"
    ErrorCodeNotFound      ErrorCode = "NOT_FOUND"
    ErrorCodeUnauthorized  ErrorCode = "UNAUTHORIZED"
    ErrorCodeInternal      ErrorCode = "INTERNAL_ERROR"
    ErrorCodeServiceDown   ErrorCode = "SERVICE_UNAVAILABLE"
    ErrorCodeTimeout       ErrorCode = "TIMEOUT"
)

// ServiceError represents errors that can be handled across service boundaries
type ServiceError struct {
    Code       ErrorCode `json:"code"`
    Message    string    `json:"message"`
    Service    string    `json:"service"`
    Operation  string    `json:"operation"`
    Retryable  bool      `json:"retryable"`
    StatusCode int       `json:"status_code"`
    Cause      error     `json:"-"` // Don't serialize the underlying error
}

func (e *ServiceError) Error() string {
    return fmt.Sprintf("[%s:%s] %s: %s", e.Service, e.Operation, e.Code, e.Message)
}

func (e *ServiceError) Unwrap() error {
    return e.Cause
}

// HTTPStatusCode returns the appropriate HTTP status code for this error
func (e *ServiceError) HTTPStatusCode() int {
    if e.StatusCode != 0 {
        return e.StatusCode
    }
    
    switch e.Code {
    case ErrorCodeValidation:
        return http.StatusBadRequest
    case ErrorCodeNotFound:
        return http.StatusNotFound
    case ErrorCodeUnauthorized:
        return http.StatusUnauthorized
    case ErrorCodeServiceDown, ErrorCodeTimeout:
        return http.StatusServiceUnavailable
    default:
        return http.StatusInternalServerError
    }
}

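This structured approach gives every layer something concrete to branch on: callers can inspect Code and Retryable instead of parsing message strings, and the HTTP layer can derive a status code from the error itself. As Mario Carrion’s microservices guide puts it, implementing an error type with state allows you to “define an Error Code that we can use to properly render different responses on our HTTP layer.”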
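To make the benefit concrete, here is a minimal sketch of a caller recovering a structured error from a wrapped chain. The ServiceError below is a trimmed-down copy so the example compiles on its own, and statusFor and lookupOrder are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ServiceError is a trimmed-down copy of the type above, kept only so this
// sketch is self-contained.
type ServiceError struct {
	Code    string
	Message string
	Cause   error
}

func (e *ServiceError) Error() string { return e.Code + ": " + e.Message }
func (e *ServiceError) Unwrap() error { return e.Cause }

// statusFor (hypothetical helper) finds a *ServiceError anywhere in the error
// chain and maps its code to an HTTP status. Anything else becomes a 500.
func statusFor(err error) int {
	var svcErr *ServiceError
	if !errors.As(err, &svcErr) {
		return http.StatusInternalServerError
	}
	switch svcErr.Code {
	case "NOT_FOUND":
		return http.StatusNotFound
	case "VALIDATION_ERROR":
		return http.StatusBadRequest
	default:
		return http.StatusInternalServerError
	}
}

// lookupOrder (hypothetical) simulates a lower layer failing with a
// ServiceError and an upper layer adding context with %w.
func lookupOrder(id string) error {
	base := &ServiceError{Code: "NOT_FOUND", Message: "order " + id + " not found"}
	return fmt.Errorf("loading order: %w", base)
}

func main() {
	err := lookupOrder("42")
	fmt.Println(err)
	fmt.Println(statusFor(err))
}
```

Because the extra context is added with %w, errors.As still reaches the ServiceError, so the semantic code survives however many layers annotate the error.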

Error Wrapping Patterns

Go’s error wrapping capabilities, introduced in Go 1.13, are particularly powerful in microservices architectures. However, you need to be strategic about what information to preserve and what to abstract away.

package userservice

import (
    "context"

    "github.com/yourorg/common/errors"
)

type UserService struct {
    db      Database
    authSvc AuthService
}

func (s *UserService) GetUser(ctx context.Context, userID string) (*User, error) {
    // Validate input
    if userID == "" {
        return nil, &errors.ServiceError{
            Code:      errors.ErrorCodeValidation,
            Message:   "user ID cannot be empty",
            Service:   "user-service",
            Operation: "GetUser",
            Retryable: false,
        }
    }
    
    // Check authorization
    if err := s.authSvc.CheckPermission(ctx, userID); err != nil {
        var authErr *errors.ServiceError
        if errors.As(err, &authErr) {
            // Preserve the error code but update the context
            return nil, &errors.ServiceError{
                Code:      authErr.Code,
                Message:   "insufficient permissions to access user",
                Service:   "user-service",
                Operation: "GetUser",
                Retryable: authErr.Retryable,
                Cause:     err,
            }
        }
        
        // Unknown error from auth service
        return nil, &errors.ServiceError{
            Code:      errors.ErrorCodeInternal,
            Message:   "authorization check failed",
            Service:   "user-service",
            Operation: "GetUser",
            Retryable: false,
            Cause:     err,
        }
    }
    
    // Fetch user from database
    user, err := s.db.GetUser(ctx, userID)
    if err != nil {
        return nil, s.wrapDatabaseError(err, "GetUser")
    }
    
    return user, nil
}

func (s *UserService) wrapDatabaseError(err error, operation string) error {
    // Check for specific database errors
    if isNotFoundError(err) {
        return &errors.ServiceError{
            Code:      errors.ErrorCodeNotFound,
            Message:   "user not found",
            Service:   "user-service",
            Operation: operation,
            Retryable: false,
            Cause:     err,
        }
    }
    
    if isTimeoutError(err) {
        return &errors.ServiceError{
            Code:      errors.ErrorCodeTimeout,
            Message:   "database operation timed out",
            Service:   "user-service",
            Operation: operation,
            Retryable: true,
            Cause:     err,
        }
    }
    
    // Generic database error
    return &errors.ServiceError{
        Code:      errors.ErrorCodeInternal,
        Message:   "database operation failed",
        Service:   "user-service",
        Operation: operation,
        Retryable: false,
        Cause:     err,
    }
}
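The listing above calls isNotFoundError and isTimeoutError without defining them. One possible sketch follows, assuming a database/sql-backed store where a missing row surfaces as sql.ErrNoRows and timeouts surface as an expired context deadline or a net.Error. Other drivers report these conditions differently, and queryUser and slowQuery are hypothetical stand-ins used only to produce sample errors.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"net"
	"time"
)

// isNotFoundError reports whether err wraps sql.ErrNoRows.
// Assumption: the store is backed by database/sql.
func isNotFoundError(err error) bool {
	return errors.Is(err, sql.ErrNoRows)
}

// isTimeoutError reports whether err represents an expired context deadline
// or a network-level timeout.
func isTimeoutError(err error) bool {
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// queryUser (hypothetical) fails the way database/sql does when no row matches.
func queryUser(id string) error {
	return fmt.Errorf("query user %q: %w", id, sql.ErrNoRows)
}

// slowQuery (hypothetical) fails with an already expired context deadline.
func slowQuery() error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Nanosecond)
	defer cancel()
	<-ctx.Done() // wait until the deadline has passed
	return fmt.Errorf("query: %w", ctx.Err())
}

func main() {
	fmt.Println(isNotFoundError(queryUser("42")))
	fmt.Println(isTimeoutError(slowQuery()))
	fmt.Println(isTimeoutError(queryUser("42")))
}
```

Both helpers use errors.Is and errors.As rather than string matching, so they keep working when intermediate layers wrap the driver error.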

Context Propagation for Error Tracing

Context propagation is crucial for tracing errors across service boundaries. You need to carry trace information, correlation IDs, and other metadata that helps with debugging distributed systems.

package middleware

import (
    "context"
    "net/http"

    "github.com/google/uuid"
    "github.com/yourorg/common/errors" // needed by TracedServiceError below
)

type contextKey string

const (
    TraceIDKey      contextKey = "trace_id"
    CorrelationIDKey contextKey = "correlation_id"
    ServiceChainKey  contextKey = "service_chain"
)

// TraceMiddleware adds tracing information to requests
func TraceMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ctx := r.Context()
        
        // Extract or generate trace ID
        traceID := r.Header.Get("X-Trace-ID")
        if traceID == "" {
            traceID = uuid.New().String()
        }
        ctx = context.WithValue(ctx, TraceIDKey, traceID)
        
        // Extract or generate correlation ID
        correlationID := r.Header.Get("X-Correlation-ID")
        if correlationID == "" {
            correlationID = uuid.New().String()
        }
        ctx = context.WithValue(ctx, CorrelationIDKey, correlationID)
        
        // Build service chain
        serviceChain := r.Header.Get("X-Service-Chain")
        if serviceChain != "" {
            serviceChain += " -> "
        }
        serviceChain += "user-service"
        ctx = context.WithValue(ctx, ServiceChainKey, serviceChain)
        
        // Add trace headers to response
        w.Header().Set("X-Trace-ID", traceID)
        w.Header().Set("X-Correlation-ID", correlationID)
        
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}

// Enhanced ServiceError with tracing information
type TracedServiceError struct {
    *errors.ServiceError
    TraceID       string `json:"trace_id"`
    CorrelationID string `json:"correlation_id"`
    ServiceChain  string `json:"service_chain"`
}

func NewTracedError(ctx context.Context, baseErr *errors.ServiceError) *TracedServiceError {
    traceID, _ := ctx.Value(TraceIDKey).(string)
    correlationID, _ := ctx.Value(CorrelationIDKey).(string)
    serviceChain, _ := ctx.Value(ServiceChainKey).(string)
    
    return &TracedServiceError{
        ServiceError:  baseErr,
        TraceID:       traceID,
        CorrelationID: correlationID,
        ServiceChain:  serviceChain,
    }
}
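The middleware above covers the inbound side only. For the IDs to survive the next hop, the same values have to be copied onto outgoing requests. The following is a rough sketch of that step, reusing the header names from the middleware; injectTraceHeaders, newDownstreamRequest, withTrace, and the example URL are hypothetical names introduced for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

type contextKey string

const (
	traceIDKey       contextKey = "trace_id"
	correlationIDKey contextKey = "correlation_id"
	serviceChainKey  contextKey = "service_chain"
)

// injectTraceHeaders copies tracing metadata from ctx onto an outgoing
// request. Values that are missing or empty are skipped.
func injectTraceHeaders(ctx context.Context, req *http.Request) {
	set := func(header string, key contextKey) {
		if v, ok := ctx.Value(key).(string); ok && v != "" {
			req.Header.Set(header, v)
		}
	}
	set("X-Trace-ID", traceIDKey)
	set("X-Correlation-ID", correlationIDKey)
	set("X-Service-Chain", serviceChainKey)
}

// newDownstreamRequest builds a GET request that carries the caller's
// tracing headers.
func newDownstreamRequest(ctx context.Context, url string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	injectTraceHeaders(ctx, req)
	return req, nil
}

// withTrace builds a context the way the inbound middleware would.
func withTrace(traceID, correlationID, chain string) context.Context {
	ctx := context.WithValue(context.Background(), traceIDKey, traceID)
	ctx = context.WithValue(ctx, correlationIDKey, correlationID)
	return context.WithValue(ctx, serviceChainKey, chain)
}

func main() {
	ctx := withTrace("trace-1", "corr-1", "gateway -> user-service")
	req, err := newDownstreamRequest(ctx, "http://orders.internal/orders/42")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Header.Get("X-Trace-ID"), req.Header.Get("X-Service-Chain"))
}
```

The receiving service's middleware then appends its own name to X-Service-Chain, so the chain grows by one entry per hop.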

gRPC Error Handling Patterns

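When using gRPC for service-to-service communication, you need specific patterns to handle errors across the protocol boundary: a gRPC response carries a status code, a message, and optional protobuf details, not a Go error value, so structured errors have to be translated on the way out and rebuilt on the way in. The Reddit discussion on gRPC error handling highlights common challenges and solutions.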

package grpchandler

import (
    "strconv"

    "github.com/yourorg/common/errors"
    "google.golang.org/genproto/googleapis/rpc/errdetails"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// Status details must be protobuf messages, so the service metadata travels in
// the standard errdetails.ErrorInfo message rather than in a plain Go struct.
// These keys name the entries stored in ErrorInfo.Metadata.
const (
    metadataOperation = "operation"
    metadataRetryable = "retryable"
)

// ToGRPCError converts a ServiceError to a gRPC status error
func ToGRPCError(err error) error {
    if err == nil {
        return nil
    }

    var serviceErr *errors.ServiceError
    if !errors.As(err, &serviceErr) {
        // Unknown error type, wrap as internal error
        return status.Error(codes.Internal, "internal server error")
    }
    
    // Map service error codes to gRPC codes
    var grpcCode codes.Code
    switch serviceErr.Code {
    case errors.ErrorCodeValidation:
        grpcCode = codes.InvalidArgument
    case errors.ErrorCodeNotFound:
        grpcCode = codes.NotFound
    case errors.ErrorCodeUnauthorized:
        grpcCode = codes.Unauthenticated
    case errors.ErrorCodeTimeout:
        grpcCode = codes.DeadlineExceeded
    case errors.ErrorCodeServiceDown:
        grpcCode = codes.Unavailable
    default:
        grpcCode = codes.Internal
    }
    
    // Create status with details
    st := status.New(grpcCode, serviceErr.Message)
    
    // Add error details
    detail := &errdetails.ErrorInfo{
        Reason: string(serviceErr.Code),
        Domain: serviceErr.Service,
        Metadata: map[string]string{
            metadataOperation: serviceErr.Operation,
            metadataRetryable: strconv.FormatBool(serviceErr.Retryable),
        },
    }
    
    // If attaching details fails, fall back to the plain status. Overwriting
    // st with a nil result would make st.Err() return nil and hide the error.
    if withDetails, detailErr := st.WithDetails(detail); detailErr == nil {
        st = withDetails
    }
    
    return st.Err()
}

// FromGRPCError converts a gRPC error back to a ServiceError
func FromGRPCError(err error) error {
    if err == nil {
        return nil
    }

    st, ok := status.FromError(err)
    if !ok {
        return &errors.ServiceError{
            Code:      errors.ErrorCodeInternal,
            Message:   "unknown gRPC error",
            Service:   "grpc-client",
            Operation: "call",
            Retryable: false,
            Cause:     err,
        }
    }
    
    // Extract error details if available
    for _, detail := range st.Details() {
        if info, ok := detail.(*errdetails.ErrorInfo); ok {
            // A missing or malformed flag is treated as non-retryable
            retryable, _ := strconv.ParseBool(info.GetMetadata()[metadataRetryable])
            return &errors.ServiceError{
                Code:      errors.ErrorCode(info.GetReason()),
                Message:   st.Message(),
                Service:   info.GetDomain(),
                Operation: info.GetMetadata()[metadataOperation],
                Retryable: retryable,
                Cause:     err,
            }
        }
    }
    
    // Fallback to mapping gRPC codes
    var errorCode errors.ErrorCode
    switch st.Code() {
    case codes.InvalidArgument:
        errorCode = errors.ErrorCodeValidation
    case codes.NotFound:
        errorCode = errors.ErrorCodeNotFound
    case codes.Unauthenticated:
        errorCode = errors.ErrorCodeUnauthorized
    case codes.DeadlineExceeded:
        errorCode = errors.ErrorCodeTimeout
    case codes.Unavailable:
        errorCode = errors.ErrorCodeServiceDown
    default:
        errorCode = errors.ErrorCodeInternal
    }
    
    return &errors.ServiceError{
        Code:      errorCode,
        Message:   st.Message(),
        Service:   "unknown",
        Operation: "grpc-call",
        Retryable: st.Code() == codes.Unavailable || st.Code() == codes.DeadlineExceeded,
        Cause:     err,
    }
}

Circuit Breaker Integration

Circuit breakers are essential for preventing cascading failures in microservices architectures. Integrating them with your error handling patterns lets callers fail fast with a well-typed, retryable error while a struggling dependency recovers.

package circuitbreaker

import (
    "context"
    "log"
    "time"

    "github.com/sony/gobreaker"
    "github.com/yourorg/common/errors"
)

type ServiceClient struct {
    breaker *gobreaker.CircuitBreaker
    client  HTTPClient
}

func NewServiceClient(name string, client HTTPClient) *ServiceClient {
    settings := gobreaker.Settings{
        Name:        name,
        MaxRequests: 3,
        Interval:    10 * time.Second,
        Timeout:     30 * time.Second,
        ReadyToTrip: func(counts gobreaker.Counts) bool {
            failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
            return counts.Requests >= 3 && failureRatio >= 0.6
        },
        OnStateChange: func(name string, from, to gobreaker.State) {
            // Log state changes for monitoring
            log.Printf("Circuit breaker %s changed from %s to %s", name, from, to)
        },
        // Non-retryable errors (validation, not found, ...) describe the
        // request, not the health of the downstream service, so they are
        // counted as successes and do not trip the circuit breaker.
        IsSuccessful: func(err error) bool {
            if err == nil {
                return true
            }
            var serviceErr *errors.ServiceError
            return errors.As(err, &serviceErr) && !serviceErr.Retryable
        },
    }
    
    return &ServiceClient{
        breaker: gobreaker.NewCircuitBreaker(settings),
        client:  client,
    }
}

func (c *ServiceClient) Call(ctx context.Context, request *Request) (*Response, error) {
    result, err := c.breaker.Execute(func() (interface{}, error) {
        response, err := c.client.Do(ctx, request)
        if err != nil {
            return nil, err
        }
        return response, nil
    })
    
    if err != nil {
        // Handle circuit breaker specific errors
        if err == gobreaker.ErrOpenState {
            return nil, &errors.ServiceError{
                Code:      errors.ErrorCodeServiceDown,
                Message:   "service temporarily unavailable (circuit breaker open)",
                Service:   c.breaker.Name(),
                Operation: "call",
                Retryable: true,
                Cause:     err,
            }
        }
        
        if err == gobreaker.ErrTooManyRequests {
            return nil, &errors.ServiceError{
                Code:      errors.ErrorCodeServiceDown,
                Message:   "service overloaded (circuit breaker half-open)",
                Service:   c.breaker.Name(),
                Operation: "call",
                Retryable: true,
                Cause:     err,
            }
        }
        
        return nil, err
    }
    
    return result.(*Response), nil
}

Distributed Tracing Integration

Modern microservices architectures require distributed tracing to understand error propagation across services. Integrating OpenTelemetry with your error handling provides comprehensive observability.

package tracing

import (
    "context"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/trace"
    "github.com/yourorg/common/errors"
)

var tracer = otel.Tracer("microservice-errors")

// WithErrorTracing wraps a function with distributed tracing and error handling
func WithErrorTracing(ctx context.Context, operationName string, fn func(context.Context) error) error {
    ctx, span := tracer.Start(ctx, operationName)
    defer span.End()
    
    err := fn(ctx)
    if err != nil {
        // Record error details in the span
        var serviceErr *errors.ServiceError
        if errors.As(err, &serviceErr) {
            span.SetAttributes(
                attribute.String("error.code", string(serviceErr.Code)),
                attribute.String("error.service", serviceErr.Service),
                attribute.String("error.operation", serviceErr.Operation),
                attribute.Bool("error.retryable", serviceErr.Retryable),
            )
            
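            // Set span status based on error type. Only server-side failures
            // mark the span as failed. For client errors the status is left
            // unset: codes.Ok is meant for explicitly confirmed success, and
            // its description is ignored.
            switch serviceErr.Code {
            case errors.ErrorCodeInternal, errors.ErrorCodeServiceDown, errors.ErrorCodeTimeout:
                span.SetStatus(codes.Error, serviceErr.Message)
            }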
        } else {
            span.SetStatus(codes.Error, err.Error())
        }
        
        span.RecordError(err)
    }
    
    return err
}

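// Enhanced service method with tracing. In a real code base this method lives
// next to GetUser in the userservice package.
func (s *UserService) GetUserWithTracing(ctx context.Context, userID string) (*User, error) {
    // WithErrorTracing only returns an error, so the user is captured
    // through the closure.
    var user *User
    err := WithErrorTracing(ctx, "user.get", func(ctx context.Context) error {
        var err error
        user, err = s.GetUser(ctx, userID)
        if err != nil {
            return err
        }
        
        // Add user attributes to span. The email address is left out on
        // purpose to keep personal data out of trace storage.
        span := trace.SpanFromContext(ctx)
        span.SetAttributes(
            attribute.String("user.id", user.ID),
        )
        
        return nil
    })
    if err != nil {
        return nil, err
    }
    
    return user, nil
}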

Error Response Patterns

Finally, you need consistent patterns for converting internal errors to client responses while maintaining security and providing useful information.

package httphandler

import (
    "context"
    "encoding/json"
    "net/http"

    // Assumption: log.WithFields below matches the logrus API, so logrus is
    // imported under the name log.
    log "github.com/sirupsen/logrus"
    "github.com/yourorg/common/errors"
    // Assumption: the tracing middleware shown earlier lives at this path.
    "github.com/yourorg/common/middleware"
)

type ErrorResponse struct {
    Error struct {
        Code      string `json:"code"`
        Message   string `json:"message"`
        TraceID   string `json:"trace_id,omitempty"`
        Retryable bool   `json:"retryable"`
    } `json:"error"`
}

func HandleError(w http.ResponseWriter, r *http.Request, err error) {
    var serviceErr *errors.ServiceError
    var response ErrorResponse
    var statusCode int
    
    if errors.As(err, &serviceErr) {
        statusCode = serviceErr.HTTPStatusCode()
        response.Error.Code = string(serviceErr.Code)
        response.Error.Message = serviceErr.Message
        response.Error.Retryable = serviceErr.Retryable
    } else {
        // Unknown error - don't expose internal details
        statusCode = http.StatusInternalServerError
        response.Error.Code = string(errors.ErrorCodeInternal)
        response.Error.Message = "An internal error occurred"
        response.Error.Retryable = false
    }
    
    // Add trace ID if available. The checked type assertion avoids a panic
    // if the context value is not a string.
    if traceID, ok := r.Context().Value(middleware.TraceIDKey).(string); ok {
        response.Error.TraceID = traceID
    }
    
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(statusCode)
    json.NewEncoder(w).Encode(response)
    
    // Log error for monitoring (with full context)
    logError(r.Context(), err, statusCode)
}

func logError(ctx context.Context, err error, statusCode int) {
    // Extract tracing information
    traceID, _ := ctx.Value(middleware.TraceIDKey).(string)
    correlationID, _ := ctx.Value(middleware.CorrelationIDKey).(string)
    
    // Log with structured information
    log.WithFields(map[string]interface{}{
        "error":          err.Error(),
        "status_code":    statusCode,
        "trace_id":       traceID,
        "correlation_id": correlationID,
    }).Error("HTTP request failed")
}
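The following sketch shows the effect of this pattern on the wire, using only the standard library. It is a simplified stand-in for HandleError, not a replacement: writeError, render, notFound, and leaky are hypothetical names, and the ServiceError here is a reduced copy of the earlier type.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ServiceError is a reduced copy of the earlier type.
type ServiceError struct {
	Code       string
	Message    string
	StatusCode int
	Cause      error
}

func (e *ServiceError) Error() string { return e.Code + ": " + e.Message }
func (e *ServiceError) Unwrap() error { return e.Cause }

type errorBody struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// writeError renders err as JSON. Only Code and Message of a ServiceError
// reach the client; Cause and the text of unknown errors never do.
func writeError(w http.ResponseWriter, err error) {
	status := http.StatusInternalServerError
	body := errorBody{Code: "INTERNAL_ERROR", Message: "An internal error occurred"}

	var svcErr *ServiceError
	if errors.As(err, &svcErr) {
		if svcErr.StatusCode != 0 {
			status = svcErr.StatusCode
		}
		body = errorBody{Code: svcErr.Code, Message: svcErr.Message}
	}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(map[string]errorBody{"error": body})
}

// render runs writeError against an in-memory recorder and returns the
// status code and response body.
func render(err error) (int, string) {
	rec := httptest.NewRecorder()
	writeError(rec, err)
	return rec.Code, rec.Body.String()
}

// notFound is a structured error whose Cause must stay internal.
func notFound() error {
	return &ServiceError{
		Code:       "NOT_FOUND",
		Message:    "user not found",
		StatusCode: http.StatusNotFound,
		Cause:      errors.New("sql: no rows in result set"),
	}
}

// leaky is an unstructured error that contains infrastructure details.
func leaky() error {
	return errors.New("dial tcp 10.0.0.5:5432: connection refused")
}

func main() {
	for _, err := range []error{notFound(), leaky()} {
		status, body := render(err)
		fmt.Printf("%d %s", status, body)
	}
}
```

The unstructured error collapses to a generic 500 body, so the database address in its text never leaves the service; the full error remains available for server-side logging.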

Best Practices and Security Considerations

When implementing these error handling patterns, follow these best practices:

Security First: As highlighted in the JetBrains secure error handling guide, ensure that vulnerabilities in third-party components cannot be introspected through error messages. Never expose internal system details in client-facing errors.

Consistent Error Codes: Use consistent error codes across all services. This makes it easier for clients to handle errors programmatically and enables better monitoring and alerting.

Retryable vs Non-Retryable: Clearly distinguish between errors that can be retried and those that cannot. This prevents clients from wasting resources on futile retry attempts.

Context Preservation: Always preserve error context when wrapping errors, but be selective about what information crosses service boundaries.
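As an illustration of the retryable versus non-retryable distinction, here is a minimal retry helper that consults the Retryable flag before trying again. It is a sketch under simplifying assumptions (exponential backoff without jitter, a reduced ServiceError), and retry, callWithRetry, and flaky are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ServiceError is a reduced copy of the earlier type.
type ServiceError struct {
	Code      string
	Message   string
	Retryable bool
}

func (e *ServiceError) Error() string { return e.Code + ": " + e.Message }

// retry calls fn up to maxAttempts times and returns the number of attempts
// made. It stops early on success, on an error that is not a retryable
// ServiceError, or when ctx is cancelled. The delay doubles after each attempt.
func retry(ctx context.Context, maxAttempts int, baseDelay time.Duration, fn func() error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = fn()
		if err == nil {
			return attempt, nil
		}
		var svcErr *ServiceError
		if !errors.As(err, &svcErr) || !svcErr.Retryable {
			return attempt, err // retrying cannot help
		}
		if attempt == maxAttempts {
			break
		}
		select {
		case <-time.After(baseDelay << (attempt - 1)):
		case <-ctx.Done():
			return attempt, ctx.Err()
		}
	}
	return maxAttempts, err
}

// callWithRetry applies illustrative defaults: three attempts, 1ms base delay.
func callWithRetry(fn func() error) (int, error) {
	return retry(context.Background(), 3, time.Millisecond, fn)
}

// flaky returns a function that fails the given number of times before
// succeeding, simulating an unstable dependency.
func flaky(failures int, retryable bool) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return &ServiceError{Code: "TIMEOUT", Message: "simulated failure", Retryable: retryable}
		}
		return nil
	}
}

func main() {
	attempts, err := callWithRetry(flaky(2, true))
	fmt.Println(attempts, err)

	attempts, err = callWithRetry(flaky(2, false))
	fmt.Println(attempts, err)
}
```

A production version would usually add jitter and a cap on the delay so that many clients do not retry in lockstep.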

Conclusion

Effective error handling in Go microservices requires thoughtful design and consistent patterns. By implementing structured error types, proper context propagation, and integration with observability tools, you can build resilient systems that provide meaningful debugging information while maintaining security and performance.

The patterns outlined in this post provide a foundation for handling errors across distributed systems. As noted in the comprehensive guide to Go error handling, “the errors package provides a powerful set of tools for handling errors in Go.” Combined with microservices-specific patterns, these tools become even more powerful.

Remember that error handling is not just about catching and returning errors; it’s about building systems that can gracefully handle failure modes and provide operators with the information they need to maintain system health.