Go Error Handling Patterns for Microservices Architecture
Go’s simplicity and performance make it an excellent choice for microservices architecture, but handling errors across distributed systems presents unique challenges. Unlike monolithic applications where errors can be handled in a single context, microservices require sophisticated error handling patterns that maintain system resilience while providing meaningful debugging information.
In this post, we’ll explore advanced error handling patterns specifically designed for Go microservices, including error wrapping strategies, context propagation techniques, and distributed tracing integration. These patterns will help you build more robust and maintainable distributed systems.
The Challenge of Error Handling in Microservices
When building microservices with Go, errors don’t just occur within a single service boundary. They propagate across network calls, traverse multiple services, and need to be handled at various levels of your architecture. The challenge is maintaining error context while ensuring each service can make appropriate decisions based on error types.
Traditional Go error handling with simple error returns becomes insufficient when you need to:
- Trace errors across service boundaries
- Maintain error context through multiple network hops
- Differentiate between retryable and non-retryable errors
- Provide meaningful error responses to clients while hiding internal details
Structured Error Types for Microservices
The foundation of effective microservices error handling starts with structured error types. Rather than using simple string errors, create error types that carry semantic meaning and can be handled appropriately by different services.
package errors
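import (
stderrors "errors"
"fmt"
"net/http"
)
// As and Is re-export the standard library helpers. Services import this
// package under the name "errors", which shadows the standard library package,
// so errors.As and errors.Is in the later examples resolve to these functions.
func As(err error, target interface{}) bool { return stderrors.As(err, target) }
func Is(err, target error) bool { return stderrors.Is(err, target) }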
// ErrorCode represents different types of errors in our system
type ErrorCode string
const (
ErrorCodeValidation ErrorCode = "VALIDATION_ERROR"
ErrorCodeNotFound ErrorCode = "NOT_FOUND"
ErrorCodeUnauthorized ErrorCode = "UNAUTHORIZED"
ErrorCodeInternal ErrorCode = "INTERNAL_ERROR"
ErrorCodeServiceDown ErrorCode = "SERVICE_UNAVAILABLE"
ErrorCodeTimeout ErrorCode = "TIMEOUT"
)
// ServiceError represents errors that can be handled across service boundaries
type ServiceError struct {
Code ErrorCode `json:"code"`
Message string `json:"message"`
Service string `json:"service"`
Operation string `json:"operation"`
Retryable bool `json:"retryable"`
StatusCode int `json:"status_code"`
Cause error `json:"-"` // Don't serialize the underlying error
}
func (e *ServiceError) Error() string {
return fmt.Sprintf("[%s:%s] %s: %s", e.Service, e.Operation, e.Code, e.Message)
}
func (e *ServiceError) Unwrap() error {
return e.Cause
}
// HTTPStatusCode returns the appropriate HTTP status code for this error
func (e *ServiceError) HTTPStatusCode() int {
if e.StatusCode != 0 {
return e.StatusCode
}
switch e.Code {
case ErrorCodeValidation:
return http.StatusBadRequest
case ErrorCodeNotFound:
return http.StatusNotFound
case ErrorCodeUnauthorized:
return http.StatusUnauthorized
case ErrorCodeServiceDown, ErrorCodeTimeout:
return http.StatusServiceUnavailable
default:
return http.StatusInternalServerError
}
}
This structured approach provides several benefits. As noted in Mario Carrion’s microservices guide, implementing an error type with state allows you to “define an Error Code that we can use to properly render different responses on our HTTP layer.”
Error Wrapping Patterns
Go’s error wrapping capabilities, introduced in Go 1.13, are particularly powerful in microservices architectures. However, you need to be strategic about what information to preserve and what to abstract away.
package userservice
import (
"context"
"fmt"
"github.com/yourorg/common/errors"
)
type UserService struct {
db Database
authSvc AuthService
}
func (s *UserService) GetUser(ctx context.Context, userID string) (*User, error) {
// Validate input
if userID == "" {
return nil, &errors.ServiceError{
Code: errors.ErrorCodeValidation,
Message: "user ID cannot be empty",
Service: "user-service",
Operation: "GetUser",
Retryable: false,
}
}
// Check authorization
if err := s.authSvc.CheckPermission(ctx, userID); err != nil {
var authErr *errors.ServiceError
if errors.As(err, &authErr) {
// Preserve the error code but update the context
return nil, &errors.ServiceError{
Code: authErr.Code,
Message: "insufficient permissions to access user",
Service: "user-service",
Operation: "GetUser",
Retryable: authErr.Retryable,
Cause: err,
}
}
// Unknown error from auth service
return nil, &errors.ServiceError{
Code: errors.ErrorCodeInternal,
Message: "authorization check failed",
Service: "user-service",
Operation: "GetUser",
Retryable: false,
Cause: err,
}
}
// Fetch user from database
user, err := s.db.GetUser(ctx, userID)
if err != nil {
return nil, s.wrapDatabaseError(err, "GetUser")
}
return user, nil
}
func (s *UserService) wrapDatabaseError(err error, operation string) error {
// Check for specific database errors
if isNotFoundError(err) {
return &errors.ServiceError{
Code: errors.ErrorCodeNotFound,
Message: "user not found",
Service: "user-service",
Operation: operation,
Retryable: false,
Cause: err,
}
}
if isTimeoutError(err) {
return &errors.ServiceError{
Code: errors.ErrorCodeTimeout,
Message: "database operation timed out",
Service: "user-service",
Operation: operation,
Retryable: true,
Cause: err,
}
}
// Generic database error
return &errors.ServiceError{
Code: errors.ErrorCodeInternal,
Message: "database operation failed",
Service: "user-service",
Operation: operation,
Retryable: false,
Cause: err,
}
}
Context Propagation for Error Tracing
Context propagation is crucial for tracing errors across service boundaries. You need to carry trace information, correlation IDs, and other metadata that helps with debugging distributed systems.
package middleware
import (
"context"
"net/http"
"github.com/google/uuid"
"github.com/yourorg/common/errors"
)
type contextKey string
const (
TraceIDKey contextKey = "trace_id"
CorrelationIDKey contextKey = "correlation_id"
ServiceChainKey contextKey = "service_chain"
)
// TraceMiddleware adds tracing information to requests
func TraceMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()
// Extract or generate trace ID
traceID := r.Header.Get("X-Trace-ID")
if traceID == "" {
traceID = uuid.New().String()
}
ctx = context.WithValue(ctx, TraceIDKey, traceID)
// Extract or generate correlation ID
correlationID := r.Header.Get("X-Correlation-ID")
if correlationID == "" {
correlationID = uuid.New().String()
}
ctx = context.WithValue(ctx, CorrelationIDKey, correlationID)
// Build service chain
serviceChain := r.Header.Get("X-Service-Chain")
if serviceChain != "" {
serviceChain += " -> "
}
serviceChain += "user-service"
ctx = context.WithValue(ctx, ServiceChainKey, serviceChain)
// Add trace headers to response
w.Header().Set("X-Trace-ID", traceID)
w.Header().Set("X-Correlation-ID", correlationID)
next.ServeHTTP(w, r.WithContext(ctx))
})
}
// Enhanced ServiceError with tracing information
type TracedServiceError struct {
*errors.ServiceError
TraceID string `json:"trace_id"`
CorrelationID string `json:"correlation_id"`
ServiceChain string `json:"service_chain"`
}
func NewTracedError(ctx context.Context, baseErr *errors.ServiceError) *TracedServiceError {
traceID, _ := ctx.Value(TraceIDKey).(string)
correlationID, _ := ctx.Value(CorrelationIDKey).(string)
serviceChain, _ := ctx.Value(ServiceChainKey).(string)
return &TracedServiceError{
ServiceError: baseErr,
TraceID: traceID,
CorrelationID: correlationID,
ServiceChain: serviceChain,
}
}
gRPC Error Handling Patterns
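When using gRPC for service-to-service communication, you need specific patterns to handle errors across the protocol boundary. By default a gRPC error carries only a status code and a message, so fields such as the originating service or the retryable flag are lost unless you attach them as status details. The Reddit discussion on gRPC error handling highlights common challenges and solutions.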
package grpchandler
import (
"strconv"
"google.golang.org/genproto/googleapis/rpc/errdetails"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
"github.com/yourorg/common/errors"
)
// Error details travel as errdetails.ErrorInfo, a standard protobuf message.
// Status details must be protobuf messages, so a plain Go struct cannot be
// attached. The metadata keys below are this package's own convention.
const (
metaOperation = "operation"
metaRetryable = "retryable"
)
// ToGRPCError converts a ServiceError to a gRPC status error
func ToGRPCError(err error) error {
var serviceErr *errors.ServiceError
if !errors.As(err, &serviceErr) {
// Unknown error type, wrap as internal error
return status.Error(codes.Internal, "internal server error")
}
// Map service error codes to gRPC codes
var grpcCode codes.Code
switch serviceErr.Code {
case errors.ErrorCodeValidation:
grpcCode = codes.InvalidArgument
case errors.ErrorCodeNotFound:
grpcCode = codes.NotFound
case errors.ErrorCodeUnauthorized:
grpcCode = codes.Unauthenticated
case errors.ErrorCodeTimeout:
grpcCode = codes.DeadlineExceeded
case errors.ErrorCodeServiceDown:
grpcCode = codes.Unavailable
default:
grpcCode = codes.Internal
}
// Create status with details
st := status.New(grpcCode, serviceErr.Message)
detailed, detailErr := st.WithDetails(&errdetails.ErrorInfo{
Reason: string(serviceErr.Code),
Domain: serviceErr.Service,
Metadata: map[string]string{
metaOperation: serviceErr.Operation,
metaRetryable: strconv.FormatBool(serviceErr.Retryable),
},
})
if detailErr != nil {
// Fall back to the plain status rather than dropping the error
return st.Err()
}
return detailed.Err()
}
// FromGRPCError converts a gRPC error back to a ServiceError
func FromGRPCError(err error) error {
st, ok := status.FromError(err)
if !ok {
return &errors.ServiceError{
Code: errors.ErrorCodeInternal,
Message: "unknown gRPC error",
Service: "grpc-client",
Operation: "call",
Retryable: false,
Cause: err,
}
}
// Extract error details if available
for _, detail := range st.Details() {
if info, ok := detail.(*errdetails.ErrorInfo); ok {
// A missing or malformed flag parses as false (not retryable)
retryable, _ := strconv.ParseBool(info.GetMetadata()[metaRetryable])
return &errors.ServiceError{
Code: errors.ErrorCode(info.GetReason()),
Message: st.Message(),
Service: info.GetDomain(),
Operation: info.GetMetadata()[metaOperation],
Retryable: retryable,
Cause: err,
}
}
}
// Fallback to mapping gRPC codes
var errorCode errors.ErrorCode
switch st.Code() {
case codes.InvalidArgument:
errorCode = errors.ErrorCodeValidation
case codes.NotFound:
errorCode = errors.ErrorCodeNotFound
case codes.Unauthenticated:
errorCode = errors.ErrorCodeUnauthorized
case codes.DeadlineExceeded:
errorCode = errors.ErrorCodeTimeout
case codes.Unavailable:
errorCode = errors.ErrorCodeServiceDown
default:
errorCode = errors.ErrorCodeInternal
}
return &errors.ServiceError{
Code: errorCode,
Message: st.Message(),
Service: "unknown",
Operation: "grpc-call",
Retryable: st.Code() == codes.Unavailable || st.Code() == codes.DeadlineExceeded,
Cause: err,
}
}
Circuit Breaker Integration
Circuit breakers are essential for preventing cascading failures in microservices architectures. Integrating them with your error handling patterns lets callers fail fast with a well-typed, retryable error while a downstream service recovers.
package circuitbreaker
import (
"context"
"log"
"time"
"github.com/sony/gobreaker"
"github.com/yourorg/common/errors"
)
type ServiceClient struct {
breaker *gobreaker.CircuitBreaker
client HTTPClient
}
func NewServiceClient(name string, client HTTPClient) *ServiceClient {
settings := gobreaker.Settings{
Name: name,
MaxRequests: 3,
Interval: 10 * time.Second,
Timeout: 30 * time.Second,
ReadyToTrip: func(counts gobreaker.Counts) bool {
failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
return counts.Requests >= 3 && failureRatio >= 0.6
},
// Non-retryable errors (validation, not found, and so on) say nothing about
// the health of the downstream service, so the breaker counts them as
// successes and they never trip it.
IsSuccessful: func(err error) bool {
if err == nil {
return true
}
var serviceErr *errors.ServiceError
return errors.As(err, &serviceErr) && !serviceErr.Retryable
},
OnStateChange: func(name string, from, to gobreaker.State) {
// Log state changes for monitoring
log.Printf("Circuit breaker %s changed from %s to %s", name, from, to)
},
}
return &ServiceClient{
breaker: gobreaker.NewCircuitBreaker(settings),
client: client,
}
}
func (c *ServiceClient) Call(ctx context.Context, request *Request) (*Response, error) {
result, err := c.breaker.Execute(func() (interface{}, error) {
// The error is returned unchanged; IsSuccessful decides whether it counts
// against the breaker.
return c.client.Do(ctx, request)
})
if err != nil {
// Handle circuit breaker specific errors
if err == gobreaker.ErrOpenState {
return nil, &errors.ServiceError{
Code: errors.ErrorCodeServiceDown,
Message: "service temporarily unavailable (circuit breaker open)",
Service: c.breaker.Name(),
Operation: "call",
Retryable: true,
Cause: err,
}
}
if err == gobreaker.ErrTooManyRequests {
return nil, &errors.ServiceError{
Code: errors.ErrorCodeServiceDown,
Message: "service overloaded (circuit breaker half-open)",
Service: c.breaker.Name(),
Operation: "call",
Retryable: true,
Cause: err,
}
}
return nil, err
}
return result.(*Response), nil
}
Distributed Tracing Integration
Modern microservices architectures require distributed tracing to understand error propagation across services. Integrating OpenTelemetry with your error handling provides comprehensive observability.
package tracing
import (
"context"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
"go.opentelemetry.io/otel/trace"
"github.com/yourorg/common/errors"
)
var tracer = otel.Tracer("microservice-errors")
// WithErrorTracing wraps a function with distributed tracing and error handling
func WithErrorTracing(ctx context.Context, operationName string, fn func(context.Context) error) error {
ctx, span := tracer.Start(ctx, operationName)
defer span.End()
err := fn(ctx)
if err != nil {
// Record error details in the span
var serviceErr *errors.ServiceError
if errors.As(err, &serviceErr) {
span.SetAttributes(
attribute.String("error.code", string(serviceErr.Code)),
attribute.String("error.service", serviceErr.Service),
attribute.String("error.operation", serviceErr.Operation),
attribute.Bool("error.retryable", serviceErr.Retryable),
)
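// Set span status based on error type
switch serviceErr.Code {
case errors.ErrorCodeInternal, errors.ErrorCodeServiceDown, errors.ErrorCodeTimeout:
// Server-side failures mark the span as failed
span.SetStatus(codes.Error, serviceErr.Message)
default:
// Client errors (validation, not found, unauthorized) are not server
// failures, so leave the status unset instead of forcing it to Ok
}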
} else {
span.SetStatus(codes.Error, err.Error())
}
span.RecordError(err)
}
return err
}
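// Enhanced service method with tracing. Go only allows methods to be declared
// in the package that defines the receiver type, so in a real codebase this
// method lives next to GetUser in the user service package.
func (s *UserService) GetUserWithTracing(ctx context.Context, userID string) (*User, error) {
// WithErrorTracing returns only an error, so capture the user outside the closure
var user *User
err := WithErrorTracing(ctx, "user.get", func(ctx context.Context) error {
var err error
user, err = s.GetUser(ctx, userID)
if err != nil {
return err
}
// Add user attributes to span. The email address is left out to keep
// personal data out of trace storage.
span := trace.SpanFromContext(ctx)
span.SetAttributes(
attribute.String("user.id", user.ID),
)
return nil
})
if err != nil {
return nil, err
}
return user, nil
}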
Error Response Patterns
Finally, you need consistent patterns for converting internal errors to client responses while maintaining security and providing useful information.
package httphandler
import (
"context"
"encoding/json"
"net/http"
// Assumption: the structured logger is logrus, which provides WithFields
log "github.com/sirupsen/logrus"
"github.com/yourorg/common/errors"
// Assumption: import path of the middleware package shown earlier
"github.com/yourorg/common/middleware"
)
type ErrorResponse struct {
Error struct {
Code string `json:"code"`
Message string `json:"message"`
TraceID string `json:"trace_id,omitempty"`
Retryable bool `json:"retryable"`
} `json:"error"`
}
func HandleError(w http.ResponseWriter, r *http.Request, err error) {
var serviceErr *errors.ServiceError
var response ErrorResponse
var statusCode int
if errors.As(err, &serviceErr) {
statusCode = serviceErr.HTTPStatusCode()
response.Error.Code = string(serviceErr.Code)
response.Error.Message = serviceErr.Message
response.Error.Retryable = serviceErr.Retryable
} else {
// Unknown error - don't expose internal details
statusCode = http.StatusInternalServerError
response.Error.Code = string(errors.ErrorCodeInternal)
response.Error.Message = "An internal error occurred"
response.Error.Retryable = false
}
// Add trace ID if available (checked assertion, so a non-string value cannot panic)
if traceID, ok := r.Context().Value(middleware.TraceIDKey).(string); ok {
response.Error.TraceID = traceID
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(statusCode)
if encodeErr := json.NewEncoder(w).Encode(response); encodeErr != nil {
log.WithError(encodeErr).Warn("failed to write error response")
}
// Log error for monitoring (with full context)
logError(r.Context(), err, statusCode)
}
func logError(ctx context.Context, err error, statusCode int) {
// Extract tracing information
traceID, _ := ctx.Value(middleware.TraceIDKey).(string)
correlationID, _ := ctx.Value(middleware.CorrelationIDKey).(string)
// Log with structured information
log.WithFields(log.Fields{
"error": err.Error(),
"status_code": statusCode,
"trace_id": traceID,
"correlation_id": correlationID,
}).Error("HTTP request failed")
}
Best Practices and Security Considerations
When implementing these error handling patterns, follow these best practices:
Security First: As highlighted in the JetBrains secure error handling guide, ensure that vulnerabilities in third-party components cannot be introspected through error messages. Never expose internal system details in client-facing errors.
Consistent Error Codes: Use consistent error codes across all services. This makes it easier for clients to handle errors programmatically and enables better monitoring and alerting.
Retryable vs Non-Retryable: Clearly distinguish between errors that can be retried and those that cannot. This prevents clients from wasting resources on futile retry attempts.
Context Preservation: Always preserve error context when wrapping errors, but be selective about what information crosses service boundaries.
Conclusion
Effective error handling in Go microservices requires thoughtful design and consistent patterns. By implementing structured error types, proper context propagation, and integration with observability tools, you can build resilient systems that provide meaningful debugging information while maintaining security and performance.
The patterns outlined in this post provide a foundation for handling errors across distributed systems. As noted in the comprehensive guide to Go error handling, “the errors package provides a powerful set of tools for handling errors in Go” - and when combined with microservices-specific patterns, these tools become even more powerful.
Remember that error handling is not just about catching and returning errors - it’s about building systems that can gracefully handle failure modes and provide operators with the information they need to maintain system health.