Microservice Dependencies

Overview

In tSM's microservice-based architecture, managing the startup and shutdown sequence of individual services is crucial to maintaining system stability, avoiding data loss, and ensuring that services are ready to process requests in the correct order. This guide provides a high-level overview of the dependencies between tSM services and the recommended startup and shutdown sequences.

1. Understanding tSM Microservice Dependencies

The tSM platform consists of several closely interacting microservices. Their dependencies arise from shared data, synchronous API calls, and asynchronous messaging through Kafka. Understanding these dependencies is vital to prevent services from becoming unavailable or data from being lost. A well-defined sequence ensures that services start correctly and stop gracefully without disrupting the overall system.

Graceful Shutdown and Startup

Each service is designed to support graceful shutdown: before stopping, it completes in-flight requests, closes active database connections, and releases its resources. This prevents data loss and avoids disrupting dependent services.

Signal Handling: Ensure each microservice correctly handles termination signals (e.g., SIGTERM).
Message Queues: Stop fetching new messages, finish processing the messages already fetched, and commit their offsets before shutting down Kafka consumers.
Database Connections: Let in-flight transactions complete, then close active database connections cleanly so that no work is left half-finished.

2. Deployment Scenarios

In a tSM deployment, there are two main scenarios to consider:

2.1 Zero-Downtime Deployments

Zero-downtime deployments are the standard approach for routine updates, for introducing new microservices, and for backward-compatible changes to existing microservices. The tSM platform combines Kubernetes services with built-in load balancing to route traffic automatically, so new service instances are started and integrated into the platform without interrupting service.

Rolling Updates: New instances are started, health-checked, and added to the load balancer automatically. Once the new instances are verified as healthy, the old instances are terminated gracefully. This ensures continuous availability without manual traffic switching.
Service Discovery: When a new instance starts, it registers itself with the tSM platform’s service discovery mechanism. Traffic is routed only to healthy instances.
Flyway Script Usage: Database schema changes are managed using Flyway scripts. When a new service version is deployed, Flyway checks which migration scripts have already been applied to the database and automatically applies any pending ones. For backward-compatible schema changes, this requires no downtime.
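The version check Flyway performs can be illustrated with a small Python sketch. Flyway names versioned migrations V<version>__<description>.sql and records applied ones in its schema history table (flyway_schema_history by default); in the simple case, a migration is pending when its version is newer than the latest applied one. The functions below are hypothetical and deliberately simplify Flyway's rules: they ignore repeatable migrations, checksum validation, and out-of-order settings.

```python
import re

# Flyway-style versioned migration file name: V<version>__<description>.sql
MIGRATION_NAME = re.compile(r"^V(?P<version>\d+(?:[._]\d+)*)__(?P<desc>.+)\.sql$")


def version_tuple(version):
    # "1.10" -> (1, 10): compare numerically, so 1.10 sorts after 1.2
    return tuple(int(part) for part in re.split(r"[._]", version))


def parse_version(file_name):
    match = MIGRATION_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a versioned migration: {file_name}")
    return version_tuple(match.group("version"))


def pending_migrations(available_files, applied_versions):
    """Return files newer than the latest applied version, in the order they would run."""
    current = max((version_tuple(v) for v in applied_versions), default=())
    newer = [f for f in available_files if parse_version(f) > current]
    return sorted(newer, key=parse_version)
```

For example, with V1 and V1.1 already applied, the files V1.2__add_status.sql, V1.10__add_index.sql and V2__rename_column.sql would be reported as pending in that order, because versions are compared part by part as numbers rather than as strings.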

Example: Deploying a new version of the Order Management Service:

Deploy a new instance of the service.
Flyway automatically runs any new database migration scripts, updating the schema without downtime.
Confirm that the new instance is healthy and has registered itself with service discovery.
The load balancer redirects traffic to the new instance.
Shut down the old instance gracefully.
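The ordering in these steps can be sketched as follows: traffic moves to a new instance only after it passes its health check, and the old instance is stopped only after that. In Kubernetes, the Deployment controller's RollingUpdate strategy together with readiness probes performs this automatically; the Instance class and the start, health-check and stop callbacks below are hypothetical stand-ins, not tSM APIs.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare instances by identity, not by field values
class Instance:
    """Hypothetical handle for one running service instance."""
    name: str
    version: str
    healthy: bool = False


def rolling_update(pool, new_version, start_instance, is_healthy, stop_instance):
    """Replace every instance in `pool` with one running `new_version`, one at a time.

    `pool` is the list of instances currently receiving traffic. The three
    callbacks are hypothetical hooks into whatever orchestrates the instances.
    """
    for old in list(pool):                 # iterate over a snapshot; pool is mutated below
        new = start_instance(new_version)
        if not is_healthy(new):
            stop_instance(new)             # discard the failed instance, keep the old ones
            raise RuntimeError(f"{new.name} failed its health check; rollout halted")
        pool.append(new)                   # route traffic to the new instance first,
        pool.remove(old)                   # then take the old one out of rotation,
        stop_instance(old)                 # and only then terminate it gracefully
    return pool
```

Because a new instance joins the pool before an old one leaves, the number of instances serving traffic never drops below its starting size, and a failed health check halts the rollout with the old version still serving.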

2.2 Downtime Deployments

Downtime deployments are required for major infrastructure changes (e.g., upgrading the database engine or performing cluster maintenance) or when incompatible changes to service interfaces make a zero-downtime deployment impossible. Flyway scripts still handle schema updates, but the deployment must be performed in a controlled maintenance window.

Full Stop and Start: All services are stopped, the infrastructure is updated, and the services are restarted in the correct order.
Incompatible Schema Changes: If schema changes cannot be applied without downtime, Flyway executes the scripts during the maintenance window, while the affected services are stopped, to keep the schema and the services consistent.

Example: Upgrading the underlying database engine or performing cluster maintenance:

Schedule a maintenance window and inform users about the expected downtime.
Stop all affected services in the correct shutdown order.
Perform the database or infrastructure upgrade.
Use Flyway scripts to apply the required schema changes automatically.
Restart the services in the correct startup order.

3. Service Dependencies

Core Infrastructure Services

Core infrastructure services should be the first to start and the last to shut down in a tSM deployment. These services provide fundamental capabilities, such as storage, messaging, and monitoring, for the entire platform. Examples include:

Database (Postgres, MSSQL, Oracle): Stores persistent state and configuration data used across the platform.
Kafka: Manages asynchronous communication and event-driven interactions between services.
Elasticsearch: Provides support for full-text search and logging.
Monitoring and Logging Tools: Tools like Prometheus, Grafana, Log Elasticsearch, and Kibana provide visibility into service health and performance metrics.

Primary Backend Services

Primary backend services form the core of the tSM platform. Other microservices depend on them to function correctly, so they should be started only after the core infrastructure services are up and running. The primary services are:

Configuration Server: Provides centralized configuration for all other microservices. Must be available before starting services that depend on dynamic configuration.
API Gateway: Serves as the main entry point for API requests and manages routing, security, and rate limiting.
User Management Service: Manages user data, authentication, and authorization.
Config & Forms Service: Manages system configuration.
Management Service: Provides a system overview and management functions.

Dependent Business Services

Business services in the tSM platform often rely on the primary backend services. They implement domain-specific logic, manage workflows, and integrate with external systems. Examples include:

Catalog: Manages product and service catalogs.
Inventory Service: Tracks inventory and asset states.
Order Service: Manages order lifecycles and related processes.
Integration Services: Facilitate communication with third-party systems and APIs.
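The three groups above are a coarse form of a dependency graph. If the dependencies are written down explicitly, a topological sort derives the startup order, and reversing it gives the shutdown order. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the service names and every dependency edge in DEPENDENCIES are illustrative assumptions, not tSM's actual dependency map.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: service -> services it needs before it can start.
DEPENDENCIES = {
    # core infrastructure
    "database": set(),
    "kafka": set(),
    "elasticsearch": set(),
    # primary backend services
    "config-server": {"database"},
    "api-gateway": {"config-server"},
    "user-management": {"database", "config-server"},
    "config-forms": {"database", "config-server"},
    "management": {"config-server"},
    # business services
    "catalog": {"database", "kafka", "user-management"},
    "inventory": {"database", "kafka", "catalog"},
    "order": {"database", "kafka", "catalog", "inventory"},
}


def startup_waves(dependencies):
    """Group services into waves; every wave depends only on earlier waves."""
    sorter = TopologicalSorter(dependencies)
    try:
        sorter.prepare()
    except CycleError as exc:
        # args[1] holds the nodes that form the detected cycle
        raise ValueError(f"circular service dependency: {exc.args[1]}") from exc
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies of these are already up
        waves.append(ready)
        sorter.done(*ready)
    return waves


def shutdown_waves(dependencies):
    """Shutdown is the startup order reversed: dependents stop before what they use."""
    return list(reversed(startup_waves(dependencies)))
```

With this map, the first wave is database, elasticsearch and kafka, the second is config-server, and order comes last. Services inside one wave do not depend on each other, so they can be started in parallel. Running the check whenever a new service is added also catches circular dependencies early.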

4. Recommended Startup and Shutdown Order for tSM Microservices

Startup Order

Core Infrastructure Services: Start databases, message brokers, and monitoring tools to ensure the foundation is ready.
Primary Backend Services: Start the Configuration Server first, followed by the API Gateway, User Management, and the other primary services.
Business and Integration Services: Start services that provide business functionalities or integrate with external systems.

Shutdown Order

For a safe and effective shutdown of tSM microservices, the shutdown order should be the reverse of the startup order:

Business and Integration Services: Shut down these services first to ensure that no new operations are started.
Primary Backend Services: Stop the API Gateway, User Management, and related services, leaving the Configuration Server until last.
Core Infrastructure Services: Finally, stop databases, message brokers, and monitoring tools.
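Starting a tier is not enough on its own: the next tier should start only once every service in the current tier reports ready. A minimal Python sketch of that loop follows, with shutdown simply walking the tiers in reverse. The start, stop and probe callbacks are hypothetical hooks (a probe might, for example, query a service's health endpoint), and the timeout and polling interval are arbitrary example values.

```python
import time


def wait_until_ready(service, probe, timeout_s=120.0, interval_s=2.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `probe(service)` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while not probe(service):
        if clock() >= deadline:
            raise TimeoutError(f"{service} not ready after {timeout_s} s")
        sleep(interval_s)


def start_in_order(tiers, start, probe, **wait_options):
    """Start each tier, then block until all of its services are ready."""
    for tier in tiers:
        for service in tier:
            start(service)                 # services within a tier may start together
        for service in tier:
            wait_until_ready(service, probe, **wait_options)


def stop_in_order(tiers, stop):
    """Stop tiers in reverse startup order: business services first, infrastructure last."""
    for tier in reversed(tiers):
        for service in tier:
            stop(service)
```

Passing clock and sleep as parameters keeps the timeout logic testable without real waiting. A TimeoutError stops the startup at the failing tier instead of letting dependent services start against a component that is not ready.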

5. Example Scenarios for tSM Maintenance

Scenario 1: Zero-Downtime Update of a Business Logic Service

When updating a business service (e.g., Order Management Service) using a rolling update:

Deploy a new instance of the service.
Flyway automatically runs any new database migration scripts.
Confirm that the new instance is healthy and has registered itself.
Traffic is automatically routed to the new instance by the load balancer.
Shut down the old instance gracefully.

Scenario 2: Downtime Maintenance on Kafka in a tSM Deployment

For planned maintenance on a Kafka cluster used by tSM:

Stop the services that produce messages to or consume messages from Kafka, to prevent data loss and inconsistent state.
Perform the maintenance on Kafka.
Restart the services and verify that they resume consuming from their committed offsets and producing messages correctly.
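Stopping consumers cleanly works because Kafka stores, per consumer group, the committed offset of the next record to read, and a restarted consumer resumes from there. The in-memory Python sketch below imitates that behaviour to show why committing only after processing avoids data loss. PartitionLog, Consumer and consume_until_empty are hypothetical and are not the API of any Kafka client library. On the producer side, the Kafka producer clients for Java and Python provide flush() to send buffered records before a service stops.

```python
class PartitionLog:
    """In-memory stand-in for a single Kafka topic partition."""

    def __init__(self):
        self.records = []
        self.committed = {}  # consumer group -> offset of the next record to read

    def append(self, value):
        self.records.append(value)


class Consumer:
    """Hypothetical consumer that tracks its position and commits explicitly."""

    def __init__(self, log, group):
        self.log = log
        self.group = group
        # Resume from the group's committed offset. This sketch starts at 0 when
        # none exists; real clients follow the auto.offset.reset setting instead.
        self.position = log.committed.get(group, 0)

    def poll(self, max_records=10):
        batch = self.log.records[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        self.log.committed[self.group] = self.position


def consume_until_empty(consumer, handle, max_records=10):
    """Process every available record, committing after each handled batch."""
    while True:
        batch = consumer.poll(max_records)
        if not batch:
            return
        for record in batch:
            handle(record)
        consumer.commit()  # commit only once the whole batch has been handled
```

If a service is stopped after poll() but before commit(), the uncommitted records are delivered again after the restart. Nothing is lost, but those records may be processed twice, which is why message handlers should be idempotent and why consumers are best stopped after their last commit.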