Deployment Guide

Overview

The Telco Service Management (tSM) application is a robust, microservice-based system designed for complex corporate environments. This guide provides a detailed overview of recommended deployment strategies, microservice dependencies, and methods for managing zero-downtime and downtime deployments.

We have extensive experience deploying the tSM application in large organizations, adhering to strict corporate standards and governance models. Our deployment processes involve full collaboration with Level 2 (L2) support teams for deployment, monitoring, and maintenance, ensuring that operational teams have complete visibility and control over the application.

Deployment Scenarios

This document outlines various deployment scenarios, including:

  1. Zero-Downtime Deployments for routine updates or adding new services.
  2. Downtime Deployments for major infrastructure changes or incompatible schema modifications.
  3. CI/CD Pipelines for automating deployments using GitLab and ArgoCD.
  4. Customer-Specific Deployment Flows to adapt to unique infrastructure requirements.
  5. On-Premises Deployments using Virtual Machines (VMs) for environments without Kubernetes.

1. Microservice Dependencies in tSM

The tSM application comprises several microservices that interact closely, with dependencies arising from shared data, synchronous API calls, or message queues like Kafka. Understanding these dependencies is crucial for managing startup and shutdown sequences without causing data loss or service disruptions.

Core Infrastructure Services

Core infrastructure services provide the foundation for the entire application and should be the first to start and the last to shut down. Examples include:

  1. Database (PostgreSQL, MSSQL, Oracle): Stores persistent state and configuration data.
  2. Kafka: Manages asynchronous communication and event-driven interactions.
  3. Elasticsearch: Provides support for full-text search and logging.
  4. Monitoring and Logging Tools: Prometheus, Grafana, Elasticsearch, and Kibana provide visibility into service health and performance.

Primary Backend Services

Primary backend services are central to the tSM application and must be operational for other services to function correctly. Examples include:

  • Configuration Server: Provides dynamic configuration for other microservices.
  • API Gateway: Acts as the main entry point for external API requests.
  • User Management Service: Manages user data and authentication.
  • Management Service: Oversees the overall system health and scaling.

Dependent Business Services

Business services rely on primary backend services to provide domain-specific logic. Examples include:

  • Catalog Service: Manages product and service catalogs.
  • Inventory Service: Tracks inventory states.
  • Order Service: Manages order lifecycles and workflows.
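
The three tiers above imply a fixed startup order: core infrastructure first, then primary backend services, then business services. A minimal shell sketch of that ordering is shown below; the service names and the `start_service` placeholder are illustrative assumptions, not the actual tSM service identifiers.

```shell
# Tiered startup sketch; service names are illustrative assumptions.
core_services="postgresql kafka elasticsearch"
backend_services="config-server api-gateway user-management management-service"
business_services="catalog-service inventory-service order-service"

start_service() {
  # Placeholder: a real environment would call systemctl, kubectl, etc.
  echo "starting $1"
}

# Iterating the tiers in this order enforces the dependency sequence;
# shutdown would walk the same lists in reverse.
for svc in $core_services $backend_services $business_services; do
  start_service "$svc"
done
```

Shutting down in the reverse order (business, backend, core) avoids in-flight requests hitting a dependency that has already stopped.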

2. Deployment Flow

We have developed a streamlined deployment flow to ensure seamless integration into corporate environments. This flow includes:

CI/CD Pipelines (GitLab)

Our deployment pipelines are managed by GitLab CI/CD. The process begins by building the application and pushing Docker images to the Harbor container registry. The deployment pipeline then uses ArgoCD for automated synchronization and deployment into the Kubernetes cluster, ensuring that the correct version of the application is deployed and managed according to the desired state.

Key Steps:

  1. Build: Application builds are triggered in the GitLab CI/CD pipelines, including running unit tests, integration tests, and packaging the application into Docker images.
  2. Push to Harbor: The newly built Docker images are pushed to the Harbor container registry.
  3. Deployment: ArgoCD pulls Docker images from Harbor and deploys them into the Kubernetes cluster.
  4. Rollback: ArgoCD supports rollback mechanisms in case of issues.
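
The build → push → deploy steps above might be expressed as a GitLab CI/CD pipeline roughly as follows. This is a config sketch, not the actual tSM pipeline: the stage layout, registry hostname, image name, and ArgoCD application name are all assumptions.

```yaml
# Illustrative .gitlab-ci.yml fragment; registry URL, image name,
# and ArgoCD application name are placeholders.
stages:
  - build
  - push
  - deploy

build:
  stage: build
  script:
    - ./mvnw verify                                # unit and integration tests
    - docker build -t tsm/order-service:$CI_COMMIT_SHORT_SHA .

push:
  stage: push
  script:
    - docker tag tsm/order-service:$CI_COMMIT_SHORT_SHA harbor.example.com/tsm/order-service:$CI_COMMIT_SHORT_SHA
    - docker push harbor.example.com/tsm/order-service:$CI_COMMIT_SHORT_SHA

deploy:
  stage: deploy
  script:
    - argocd app sync tsm-order-service            # ArgoCD reconciles cluster state
```

Rollback in this model is a matter of pointing ArgoCD back at a previous, known-good revision and letting it re-sync.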

Customer-Side Deployment Flow

We follow a standardized deployment flow for customer-specific environments, adapting to their infrastructure, security policies, and internal processes. This includes:

  1. Pull Docker Images: From the Harbor registry.
  2. Push to Customer Registry: If applicable, to the customer’s private container registry.
  3. Deployment Creation: Configuration files are created and updated using GitLab CI/CD pipelines.
  4. Kubernetes Synchronization: The state of the Kubernetes cluster is synchronized using ArgoCD.
  5. Automated Smoke Testing: Automated smoke tests validate the deployment.
  6. Handover to L2 Support: The deployment is handed over with detailed runbooks and monitoring dashboards.
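
Steps 1–5 above can be sketched as a short script. The registry hostnames, image name, and health endpoint below are placeholders, and the `run` helper prints commands instead of executing them (a dry-run guard) so the sequence can be reviewed safely.

```shell
# Sketch of mirroring an image into a customer registry and syncing the
# cluster; hostnames and names are assumptions, not real endpoints.
SRC_REGISTRY="harbor.example.com"
DST_REGISTRY="registry.customer.internal"
IMAGE="tsm/order-service:1.2.3"

run() {
  # Dry-run guard: print the command unless DRY_RUN=0.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run docker pull "$SRC_REGISTRY/$IMAGE"
run docker tag "$SRC_REGISTRY/$IMAGE" "$DST_REGISTRY/$IMAGE"
run docker push "$DST_REGISTRY/$IMAGE"
run argocd app sync tsm-order-service
run curl -fsS https://tsm.customer.internal/actuator/health   # smoke test
```

Setting `DRY_RUN=0` would execute the commands for real; keeping the default makes the script a reviewable runbook step.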

On-Premises Deployment with Virtual Machines

For customers deploying the tSM application on-premises without Kubernetes and High Availability (HA), services are run manually on Virtual Machines (VMs). This approach is suitable for environments where container orchestration platforms are not available or preferred.

Key Steps:

  1. Install Required Dependencies: Ensure that all necessary dependencies (e.g., Java, databases) are installed on the VMs.
  2. Deploy Services Manually: Copy application binaries or Docker images to the VMs.
  3. Run Services as Systemd Services: Configure each microservice to run as a systemd service for manageability.
  4. Configure Services: Update configuration files to match the on-premises environment.
  5. Start Services in Order: Start core infrastructure services first, followed by primary backend services, then dependent business services.
  6. Monitoring and Logging: Set up monitoring and logging tools as per requirements.

Note: In this setup, High Availability features like automatic failover and load balancing are not available. Manual intervention is required for service restarts and maintenance.
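
For step 3 above, a microservice can be wrapped as a systemd unit along these lines. The unit name, user, paths, and dependency on PostgreSQL are illustrative assumptions for a single-VM layout.

```ini
# /etc/systemd/system/tsm-order-service.service (illustrative names and paths)
[Unit]
Description=tSM Order Service
After=network.target postgresql.service

[Service]
User=tsm
EnvironmentFile=/etc/tsm/order-service.env
ExecStart=/usr/bin/java -jar /opt/tsm/order-service.jar
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After placing the unit file, `systemctl daemon-reload` followed by `systemctl enable --now tsm-order-service` registers and starts the service; `Restart=on-failure` gives a minimal substitute for the automatic recovery that Kubernetes would otherwise provide.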

3. Deployment Strategies

3.1 Zero-Downtime Deployments

Zero-downtime deployments are the standard approach for routine updates. This strategy uses Kubernetes services and built-in load balancers to manage traffic routing automatically. New service instances are started and integrated into the application without service interruptions.

  • Rolling Updates: Kubernetes starts new instances and terminates old ones automatically; once readiness probes pass, traffic shifts to the new instances with no manual checks required.
  • Service Discovery: New instances register with the tSM application's service discovery mechanism.
  • Flyway for Schema Changes: Flyway scripts handle database schema changes automatically.
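
The rolling-update behavior described above is driven by the Deployment's update strategy. The fragment below is a sketch under assumed names, image, and port; `maxUnavailable: 0` is what guarantees that capacity never drops during the rollout.

```yaml
# Deployment manifest fragment; name, image, and limits are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tsm-order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tsm-order-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # start one new pod before removing an old one
      maxUnavailable: 0    # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: tsm-order-service
    spec:
      containers:
        - name: order-service
          image: harbor.example.com/tsm/order-service:1.2.3
          readinessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
```

The readiness probe is what gates traffic: a new pod receives requests only after its health endpoint responds, which is the precondition for zero-downtime rollouts.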

3.2 Downtime Deployments

Downtime deployments are necessary for major infrastructure changes. Flyway scripts still handle schema updates, but the deployment must be performed in a controlled maintenance window.

  • Full Stop and Start: All services are stopped, infrastructure is updated, and services are restarted.
  • Incompatible Schema Changes: Handled by Flyway during the maintenance window.
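
A maintenance-window run might follow the sequence below. The namespace, database URL, and Flyway invocation are assumptions, and the `run` helper prints commands rather than executing them so the window plan can be reviewed beforehand.

```shell
# Dry-run sketch of a downtime deployment window; names are placeholders.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run kubectl scale deployment --all --replicas=0 -n tsm   # full stop
run flyway -url=jdbc:postgresql://db/tsm migrate         # incompatible schema changes
run kubectl scale deployment --all --replicas=3 -n tsm   # restart (real runs restore per-service counts)
```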

4. Deployment Best Practices

  • Collaboration with Internal IT: We align with internal teams to ensure that deployment processes follow corporate security standards.
  • Compliance with Security Policies: Secure communication between services using TLS/SSL, Role-Based Access Control (RBAC) for Kubernetes resources, and container security.
  • Integration with L2 Support: Ongoing support and monitoring via Grafana and Prometheus dashboards.

5. Example Scenarios for tSM Maintenance

Scenario 1: Zero-Downtime Update of a Business Logic Service

  1. Deploy a New Instance: Initiate the deployment of the new service version.
  2. Automatic Database Migration: Flyway scripts run any new database migration scripts.
  3. Automated Health Checks: Kubernetes automatically checks the health of the new instance.
  4. Traffic Routing: Traffic is automatically routed to the new instance.
  5. Graceful Termination: The old instance is terminated gracefully.

Scenario 2: Downtime Maintenance on Kafka

  1. Stop Dependent Services: Stop services that use Kafka.
  2. Perform Maintenance: Execute the required Kafka maintenance tasks.
  3. Restart Services: Restart the services and validate functionality.

Scenario 3: On-Premises Deployment Without HA

  1. Manually Stop the Service: Stop the service on the VM.
  2. Update Service Artifacts: Update the service binaries or Docker image.
  3. Apply Configuration Changes: Update configuration files as needed.
  4. Run Database Migrations: Execute database migration scripts manually if necessary.
  5. Start the Service: Start the service and check logs for successful startup.
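
On a single VM, the five steps above might look like the following. The service name, artifact paths, and Flyway call are assumptions, and the `run` helper echoes each command (dry-run) so the sequence doubles as a runbook.

```shell
# Dry-run sketch of updating one service on a VM; names and paths are placeholders.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run systemctl stop tsm-order-service                         # 1. stop the service
run cp /tmp/order-service.jar /opt/tsm/order-service.jar     # 2. update artifacts
run cp /tmp/order-service.env /etc/tsm/order-service.env     # 3. apply config changes
run flyway -url=jdbc:postgresql://db/tsm migrate             # 4. run migrations
run systemctl start tsm-order-service                        # 5. start the service
run journalctl -u tsm-order-service --no-pager -n 50         #    check startup logs
```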

6. Conclusion

Proper deployment planning ensures stable operations in complex environments. By following these guidelines, teams can leverage zero-downtime deployments for routine updates while minimizing disruption during major upgrades. For on-premises environments without HA, manual deployment processes are outlined to ensure services run effectively.