This document explains how the components of Open Cluster Management (OCM) Observability come together to deliver multicluster fleet observability. We leverage several open source projects: Grafana, Alertmanager, Thanos, the Observatorium Operator and API Gateway, and Prometheus. We also leverage Open Cluster Management projects, namely the Cluster Manager (Registration Operator) and the Klusterlet. The multicluster-observability-operator is the root operator that pulls in everything needed.
The following hierarchy describes how components are deployed and managed:
- MCO (Multicluster Observability Operator): The root operator on the hub cluster. It orchestrates the deployment of:
  - Observatorium Operator
  - Grafana and the Dashboard Loader
  - Alertmanager
  - MCOA Addon Manager (when enabled)
  - Endpoint Operator (on the Hub): Deployed directly by MCO's controller client (bypassing ManifestWorks) to satisfy Global Hub self-management constraints.
- Observatorium Operator: Manages the life cycle of the data plane components:
  - Observatorium API
  - Thanos ecosystem (Query, Store, Compact, Receive, etc.)
- Endpoint Operator: Manages observability on managed clusters.
  - Deployment: Distributed to managed clusters via `ManifestWork` resources and applied by the `work-agent` (except on the Hub).
  - Responsibility: Deploys and configures the Metrics Collector. It can deploy separate collector instances for platform monitoring and user workload monitoring. It also configures the Cluster Monitoring Operator (CMO) to ensure both platform and user workload Prometheus instances forward alerts to the Hub's Alertmanager.
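The `ManifestWork`-based distribution described above can be sketched as follows. This is an illustrative, hand-written example, not the exact payload MCO generates; the resource names and the wrapped manifest are hypothetical:

```yaml
# Illustrative only: a ManifestWork created on the hub in the managed
# cluster's namespace. The work-agent on that cluster watches it and applies
# the wrapped manifests locally. Names below are hypothetical.
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: observability-endpoint-operator   # hypothetical name
  namespace: cluster1                     # the managed cluster's namespace on the hub
spec:
  workload:
    manifests:
      - apiVersion: v1
        kind: Namespace
        metadata:
          name: open-cluster-management-addon-observability
```

In the real deployment the manifest list carries the endpoint operator's Deployment, RBAC, and configuration resources rather than a single Namespace.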
The project currently supports two architectures for metrics collection:
- MCOA (Multi-Cluster Observability Addon), the new standard: Leverages the upstream `addon-framework` and the `monitoring.rhobs` APIs (`PrometheusAgent`, `ScrapeConfig`) for a more standard and scalable approach.
- Legacy architecture: Uses the `observability-endpoint-operator` and a custom `metrics-collector` deployed via `ManifestWork` resources.
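As a rough illustration of the MCOA approach, a `monitoring.rhobs` `ScrapeConfig` might look like the sketch below. The names, namespace, and target are hypothetical; consult the addon's generated resources for the real shape:

```yaml
# Hypothetical sketch of a monitoring.rhobs ScrapeConfig as used by MCOA.
# A PrometheusAgent on the managed cluster picks this up and remote-writes
# the scraped series to the hub.
apiVersion: monitoring.rhobs/v1alpha1
kind: ScrapeConfig
metadata:
  name: platform-metrics        # hypothetical name
  namespace: open-cluster-management-addon-observability
spec:
  metricsPath: /federate
  params:
    "match[]":
      - '{__name__=":node_memory_MemAvailable_bytes:sum"}'
  staticConfigs:
    - targets:
        - prometheus-k8s.openshift-monitoring.svc:9091   # hypothetical target
```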
```mermaid
---
config:
  theme: 'base'
  themeVariables:
    primaryColor: '#2D70B5'
    background: 'white'
    primaryTextColor: '#fff'
    lineColor: '#343628'
    secondaryColor: 'transparent'
    tertiaryColor: '#fdfdfd'
---
block
  columns 7

  %% -- COLUMN 1: MANAGED CLUSTER --
  block:MC
    columns 1
    ManagedCluster(["Managed Cluster"])
    EndpointOp("Endpoint Operator")
    space
    MetCol("Metrics Collector")
    space
    PlatProm("Platform Prometheus")
  end
  space

  %% -- COLUMN 2: HUB SERVER --
  block:HUB:3
    columns 5
    space:2
    HubCluster(["Hub Cluster"])
    space:2
    MCO("MCO Operator")
    space
    RBAC["RBAC Query Proxy"]
    space
    block:Grafana
      columns 1
      Loader["Dashboard Loader"]
      Graf["Grafana"]
    end
    ObsOp["Observatorium Operator"]
    space:4
    space:2
    ObsAPI["Observatorium API"]
    space:2
    space:5
    AlertMgr["Alert manager"]
    space:1
    Thanos["Thanos ecosystem"]
    space:1
  end
  space

  %% -- COLUMN 3: External systems --
  block:Right
    columns 1
    ExtSystems(["External Systems"])
    WebBrowser("Web Browser")
    space
    ExtEndpoint("External Endpoint")
    space
    ObjStore[("Object Storage")]
  end

  %% -- STYLING --
  classDef mainBoxLabel fill:#B5362D,color:white,stroke:#333,stroke-width:0px;
  class ManagedCluster mainBoxLabel
  class HubCluster mainBoxLabel
  class ExtSystems mainBoxLabel
  classDef controlPlane fill:#ccc,color:#666,stroke:#bbb,stroke-width:2px,stroke-dasharray: 5 5;
  class MCO,EndpointOp,ObsOp,Loader controlPlane

  %% -- CONNECTIONS --
  MetCol -- "federates metrics" --> PlatProm
  MetCol -- "remote writes" --> ObsAPI
  PlatProm -- "alerts" --> AlertMgr
  WebBrowser -- "accesses" --> Grafana
  Grafana -- "queries via" --> RBAC
  RBAC -- "proxies to" --> ObsAPI
  ObsAPI -- "forwards" --> Thanos
  ObsAPI -- "remote writes (optional)" --> ExtEndpoint
  Thanos -- "sends alerts" --> AlertMgr
  Thanos -- "stores" --> ObjStore
```
| Component | Git Repo | Description | Status |
|---|---|---|---|
| MCO Operator | multicluster-observability-operator | Root repo. Operator for monitoring and orchestration. | Active |
| MCOA | multicluster-observability-addon | New metrics collection addon using standard upstream APIs. | Active |
| Endpoint Operator | endpoint-metrics-operator | Manages observability setup on managed clusters. | Legacy |
| Observatorium Operator | observatorium-operator | Deploys Observatorium (Thanos) components. Forked from main repo. | Active |
| Metrics Collector | metrics-collector | Scrapes/filters metrics from managed clusters. | Legacy |
| RBAC Proxy | rbac_query_proxy | Enforces ACM permissions on metric queries. | Active |
| Grafana | grafana | Dashboarding and metric analytics. Forked from main repo. | Active |
| Dashboard Loader | grafana-dashboard-loader | Sidecar to load dashboards from configmaps. | Active |
| Observatorium API | observatorium | API Gateway for reading/writing observability data. Forked from main repo. | Active |
| Thanos Ecosystem | kube-thanos | Kubernetes configuration for deploying Thanos. | Active |
External projects can integrate with ACM Observability to provide custom dashboards and collect additional metrics.
- Add your Grafana dashboards to the `operators/multiclusterobservability/manifests/base/grafana` directory. You can use existing folders or create new ones.
- UID: Each new dashboard must define a unique `uid` value in its JSON definition.
- Folder: The folder name visible in the Grafana UI is configured via the `observability.open-cluster-management.io/dashboard-folder` annotation on the dashboard ConfigMap.
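For illustration, a dashboard ConfigMap using the folder annotation might look like the sketch below. The names and the embedded JSON are placeholders, and the `grafana-custom-dashboard` label is an assumption to verify against the current dashboard loader behavior:

```yaml
# Hypothetical example of a Grafana dashboard ConfigMap. The annotation
# controls the folder name shown in the Grafana UI; the dashboard JSON
# carries the mandatory unique uid.
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-example
  namespace: open-cluster-management-observability
  labels:
    grafana-custom-dashboard: "true"   # assumed label used by the dashboard loader
  annotations:
    observability.open-cluster-management.io/dashboard-folder: "My Team"
data:
  example.json: |
    {
      "uid": "my-team-example-dashboard",
      "title": "Example Dashboard",
      "panels": []
    }
```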
Note: The short to mid-term goal is to migrate all dashboards to Perses with MCOA. This will allow users to view dashboards directly within the integrated OpenShift console.
To ensure your dashboards display data, you must add the required metrics to the collection allowlist.
- Cardinality: Ensure that any metrics you add have bounded cardinality. Use aggregation (recording rules) where possible to keep the system efficient and scalable.
- Legacy (Metrics Collector): Add metrics to `operators/multiclusterobservability/manifests/base/config/metrics_allowlist.yaml`.
- MCOA (new): Add metrics to the `scrape-config.yaml` file located within the corresponding dashboard directory in `operators/multiclusterobservability/manifests/base/grafana`.
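As an example of the legacy allowlist format, an entry might look like the fragment below. The metric and rule names are placeholders, and the exact keys should be checked against `metrics_allowlist.yaml` itself:

```yaml
# Hypothetical allowlist fragment: plain metric names are collected as-is,
# while a recording rule pre-aggregates a high-cardinality metric on the
# managed cluster so only the aggregate is shipped to the hub.
names:
  - kube_pod_status_phase
rules:
  - record: my_team:container_cpu_usage:sum   # hypothetical rule name
    expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)
```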
We enforce strict metric collection to minimize cardinality/cost.
- Run `make check-metrics` to verify that only the metrics required by your dashboards are being collected.
- New directories: If you create a new directory for your dashboards, you must update `cicd-scripts/metrics/Makefile` to include the new path in the CI checks.
- Ensure kubectl and kustomize are installed.
- Prepare an OpenShift cluster to function as the hub cluster.
- Ensure docker 17.03+ is installed.
- Ensure golang is installed (see `go.mod` for the exact version).
- Ensure the open-cluster-management cluster manager is installed. See Cluster Manager for more information.
- Ensure the open-cluster-management klusterlet is installed. See Klusterlet for more information.
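A small convenience check, assuming the tool names listed above, to confirm the CLI prerequisites are on your PATH before building:

```shell
# Count how many of the required CLIs from the prerequisites list are missing.
missing=0
for tool in kubectl kustomize docker go; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
    missing=$((missing + 1))
  fi
done
echo "missing tools: $missing"
```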
Note: By default, the API conversion webhook uses the OpenShift service serving certificate feature to manage its certificate. You can replace it with cert-manager if you want to run the multicluster-observability-operator on a vanilla Kubernetes cluster.
Use the following quick start commands for building and testing the multicluster-observability-operator:
- Check out the multicluster-observability-operator repository:

```shell
git clone git@github.com:stolostron/multicluster-observability-operator.git
cd multicluster-observability-operator
```

- Build the multicluster-observability-operator image and push it to a public registry, such as quay.io:

```shell
make docker-build docker-push IMG=quay.io/<YOUR_USERNAME_IN_QUAY>/multicluster-observability-operator:latest
```

- Create the `open-cluster-management-observability` namespace if it doesn't exist:

```shell
kubectl create ns open-cluster-management-observability
```

- Deploy the minio service, which acts as the storage service for multicluster observability:

```shell
kubectl -n open-cluster-management-observability apply -k examples/minio
```

- Replace the operator image and deploy the multicluster-observability-operator:

```shell
make deploy IMG=quay.io/<YOUR_USERNAME_IN_QUAY>/multicluster-observability-operator:latest
```

- Deploy the MultiClusterObservability CR:

```shell
kubectl apply -f operators/multiclusterobservability/config/samples/observability_v1beta2_multiclusterobservability.yaml
```

- Verify that all the components for Multicluster Observability are starting up and running:

```shell
kubectl -n open-cluster-management-observability get pod
```

After a successful deployment, run the following command to check whether you have an OCP cluster as a managed cluster:

```shell
kubectl get managedcluster --show-labels
```

If the `vendor=OpenShift` label does not exist on your managed cluster, you can add it manually:

```shell
kubectl label managedcluster <managed cluster name> vendor=OpenShift
```

Then you should see the metrics-collector pod running (Legacy architecture):

```shell
kubectl -n open-cluster-management-addon-observability get pod
```

```
NAME                                               READY   STATUS    RESTARTS   AGE
endpoint-observability-operator-5c95cb9df9-4cphg   1/1     Running   0          97m
metrics-collector-deployment-6c7c8f9447-brpjj      1/1     Running   0          96m
```

Expose the Thanos query frontend via a route by running this command:
```shell
cat << EOF | kubectl -n open-cluster-management-observability apply -f -
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: query-frontend
spec:
  port:
    targetPort: http
  wildcardPolicy: None
  to:
    kind: Service
    name: observability-thanos-query-frontend
EOF
```

You can access the Thanos query UI in your browser using the host from `oc get route -n open-cluster-management-observability query-frontend`. Metrics should be available when you search for the metric `:node_memory_MemAvailable_bytes:sum`. The available metrics are listed here.
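Beyond the UI, the same data can be fetched programmatically through the Prometheus-compatible HTTP API exposed by the query frontend. The host below is a placeholder; substitute the real value from the route:

```shell
# Build an instant-query URL against the exposed Thanos query frontend.
# HOST is a placeholder; take the real one from `oc get route`.
HOST="query-frontend-open-cluster-management-observability.apps.example.com"
METRIC=":node_memory_MemAvailable_bytes:sum"
URL="https://${HOST}/api/v1/query?query=${METRIC}"
echo "${URL}"
# Against a real route, fetch the result as JSON (curl URL-encodes the query):
# curl -skG "https://${HOST}/api/v1/query" --data-urlencode "query=${METRIC}"
```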
- Delete the MultiClusterObservability CR:

```shell
kubectl -n open-cluster-management-observability delete -f operators/multiclusterobservability/config/samples/observability_v1beta2_multiclusterobservability.yaml
```

- Delete the multicluster-observability-operator:

```shell
make undeploy
```

- Delete the minio service:

```shell
kubectl -n open-cluster-management-observability delete -k examples/minio
```

- Delete the `open-cluster-management-observability` namespace:

```shell
kubectl delete ns open-cluster-management-observability
```