Operations & reliability
FortisX is designed to run as a long-lived, continuously operating platform that observes multiple validator-based networks and supports allocation and rebalancing decisions. This requires operational practices that focus not only on service availability, but also on data freshness, integrity, and the ability to explain behaviour over time.
This section describes the operational objectives and reliability model of FortisX: how the platform is monitored, how incidents and data issues are handled, how changes are introduced, and how continuity is maintained in the presence of dependencies such as networks, providers, and infrastructure.
Operational objectives
Operationally, FortisX is built around a small set of objectives:
Service availability — the core services responsible for ingest, analytics, risk modeling, policy evaluation, and external interfaces should remain available and responsive under normal conditions and degrade in a controlled manner under stress.
Data freshness — metrics and risk assessments should reflect network and validator conditions with bounded delay, subject to the capabilities of upstream data sources.
Data integrity and reproducibility — data used for analytics and policies should be internally consistent, and it should be possible to reconstruct past views using recorded inputs and model versions.
Transparent degradation — when external dependencies fail or degrade, the impact on FortisX should be visible, and affected outputs should be marked accordingly rather than silently treated as complete.
These objectives shape the design of monitoring, incident handling, and change management across the platform.
Availability and data freshness
FortisX distinguishes between two related but separate dimensions:
Service availability – whether core components (ingest, storage, analytics, risk modeling, policy engine, APIs) are reachable and performing within expected bounds.
Data freshness – how recent the underlying metrics and risk profiles are relative to current network conditions.
The platform tracks both dimensions.
For each supported network, FortisX maintains indicators of:
the most recent block, slot, or epoch observed;
the lag between network head and ingested data;
the recency of derived metrics and risk profiles.
For each core service, FortisX tracks:
request success rates and latencies;
internal queue depths and processing latencies;
resource utilisation and error rates.
Users and external systems can inspect both kinds of information, so they can distinguish between an unavailable service and a service that is available but operating with stale or degraded data for a particular network.
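As an illustrative sketch, the two dimensions can be tracked and classified independently; the names, fields, and thresholds below are assumptions for illustration, not part of FortisX's actual interfaces:

```python
from dataclasses import dataclass

# Illustrative thresholds; real values would be per-network configuration.
MAX_HEAD_LAG_BLOCKS = 10        # acceptable lag between network head and ingest
MAX_METRIC_AGE_SECONDS = 300    # acceptable age of the newest derived metrics

@dataclass
class NetworkDataStatus:
    network: str
    head_height: int            # most recent block/slot/epoch observed on-chain
    ingested_height: int        # most recent height fully ingested
    metric_age_seconds: float   # age of the newest derived metrics

    @property
    def head_lag(self) -> int:
        return self.head_height - self.ingested_height

    @property
    def freshness(self) -> str:
        """Classify data freshness independently of service availability."""
        if self.head_lag > MAX_HEAD_LAG_BLOCKS:
            return "stale"
        if self.metric_age_seconds > MAX_METRIC_AGE_SECONDS:
            return "degraded"
        return "fresh"

status = NetworkDataStatus("examplenet", head_height=1_000_000,
                           ingested_height=999_998, metric_age_seconds=42.0)
print(status.freshness)  # → fresh (the service can be up while data is stale)
```

Keeping the classification separate from service health checks is what lets a consumer tell "service down" apart from "service up, data stale".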
Monitoring and alerting
Monitoring is integrated into each layer of the architecture:
Ingest and data pipeline
connectivity and error rates for upstream sources (nodes, RPC providers, indexers);
progress metrics (for example, last processed height) per network and per collector;
validation and quality checks, such as detection of missing or inconsistent data segments.
Storage and aggregation
health of underlying storage systems;
success rates and performance of aggregation jobs;
capacity indicators and growth rates.
Analytics and risk modeling
timing and success of metric computation and risk profile updates;
detection of unexpected discontinuities or anomalies in derived metrics.
Policy engine and interfaces
frequency and success of policy evaluations;
volumes and latency of API calls and integration events;
error conditions in dashboards, APIs, and downstream delivery.
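A discontinuity check of the kind mentioned above can be as simple as comparing the latest value of a metric or data volume against its recent history. This sketch uses a z-score with an illustrative threshold; real checks would be per-metric and configurable:

```python
from statistics import mean, stdev

def looks_anomalous(history, latest, z_threshold=4.0):
    """Flag a sudden discontinuity in a derived metric or data volume by
    comparing the latest value against recent history. Deliberately simple;
    names and the threshold are illustrative."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

volumes = [1000, 1010, 990, 1005, 995]
print(looks_anomalous(volumes, 1002), looks_anomalous(volumes, 10))  # → False True
```

A collapse in ingested volume, as in the second call, often indicates an upstream outage rather than a genuine change in network activity.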
Alerting rules are defined for conditions such as:
sustained lag in data ingest for a network;
repeated failures in a collector or aggregation job;
anomalies in data volumes or distributions that may indicate upstream or internal issues;
degradation in API or dashboard responsiveness.
Alerts are routed to operational teams for investigation and, where necessary, escalation.
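For example, a "sustained lag" rule can require the condition to hold across an entire observation window before firing, so transient spikes do not page anyone. The class name, threshold, and window size below are illustrative, not FortisX's actual alerting configuration:

```python
from collections import deque

class SustainedLagAlert:
    """Fire only when ingest lag stays above a threshold for a full window
    of consecutive samples. Names and thresholds are illustrative."""

    def __init__(self, threshold: int, window: int):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # rolling window of lag samples

    def observe(self, lag: int) -> bool:
        self.samples.append(lag)
        # Alert only when the window is full and every sample breaches.
        return (len(self.samples) == self.samples.maxlen
                and all(s > self.threshold for s in self.samples))

alert = SustainedLagAlert(threshold=10, window=3)
print([alert.observe(lag) for lag in [12, 15, 14, 3]])  # → [False, False, True, False]
```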
Incident response
When incidents occur, FortisX follows a structured process that separates technical remediation from post-incident analysis.
Typical steps include:
Detection and triage
Confirming whether the alert corresponds to a real issue.
Classifying the incident (for example, ingest disruption, data quality issue, analytics failure, interface degradation).
Containment and mitigation
Taking steps to prevent the issue from propagating (for example, isolating faulty data sources or pausing specific jobs).
Providing clear signals to downstream systems (for example, marking affected metrics as degraded or temporarily withholding certain outputs).
Resolution
Restoring normal behaviour of affected components.
Initiating backfill or recomputation processes where data has been skipped or corrupted.
Post-incident review
Documenting the timeline, root causes, and impact.
Identifying corrective actions, such as improved validation, additional monitoring, or architectural adjustments.
Where incidents have a visible impact on external consumers, FortisX records and communicates the scope of the incident so that users can interpret affected analytics and decisions in context.
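Marking affected outputs as degraded, rather than silently dropping or extrapolating them, can be sketched like this; the types, flag format, and incident identifier are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class MetricValue:
    """A metric annotated with quality flags so downstream consumers can
    tell complete data from data affected by an incident. Illustrative only."""
    name: str
    value: float
    flags: list = field(default_factory=list)

def mark_degraded(metric: MetricValue, incident_id: str) -> MetricValue:
    # Annotate in place rather than withholding the value entirely.
    metric.flags.append(f"degraded:{incident_id}")
    return metric

m = mark_degraded(MetricValue("uptime_7d", 0.993), "INC-017")
print(m.flags)  # → ['degraded:INC-017']
```

Carrying the incident identifier on the value itself lets users later connect an unusual metric to the published incident scope.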
Change management and releases
The platform is designed to evolve over time: new networks may be added, models and metrics may be refined, and policies and interfaces may be extended. To manage this safely, FortisX applies disciplined change management:
Environment separation: Changes are developed and tested in non-production environments that mirror the structure of the production system as closely as practical.
Incremental rollout: New features or model versions may be introduced gradually:
running in shadow mode alongside existing versions;
processing real data but not yet driving decisions;
comparing outputs explicitly against the current version before activation.
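The shadow-mode pattern can be sketched as follows; the function, model, and label names are hypothetical:

```python
def evaluate_with_shadow(inputs, live_model, shadow_model, log):
    """Run a candidate model on real data without letting it drive
    decisions; record divergences for review before activation.
    A sketch -- names are illustrative, not FortisX's actual interfaces."""
    live = live_model(inputs)
    shadow = shadow_model(inputs)
    if shadow != live:
        log.append({"inputs": inputs, "live": live, "shadow": shadow})
    return live  # only the live model's output is ever acted on

divergences = []
out = evaluate_with_shadow(
    {"uptime": 0.98},
    live_model=lambda x: "low_risk" if x["uptime"] > 0.95 else "review",
    shadow_model=lambda x: "low_risk" if x["uptime"] > 0.99 else "review",
    log=divergences,
)
print(out, len(divergences))  # → low_risk 1
```

Reviewing the accumulated divergences is what supports an informed activation decision.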
Configuration and model versioning: Risk models, policies, and key configuration parameters are versioned, with clear delineation between code changes and configuration changes.
Release records: Releases are accompanied by records that describe:
what has changed;
which components and networks are affected;
any expected impact on metrics, risk profiles, or policy outputs.
When changes affect the semantics of metrics, risk scores, or policies, these changes are reflected in documentation and, where necessary, in the whitepaper itself.
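A release record of this kind can be a small structured document; the field names and values below are assumptions for illustration, not a defined FortisX schema:

```python
# Illustrative release record; all field names and values are assumptions.
release_record = {
    "release": "2024.11.0",
    "changes": ["refined validator risk-score weighting"],
    "affected_components": ["risk-modeling", "policy-engine"],
    "affected_networks": ["examplenet"],
    "expected_impact": "risk scores for examplenet validators may shift",
    "code_version": "a1b2c3d",           # code changes are versioned...
    "config_version": "risk-config-v7",  # ...separately from configuration
}
print(release_record["release"])  # → 2024.11.0
```

Keeping code and configuration versions as separate fields reflects the delineation between the two kinds of change.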
Data backfills and corrections
Because FortisX depends on external data sources and complex processing, there are cases where:
data is temporarily missing or incomplete;
upstream providers later correct or revise information;
model definitions change in ways that require recomputation.
To handle this, the platform supports controlled backfills and recomputations:
Raw event logs and snapshots are retained for periods sufficient to reconstruct relevant historical views.
Backfill jobs can be run for specific time ranges, networks, or entities to close gaps or incorporate corrected upstream data.
Derived metrics, risk profiles, and policy outputs can be recomputed based on the same raw inputs and model versions, or under updated models where analysis requires it.
Backfills and corrections are tracked so that users can see when historical data or indicators have been updated and under which model or configuration.
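A controlled backfill along these lines might look like the following sketch, where all names, including the pinned model version, are illustrative:

```python
def backfill(raw_events, start, end, metric_fn, model_version, audit_log):
    """Recompute derived metrics for heights in [start, end] from retained
    raw events, pinned to a specific model version, and record the run so
    users can see what was updated. Sketch only; names are illustrative."""
    window = [e for e in raw_events if start <= e["height"] <= end]
    recomputed = metric_fn(window)
    audit_log.append({"range": (start, end), "model": model_version,
                      "events": len(window)})
    return recomputed

# Toy raw events: every fifth height is a missed duty.
events = [{"height": h, "ok": h % 5 != 0} for h in range(1, 11)]
log = []
uptime = backfill(events, 1, 10,
                  metric_fn=lambda w: sum(e["ok"] for e in w) / len(w),
                  model_version="v7", audit_log=log)
print(uptime, log[0]["model"])  # → 0.8 v7
```

The audit entry is the piece that makes the correction visible to users afterwards.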
Dependency management and external providers
FortisX depends on several categories of external components:
validator-based networks themselves;
RPC providers and indexing services;
infrastructure platforms (for example, compute, storage, networking);
optional third-party data services.
Operational practices include:
avoiding single points of dependency where possible (for example, using multiple data providers for critical networks);
monitoring the health and behaviour of upstream services and reacting when they fail or deviate from expected patterns;
maintaining clear interfaces so that providers can be changed or reconfigured without redesigning core services.
When external dependencies experience outages or behavioural changes, FortisX aims to:
constrain the impact to the networks or features directly affected;
maintain internal consistency by marking certain metrics or outputs as degraded rather than extrapolating from incomplete data.
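A minimal sketch of provider failover with explicit degradation marking; the client and provider names are hypothetical:

```python
def fetch_with_fallback(providers, request):
    """Try upstream providers in order and mark the result as degraded if
    only a fallback answered -- or if none did -- rather than extrapolating.
    Illustrative sketch, not FortisX's actual client."""
    for i, provider in enumerate(providers):
        try:
            return {"data": provider(request), "degraded": i > 0}
        except Exception:
            continue  # try the next provider
    return {"data": None, "degraded": True}

def failing(_): raise ConnectionError("provider down")
def healthy(req): return {"height": 123, "req": req}

result = fetch_with_fallback([failing, healthy], "head")
print(result["degraded"])  # → True
```

Returning `data: None` with a degradation flag, instead of a guess, keeps downstream metrics internally consistent.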
Operational transparency for users
FortisX is intended to be part of broader operational and risk frameworks. To support this, the platform exposes operational signals to users and integrating systems, such as:
indicators of data freshness and quality for each network;
status of core platform services and interfaces;
summaries of recent incidents that have affected data or service availability;
version information for risk models and policies currently in force.
This operational transparency allows organisations to:
incorporate FortisX status into their own monitoring and incident processes;
align internal governance (for example, risk committees, change advisory boards) with changes and events in the platform;
interpret allocation proposals and analytics in light of the operational conditions under which they were produced.
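Taken together, these signals might be exposed as a single status document; the shape below is an assumption for illustration, not a documented FortisX payload:

```python
# Illustrative operational-status payload; field names are assumptions.
platform_status = {
    "networks": {
        "examplenet": {"freshness": "fresh", "head_lag": 2},
    },
    "services": {"ingest": "ok", "analytics": "ok", "api": "ok"},
    "recent_incidents": [],
    "versions": {"risk_model": "v7", "policy_set": "2024.11"},
}

# An integrator's own monitoring can key off the same document.
all_ok = all(s == "ok" for s in platform_status["services"].values())
print(all_ok)  # → True
```

Because model and policy versions travel with the status, consumers can interpret any analytics output in light of the configuration in force when it was produced.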
Summary
Operational discipline and reliability are central to the role of FortisX as a staking and analytics platform. The system is designed to:
separate concerns between ingest, analytics, risk modeling, policies, and interfaces;
monitor each component and its external dependencies;
handle incidents, data gaps, and model changes in a controlled and traceable way;
expose sufficient operational information for users to rely on its outputs within their own control frameworks.
The next section describes how these operational practices are complemented by security measures, audits, and compliance considerations that govern how FortisX is built and run.