Framework

Sources: TVR v2.3 §2.6, §2 · SAI v2.2 §2.2, §4.1.1.7


Phase 2 Testing Scope

5G-EMERGE aims to evaluate the comparative effectiveness of different video distribution methods, with a focus on their ability to support reliable, high-quality, and sustainable content delivery.

Use Case Categories

The 5G-EMERGE ecosystem targets a range of market segments served by a common infrastructure. Four use case categories are in scope, each with its own service requirements and deployment scenarios:

Category | Description
DTH — Direct to Home | Media delivery to homes, residential and non-residential buildings, and nomadic gateways
DTV — Direct to Vehicle | Media and connectivity to vehicles in motion
DTE — Direct to Edge | Delivery to network edges: base stations, maritime nodes, broadcast relay infrastructure
DTD — Direct to Device | 5G NR-NTN MBS broadcast directly to end-user devices

Service Scenarios

Four end-to-end service scenarios are defined as representative of the media delivery challenges in scope. They can be applied across all four use case categories:

Scenario | Media Service | Terrestrial IP Infrastructure
1 | Distribution of linear / live TV channels at scale | Available to end users
2 | Distribution of VoD services at scale | Available to end users
3 | Distribution of linear / live TV channels at scale | Not available to end users
4 | Distribution of VoD services at scale | Not available to end users

Further detail on use case categories and service scenarios is provided in the SAI document, which also defines the system architecture.

Phase 1 vs Phase 2

Phase | Focus
Phase 1 | Primarily functional testing — verifying that system components are correctly designed and implemented. Only limited end-to-end tests were carried out.
Phase 2 | Goes beyond functional and component testing. The aim is to verify that the 5G-EMERGE system meets end-to-end service-level performance requirements and key non-functional parameters (security, load and stress testing). Phase 2 is predominantly focused on non-functional testing, and in particular, performance testing.

Strand 2 Cross-Strand Testing

Strand 2 has defined a comprehensive set of cross-strand end-to-end tests designed to verify that the 5G-EMERGE system achieves the Key Value Indicators (KVIs). The aims are:

  • A common approach to end-to-end testing across the consortium
  • The expression of KVIs via measurable Key Performance Indicators (KPIs)
  • Reporting standards for consistent evaluation
  • Scenarios pertinent to assessing the media delivery platform as a whole

Testing specific to Strands 3–6 (component-level and strand-specific functional validation) is documented separately and is a pre-condition for end-to-end service-level testing.


Testing is a cornerstone of systems engineering. It ensures quality, reduces risks, and verifies that solutions perform reliably in real-world conditions. Within 5G-EMERGE, testing serves to:

  • Identify Defects — uncover bugs, glitches, and performance issues before they reach end-users.
  • Ensure Quality — confirm the system meets requirements, performs consistently, and provides a positive user experience.
  • Reduce Risks — minimise risks relating to quality, security, and performance.
  • Lower Costs — detect and resolve issues early, avoiding expensive fixes after deployment.

All components must also comply with relevant standards for the technologies and regions in which they operate. These may include physical, accessibility, and intellectual property standards, among others.


Test Structure

Testing is organised into a clear hierarchy to keep it comprehensible and manageable.

Unit | Definition
Test Case | The smallest unit of testing. Tests one specific scenario, consisting of inputs, steps, and expected outcomes.
Test Suite | A collection of related test cases.
Test Script | A sequence of steps for executing a test case.
Traceability Matrix | Maps requirements, risks, or coverage areas to test cases.

This hierarchical structure ensures traceability from high-level requirements through to individual test outcomes, supporting both systematic coverage and reporting.
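
As an illustration of how these units relate, the sketch below models the hierarchy as plain Python data structures. This is a minimal sketch only: the class names, fields, and identifiers are assumptions for this example, not structures defined in the TVR or SAI.

```python
# Minimal sketch of the test hierarchy; names are illustrative, not normative.
from dataclasses import dataclass, field

@dataclass
class TestCase:
    case_id: str            # e.g. "TC-DTH-001" (invented identifier scheme)
    scenario: str           # the one specific scenario under test
    inputs: dict
    steps: list[str]        # the test script: ordered execution steps
    expected_outcome: str

@dataclass
class TestSuite:
    name: str
    cases: list[TestCase] = field(default_factory=list)

# Traceability matrix: requirement (or risk/coverage area) -> covering cases
traceability: dict[str, list[str]] = {
    "REQ-QOS-STARTUP-DELAY": ["TC-DTH-001", "TC-DTD-007"],
}
```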


Test Execution

Tests can be executed either manually (Manual Testing) or automatically through tools (Automated Testing).

Test Harnesses support component testing. For end-to-end validation in 5G-EMERGE, these are referred to as Test Beds. Where real components are unavailable, simulated components (mocks, stubs) may be substituted.

Execution scope is tailored according to risk and impact:

Scope | Purpose
Smoke Testing | Quick checks for build stability.
Sanity Testing | Focused validation of specific fixes or changes.
Regression Testing | Ensuring unchanged functionality remains intact.
Full Testing | Comprehensive execution of all test cases.

Automated testing conducted after each change is known as Continuous Testing.

It is recommended that, where possible, testing be automated and continuous, with full testing conducted periodically and a targeted scope (smoke, sanity, or regression) run on each change, as sketched below.
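
As one way to realise scoped, continuous execution (a sketch assuming a Python/pytest toolchain, which the project documents do not prescribe), tests can be tagged by scope so the pipeline selects the right subset per change. The marker names, helper functions, and threshold are assumptions for this example.

```python
# Illustrative pytest sketch: tagging tests by execution scope.
# Toolchain, marker names, and thresholds are assumptions, not requirements.
import pytest

def ping_origin() -> str:
    # Stand-in for a real reachability probe against the origin server.
    return "ok"

def measure_startup_delay_s() -> float:
    # Stand-in for a real playback measurement (request to first frame).
    return 1.4

@pytest.mark.smoke
def test_stream_endpoint_reachable():
    # Smoke: quick build-stability check.
    assert ping_origin() == "ok"

@pytest.mark.regression
def test_startup_delay_unchanged():
    # Regression: guard that performance has not degraded after a change.
    assert measure_startup_delay_s() < 2.0
```

A change-triggered run can then select a subset with `pytest -m smoke`, while the periodic full run simply omits the marker filter.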

Additional deliverables related to the test beds (e.g. design of demos) are covered by the deliverables in the associated work packages and are out of scope for this document.


Test Scale

In Phase 2, the test beds are designed to support the necessary Test Levels but are not scaled to support load tests and soak tests at production-expected volumes. A full set of large-scale test scenarios has been included in Appendix 1 of the TVR for completeness.

Where full-scale testing is not feasible, test scenarios may be conducted at scaled-down volumes to assess behaviour and determine limiting factors. Such tests should be designed so that they do not simply re-identify known architectural constraints (e.g. limitations arising from available spectrum, cluster sizes, or throughput).

The four use case categories defined in the SAI frame what scale means in each deployment context:

Use Case | Deployment Context
DTH — Direct to Home | Media delivery to homes, residential/non-residential buildings, and nomadic gateways. Extends video streaming beyond the reach of 5G or terrestrial broadband.
DTV — Direct to Vehicle | Media and connectivity to vehicles in motion. Tests include satellite antenna pointing/tracking under dynamic conditions.
DTE — Direct to Edge | Delivery to network edges (base stations, maritime nodes, broadcast relay). Diverse infrastructure — fixed, Arctic, maritime, broadcast integration.
DTD — Direct to Device | 5G NR-NTN MBS broadcast to end-user devices. Full chain from satellite transmission to UE player application.

Test Types

Different types of tests focus on specific aspects of the system to isolate and evaluate particular characteristics. Test types fall into two main categories.

Functional Testing

Checks that the system does exactly what it is specified to do. This was the primary focus in Phase 1. In Phase 2, functional testing of strand-specific components is a pre-condition for end-to-end service-level testing.

Non-Functional Testing

Verifies the readiness of a system according to non-functional parameters — performance, security, accessibility, scale — which functional testing alone cannot address.

5G-EMERGE Phase 2 is predominantly focused on non-functional testing, and in particular, performance testing.

Type | Description
Performance Testing | Examines throughput, latency, stability, reliability, scalability, and resource usage under a specified workload.
Security Testing | Uncovers vulnerabilities to ensure the system is free from threats or risks; targets flaws that could lead to loss of data, revenue, or reputation.
Load Testing | Determines how the system behaves when accessed by multiple consumers simultaneously (see the sketch after this table).
Stress Testing | Tests beyond normal operational capacity to evaluate limiting conditions.
Accessibility Testing | Ensures usability for users with and without disabilities. Note: as 5G-EMERGE is at a research and development stage, accessibility is not being evaluated at this time.
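
To make the load-testing row concrete, below is a minimal sketch using only the Python standard library: N concurrent clients fetch a stream manifest and record per-request latency. The endpoint URL, concurrency level, and timeout are invented placeholders, not parameters from the test plan.

```python
# Minimal load-test sketch (stdlib only). URL and client count are
# illustrative placeholders, not values from the 5G-EMERGE test plan.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://example.org/live/channel1.m3u8"  # placeholder endpoint
CLIENTS = 50                                    # placeholder concurrency

def fetch_once(_: int) -> float:
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=CLIENTS) as pool:
    latencies = list(pool.map(fetch_once, range(CLIENTS)))

print(f"max latency under load: {max(latencies):.3f}s")
```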

Test Reporting

To ensure results are comparable, reproducible, and interpretable across the consortium, each test is conducted under a standardised reporting framework defining inputs, outputs, and expectations.

Inputs

All tests must begin with a clear definition of required inputs:

  • System-Level Metrics — network logs, playback sessions, server utilisation, concurrency levels, energy monitoring, crash/error reports.
  • Environmental Parameters — geographic distribution of test nodes, network variability (latency, packet loss, bandwidth), simulated user loads.
  • Method Identifiers — explicit classification of the distribution method under test (CDN, Edge, etc.).

Inputs must be captured in structured, timestamped logs with a uniform schema so that results can be aggregated across test campaigns.
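
For illustration, a uniform record schema might look like the sketch below; every field name is an assumption for this example and would need to be agreed across the consortium, not taken from the TVR or SAI.

```python
# Illustrative log-record schema for test inputs; all field names are
# assumptions for this sketch, not a schema defined in the project documents.
from typing import TypedDict

class TestLogRecord(TypedDict):
    timestamp_utc: str        # ISO 8601, e.g. "2025-01-31T12:00:00Z"
    test_case_id: str         # links back to the traceability matrix
    method: str               # distribution method under test: "CDN", "Edge", ...
    node_location: str        # geographic placement of the test node
    metric_name: str          # e.g. "startup_delay_ms"
    metric_value: float
    unit: str                 # standardised units: "ms", "%", "kWh", "GB"

record: TestLogRecord = {
    "timestamp_utc": "2025-01-31T12:00:00Z",
    "test_case_id": "TC-DTE-012",
    "method": "Edge",
    "node_location": "maritime-node-1",
    "metric_name": "startup_delay_ms",
    "metric_value": 1350.0,
    "unit": "ms",
}
```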

Outputs

Each test must produce outputs that are quantitative, comparable, and traceable:

  • Primary Metrics — aligned to the KPIs (e.g. watch time, session duration, completion rate, video resolution/fps, latency, kWh consumed).
  • Derived Indicators — correlations (e.g. buffering vs. abandonment) or efficiency ratios (e.g. energy per GB delivered).
  • Comparative Data — presented per distribution method, region, and test condition.

All outputs should be stored in a centralised results repository in standardised units (ms, %, kWh, MB, GB).
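
As a sketch of how the derived indicators above can be computed (all input values are invented for illustration; the helper is the Python standard library's Pearson correlation, available from Python 3.10):

```python
# Sketch of two derived indicators: an efficiency ratio (energy per GB)
# and a correlation (buffering vs. abandonment). Values are invented.
from statistics import correlation  # Python 3.10+

energy_kwh = 12.4          # energy measured for a test campaign
data_delivered_gb = 310.0  # data delivered in the same campaign
energy_per_gb = energy_kwh / data_delivered_gb   # kWh/GB

# Per-session samples: buffering ratio (%) vs. whether the session ended early
buffering_pct = [0.0, 1.2, 4.5, 8.0, 11.3]
abandoned = [0.0, 0.0, 1.0, 1.0, 1.0]
r = correlation(buffering_pct, abandoned)

print(f"energy per GB: {energy_per_gb:.4f} kWh/GB, "
      f"buffering/abandonment r = {r:.2f}")
```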

Expectations

Each use case is guided by explicit expectations:

  • Benchmark Performance — define what resilience, scalability, or quality should look like under given conditions (e.g. near-linear energy scaling, <2 s startup delay, >95% uptime under stress).
  • Identify Inefficiencies — highlight scenarios where energy, capacity, or network performance deviate from expected efficiency.
  • Support Comparability — ensure differences in performance can be attributed to the distribution method rather than test inconsistency.

Expectations must be written as testable hypotheses (e.g. "Edge delivery will reduce latency and energy per viewing minute compared to centralised CDN") to maintain a scientific research framework.
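
A testable hypothesis can be encoded directly as an assertion over the comparative data, as in this sketch (the metric names and values are invented for illustration):

```python
# Hypothesis from the text: "Edge delivery will reduce latency and energy
# per viewing minute compared to centralised CDN." Values are invented.
edge = {"latency_ms": 38.0, "kwh_per_view_min": 0.0021}
cdn = {"latency_ms": 55.0, "kwh_per_view_min": 0.0029}

assert edge["latency_ms"] < cdn["latency_ms"]
assert edge["kwh_per_view_min"] < cdn["kwh_per_view_min"]
print("hypothesis supported by this dataset")
```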

Report Requirements

Test reports must:

  • Document Context — record test date, location, duration, planned/actual network conditions, and distribution method.
  • Present Structured Results — use consistent table and chart formats to display inputs, outputs, and derived metrics.
  • Enable Cross-Test Analysis — support aggregation across methods and KPIs for holistic evaluation.
  • Flag Deviations — clearly identify where results diverge from expectations, positively or negatively.

All reports should be archived and versioned, allowing stakeholders to reproduce or validate findings.


Test Metrics — KVIs and KPIs

Phase 2 employs a structured set of Key Value Indicators (KVIs) aligned to six overarching evaluation dimensions. Each KVI is linked to specific measurable Key Performance Indicators (KPIs), enabling comparison of distribution approaches across the full range of use case categories.

KVI Dimensions:

Dimension | What it evaluates
Reach | Geographic availability and accessibility of the service
Resilience | Stability and recovery under stress, failure, or scaling
Quality | User-facing playback experience and stability
Scalability | Ability to handle load growth without degradation
Security | Protection of data, infrastructure, and communications
Cost Efficiency | Energy, resource, and operational cost per unit of delivery

Most KPIs are recorded as datasets. The following statistical measures are applied where appropriate: mean, standard deviation, lower/upper quartile, 95th percentile (p95), 99.9th percentile (p99.9).
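
For illustration, these measures can be computed over a KPI dataset as follows (a minimal numpy sketch; the sample values are invented):

```python
# Computing the statistical measures listed above over a KPI dataset.
import numpy as np

samples = np.array([1.2, 1.4, 1.1, 2.8, 1.3, 1.5, 9.7, 1.2])  # invented values

stats = {
    "mean": samples.mean(),
    "std": samples.std(ddof=1),                    # sample standard deviation
    "lower_quartile": np.percentile(samples, 25),
    "upper_quartile": np.percentile(samples, 75),
    "p95": np.percentile(samples, 95),
    "p99.9": np.percentile(samples, 99.9),
}
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```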


Engagement and Usage KPIs

Engagement KPIs function as proxies for usability and adoption potential under different delivery frameworks. They help identify not only which distribution methods sustain attention, but why users may disengage.

KPI | Definition
Average Watch Time per User | Total cumulative watch duration ÷ total unique users in a period. Indicates sustained engagement per distribution method.
Average Session Duration | Total length of all sessions ÷ total number of sessions in a period.
Total Watch Time (aggregate) | Broad measure of usage volume per distribution approach.
Content Completion Rate | Completed views ÷ initiated views × 100. Reflects stability and user satisfaction.
Content Abandonment Points | Timestamp of departure relative to content length; reveals where users exit.
Sessions per User | Frequency of return visits or retries; potentially linked to technical quality.
Search-to-Play Conversion Rate | % of searches that result in playback (where content discovery is relevant).
Recommendation Engagement Rate | Actions on recommendations (clicks, watches, saves) ÷ impressions.
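
As a sketch of the arithmetic behind the first and fourth KPIs (the session-record layout and values are invented for illustration):

```python
# Sketch: average watch time per user and content completion rate,
# computed from per-session records. Record layout is an assumption.
sessions = [
    {"user": "u1", "watched_s": 1800, "completed": True},
    {"user": "u1", "watched_s": 600, "completed": False},
    {"user": "u2", "watched_s": 2400, "completed": True},
]

unique_users = {s["user"] for s in sessions}
total_watch_s = sum(s["watched_s"] for s in sessions)

avg_watch_time_per_user = total_watch_s / len(unique_users)
completion_rate = 100 * sum(s["completed"] for s in sessions) / len(sessions)

print(f"avg watch time/user: {avg_watch_time_per_user:.0f}s, "
      f"completion rate: {completion_rate:.1f}%")
```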

Technical Performance KPIs

Technical performance KPIs are central to comparing the efficiency, reliability, and scalability of different delivery methods. Variations across methods directly reflect differences in architectural efficiency.

KPI | Definition
Startup Delay (Video Start Time) | Time from playback request to first frame.
Buffering Ratio | Proportion of playback time spent buffering. Ideally zero.
Rebuffering Events per Session | Number of playback interruptions. Ideally zero.
Throughput | Bitrate provided to the end-user.
Bitrate Stability / QoE | Consistency of video quality across varying conditions.
Error Rate (Failed Playback Attempts) | % of streams that fail to start or crash.
Application / Platform Crash Rate | Failure frequency across devices or environments.
Availability (Uptime %) | Proportion of time the system is available for use.
Load Distribution Performance | Effectiveness in managing concurrent users and traffic spikes.
Latency by Region | Delivery speed across geographic locations; key for CDN vs. edge comparisons.
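
As a sketch of how two of these KPIs reduce to simple arithmetic over playback telemetry (the event layout and values are invented for illustration):

```python
# Sketch: buffering ratio and rebuffering events from a playback timeline.
playback_s = 1800.0                # total playback time of the session
stall_events_s = [2.1, 0.8, 3.4]   # duration of each rebuffering event

rebuffering_events = len(stall_events_s)
buffering_ratio_pct = 100 * sum(stall_events_s) / playback_s  # ideally zero

print(f"{rebuffering_events} stalls, "
      f"buffering ratio {buffering_ratio_pct:.2f}%")
```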

Energy and Resource Efficiency KPIs

Different distribution approaches carry distinct energy and resource footprints. These KPIs ensure evaluation moves beyond technical feasibility to include environmental and systemic impact.

KPI | Definition
Energy Consumption per Stream (kWh/stream) | Total energy to deliver a single stream from request to completion.
Energy Consumption per Viewing Hour (kWh/hour) | Standardised measure for cross-method energy efficiency comparison.
Server Energy Utilisation Rate (%) | Server energy used for active delivery vs. idle operation.
Network Energy Intensity (kWh/GB) | Energy consumed per GB transferred through the distribution network.
Data Transfer Volume per User (GB/user) | Total data delivered to a user; linked to energy use.
Carbon Intensity per Stream (gCO₂e/stream) | Estimated greenhouse gas emissions per stream delivered.
Peak vs. Average Energy Demand (kWh/kWh) | Energy consumption during high-demand periods vs. baseline.
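
As a sketch of the arithmetic behind two of these KPIs (all measurements are invented for illustration):

```python
# Sketch: energy per viewing hour and network energy intensity.
# All input values are invented for illustration.
total_energy_kwh = 54.0       # energy attributed to delivery in the window
network_energy_kwh = 18.5     # share consumed by the distribution network
viewing_hours = 920.0         # total hours watched in the test window
data_transferred_gb = 1350.0  # data moved through the network

kwh_per_viewing_hour = total_energy_kwh / viewing_hours               # kWh/hour
network_energy_intensity = network_energy_kwh / data_transferred_gb  # kWh/GB

print(f"{kwh_per_viewing_hour:.4f} kWh/hour, "
      f"{network_energy_intensity:.4f} kWh/GB")
```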

Note: Some proposed energy tests assume large-scale deployment beyond the scope of Phase 2. Tests feasible within Phase 2 test beds are included in the overall test plan; others are recorded in TVR Appendix 1 for future reference.


Compute, AI and Security KPIs

Computing KPIs

These KPIs assess Nuvla's capability to deploy and operate ETSI-MEC / 3GPP edge applications on distributed edge infrastructure.

  • Average RAM utilisation
  • Average CPU utilisation
  • Average disk utilisation
  • Application deployment and instantiation time on the selected edge node
  • Deployment success rate under nominal and stressed resource conditions
  • Service continuity / stability during sustained workload execution

AI-Related KPIs

These KPIs evaluate the effectiveness, accuracy, adaptability, and performance efficiency of the AI-based WAAP (Web Application and API Protection) solution.

KPI | Target
Threat detection and mitigation rate (%) | ≥ 94%
False positive rate (%) | ≤ 2%
False negative rate (%) | ≤ 2%
AI model inference latency (ms) | ≤ 5 ms under nominal conditions (avg. request size 2.5 KB)
Throughput performance (requests/second) | No degradation vs. baseline WAAP
Logging completeness and integrity (%) | 100% coverage; tamper-resistant
Request inspection coverage (%) | 100% of request components
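
For clarity on how the first three targets relate, the sketch below applies the standard confusion-matrix definitions; the counts are invented and the formulas are the conventional ones, not taken from the project documents.

```python
# Standard confusion-matrix arithmetic behind the first three targets.
# Counts are invented for illustration.
tp, fp, fn, tn = 470, 9, 10, 511  # detections, false alarms, misses, clean

detection_rate = 100 * tp / (tp + fn)        # threats correctly mitigated
false_positive_rate = 100 * fp / (fp + tn)   # clean traffic wrongly flagged
false_negative_rate = 100 * fn / (tp + fn)   # threats missed

print(f"detection {detection_rate:.1f}% (target >= 94%), "
      f"FP {false_positive_rate:.1f}% (<= 2%), "
      f"FN {false_negative_rate:.1f}% (<= 2%)")
```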

Security KPIs

These KPIs address confidentiality, integrity, and availability of edge infrastructure and communication channels, measured through controlled scenarios including attack simulations and misconfiguration attempts.

Far-Edge Security:

  • Secure boot integrity verification rate (%)
  • Unauthorised system modification detection rate (%)
  • Data protection at rest effectiveness
  • System recovery and key management robustness

Far-Edge Firewall:

  • Session logs under normal conditions and under attack (rules allowed/denied, latency, packet loss)
  • Traffic classification based on SaaS applications
  • Management traffic bandwidth (standard and under attack)
  • Firewall device health (CPU and RAM monitoring)

Satellite Link Security:

KPI | Target
Encryption coverage (%) | 100% of satellite link traffic
Secure tunnel establishment success rate (%) |
Resistance to traffic interception | Qualitative validation — all intercepted traffic appears as encrypted QUIC streams
Throughput efficiency under encryption (%) | Ratio of secured vs. baseline throughput
Packet loss resilience under secure transport (%) | Stable throughput under lossy satellite conditions

For the full list of test cases mapped to these KPIs, see Global Test Register. For the E2E scenario catalogue, see E2E Scenarios Reference.