TVR E2E Scenarios — Reference

Sources: TVR v2.3 §2.9.6 (Tables 2 & 3) · TVR v2.3 Appendix 1 · GTL v3 (Service level tests sheet) · STM

This document covers two related sources:

  1. TVR §2.9.6 — E2E performance tests confirmed as carried out in Phase 2 (Tables 2 and 3), assigning specific testbeds to each scenario.
  2. TVR Appendix 1 — Complete catalogue of E2E service scenarios with detailed definitions, grouped as performable (A.1) and non-performable (A.2) in Phase 2.

For the full list of 16 service scenarios (Table 1), see Service Scenarios.


TVR §2.9.6 — Phase 2 E2E Performance Tests

Table 2: Strand 3–6 E2E Performance Tests in Phase 2

E2E performance tests are also carried out across Strand 3 (DTH — SES, EBU), Strand 4, Strand 5 (Maritime), and Strand 6 (Arctic Space / DTE). These include:

  • WP 3.1.5 (SES DTH): Startup delay comparison across CDN/Edge/satellite methods; buffering frequency under different delivery methods
  • WP 3.x (EBU DTH): Equivalent startup and buffering tests for bidirectional satellite reception
  • Strand 5 (Maritime): Buffering and resilience tests under maritime connectivity conditions
  • Strand 6 / Arctic Space: Latency by region (C.1); scalability and error rate tests

Full strand-specific test plans are documented in the respective strand/WP technical notes.

Table 3: Strand 2 (WP 2.1) Tests Confirmed for Phase 2

These are the Strand 2 tests (defined in TVR §2.9.6, Table 3) that will be carried out in Phase 2, alongside tests specific to the testbed. Tests depend on the ability of far/near-edge components to collect statistics from the user player to evaluate QoE and QoS.

Scenario | Use Case | Testbed(s) | Test Title | KPI
B | DTE | RAI | Assess if watch time drops with degraded quality | Average Watch Time per User
E | DTE | RAI | Test correlation between delivery method and playback continuity | Average Session Duration
G | DTE | RAI | Test if completions are higher in resilient methods (less buffering) | Content Completion Rate
I | DTE | Arctic Space | Quantify wasted resources if streams are abandoned early | Content Completion Rate
J | DTE | RAI | Identify abandonment spikes after buffering events | Content Abandonment Points
K | DTE | RAI | Test if users drop at predictable quality loss points | Content Abandonment Points
M | DTE, DTH | SES, EBU, Maritime, Arctic Space | Directly test startup times distribution across methods | Start-up Delay
O | DTE | Maritime | Stress-test buffering under poor networks vs resilient distribution | Buffering Ratio / Rebuffering Events
P | DTH, DTE | SES, EBU, Maritime, RAI | Compare buffering frequency under different methods | Buffering Ratio / Rebuffering Events
R | DTH, DTE | SES, EBU, Arctic Space | Track frequency of resolution switches under different delivery | Buffering Ratio / Rebuffering Events
T | DTE | Maritime | Test retries under network drops or overloaded servers | Error Rate (Failed Attempts)
V | DTE | Maritime, Arctic Space | Measure load effect on failure rates under stress tests | Error Rate (Failed Attempts)
W | DTE | Maritime | Quantify cost of wasted connection attempts | Error Rate (Failed Attempts)
A.2 | DTH, DTE | SES, EBU, Maritime, RAI | Track downtime as resilience indicator | Availability (Uptime %)
C.1 | DTE | Arctic Space | Measure response times across different geographies | Latency by Region

Note: Scenarios E, G, O, V and W, struck through in the source table, are no longer performable in Phase 2. E, G, O and W were dropped following the STM update of April 2026 (no testbed coverage at RAI or Maritime); V has since been reclassified as non-performable. TVR v2.3 (May 2026) is aligned: E, O, V and W appear in Appendix 1, Section A.2 (non-performable), and G has been removed from the appendix entirely. A number of other scenarios assume large-scale deployment and are not feasible in Phase 2; these are detailed in Appendix 1, Section A.2 below.



A.1 — Performable Tests (Confirmed Feasible in Phase 2)

The following scenarios are confirmed as feasible in the Phase 2 testbeds and must be listed as test cases in the respective strand/WP test documents.

Scenario | Title | KVI | KPI | Testbeds
B | Assess if watch time drops with degraded quality | Quality | Average Watch Time per User | DTE
I | Quantify wasted resources if streams are abandoned early | Cost Efficiency | Content Completion Rate | DTE
J | Identify abandonment spikes after buffering events | Resilience | Content Abandonment Points | DTE
K | Test if users drop at predictable quality loss points | Quality | Content Abandonment Points | DTE
M | Directly test startup times distribution across methods | Quality | Start-up Delay | DTE & DTH
P | Compare buffering frequency under different methods | Quality | Buffering Ratio / Rebuffering Events | DTE
R | Track frequency of resolution switches under different delivery | Quality | Bitrate Stability / QoE | DTE & DTH
T | Test retries under network drops or overloaded servers | Resilience | Error / Failure Rate | DTE
A.2 | Track downtime as resilience indicator | Resilience | Uptime / Availability | DTE & DTH
C.1 | Measure response times across different geographies | Quality | Latency by Region | DTE

Scenario Descriptions

Scenario B — Assess if watch time drops with degraded quality

  • KVI: Quality · KPI: Average Watch Time per User · Testbeds: DTE
  • Input: Playback session logs with quality indicators (bitrate, resolution, buffering events); controlled quality degradation conditions (reduced bitrate, simulated packet loss, induced buffering); user engagement data under optimal and degraded conditions.
  • Output: Average watch time (min/user) segmented by distribution method and quality level; correlation analysis between quality degradation and watch time reduction.
  • Expectation: Watch time decreases as quality deteriorates; adaptive methods (CDN with ABR) mitigate the reduction.
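
The segmentation named in the Output bullet can be sketched as follows. This is a minimal illustration, not the TVR method: the record fields (method, quality, watch_minutes) and the sample values are hypothetical and stand in for whatever the playback session logs actually provide.

```python
from collections import defaultdict
from statistics import mean


def average_watch_time(sessions):
    """Mean watch time in minutes, keyed by (method, quality level)."""
    groups = defaultdict(list)
    for session in sessions:
        key = (session["method"], session["quality"])
        groups[key].append(session["watch_minutes"])
    return {key: mean(values) for key, values in groups.items()}


# Illustrative records only (hypothetical field names, not measured data).
sessions = [
    {"method": "cdn", "quality": "optimal", "watch_minutes": 30.0},
    {"method": "cdn", "quality": "degraded", "watch_minutes": 20.0},
    {"method": "cdn", "quality": "degraded", "watch_minutes": 10.0},
    {"method": "edge", "quality": "optimal", "watch_minutes": 28.0},
]

watch_time = average_watch_time(sessions)
for (method, quality), minutes in sorted(watch_time.items()):
    print(f"{method:5s} {quality:9s} {minutes:.1f} min/user")
```

Comparing the optimal and degraded rows for the same method gives the watch time reduction that the Expectation bullet refers to.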

Scenario I — Quantify wasted resources if streams are abandoned early

  • KVI: Cost Efficiency · KPI: Content Completion Rate · Testbeds: DTE (Arctic Space)
  • Input: Playback session logs showing stream start/end points; resource consumption data (bandwidth, compute) per stream; abandonment timestamps and causes.
  • Output: Resource waste ratio (resources consumed by abandoned streams vs. completed streams); cost model for wasted delivery per distribution method.
  • Expectation: Methods with higher completion rates waste fewer resources; early abandonment in centralised delivery is costlier due to longer data paths.
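
One possible reading of the resource waste ratio is sketched below. It assumes each stream record carries a byte count and the fraction of the content that was watched, and it treats a stream as completed once that fraction reaches a threshold. The field names and the 0.95 threshold are assumptions for illustration; the TVR does not define them here.

```python
def waste_ratio(streams, completion_threshold=0.95):
    """Bytes delivered to abandoned streams divided by bytes delivered to completed ones.

    Returns None when no stream completed, since the ratio is undefined.
    """
    abandoned = sum(
        s["bytes"] for s in streams if s["watched_fraction"] < completion_threshold
    )
    completed = sum(
        s["bytes"] for s in streams if s["watched_fraction"] >= completion_threshold
    )
    if completed == 0:
        return None
    return abandoned / completed


# Illustrative records only (hypothetical field names, not measured data).
streams = [
    {"bytes": 100, "watched_fraction": 0.20},
    {"bytes": 300, "watched_fraction": 1.00},
    {"bytes": 50, "watched_fraction": 0.50},
    {"bytes": 100, "watched_fraction": 0.97},
]

ratio = waste_ratio(streams)
print(f"waste ratio: {ratio:.3f}")
```

Running the function once per distribution method gives the per-method comparison; the same shape works for compute or energy if those are logged per stream.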

Scenario J — Identify abandonment spikes after buffering events

  • KVI: Resilience · KPI: Content Abandonment Points · Testbeds: DTE
  • Input: Playback logs showing exact abandonment timestamps; event data on buffering (frequency, duration, severity) preceding abandonment.
  • Output: Correlation between buffering events and abandonment points; identification of thresholds where buffering causes significant user drop-off.
  • Expectation: Abandonment spikes cluster immediately after prolonged or repeated buffering; resilient methods show fewer abandonment spikes.
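
A simple way to look for abandonment clustering after buffering is to measure, for each abandoned session, the gap between the exit and the most recent buffering event that ended before it. The sketch below assumes per-session lists of buffering end times and an optional abandonment time, all in seconds from session start. These field names and the 10-second window are hypothetical.

```python
from bisect import bisect_right


def fraction_abandoned_after_buffering(sessions, window_s=10.0):
    """Share of abandoned sessions that exit within window_s of a buffering event ending."""
    abandoned = [s for s in sessions if s["abandon_time"] is not None]
    if not abandoned:
        return 0.0
    hits = 0
    for s in abandoned:
        ends = sorted(s["buffer_end_times"])
        # Index of the first buffering end strictly after the abandonment time.
        i = bisect_right(ends, s["abandon_time"])
        if i > 0 and s["abandon_time"] - ends[i - 1] <= window_s:
            hits += 1
    return hits / len(abandoned)


# Illustrative records only (hypothetical field names, not measured data).
sessions = [
    {"buffer_end_times": [30.0, 95.0], "abandon_time": 100.0},  # exits 5 s after a stall
    {"buffer_end_times": [20.0], "abandon_time": 200.0},        # exits long after a stall
    {"buffer_end_times": [], "abandon_time": 50.0},             # exits without any stall
    {"buffer_end_times": [10.0], "abandon_time": None},         # watched to the end
    {"buffer_end_times": [60.0], "abandon_time": 62.0},         # exits 2 s after a stall
]

share = fraction_abandoned_after_buffering(sessions)
print(f"abandoned within 10 s of buffering: {share:.0%}")
```

Sweeping window_s over a range of values is one way to look for the drop-off threshold mentioned in the Output bullet.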

Scenario K — Test if users drop at predictable quality loss points

  • KVI: Quality · KPI: Content Abandonment Points · Testbeds: DTE
  • Input: Playback logs recording quality degradation events (bitrate reductions, resolution drops); session abandonment logs aligned with those events.
  • Output: Patterns showing correlation between quality loss and session drop-offs; comparative data across distribution methods.
  • Expectation: Predictable abandonment points emerge at moments of severe quality loss; methods with adaptive bitrate may prevent sharp drops.

Scenario M — Directly test startup times distribution across methods

  • KVI: Quality · KPI: Start-up Delay · Testbeds: DTE & DTH
  • Input: Time measurements from user request initiation to video playback start across CDN, Edge, etc.; controlled tests under consistent network conditions.
  • Output: Startup delay (seconds) for each method: average, min, max, p95, p99.9, and histograms; comparative dataset of responsiveness across methods.
  • Expectation: Edge or adaptive methods achieve lower average startup delays; centralised methods may show longer delays depending on server distance.
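
The summary statistics listed in the Output bullet can be computed as in the sketch below. It uses the nearest-rank percentile definition, which is one of several common conventions and is an assumption here; the method label and sample values are illustrative only.

```python
import math
from statistics import mean


def percentile(ordered, p):
    """Nearest-rank percentile of an ascending, non-empty list (p in 0..100)."""
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def startup_summary(delays_s):
    """Summary statistics for one method's startup delays in seconds."""
    ordered = sorted(delays_s)
    return {
        "mean": mean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p95": percentile(ordered, 95),
        "p99.9": percentile(ordered, 99.9),
    }


# Illustrative samples only: 20 delays from 0.5 s to 10.0 s, not measured data.
samples = {"cdn": [0.5 * i for i in range(1, 21)]}

summary = {method: startup_summary(delays) for method, delays in samples.items()}
print(summary["cdn"])
```

With only a few samples per method, p99.9 collapses to the maximum, so high percentiles are only informative once the number of test runs is large.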

Scenario P — Compare buffering frequency under different methods

  • KVI: Quality · KPI: Buffering Ratio / Rebuffering Events · Testbeds: DTE
  • Input: Playback logs recording frequency of buffering events per stream; method identifiers (CDN, Edge, etc.).
  • Output: Mean, distribution, and peak buffering frequency per method; comparative dataset highlighting which methods deliver smoother playback.
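
A sketch of the per-method buffering figures follows. It takes the buffering ratio to be stall time divided by total session time (play plus stall), which is a common definition but an assumption here, and the field names play_s, stall_s and stall_events are hypothetical.

```python
def buffering_summary(streams):
    """Buffering ratio plus mean and peak rebuffering events for one method's streams."""
    total_play = sum(s["play_s"] for s in streams)
    total_stall = sum(s["stall_s"] for s in streams)
    events = [s["stall_events"] for s in streams]
    session_time = total_play + total_stall
    return {
        "buffering_ratio": total_stall / session_time if session_time else 0.0,
        "mean_events": sum(events) / len(events) if events else 0.0,
        "peak_events": max(events, default=0),
    }


# Illustrative records for a single method (hypothetical fields, not measured data).
cdn_streams = [
    {"play_s": 90, "stall_s": 10, "stall_events": 2},
    {"play_s": 100, "stall_s": 0, "stall_events": 0},
    {"play_s": 190, "stall_s": 10, "stall_events": 4},
]

cdn_summary = buffering_summary(cdn_streams)
print(cdn_summary)
```

Calling the function once per method identifier yields the comparative dataset described in the Output bullet.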

Scenario R — Track frequency of resolution switches under different delivery

  • KVI: Quality · KPI: Bitrate Stability / QoE · Testbeds: DTE & DTH
  • Input: Playback event logs recording bitrate/resolution switches; timestamps and network conditions during switches.
  • Output: Resolution switch frequency per method; distribution of switch magnitude (up vs down); QoE impact correlation.
  • Expectation: More stable delivery methods show fewer downward resolution switches under variable conditions.
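
Counting switches from a playback event log can be sketched as below, assuming the log has already been reduced to the ordered sequence of selected bitrates for one session. That input shape is an assumption; real player logs would need a parsing step first.

```python
def switch_stats(bitrates_kbps):
    """Count upward and downward bitrate switches in an ordered sequence."""
    deltas = [
        cur - prev
        for prev, cur in zip(bitrates_kbps, bitrates_kbps[1:])
        if cur != prev  # consecutive equal values are not switches
    ]
    return {
        "up": sum(1 for d in deltas if d > 0),
        "down": sum(1 for d in deltas if d < 0),
        "largest_drop_kbps": max((-d for d in deltas if d < 0), default=0),
    }


# Illustrative sequence only, not measured data.
session_bitrates = [3000, 3000, 1500, 800, 1500, 3000]

stats = switch_stats(session_bitrates)
print(stats)
```

Dividing the counts by session duration gives a switch frequency that is comparable across sessions of different length.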

Scenario T — Test retries under network drops or overloaded servers

  • KVI: Resilience · KPI: Error / Failure Rate · Testbeds: DTE
  • Input: Request logs under simulated network drops and server overload conditions; method identifiers.
  • Output: Retry rates and failure rates per method; time-to-recovery measurements.
  • Expectation: Resilient methods with redundancy (CDN, edge failover) show lower failure rates and faster recovery.
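
Retry and failure rates can be derived from a request log as in the sketch below. It assumes each log entry is a (request_id, ok) pair in chronological order, counts a request as retried when it has more than one attempt, and as failed when no attempt succeeded. These definitions and the log shape are assumptions for illustration.

```python
def retry_and_failure_rates(attempts):
    """Retry rate and failure rate over distinct requests."""
    attempt_counts = {}
    succeeded = set()
    for request_id, ok in attempts:
        attempt_counts[request_id] = attempt_counts.get(request_id, 0) + 1
        if ok:
            succeeded.add(request_id)
    total = len(attempt_counts)
    if total == 0:
        return {"retry_rate": 0.0, "failure_rate": 0.0}
    retried = sum(1 for n in attempt_counts.values() if n > 1)
    failed = total - len(succeeded)
    return {"retry_rate": retried / total, "failure_rate": failed / total}


# Illustrative log only (hypothetical shape, not measured data).
attempts = [
    ("a", True),
    ("b", False), ("b", True),   # recovered after one retry
    ("c", False), ("c", False),  # never succeeded
    ("d", True),
]

rates = retry_and_failure_rates(attempts)
print(rates)
```

Time-to-recovery would additionally need a timestamp per attempt, measured from the first failure of a request to its first success.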

Scenario A.2 — Track downtime as resilience indicator

  • KVI: Resilience · KPI: Uptime / Availability · Testbeds: DTE & DTH
  • Input: Uptime monitoring logs across all delivery endpoints; incident and failover event records.
  • Output: Availability percentage per method over the test period; correlation between architecture type and uptime.
  • Expectation: Distributed methods with redundancy achieve higher availability; centralised methods are more vulnerable to single points of failure.
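
Availability over a test period can be computed from outage intervals as sketched below. The sketch assumes outages are recorded as (start, end) pairs in the same time unit as the period bounds; overlapping outages are merged and outages are clipped to the period. The interval format is an assumption, not something the TVR prescribes.

```python
def availability_percent(period_start, period_end, outages):
    """Percentage of the test period during which the endpoint was up."""
    total = period_end - period_start
    if total <= 0:
        raise ValueError("period_end must be after period_start")
    downtime = 0.0
    covered_until = period_start  # downtime already counted up to this point
    for start, end in sorted(outages):
        start = max(start, covered_until)  # skip time already counted or before the period
        end = min(end, period_end)         # clip to the end of the period
        if end > start:
            downtime += end - start
            covered_until = end
    return 100.0 * (total - downtime) / total


# Illustrative intervals only (seconds, not measured data).
outages = [(100, 150), (120, 200), (900, 1100)]

uptime = availability_percent(0, 1000, outages)
print(f"availability: {uptime:.1f}%")
```

Here the first two outages overlap and count as one 100-second outage, and the third is clipped at the end of the period, giving 200 seconds of downtime in total.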

Scenario C.1 — Measure response times across different geographies

  • KVI: Quality · KPI: Latency by Region · Testbeds: DTE (Arctic Space)
  • Input: Request-response timing data collected across geographically distributed endpoints; network path information; delivery method identifiers.
  • Output: Latency measurements (average, p95, p99) per region and delivery method; geographic latency heat maps; correlation between delivery architecture and regional performance.
  • Expectation: Edge and CDN methods achieve lower latency in remote regions compared to centralised approaches; satellite-augmented delivery narrows the latency gap for underserved geographies.

A.2 — Non-Performable Tests (Not Feasible in Phase 2)

The following scenarios cannot be executed in Phase 2 due to scale limitations, missing real-world user populations, or infrastructure constraints. They are documented for completeness and potential Phase 3 consideration.

The table mirrors TVR v2.3 Appendix 1, Section A.2. Scenarios are listed in the order they appear in the appendix.

Scenario | Title | KVI | KPI | Reason Not Performable
A | Compare watch times across regions/distribution methods to test accessibility | Reach, Scalability | Average Watch Time per User | Requires multi-region real-user populations at scale
C | Evaluate trade-off between longer engagement vs. energy consumption | Cost Efficiency | Average Watch Time / Energy per Viewing Hour | Real-user longitudinal engagement data unavailable
D | Measure whether different methods (CDN vs Edge) support longer sessions globally | Reach, Quality | Average Session Duration | Global deployment at scale out of scope for Phase 2
E | Test correlation between delivery method and playback continuity | Quality | Average Session Duration | Reclassified by TVR v2.3; no testbed coverage for this measurement in Phase 2
F | Compare energy per minute of viewing across methods | Cost Efficiency | Average Session Duration / Energy | Cross-method energy measurement requires live deployments at scale
H | Measure impact of delivery on full content enjoyment | Quality | Content Completion Rate | Requires real broadcast events with large viewer pools
L | Calculate energy/data wasted due to mid-stream exits | Cost Efficiency | Content Abandonment Points | Large-scale energy metering not available in testbeds
N | Estimate resource cost of longer startup overheads | Cost Efficiency | Start-up Delay | Cost modelling requires full production cost data
O | Stress-test buffering under poor networks vs resilient distribution | Resilience | Buffering Ratio / Rebuffering Events | Reclassified by TVR v2.3; no Maritime/DTE testbed coverage in Phase 2
Q | Energy wasted during idle buffering quantified | Cost Efficiency | Buffering Ratio / Rebuffering Events | Requires energy instrumentation at viewer device scale
S | Test link between bitrate volatility and energy inefficiency | Cost Efficiency | Bitrate Stability / QoE | Energy measurement at the bitrate-event level not available
U | Evaluate impact of failures on playback experience | Quality | Error / Failure Rate | Requires real viewer feedback at scale
V | Measure load effect on failure rates under stress tests | Scalability | Error Rate (Failed Attempts) | Reclassified by TVR v2.3; no testbed coverage for load-induced failure measurement in Phase 2
W | Quantify cost of wasted connection attempts | Cost Efficiency | Error Rate (Failed Attempts) | Reclassified by TVR v2.3; no Maritime/DTE testbed coverage in Phase 2
X | Record app crashes under varying load/network conditions | Resilience | Error / Failure Rate | Requires real mobile app deployments at scale
Y | Test if crashes correlate with poor QoE | Quality | Error / Failure Rate | Requires cross-correlation of crash and QoE data at scale
Z | Costs linked to re-initialising playback sessions | Cost Efficiency | Error / Failure Rate | Production cost data not available
A.1 | Verify geographic reach by uptime reporting across regions | Reach | Uptime / Availability | Multi-region deployment at real geographic scale not available
A.3 | Measure uptime under simulated high concurrency | Quality, Scalability | Availability (Uptime %) | High-concurrency stress simulation (10k–100k users) beyond Phase 2 testbed capacity
A.4 | Calculate costs of downtime incidents | Cost Efficiency | Availability (Uptime %) | Production cost data on downtime impact not available
B.1 | Test resilience to sudden spikes (flash crowd scenarios) | Resilience | Load Distribution Performance | Flash-crowd-scale simulation beyond Phase 2 testbed capacity
B.2 | Measure load balancing efficiency at scale (2M–20M concurrent viewers) | Scalability | Load Distribution Performance | Multi-million-viewer concurrency out of scope for Phase 2
B.3 | Test performance & resilience at 2M–20M concurrent viewers | Security | Load Distribution Performance | Multi-million-viewer concurrency out of scope for Phase 2
B.4 | Model cost of overprovisioning vs. efficient load sharing | Cost Efficiency | Load Distribution Performance | Production-scale provisioning and cost data not available
C.2 | Compare latency effects on playback smoothness | Quality | Latency by Region | Requires per-session multi-region latency instrumentation at scale
C.3 | Test multi-region scaling capabilities | Scalability | Latency by Region | Multi-region scaling testbed not available in Phase 2
C.4 | Estimate impact of latency on energy use | Cost Efficiency | Latency by Region | Per-session energy measurement across regions not feasible at scale
D.1 | Direct measurement of kWh required to deliver one stream under each method | Cost Efficiency | Energy Consumption per Stream | Per-stream end-to-end energy metering not available across server/CDN/Edge/network elements
E.1 | Standardised test: measure kWh per viewing hour across methods | Cost Efficiency | Energy Consumption per Hour of Viewing | Cross-method controlled energy metering at scale not available
F.1 | Compare idle vs active energy use under stress | Resilience | Server Energy Utilization Rate | Continuous energy instrumentation under controlled stress not available in testbeds
F.2 | Measure energy use under scaling workloads | Scalability | Server Energy Utilization Rate | Scaling-workload simulation (100k–1M concurrent) beyond Phase 2 testbed capacity
F.3 | Efficiency gains of high vs. low utilization | Cost Efficiency | Server Energy Utilization Rate | Production-scale utilization data not available
G.1 | Compare network-level energy usage per GB delivered | Cost Efficiency | Network Energy Intensity (kWh/GB) | Network-element energy metering not available across routers/backhaul
H.1 | Test resilience of systems under demand spikes | Resilience | Peak vs. Average Energy Demand | Flash-crowd-scale demand simulation beyond Phase 2 testbed capacity
H.2 | Compare scaling cost during peaks vs. baseline | Scalability | Peak vs. Average Energy Demand | Production operational cost data not available
H.3 | Identify inefficiencies caused by peak overcapacity | Cost Efficiency | Peak vs. Average Energy Demand | Production overprovisioning data not available

Scenario G (Test if completions are higher in resilient methods) has been removed from TVR v2.3 Appendix 1 entirely and is no longer in either A.1 or A.2.