TVR E2E Scenarios — Reference
Sources: TVR v2.3 §2.9.6 (Tables 2 & 3) · TVR v2.3 Appendix 1 · GTL v3 (Service level tests sheet) · STM
This document covers two related sources:
- TVR §2.9.6 — E2E performance tests confirmed as carried out in Phase 2 (Tables 2 and 3), assigning specific testbeds to each scenario.
- TVR Appendix 1 — Complete catalogue of E2E service scenarios with detailed definitions, grouped as performable (A.1) and non-performable (A.2) in Phase 2.
For the full list of 16 service scenarios (Table 1), see Service Scenarios.
TVR §2.9.6 — Phase 2 E2E Performance Tests
Table 2: Strand 3–6 E2E Performance Tests in Phase 2
E2E performance tests are carried out across Strand 3 (DTH: SES, EBU), Strand 4, Strand 5 (Maritime), and Strand 6 (Arctic Space / DTE). These include:
- WP 3.1.5 (SES DTH): Startup delay comparison across CDN/Edge/satellite methods; buffering frequency under different delivery methods
- WP 3.x (EBU DTH): Equivalent startup and buffering tests for bidirectional satellite reception
- Strand 5 (Maritime): Buffering and resilience tests under maritime connectivity conditions
- Strand 6 / Arctic Space: Latency by region (C.1); scalability and error rate tests
Full strand-specific test plans are documented in the respective strand/WP technical notes.
Table 3: Strand 2 (WP 2.1) Tests Confirmed for Phase 2
These are the Strand 2 tests (defined in TVR §2.9.6, Table 3) that will be carried out in Phase 2, alongside tests specific to the testbed. Tests depend on the ability of far/near-edge components to collect statistics from the user player to evaluate QoE and QoS.
| Scenario | Use Case | Testbed(s) | Test Title | KPI |
|---|---|---|---|---|
| B | DTE | RAI | Assess if watch time drops with degraded quality | Average Watch Time per User |
| I | DTE | Arctic Space | Quantify wasted resources if streams are abandoned early | Content Completion Rate |
| J | DTE | RAI | Identify abandonment spikes after buffering events | Content Abandonment Points |
| K | DTE | RAI | Test if users drop at predictable quality loss points | Content Abandonment Points |
| M | DTE, DTH | SES, EBU, Maritime, Arctic Space | Directly test startup times distribution across methods | Start-up Delay |
| P | DTH, DTE | SES, EBU, Maritime, RAI | Compare buffering frequency under different methods | Buffering Ratio / Rebuffering Events |
| R | DTH, DTE | SES, EBU, Arctic Space | Track frequency of resolution switches under different delivery | Bitrate Stability / QoE |
| T | DTE | Maritime | Test retries under network drops or overloaded servers | Error Rate (Failed Attempts) |
| A.2 | DTH, DTE | SES, EBU, Maritime, RAI | Track downtime as resilience indicator | Availability (Uptime %) |
| C.1 | DTE | Arctic Space | Measure response times across different geographies | Latency by Region |
Note: Scenarios E, G, O, V and W (struck through in the source table and not listed above) are no longer performable in Phase 2. E, G, O and W were dropped following the STM update of April 2026 because there is no testbed coverage at RAI or Maritime; V has since been reclassified as non-performable as well. TVR v2.3 (May 2026) is aligned: E, O, V and W appear in Appendix 1, Section A.2 (non-performable), and G has been removed from the appendix entirely. A number of other scenarios assume large-scale deployment and are not feasible in Phase 2; these are detailed in Appendix 1, Section A.2 below.
A.1 — Performable Tests (Confirmed Feasible in Phase 2)
The following scenarios are confirmed as feasible in the Phase 2 testbeds and must be listed as test cases in the respective strand/WP test documents.
| Scenario | Title | KVI | KPI | Testbeds |
|---|---|---|---|---|
| B | Assess if watch time drops with degraded quality | Quality | Average Watch Time per User | DTE |
| I | Quantify wasted resources if streams are abandoned early | Cost Efficiency | Content Completion Rate | DTE |
| J | Identify abandonment spikes after buffering events | Resilience | Content Abandonment Points | DTE |
| K | Test if users drop at predictable quality loss points | Quality | Content Abandonment Points | DTE |
| M | Directly test startup times distribution across methods | Quality | Start-up Delay | DTE & DTH |
| P | Compare buffering frequency under different methods | Quality | Buffering Ratio / Rebuffering Events | DTE |
| R | Track frequency of resolution switches under different delivery | Quality | Bitrate Stability / QoE | DTE & DTH |
| T | Test retries under network drops or overloaded servers | Resilience | Error / Failure Rate | DTE |
| A.2 | Track downtime as resilience indicator | Resilience | Uptime / Availability | DTE & DTH |
| C.1 | Measure response times across different geographies | Quality | Latency by Region | DTE |
Scenario Descriptions
Scenario B — Assess if watch time drops with degraded quality
- KVI: Quality · KPI: Average Watch Time per User · Testbeds: DTE
- Input: Playback session logs with quality indicators (bitrate, resolution, buffering events); controlled quality degradation conditions (reduced bitrate, simulated packet loss, induced buffering); user engagement data under optimal and degraded conditions.
- Output: Average watch time (min/user) segmented by distribution method and quality level; correlation analysis between quality degradation and watch time reduction.
- Expectation: Watch time decreases as quality deteriorates; adaptive methods (CDN with ABR) mitigate the reduction.
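The Scenario B output (average watch time per user, segmented by distribution method and quality level) could be computed along the following lines. This is a minimal sketch, not the TVR method: the record layout, the `sessions` sample data and the function name `average_watch_time` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical session records: (user_id, method, quality_level, watch_minutes).
sessions = [
    ("u1", "cdn", "optimal", 30.0),
    ("u1", "cdn", "degraded", 12.0),
    ("u2", "cdn", "optimal", 20.0),
    ("u2", "cdn", "degraded", 8.0),
    ("u3", "edge", "optimal", 25.0),
    ("u3", "edge", "degraded", 15.0),
]


def average_watch_time(records):
    """Return {(method, quality_level): mean watch minutes per user}."""
    # Sum minutes per user first, so a user with many short sessions
    # counts once in the per-user average.
    per_user = defaultdict(lambda: defaultdict(float))
    for user, method, quality, minutes in records:
        per_user[(method, quality)][user] += minutes
    return {
        key: sum(users.values()) / len(users)
        for key, users in per_user.items()
    }


averages = average_watch_time(sessions)
```

Comparing `averages[("cdn", "optimal")]` with `averages[("cdn", "degraded")]` gives the per-method reduction in watch time that the expectation above refers to.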
Scenario I — Quantify wasted resources if streams are abandoned early
- KVI: Cost Efficiency · KPI: Content Completion Rate · Testbeds: DTE (Arctic Space)
- Input: Playback session logs showing stream start/end points; resource consumption data (bandwidth, compute) per stream; abandonment timestamps and causes.
- Output: Resource waste ratio (resources consumed by abandoned streams vs. completed streams); cost model for wasted delivery per distribution method.
- Expectation: Methods with higher completion rates waste fewer resources; early abandonment in centralised delivery is costlier due to longer data paths.
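One possible reading of the resource waste ratio is sketched below: resources consumed by abandoned streams divided by resources consumed by completed streams. The field names, the 90% completion threshold and the function name are assumptions made for illustration; the TVR does not specify them here.

```python
def resource_waste_ratio(streams, completion_threshold=0.9):
    """Ratio of bytes spent on abandoned streams to bytes spent on completed ones.

    A stream counts as completed when watched_s / duration_s reaches
    completion_threshold (an assumed cut-off, not a TVR value).
    Returns None when no stream completed, since the ratio is undefined.
    """
    abandoned_bytes = 0.0
    completed_bytes = 0.0
    for stream in streams:
        watched_fraction = stream["watched_s"] / stream["duration_s"]
        if watched_fraction >= completion_threshold:
            completed_bytes += stream["bytes"]
        else:
            abandoned_bytes += stream["bytes"]
    if completed_bytes == 0:
        return None
    return abandoned_bytes / completed_bytes


# Hypothetical per-stream records for one distribution method.
example_streams = [
    {"duration_s": 100, "watched_s": 100, "bytes": 1000},
    {"duration_s": 100, "watched_s": 20, "bytes": 200},
    {"duration_s": 100, "watched_s": 95, "bytes": 1000},
]
waste_ratio = resource_waste_ratio(example_streams)
```

Running the same calculation per distribution method yields the comparative figures needed for the cost model.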
Scenario J — Identify abandonment spikes after buffering events
- KVI: Resilience · KPI: Content Abandonment Points · Testbeds: DTE
- Input: Playback logs showing exact abandonment timestamps; event data on buffering (frequency, duration, severity) preceding abandonment.
- Output: Correlation between buffering events and abandonment points; identification of thresholds where buffering causes significant user drop-off.
- Expectation: Abandonment spikes cluster immediately after prolonged or repeated buffering; resilient methods show fewer abandonment spikes.
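A simple first check for Scenario J is the share of abandoned sessions that ended shortly after a buffering event. The sketch below assumes a hypothetical log layout (`abandoned_at`, `buffer_ends`, both in seconds from session start) and an arbitrary 10-second window; neither comes from the TVR.

```python
def abandonment_after_buffering(sessions, window_s=10.0):
    """Fraction of abandoned sessions that ended within window_s seconds
    after the end of any buffering event in the same session."""
    abandoned = [s for s in sessions if s["abandoned_at"] is not None]
    if not abandoned:
        return 0.0
    hits = 0
    for session in abandoned:
        exit_time = session["abandoned_at"]
        # A hit means the exit happened at or after a buffering event ended,
        # and no later than window_s seconds afterwards.
        if any(0.0 <= exit_time - end <= window_s for end in session["buffer_ends"]):
            hits += 1
    return hits / len(abandoned)


# Hypothetical sessions; abandoned_at is None for completed sessions.
example_sessions = [
    {"abandoned_at": 65.0, "buffer_ends": [60.0]},
    {"abandoned_at": 300.0, "buffer_ends": [100.0]},
    {"abandoned_at": None, "buffer_ends": [50.0]},
    {"abandoned_at": 42.0, "buffer_ends": [10.0, 40.0]},
]
share_after_buffering = abandonment_after_buffering(example_sessions)
```

Sweeping `window_s` over several values is one way to look for the drop-off thresholds mentioned in the output description.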
Scenario K — Test if users drop at predictable quality loss points
- KVI: Quality · KPI: Content Abandonment Points · Testbeds: DTE
- Input: Playback logs recording quality degradation events (bitrate reductions, resolution drops); session abandonment logs aligned with those events.
- Output: Patterns showing correlation between quality loss and session drop-offs; comparative data across distribution methods.
- Expectation: Predictable abandonment points emerge at moments of severe quality loss; methods with adaptive bitrate may prevent sharp drops.
Scenario M — Directly test startup times distribution across methods
- KVI: Quality · KPI: Start-up Delay · Testbeds: DTE & DTH
- Input: Time measurements from user request initiation to video playback start across CDN, Edge, etc.; controlled tests under consistent network conditions.
- Output: Startup delay (seconds) for each method — average, min, max, p95, p99.9 histograms; comparative dataset of responsiveness across methods.
- Expectation: Edge or adaptive methods achieve lower average startup delays; centralised methods may show longer delays depending on server distance.
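The summary statistics listed in the Scenario M output can be derived from raw startup delays as sketched below. The nearest-rank percentile definition is an assumption (the TVR does not state which definition applies), and the sample data and function names are hypothetical.

```python
import math
from statistics import mean


def nearest_rank_percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def startup_summary(delays_s):
    """Summarise startup delays (seconds) for one delivery method."""
    return {
        "mean": mean(delays_s),
        "min": min(delays_s),
        "max": max(delays_s),
        "p95": nearest_rank_percentile(delays_s, 95),
        "p99.9": nearest_rank_percentile(delays_s, 99.9),
    }


# Hypothetical measurements: 20 startup delays from 1.0 s to 2.9 s.
edge_delays = [i / 10 for i in range(10, 30)]
edge_summary = startup_summary(edge_delays)
```

With small sample counts the p99.9 value coincides with the maximum, so high percentiles are only informative once a method has on the order of a thousand or more measured startups.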
Scenario P — Compare buffering frequency under different methods
- KVI: Quality · KPI: Buffering Ratio / Rebuffering Events · Testbeds: DTE
- Input: Playback logs recording frequency of buffering events per stream; method identifiers (CDN, Edge, etc.).
- Output: Mean, distribution, and peak buffering frequency per method; comparative dataset highlighting which methods deliver smoother playback.
Scenario R — Track frequency of resolution switches under different delivery
- KVI: Quality · KPI: Bitrate Stability / QoE · Testbeds: DTE & DTH
- Input: Playback event logs recording bitrate/resolution switches; timestamps and network conditions during switches.
- Output: Resolution switch frequency per method; distribution of switch magnitude (up vs down); QoE impact correlation.
- Expectation: More stable delivery methods show fewer downward resolution switches under variable conditions.
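Counting resolution switches and splitting them into upward and downward changes can be done from the sequence of bitrates the player reports. The following is a minimal sketch; the input format (one bitrate value per delivered segment, in kbps) and the function name are assumptions.

```python
def resolution_switches(bitrates_kbps):
    """Count upward and downward switches in a per-segment bitrate sequence."""
    ups = 0
    downs = 0
    # Compare each segment with the one before it.
    for previous, current in zip(bitrates_kbps, bitrates_kbps[1:]):
        if current > previous:
            ups += 1
        elif current < previous:
            downs += 1
    return {"up": ups, "down": downs, "total": ups + downs}


# Hypothetical session: two downward steps followed by two upward steps.
example_bitrates = [3000, 3000, 1500, 1500, 800, 1500, 3000]
switch_counts = resolution_switches(example_bitrates)
```

Dividing `switch_counts["total"]` by the session duration gives a switch frequency that can be compared across delivery methods.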
Scenario T — Test retries under network drops or overloaded servers
- KVI: Resilience · KPI: Error / Failure Rate · Testbeds: DTE
- Input: Request logs under simulated network drops and server overload conditions; method identifiers.
- Output: Retry rates and failure rates per method; time-to-recovery measurements.
- Expectation: Resilient methods with redundancy (CDN, edge failover) show lower failure rates and faster recovery.
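Failure rate and time-to-recovery for Scenario T can both be read off a time-ordered request log. The sketch below assumes a hypothetical log of `(timestamp_s, succeeded)` pairs for a single method and defines recovery time as the gap between the first failed request of an outage and the next successful one; the TVR does not prescribe this definition.

```python
def failure_stats(requests):
    """Return (failure_rate, recovery_times_s) for a time-ordered request log.

    requests: list of (timestamp_s, succeeded) tuples.
    """
    if not requests:
        return 0.0, []
    failures = sum(1 for _, succeeded in requests if not succeeded)
    recovery_times = []
    outage_start = None
    for timestamp, succeeded in requests:
        if not succeeded and outage_start is None:
            outage_start = timestamp  # first failure of a new outage
        elif succeeded and outage_start is not None:
            recovery_times.append(timestamp - outage_start)
            outage_start = None
    return failures / len(requests), recovery_times


# Hypothetical log with two outages.
example_requests = [
    (0, True), (1, False), (2, False), (4, True),
    (5, True), (6, False), (7, True),
]
failure_rate, recovery_times = failure_stats(example_requests)
```

An outage still open at the end of the log contributes to the failure rate but not to `recovery_times`, which is worth noting when test runs are short.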
Scenario A.2 — Track downtime as resilience indicator
- KVI: Resilience · KPI: Uptime / Availability · Testbeds: DTE & DTH
- Input: Uptime monitoring logs across all delivery endpoints; incident and failover event records.
- Output: Availability percentage per method over the test period; correlation between architecture type and uptime.
- Expectation: Distributed methods with redundancy achieve higher availability; centralised methods are more vulnerable to single points of failure.
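Availability over a test period is the share of the period not covered by downtime. The sketch below merges overlapping incident intervals so that concurrent incidents are not double-counted; the interval format and function name are hypothetical.

```python
def availability_percent(period_start, period_end, incidents):
    """Availability (%) over [period_start, period_end].

    incidents: list of (start, end) downtime intervals in the same time unit.
    Overlapping intervals are counted once; intervals are clipped to the period.
    """
    total = period_end - period_start
    if total <= 0:
        raise ValueError("period_end must be after period_start")
    downtime = 0.0
    covered_until = period_start
    for start, end in sorted(incidents):
        start = max(start, covered_until)  # skip time already counted
        end = min(end, period_end)         # clip to the test period
        if end > start:
            downtime += end - start
            covered_until = end
    return 100.0 * (total - downtime) / total


# Hypothetical incidents in seconds: two overlap, one runs past the period end.
example_incidents = [(100, 150), (140, 200), (900, 1100)]
uptime = availability_percent(0, 1000, example_incidents)
```

Here the merged downtime is 100 s plus 100 s, so `uptime` is 80.0 for the 1000 s period.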
Scenario C.1 — Measure response times across different geographies
- KVI: Quality · KPI: Latency by Region · Testbeds: DTE (Arctic Space)
- Input: Request-response timing data collected across geographically distributed endpoints; network path information; delivery method identifiers.
- Output: Latency measurements (average, p95, p99) per region and delivery method; geographic latency heat maps; correlation between delivery architecture and regional performance.
- Expectation: Edge and CDN methods achieve lower latency in remote regions compared to centralised approaches; satellite-augmented delivery narrows the latency gap for underserved geographies.
A.2 — Non-Performable Tests (Not Feasible in Phase 2)
The following scenarios cannot be executed in Phase 2 due to scale limitations, missing real-world user populations, or infrastructure constraints. They are documented for completeness and potential Phase 3 consideration.
Table mirrors TVR v2.3 Appendix A.2. Scenarios are listed in the order they appear in the appendix.
| Scenario | Title | KVI | KPI | Reason Not Performable |
|---|---|---|---|---|
| A | Compare watch times across regions/distribution methods to test accessibility | Reach, Scalability | Average Watch Time per User | Requires multi-region real-user populations at scale |
| C | Evaluate trade-off between longer engagement vs. energy consumption | Cost Efficiency | Average Watch Time / Energy per Viewing Hour | Real-user longitudinal engagement data unavailable |
| D | Measure whether different methods (CDN vs Edge) support longer sessions globally | Reach, Quality | Average Session Duration | Global deployment at scale out of scope for Phase 2 |
| E | Test correlation between delivery method and playback continuity | Quality | Average Session Duration | Reclassified by TVR v2.3 — no testbed coverage for this measurement in Phase 2 |
| F | Compare energy per minute of viewing across methods | Cost Efficiency | Average Session Duration / Energy | Cross-method energy measurement requires live deployments at scale |
| H | Measure impact of delivery on full content enjoyment | Quality | Content Completion Rate | Requires real broadcast events with large viewer pools |
| L | Calculate energy/data wasted due to mid-stream exits | Cost Efficiency | Content Abandonment Points | Large-scale energy metering not available in testbeds |
| N | Estimate resource cost of longer startup overheads | Cost Efficiency | Start-up Delay | Cost modelling requires full production cost data |
| O | Stress-test buffering under poor networks vs resilient distribution | Resilience | Buffering Ratio / Rebuffering Events | Reclassified by TVR v2.3 — no Maritime/DTE testbed coverage in Phase 2 |
| Q | Energy wasted during idle buffering quantified | Cost Efficiency | Buffering Ratio / Rebuffering Events | Requires energy instrumentation at viewer device scale |
| S | Test link between bitrate volatility and energy inefficiency | Cost Efficiency | Bitrate Stability / QoE | Energy measurement at the bitrate-event level not available |
| U | Evaluate impact of failures on playback experience | Quality | Error / Failure Rate | Requires real viewer feedback at scale |
| V | Measure load effect on failure rates under stress tests | Scalability | Error Rate (Failed Attempts) | Reclassified by TVR v2.3 — no testbed coverage for load-induced failure measurement in Phase 2 |
| W | Quantify cost of wasted connection attempts | Cost Efficiency | Error Rate (Failed Attempts) | Reclassified by TVR v2.3 — no Maritime/DTE testbed coverage in Phase 2 |
| X | Record app crashes under varying load/network conditions | Resilience | Error / Failure Rate | Requires real mobile app deployments at scale |
| Y | Test if crashes correlate with poor QoE | Quality | Error / Failure Rate | Requires cross-correlation of crash and QoE data at scale |
| Z | Costs linked to re-initialising playback sessions | Cost Efficiency | Error / Failure Rate | Production cost data not available |
| A.1 | Verify geographic reach by uptime reporting across regions | Reach | Uptime / Availability | Multi-region deployment at real geographic scale not available |
| A.3 | Measure uptime under simulated high concurrency | Quality, Scalability | Availability (Uptime %) | High-concurrency stress simulation (10k–100k users) beyond Phase 2 testbed capacity |
| A.4 | Calculate costs of downtime incidents | Cost Efficiency | Availability (Uptime %) | Production cost data on downtime impact not available |
| B.1 | Test resilience to sudden spikes (flash crowd scenarios) | Resilience | Load Distribution Performance | Flash-crowd-scale simulation beyond Phase 2 testbed capacity |
| B.2 | Measure load balancing efficiency at scale (2M–20M concurrent viewers) | Scalability | Load Distribution Performance | Multi-million-viewer concurrency out of scope for Phase 2 |
| B.3 | Test performance & resilience at 2M–20M concurrent viewers | Security | Load Distribution Performance | Multi-million-viewer concurrency out of scope for Phase 2 |
| B.4 | Model cost of overprovisioning vs. efficient load sharing | Cost Efficiency | Load Distribution Performance | Production-scale provisioning and cost data not available |
| C.2 | Compare latency effects on playback smoothness | Quality | Latency by Region | Requires per-session multi-region latency instrumentation at scale |
| C.3 | Test multi-region scaling capabilities | Scalability | Latency by Region | Multi-region scaling testbed not available in Phase 2 |
| C.4 | Estimate impact of latency on energy use | Cost Efficiency | Latency by Region | Per-session energy measurement across regions not feasible at scale |
| D.1 | Direct measurement of kWh required to deliver one stream under each method | Cost Efficiency | Energy Consumption per Stream | Per-stream end-to-end energy metering not available across server/CDN/Edge/network elements |
| E.1 | Standardised test: measure kWh per viewing hour across methods | Cost Efficiency | Energy Consumption per Hour of Viewing | Cross-method controlled energy metering at scale not available |
| F.1 | Compare idle vs active energy use under stress | Resilience | Server Energy Utilization Rate | Continuous energy instrumentation under controlled stress not available in testbeds |
| F.2 | Measure energy use under scaling workloads | Scalability | Server Energy Utilization Rate | Scaling-workload simulation (100k–1M concurrent) beyond Phase 2 testbed capacity |
| F.3 | Efficiency gains of high vs. low utilization | Cost Efficiency | Server Energy Utilization Rate | Production-scale utilization data not available |
| G.1 | Compare network-level energy usage per GB delivered | Cost Efficiency | Network Energy Intensity (kWh/GB) | Network-element energy metering not available across routers/backhaul |
| H.1 | Test resilience of systems under demand spikes | Resilience | Peak vs. Average Energy Demand | Flash-crowd-scale demand simulation beyond Phase 2 testbed capacity |
| H.2 | Compare scaling cost during peaks vs. baseline | Scalability | Peak vs. Average Energy Demand | Production operational cost data not available |
| H.3 | Identify inefficiencies caused by peak overcapacity | Cost Efficiency | Peak vs. Average Energy Demand | Production overprovisioning data not available |
Scenario G (Test if completions are higher in resilient methods) has been removed from TVR v2.3 Appendix 1 entirely and is no longer in either A.1 or A.2.