TVR E2E Scenarios — Reference

Sources: TVR v2.3 §2.9.6 (Tables 2 & 3) · TVR v2.3 Appendix 1 · GTL v3 (Service level tests sheet) · STM

This document covers two related sources:

TVR §2.9.6: E2E performance tests confirmed to be carried out in Phase 2 (Tables 2 and 3), with specific testbeds assigned to each scenario.
TVR Appendix 1: Complete catalogue of E2E service scenarios with detailed definitions, grouped as performable (A.1) and non-performable (A.2) in Phase 2.

For the full list of 16 service scenarios (Table 1), see Service Scenarios.

TVR §2.9.6 — Phase 2 E2E Performance Tests

Table 2: Strand 3–6 E2E Performance Tests in Phase 2

In addition to the Strand 2 tests listed in Table 3, E2E performance tests are carried out across Strand 3 (DTH: SES, EBU), Strand 4, Strand 5 (Maritime), and Strand 6 (Arctic Space / DTE). These include:

WP 3.1.5 (SES DTH): Startup delay comparison across CDN/Edge/satellite methods; buffering frequency under different delivery methods
WP 3.x (EBU DTH): Equivalent startup and buffering tests for bidirectional satellite reception
Strand 5 (Maritime): Buffering and resilience tests under maritime connectivity conditions
Strand 6 / Arctic Space: Latency by region (C.1); scalability and error rate tests

Full strand-specific test plans are documented in the respective strand/WP technical notes.

Table 3: Strand 2 (WP 2.1) Tests Confirmed for Phase 2

These are the Strand 2 tests (defined in TVR §2.9.6, Table 3) that will be carried out in Phase 2, alongside testbed-specific tests. They depend on the far-edge and near-edge components being able to collect statistics from the user player, which are needed to evaluate QoE and QoS.

Scenario	Use Case	Testbed(s)	Test Title	KPI
B	DTE	RAI	Assess if watch time drops with degraded quality	Average Watch Time per User
E	~~DTE~~	~~RAI~~	~~Test correlation between delivery method and playback continuity~~	~~Average Session Duration~~
G	~~DTE~~	~~RAI~~	~~Test if completions are higher in resilient methods (less buffering)~~	~~Content Completion Rate~~
I	DTE	Arctic Space	Quantify wasted resources if streams are abandoned early	Content Completion Rate
J	DTE	RAI	Identify abandonment spikes after buffering events	Content Abandonment Points
K	DTE	RAI	Test if users drop at predictable quality loss points	Content Abandonment Points
M	DTE, DTH	SES, EBU, Maritime, Arctic Space	Directly test startup times distribution across methods	Start-up Delay
O	~~DTE~~	~~Maritime~~	~~Stress-test buffering under poor networks vs resilient distribution~~	~~Buffering Ratio / Rebuffering Events~~
P	DTH, DTE	SES, EBU, Maritime, RAI	Compare buffering frequency under different methods	Buffering Ratio / Rebuffering Events
R	DTH, DTE	SES, EBU, Arctic Space	Track frequency of resolution switches under different delivery	Bitrate Stability / QoE
T	DTE	Maritime	Test retries under network drops or overloaded servers	Error Rate (Failed Attempts)
V	~~DTE~~	~~Maritime, Arctic Space~~	~~Measure load effect on failure rates under stress tests~~	~~Error Rate (Failed Attempts)~~
W	~~DTE~~	~~Maritime~~	~~Quantify cost of wasted connection attempts~~	~~Error Rate (Failed Attempts)~~
A.2	DTH, DTE	SES, EBU, Maritime, RAI	Track downtime as resilience indicator	Availability (Uptime %)
C.1	DTE	Arctic Space	Measure response times across different geographies	Latency by Region

Note: Struck-through scenarios (E, G, O, V, W) are no longer performable in Phase 2. E, G, O and W were dropped following the STM update of April 2026 (no testbed coverage at RAI or Maritime); V has since been reclassified as non-performable as well. TVR v2.3 (May 2026) is aligned: E, O, V and W appear in Appendix 1, Section A.2 (non-performable), and G has been removed from the appendix entirely. A number of other scenarios assume large-scale deployment and are not feasible in Phase 2; these are detailed in Appendix 1, Section A.2 below.

A.1 — Performable Tests (Confirmed Feasible in Phase 2)

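The following scenarios are confirmed as feasible in the Phase 2 testbeds and must be listed as test cases in the respective strand/WP test documents. Scenario identifiers are independent of the appendix section labels: scenario A.2, for example, is performable and is therefore listed here in section A.1.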

Scenario	Title	KVI	KPI	Testbeds
B	Assess if watch time drops with degraded quality	Quality	Average Watch Time per User	DTE
I	Quantify wasted resources if streams are abandoned early	Cost Efficiency	Content Completion Rate	DTE
J	Identify abandonment spikes after buffering events	Resilience	Content Abandonment Points	DTE
K	Test if users drop at predictable quality loss points	Quality	Content Abandonment Points	DTE
M	Directly test startup times distribution across methods	Quality	Start-up Delay	DTE & DTH
P	Compare buffering frequency under different methods	Quality	Buffering Ratio / Rebuffering Events	DTE & DTH
R	Track frequency of resolution switches under different delivery	Quality	Bitrate Stability / QoE	DTE & DTH
T	Test retries under network drops or overloaded servers	Resilience	Error / Failure Rate	DTE
A.2	Track downtime as resilience indicator	Resilience	Uptime / Availability	DTE & DTH
C.1	Measure response times across different geographies	Quality	Latency by Region	DTE

Scenario Descriptions

Scenario B — Assess if watch time drops with degraded quality

KVI: Quality · KPI: Average Watch Time per User · Testbeds: DTE
Input: Playback session logs with quality indicators (bitrate, resolution, buffering events); controlled quality degradation conditions (reduced bitrate, simulated packet loss, induced buffering); user engagement data under optimal and degraded conditions.
Output: Average watch time (min/user) segmented by distribution method and quality level; correlation analysis between quality degradation and watch time reduction.
Expectation: Watch time decreases as quality deteriorates; adaptive methods (CDN with ABR) mitigate the reduction.
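
As an illustration of the output described above, the following minimal sketch segments watch time by distribution method and quality level and computes a Pearson correlation between bitrate and watch time. The record layout, the method names (`cdn_abr`, `satellite`) and all numbers are hypothetical placeholders, not measured results, and one session per user is assumed.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical session records: (method, quality_level, bitrate_kbps, watch_minutes).
SESSIONS = [
    ("cdn_abr", "optimal", 6000, 42.0),
    ("cdn_abr", "degraded", 1500, 30.0),
    ("satellite", "optimal", 6000, 40.0),
    ("satellite", "degraded", 1500, 18.0),
]


def average_watch_time(sessions):
    """Mean watch time in minutes per (method, quality level) segment."""
    buckets = defaultdict(list)
    for method, quality, _bitrate, minutes in sessions:
        buckets[(method, quality)].append(minutes)
    return {segment: sum(v) / len(v) for segment, v in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


averages = average_watch_time(SESSIONS)
# Positive r means watch time falls as bitrate falls.
r = pearson([s[2] for s in SESSIONS], [s[3] for s in SESSIONS])
```

A positive coefficient on real data would be consistent with the stated expectation; comparing the per-segment averages shows whether an adaptive method reduces the drop.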

Scenario I — Quantify wasted resources if streams are abandoned early

KVI: Cost Efficiency · KPI: Content Completion Rate · Testbeds: DTE (Arctic Space)
Input: Playback session logs showing stream start/end points; resource consumption data (bandwidth, compute) per stream; abandonment timestamps and causes.
Output: Resource waste ratio (resources consumed by abandoned streams vs. completed streams); cost model for wasted delivery per distribution method.
Expectation: Methods with higher completion rates waste fewer resources; early abandonment in centralised delivery is costlier due to longer data paths.
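
The resource waste ratio could be computed along the following lines. This is a sketch under stated assumptions: a stream counts as completed when at least 90% of its duration was watched (the threshold is an assumption, not taken from the TVR), delivered bytes stand in for all resources, and the field names and figures are hypothetical.

```python
# Hypothetical per-stream records; field names and values are illustrative only.
STREAMS = [
    {"method": "edge", "duration_s": 600, "watched_s": 600, "bytes": 400},
    {"method": "edge", "duration_s": 600, "watched_s": 60, "bytes": 40},
    {"method": "central", "duration_s": 600, "watched_s": 570, "bytes": 400},
    {"method": "central", "duration_s": 600, "watched_s": 150, "bytes": 100},
]


def waste_ratio(streams, method, completion_threshold=0.9):
    """Bytes spent on abandoned streams divided by bytes spent on completed ones."""
    abandoned = completed = 0
    for s in streams:
        if s["method"] != method:
            continue
        if s["watched_s"] / s["duration_s"] >= completion_threshold:
            completed += s["bytes"]
        else:
            abandoned += s["bytes"]
    # No completed streams at all: the ratio is unbounded.
    return abandoned / completed if completed else float("inf")
```

With the placeholder data, `waste_ratio(STREAMS, "edge")` evaluates to 0.1 and `waste_ratio(STREAMS, "central")` to 0.25.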

Scenario J — Identify abandonment spikes after buffering events

KVI: Resilience · KPI: Content Abandonment Points · Testbeds: DTE
Input: Playback logs showing exact abandonment timestamps; event data on buffering (frequency, duration, severity) preceding abandonment.
Output: Correlation between buffering events and abandonment points; identification of thresholds where buffering causes significant user drop-off.
Expectation: Abandonment spikes cluster immediately after prolonged or repeated buffering; resilient methods show fewer abandonment spikes.
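
One simple way to check for clustering is to measure what share of abandonments happen during a stall or shortly after it. The sketch below assumes a 10-second look-back window and a hypothetical session layout; both are illustrative choices rather than values defined in the TVR.

```python
# Hypothetical sessions: abandon_at is None when playback completed;
# buffering is a list of (start_s, duration_s) stall events.
SESSIONS = [
    {"abandon_at": 65.0, "buffering": [(50.0, 10.0)]},
    {"abandon_at": 300.0, "buffering": [(50.0, 10.0)]},
    {"abandon_at": None, "buffering": [(20.0, 2.0)]},
]


def abandoned_after_buffering(sessions, window_s=10.0):
    """Share of abandoned sessions that ended during a stall or within window_s after it."""
    abandoned = [s for s in sessions if s["abandon_at"] is not None]
    if not abandoned:
        return 0.0
    hits = 0
    for s in abandoned:
        t = s["abandon_at"]
        if any(start <= t <= start + dur + window_s for start, dur in s["buffering"]):
            hits += 1
    return hits / len(abandoned)
```

Sweeping `window_s` and the minimum stall duration over real logs is one way to look for the drop-off thresholds mentioned in the output.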

Scenario K — Test if users drop at predictable quality loss points

KVI: Quality · KPI: Content Abandonment Points · Testbeds: DTE
Input: Playback logs recording quality degradation events (bitrate reductions, resolution drops); session abandonment logs aligned with those events.
Output: Patterns showing correlation between quality loss and session drop-offs; comparative data across distribution methods.
Expectation: Predictable abandonment points emerge at moments of severe quality loss; methods with adaptive bitrate may prevent sharp drops.
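
To see whether abandonment concentrates at severe quality loss, drop events can be bucketed by the relative bitrate reduction. In this sketch the 50% cut between "mild" and "severe" is an assumption, as are the event layout and the sample values; the flag `abandoned` is assumed to have been derived upstream by aligning abandonment logs with each event.

```python
from collections import Counter

# Hypothetical quality-drop events: (old_kbps, new_kbps, abandoned_soon_after).
DROPS = [
    (6000, 4500, False),
    (6000, 1500, True),
    (3000, 2500, False),
    (3000, 800, True),
    (6000, 2000, False),
]


def abandonment_by_severity(drops, severe_cut=0.5):
    """Abandonment rate per severity bucket of relative bitrate reduction."""
    totals, exits = Counter(), Counter()
    for old, new, abandoned in drops:
        reduction = (old - new) / old
        bucket = "severe" if reduction >= severe_cut else "mild"
        totals[bucket] += 1
        exits[bucket] += int(abandoned)
    return {bucket: exits[bucket] / totals[bucket] for bucket in totals}


rates = abandonment_by_severity(DROPS)
```

Running the same bucketing per distribution method gives the comparative data listed in the output.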

Scenario M — Directly test startup times distribution across methods

KVI: Quality · KPI: Start-up Delay · Testbeds: DTE & DTH
Input: Time measurements from user request initiation to video playback start for each delivery method (CDN, Edge, etc.); controlled tests under consistent network conditions.
Output: Startup delay (seconds) for each method: average, min, max, p95 and p99.9, plus histograms; comparative dataset of responsiveness across methods.
Expectation: Edge or adaptive methods achieve lower average startup delays; centralised methods may show longer delays depending on server distance.
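
The summary statistics named in the output can be derived from raw delay samples as sketched below. The nearest-rank percentile definition is an assumption (the TVR does not prescribe one), and the sample values are placeholders.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def startup_summary(delays_s):
    """Summary of startup delays (seconds) for one delivery method."""
    return {
        "mean": sum(delays_s) / len(delays_s),
        "min": min(delays_s),
        "max": max(delays_s),
        "p95": percentile(delays_s, 95),
        "p99.9": percentile(delays_s, 99.9),
    }


# Placeholder samples: 1.0 s to 20.0 s in 1 s steps.
DELAYS_S = [float(i) for i in range(1, 21)]
summary = startup_summary(DELAYS_S)
```

With only 20 samples the p99.9 value coincides with the maximum; under this definition at least 1,000 samples per method are needed before p99.9 can differ from the maximum.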

Scenario P — Compare buffering frequency under different methods

KVI: Quality · KPI: Buffering Ratio / Rebuffering Events · Testbeds: DTE & DTH
Input: Playback logs recording frequency of buffering events per stream; method identifiers (CDN, Edge, etc.).
Output: Mean, distribution, and peak buffering frequency per method; comparative dataset highlighting which methods deliver smoother playback.
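
A possible way to derive the per-method figures is shown below. It assumes buffering frequency is expressed as rebuffering events per minute of session time and buffering ratio as stall time divided by session time; both definitions, the tuple layout and the numbers are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical per-stream records: (method, rebuffer_events, stall_s, session_s).
STREAMS = [
    ("cdn", 2, 6.0, 600.0),
    ("cdn", 0, 0.0, 300.0),
    ("satellite", 1, 3.0, 600.0),
    ("satellite", 5, 30.0, 600.0),
]


def buffering_stats(streams):
    """Mean and peak rebuffering frequency plus mean buffering ratio per method."""
    per_method = defaultdict(list)
    for method, events, stall_s, session_s in streams:
        events_per_min = events / (session_s / 60.0)
        per_method[method].append((events_per_min, stall_s / session_s))
    stats = {}
    for method, rows in per_method.items():
        freqs = [f for f, _ in rows]
        ratios = [r for _, r in rows]
        stats[method] = {
            "mean_events_per_min": sum(freqs) / len(freqs),
            "peak_events_per_min": max(freqs),
            "mean_buffering_ratio": sum(ratios) / len(ratios),
        }
    return stats


stats = buffering_stats(STREAMS)
```

The list of per-stream frequencies kept in `per_method` can also feed a histogram when the full distribution is needed.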

Scenario R — Track frequency of resolution switches under different delivery

KVI: Quality · KPI: Bitrate Stability / QoE · Testbeds: DTE & DTH
Input: Playback event logs recording bitrate/resolution switches; timestamps and network conditions during switches.
Output: Resolution switch frequency per method; distribution of switch magnitude (up vs down); QoE impact correlation.
Expectation: More stable delivery methods show fewer downward resolution switches under variable conditions.
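
Counting switches from a player's bitrate timeline is straightforward. The sketch below assumes the log has already been reduced to an ordered list of selected bitrates per session; the sample timeline is hypothetical.

```python
def count_switches(bitrates_kbps):
    """Return (upward, downward) switch counts for an ordered bitrate timeline."""
    up = down = 0
    for prev, cur in zip(bitrates_kbps, bitrates_kbps[1:]):
        if cur > prev:
            up += 1
        elif cur < prev:
            down += 1
    return up, down


# Hypothetical timeline of selected bitrates for one session.
TIMELINE = [3000, 3000, 1500, 1500, 3000, 6000, 1500]
up, down = count_switches(TIMELINE)
```

Dividing the counts by session duration yields a switch frequency that can be compared across methods; switch magnitude can be added by recording `cur - prev` for each change.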

Scenario T — Test retries under network drops or overloaded servers

KVI: Resilience · KPI: Error / Failure Rate · Testbeds: DTE
Input: Request logs under simulated network drops and server overload conditions; method identifiers.
Output: Retry rates and failure rates per method; time-to-recovery measurements.
Expectation: Resilient methods with redundancy (CDN, edge failover) show lower failure rates and faster recovery.
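
Failure rate and time-to-recovery can both be read off a chronological request log. In this sketch each log entry is a hypothetical `(timestamp_s, succeeded)` pair, and recovery time is assumed to run from the first failed request of an outage to the next successful one; retry counting is left out.

```python
def failure_rate(events):
    """Share of requests that failed."""
    return sum(1 for _, ok in events if not ok) / len(events)


def recovery_times(events):
    """Seconds from the first failure of each outage to the next success."""
    times = []
    outage_start = None
    for ts, ok in events:
        if not ok and outage_start is None:
            outage_start = ts
        elif ok and outage_start is not None:
            times.append(ts - outage_start)
            outage_start = None
    return times


# Hypothetical chronological log for one delivery method.
EVENTS = [
    (0.0, True), (1.0, False), (2.0, False), (4.0, True),
    (5.0, True), (6.0, False), (7.0, True),
]
```

An outage still open at the end of the log is not reported by `recovery_times`, so the test window should extend past the last injected fault.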

Scenario A.2 — Track downtime as resilience indicator

KVI: Resilience · KPI: Uptime / Availability · Testbeds: DTE & DTH
Input: Uptime monitoring logs across all delivery endpoints; incident and failover event records.
Output: Availability percentage per method over the test period; correlation between architecture type and uptime.
Expectation: Distributed methods with redundancy achieve higher availability; centralised methods are more vulnerable to single points of failure.
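
Availability over a test period follows directly from the recorded outages. The sketch assumes outages are given as non-overlapping `(start_s, end_s)` intervals inside the test period; the function name and the sample figures are illustrative.

```python
def availability_pct(period_s, outages):
    """Availability in percent, given non-overlapping (start_s, end_s) outages."""
    downtime_s = sum(end - start for start, end in outages)
    return 100.0 * (period_s - downtime_s) / period_s


# Hypothetical one-day test period with two outages of 432 s each.
DAY_S = 86400
OUTAGES = [(0, 432), (1000, 1432)]
```

Here `availability_pct(DAY_S, OUTAGES)` gives 99.0. If monitoring can report overlapping incidents, the intervals should be merged first so that downtime is not double-counted.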

Scenario C.1 — Measure response times across different geographies

KVI: Quality · KPI: Latency by Region · Testbeds: DTE (Arctic Space)
Input: Request-response timing data collected across geographically distributed endpoints; network path information; delivery method identifiers.
Output: Latency measurements (average, p95, p99) per region and delivery method; geographic latency heat maps; correlation between delivery architecture and regional performance.
Expectation: Edge and CDN methods achieve lower latency in remote regions compared to centralised approaches; satellite-augmented delivery narrows the latency gap for underserved geographies.
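
The per-region, per-method latency figures can be produced by grouping samples and taking percentiles, as in this sketch. It uses `statistics.quantiles` from the Python standard library with the inclusive interpolation method; the choice of interpolation, the region and method labels and the sample values are assumptions.

```python
from collections import defaultdict
from statistics import mean, quantiles

# Hypothetical samples: (region, method, latency_ms).
SAMPLES = (
    [("arctic", "edge", ms) for ms in (100, 110, 120, 130, 140)]
    + [("arctic", "central", ms) for ms in (300, 320, 340, 360, 380)]
)


def latency_by_region(samples):
    """Average, p95 and p99 latency per (region, method); needs >= 2 samples per group."""
    groups = defaultdict(list)
    for region, method, ms in samples:
        groups[(region, method)].append(ms)
    report = {}
    for key, values in groups.items():
        cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
        report[key] = {"avg": mean(values), "p95": cuts[94], "p99": cuts[98]}
    return report


report = latency_by_region(SAMPLES)
```

The resulting dictionary is keyed by region and method, which is the shape a heat-map plot would need as input.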

A.2 — Non-Performable Tests (Not Feasible in Phase 2)

The following scenarios cannot be executed in Phase 2 due to scale limitations, missing real-world user populations, or infrastructure constraints. They are documented for completeness and potential Phase 3 consideration.

This table mirrors TVR v2.3 Appendix 1, Section A.2. Scenarios are listed in the order in which they appear in the appendix.

Scenario	Title	KVI	KPI	Reason Not Performable
A	Compare watch times across regions/distribution methods to test accessibility	Reach, Scalability	Average Watch Time per User	Requires multi-region real-user populations at scale
C	Evaluate trade-off between longer engagement vs. energy consumption	Cost Efficiency	Average Watch Time / Energy per Viewing Hour	Real-user longitudinal engagement data unavailable
D	Measure whether different methods (CDN vs Edge) support longer sessions globally	Reach, Quality	Average Session Duration	Global deployment at scale out of scope for Phase 2
E	Test correlation between delivery method and playback continuity	Quality	Average Session Duration	Reclassified by TVR v2.3 — no testbed coverage for this measurement in Phase 2
F	Compare energy per minute of viewing across methods	Cost Efficiency	Average Session Duration / Energy	Cross-method energy measurement requires live deployments at scale
H	Measure impact of delivery on full content enjoyment	Quality	Content Completion Rate	Requires real broadcast events with large viewer pools
L	Calculate energy/data wasted due to mid-stream exits	Cost Efficiency	Content Abandonment Points	Large-scale energy metering not available in testbeds
N	Estimate resource cost of longer startup overheads	Cost Efficiency	Start-up Delay	Cost modelling requires full production cost data
O	Stress-test buffering under poor networks vs resilient distribution	Resilience	Buffering Ratio / Rebuffering Events	Reclassified by TVR v2.3 — no Maritime/DTE testbed coverage in Phase 2
Q	Energy wasted during idle buffering quantified	Cost Efficiency	Buffering Ratio / Rebuffering Events	Requires energy instrumentation at viewer device scale
S	Test link between bitrate volatility and energy inefficiency	Cost Efficiency	Bitrate Stability / QoE	Energy measurement at the bitrate-event level not available
U	Evaluate impact of failures on playback experience	Quality	Error / Failure Rate	Requires real viewer feedback at scale
V	Measure load effect on failure rates under stress tests	Scalability	Error Rate (Failed Attempts)	Reclassified by TVR v2.3 — no testbed coverage for load-induced failure measurement in Phase 2
W	Quantify cost of wasted connection attempts	Cost Efficiency	Error Rate (Failed Attempts)	Reclassified by TVR v2.3 — no Maritime/DTE testbed coverage in Phase 2
X	Record app crashes under varying load/network conditions	Resilience	Error / Failure Rate	Requires real mobile app deployments at scale
Y	Test if crashes correlate with poor QoE	Quality	Error / Failure Rate	Requires cross-correlation of crash and QoE data at scale
Z	Costs linked to re-initialising playback sessions	Cost Efficiency	Error / Failure Rate	Production cost data not available
A.1	Verify geographic reach by uptime reporting across regions	Reach	Uptime / Availability	Multi-region deployment at real geographic scale not available
A.3	Measure uptime under simulated high concurrency	Quality, Scalability	Availability (Uptime %)	High-concurrency stress simulation (10k–100k users) beyond Phase 2 testbed capacity
A.4	Calculate costs of downtime incidents	Cost Efficiency	Availability (Uptime %)	Production cost data on downtime impact not available
B.1	Test resilience to sudden spikes (flash crowd scenarios)	Resilience	Load Distribution Performance	Flash-crowd-scale simulation beyond Phase 2 testbed capacity
B.2	Measure load balancing efficiency at scale (2M–20M concurrent viewers)	Scalability	Load Distribution Performance	Multi-million-viewer concurrency out of scope for Phase 2
B.3	Test performance & resilience at 2M–20M concurrent viewers	Security	Load Distribution Performance	Multi-million-viewer concurrency out of scope for Phase 2
B.4	Model cost of overprovisioning vs. efficient load sharing	Cost Efficiency	Load Distribution Performance	Production-scale provisioning and cost data not available
C.2	Compare latency effects on playback smoothness	Quality	Latency by Region	Requires per-session multi-region latency instrumentation at scale
C.3	Test multi-region scaling capabilities	Scalability	Latency by Region	Multi-region scaling testbed not available in Phase 2
C.4	Estimate impact of latency on energy use	Cost Efficiency	Latency by Region	Per-session energy measurement across regions not feasible at scale
D.1	Direct measurement of kWh required to deliver one stream under each method	Cost Efficiency	Energy Consumption per Stream	Per-stream end-to-end energy metering not available across server/CDN/Edge/network elements
E.1	Standardised test: measure kWh per viewing hour across methods	Cost Efficiency	Energy Consumption per Hour of Viewing	Cross-method controlled energy metering at scale not available
F.1	Compare idle vs active energy use under stress	Resilience	Server Energy Utilization Rate	Continuous energy instrumentation under controlled stress not available in testbeds
F.2	Measure energy use under scaling workloads	Scalability	Server Energy Utilization Rate	Scaling-workload simulation (100k–1M concurrent) beyond Phase 2 testbed capacity
F.3	Efficiency gains of high vs. low utilization	Cost Efficiency	Server Energy Utilization Rate	Production-scale utilization data not available
G.1	Compare network-level energy usage per GB delivered	Cost Efficiency	Network Energy Intensity (kWh/GB)	Network-element energy metering not available across routers/backhaul
H.1	Test resilience of systems under demand spikes	Resilience	Peak vs. Average Energy Demand	Flash-crowd-scale demand simulation beyond Phase 2 testbed capacity
H.2	Compare scaling cost during peaks vs. baseline	Scalability	Peak vs. Average Energy Demand	Production operational cost data not available
H.3	Identify inefficiencies caused by peak overcapacity	Cost Efficiency	Peak vs. Average Energy Demand	Production overprovisioning data not available

Scenario G (Test if completions are higher in resilient methods) has been removed from TVR v2.3 Appendix 1 entirely and is no longer in either A.1 or A.2.