Temporal Filtering¶
In-depth explanation of temporal filtering and minimum time gap enforcement.
Problem Statement¶
Video frames are highly correlated in time. Adjacent frames differ only by:
- Camera motion: ~cm displacement at typical AUV speeds (0.5–2 m/s)
- Subject motion: Organisms and particles move slowly relative to frame rate (30 fps)
- Lighting changes: Gradual as vehicle moves relative to sun/lights
Without temporal filtering, diversity selection would pick many near-duplicate frames from the same clip, wasting frame budget and annotation effort.
Algorithm¶
Overview¶
For each video, enforce a minimum time gap (`--min-gap`, in seconds) between selected frames.
Input: Frames sorted by frame_idx
Output: Subset with temporal separation ≥ min-gap
Pseudo-Code¶
```python
import math

def temporal_filter(frames, min_gap_sec, fps):
    kept = []
    last_selected_idx = -math.inf  # First frame always passes the gap test
    for frame in sorted(frames, key=lambda f: f.frame_idx):
        gap_sec = (frame.frame_idx - last_selected_idx) / fps
        if gap_sec >= min_gap_sec:
            kept.append(frame)
            last_selected_idx = frame.frame_idx
    return kept
```
Implementation¶
```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// FrameRecord (frame_idx, fps, video_path, ...) is defined elsewhere in the project.
std::vector<FrameRecord> temporal_filter(
    const std::vector<FrameRecord>& candidates,
    double min_gap_sec)
{
    // Group candidate indices by source video: the gap constraint is per-video.
    std::map<std::string, std::vector<std::size_t>> by_video;
    for (std::size_t i = 0; i < candidates.size(); i++)
        by_video[candidates[i].video_path.string()].push_back(i);

    std::vector<FrameRecord> kept;
    for (auto& [_, indices] : by_video) {
        // Sort each video's frames into temporal order.
        std::sort(indices.begin(), indices.end(),
                  [&](std::size_t a, std::size_t b) {
                      return candidates[a].frame_idx < candidates[b].frame_idx;
                  });
        double last_idx = -1e9;  // Sentinel: the first frame always passes the gap test
        double fps = candidates[indices[0]].fps;
        for (std::size_t i : indices) {
            const auto& r = candidates[i];
            if ((r.frame_idx - last_idx) / fps >= min_gap_sec) {
                kept.push_back(r);
                last_idx = r.frame_idx;
            }
        }
    }
    return kept;
}
```
Key points:
- Group by video: Temporal filtering is per-video (frames from different videos are independent)
- Sort by frame index: Ensures temporal order
- Greedy selection: Accept the earliest frame whose gap ≥ threshold; no lookahead
Example¶
Input¶
Video at 30 fps, min-gap = 2.0 seconds:
| Frame Index | Time (sec) | Quality | Interest Score |
|---|---|---|---|
| 0 | 0.0 | ✅ | 50 |
| 15 | 0.5 | ✅ | 80 |
| 30 | 1.0 | ✅ | 60 |
| 60 | 2.0 | ✅ | 90 |
| 61 | 2.03 | ✅ | 100 |
| 120 | 4.0 | ✅ | 70 |
| 180 | 6.0 | ✅ | 85 |
Processing¶
1. Frame 0: gap = ∞ (first frame) ≥ 2.0 → Accept (`last_selected = 0`)
2. Frame 15: gap = (15 − 0) / 30 = 0.5 s < 2.0 → Reject
3. Frame 30: gap = (30 − 0) / 30 = 1.0 s < 2.0 → Reject
4. Frame 60: gap = (60 − 0) / 30 = 2.0 s ≥ 2.0 → Accept (`last_selected = 60`)
5. Frame 61: gap = (61 − 60) / 30 = 0.03 s < 2.0 → Reject (even though its interest score is highest)
6. Frame 120: gap = (120 − 60) / 30 = 2.0 s ≥ 2.0 → Accept (`last_selected = 120`)
7. Frame 180: gap = (180 − 120) / 30 = 2.0 s ≥ 2.0 → Accept (`last_selected = 180`)
Output¶
Kept frames: 0, 60, 120, 180
Rejection rate: 3 / 7 = 43%
Note: Frame 61 (highest interest score) is rejected due to proximity to frame 60.
When Temporal Filtering is Applied¶
Order in pipeline:
- Pass 1: Compute metrics, apply quality gates
- Pass 2a: Temporal filtering (this step)
- Pass 2b: Grid-based diversity selection
- Pass 3: Extract frames
Rationale: Apply temporal filtering before grid binning to ensure candidates entering the grid are temporally separated.
Alternative orderings (not used):
- Apply after grid binning → risk of near-duplicates in different cells
- Apply after budget enforcement → too late (redundant frames already selected)
Parameter Tuning¶
--min-gap (Minimum Gap in Seconds)¶
Trade-off:
- Higher (e.g., 5–10s): Strong temporal separation, fewer frames per video
- Lower (e.g., 0.5–1s): More frames per video, risk of near-duplicates
Recommended values:
| Vehicle Speed | Scene Change Rate | Recommended min-gap |
|---|---|---|
| Stationary (benthic lander) | Slow (static scene) | 5–10s |
| Slow (<1 m/s) | Moderate (gradual drift) | 2–5s |
| Fast (>1 m/s) | Rapid (transect) | 1–2s |
How to Choose¶
Method 1: Visual inspection
- Run with a small gap (e.g., `--min-gap 0.5`)
- Inspect output frames from the same video
- If many look nearly identical, increase the gap
Method 2: Scene-change estimation
Estimate how long the scene takes to change by roughly one field of view:
Example (1 m FOV, 0.5 m/s vehicle speed):

$$\text{min\_gap} = \frac{1\ \text{m}}{0.5\ \text{m/s}} = 2\ \text{s}$$
Method 3: Empirical
Start with 1–2 seconds (typical for AUV surveys), then adjust based on output.
Impact on Diversity¶
Positive Effects¶
✅ Reduces redundancy: Avoids near-duplicate frames
✅ Spreads coverage: Forces sampling across time (different scenes, conditions)
✅ Improves annotation efficiency: Each frame provides new information
Potential Issues¶
❌ Misses short events: If interesting feature lasts < min-gap, may not be captured
❌ Reduces candidate pool: Fewer frames available for grid binning
❌ Ignores spatial diversity: Doesn't account for vehicle position (may sample same location at different times)
Mitigation Strategies¶
For short events:
- Lower --min-gap (e.g., 0.5–1s)
- Raise --sample-fps to examine more frames (increases chance of catching event)
For large candidate pools:
- Temporal filtering rejection (10–40%) is acceptable
- If rejection > 50%, consider lowering --min-gap or increasing --sample-fps
For spatial diversity:
- If the vehicle revisits the same location (e.g., benthic survey grid), temporal filtering is beneficial (avoids duplicate coverage)
- If the vehicle runs a single transect, temporal filtering spreads frames along the path
Greedy vs. Optimal Selection¶
Current Approach (Greedy)¶
Algorithm: Accept first frame, then next frame ≥ min-gap later, repeat.
Pros:
- ✅ Simple, fast (O(N log N) for sorting)
- ✅ Deterministic
- ✅ Intuitive
- ✅ Count-optimal: a standard exchange argument shows no schedule satisfying the gap keeps more frames

Cons:
- ❌ Score-unaware: a high-interest frame (e.g., frame 61 in the example above) is rejected purely on timing

Example: Greedy vs. Optimal¶
Frames at times 0, 1, 3, 4, 6 seconds (min-gap = 2 s):
Greedy selection:
1. Accept frame at 0 s
2. Reject frame at 1 s (gap = 1 s < 2 s)
3. Accept frame at 3 s (gap = 3 s ≥ 2 s)
4. Reject frame at 4 s (gap = 1 s < 2 s)
5. Accept frame at 6 s (gap = 3 s ≥ 2 s)
Result: frames at 0, 3, 6 s (3 frames), the maximum possible.
Alternative scenario: frames at 0, 1.9, 4 seconds:
- Greedy: 0, 4 (2 frames)
- Optimal: 0, 4 or 1.9, 4 (2 frames) ← same count, different choices (note that 0, 1.9 is infeasible: gap 1.9 s < 2 s)
Conclusion: For maximizing the number of kept frames, left-to-right greedy is already optimal. Optimality only becomes a question for weighted objectives (e.g., maximizing total interest score), which would require O(N²) dynamic programming; that complexity is not warranted here.
Interaction with Grid Diversity¶
Temporal filtering runs first, so grid binning operates on temporally separated frames.
Effect on grid occupancy:
Without temporal filtering:
- Many frames from the same clip may fall in the same grid cell (near-duplicates)
- Dense cells become denser

With temporal filtering:
- Fewer frames per clip enter the grid
- Cells represent temporally diverse instances of similar visual conditions
Example:
Clip A (blue water, 100 frames examined, 80 pass quality):
- Without temporal filter: 80 frames → many in same cell (brightness=0.3, sharpness=0.4, entropy=0.2)
- With temporal filter (min-gap=2s, 30 fps): 80 → ~15 kept → fewer redundant frames in same cell
Result: Grid cells are more diverse (temporally and visually).
Multi-Video Considerations¶
Independent Filtering¶
Temporal filtering is per-video:
- Frames from Video A and Video B are independent (no temporal constraint between them)
- `--min-gap` applies within each video
Rationale: Videos may be recorded at different times/locations (no temporal relationship).
Cross-Video Diversity¶
Grid diversity handles cross-video selection:
- If Video A and Video B both have blue water frames, grid ensures limited contribution from that cell
- Temporal filtering ensures frames within each video are spread out
Combined effect: Both temporal and visual diversity.
Advanced: Adaptive Gaps¶
Current implementation: Fixed min-gap for all videos.
Potential enhancement: Adaptive gap based on scene change rate:
```cpp
// Sketch: scale the base gap down when inter-frame motion is high.
// min_gap_base is passed in explicitly; prev is available for
// frame-difference measures but unused in this sketch.
double adaptive_gap(double min_gap_base,
                    const FrameRecord& prev, const FrameRecord& curr) {
    double motion_factor = curr.motion / 10.0;  // Normalize motion metric
    return min_gap_base * (1.0 / (1.0 + motion_factor));
}
```
Effect: Larger gaps for static scenes, smaller gaps for dynamic scenes.
Not implemented (added complexity, minor benefit for typical use cases).
Visualization¶
Timeline Example¶
Video timeline (30 fps, min-gap = 2s):
```
Time (s):   0   1   2   3   4   5   6   7   8   9   10
Frames:     |   |   |   |   |   |   |   |   |   |   |
Status:     ✓   ╳   ✓   ╳   ✓   ╳   ✓   ╳   ✓   ╳   ✓
```

Kept: frames at 0, 2, 4, 6, 8, 10 s. Rejected: every frame arriving less than 2 s after the last kept frame.
Gap enforcement: Minimum 2 seconds (60 frames at 30 fps) between kept frames.
Summary¶
Temporal filtering:
- ✅ Purpose: Avoid near-duplicate frames from same clip
- ✅ Method: Greedy selection with minimum time gap
- ✅ When: Before grid diversity selection
- ✅ Impact: Reduces candidates by ~10–40%, improves temporal diversity
Key parameter: --min-gap (seconds)
- 1–2s: Typical for fast-moving vehicles
- 2–5s: Typical for slow-moving vehicles
- 5–10s: Static or very slow scenes
Trade-off: Larger gap → fewer candidates, stronger temporal diversity
Next Steps¶
- Diversity Selection: How grid-based selection works after temporal filtering
- Quality Metrics: What metrics are used for filtering
- Tuning Guide: Practical parameter selection