filter: Exclude Concepts from Pascal VOC Annotations

The filter command allows you to selectively remove objects with specific concept (class) names from Pascal VOC annotation files. This is useful for cleaning datasets by removing unwanted classes or problematic annotations.

Usage

m3-download filter VOC_PATHS [VOC_PATHS...] [OPTIONS]

Parameters

  • VOC_PATHS: One or more paths to Pascal VOC XML files or directories containing XML files
  • -e, --exclude: Concept name to exclude (can be specified multiple times)
  • -o, --output-dir: (Optional) Output directory for filtered annotations
  • --embargo: (Optional) If any excluded concept is found, remove ALL annotations from that file (added in v0.9.0)

Data Loss Risk

If --output-dir is not specified, original annotation files will be modified directly. Always use the --output-dir option when testing a new filtering configuration.

How It Works

  1. The command processes each input path, handling both individual XML files and directories
  2. For each XML file, it:
    • Parses the annotation structure
    • Removes objects with concept names matching any exclusion criteria
    • Either overwrites the original file or writes to the output directory
  3. After processing, the command reports statistics on the removed annotations
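
The per-file steps above can be sketched in Python. This is a minimal illustration, not the command's actual implementation: the function name filter_objects is hypothetical, and the sketch assumes that concept names live in the standard Pascal VOC object/name element and are matched exactly (case-sensitive) after trimming whitespace.

```python
import xml.etree.ElementTree as ET

# Tiny Pascal VOC annotation used only for illustration.
SAMPLE = """<annotation>
  <filename>image001.jpg</filename>
  <object><name>unidentified fish</name></object>
  <object><name>crab</name></object>
</annotation>"""


def filter_objects(xml_text, excluded):
    """Hypothetical helper: drop <object> elements whose <name> is excluded.

    Returns the filtered XML as a string plus the number of removed objects.
    Assumption: exact, case-sensitive name matching.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    # Copy the list first so removing children does not disturb iteration.
    for obj in list(root.findall("object")):
        name = obj.findtext("name", default="").strip()
        if name in excluded:
            root.remove(obj)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


filtered, removed = filter_objects(SAMPLE, {"unidentified fish"})
print(f"removed {removed} object(s)")
```

Returning the removal count alongside the XML keeps the statistics step separate from the decision about where the result gets written.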

Embargo Mode

When the --embargo flag is set and a file contains any of the excluded concepts, all annotations in that file are removed and the file is not written to the output directory.

This is useful for completely removing images that contain certain problematic labels.
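
The embargo decision can be sketched as follows. This is an illustrative assumption about the logic, not the tool's source: the helper name apply_embargo is hypothetical, and returning None is simply this sketch's way of signalling "do not write this file".

```python
import xml.etree.ElementTree as ET

# Tiny Pascal VOC annotation used only for illustration.
SAMPLE = """<annotation>
  <filename>image001.jpg</filename>
  <object><name>unidentified fish</name></object>
  <object><name>crab</name></object>
</annotation>"""


def apply_embargo(xml_text, excluded):
    """Hypothetical helper: embargo a whole file if any excluded concept appears.

    Returns (xml_text, 0) when the file is clean, or (None, n) when the file
    is embargoed, where n is the number of annotations dropped with it.
    """
    root = ET.fromstring(xml_text)
    objects = root.findall("object")
    names = {obj.findtext("name", default="").strip() for obj in objects}
    if names & set(excluded):
        # One excluded concept is enough to drop every annotation in the file.
        return None, len(objects)
    return xml_text, 0


embargoed, dropped = apply_embargo(SAMPLE, {"unidentified fish"})
kept, untouched = apply_embargo(SAMPLE, {"artifact"})
print(embargoed is None, dropped)
```

Note the difference from plain filtering: here the "crab" annotation is dropped too, because it shares a file with an excluded concept.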

Directory Structure

When directory paths are provided, all XML files in those directories are processed. The command maintains the directory structure when writing to an output directory.
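
One way to preserve directory structure is to reuse each file's path relative to the input path it came from. The sketch below assumes that approach; the function name output_path_for is hypothetical, and the command's actual mapping (including whether subdirectories are searched recursively) may differ.

```python
from pathlib import Path


def output_path_for(xml_file, input_path, output_dir):
    """Hypothetical helper: map an input XML file to its output location.

    Assumption: the path of the file relative to the input directory is
    recreated under output_dir. A single-file input has no layout to
    preserve, so its parent directory is used as the root.
    """
    xml_file = Path(xml_file)
    input_path = Path(input_path)
    root = input_path.parent if input_path.suffix.lower() == ".xml" else input_path
    return Path(output_dir) / xml_file.relative_to(root)


nested = output_path_for(
    "annotations/dive1/image001.xml", "annotations", "filtered_annotations"
)
single = output_path_for(
    "annotations/image001.xml", "annotations/image001.xml", "filtered_annotations"
)
print(nested.as_posix())
print(single.as_posix())
```

Before writing to the computed location, a real implementation would also need to create the parent directories, for example with Path.mkdir(parents=True, exist_ok=True).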

Added in v0.9.0.

Examples

Exclude a single concept from all XMLs in a directory:

m3-download filter annotations/ --exclude "unidentified fish"

Exclude multiple concepts and save to a new directory:

m3-download filter annotations/ --exclude "unidentified fish" --exclude "artifact" --output-dir filtered_annotations/

Process multiple directories and embargo files with excluded concepts:

m3-download filter dir1/ dir2/ --exclude "poor-quality" --exclude "ambiguous" --embargo --output-dir clean_annotations/

Exclude concepts from a specific file:

m3-download filter annotations/image001.xml --exclude "artifact" --output-dir filtered_annotations/