What it cleans up
Both formats carry the same noise around the text. The converter removes it:
- Timestamps and cue numbers are dropped (or kept as a compact inline prefix, your choice).
- Rolling / scrolling captions (common in auto-generated YouTube subtitles) repeat each line across consecutive cues. The converter dedupes that so each sentence appears once.
- Inline styling is translated: `<i>` becomes `*italic*`, `<b>` becomes `**bold**`; karaoke and word-timing tags are removed.
- Cue text is reflowed into paragraphs at natural pauses, so the result reads like prose instead of caption fragments.
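
To make these steps concrete, here is a minimal Python sketch of three of them: deduping rolling captions, translating inline tags, and reflowing at pauses. It is an illustration under stated assumptions, not the converter's actual code. The function names (`dedupe_rolling`, `translate_styling`, `reflow`), the input shapes, and the 2-second pause threshold are all hypothetical.

```python
import re

def dedupe_rolling(cues):
    """Drop lines that were already shown in the previous cue.

    Assumes each cue is a string whose lines are separated by newlines,
    and that a rolling caption repeats earlier lines verbatim.
    """
    out, prev_lines = [], []
    for cue in cues:
        lines = [ln.strip() for ln in cue.splitlines() if ln.strip()]
        out.extend(ln for ln in lines if ln not in prev_lines)
        prev_lines = lines
    return out

def translate_styling(text):
    """Map <i>/<b> to Markdown emphasis and strip word-timing tags."""
    text = re.sub(r"</?i>", "*", text)
    text = re.sub(r"</?b>", "**", text)
    # Inline timestamp tags such as <00:00:01.500> (hours optional).
    text = re.sub(r"<(?:\d+:)?\d{2}:\d{2}\.\d{3}>", "", text)
    # Class spans such as <c> or <c.colorE5E5E5>, often wrapped around timed words.
    text = re.sub(r"</?c(?:\.[^>]*)?>", "", text)
    return text

def reflow(timed_lines, pause=2.0):
    """Join lines into paragraphs, breaking where the silence exceeds `pause` seconds.

    Assumes timed_lines is a list of (start, end, text) tuples in seconds.
    The 2.0 default is an arbitrary illustrative threshold.
    """
    paragraphs, current, prev_end = [], [], None
    for start, end, text in timed_lines:
        if current and prev_end is not None and start - prev_end > pause:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        prev_end = end
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs

rolling = ["hello there", "hello there\nhow are you", "how are you\nfine thanks"]
print(dedupe_rolling(rolling))
# → ['hello there', 'how are you', 'fine thanks']

print(translate_styling("<i>hi</i> <b>there</b>"))
# → *hi* **there**

print(reflow([(0.0, 1.0, "a"), (1.2, 2.0, "b"), (5.0, 6.0, "c")]))
# → ['a b', 'c']
```

Comparing each cue only against its immediate predecessor keeps a line that is legitimately spoken again later in the file; a real implementation may need fuzzier matching for captions that grow word by word.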