Topic: Subtitle Extraction from ISO

Offline Triltaison

  • Member
  • Posts: 576
Subtitle Extraction from ISO
« on: April 18, 2012, 06:37:23 AM »
I'm looking to extract the subtitle tracks from a few ISO files, but I'm pretty unfamiliar with this territory. Converting to ASS or some kind of text file would be great, since I'm hoping to translate them and it would be much handier to have them all laid out instead of having to pause the video every two seconds. I don't need to do anything fancy like preserve styles or timing, just rip the text itself.

Does anyone know of a good (and free) program for doing this? Doing a search on the forum gave me a few results, but the programs didn't support ISO. Thanks in advance.

Offline kadatherion

  • Member
  • Posts: 114
Re: Subtitle Extraction from ISO
« Reply #1 on: April 18, 2012, 07:22:25 AM »
The problem with ISO copies is that subtitles stored in VobSub format are actually images, not text. As such, one has to OCR them (using a program such as SubRip) to convert them into a text file. Depending on the ripping software and its OCR algorithm, you may carry over some typos when it mistakes one letter for another or misses punctuation marks, but it more or less works; since you have to translate them, you are going to rewrite the lines anyway, so it's a non-issue. It also preserves the timing to some extent (with a certain margin of error), but more often than not official subs have bad timing to begin with, and you will have to fine-tune them anyway to prevent scene bleeding and similar issues if you wish to apply them to an encode. It's still good enough as a draft to load into Aegisub (or whatever you use): everything is more or less laid out and timed against the corresponding audio/video track, so you can quickly check what is written in the subs against what is actually said in the audio.

I used SubRip a couple of times in the past when I needed to, but it's been a while now, so I can't really say whether it is still the best (or even a good) choice among freeware options.
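
For the "just rip the text itself" goal, here is a rough sketch of what could be done once the OCR step has produced an .srt file. It assumes the usual SubRip text layout (a counter line, a timing line, then the dialogue, with blank lines between cues). The function name `srt_to_text` is made up for illustration and is not part of SubRip or any other tool.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def srt_to_text(srt: str) -> list:
    """Return one string per cue, with counter and timing lines dropped.

    Hypothetical helper: assumes cues are separated by blank lines.
    """
    cues = []
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        rows = [r.strip() for r in block.split("\n") if r.strip()]
        # Everything after the timing line is dialogue; if no timing line
        # is found, the block is kept as-is.
        for i, row in enumerate(rows):
            if TIMING.match(row):
                rows = rows[i + 1:]
                break
        if rows:
            cues.append(" ".join(rows))
    return cues


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,000\nCiao, principessa!\n\n"
        "2\n00:00:04,000 --> 00:00:06,500\nPerché sei\nqui?\n"
    )
    for line in srt_to_text(sample):
        print(line)
```

Each cue ends up on a single line, which should be easier to translate from than pausing the video. OCR typos in the dialogue are left untouched.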
« Last Edit: April 18, 2012, 07:25:20 AM by kadatherion »

Offline Triltaison

  • Member
  • Posts: 576
Re: Subtitle Extraction from ISO
« Reply #2 on: April 19, 2012, 12:26:48 AM »
Ahhh, that would explain the difficulty in finding support for ISO subs. I hadn't really done anything with them before, but was hopeful it was an easy extraction. Guess that's shot all to hell.  ::)

Majorly appreciate the detailed response. If it at least works in the way I need, that qualifies as "Good" at this point. Like I said, I don't need to worry about the timing or anything since I basically just need the script itself. You said typos and punctuation can get lost, but do you know if accent marks get picked up at all? I ask because the sub language is Italian, which is chock full of accented vowels. If it doesn't pick them up at all, it'd just be good to know from the get-go.

And again, thanks.

Offline datora

  • Member
  • Posts: 1411
  • "Warning! Otaku logic powers in use!"
Re: Subtitle Extraction from ISO
« Reply #3 on: April 19, 2012, 02:55:58 AM »
I'm sure I've run across a few topics here asking questions about OCR & scanning hard subs out of video files.  Definitely try some more searches along those lines, especially in the help & uploaders' forums as well as tech.  Since I've not tried it, I can't speak to specific topics because I didn't follow them in any detail.

Learning this is on my "to do" list.  Way down on my "to do" list, even though I really want to get the time for it eventually.

Offline Triltaison

  • Member
  • Posts: 576
Re: Subtitle Extraction from ISO
« Reply #4 on: April 19, 2012, 03:24:53 AM »
Thanks, datora. Now I know to look for that, rather than ISO related stuff (which pretty much gave me nothing useful). Guess it's time to whip out the ol' video converter and make me some hardsubs.

-And it's been on my "to do" list as well, so I guess I'll actually get to check it off finally.  ;)

Offline kadatherion

  • Member
  • Posts: 114
Re: Subtitle Extraction from ISO
« Reply #5 on: April 19, 2012, 11:29:43 AM »
Quote from: Triltaison on April 19, 2012, 12:26:48 AM
You said typos and punctuation can get lost, but do you know if accent marks get picked up at all? I ask because the sub language is Italian, which is chock full of accented vowels. If it doesn't pick them up at all, it'd just be good to know from the get-go.

Accents are often a bit problematic too, yes. It goes like this: once you feed it the subs, it starts asking you what is what right away. From the beginning it tries to separate letters wherever it notices some space between two symbols, but sometimes that space is too small, it takes two or more letters together, and you have to correct it. In any case, at first you have to tell it each time "this is a G, this is an R, this is an 'é'" and so on. Once it has at least one entry in memory for every letter it can encounter, it can process the whole subtitle stream without further input from you. Accents and punctuation often confuse it: an accent might make a letter look more like a different letter, a comma might get missed because it's small or, with certain fonts, be taken for an apostrophe, a comma might be seen where there isn't one, or it suddenly encounters an italicized line and goes bonkers, things like that.
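
To make the "tell it what each symbol is" workflow a bit more concrete, here is a toy sketch of the general idea. It is only an illustration and not SubRip's actual algorithm. The bitmap is assumed to be a list of equal-length strings with `#` for ink and `.` for background, and the names `split_glyphs`, `recognize`, `known` and `ask` are all made up.

```python
from typing import Callable, Dict, List, Tuple

# One string per pixel column, e.g. ("##", "#.") for a glyph two columns wide.
Glyph = Tuple[str, ...]


def split_glyphs(rows: List[str]) -> List[Glyph]:
    """Cut a subtitle bitmap into glyphs wherever a pixel column is blank."""
    columns = ["".join(col) for col in zip(*rows)]
    glyphs: List[Glyph] = []
    current: List[str] = []
    for col in columns:
        if "#" in col:
            current.append(col)          # still inside a symbol
        elif current:
            glyphs.append(tuple(current))  # blank column ends the symbol
            current = []
    if current:
        glyphs.append(tuple(current))
    return glyphs


def recognize(rows: List[str], known: Dict[Glyph, str],
              ask: Callable[[Glyph], str]) -> str:
    """Turn a bitmap into text, asking the user only about unseen glyphs."""
    out = []
    for glyph in split_glyphs(rows):
        if glyph not in known:
            known[glyph] = ask(glyph)    # the "this is a G" step
        out.append(known[glyph])
    return "".join(out)
```

Two letters with no blank column between them come out as a single glyph here, which is the "takes two or more letters together" case. In this sketch the answer given through `ask` can simply be more than one character. Exact-match lookup is also why a small change such as an accent or italics produces a glyph the table has never seen.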

The more time you take at the beginning to manually give it information, the pickier its algorithm gets and the fewer errors it makes, but you have to be patient, and small symbols will often still give you trouble. I remember that when I did it with SubRip for a couple of English-subbed series I had to translate into Italian, I ended up giving up on making it recognize apostrophes by themselves and just made it look for whole particles/common words (like it's, 'll, 'd, don't, can't, etc.). As those are relatively few and consistent in common English, it made my life a lot easier, since the software had much less freedom to be misled into confusing one thing for another. Italian isn't so convenient, though, with all those accents, or with how we use an apostrophe between the article and the noun whenever the latter is feminine, so it will likely give you some headaches. It's still way better than having to pause the video every 5 seconds... ::)
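
A related cleanup can also be done after the fact on the OCR'd text, which is a different technique from training the OCR on whole particles as described above. The sketch below assumes the typical misread is a comma, backtick or acute accent sitting directly where an apostrophe belongs, and it only touches a small fixed set of English contraction endings. The name `fix_apostrophes` is hypothetical.

```python
import re

# A letter, then a misread mark, then a known contraction ending
# ('ll, 're, 've, 'd, 's, 't, 'm) that stops at a word boundary.
# Real commas are normally followed by a space, so they are left alone.
MISREAD = re.compile(r"(?<=[A-Za-z])[,`´](?=(?:ll|re|ve|[dstm])\b)")


def fix_apostrophes(text: str) -> str:
    """Replace marks that were likely meant to be apostrophes."""
    return MISREAD.sub("'", text)


if __name__ == "__main__":
    print(fix_apostrophes("I don,t know, it`s late"))
```

Keeping the list of endings short is the same idea as in the post: the fewer patterns allowed, the less room there is to mistake one thing for another. It would not carry over to Italian as-is.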


BTW, since you happen to be translating from Italian (to English, I assume?), should you have doubts about some lines, feel free to drop me a line. Lately I'm more or less stuck at home due to health issues, so I often swing by here with too much free time on my hands for my own good.  ;)

Offline Triltaison

  • Member
  • Posts: 576
Re: Subtitle Extraction from ISO
« Reply #6 on: April 20, 2012, 11:38:38 PM »
Yep, it's Italian to English. Um... with Japanese audio? Headache much, right? -Especially since I'm far better at Spanish.  :laugh:

I hate to bug since this is MAJORLY casual and sort of a learning piece for me. I'm usually just a TL/Editor/Timer, but I hope to learn the ins and outs of converting and compiling along the way and the only way you can do that is getting your hands dirty. If something actually gets completed along the way, I'll be beyond thrilled and will happily share if it turns out decent.

Thanks for all the input, kadath. Here's hoping I can scrape something together for La Principessa Zaffiro.  ;)