I assume your goal is to have spoken files that are clear enough to learn from, and you wish to save filespace to a 'large degree.'
In that case I would go with *.mp3 files for broad, simple compatibility. 128 kbps is overkill for voice only. I generally don't rip at anything less than 192 kbps, even for voice, but I do it knowing that I can't hear the difference. It's a 'comfort zone' thing: I know the quality exceeds my hearing, and the minor extra space used is unnoticeable. I think the next usual step down is 96(?) kbps, which should be fine for voice only if you're really pressed for space.
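To put rough numbers on the space trade-off, here is a small sketch of the arithmetic for constant-bit-rate files. It assumes 1 kbps means 1000 bits per second and ignores tags and container overhead, so treat the results as estimates; the function name is just something I made up for illustration.

```python
def cbr_size_mb(bitrate_kbps: int, seconds: float) -> float:
    """Approximate size in megabytes (10^6 bytes) of constant-bit-rate audio.

    Assumes 1 kbps = 1000 bits per second and ignores metadata overhead.
    """
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


# Compare the bit rates discussed above for one hour of audio.
for kbps in (96, 128, 192):
    print(f"{kbps} kbps, 1 hour: {cbr_size_mb(kbps, 3600):.1f} MB")
```

By this estimate an hour of audio is about 43 MB at 96 kbps, 58 MB at 128 kbps, and 86 MB at 192 kbps, so going from 96 to 192 doubles the file size.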
You might want to just take a couple of short sample files, about a minute each, and encode them to different formats and different compression levels. Put on a solid pair of headphones and listen carefully to determine for yourself which compromises work best for you.
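If you want to script that comparison, something like the following would do it. This is only a sketch: it assumes you have ffmpeg installed with the libmp3lame encoder, and `lesson.wav` is a made-up file name standing in for one of your own recordings. It only builds and prints the commands, so nothing gets encoded until you run them yourself.

```python
from pathlib import Path


def sample_encode_commands(source, bitrates_kbps=(96, 128, 192), seconds=60):
    """Build one ffmpeg command per bit rate for a short MP3 test clip.

    Assumes ffmpeg with libmp3lame is available; output names are derived
    from the source name, e.g. lesson.wav -> lesson_96k.mp3.
    """
    src = Path(source)
    commands = []
    for kbps in bitrates_kbps:
        out = src.with_name(f"{src.stem}_{kbps}k.mp3")
        commands.append([
            "ffmpeg", "-y",            # overwrite existing test clips
            "-i", str(src),            # input recording
            "-t", str(seconds),        # keep only the first N seconds
            "-codec:a", "libmp3lame",  # MP3 encoder
            "-b:a", f"{kbps}k",        # constant audio bit rate
            str(out),
        ])
    return commands


if __name__ == "__main__":
    # "lesson.wav" is a hypothetical file name; substitute your own.
    for cmd in sample_encode_commands("lesson.wav"):
        print(" ".join(cmd))
```

Paste the printed commands into a terminal (or hand each list to `subprocess.run`) and you end up with three one-minute clips to compare on headphones.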
If it were me, I'd rip it all to 192 kbps *.mp3 and not think about it further. Alternatively, consider variable bit rate (VBR) with the range set from 96 to 192 kbps. For speech you may well get smaller files than constant bit rate (CBR) at 128 kbps, and the important parts will still get up to 192 kbps clarity. I've heard files in the past set to VBR 60-128 and they were quite passable for basic use ... but that wasn't for teaching myself a spoken language.