
Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision



The upload to contrib / experimental was rejected by the ftpmasters with
the following comment:

> can you please explain how I can recreate the files *.tiktoken?  There
> seem to be some sources missing ...

The two files in question are 50k lines of ASCII text that seem to be
some kind of index / vocabulary, and I have no idea how they were
created.  I suspect they might be an artifact of the model training, but
I do not know.  Does anyone have a clue to spare on how these were
created and how to rebuild them?  If we lack the source to rebuild them,
I currently believe the whisper package will have to go to non-free, not
contrib.  Any help figuring this out would be most appreciated.
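
For anyone who wants to poke at the files, here is a minimal sketch for
reading them.  It assumes each line is a base64-encoded token followed by
whitespace and an integer rank; that layout is an assumption on my part,
and the function name and sample lines are made up for illustration.  It
only shows how the content could be inspected, not how the files were
generated.

```python
import base64


def parse_vocab_lines(lines):
    """Parse lines of the ASSUMED form '<base64 token> <integer rank>'.

    Returns a dict mapping the decoded token bytes to its integer rank.
    """
    vocab = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        token_b64, rank = line.split()
        vocab[base64.b64decode(token_b64)] = int(rank)
    return vocab


# Illustrative sample lines, not copied from the package.
sample = ["IQ== 0", "Ig== 1", "aGVsbG8= 2"]
vocab = parse_vocab_lines(sample)
for token, rank in sorted(vocab.items(), key=lambda item: item[1]):
    print(rank, token)
```

To run it on a real file, pass an open file object instead of the sample
list, for example parse_vocab_lines(open(path)) with path pointing at one
of the *.tiktoken files.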

-- 
Happy hacking
Petter Reinholdtsen

