Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

To: 1034091@bugs.debian.org
Subject: Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision
From: Petter Reinholdtsen <pere@hungry.com>
Date: Wed, 21 Jun 2023 17:33:44 +0200
Message-id: <sa6fs6kkexz.fsf@hjemme.reinholdtsen.name>
Reply-to: Petter Reinholdtsen <pere@hungry.com>, 1034091@bugs.debian.org
In-reply-to: <sa6v8husa2w.fsf@hjemme.reinholdtsen.name>
References: <sa67cumhr96.fsf@hjemme.reinholdtsen.name> <sa6edop1ix0.fsf@hjemme.reinholdtsen.name> <sa6pm83vbbg.fsf@hjemme.reinholdtsen.name> <sa6v8husa2w.fsf@hjemme.reinholdtsen.name> <sa6jzymicr5.fsf@hjemme.reinholdtsen.name>

The upload to contrib / experimental was rejected by the ftpmasters with
the following comment:

> can you please explain how I can recreate the files *.tiktoken?  There
> seem to be some sources missing ...

The two files in question are 50k lines of ASCII text that seem to be
some kind of index / vocabulary, and I have no idea how they were
created.  I suspect they might be an artifact of the model training, but
do not know.  Anyone got a clue to spare on how these were created and
how to rebuild them?  If we lack the source to rebuild them, I currently
believe the whisper package will have to go to non-free, not contrib.
Any help to figure this out would be most appreciated.
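
As a starting point for inspecting the files, here is a minimal sketch in
Python.  It assumes each line holds a base64-encoded token followed by an
integer rank, separated by whitespace.  That layout is an assumption, not
something confirmed in this message, and the function name and sample lines
below are made up for illustration.  It only shows how such a file could be
read; it does not answer how the files were generated.

```python
import base64


def parse_tiktoken_lines(lines):
    """Parse lines assumed to look like '<base64 token> <rank>'.

    Hypothetical helper: the line layout is an assumption about the
    *.tiktoken files, not a documented format taken from this thread.
    """
    ranks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks


# Made-up sample lines in the assumed layout (not copied from the real files).
sample = ["IQ== 0", "Ig== 1", "aGVsbG8= 2"]
ranks = parse_tiktoken_lines(sample)

# Print the decoded byte strings in rank order.
for token, rank in sorted(ranks.items(), key=lambda kv: kv[1]):
    print(rank, token)
```

If the real files decode cleanly this way, the ranks would at least show
whether the entries are ordered like a merge table, which might hint at the
tool that produced them.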

-- 
Happy hacking
Petter Reinholdtsen
