
Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3



Hello Piotr,
thanks for your comments!

On Thursday 17 January 2013 12:38:12 Piotr Ożarowski wrote:
> >  python-charade is a port of Mark Pilgrim's chardet with support for
> >  both Python 2 and Python 3.
> 
> if Python 3 support is the only reason why it was forked, note that we
> already have python3-chardet in Debian. Are there any other advantages?

The Python 3 support is not what made me think about packaging
python-charade: right now python3-requests 0.12.1-1 is using python3-chardet.

Note that when I sent the ITP I overlooked that the following applies to the
development version: I took the project information from git and did not
notice that the default branch is the development one.

Inside a clean, isolated virtualenv:

Python 2.7.3 (default, Jan  2 2013, 13:56:14) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import charade
>>> data = open('bom-utf-16-be.srt', 'rb').read()
>>> charade.detect(data)
{'confidence': 1.0, 'encoding': 'UTF-16BE'}

Python 3.2.3 (default, Sep 10 2012, 11:22:57) 
[GCC 4.7.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import charade
>>> data = open('bom-utf-16-be.srt', 'rb').read()
>>> charade.detect(data)
{'confidence': 1.0, 'encoding': 'UTF-16BE'}

Here, instead, is the system-wide Debian python-chardet:

Python 2.7.3 (default, Jan  2 2013, 13:56:14) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import chardet
>>> data = open('bom-utf-16-be.srt', 'rb').read()
>>> chardet.detect(data)
{'confidence': 1.0, 'encoding': 'UTF-16BE'}

Python 3.2.3 (default, Sep 10 2012, 11:22:57) 
[GCC 4.7.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import chardet
>>> data = open('bom-utf-16-be.srt', 'rb').read()
>>> chardet.detect(data)
{'confidence': 0.5, 'encoding': 'windows-1252'}
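
For reference, here is a minimal sketch of the kind of byte-order-mark check
that would settle a file like this one. It assumes, from the file name only,
that the file starts with a UTF-16BE BOM. It uses just the standard library;
sniff_bom is a hypothetical helper written for illustration and is not part
of chardet or charade, nor a description of how either library works
internally.

```python
import codecs

# Hypothetical helper for illustration; not part of chardet or charade.
# Longer BOMs are listed first because the UTF-32LE BOM begins with the
# same two bytes as the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'UTF-32LE'),
    (codecs.BOM_UTF32_BE, 'UTF-32BE'),
    (codecs.BOM_UTF8, 'UTF-8'),
    (codecs.BOM_UTF16_LE, 'UTF-16LE'),
    (codecs.BOM_UTF16_BE, 'UTF-16BE'),
]


def sniff_bom(data):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Build a small in-memory sample instead of reading the .srt file.
sample = codecs.BOM_UTF16_BE + 'ciao'.encode('utf-16-be')
print(sniff_bom(sample))
```

The sketch works on bytes throughout, so it should behave the same on
Python 2.7 and Python 3.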

Is it worth backporting the fix to python-chardet? Right now charade doesn't
differ too much from it, but in the future it might.
 
> > The package will be maintained under the umbrella of the DPMT and it's
> > a dependency for the new version (1.1.0) of python-requests.
> 
> can requests use chardet?

Right now, yes, since the two codebases don't differ much. requests is
currently embedding charade 1.0.1, so there should be no problems.

Maybe I can just update requests to use python-chardet for now, but I'm a
bit worried about that missed detection on Python 3.

What do you suggest?

Kind regards,

-- 
 Daniele Tricoli 'Eriol'
 http://mornie.org
