
Bug#908678: Testing the filter-branch scripts



Antoine,

thank you very much for your filter-branch scripts.

I tested each:

1) the golang version:
It completes after 3h36min:

# git filter-branch --tree-filter '/split-by-year' HEAD
Rewrite a09118bf0a33f3721c0b8f6880c4cbb1e407a39d (68282/68286) (12994 seconds passed, remaining 0 predicted)
Ref 'refs/heads/master' was rewritten

But it doesn't Close() the os.OpenFile handles, so
all data/CVE/list.yyyy files end up 0 bytes long. Sic!

I can reproduce that by just running the golang executable
against a current checkout of data/CVE/list.

# go version
go version go1.10.3 linux/amd64
(Stretch backport golang-go 2:1.10~5~bpo9+1)

2.1) the Python version
You claim #!/usr/bin/python3 in the shebang, so I tried that first:

# git filter-branch --tree-filter '/usr/bin/python3 /__pycache__/split-by-year.cpython-35.pyc' HEAD
Rewrite 990d3c4bbb49308fb3de1e0e91b9ba5600386f8a (1220/68293) (41 seconds passed, remaining 2254 predicted)
  Traceback (most recent call last):
  File "split-by-year.py", line 13, in <module>
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 5463: invalid start byte
tree filter failed: /usr/bin/python3 /__pycache__/split-by-year.cpython-35.pyc

The offending commit is:
* 990d3c4bbb - Rename sarge-checks data to something not specific to sarge, since we're working on etch now.
  Sorry for the probable annoyance, but it had to be done. (13 years ago) [Joey Hess]

There will be many more commits like this, so for Python3
the script needs to be made unicode-agnostic.
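
A minimal sketch of what "unicode-agnostic" could mean here, assuming
the script only needs the year from each CVE header line and never has
to interpret the rest of the text. The names HEADER, year_of and
read_lines are hypothetical and not taken from split-by-year.py. The
file is read as bytes, so a stray byte like 0xf6 is never decoded:

```python
import re

# Hypothetical pattern: a header line starts with "CVE-" and a 4-digit year.
# It is a bytes pattern, so it can be matched against undecoded lines.
HEADER = re.compile(rb'^CVE-(\d{4})-')


def year_of(line):
    """Return the year of a CVE header line (bytes), or None otherwise."""
    m = HEADER.match(line)
    return int(m.group(1)) if m else None


def read_lines(path):
    """Read a file as raw byte lines, without any decoding step."""
    # Binary mode: no codec is involved, so no UnicodeDecodeError can occur.
    with open(path, 'rb') as f:
        return f.readlines()
```

The output files would then also have to be opened in binary mode
('ab' or 'wb'), so the bytes are written back unchanged.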

Note that I compiled the .py to .pyc, which makes it
much faster and thus quite usable.
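
For reference, a sketch of one way to do that compile step with the
standard py_compile module. The function name precompile is
hypothetical. Under Python3 the result lands in __pycache__/ next to
the source, which matches the path used in the command above:

```python
import py_compile


def precompile(source_path):
    """Byte-compile source_path once and return the path of the .pyc file."""
    # doraise=True raises py_compile.PyCompileError on a syntax error
    # instead of only printing a message to stderr.
    return py_compile.compile(source_path, doraise=True)
```

The gain comes from not re-parsing the source on each of the ~68000
invocations that filter-branch makes.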

2.2) Python, when a string was a string .. Python2
Your code is actually Python2, so why not give that a try:

# git filter-branch --tree-filter '/usr/bin/python2 /split-by-year.pyc' HEAD
Rewrite b59da20b82011ffcfa6c4a453de9df58ee036b2c (2516/68293) (113 seconds passed, remaining 2954 predicted)
  Traceback (most recent call last):
  File "split-by-year.py", line 18, in <module>
    yearly = 'data/CVE/list.{:d}'.format(year)
NameError: name 'year' is not defined
tree filter failed: /usr/bin/python2 /split-by-year.pyc

The offending commit is:
* b59da20b82 - claim (13 years ago) [Moritz Muehlenhoff]
| diff --git a/data/CVE/list b/data/CVE/list
| index 7b5d1d21d6..cdf0b74dd0 100644
| --- a/data/CVE/list
| +++ b/data/CVE/list
| @@ -1,3 +1,4 @@
| +begin claimed by jmm
|  CVE-2005-3276 (The sys_get_thread_area function in process.c in Linux 2.6 before ...)
|       TODO: check
|  CVE-2005-3275 (The NAT code (1) ip_nat_proto_tcp.c and (2) ip_nat_proto_udp.c in ...)
| @@ -34,6 +35,7 @@ CVE-2005-3260 (Multiple cross-site scripting (XSS) vulnerabilities in ...)
|       TODO: check
|  CVE-2005-3259 (Multiple SQL injection vulnerabilities in versatileBulletinBoard (vBB) ...)
|       TODO: check
| +end claimed by jmm
|  CVE-2005-XXXX [Insecure caching of user id in mantis]
|       - mantis <unfixed> (bug #330682; unknown)
|  CVE-2005-XXXX [Filter information disclosure in mantis]

As you can see, the line "+begin claimed by jmm" breaks the overly simplistic parser logic.
Unfortunately, such errors do not show up when dry-running against a current version of data/CVE/list.
The "violations" of the file format are transient and buried in history.
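
A sketch of a more tolerant loop, under the assumption that the
NameError comes from 'year' only being assigned once a CVE header line
has been seen. This is not the original script: split_by_year, HEADER
and the policy for stray lines are my own choices. Lines that appear
before the first header are held back and filed with the first entry
that follows them:

```python
import re
from collections import OrderedDict

# Hypothetical pattern; also matches headers like "CVE-2005-XXXX".
HEADER = re.compile(r'^CVE-(\d{4})-')


def split_by_year(lines):
    """Group lines by the year of the CVE header they belong to."""
    buckets = OrderedDict()
    year = None     # no header seen yet
    pending = []    # lines seen before the first header
    for line in lines:
        m = HEADER.match(line)
        if m:
            year = int(m.group(1))
        if year is None:
            # e.g. "begin claimed by jmm" at the top of the file
            pending.append(line)
            continue
        bucket = buckets.setdefault(year, [])
        if pending:
            # file the held-back lines in front of the first real entry
            bucket.extend(pending)
            pending = []
        bucket.append(line)
    if pending:
        # no header in the whole file; keep the lines under the key None
        buckets[None] = pending
    return buckets
```

A line such as "end claimed by jmm" in the middle of the file simply
stays with the year of the entry preceding it.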

Best,
Daniel

