Unidentified subject!
- To: Hrvoje Niksic <hniksic@srce.hr>, Joost Kooij <kooij@mpn.cp.philips.com>, debian-devel@lists.debian.org, Nicols Lichtmaier <nick@feedback.net.ar>, James Troup <J.J.Troup@scm.brad.ac.uk>, Raul Miller <rdm@test.legislate.com>
- Subject: Unidentified subject!
- From: Chris Lawrence <quango@ix.netcom.com>
- Date: Sat, 16 May 1998 22:26:12 -0500
- Message-id: <19980516222612.A19244@ix.netcom.com>
- Mail-followup-to: Hrvoje Niksic <hniksic@srce.hr>, Joost Kooij <kooij@mpn.cp.philips.com>, debian-devel@lists.debian.org, Nicols Lichtmaier <nick@feedback.net.ar>, James Troup <J.J.Troup@scm.brad.ac.uk>, Raul Miller <rdm@test.legislate.com>
Subject: Re: upstreams maintainer conflict, was: wget: remove outdated manual page
In-Reply-To: <19980516225619.U3613@test.legislate.com>; from Raul Miller on Sat, May 16, 1998 at 10:56:19PM -0400
Organization: Kathie Lee's Sweatshops
X-Operating-System: Linux/i486 2.1.99
X-Mutt-References: <19980516225619.U3613@test.legislate.com>
On May 16, Raul Miller wrote:
> [Aside: it would be nice to have a mechanism that just generates
> a unique list of referenced URLs. This would allow more complicated
> filtering schemes to determine what to download (at the expense
> of having to run wget twice -- but it's easy enough to set up a
> web proxy). --spider only checks a single file.]
A list of all the URLs in a particular web page can be generated fairly
easily; see e.g. my findnew Python script
(http://www.linux-m68k.org/py/findnew.py), which does exactly this as part
of its processing.
I'm sure one could write an ugly regular expression to run through awk over
an HTML file, and pipe the output through sort | uniq to achieve the same
effect. One could even do this in C if one cared (but I'll stick with my
Python version; at least it's readable...).
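For illustration only, here is a minimal sketch of the general idea: parse
the HTML, collect every href and src attribute, and print each URL once. It
is not findnew.py itself; the names LinkCollector and unique_urls are made
up for this example, it uses the HTML parser from the current Python standard
library, and treating only href and src as "referenced URLs" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes seen in a document."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.urls = set()  # a set keeps each URL only once

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page's own URL.
                absolute = urljoin(self.base_url, value)
                # Drop any #fragment so the same page is listed only once.
                self.urls.add(urldefrag(absolute)[0])


def unique_urls(html_text, base_url=""):
    """Return a sorted list of the distinct URLs referenced in html_text."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return sorted(u for u in collector.urls if u)
```

Calling unique_urls(page_text, page_url) and printing the result one URL per
line gives a list that can be filtered with grep or similar and then handed
back to wget, which is roughly the two-pass scheme Raul describes.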
Chris
--
=============================================================================
| Chris Lawrence | The Realistic Consolidation Proposal |
| <quango@ix.netcom.com> | http://newforesthills.base.org/regional.html |
| | |
| Contract Programmer | Join the party that opposed the CDA |
| FedEx Ops Research | http://www.lp.org/ |
=============================================================================