


Subject: Re: upstreams maintainer conflict, was: wget: remove outdated manual page
In-Reply-To: <19980516225619.U3613@test.legislate.com>; from Raul Miller on Sat, May 16, 1998 at 10:56:19PM -0400
Organization: Kathie Lee's Sweatshops
X-Operating-System: Linux/i486 2.1.99
X-Mutt-References: <19980516225619.U3613@test.legislate.com>

On May 16, Raul Miller wrote:
> [Aside: it would be nice to have a mechanism that just generates
> a unique list of referenced URLs.  This would allow more complicated
> filtering schemes to determine what to download (at the expense
> of having to run wget twice -- but it's easy enough to set up a
> web proxy).  --spider only checks a single file.]

A list of all URLs in a particular web page can be fairly easily generated;
see e.g. my findnew Python script (http://www.linux-m68k.org/py/findnew.py),
which does exactly this as part of its processing.
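
For illustration only, here is a rough sketch in modern Python of how such
an extractor might look -- this is NOT the contents of findnew.py, just an
assumed approach: feed the file through an HTML parser, collect href/src
attribute values, and print each one once, sorted.

#!/usr/bin/env python3
# Toy URL extractor -- not the real findnew.py, just an illustration.
# Prints each href/src value found in an HTML file exactly once, sorted.
import sys
from html.parser import HTMLParser

class URLCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.add(value)

collector = URLCollector()
with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
    collector.feed(fh.read())
for url in sorted(collector.urls):
    print(url)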

I'm sure one could write an ugly regular expression that could be awk'd over
an HTML file, the output of which could be piped through sort | uniq to
achieve the same effect.  One could even do this in C if one cared (but I'll
stick with my Python version; at least it's readable...).
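
Along the lines of that ugly-regex idea, a hypothetical Python equivalent
(crude, and it will miss unquoted attributes and anything generated by
scripts) might be:

#!/usr/bin/env python3
# Regex version of the same idea: grab quoted href/src attribute values
# and print the unique ones, sorted.  Illustration only.
import re
import sys

with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
    text = fh.read()

pattern = re.compile(r'(?:href|src)\s*=\s*["\']([^"\']+)', re.IGNORECASE)
for url in sorted(set(pattern.findall(text))):
    print(url)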


Chris
-- 
=============================================================================
|      Chris Lawrence      |      The Realistic Consolidation Proposal      |
|  <quango@ix.netcom.com>  |  http://newforesthills.base.org/regional.html  |
|                          |                                                |
|    Contract Programmer   |       Join the party that opposed the CDA      |
|    FedEx Ops Research    |               http://www.lp.org/               |
=============================================================================



