[CivicAccess-discuss] Canadian Postal Code list
Daniel Haran
chebuctonian at gmail.com
Tue Sep 23 18:35:24 EDT 2008
On Tue, Sep 23, 2008 at 5:46 PM, Tracey P. Lauriault <tlauriau at gmail.com> wrote:
> Someone was asking me what page scraping means. Could you explain - in sorta
> lay person terms?
Scraping is a way to extract structured information from websites.
Let's use my next project as an example.
The list of 813,358 postal codes is now public. I am writing software
that will go to a political party's website, submit the form to 'find
your candidate' and save the resulting page. Then I'll write another
small bit of software that reads each page, finds the electoral
district id, and outputs a single line:
<postal_code>,<district_id>
813,358 pages, one resulting file with as many lines.
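
To make the second step concrete, here is a rough sketch in Python. It is only an illustration: the page markup, the regular expression, the function names and the sample postal code and district id are all made up for the example, and a real party website will look different.

```python
import re

# ASSUMPTION: each saved page contains markup like
#   <span id="district">35061</span>
# This pattern is hypothetical; a real site needs its own pattern.
DISTRICT_RE = re.compile(r'id="district">\s*(\d+)\s*<')


def extract_district_id(html):
    """Return the district id found in a saved page, or None if absent."""
    match = DISTRICT_RE.search(html)
    return match.group(1) if match else None


def to_csv_line(postal_code, html):
    """Build one '<postal_code>,<district_id>' line, or None if no id is found."""
    district_id = extract_district_id(html)
    if district_id is None:
        return None
    return f"{postal_code},{district_id}"


# Made-up sample page and values, standing in for one saved result page.
sample_page = '<html><body><span id="district">35061</span></body></html>'
print(to_csv_line("K1A0A6", sample_page))  # K1A0A6,35061
```

Returning None for a page without a district id lets the caller skip or log that postal code instead of writing a broken line.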
Because of the large number of requests, compiling the data can take a
very long time. Getting one page per second, it would still take 9.4
days to get this data file.
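
The 9.4 days figure is just the page count divided by the number of seconds in a day. A quick check in Python, where the function name and the rate parameter are illustrative only:

```python
PAGES = 813_358                  # postal codes, one request each
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def days_needed(pages, pages_per_second=1.0):
    """Days required to fetch `pages` pages at a steady request rate."""
    return pages / pages_per_second / SECONDS_PER_DAY


print(round(days_needed(PAGES), 1))  # 9.4
```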
I hope that helps... I may be in too deep to offer a good lay person's
explanation :)
d.