Freelancers Network - freelance work, jobs UK and worldwide
Freelancers.net is owned and operated by Andy Stowell and Dan Winchester

Search Bot Crawler - Perl

posted on 13/12/2003

Hi,

We've got more work than we can handle at the moment and are therefore considering outsourcing this project.

Our company requires a crawler to index thousands of external websites quickly and efficiently. We'd prefer it coded in Perl, although C++ or PHP proposals may be considered. Indexed information must be retrievable from a MySQL or PostgreSQL database in under a second.

Site visitors will be able to submit their website for spidering. Once approved by us, the link is then activated. The spider will then need to go to this link and spider that page and every other page on that site. It must not go outside of that website's domain, though.
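
To make the same-domain rule concrete, here is a minimal sketch. It is written in Python purely for illustration (the project itself asks for Perl), and the function name `same_site_links` is hypothetical. It assumes "same domain" means an exact host match, so `www.example.com` and `example.com` would count as different sites; a real crawler may want a looser rule.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def same_site_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep only links on the same host."""
    site_host = urlparse(page_url).hostname
    kept = []
    seen = set()
    for href in hrefs:
        # Make the link absolute and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, ftp: and similar
        if parsed.hostname != site_host:
            continue  # never leave the submitted site's host (assumed rule)
        if absolute not in seen:
            seen.add(absolute)
            kept.append(absolute)
    return kept


links = same_site_links(
    "http://example.com/docs/index.html",
    [
        "page2.html",
        "/about.php",
        "http://other.example.org/",
        "mailto:a@example.com",
        "page2.html#top",
    ],
)
print(links)
```

Only the two links on `example.com` survive; the external site, the `mailto:` link and the duplicate fragment link are dropped.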

We have run tests with scripts such as PHPDig, but with a database of just 4,000,000 entries it starts to get clogged up and rarely finishes its spidering. Your script must therefore be bug-free, error-correcting and use as little processing power as possible.

The script must index all pages with HTML, PHP, ASP, Perl and ColdFusion extensions and ignore image links. A big plus would be the ability to index PDF files. It must also ignore pages that return HTTP error codes and follow HTTP redirection codes.
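
One possible reading of these filtering rules is sketched below, again in Python for illustration only. The extension list (`.html`, `.htm`, `.php`, `.asp`, `.pl`, `.cfm`), the decision to treat URLs with no extension (such as `/`) as pages, and the function names `classify` and `handle_status` are all assumptions rather than part of the requirements.

```python
import posixpath
from urllib.parse import urlparse

# Assumed mapping of "HTML, PHP, ASP, Perl and ColdFusion" to file extensions.
PAGE_EXTENSIONS = {".html", ".htm", ".php", ".asp", ".pl", ".cfm"}


def classify(url, index_pdf=False):
    """Return "page", "pdf" or "skip" based on the URL path's extension."""
    path = urlparse(url).path  # the query string is not part of the path
    ext = posixpath.splitext(path)[1].lower()
    if ext in PAGE_EXTENSIONS or ext == "":
        return "page"  # no extension is treated as a page (assumption)
    if ext == ".pdf" and index_pdf:
        return "pdf"
    return "skip"  # images and everything else


def handle_status(status):
    """Decide what to do with an HTTP response status code."""
    if 300 <= status < 400:
        return "follow"  # redirection: fetch the new location instead
    if status >= 400:
        return "ignore"  # client or server error: do not index
    return "index"


print(classify("http://example.com/index.php?id=3"))
print(classify("http://example.com/img/logo.GIF"))
print(handle_status(301), handle_status(404), handle_status(200))
```

Keeping PDF support behind the `index_pdf` flag mirrors the posting, where PDF indexing is a plus rather than a requirement.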

The database structure must be optimised for pulling data out. You must be able to provide the SQL commands required to pull data out of the system, including Boolean search algorithms. The ability to easily change the search engine's ranking algorithm would also be a plus.
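
As a rough illustration of what a read-optimised structure with Boolean search might look like, the sketch below uses a simple inverted index. Everything in it is an assumption: the table and column names are hypothetical, Python's built-in `sqlite3` stands in for MySQL or PostgreSQL so the example runs on its own, and the default ranking (total term hits) is only a placeholder that can be swapped through the `rank_sql` argument.

```python
import sqlite3

# Hypothetical schema: one row per (term, page) pair, keyed for fast lookup by term.
SCHEMA = """
CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
CREATE TABLE postings (
    term    TEXT NOT NULL,
    page_id INTEGER NOT NULL REFERENCES pages(id),
    hits    INTEGER NOT NULL,
    PRIMARY KEY (term, page_id)
);
"""


def search_all(conn, required, excluded=(), rank_sql="SUM(p.hits)"):
    """Return URLs containing every term in `required` and none in `excluded`.

    `rank_sql` is a developer-supplied SQL expression, never user input.
    """
    req_marks = ",".join("?" * len(required))
    exclude_clause = ""
    if excluded:
        exc_marks = ",".join("?" * len(excluded))
        exclude_clause = (
            "AND p.page_id NOT IN "
            f"(SELECT page_id FROM postings WHERE term IN ({exc_marks}))"
        )
    sql = f"""
        SELECT pages.url
        FROM postings AS p JOIN pages ON pages.id = p.page_id
        WHERE p.term IN ({req_marks}) {exclude_clause}
        GROUP BY pages.id, pages.url
        HAVING COUNT(DISTINCT p.term) = ?
        ORDER BY {rank_sql} DESC, pages.url
    """
    params = [*required, *excluded, len(required)]
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO pages (id, url) VALUES (?, ?)", [
    (1, "http://a.example/perl.html"),
    (2, "http://a.example/php.html"),
    (3, "http://a.example/both.html"),
])
conn.executemany("INSERT INTO postings (term, page_id, hits) VALUES (?, ?, ?)", [
    ("perl", 1, 5), ("crawler", 1, 2),
    ("php", 2, 4), ("crawler", 2, 1),
    ("perl", 3, 1), ("php", 3, 1), ("crawler", 3, 7),
])

and_results = search_all(conn, ["perl", "crawler"])
and_not_results = search_all(conn, ["perl", "crawler"], excluded=["php"])
print(and_results)
print(and_not_results)
```

The `HAVING COUNT(DISTINCT p.term) = ?` clause implements AND, the `NOT IN` subquery implements NOT, and dropping the `HAVING` clause would turn the same query into OR. Because the ranking is a single SQL expression, changing it does not require touching the stored data.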

We look forward to receiving your quotes.



We are no longer accepting applications for this project.
