I need to solve several problems in order to automate a process.
For the purposes of this post, I will focus only on the first one.
There is a website where I download Excel workbooks (1 worksheet) with data. These are .xls files.
To download the files, I click on a link in my browser, which calls a PHP script. The PHP script simply takes a parameter in the URL, and retrieves the Excel file. Then a download window pops up (asking if I want to download the file).
My question is: how can I write a Perl script which can call the PHP script over the web, feed it the parameter, and save the .xls file to my hard drive? It has to be automated or it’s worthless.
Well, I don’t know Perl that well, but I can tell you to take Excel out of the equation. The problem you’re having is completely independent of the type of file you’re downloading.
As for the PHP script, I’m assuming the links you’re clicking on look something like: download.php?file=test.xls
If that’s the case, what you would do is create a Perl download program that just changes the “test.xls” portion.
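Here is a minimal sketch of that idea, assuming the link really has the download.php?file=... shape guessed above. The host name, the parameter name "file" and the second workbook name are placeholders, not details taken from the real site. The URI module ships alongside LWP, and its query_form method escapes the file name for you.

```perl
use strict;
use warnings;
use URI;

# Hypothetical endpoint; substitute the address the real link points to.
my $base = 'http://www.example.com/download.php';

# Build the download URL for one workbook, escaping the file name safely.
sub download_url {
    my ($file) = @_;
    my $uri = URI->new($base);
    $uri->query_form(file => $file);   # "file" is an assumed parameter name
    return $uri->as_string;
}

# Only the file name changes from one download to the next.
print download_url($_), "\n" for qw(test.xls other.xls);
```

Each URL produced this way can then be handed to whatever does the actual download.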
There is a Perl library, LWP, which will enable you to do this.
#!perl.exe
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->agent("MyApp/0.1 ");

# Create a request
my $req = HTTP::Request->new(GET => 'http://www.domain.com/app.php?xls=a.xls');

# Pass the request to the user agent; the response body is saved to the file
my $outfile = 'c:\\temp\\a.xls';
my $res = $ua->request($req, $outfile);

unless ($res->is_success) {
    # status_line gives the HTTP code and message, e.g. "404 Not Found"
    print "Problem downloading xls: ", $res->status_line, "\n";
}
exit;
c:\Perl64>test.pl
HTTP/1.1 200 OK
Connection: close
Date: Mon, 31 Jan 2011 07:43:14 GMT
Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Content-Type: text/html
Client-Date: Mon, 31 Jan 2011 07:43:15 GMT
Client-Peer: 174.120.156.10:80
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
X-Powered-By: PHP/5.3.1
The excel download pages are for <b>Premium Members</b> only. You can download up to 10 years of annual financial data and 20 quarters of quarterly data. Please upgrade your membership to continue. <a href="/redirect.php?id=13&url=/membership/membership_upgrade.php">Free Trial</a>.<br><br> <a href="/redirect.php?id=13&url=/membership/membership_upgrade.php">Continue…</a>
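Note that the output above is a 200 OK with Content-Type: text/html, so a check on is_success alone would happily save that "Premium Members" page as a.xls. A rough sketch of a guard follows. The helper name looks_like_download and the application/vnd.ms-excel type on the second hand-built response are my own assumptions for illustration; the real server may label its workbooks differently.

```perl
use strict;
use warnings;
use HTTP::Response;

# Return true when a response looks like a real download rather than an HTML page.
sub looks_like_download {
    my ($res) = @_;
    return 0 unless $res->is_success;
    # content_type gives the lowercased media type without parameters such as charset.
    my $type = $res->content_type;
    return $type ne 'text/html' ? 1 : 0;
}

# Offline demonstration with hand-built responses; no request is sent.
my $refused  = HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/html'], 'members only');
my $workbook = HTTP::Response->new(200, 'OK', ['Content-Type' => 'application/vnd.ms-excel'], 'xls bytes');

print "refused page: ", (looks_like_download($refused)  ? "keep" : "discard"), "\n";
print "workbook: ",     (looks_like_download($workbook) ? "keep" : "discard"), "\n";
```

In the download script the same test would go right after the request, with the saved file deleted when the check fails.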
The authorisation doesn’t seem to be the challenge/response kind handled by Apache, which the authorization method you used would have coped with. Rather, the login is via HTML with, I speculate, some sort of session parameter.
I am not sure if I can help you much further. You can sign up for a free trial, but not without giving your credit card details, and I’m afraid I don’t fancy doing that.
If I were you, I would try to ascertain how the site handles the login details. If it holds a session variable in a cookie, that is something you might be able to work with.
It is unlikely that they would store a password in the cookie. It is more likely to be an encrypted form of it or, more likely still, a unique digest of the username/password and possibly one other piece of information.
In the worst case the digest might change and get updated on each access, which would complicate matters further for you.
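If it does turn out to be a session cookie, LWP can carry it for you. The following is only a sketch under that assumption: the login URL and the form field names (username, password) are invented, and you would need to read the real ones from the site's login page. Giving the user agent a cookie jar makes it store whatever cookies the login response sets and send them back on later requests, including a value that changes on each access.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical URLs; the download address follows the script earlier in the thread.
my $login_url    = 'http://www.example.com/login.php';
my $download_url = 'http://www.example.com/app.php?xls=a.xls';

# The jar stores cookies from responses and replays them on later requests.
my $ua = LWP::UserAgent->new(cookie_jar => HTTP::Cookies->new);

sub fetch_with_login {
    my ($user, $pass, $outfile) = @_;

    # Submit the login form (assumed field names); a session cookie lands in the jar.
    my $login = $ua->post($login_url, { username => $user, password => $pass });
    # A login POST often answers with a redirect, so accept 3xx as well as 2xx.
    die 'Login failed: ' . $login->status_line . "\n"
        unless $login->is_success || $login->is_redirect;

    # Same agent, same jar: the cookie accompanies this request.
    my $res = $ua->get($download_url, ':content_file' => $outfile);
    die 'Download failed: ' . $res->status_line . "\n" unless $res->is_success;
    return $res;
}

# Run only when a user name, password and output path are given on the command line.
fetch_with_login(@ARGV) if @ARGV == 3;
```

Whether this works depends entirely on how the site really handles its login, which I can't check without an account.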
Found a couple of pages that might help
A guide to using LWP with specific emphasis on cookies and authentication