Simple Perl/PHP/Excel Problem... (I think)

FYI: I know Perl and Excel, but not PHP.

To get where I need to be, I have several problems to solve in order to automate a process.

For the purposes of this post, I will only focus on the first problem.

There is a website from which I download Excel workbooks (one worksheet each) containing data. These are .xls files.

To download the files, I click on a link in my browser, which calls a PHP script. The PHP script simply takes a parameter in the URL and retrieves the Excel file. Then a download window pops up asking if I want to download the file.

My question is: how can I write a Perl script that calls the PHP script over the web, feeds it the parameter, and saves the .xls file to my hard drive? It has to be automated or it’s worthless.

Well, I don’t know Perl that well, but I can tell you to take Excel out of the equation. The problem you’re having is completely independent of the type of file you’re downloading.

As for the PHP script, I’m assuming the links you’re clicking on look something like: download.php?file=test.xls

If that’s the case, what you would do is create a Perl download program that just changes the “test.xls” portion of the URL.
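A minimal sketch of that idea, assuming the hypothetical `download.php?file=...` form above (the base URL and the parameter name `file` are placeholders for whatever the real links use). The URI module, which LWP depends on, builds the query string and escapes the value for you:

```perl
use strict;
use warnings;
use URI;

# Build a download URL for one file name. The "file" parameter name and
# the base URL used below are hypothetical placeholders.
sub build_download_url {
    my ($base, $name) = @_;
    my $uri = URI->new($base);
    $uri->query_form(file => $name);    # escapes characters such as & and =
    return $uri->as_string;
}

# One URL per file to fetch
for my $name (qw(test.xls other.xls)) {
    print build_download_url('http://www.domain.com/download.php', $name), "\n";
}
```

Each generated URL can then be handed to the LWP download code in this thread.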

You might want to take a look at this: http://articles.sitepoint.com/article/file-download-script-perl

There is a Perl library, LWP, which will enable you to do this:

#!perl.exe

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
# A trailing space makes LWP append its own default agent string
$ua->agent("MyApp/0.1 ");

# Create a request
my $req = HTTP::Request->new(GET => 'http://www.domain.com/app.php?xls=a.xls');

# Pass the request to the user agent; the response body is saved to $outfile
my $outfile = 'c:\\temp\\a.xls';
my $res = $ua->request($req, $outfile);
unless ($res->is_success) {
	print "Problem downloading xls: ", $res->status_line, "\n";
}
exit;
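The download window your browser shows is usually triggered by the server sending a `Content-Disposition: attachment` header (an assumption about this particular site); LWP never shows a window, it just hands you the response. If you would rather save under the name the server suggests than hard-code `a.xls`, HTTP::Response's `filename` method reads that header. A sketch, where `local_path_for`, `download_to_dir` and the `.xls` check are illustrative choices, not part of LWP:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use File::Spec;

# Pick a local file name for a response. HTTP::Response's filename() reads
# the Content-Disposition header, falling back to the last segment of the
# request URL; we only trust it if it ends in .xls, else use $default.
sub local_path_for {
    my ($res, $dir, $default) = @_;
    my $name = $res->filename;
    $name = $default unless defined $name && $name =~ /\.xls\z/i;
    return File::Spec->catfile($dir, $name);
}

# Fetch $url and save it in $dir under the server-suggested name.
sub download_to_dir {
    my ($ua, $url, $dir, $default) = @_;
    my $res = $ua->get($url);
    die "Download failed: ", $res->status_line, "\n" unless $res->is_success;
    my $path = local_path_for($res, $dir, $default);
    open my $fh, '>:raw', $path or die "Cannot write $path: $!\n";
    print {$fh} $res->content;
    close $fh or die "Cannot close $path: $!\n";
    return $path;
}
```

For example, `download_to_dir(LWP::UserAgent->new, 'http://www.domain.com/app.php?xls=a.xls', 'c:/temp', 'a.xls')`. This holds the whole file in memory before writing it, which is fine for workbooks of ordinary size.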

I tried the following code; below it is the response from the server when I ran it. (Note: I have changed all the details pertaining to my login.)

I am a premium member, so I have to authenticate in the script to download, but this code isn’t working.

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;

my $req = HTTP::Request->new(GET => 'http://www.somewhere.com/download_something.php?code=1234');
$req->authorization_basic('user', 'pass');

print $ua->request($req)->as_string;

c:\Perl64>test.pl
HTTP/1.1 200 OK
Connection: close
Date: Mon, 31 Jan 2011 07:43:14 GMT
Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Content-Type: text/html
Client-Date: Mon, 31 Jan 2011 07:43:15 GMT
Client-Peer: 174.120.156.10:80
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
X-Powered-By: PHP/5.3.1

The excel download pages are for <b>Premium Members</b> only. You can download up to 10 years of annual financial data and 20 quarters of quarterly data. Please upgrade your membership to continue. <a href="/redirect.php?id=13&url=/membership/membership_upgrade.php">Free Trial</a>.<br><br> <a href="/redirect.php?id=13&url=/membership/membership_upgrade.php">Continue…</a>

c:\Perl64>
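Notice the status line in that output: `200 OK` with `Content-Type: text/html`. The server reports success while sending an HTML "Premium Members only" page, so checking `is_success` alone would not reveal that the login failed. A sketch of a stricter check; the accepted MIME types are an assumption, since servers label .xls downloads inconsistently:

```perl
use strict;
use warnings;
use HTTP::Response;

# MIME types accepted as "this is a spreadsheet". The list is an
# assumption: servers label .xls downloads inconsistently.
my %XLS_TYPES = map { $_ => 1 } qw(
    application/vnd.ms-excel
    application/x-msexcel
    application/octet-stream
);

# True only if the request succeeded AND the Content-Type is one of the
# accepted types (so an HTML page served with 200 OK is rejected).
sub got_spreadsheet {
    my ($res) = @_;
    return 0 unless $res->is_success;
    # content_type() returns the lowercased type without parameters
    return $XLS_TYPES{ $res->content_type } ? 1 : 0;
}
```

In the earlier script, `unless (got_spreadsheet($res))` could stand in for `unless ($res->is_success)`. With the `$ua->request($req, $outfile)` form the headers are still available in `$res`, but the HTML page will already have been written to `$outfile`, so you may want to delete that file on failure.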

I take it we are dealing with

http://www.gurufocus.com/download_financials.php?code=1234

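The server answered 200 OK rather than issuing a 401 challenge, so the authorisation doesn’t seem to be the HTTP challenge/response handled by Apache, which the authorization_basic call you used would have coped with. Rather, the login seems to go through an HTML form with, I speculate, some sort of session parameter.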
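If the login really is an HTML form, the usual LWP approach is to give the user agent a cookie jar, POST the login form once, and then request the download with the same agent so the session cookie goes back automatically. A sketch under those assumptions; the login URL and the field names `username` and `password` are hypothetical and would have to be read from the site's login page source:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# A user agent that keeps cookies between requests (in memory only).
sub make_agent {
    my $ua = LWP::UserAgent->new;
    $ua->cookie_jar(HTTP::Cookies->new);
    return $ua;
}

# Log in through the site's HTML form, then fetch the file with the same
# agent so any session cookie set at login is sent back automatically.
# The form field names "username" and "password" are hypothetical.
sub login_and_download {
    my ($ua, $login_url, $user, $pass, $download_url, $outfile) = @_;

    my $login = $ua->post($login_url, { username => $user, password => $pass });
    # LWP does not follow redirects after a POST by default, and login
    # forms often answer with one, so accept a 3xx here as well.
    die "Login failed: ", $login->status_line, "\n"
        unless $login->is_success || $login->is_redirect;

    # ':content_file' writes the response body straight to $outfile
    my $res = $ua->get($download_url, ':content_file' => $outfile);
    die "Download failed: ", $res->status_line, "\n" unless $res->is_success;
    return $res;
}
```

It would be called as `login_and_download(make_agent(), $login_url, 'user', 'pass', 'http://www.gurufocus.com/download_financials.php?code=1234', 'c:/temp/1234.xls')`, where `$login_url` is the form's `action` address. Real login forms sometimes include hidden fields that must be posted too.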

Yes, that’s the site. What can I do? I know Perl okay, but I am not familiar with HTTP or authentication at all…

I could contact the site but I’m not sure if they would help me with something like that.

I am not an expert on this kind of stuff, but I did an nslookup:

nslookup 174.120.156.10
Server: home
Address: 192.168.1.254

Name: server.gurufocus.com
Address: 174.120.156.10

I am not sure I can help you much further. You can sign up for a free trial, but not without giving your credit card details, and I’m afraid I don’t fancy doing that.

If I were you, I would try to find out how the site handles the login details. If it holds a session identifier in a cookie, that is something you might be able to work with.

Well, the browser keeps me logged in forever; I never have to sign in. I viewed my cookies and there are a bunch of them for the site.

There is a cookie for gurufocus.com named uchome_loginuser, and its content is my username. If I were looking for my password in a cookie, what would I be looking for?

I just found a cookie for www.gurufocus.com with only one entry in it: phorum_session_v5

Its content starts with my username, followed by some gibberish, which I assume is my encoded password?

How can I use it in Perl?

It is unlikely that they would store a password in the cookie. It is more likely an encrypted form of it or, more likely still, a unique digest of the username/password and possibly one other piece of information.

In the worst case, the digest might change and be updated on each access, which would complicate matters further for you.
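If the cookie value stays stable, one workaround is to copy it out of the browser and have LWP send it. A sketch, assuming the session really lives in `phorum_session_v5` on www.gurufocus.com and that the site accepts it from a client other than your browser; `jar_with_browser_cookie` is a hypothetical helper, and if the digest is refreshed on each access the copied value will go stale:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build a cookie jar holding one cookie copied by hand from the browser.
sub jar_with_browser_cookie {
    my ($name, $value, $domain) = @_;
    my $jar = HTTP::Cookies->new;
    # Arguments: version, key, value, path, domain, port,
    #            path_spec, secure, maxage (seconds), discard.
    # 86400 (one day) is an arbitrary lifetime for this sketch.
    $jar->set_cookie(0, $name, $value, '/', $domain, undef, 0, 0, 86400, 0);
    return $jar;
}

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(
    jar_with_browser_cookie('phorum_session_v5', 'PASTE-VALUE-HERE', 'www.gurufocus.com')
);
# Requests made with $ua to www.gurufocus.com now carry the cookie
```

Paste the value exactly as the browser shows it in place of `PASTE-VALUE-HERE`.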

I found a couple of pages that might help.

A guide to using LWP, with specific emphasis on cookies and authentication:

http://lwp.interglacial.com/ch11_01.htm

The official cookbook

Also have a look at the Perl documentation for HTTP::Cookies and LWP::UserAgent.

Hope this helps