Only PDFs listed in search

I’ve recently been asked to take over and redevelop a site. Curiously, the only pages from the old site that Google has indexed are PDF versions of the content (apparently generated by a Joomla widget). Bing lists nothing at all.

There’s nothing in the robots.txt or any meta tags that would have caused this, and no analytics code in place. As the site is being started from scratch I have no access to anything in the former CMS or .htaccess.

Some sample URLs, stripped of domain.

Page: index.php?option=com_content&task=view&id=17&Itemid=31
PDF: index2.php?option=com_content&do_pdf=1&id=17

Any ideas what may have caused this, and whether it might affect indexing in future? The new site will have a different structure and friendlier URLs.
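
In case it helps anyone repeating the robots.txt check against the two URL forms above, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.com` domain are placeholders (the real ones aren't given in this thread), and `allowed` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste the site's real file here.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /administrator/
"""

# Sample URLs from the post, with a placeholder domain added.
PAGE = "http://example.com/index.php?option=com_content&task=view&id=17&Itemid=31"
PDF = "http://example.com/index2.php?option=com_content&do_pdf=1&id=17"


def allowed(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (PAGE, PDF):
        print(url, "allowed" if allowed(SAMPLE_ROBOTS, url) else "blocked")
```

Rules are matched as path prefixes, so a line such as `Disallow: /index.php` would block the HTML pages while leaving `index2.php` crawlable. That is the sort of rule worth ruling out here.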

I’m curious … are the old pages still there and live … and Google isn’t finding any of them at all? I could understand if it was prioritising the PDFs over Joomla-based pages (heck, I would prioritise what I scrape off the bottom of my shoe over most Joomla-based pages, from the cruft:content ratio alone). But the PDFs are, I assume, only or mostly linked from the HTML pages, so indexing them without any trace of the HTML pages themselves is distinctly odd. It sounds like there should be something in the robots.txt or <meta> tags that is blocking indexing, but you say that’s not the case? What about canonical tags?

Yes, the pages were live at time of searching (though not for much longer as I’ve just switched nameservers).

No canonicals, but just noticed the pages all have a “verify-v1” meta. Might this, in the absence of any Google Analytics scripts, have something to do with it?
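
For what it's worth, a quick way to list every tag in a page that could plausibly matter (robots meta, canonical link, the verify-v1 meta) is a small scan with Python's built-in `html.parser`. This is only a sketch: the sample HTML and its values are made up, and `HeadTagScanner` and `scan` are illustrative names. As far as I know, verify-v1 is a site-ownership verification tag rather than something that controls indexing, so it is listed here only for completeness.

```python
from html.parser import HTMLParser

# Meta names worth reporting: two that can affect indexing, plus the verify tag.
META_NAMES = ("robots", "googlebot", "verify-v1")

# Hypothetical page source; replace with the real HTML of a page on the site.
SAMPLE_HTML = """\
<html><head>
<title>Example</title>
<meta name="verify-v1" content="hypothetical-token" />
<meta name="robots" content="noindex, follow" />
<link rel="canonical" href="http://example.com/some-page" />
</head><body></body></html>
"""


class HeadTagScanner(HTMLParser):
    """Collect robots-related meta tags and any canonical link."""

    def __init__(self):
        super().__init__()
        self.findings = {}

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None for valueless attributes.
        attr = {key: (value or "") for key, value in attrs}
        if tag == "meta":
            name = attr.get("name", "").lower()
            if name in META_NAMES:
                self.findings[name] = attr.get("content", "")
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.findings["canonical"] = attr.get("href", "")


def scan(html):
    """Return a dict of the indexing-related tags found in `html`."""
    scanner = HeadTagScanner()
    scanner.feed(html)
    return scanner.findings


if __name__ == "__main__":
    for name, value in scan(SAMPLE_HTML).items():
        print(f"{name}: {value}")
```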

Just out of curiosity, what does a site:domain.com search produce? (Quickly now, before the nameserver change takes effect. ;))

My findings were based on a site:domain search.

Phrase searches also return only the PDFs, and a phrase search for the Home page’s long, unique title element, which doesn’t appear in the body text, returns nothing.

It could be sandboxed due to originality issues or something along those lines.
Have you looked in Google Webmaster Tools/Analytics?
Try a fetch and see if any problems come up.
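
If the Webmaster Tools fetch isn't available, a rough local stand-in is to request a page yourself and look at the status code and response headers. An `X-Robots-Tag: noindex` header can be set at server level (for example in .htaccess) and would not show up in robots.txt or the page's meta tags. The sketch below is an approximation under stated assumptions: the user-agent string is only illustrative, the function names are made up, and a request from your own machine is not the same as Google's fetch.

```python
import urllib.error
import urllib.request

# Illustrative crawler-style user-agent string (an assumption, not Google's tool).
USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_status_and_headers(url, timeout=10):
    """Fetch `url` and return (HTTP status code, response headers as a dict)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status and headers worth inspecting.
        return err.code, dict(err.headers.items())


def header_blocks_indexing(headers):
    """Return True if an X-Robots-Tag header contains noindex or none.

    Simplified: values prefixed with a crawler name, such as
    "googlebot: noindex", are not handled.
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = [part.strip().lower() for part in value.split(",")]
            if "noindex" in directives or "none" in directives:
                return True
    return False
```

Usage would be along the lines of `status, headers = fetch_status_and_headers(url)` followed by `header_blocks_indexing(headers)`, for both a page URL and its PDF counterpart.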

Despite the verify tag, the owner isn’t sure whether Google Analytics stats were ever gathered and isn’t keen on querying the former developer. He thinks some SEO was done, and the old site did have descriptions and keywords. The keywords were perhaps a bit clumsy (50 words including phrase permutations, misspellings, terms not in the content, etc.) but not obviously overstuffed.

Because the former site was built around a business model that has since been abandoned, the site currently has only an “under development” page.

Are there any pros or cons in registering the domain with Webmaster Tools/GA before the new site is completed?

I’m not aware of any downsides to using WT/GA at any time.