doubledee — 2013-12-11T22:10:31-05:00 — #1
Is there a way to stop an app like SiteSucker from "cloning" your entire website without your knowing or your permission? :eek:
ralphm — 2013-12-11T23:03:28-05:00 — #2
There certainly is! Don't put your site online.
doubledee — 2013-12-11T23:15:42-05:00 — #3
Some of you may not know this, but several years ago Ralph failed miserably as a stand-up comedian.
He was thrown into an alleyway, but fortunately Hawk came along and gave him a role here at SitePoint.
And ever since, we have been "blessed" with his comedic ways...
God Bless the misfits of this world...
ralphm — 2013-12-12T01:55:20-05:00 — #4
So just live with it. There are more important things to worry about, and really, to think that someone even wants to copy your site is a bit of a stretch—some would call it 'self-flattery', shall we say. You are better off just getting on with running a business and taking reasonable steps to back up your site, etc. There will always be criminals and scammers out there, and you have to accept that, just like you accept viruses and deal with them when you get sick. It's just the world we live in.
john_betong — 2013-12-12T04:19:43-05:00 — #5
How do you know you are being cloned? Is SiteSucker showing up in your server logs?
Is your site material all your own work? If so, then with a bit of luck Google will recognise your original content and increase your "SEO Brownie Points" because of the links to your site.
Copying is not theft: http://youtu.be/IeTybKL1pM4
stevie_d — 2013-12-12T07:44:06-05:00 — #6
Exactly. Once you've put your site online, you've made it available for other people to copy. Sure, you can keep an eye on things and use various plagiarism checkers, then hunt them down and kill them with spears or lawyers (whichever is more effective), report them to Google and their hosting providers ... but that's all a bit "shutting the stable door".
Over the years we've had a few problems with scraper sites stealing content off SPF, and it comes down to a decision each time how much effort you are prepared to put into pursuing each case and how much you are losing by it.
Of course, there is a third option, which is to put it behind a paywall, or at least a restricted registration zone, but as that will play hell with your Google rankings and usability, it's really a last resort.
doubledee — 2013-12-12T10:43:47-05:00 — #7
You're so coy, Ralph! :lol:
Right. And what was implied by my OP is that it is scary to think that there are tools out there which can whiz through your site and make a carbon copy of it in minutes, if not seconds.
So I was wondering if there was any easy way to detect that and stop it.
Admittedly, if Jane User is bored - or nefarious - and wants to spend all month manually saving each page on my website, there is little I can do.
But if someone fires off a script that automates that process, then maybe there is some way to detect it and "cut it off at the pass"?
If a person's website is mostly content, and someone could make a carbon copy of it in minutes, and then publish your content under a new domain name (e.g. www.DebbiesKnockOffSite.com) then that would be a big issue.
In fact, I JUST read a really fascinating article in Inc. magazine - old paper copy from my accountant, so sorry, no links - that talked about a guy in Germany who was basically stealing people's entire websites, changing the logos, and creaming his competition. (Dude is a multi-millionaire doing this.)
Granted, I think what he was doing was paying people in some 3rd world country to code exact copies of the sites he was stealing, but similar concept...
Understood, but I'm just trying to think 10 steps ahead, and looking for ways to cut off the bad guys.
(If a neighbor of yours tapped into your water line, you'd notice your water bill going up and investigate. Maybe there is some counter-tool out there that detects when someone is scanning your directories and making massive copies of things?! I dunno. Maybe that is wishful thinking?)
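The "water bill" analogy actually maps onto something real: a script like SiteSucker leaves a trail in your server's access log as a burst of requests from one IP address. As a rough illustration only (the threshold and log format are assumptions, not a recommendation), here's a sketch that counts requests per client in a combined/common-format access log and flags anything suspiciously busy:

```python
import re
from collections import Counter

# Hypothetical threshold: more requests than a human plausibly
# makes in one log window. Tune to your own traffic.
THRESHOLD = 100

# Matches the leading client IP of a common/combined log format line.
LOG_LINE = re.compile(r'^(\S+) ')

def suspicious_ips(log_lines, threshold=THRESHOLD):
    """Return {ip: request_count} for clients exceeding the threshold."""
    counts = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m:
            counts[m.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n > threshold}
```

A human browsing makes requests in dribs and drabs; a site ripper fetches hundreds of pages in minutes, so it stands out immediately in a count like this.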
doubledee — 2013-12-12T10:49:23-05:00 — #8
No, right now my website is safe from everyone, because it is eternally stuck in "Development Mode" on my laptop!!! :lol:
But s-o-m-e-d-a-y it will be on the Internet...
That's my plan.
Wow! That is one funky video?! (:
Not sure who did that or what they were trying to say/prove, but I bet you the RIAA would disagree that "Copying Is Not Theft"...
technobear — 2013-12-12T11:00:37-05:00 — #9
As Ralph says, once the site's on-line, there is no way to absolutely guarantee that it won't be copied. However, you could make it more difficult by adding something like [a black hole](http://perishablepress.com/blackhole-bad-bots/) or [Crawl Protect](http://www.crawltrack.net/crawlprotect/). (Don't be put off by the curious English on the Crawl Protect site; the author is not a native speaker.)
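The "black hole" idea is simple: robots.txt disallows a trap URL that no human ever sees (it's linked invisibly), so any client that requests it is a bot ignoring robots.txt, and you ban it. A minimal sketch of that logic, with the trap path and responses made up for illustration:

```python
# Blackhole trap sketch. Assumes robots.txt contains "Disallow: /trap/"
# and that a hidden link to TRAP_PATH exists somewhere on the site.
TRAP_PATH = "/trap/"   # hypothetical path, not from the linked article
banned_ips = set()

def handle_request(ip, path):
    """Return an HTTP status for a request, banning trap visitors."""
    if ip in banned_ips:
        return 403              # previously caught in the trap
    if path.startswith(TRAP_PATH):
        banned_ips.add(ip)      # client ignored robots.txt: ban it
        return 403
    return 200                  # normal request, serve as usual
```

Well-behaved crawlers like Googlebot respect robots.txt and never hit the trap, so this only catches the rude ones.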
mittineague — 2013-12-12T14:14:25-05:00 — #10
I know Flood Control works well for limiting form submits. I wonder if there's anything like that for GET requests?
Crawl Protect looks like it relies on User-Agent to block crawlers, but User-Agent isn't reliable.
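Something like flood control for GET requests is doable without trusting the User-Agent at all: throttle by request rate per IP instead. A sliding-window sketch of that idea (the limits are placeholder numbers, not a recommendation):

```python
import time
from collections import defaultdict, deque

# Hypothetical limits: at most MAX_REQUESTS per WINDOW seconds per client.
MAX_REQUESTS = 30
WINDOW = 10.0

_history = defaultdict(deque)   # ip -> recent request timestamps

def allow_request(ip, now=None, max_requests=MAX_REQUESTS, window=WINDOW):
    """Sliding-window rate limit: True if this request should be served."""
    now = time.monotonic() if now is None else now
    q = _history[ip]
    while q and now - q[0] > window:
        q.popleft()             # drop timestamps outside the window
    if len(q) >= max_requests:
        return False            # over the limit: throttle this client
    q.append(now)
    return True
```

The catch, of course, is that a determined scraper can slow itself down below any limit you set, or rotate IPs — so this raises the cost of copying rather than preventing it.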