Data Mining / Scraped Content

Hello,

I’ve got a couple of ideas in the area of data mining and data analytics with scraped content.
For example: http://www.indix.com/usecases/enrich/ Since they display an ASIN there, they most likely get some data from Amazon. Even if not, there are several services that get data from Amazon, and Amazon states in its TOS that it doesn’t allow software to work with the displayed data.

(“You may not incorporate any portion of the Amazon Software into your own programs or compile any portion of it in combination with your own programs”)

Now I guess that’s kind of a grey area and open to interpretation. Google is doing pretty much exactly this, but Amazon would never stop Google from indexing Amazon’s pages. (Just one example; there are plenty of other websites and services in this combination.)

How should I handle this for a business idea? I’d like to generate content analysis based on other websites’ content. Some of them might not state anything about this issue.

Thanks,
Transmitter

Have you tried contacting the sources?

An API, or at least an agreement, could save you much grief.

Yes and no. For Amazon there isn’t any API for my idea.
And for another idea, it might not be the right way to go.
I mean … Google didn’t contact me and ask whether it’s fine to put my website/news/events in their index. They just offer a service where I can remove the data after the fact, for which I have to register with them (Webmaster Tools), or, if I know in advance, I can tell them in my robots.txt.

So maybe let’s move the question a bit:
If the webmaster didn’t block spiders via the robots.txt, I am allowed to spider and crawl their data. What am I allowed to do with the data in that case?
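
To make the robots.txt part concrete, here is a minimal sketch in Python of how a crawler could check whether a URL is blocked before fetching it. It uses the standard library's `urllib.robotparser`; the robots.txt content, the crawler name `MyCrawler`, and the example.com URLs are made up for illustration. It only answers whether the site asks crawlers to stay out of a path, not what may be done with the data afterwards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without
# network access. A real crawler would download it from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "MyCrawler" and the URLs below are placeholders, not real services.
print(is_allowed(ROBOTS_TXT, "MyCrawler", "https://example.com/products/123"))
print(is_allowed(ROBOTS_TXT, "MyCrawler", "https://example.com/private/data"))
```

With these made-up rules, the first URL is permitted and the second is blocked by the `Disallow: /private/` line.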

Yes, I also have no idea whether there is an API for Amazon, but I am interested to know about it too. Please let me know the results as well. It is really very important for an ecommerce business.

Post edited by cpradio to remove link drop
