
Welcome to the Content Publishers' Feedback forum

Replies: 92

Re: Welcome to the Content Publishers' Feedback forum

Posted: 25 Oct 2010 10:31PM GMT
Classification: Query
I'm the coordinator of the Genealogy Trails History Group.

I did not receive a copy of your announcement email. I had no idea this company was planning on going through our sites and taking our data. How was I supposed to know to contact you and ask you to not do something I didn't know anything about?? You should be contacting every website administrator and asking for permission to harvest data from their sites. The only reason I know anything about this new project of yours is because a concerned researcher wrote me and forwarded a copy of your announcement email.
In that announcement, you stated "To do this we will build an index of essential information in the record (e.g. the website link, the matching name, date, place), and make this available to our users through our search tools."

Translated to actions, this means you are going through our websites, using our organization's bandwidth to harvest our data, then you will strip everything out from that data (losing vital information in the process?) leaving only the names/dates/places part of it to create your own index of our work. For non-tech folks, that means you will take data from all the sites you are harvesting from, and jumble up all that data into one or more databases and those databases will reside on your website under your control. Researchers will then pay you for the opportunity to view your index of our work. We did the transcribing work and you get the money from people to view it??? And what's worse is we may even end up paying because your spiders may eat through our bandwidth quota to harvest the data contained in our 124,000+ webpages!

Your announcement email also says "For example, some sites contain a robots.txt file telling search engines (such as Google) not to crawl that site"
What you DON'T tell us webmasters is the one VITAL piece of information we need to block your spiders -- and that is the name of the bot agent you use. You bring up Google. Let's talk about Google then.... THEY follow "industry standards" by telling everybody right up front that their spidering agent's name is googlebot. What is yours? Where's the webpage on your site that gives this information freely and quickly without having to be a member to view? I spent a good chunk of time over the past few days looking for that information. Without it, we cannot protect our sites from your spider, and yet, you conveniently left that critical piece of information out of your announcement email.
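
For reference, once a crawler's agent name is published, a robots.txt rule can target it by name. The sketch below uses Python's standard `urllib.robotparser` to show how such a rule would be read by a crawler that honors robots.txt. The agent name `ExampleHarvestBot` and the `example.org` URL are hypothetical placeholders; the announcement does not give the real agent name, which is exactly the missing piece.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical agent name: the real one is not given in the announcement.
BLOCKED_AGENT = "ExampleHarvestBot"

# A robots.txt that shuts out one named crawler and leaves everyone else alone.
ROBOTS_TXT = f"""\
User-agent: {BLOCKED_AGENT}
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a robots.txt-respecting crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Placeholder page address, used only for illustration.
PAGE = "http://www.example.org/records/page.html"

blocked = is_allowed(BLOCKED_AGENT, PAGE)  # the named crawler
others = is_allowed("Googlebot", PAGE)     # any other crawler

print("named crawler allowed:", blocked)
print("other crawlers allowed:", others)
```

Note that robots.txt is a voluntary convention: the rule only works if the crawler chooses to honor it, and only if the webmaster knows which agent name to put in the `User-agent` line.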

Another question --- How often is that spider going to be hitting our sites, and how much of our monthly bandwidth quota is your spider going to eat through? Will it slow our sites down for the researchers trying to legitimately view the data?

Please do not compare what you're doing to what Google does because you're actually taking data from the sites instead of just reading it and you'll be using that data to build your own indexes. Google presents links to sites to visit -- without harvesting the actual data.

You say that we can ask that our data not be harvested by sending an email... which I did early this afternoon. I have not received any response to that email.

You say people can view the data for free if they create a free account. It took me 10 minutes to find the one button that creates a free account, and then it rejected me! Apparently at some time (I remember it being at least 10 years ago) I took advantage of one of your freebie promotions and signed up. You still had my personal data stored on file from that long ago. Amazing. And scary to think you guys don't delete anything. What guarantee do sites like ours have that when we ask for our harvested data to be deleted, you actually will?

It should not be my responsibility to contact the multi-million dollar corporation of Ancestry.com and ask that it not take the data from our websites for its own use. But since you require me to "opt out", this is my public request that no site under our genealogytrails domains be spidered by any of your agent bots and that no data be harvested from our sites for your use. Hopefully you will respect that request. And please post the names of all spidering agents and bots you will be using to harvest data so we can act to protect our sites.

Thank you.


Kim Torp
Genealogy Trails History Group