PSA: How to Prevent OpenAI’s Crawlers from Scraping Your Websites
Thread poster: Mr. Satan (X)
English to Indonesian
Aug 9, 2023

Hello all,

FYI, OpenAI has published a technical guide for webmasters on how to prevent its crawlers from mining part or all of their websites. Some translators might object to having their web content used to train OpenAI’s GPT models and improve their performance. Those who do can follow the instructions at the link below.

https://platform.openai.com/docs/gptbot
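
For readers who have never edited a robots.txt file, here is a minimal Python sketch of the kind of rules that guide describes. It assumes the crawler identifies itself with the user-agent token GPTBot, as the linked page documents; the helper name build_robots_txt and the /blog/ path are made up for illustration and do not come from OpenAI.

```python
def build_robots_txt(user_agent="GPTBot", disallow=("/",), allow=()):
    """Return robots.txt text that restricts a single crawler.

    The default blocks the whole site for the given user agent.
    All names and paths here are illustrative assumptions.
    """
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    return "\n".join(lines) + "\n"


# Block the whole site for GPTBot.
print(build_robots_txt())

# Block only one (hypothetical) directory and leave the rest open.
print(build_robots_txt(disallow=("/blog/",)))
```

The resulting text would go into the robots.txt file at the root of your site. Note that robots.txt is a request that well-behaved crawlers honour, not an access control.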

Please bear in mind that restricting access to their crawlers could be detrimental to your traffic. In other words, your websites may rank lower in search results, reducing your visibility in an already congested language industry. The decision is entirely yours.

Ciao


Zea_Mays
 
Thomas T. Frost
Portugal
Danish to English
We shouldn't forget that we want to be found, not ignored Aug 9, 2023

I asked Bard the following question:

'Can you suggest some competent freelance translators with IT knowledge for French to Danish?'

It suggested three translators with brief summaries for each. I happened to find myself as the first on the list. After changing the source language to English, I got three other results.

So instructing my website to deny AI crawlers access would be like cutting off my nose to spite my face.

Our energy would be better spent marketing our services and making AI work for us than complaining about AI, which is a pointless exercise.


Jorge Payan
 
Zea_Mays
Italy
Member (2009)
English to German
SERPs Aug 9, 2023

Mr. Satan wrote:

Please bear in mind that restricting access to their crawlers could be detrimental to your traffic. In other words, your websites may rank lower in search results, reducing your visibility in an already congested language industry. The decision is entirely yours.

Ciao


Hm, I don't believe this affects Google rankings, as Google's robots are independent of OpenAI's.
IMO it would only have that effect if you used a general robots access restriction, for example.

OpenAI is going on the offensive over the copyright infringement issue:
until now, they have trained their models on information from the web without asking,
even though many websites carry a clear copyright notice.
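
The difference between a crawler-specific rule and a general restriction can be checked with Python's standard urllib.robotparser module. This is only a sketch: the example.com URL and the two rule sets are assumptions for illustration, and it shows how robots.txt rules are matched against user agents, not how any search engine ranks pages.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Rule aimed at OpenAI's crawler only.
TARGETED = "User-agent: GPTBot\nDisallow: /\n"
# General restriction that applies to every crawler.
BLANKET = "User-agent: *\nDisallow: /\n"
# Hypothetical page on a translator's website.
URL = "https://example.com/services.html"

for label, rules in (("targeted", TARGETED), ("blanket", BLANKET)):
    for agent in ("GPTBot", "Googlebot"):
        print(label, agent, can_fetch(rules, agent, URL))
```

With the targeted rule, only GPTBot is refused and Googlebot may still fetch the page; with the blanket rule, both are refused.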


 
Zea_Mays
Italy
Member (2009)
English to German
Dicing with AI Aug 9, 2023

Thomas T. Frost wrote:

I asked Bard the following question:

'Can you suggest some competent freelance translators with IT knowledge for French to Danish?'

It suggested three translators with brief summaries for each. I happened to find myself as the first on the list.


I asked Bard the same and you are not among the three (unless your real name is different and you are studying Communication & IT)...


 
Zea_Mays
Italy
Member (2009)
English to German
Where Bard's proposals come from Aug 9, 2023

Me asking Bard: Are you allowed to randomly propose professionals here? Or are they paying for it?

Bard: I am not allowed to randomly propose professionals here. I can only propose professionals who have agreed to be listed on my service. This is to ensure that the professionals I propose are qualified and experienced, and that they are willing to work with me.

The professionals I propose are not paying me for my services. I am a free service that is provided by Google. I am able to propose professionals for free because I am funded by Google.

I hope this answers your question. Let me know if you have any other questions.

Me asking Bard: How did they agree to be listed on your services? Can you provide a URL where this can be done?

Bard: Professionals can agree to be listed on my services by creating a profile on Google My Business. Google My Business is a free service that allows businesses to create a listing on Google Maps and Google Search. When a professional creates a profile on Google My Business, they can choose to include their skills, experience, and availability. This information is then used to generate search results for people looking for freelancers in their area.

Here is the URL where professionals can create a profile on Google My Business: https://business.google.com/create

Please note that I am not affiliated with Google My Business. I am simply a large language model that is trained on a massive dataset of text and code. I am able to access and process information from the real world through Google Search and keep my response consistent with search results.

[Edited at 2023-08-09 16:16 GMT]


 
Thomas T. Frost
Portugal
Danish to English
Not important Aug 9, 2023

Zea_Mays wrote:

I asked Bard the same and you are not among the three (unless your real name is different and you are studying Communication & IT)...


I guess it takes many different parameters into consideration and doesn't necessarily give the same reply every time, just like Google search. But the point is not whether it's me or someone else, but that it can work as a marketing channel.

I don't have any content on my website that can be of much use for AI anyway, so I would gain nothing by banning it.

The copyright question is another story. As I understand it, AI does not store copies of the crawled sites but 'learns' from them, just like a human reader would. So is that allowed or not? I'm sure many lawyers can make a lot of money out of that question.


 
Zea_Mays
Italy
Member (2009)
English to German
copyright Aug 9, 2023

Thomas T. Frost wrote:
The copyright question is another story. As I understand it, AI does not store copies of the crawled sites but 'learns' from them, just like a human reader would. So is that allowed or not? I'm sure many lawyers can make a lot of money out of that question.

If you use other people's material for business purposes, this is copyright infringement.
I think this will become a big issue very soon.


 
Thomas T. Frost
Portugal
Danish to English
Where Bard's proposals come from Aug 9, 2023

Zea_Mays wrote:

Me asking Bard: Are you allowed to randomly propose professionals here? Or are they paying for it?


Interesting about Google My Business. I don't have a Google Maps or any other Google entry or profile, just a website, my Proz profile and LinkedIn. As these sites are available to the public, any crawler can use them.


 
Thomas T. Frost
Portugal
Danish to English
Copyright Aug 9, 2023

Zea_Mays wrote:

If you use other people's material for business purposes, this is copyright infringement.
I think this will become a big issue very soon.


With that definition, none of us could do much research, as it's for business purposes. But I don't think that definition of copyright would stand up in court.

We can't do anything about this except watch the saga unfold in the media.


 
Zea_Mays
Italy
Member (2009)
English to German
Copyright issue Aug 9, 2023

https://completemusicupdate.com/article/sarah-silverman-joins-authors-in-suing-chatgpt-for-copyright-infringement/

 
Thomas T. Frost
Portugal
Danish to English
Lawsuit Aug 9, 2023



I know one or more such lawsuits are underway. They will probably end with some sort of compromise in a few years.

But this doesn't change anything for us here and now. AI is here, and we cannot change that, so we have to find out how best to live with it without damaging our own interests more than theirs. I just don't see how hiding our business websites could possibly be in our own interest.

I too worry about the impact of MT and AI, but raging about something we can't change will not help us. We have to adapt to the real world.


Dan Lucas
Jorge Payan
Maciek Drobka
 
Zea_Mays
Italy
Member (2009)
English to German
Who doesn't fight... Aug 10, 2023

Thomas T. Frost wrote:

I know one or more such lawsuits are underway. They will probably end with some sort of compromise in a few years.

But this doesn't change anything for us here and now. AI is here, and we cannot change that, so we have to find out how best to live with it without damaging our own interests more than theirs. I just don't see how hiding our business websites could possibly be in our own interest.

I too worry about the impact of MT and AI, but raging about something we can't change will not help us. We have to adapt to the real world.

I wouldn't be too sure about that. Those who don't fight have already lost.

The reason why OpenAI is now giving the option to block their crawlers is precisely the copyright issue.
They're certainly not targeting translators and their little promotional websites,
but people who don't want their work to be used by OpenAI to make money without paying for it.

Think of yourself as a content owner with an extensive informative blog that you have written over many years, not as a translator.

There's also a difference between the US and the EU. The latter takes copyright (and privacy) much more seriously than the US.


 
Thomas T. Frost
Portugal
Danish to English
The real world Aug 10, 2023

Zea_Mays wrote:
I wouldn't be too sure about that. Those who don't fight have already lost.


We can't disagree, but some fights are lost in advance, so the resources poured into them will be lost as well. In the case of AI, you can at best mitigate the problem by blocking access to content if that is more of an advantage than a disadvantage to you, bearing in mind that things are often not black and white.

Zea_Mays wrote:
The reason why OpenAI is now giving the option to block their crawlers is precisely the copyright issue.
They're certainly not targeting translators and their little promotional websites,
but people who don't want their work to be used by OpenAI to make money without paying for it.
Think of yourself as a content owner with an extensive informative blog that you have written over many years, not as a translator.


If I had a website with non-promotional content, I would indeed consider blocking AI access if I could see no benefit in giving it access.

If you have a blog, you need to consider what its purpose is. If it's a way to promote your services by making you appear competent, it's probably in your own interest that crawlers, traditional and AI, have access to it, as its promotional value may be higher than the tiny value you may lose in transferring a bit of knowledge to AI, but that's for you to weigh up. But in general, if something serves a promotional purpose, the whole point is to make it as widely known as possible.

Don't forget that if you have a blog or any other website, anyone can learn and copy from it, use it for their own work and (illegally) post your articles on their own ad-infested websites. Scraping is a big problem. You can often get the content taken down, but it takes time to send requests to Google, thieves' web hosting companies, the ad networks they use, etc. If the stolen content is published on a server in a 'lawless' jurisdiction, on the other hand, you can't do anything.

To sum up, I believe it's more beneficial to make calm, considered decisions than to fly into a rage, causing more harm to yourself than to AI.


 

