Hide content from web crawlers with php. Is it possible? - php

I am new to programming, so please don't judge me if I say something stupid.
I was wondering if there is any way to trick web crawlers, so that some of a website's content is different for a human visitor than for a web spider.
So here's an idea I had.
Every time a visitor enters a page, a script will identify the user's gender from the Facebook API. If there is a return value (that is, the user is logged in to Facebook in the same browser), PHP will print some extra code into the page. If it's a crawler, there will be no return value, so that code will not exist in the page's source.
I know that PHP is a server-side language, so web crawlers don't have permission to scan that code. If I am not right, please correct me.
Thank you.

I think what you are trying to do can be accomplished with robots.txt.
This file can sit at the root of your web directory and it defines the rules for web crawlers.
See here: http://www.robotstxt.org/

Related

How to assign external domain to the page of current website

Sorry if the question seems a little fuzzy and you are tempted to down-vote it as soon as you read it. I am far from being an expert in system administration, but I will try to explain the problem as clearly as I can.
I need to create a website where users can have profiles and attach an external domain to their profile page on the site. Let's say we have a site:
myresume.com
A user registers and creates a profile, and their URL looks like:
myresume.com/username
The feature I need to add should let the user point their own domain at their profile URL.
Is this possible via PHP? My whole application will be written in Laravel 5.2.
I will probably need to run my own nameservers, or just give the user the IP address to point their domain to, but then I will need to detect the domain on my side somehow and map it to the URL (myresume.com/username). How can this be done?
If someone could explain how it is done, or at least point me to where to search for an answer, it would be amazing. Thanks.
You need to know how Apache or NGINX works.
You need to know what $_SERVER['HTTP_HOST'] is.
You need the logic to implement this.
This is how you would do it:
Host your app on a VPS or dedicated server where your app is the 'default' site, with no other VirtualHost.
In your index.php file, read the requested host name. If it is not your own domain, let your Laravel routes look up which profile_id matches that domain, then either redirect the user to the profile URL or render the profile directly.
Simple?
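The lookup step above could be sketched roughly as follows. This is a minimal sketch, not the answerer's code: the CUSTOM_DOMAINS table, the usernameForHost() function, and the janedoe.example domain are all hypothetical, and a real application would query its database rather than a hard-coded array.

```php
<?php
// Hypothetical lookup table (assumption); a real app would query its database.
const CUSTOM_DOMAINS = [
    'janedoe.example'     => 'jane',
    'www.janedoe.example' => 'jane',
];

/**
 * Return the profile username for a Host header value,
 * or null when the host is the main site or unknown.
 */
function usernameForHost(string $hostHeader): ?string
{
    // Normalise case and drop an optional ":port" suffix.
    $host = strtolower(trim($hostHeader));
    $host = preg_replace('/:\d+$/', '', $host);

    return CUSTOM_DOMAINS[$host] ?? null;
}

// In index.php (or a Laravel middleware) the value comes from the request.
$username = usernameForHost($_SERVER['HTTP_HOST'] ?? '');
if ($username !== null) {
    // Render the same page that myresume.com/$username would render.
}
```

Because the app is the default site on the server, any domain whose DNS points at its IP address reaches this code, and the Host header is the only thing that distinguishes one user's domain from another.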

Stop Facebook probing a site for content with PHP

Okay, so when you post a link on Facebook, it does a quick scan of the page to find images and text etc. to create a sort of preview on their site. I'm sure other social networks such as Twitter do much the same, too.
Anyway, I created a sort of "one time message" system, but when you create a message and send the link in a chat on Facebook, it probes the page and renders the message as "seen".
I know that the Facebook probe has a user agent of facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php), so I could just block all requests from anything with that user agent, but I was wondering if there's a more efficient way of achieving this with all sites that "probe" links for content?
No, there's no foolproof way to do this. The easiest approach is to explicitly block certain visitors from marking the content as seen.
Most clients on the web identify themselves with a user agent. Not every non-human client identifies itself in a unique way, but there are online databases like this one that can help you achieve your goal.
As for blocking all bots via robots.txt, not every bot honors that standard. I would speculate that Facebook visits every shared link partly to prevent malware from spreading across its network.
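A user-agent check along those lines might look like the sketch below. Only facebookexternalhit comes from the question; the other markers are illustrative, the list is certainly incomplete, and isLinkPreviewBot() and markMessageAsSeen() are hypothetical names.

```php
<?php
// Substrings found in link-preview crawler user agents.
// Only "facebookexternalhit" is from the question; the rest are
// illustrative assumptions and the list is not exhaustive.
const PREVIEW_BOT_MARKERS = ['facebookexternalhit', 'twitterbot', 'slackbot', 'whatsapp'];

/** Return true when the user agent contains one of the known markers. */
function isLinkPreviewBot(string $userAgent): bool
{
    foreach (PREVIEW_BOT_MARKERS as $marker) {
        // stripos() makes the match case-insensitive.
        if (stripos($userAgent, $marker) !== false) {
            return true;
        }
    }
    return false;
}

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (!isLinkPreviewBot($userAgent)) {
    // markMessageAsSeen($messageId); // hypothetical application function
}
```

Since a user agent is just a header the client chooses to send, this only filters out bots that identify themselves honestly.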
You could try something like this in your robots.txt file:
User-agent: *
Disallow: /

Topic search in search engines

I am developing a website similar to a web forum. Users will post queries and others help them through their replies. In many websites like mine, the topic of the query will be included in the URL, e.g. www.sample.com/topic-1.html or www.sample.com/topic-2.html, and these links can be found via search engines.
How can I dynamically generate the HTML files and configure my website so that search engines can access it?
No, they usually aren't putting these files on the web server manually. They rewrite URLs using the web server (e.g. apache2/nginx). Check the response headers to get more info about what happens behind the scenes.
Please see How to create friendly URL in php?
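As a rough illustration of the PHP side of such a rewrite, the sketch below assumes the web server has been configured to send every request to one PHP script, which then works out the topic from the requested path. The topicIdFromPath() function and the exact /topic-N.html pattern are assumptions based on the example URLs in the question.

```php
<?php
/**
 * Extract the topic ID from a path such as "/topic-12.html".
 * Returns null when the path does not match that pattern.
 */
function topicIdFromPath(string $requestUri): ?int
{
    // Ignore any query string, e.g. "/topic-12.html?page=2".
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    if (preg_match('#^/topic-(\d+)\.html$#', $path, $matches) === 1) {
        return (int) $matches[1];
    }
    return null;
}

$topicId = topicIdFromPath($_SERVER['REQUEST_URI'] ?? '/');
if ($topicId === null) {
    // No such page: answer with 404 so search engines do not index it.
    http_response_code(404);
}
// Otherwise load topic $topicId from the database and print its HTML.
```

No .html file exists on disk; the page is generated on each request, and search engines only ever see the resulting HTML.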

How to make a page that shows one news item from the database and can be found on Google (PHP)

I am making a news website, and I have the following problem: I don't know how to make a page that shows one specific news item from my database so that people can find it through Google search. The news items are in the database, not in a physical page like an HTML file that Google can see, right? So that's my problem: I want a page that receives a news item and can be found on Google too.
Thanks =)
"can be searched on google too": you mean you want to make a web application. If your news items are in a database, you need some server-side code (PHP, ASP.NET, etc.) to fetch the data and display it as HTML, provided your website is already hosted.
If it is not, get help from a web hosting service provider.
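A minimal sketch of such a page, assuming a hypothetical news table with id, title and body columns and an existing PDO connection. The function names are made up for illustration.

```php
<?php
/**
 * Fetch one news row by ID with a prepared statement,
 * or null if no such row exists. Table and column names are assumptions.
 */
function fetchNews(PDO $pdo, int $id): ?array
{
    $stmt = $pdo->prepare('SELECT title, body FROM news WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

/**
 * Build the HTML for one news item.
 * $row is assumed to have 'title' and 'body' keys.
 */
function renderNewsPage(array $row): string
{
    // Escape database content so it cannot inject markup into the page.
    $title = htmlspecialchars($row['title'], ENT_QUOTES, 'UTF-8');
    $body  = nl2br(htmlspecialchars($row['body'], ENT_QUOTES, 'UTF-8'));

    return "<!DOCTYPE html>\n<html>\n<head><title>{$title}</title></head>\n"
         . "<body><h1>{$title}</h1><p>{$body}</p></body>\n</html>\n";
}
```

A script such as news.php could call fetchNews($pdo, (int) ($_GET['id'] ?? 0)) and echo renderNewsPage($row). Google indexes the HTML that the URL returns, so it makes no difference that the text lives in a database, as long as some page links to each news URL.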

Restrict site access to QR scans only

I saw a few questions out there already about ensuring site access comes from QR code scans, but they seemed to be focused on analytics purposes (tracking where traffic was coming from), whereas my interest is in security/privacy.
I want to set up a site that can only be accessed when a provided QR code is scanned. In other words, I don't want the URL that the QR code possesses to be able to just be manually typed/pasted in for site access via other means.
I've been googling this issue for a bit with no luck whatsoever. I'm trying to think of a way with referring URLs or other means to ensure that a person arrived at the site by actually scanning the provided QR code.
EDIT: The solution would need to be scanner-independent as well (i.e. I cannot force users to download and use a specific QR scanner app) and cross-platform (Android + iOS + WinMo + BlackBerry, etc.).
Now I submit the issue to you wonderful folks.
We have something similar at our company. We provide a link like:
zxing://scan/?ret=http%3A%2F%2Ffoo.com%2Fproducts%2F%7BCODE%7D%2Fdescription&SCAN_FORMATS=UPC_A,EAN_13
Here {CODE} is replaced by the code read from the QR code. So you can create a URL like the one above (see the links under "More info") and put encrypted data in the QR code, so that people can continue to your website only if they clicked this URL and the QR code data is correct. This way, if the QR code is leaked, they won't know the site. And if they know the site, the code is encrypted.
When people click the link on your website and scan the barcode, zxing opens a new browser window with the URL, with {CODE} filled in with the scanned code.
However, people do need the Barcode Scanner app on Android or iPhone.
More info:
zxing download / homepage
zXing scanning from w
You can't ensure that the URL came from scanning the QR code; that isn't possible. QR codes are just a method of encoding text, and once the user knows the text they can do whatever they want with it.
You can, however, restrict the usefulness of the QR code so even if it is leaked it isn't useful. If possible, I would start by generating the QR codes on-demand with a random seed and have them expire shortly thereafter. This would make it so even if the QR code were leaked, it wouldn't be useful for very long.
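One way to get short-lived codes is sketched below. Note that it swaps the "random seed" idea for a different technique: an expiry timestamp signed with an HMAC, so the server does not need to store each code. QR_SECRET, makeQrToken() and isQrTokenValid() are hypothetical names, and the secret shown is a placeholder.

```php
<?php
// Placeholder secret (assumption); load a long random value from configuration.
const QR_SECRET = 'change-me';

/** Create a token of the form "<expiry>.<signature>" valid for $ttl seconds. */
function makeQrToken(int $ttl, ?int $now = null): string
{
    $expires = ($now ?? time()) + $ttl;
    return $expires . '.' . hash_hmac('sha256', (string) $expires, QR_SECRET);
}

/** Check that the signature matches and the expiry time has not passed. */
function isQrTokenValid(string $token, ?int $now = null): bool
{
    $parts = explode('.', $token, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return false;
    }
    [$expires, $signature] = $parts;
    $expected = hash_hmac('sha256', $expires, QR_SECRET);

    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals($expected, $signature) && (int) $expires >= ($now ?? time());
}
```

The token would go into the URL encoded in the QR code, for example as a query parameter, and the page would call isQrTokenValid() before showing anything. Anyone who copies the URL can still use it until it expires, so this limits the damage of a leak rather than proving that a scan happened.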
About the best you can do is set a query string in your QR code. Something like:
http://www.example.com/myapp
Could be changed to something like:
http://www.example.com/myapp/?qrcode=1
This can then be handled in PHP with:
if(!isset($_GET['qrcode'])) die();
The problem with this, of course, is that anyone with the URL could simply navigate directly to it in their normal web browser.
This isn't something you can prevent, however.
You can also check whether $_SERVER['HTTP_USER_AGENT'] claims to be a mobile phone. Here's another question on the topic.
You could add parameters, but ultimately QR codes are just a method of encoding text, so whatever you encode can be typed into a browser if someone knows what's encoded.
If the QR code triggers a POST request to a web URL, then whatever body you send with it will not be visible unless the user went through the QR scan. So a user who just enters the web URL will not be able to access its contents.
