When I test my website with Facebook's Open Graph Object Debugger, it doesn't like the trailing numbers after the profile page. But I have both of these defined properly:
<meta property="og:url" content="http://www.website.com/profile/139">
<link rel="canonical" href="http://www.website.com/profile/139">
I've tried for hours and it just keeps redirecting to the homepage.
Is there anything I can add to my .htaccess file or PHP header to prevent this 301 redirect?
This may be related to the way Facebook/Google handle URL parameters: http://gohe.ro/1fpOA0N
The answer was a problem with our hosting provider, WP Engine, which tricks spiders into ignoring purely numeric strings at the end of page URLs. This pertains specifically to:
Googlebot (Google's spider)
Slurp! (Yahoo's spider)
BingBot (Bing's spider)
Facebook OG/Debugger
For example, the following URL:
http://www.website.com/profile/12345
will be interpreted by these bots as:
http://www.website.com/profile
However, if the trailing string is non-numeric, the bots will recognize it. This is done for caching purposes. But again, this applies only to WP Engine and a few other hosting providers.
Facebook treats the og:url meta tag as the canonical URL for your page:
<meta property="og:url" content="http://www.yoursite.com/your-canonical-url" />
If your canonical URL redirects, you are in fact creating a loop.
Don't redirect away from your canonical URL.
The canonical URL is the page that spiders should treat as the preferred version.
If a page has a canonical URL tag pointing elsewhere, it means that page is NOT the best/default page but rather a lesser variation of the canonical.
This morning, upon upgrading my Firefox browser to the latest version (from 22 to 23), some of the key aspects of my back office (website) stopped working.
Looking at the Firebug log, the following errors were being reported:
Blocked loading mixed active content "http://code.jquery.com/ui/1.8.10/themes/smoothness/jquery-ui.css"
Blocked loading mixed active content "http://ajax.aspnetcdn.com/ajax/jquery.ui/1.8.10/jquery-ui.min.js"
among other errors caused by the latter of the two above not being loaded.
What does the above mean and how do I resolve it?
I found this blog post which cleared up a few things. To quote the most relevant bit:
Mixed Active Content is now blocked by default in Firefox 23!
What is Mixed Content?
When a user visits a page served over HTTP, their connection is open for eavesdropping and man-in-the-middle (MITM) attacks. When a user visits a page served over HTTPS, their connection with the web server is authenticated and encrypted with SSL and hence safeguarded from eavesdroppers and MITM attacks.
However, if an HTTPS page includes HTTP content, the HTTP portion can be read or modified by attackers, even though the main page is served over HTTPS. When an HTTPS page has HTTP content, we call that content “mixed”. The webpage that the user is visiting is only partially encrypted, since some of the content is retrieved unencrypted over HTTP. The Mixed Content Blocker blocks certain HTTP requests on HTTPS pages.
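Concretely, an https page that pulls in a script over plain http, like the placeholder below, contains mixed active content and will now be blocked:
<!-- On a page served over https://, this http:// script is mixed active content -->
<script src="http://example.com/analytics.js"></script>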
The resolution, in my case, was to simply ensure the jQuery includes were as follows (note the removal of the protocol):
<link rel="stylesheet" href="//code.jquery.com/ui/1.8.10/themes/smoothness/jquery-ui.css" type="text/css">
<script type="text/javascript" src="//ajax.aspnetcdn.com/ajax/jquery.ui/1.8.10/jquery-ui.min.js"></script>
Note that the temporary 'fix' is to click on the 'shield' icon in the top-left corner of the address bar and select 'Disable Protection on This Page', although this is not recommended for obvious reasons.
UPDATE: This link from the Firefox (Mozilla) support pages is also useful in explaining what constitutes mixed content and, as given in the above paragraph, does actually provide details of how to display the page regardless:
Most websites will continue to work normally without any action on your part.
If you need to allow the mixed content to be displayed, you can do that easily:
Click the shield icon in the address bar and choose Disable Protection on This Page from the dropdown menu.
The icon in the address bar will change to an orange warning triangle to remind you that insecure content is being displayed.
To revert the previous action (re-block mixed content), just reload the page.
It means you're calling an http resource from an https page. You can use src="//url.to/script.js" in your script tag and the browser will automatically use the protocol the page was loaded with.
Alternatively, you can use https in your src even if you will be publishing it to an http page. This avoids the potential issue mentioned in the comments.
In the absence of a whitelist feature you have to make the "all" or "nothing" choice. You can disable mixed content blocking completely.
The Nothing Choice
You will need to permanently disable mixed content blocking for the current active profile.
In the "Awesome Bar," type "about:config". If this is your first time you will get the "This might void your warranty!" message.
Yes you will be careful. Yes you promise!
Find security.mixed_content.block_active_content. Set its value to false.
The All Choice
iDevelApp's answer is awesome.
Put the <meta> tag below into the <head> section of your document to force the browser to upgrade insecure connections (http) to secure connections (https). This can solve the mixed content problem if the connection is able to use https.
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
If you want to block mixed content instead, add the tag below inside the <head> tag:
<meta http-equiv="Content-Security-Policy" content="block-all-mixed-content">
The error is given because of security.
To fix it, please use "https", not "http", in the resource URLs.
For example :
"https://code.jquery.com/ui/1.8.10/themes/smoothness/jquery-ui.css"
"https://ajax.aspnetcdn.com/ajax/jquery.ui/1.8.10/jquery-ui.min.js"
On the page that makes a mixed-content call (an https page requesting an http resource that gets blocked), we can add the following entry to the <head> and get rid of the mixed content error.
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
If you are consuming an internal service via AJAX, make sure the url points to https, this cleared up the error for me.
Initial AJAX URL: "http://XXXXXX.com/Core.svc/" + ApiName
Corrected AJAX URL: "https://XXXXXX.com/Core.svc/" + ApiName,
Simply changing HTTP to HTTPS solved this issue for me.
WRONG :
<script src="http://code.jquery.com/jquery-3.5.1.js"></script>
CORRECT :
<script src="https://code.jquery.com/jquery-3.5.1.js"></script>
I had this same problem because I bought a CSS template that grabbed an external JavaScript file through http://whatever.js.com/javascript.js. I opened that URL in my browser, changed it to https://whatever..., and it loaded fine over SSL, so in my HTML script tag I just changed the URL to use https instead of http and it worked.
To force redirection to the https protocol, you can also add this directive to the .htaccess file in the root folder:
RewriteEngine on
RewriteCond %{REQUEST_SCHEME} =http
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
@Blender's comment is the best approach. Never hard-code the protocol anywhere in the code, as it will be difficult to change if you move from http to https, since you would need to manually edit and update all the files.
Using a protocol-relative URL is always better, as it automatically picks up the page's protocol:
src="//code.jquery.com
I've managed to fix this using the following:
For Firefox users
Open a new tab and enter about:config in the address bar to go to the configuration page.
Search for security.mixed_content.block_active_content
Change its value from true to false.
For Chrome users
Click the Not Secure Warning next to the URL
Click Site Settings on the popup box
Change Insecure Content to Allow
Close and refresh the page
I found that if you have issues including something like http://www.example.com in your page, you can fix that by putting //www.example.com instead.
I was facing the same problem when my site went from http to https. We had added a rule to redirect all http requests to https.
You need the redirection rule for requests to your own site, but it does not help with external js/css references; those URLs have to be updated to https themselves.
I just fixed this problem by adding the following code in the header:
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
@if (env('APP_DEBUG'))
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
@endif
This is Laravel Blade syntax. Remember to use it for debugging only, to avoid MITM attacks and eavesdropping.
Switching from http to https for Ajax calls, normal JS scripts, or CSS will also solve the issue.
If your app server is WebLogic, make sure a WLProxySSL ON entry exists (and is not commented out) in the weblogic.conf file in the web server's conf directory, then restart the web server and it will work.
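For reference, a minimal sketch of what that section might look like when using the Apache proxy plug-in (host, port and module name here are illustrative; check your plug-in version's documentation):
<IfModule mod_weblogic.c>
    WebLogicHost backend.example.com
    WebLogicPort 7001
    WLProxySSL ON
</IfModule>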
We're launching a members-only Wordpress site that is only capable of hiding pages/posts; however, part of our content is served up by an API that can't easily be hidden.
My best solution thus far is to embed an html meta redirect to the appropriate URL on a page that I can restrict using our Memberships plugin.
<meta http-equiv="refresh" content="0; url=https://www.example.com/?taxonomy=inventory" />
The only caveat to this method is that the URL is then exposed, and anyone could distribute the source.
Is there any way to use the meta redirect without rewriting the URL? I've tried a few things in the .htaccess file, but nothing has really yielded a viable solution.
Instead of using a meta refresh (which can easily be blocked with the right browser plugin), you should instead focus on adding a page and post hook that checks if the viewer has the appropriate permissions (logged in, member, etc) to view that page and redirect them server-side back to the homepage (or a custom error page).
This may not prevent the hidden page links from being shared, but it will prevent the content from being read.
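A minimal sketch of that idea for a standard WordPress setup (the 'inventory' slug and the homepage redirect target are placeholders, not taken from the question):
add_action( 'template_redirect', function () {
    // 'inventory' is a hypothetical slug for the restricted page.
    if ( is_page( 'inventory' ) && ! is_user_logged_in() ) {
        // Send unauthenticated visitors back to the homepage before any content is rendered.
        wp_safe_redirect( home_url( '/' ) );
        exit;
    }
} );
You would extend the condition with whatever membership check your plugin provides.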
Recently I noticed that if someone shares my website URL via the Facebook sharer, it pulls images from the canonical URL, not from the fetched URL. That is happening because I have added:
<meta property="og:url" content="http://www.mywebsite.com"/>
<meta property="og:description" content="description related to page/images">
but every time it shows my home page's images instead of the fetched URL's images.
My URLs look like this:
canonical URL - http://www.mywebsite.com
fetched URL - http://www.mywebsite.com/tags/car
So I have no idea what to do so that the Facebook sharer always shows the fetched URL's images.
I know similar questions have been asked before, like these:
Is it possible to extract metadata from fetched url instead of canonical url?
Canonical url being linked on Facebook rather than real URL. Dynamic OpenGraph tags coming up empty
but both questions have the same solution:
I need to set up an intermediate URL redirection. I searched about 301 and 302 redirects, but I have no idea how/where to use them.
I need to do this for my WordPress site and another website (which is in Zend Framework).
Please tell me if anyone has done the same.
I will be grateful for any help.
The problem you are facing is that you are sending your home URL as the canonical URL of the shared URL. This is wrong, as a canonical URL has to point to a resource with the same content as the fetched URL. For a definition of canonical URLs, check RFC 6596 or a good description from Google.
Pointing to the index URL of your site, as you do, is not pointing to a canonical (equivalent) URL. By setting that URL you are saying to Facebook "You can go look there, you will find the same content as here. So just take everything from there." But I guess this is not what you intend.
If you still want to point to your index (which imho is abusing the system), then you can also try adding the metadata for an image, which should result in the image you provide being used to represent the link:
<meta property="og:image" content="http://www.mywebsite.com/path/to/image.jpg" />
The reason why you shouldn't point to your index as canonical: if user A wants to share some specific content, user B clicking on the link in Facebook won't find the expected content; instead they'll see the index page and won't know which content user A wanted to share.
The correct way to use the og:url meta tag is to point it to a real canonical (equivalent) URL which shows the same content as the fetched URL. Often such a link is referred to as a permalink. If you can't provide such a URL, just use the fetched URL itself or leave the tag out. Pointing to the index is wrong.
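For the /tags/car page from the question, that would mean something like this (the image path is just a placeholder for illustration):
<meta property="og:url" content="http://www.mywebsite.com/tags/car" />
<meta property="og:image" content="http://www.mywebsite.com/path/to/car-image.jpg" />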
For WordPress there is a plugin which should do this correctly. And the Open Graph protocol is documented here.
Facebook uses the og:url tag to consolidate the like and share count. Whatever URL you mention in og:url is the URL Facebook will share and increment the share count for. Otherwise, your likes and shares would be split among the different URLs pointing at the same content.
<meta property="og:url" content="http://www.mywebsite.com"/>
If you set an og:image tag, the Facebook sharer will pick that image. But make sure the image has the correct dimensions; Facebook checks them as well. I always use 600x315.
Check image sizes here.
<meta property="og:image" content="http://www.mywebsite.com/path/to/image.jpg" />
Once you are done, don't forget to clear the cache:
Put your URL here.
Fetch new scrape information to see the changes.
Meta tags like keywords, title, and description are used by search engines.
og tags are used by Facebook.
And canonical URLs are declared with rel="canonical".
For any URL, whatever data you want Facebook to fetch needs to be set in og tags.
Then debug the URL and fetch the new scrape information to see the changes.
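Side by side, the three kinds of tags might look like this on a single page (all values here are placeholders, not taken from the question):
<title>Cars</title>
<meta name="description" content="Pages tagged with car">
<link rel="canonical" href="http://www.mywebsite.com/tags/car">
<meta property="og:title" content="Cars">
<meta property="og:description" content="Pages tagged with car">
<meta property="og:url" content="http://www.mywebsite.com/tags/car">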
The LinkedIn documentation can be found here.
As it says, it needs:
og:title
og:description
og:image
og:url
Here is an example from my WordPress blog's source code; for simplicity I use the Jetpack plug-in:
<!-- Jetpack Open Graph Tags -->
<meta property="og:type" content="article" />
<meta property="og:title" content="Starbucks Netherlands Intel" />
<meta property="og:url" content="http://lorentzos.com/starbucks-netherlands-intel/" />
<meta property="og:description" content="Today I had some free time at work. I wanted to play more with Foursquare APIs. So the question: "What is the correlation of the Starbucks Chain in the Netherlands?". Methodology: I found all the p..." />
<meta property="og:site_name" content="Dionysis Lorentzos" />
<meta property="og:image" content="http://lorentzos.com/wp-content/uploads/2013/08/starbucks-intel-nl-238x300.png" />
On Facebook it works great, or you can see the metadata here. However, LinkedIn is more stubborn and doesn't really parse the data, even though the documentation says: "If you're unable to set Open Graph tags within the page that's being shared, LinkedIn will attempt to fetch the content automatically by determining the title, description, thumbnail image, etc."
I know that I don't have the og:image:width tag, but LinkedIn doesn't even parse the title, description or url. Any ideas on how to debug it?
I checked my HTML again and found some warnings/errors in the metadata. I fixed them and everything works well now. So the solution, if you encounter the same problem:
Check your HTML again and debug it. Even if the page loads fine in your browser, the LinkedIn parser is not as forgiving of small errors. This tool might help.
My very first suggestion is appending a meaningless query to the URL, so that LinkedIn thinks it's a new link (this doesn't affect anything else) i.e.:
http://example.com/link.php?42 or http://example.com/link.html?refid=LinkedIn
If that doesn't suit your needs, a more drastic measure is in order.
After making sure you don't have any errors in your console and validating your site using:
http://validator.w3.org/...
Add the prefix attribute to every tag (not to the html tag), then sign in to your LinkedIn account again to clear the cache...
prefix="og: http://ogp.me/ns#", i.e.:
<meta prefix="og: http://ogp.me/ns#" property="og:title" content="Title of Page" />
<meta prefix="og: http://ogp.me/ns#" property="og:type" content="article" />
<meta prefix="og: http://ogp.me/ns#" property="og:image" content="http://example.com/image.jpg" />
<meta prefix="og: http://ogp.me/ns#" property="og:url" content="http://example.com/" />
I hope one of these three solutions works for someone. Cheers!
If you're sure you've done everything right (using open graph meta tags, no errors on validator.w3.org) and it still is not working, be sure to try it with a different page, it might be a LinkedIn cache thing.
I had a <h1>Project information</h1> on my page, which LinkedIn used as the title for sharing the page, instead of the <title> or <meta property="og:title" [...]/> tag, even though I did everything right. But when I completely removed this <h1>Project information</h1> from the page source, it kept using 'Project information' as the title even though it wasn't on the page anymore.
After trying a different page, it worked.
I stumbled on the same problem with our WordPress site. The problem is caused by conflicting OGP and oEmbed headers in standard WordPress plus the Yoast SEO / Jetpack plugin.
You need to disable the oEmbed headers with this plugin (this has no side effects): https://wordpress.org/plugins/disable-embeds/
After that you can force a fresh link preview by appending a ?1 as some of you guys already pointed out!
I hope that fixes your problem.
I wrote a detailed explanation for the problem here: https://pmig.at/2017/10/26/linkedin-link-preview-for-wordpress/
LinkedIn caches the URLs, so it's very practical to make sure that this is not your problem before you start debugging.
This tool might come in handy: https://www.linkedin.com/post-inspector/inspect/
Here you can preview your URL and see what it looks like when shared. It refreshes the cache as well, so you can tell whether you have a real problem or it was only the caching.
After a long round of trial and error, I found out that my .htaccess was somehow blocking the LinkedIn robot (WordPress site). For those who use the iThemes Security plugin for WordPress, or another security plugin, make sure that LinkedIn is not blocked.
Make sure there is no line like:
RewriteCond %{HTTP_USER_AGENT} ^Link [NC,OR]
The easiest way to check is to fall back to the default WordPress .htaccess lines, shown below for reference.
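The stock WordPress rewrite block looks roughly like this (double-check against a fresh install):
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress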
As mentioned before, make sure you don't retest pages LinkedIn has already cached.
You can try this only once a week!
I had a link to my site and I wanted to customize the image Linkedin displayed. So I added open graph tags which didn't seem to render at all. Until I read this:
The first time that LinkedIn's crawlers visit a webpage when asked to share content via a URL, the data it finds (Open Graph values or our own analysis) will be cached for a period of approximately 7 days.
This means that if you subsequently change the article's description, upload a new image, fix a typo in the title, etc., you will not see the change represented during any subsequent attempts to share the page until the cache has expired and the crawler is forced to revisit the page to retrieve fresh content.
https://developer.linkedin.com/docs/share-on-linkedin
The solution for me was to add a hashbang. I am on an Ajax-style application which doesn't render the whole page, and I think LinkedIn has a bit of a hissy fit about the text/image not being on the page on the initial scrape. Adding
%23!
to the end of my encoded url or
#!
to the unencoded URL before sending it off to LinkedIn seemed to do the trick nicely for my share button popup. Not sure if this applies only to Ajax/JS apps, but it certainly saved a couple of hours of effort for me.
I guess this is only useful if your application is set up to handle the _escaped_fragment_ in the URL and render a static page rather than a dynamic one, but I can't test this theory right now.
This was happening on one of my client's sites as well. I discovered that the .htaccess file was blocking LinkedIn from the site whenever the user agent contained the string "jakarta".
As soon as I removed this filtering, LinkedIn was able to access all of the required Open Graph (og) information when the client posted a link.
True, the documentation states that you can have: title, url, description, and image. But in reality, you have two options. Pick one of the two following sets and use it, as you have no other choice...
Set 1 Options
og:title
og:url
og:image
Set 2 Options
og:title
og:url
og:description
That is the reason why og:description is mysteriously missing from preview links. But if you drop image, then your description will finally display.
Try it: Wikipedia has an og description but no og image, while GitHub has both. Share Wikipedia and share GitHub. It clearly seems you get either the description or the image displayed, not both. I have spent weeks struggling with LinkedIn Support to correct this, but to no avail.
I would like to hide some content from the public (like Google cached pages). Is that possible?
Add the following HTML tag in the <head> section of your web pages to prevent Google from showing the Cached link for a page.
<META NAME="ROBOTS" CONTENT="noarchive">
Check out Google webmaster central | Meta tags to see what other meta tags Google understands.
Option 1: Disable 'Show Cached Site' Link In Google Search Results
If you want to prevent Google from archiving your site, add the following meta tag to your <head> section:
<meta name="robots" content="noarchive">
If your site is already cached by Google, you can request its removal using Google's URL removal tool. For more instructions on how to use this tool, see "Remove a page or site from Google's search results" at Google Webmaster Central.
Option 2: Remove Site From Google Index Completely
Warning! The following method will remove your site from Google index completely. Use it only if you don't want your site to show up in Google results.
To prevent ("protect") your site from getting to Google's cache, you can use robots.txt. For instructions on how to use this file, see "Block or remove pages using a robots.txt file".
In principle, you need to create a file named robots.txt and serve it from your site's root folder (/robots.txt). Sample file content:
User-agent: *
Disallow: /folder1/
User-Agent: Googlebot
Disallow: /folder2/
In addition, consider setting the robots meta tag in your HTML document to noindex ("Using meta tags to block access to your site"):
To prevent all robots from indexing your site, set <meta name="robots" content="noindex">
To selectively block only Google, set <meta name="googlebot" content="noindex">
Finally, make sure that your settings really work, for instance with Google Webmaster Tools.
robots.txt: http://www.robotstxt.org/
You can use a robots.txt file to request that your page is not indexed. Google and other reputable services will adhere to this, but not all do.
The only way to make sure that your site content isn't indexed or cached by any search engine or similar service is to prevent access to the site unless the user has a password.
This is most easily achieved using HTTP Basic Auth. If you're using the Apache web server, there are lots of tutorials (example) on how to configure this. A good search term to use is htpasswd.
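As a rough sketch (paths and realm name are placeholders), the Apache configuration for a protected directory could look like this, with the password file created beforehand via htpasswd:
# Create the password file first, e.g.: htpasswd -c /path/to/.htpasswd someuser
AuthType Basic
AuthName "Members only"
AuthUserFile /path/to/.htpasswd
Require valid-user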
A simple way to do this would be with a <meta name="robots" content="noarchive"/>
You can also achieve a similar effect with the robots.txt file.
For a good explanation, see the official Google blog post on the robots exclusion protocol.
I would like to hide some content from public....
Use a login system to view the content.
...(like google cached pages).
Configure robots.txt to deny Google bot.
If you want to limit who can see content, secure it behind some form of authentication mechanism (e.g. password protection, even if it is just HTTP Basic Auth).
The specifics of how to implement that would depend on the options provided by your server.
You can also add this HTTP Header on your response, instead of needing to update the html files:
X-Robots-Tag: noarchive
eg for Apache:
Header set X-Robots-Tag "noarchive"
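Or, if you set response headers from application code instead of the web server, a minimal PHP sketch:
<?php
// Send the same directive as an HTTP response header.
header('X-Robots-Tag: noarchive');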
See also: https://developers.google.com/search/reference/robots_meta_tag?csw=1