Digg button rejects encoded url - php

I wrote a PHP site (it's still a prototype) and placed a Digg button on it. It was easy, but...
The official manual says: "the URL has to be encoded". I did that with urlencode(). After urlencode, my URL looks like this:
http%3A%2F%2Fwww.mysite.com%2Fen%2Fredirect.php%3Fl%3Dhttp%3A%2F%2Fwww.othersite.rs%2FNews%2FWorld%2F227040%2FRusia-Airplane-crashed%26N%3DRusia%3A+Airplane+crashed
So far it's good, but when I want to submit that URL to Digg, it is recognized as an invalid URL:
http://www.mysite.com/en/redirect.php?l=http://www.othersite.rs/News/World/227040/Rusia-Airplane-crashed&N=Rusia:+Airplane crashed
If I place a "+" between "Airplane" and "crashed" (at the end of the link), then Digg recognizes it without any problems!
Please help, this bizarre problem is killing my brain cells!
P.S. For the purpose of this question, the URLs have been changed (to nonexistent ones) because the originals involve non-English sites.

After you've urlencode()ed it, encode the resulting plus signs as well:
$encoded_url = urlencode($original_url);
$final_url = str_replace('+', '%2B', $encoded_url);
Or alternatively, you could replace spaces in your URL with + first, and then urlencode() the result:
$spaceless_url = str_replace(' ', '+', $original_url);
$final_url = urlencode($spaceless_url);
If your own site required the parameters in the query string to be encoded in the first place, you wouldn't have the issue (since there wouldn't be an unencoded space in the original URL).


Slugs for SEO using PHP - Appending name to end of URL

Something I have noticed on the StackOverflow website:
If you visit the URL of a question on StackOverflow.com:
"https://stackoverflow.com/questions/10721603"
The website adds the name of the question to the end of the URL, so it turns into:
"https://stackoverflow.com/questions/10721603/grid-background-image-using-imagebrush"
This is great, I understand that this makes the URL more meaningful and is probably good as a technique for SEO.
What I wanted to Achieve after seeing this Implementation on StackOverflow
I wish to implement the same thing with my website. I am happy using a header() 301 redirect in order to achieve this, but I am attempting to come up with a tight script that will do the trick.
My Code so Far
// Set the title of the page article (This could be from the database). Trimming any spaces either side
$original_name = trim(' How to get file creation & modification date/times in Python with-dash?');
// Replace any characters that are not A-Za-z0-9 or a dash with a space
$replace_strange_characters = preg_replace('/[^\da-z-]/i', " ", $original_name);
// Replace any spaces (or multiple spaces) with a single dash to make it URL friendly
$replace_spaces = preg_replace("/([ ]{1,})/", "-", $replace_strange_characters);
// Remove any leading or trailing dashes (and any run of two or more dashes)
$removed_dashes = preg_replace("/^([\-]{0,})|([\-]{2,})|([\-]{0,})$/", "", $replace_spaces);
// Show the finished name on the screen
print_r($removed_dashes);
The Problem
I have created this code and it works fine by the looks of things: it makes the string URL friendly and readable to the human eye. However, I would like to see if it is possible to simplify it or "tighten it up" a bit, as I feel my code is probably overcomplicated.
It is not so much that I want it put onto one line, because I could do that by nesting the functions into one another, but I feel that there might be an overall simpler way of achieving it - I am looking for ideas.
In summary, the code achieves the following:
Removes any "strange" characters and replaces them with a space
Replaces any spaces with a dash to make it URL friendly
Returns a string without any spaces, with words separated with dashes and has no trailing spaces or dashes
String is readable (doesn't contain percentage signs and + symbols, as it would after simply using urlencode())
Thanks for your help!
Potential Solutions
I found out whilst writing this question that I am looking for what is known as a URL 'slug', and they are indeed useful for SEO.
I found this library on Google code which appears to work well in the first instance.
There is also a notable question on this on SO which can be found here, which has other examples.
I tried to play with preg like you did. However it gets more and more complicated when you start looking at foreign languages.
What I ended up doing was simply trimming the title and using urlencode:
$url_slug = urlencode($title);
I also had to add these:
$title = str_replace('/','',$title); //Apache doesn't like this character even encoded
$title = str_replace('\\','',$title); //Apache doesn't like this character even encoded
There are also 3rd party libraries such as: http://cubiq.org/the-perfect-php-clean-url-generator
Indeed, you can do that:
$original_name = ' How to get file creation & modification date/times in Python with-dash?';
$result = preg_replace('~[^a-z0-9]++~i', '-', $original_name);
$result = trim($result, '-');
To deal with other alphabets you can use this pattern instead:
~\P{Xan}++~u
or
~[^\pL\pN]++~u
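For comparison, the same collapse-and-trim idea can be sketched in Python. This is only an illustration, assuming ASCII-only titles; the function name slugify is made up here and does not come from any of the libraries mentioned above.

```python
import re

def slugify(title: str) -> str:
    """Hypothetical helper: turn a title into a dash-separated slug."""
    # Collapse every run of characters that are not ASCII letters or digits
    # into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", title, flags=re.IGNORECASE)
    # Drop the dashes left behind by leading/trailing spaces or punctuation.
    return slug.strip("-")

title = " How to get file creation & modification date/times in Python with-dash?"
print(slugify(title))
# How-to-get-file-creation-modification-date-times-in-Python-with-dash
```

Because the whole run of unwanted characters is replaced in one step, no separate pass for double dashes is needed.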

Python 3 normalize URL

Alright, so apparently python 3 is pretty ridiculous when it comes to urllib.
So, I have a URL formatted like so,
http_request = "http://localhost/system/index.php/index_file/store?cid={0}&cname={1}&fname={2}&fdir='{3}'"\
.format(client_id, client_name, each[1], each[2])
where each[1] and each[2] are the file names and file directories, respectively.
So a generated result of http_request through print() would give something like this,
http://localhost/system/index.php/index_file/store? \
cid=90823&cname=John Smith&fname=Sample Document.doc& \
fdir='C:\Users\williamyang\Desktop\Files\90823 Michelle Moore\Sample Document.doc'
(The purpose of the lone backslash is just so it fits here better. The actual code doesn't have lone backslashes at the end of each line.)
And that was perfectly fine if I enter that URL into a browser. The PHP app received all the indices through $_GET, then off to MySQL, no problems.
But if I let Python do it,
PHP tells me the indices $_GET['fname'] and $_GET['fdir'] do not exist!!! What madness. Okay, then,
I tried everything from urllib.parse, urllib encoding and decoding, http_request.replace('\\', '/'), and many others.
None of which worked.
I was once told by my prof python does funny things when it comes to character encoding.
here is how I send my URL, before all the crazy and useless urllib parse experiments
import urllib.request

def getResponseCode(url):
    conn = urllib.request.urlopen(url)
    return conn.read()
Where url = http_request
How can I go about solving this?
PHP says $_GET['fname'] and $_GET['fdir'] Does not exist
But when I paste the auto-generated http_request into a browser,
Everything is fine
URLs are not supposed to contain spaces. Your browser will automatically percent-encode URLs, replacing characters that shouldn't be in a URL with something like %20 or +, following the rules of URL escaping. Python won't do this automatically; most likely, the convenience introduces ambiguities that matter for programming but don't bother the average web user. The Python tools for URL escaping are urllib.quote and urllib.quote_plus (in Python 3 they live in urllib.parse); you probably want quote_plus. Pass each query-string value through quote_plus before sticking it onto the rest of the URL, and you should be good to go.
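In Python 3 the quoting helpers are in urllib.parse, and urllib.parse.urlencode applies quote_plus to every value for you. A minimal sketch, using sample values modelled on the question; dropping the single quotes around fdir is an assumption about what the PHP side expects.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = "http://localhost/system/index.php/index_file/store"

# Sample values modelled on the question; the real script would take them
# from client_id, client_name, each[1] and each[2].
params = {
    "cid": 90823,
    "cname": "John Smith",
    "fname": "Sample Document.doc",
    "fdir": r"C:\Users\williamyang\Desktop\Files\90823 Michelle Moore\Sample Document.doc",
}

# urlencode escapes spaces, backslashes, colons and so on in every value.
http_request = base + "?" + urlencode(params)
print(http_request)

# Decoding the query string again gives back the original values,
# which is roughly what PHP does when it fills $_GET.
decoded = parse_qs(urlsplit(http_request).query)
print(decoded["fname"][0])
```

The resulting string contains no raw spaces, so it can be passed to urllib.request.urlopen as it is.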
Solution for python 2:
How can I normalize a URL in python
Solution for python 3:
My wonky solution:
Right after reading the directories from os.walk(), do var.replace(" ", "_"),
and on the PHP end:
$var = str_replace('_', ' ', $_GET['var']);

Using ampersand in pretty URL breaks URL

I have seen plenty of people with this problem, and it seems the only way to stop Apache treating an encoded ampersand as a URL ampersand is to use the mod_rewrite B flag: RewriteRule ^(.*)$ index.php?path=$1 [L,QSA,B].
However, this isn't available in earlier versions of Apache and has to be installed, which some hosting companies don't support.
I have found a solution that works well for us. We have a url of /search/results/Takeaway+Foods/Inverchorachan,+Argyll+&+Bute+
This obviously breaks the url at & giving us /search/results/Takeaway+Foods/Inverchorachan,+Argyll which then gives a 404 error as there is no such page.
The URL is held in the $_GET['url'] array. If it finds an &, then it splits the array at each ampersand.
The following code pieces the URL back together by traversing the $_GET array for each piece.
I would like to know if this has any hidden problems that I may not be aware of.
The code:
$newurl = "";
foreach($_GET as $key=>$pcs) {
    if($newurl=="")
        $newurl = $pcs;
    else
        $newurl .= "& ".rtrim($key,"_");
}
//echo $newurl;exit;
if($newurl!='') $url=$newurl;
I am trimming the underscore from the piece as apache added this. Not sure why but any help on this would be great.
You said in a comment:
We want the URL to show the ampersand so substituting with other characters is not an option.
Short answer: Don't do it.
Seriously, don't use ampersands this way in URLs, even if it looks pretty. Ampersands have a special meaning in a URL, and trying to override that meaning because it looks nice is a very bad idea.
Most web-based software (including Apache, PHP and all browsers) makes assumptions about what an ampersand means in a URL, which you will find very hard to work around.
In particular, you will utterly confuse Google and other search engines if you've got arbitrary ampersands in the URL, so it will completely destroy your SEO rank.
If you must have an ampersand in the string, use urlencoding to turn it into a URL-friendly %26. This won't look good in the user's URL string, but it will work as intended.
If that's not acceptable, then substitute something different for ampersands: maybe the word "and", or a character like an underscore, or perhaps just remove it from the string without a replacement.
All of these are common practice. Trying to force the URL to have an actual ampersand character in it is not common practice, and for very good reason.
Take a look at urlencode.
You can also replace the "&" character with something that doesn't break the URI and won't be interpreted by Apache, like the "|" character.
We have had this fix in place for two weeks now so I believe that this has solved the issue. I hope this will help someone with a similar issue as I searched for weeks for a solution outside of an apache upgrade to include the B flag. Our users can now type in Bed & Breakfast and we can then serve the appropriate page.
Here is the fix in PHP.
$newurl = "";
foreach($_GET as $key=>$pcs)
{
    if($newurl=="")
        $newurl = $pcs;
    else
        $newurl .= "& ".rtrim($key,"_");
}
if($newurl!='') $url=$newurl;

Escaping white space : %20 or + - smarty

I am using Smarty as a template engine. I have to escape an image file path with {$filepath|urlencode}. The problem is that white space is converted into a '+', which prevents the image from being reached on the server; %20 would work. How do I escape my path correctly?
Edit: more precisely, I use the Facebook share link, and it doesn't display the image when shared.
The final code for my specific usage looks like this:
<a href="http://www.facebook.com/dialog/feed?app_id=...&link=http%3A%2F%2Fmysite.org%2Findex.php%3Fpage%3Dcampaign%26campaign_id%3D18&picture=http%3A%2F%2Fmysite.org%2Ffiles%2Fcampaign%2Fimage%2Foriginals%2F18%2FSans+titre-3.jpg&name=Some text "Text d'Text", Text&description=Rejoignez%20la%20campagne%21&redirect_uri=http%3A%2F%2Fmysite.org%2Findex.php%3Fpage%3Dcampaign%26campaign_id%3D18"onclick="window.open(this.href);return false;">
On the same site, all the other Facebook share links work perfectly and the image displays fine, which is why I think the link to that specific image is what's not working.
escape is what you're searching for. Take a look at:
http://www.smarty.net/docsv2/en/language.modifier.escape.tpl
{$filepath|escape:"url"}
urlencode is used to encode (not escape!) a string to be used as a query part inside a URL, passed as a GET variable: http://php.net/manual/en/function.urlencode.php
URL encoded space is either a plus sign or %20. They are equivalent, and are both interpreted as a space on the server.
If you see either in the URL, then the server will see a space.
You say that the plus sign is preventing the image from being loaded. This sounds like a deeper problem than simply using the wrong encoding. Possibly it's being double-encoded?
What is the actual URL being requested in the browser? Open the dev tools/Firebug, and look at the requests to find out. If the URL includes %2B then the plus sign is being double-encoded. This is the problem you need to solve.
The other solution, of course, is not to use spaces in filenames on the web. The only reason one would want spaces in filenames is for readability, but since the web requires spaces to be urlencoded, it removes that readability anyway. Take away the spaces, and the problem will go away by itself.
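One detail worth checking is whether the space ends up in the path or in the query string of the URL. The plus-for-space convention comes from form encoding and applies to query strings; in a path, a + is just a literal plus sign. The sketch below uses Python's urllib.parse only to illustrate the two conventions (it is not what Smarty or PHP run internally), with the file name taken from the question.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

filename = "Sans titre-3.jpg"

# Path-style escaping: a space becomes %20.
path_form = quote(filename)
# Form/query-style escaping: a space becomes +.
query_form = quote_plus(filename)
print(path_form)   # Sans%20titre-3.jpg
print(query_form)  # Sans+titre-3.jpg

# Path-style decoding leaves the + alone, so the name no longer matches.
print(unquote(query_form))       # Sans+titre-3.jpg
# Query-style decoding turns the + back into a space.
print(unquote_plus(query_form))  # Sans titre-3.jpg
```

If the image URL is requested with the + inside its path, the server would look for a file literally named "Sans+titre-3.jpg", which would explain why %20 works and + does not.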

PHP - Is $_GET still functional with spaces in the URL?

I've got this simple question... If you've got spaces in the URL for $_GET, let's say
"page=about us", and you then use this code: IF($_GET['page']=='about us'), would that work? Or do you have to use IF($_GET['page']=='about%20us')?
Thanks for your time!
Your browser encodes the literal space in a URL (as a + sign or as %20) before sending the HTTP request. When PHP sees that encoding, it turns it back into a space for $_GET.
So, yes, it will work when comparing against == "about us". Don't compare against the %20. (That's an equally valid encoding of a space in URLs, and PHP decodes it too.)
Side note: it's best not to rely on browser magic. When outputting the link, embed the + in place of spaces yourself.
Look at urldecode.
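The decoding step can be tried out without a web server. This is a small sketch with Python's urllib.parse.parse_qs, which follows the same form-encoding convention for query strings; it stands in for PHP's $_GET here as an illustration only, and the helper name page_value is made up.

```python
from urllib.parse import parse_qs

def page_value(query: str) -> str:
    """Hypothetical helper: return the decoded 'page' parameter."""
    return parse_qs(query)["page"][0]

# Both encodings of the space decode to the same value.
for query in ("page=about+us", "page=about%20us"):
    print(query, "->", repr(page_value(query)))

# A literal plus sign has to be sent as %2B to survive decoding.
print(repr(page_value("page=about%2Bus")))
```

In both cases the application code only ever sees the decoded value, so the comparison should be made against "about us".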
