Problem with Cyrillic characters in friendly URL - php

Here's the thing. I have friendly URLs like
http://site.com/blog/read/мъдростта-на-вековете
http://site.com/blog/read/green-apple
The last segment is the friendly title of the blog article. The problem is that when I try to pass that segment to the database, the Cyrillic characters turn into something like %D1%8A%D0%B4%D1%80%D0%BE%D1%81%D1%8 and can't be matched against the database record. In my browser's address bar the URL looks normal (мъдростта-на-вековете), but if I choose 'copy url location' the last segment again turns into these strange characters. I'm using CodeIgniter and everything is set to UTF-8.
Please help! :(

The text is just being percent-encoded to fit the specification for URLs: each byte of the UTF-8 text is written as %XX.
Echo out the data to a log to see what you are actually trying to pass to the database.
You should be able to decode it with urldecode.
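A minimal sketch of that decoding step, using the slug from the question. The variable names are illustrative and the file is assumed to be saved as UTF-8; whether the framework hands you the segment encoded or already decoded depends on your setup, so log the raw value first as suggested above.

```php
<?php
// The slug as it appears in the address bar.
$slug = 'мъдростта-на-вековете';

// What actually travels over the wire: each UTF-8 byte written as %XX.
$encoded = rawurlencode($slug);

// Decoding restores the original UTF-8 string, which can then be
// compared against the database record.
$decoded = urldecode($encoded);

var_dump($decoded === $slug); // bool(true)
```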

The above answers are OK, but they aren't enough if you want to use routing with Cyrillic. For example, if you have http://site.com/блог/статия/мъдростта-на-вековете you will have to do something like this:
In config/routes.php:
$route['блог/статия/(:any)'] = "blog/article/$1";
In system/core/URI.php, in the function _explode_segments(), change
$val = trim($this->_filter_uri($val));
to
$val = urldecode(trim($this->_filter_uri($val)));
This solves the problem above and also applies to the controller and function segments.

Actually, Firefox is cheating you here: the URL really is URL-encoded, but it is displayed as if it weren't. So when you copy-paste it, or read it on the server, you get the encoded form.
(Not sure if other browsers behave in the same way.)

Related

PHP - Getting numbers when reading Korean characters

I have the following URL with some Korean characters at the end:
http://localhost/example/popular-tags/고양이
Now I am reading 고양이 from the URL like this, which is how data is read from the URL string in Laravel 5:
$TagName = str_slug(Request::segment(2), "-");
but I am getting the number eab3a0ec9691ec9db4 instead of the 고양이 characters.
Any idea how to get the Korean characters?
Thanks.
According to this, str_slug translates to ASCII.
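For what it's worth, the number in the question looks like the hexadecimal form of the UTF-8 bytes of 고양이 rather than random digits. The sketch below only checks that observation with plain PHP functions; it does not reproduce what str_slug does internally.

```php
<?php
$tag = '고양이';

// bin2hex() prints every byte of the string as two hex digits.
$hex = bin2hex($tag);
echo $hex, PHP_EOL; // eab3a0ec9691ec9db4

// hex2bin() reverses it, recovering the original characters.
echo hex2bin($hex), PHP_EOL; // 고양이
```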
You can't use Laravel's helpers to work with UTF-8 slugs. I've tried it and it has really big problems (in routes, helpers, etc.). I even wrote my own package to work with UTF-8. You can try it: if you want URLs like http://localhost/example/popular-tags/고양이 and you don't use characters like !#$# in the URL, it will work for you. Or you can copy some of its slug creation code and use it in your own project.
Also, you can simplify things and just use:
Route::get('example/popular-tags/{slug}', ...);
And in the controller:
public function index($slug)
{
    $data = Model::where('slug', $slug)->get();
    // ...
}

How to fix funny characters coming from twitter API

I'm using Twitter's REST API 1.1 and, on odd occasions (usually when there is a URL embedded in the tweet), the text comes through with funny characters, e.g.
#MyHandle_123 RT #ThinkAfricaFeed: Controversy & acrimony may surround Nigeria's country's federalist system but it may be the country's best option: htt…
I tried calling utf8_decode() but it still renders funny characters in my browser.
Any ideas on how I can get the returned values to show correctly?
I was running into a similar problem. Since you tried utf8_decode and it didn't work, try this:
htmlentities($td->text, ENT_NOQUOTES, 'UTF-8');
where $td is the object whose text property is being referenced.
Hope that helps
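A minimal sketch of what that call does, assuming the tweet text is valid UTF-8. The $td object here is a stand-in built by hand for illustration, not real API output.

```php
<?php
// Stand-in for a decoded tweet object (hypothetical data).
$td = new stdClass();
$td->text = "Controversy & acrimony may surround Nigeria's system: htt…";

// Convert every character that has a named HTML entity into that entity,
// telling PHP the input is UTF-8. ENT_NOQUOTES leaves ' and " untouched.
$safe = htmlentities($td->text, ENT_NOQUOTES, 'UTF-8');

echo $safe, PHP_EOL;
```

For this input the result contains only ASCII characters, so it renders the same way regardless of which charset the browser assumes for the page.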

Slugs for SEO using PHP - Appending name to end of URL

Something I have noticed on the StackOverflow website:
If you visit the URL of a question on StackOverflow.com:
"https://stackoverflow.com/questions/10721603"
The website adds the name of the question to the end of the URL, so it turns into:
"https://stackoverflow.com/questions/10721603/grid-background-image-using-imagebrush"
This is great, I understand that this makes the URL more meaningful and is probably good as a technique for SEO.
What I wanted to Achieve after seeing this Implementation on StackOverflow
I wish to implement the same thing with my website. I am happy using a header() 301 redirect in order to achieve this, but I am attempting to come up with a tight script that will do the trick.
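The redirect step itself could be sketched as follows. The helper name canonical_redirect and the /questions/{id}/{slug} path layout are assumptions made for illustration; the header() call is left as a comment so the snippet can run outside a web request.

```php
<?php
// Hypothetical helper: returns the canonical path to redirect to,
// or null when the requested path is already canonical.
function canonical_redirect(string $requestPath, int $id, string $slug): ?string
{
    $canonical = "/questions/{$id}/{$slug}";
    return $requestPath === $canonical ? null : $canonical;
}

$slug   = 'grid-background-image-using-imagebrush';
$target = canonical_redirect('/questions/10721603', 10721603, $slug);

if ($target !== null) {
    // In a real request handler you would send the redirect and stop:
    // header('Location: ' . $target, true, 301);
    // exit;
    echo 'Redirect to ', $target, PHP_EOL;
}
```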
My Code so Far
// Set the title of the page article (This could be from the database). Trimming any spaces either side
$original_name = trim(' How to get file creation & modification date/times in Python with-dash?');
// Replace any characters that are not A-Za-z0-9 or a dash with a space
$replace_strange_characters = preg_replace('/[^\da-z-]/i', " ", $original_name);
// Replace any spaces (or multiple spaces) with a single dash to make it URL friendly
$replace_spaces = preg_replace("/([ ]{1,})/", "-", $replace_strange_characters);
// Remove leading and trailing dashes (and any run of two or more dashes)
$removed_dashes = preg_replace("/^([\-]{0,})|([\-]{2,})|([\-]{0,})$/", "", $replace_spaces);
// Show the finished name on the screen
print_r($removed_dashes);
The Problem
I have created this code and it works fine by the looks of things: it makes the string URL-friendly and readable to the human eye. However, I would like to see if it is possible to simplify it or "tighten it up" a bit, as I feel my code is probably overcomplicated.
It is not so much that I want it put onto one line, because I could do that by nesting the functions into one another, but I feel that there might be an overall simpler way of achieving it - I am looking for ideas.
In summary, the code achieves the following:
Removes any "strange" characters and replaces them with a space
Replaces any spaces with a dash to make it URL friendly
Returns a string without any spaces, with words separated with dashes and has no trailing spaces or dashes
String is readable (doesn't contain percentage signs and + symbols, as it would when simply using urlencode())
Thanks for your help!
Potential Solutions
I found out whilst writing this article that I am looking for what is known as a URL 'slug', and they are indeed useful for SEO.
I found this library on Google code which appears to work well in the first instance.
There is also a notable question on this on SO which can be found here, which has other examples.
I tried to play with preg like you did. However it gets more and more complicated when you start looking at foreign languages.
What I ended up doing was simply trimming the title, and using urlencode
$url_slug = urlencode($title);
Also, I had to add these (before the urlencode call):
$title = str_replace('/','',$title); //Apache doesn't like this character even encoded
$title = str_replace('\\','',$title); //Apache doesn't like this character even encoded
There are also 3rd party libraries such as: http://cubiq.org/the-perfect-php-clean-url-generator
Indeed, you can do that:
$original_name = ' How to get file creation & modification date/times in Python with-dash?';
$result = preg_replace('~[^a-z0-9]++~i', '-', $original_name);
$result = trim($result, '-');
To deal with other alphabets you can use this pattern instead:
~\P{Xan}++~u
or
~[^\pL\pN]++~u
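Put together, the two steps with the Unicode-aware pattern could look like the sketch below. The function name slugify is an assumption for illustration, and the input is expected to be valid UTF-8.

```php
<?php
// Hypothetical helper combining the replace and trim steps above with the
// Unicode-aware pattern, so letters and digits of any script survive.
function slugify(string $title): string
{
    // Replace every run of characters that are not letters (\pL)
    // or numbers (\pN) with a single dash. preg_replace() returns
    // null on malformed UTF-8, hence the fallback to an empty string.
    $slug = preg_replace('~[^\pL\pN]++~u', '-', $title) ?? '';

    // Drop the dashes left at either end.
    return trim($slug, '-');
}

echo slugify(' How to get file creation & modification date/times in Python with-dash?'), PHP_EOL;
// How-to-get-file-creation-modification-date-times-in-Python-with-dash
echo slugify('мъдростта на вековете!'), PHP_EOL;
// мъдростта-на-вековете
```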

Escaping white space : %20 or + - smarty

I am using Smarty as a template engine. I have to escape an image file path with {$filepath|urlencode}. The problem is that white space is converted into a '+', which prevents the image from being reached on the server; %20 would work. How do I escape my path correctly?
Edit: more precisely, I am building a Facebook share link, and the image is not displayed when the link is shared.
For my specific usage, the final code looks like this:
<a href="http://www.facebook.com/dialog/feed?app_id=...&link=http%3A%2F%2Fmysite.org%2Findex.php%3Fpage%3Dcampaign%26campaign_id%3D18&picture=http%3A%2F%2Fmysite.org%2Ffiles%2Fcampaign%2Fimage%2Foriginals%2F18%2FSans+titre-3.jpg&name=Some text "Text d'Text", Text&description=Rejoignez%20la%20campagne%21&redirect_uri=http%3A%2F%2Fmysite.org%2Findex.php%3Fpage%3Dcampaign%26campaign_id%3D18"onclick="window.open(this.href);return false;">
On the same site, all the other Facebook share links work perfectly and the image displays well! That is why I thought the link to that specific image was what wasn't working.
escape is what you're searching for. Take a look at:
http://www.smarty.net/docsv2/en/language.modifier.escape.tpl
{$filepath|escape:"url"}
urlencode is used to encode (not escape!) a string for use in the query part of a URL, passed as a GET variable: http://php.net/manual/en/function.urlencode.php
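In a query string, a URL-encoded space is either a plus sign or %20. They are equivalent there, and both are interpreted as a space on the server. In the path part of a URL, however, only %20 means a space; a plus sign there is a literal plus.
If you see either in the query string, then the server will see a space.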
You say that the plus sign is preventing the image from being loaded. This sounds like a deeper problem than simply using the wrong encoding. Possibly it's being double-encoded?
What is the actual URL being requested in the browser? Open the dev tools/Firebug, and look at the requests to find out. If the URL includes %2B then the plus sign is being double-encoded. This is the problem you need to solve.
The other solution, of course, is not to use spaces in filenames on the web. The only reason one would want spaces in filenames is for readability, but since the web requires spaces to be urlencoded, it removes that readability anyway. Take away the spaces, and the problem will go away by itself.
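In plain PHP, the difference between the two space encodings comes down to urlencode() versus rawurlencode(). A short sketch with the file name from the question, which also shows the double-encoding symptom to look for:

```php
<?php
$file = 'Sans titre-3.jpg';

echo urlencode($file), PHP_EOL;    // Sans+titre-3.jpg
echo rawurlencode($file), PHP_EOL; // Sans%20titre-3.jpg

// Encoding an already-encoded value turns the plus sign into %2B,
// which is the double-encoding symptom described above.
echo urlencode(urlencode($file)), PHP_EOL; // Sans%2Btitre-3.jpg
```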

What to do with a community URL style like Last.FM or Wikipedia?

I'm trying to understand how I should work with characters in URLs, because I'm building a site where users can store content and go to the content's page by typing its name in the URL.
So, something like the Wikipedia or Last.FM websites.
I see that on those sites a user can write something like http://it.wikipedia.org/wiki/Trentemøller and the artist's page can be reached.
After the page is loaded, if I copy the URL I see it written as http://it.wikipedia.org/wiki/Trentemøller, but if I paste it into a text editor, it is pasted as
http://it.wikipedia.org/wiki/Trentem%C3%B8ller
so the char ø is pasted as %C3%B8.
Of course, the same goes for URLs like these (the page of the artist Takeshi Kobayashi):
http://www.last.fm/music/小林武史
http://www.last.fm/music/%E5%B0%8F%E6%9E%97%E6%AD%A6%E5%8F%B2
If I type the first or the second, the page works in either case. Why?
I think I should do something with .htaccess and mod_rewrite, but I'm not sure. Are the special chars automatically converted to the URL special chars?
And then, how can I make PHP run the right query with the content name?
if I have a table like
table_users
- username
- age
- height
- weight
- sex
- email
- country
With mod_rewrite I'm able to handle an address like http://mysite.com/user/bob and get the username bob from table_users, but what about http://mysite.com/user/小林武史?
Here is a simple example of what I have in mind:
#.htaccess
RewriteEngine On
RewriteRule ^(user/)([a-zA-Z0-9_+-]+)([/]?)$ user.php?username=$2
<?php
// this is the page user.php
// this is the way I use to get the url value
print $_REQUEST["username"];
?>
This works, but it's limited to [a-zA-Z0-9_+-]. How can I be more compatible with all characters, like the other sites, without losing too much security?
Does anyone know a way to avoid trouble?
Try urlencode and urldecode.
Edit:
Here is a visual description of URL encoding and decoding:
http://blog.neraliu.com/wp-content/uploads/2009/10/url-encoding.png
Most browsers urlencode() 小林武史 to %E5%B0%8F%E6%9E%97%E6%AD%A6%E5%8F%B2.
Regarding your .htaccess mod_rewrite rules, have you considered using something like:
RewriteEngine On
RewriteRule ^(user/)(.+?)[/]?$ user.php?username=$2
As far as I understand, every URL with non-ASCII characters is mapped to a unique ASCII-based URL. This is actually a client-side feature. Please look at http://kmeleon.sourceforge.net/bugs/viewbug.php?bugid=631 to see examples and links to the RFCs covering this.
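On the PHP side, one way to accept names in any script without opening the pattern up completely is to validate the value by Unicode character class. This is a sketch under assumptions: clean_username is a hypothetical helper, and the allowed set (letters, digits, _ + -) simply extends the one from the question.

```php
<?php
// Hypothetical validator: accepts letters and digits from any script
// plus _ + -, and rejects everything else (including invalid UTF-8).
function clean_username(string $name): ?string
{
    // With the /u flag preg_match() fails on malformed UTF-8 input,
    // so anything other than 1 is treated as a rejection.
    if (preg_match('~\A[\pL\pN_+-]+\z~u', $name) !== 1) {
        return null;
    }
    return $name;
}

// PHP percent-decodes $_GET values automatically; this simulates that step.
$fromUrl = rawurldecode('%E5%B0%8F%E6%9E%97%E6%AD%A6%E5%8F%B2');

var_dump(clean_username($fromUrl));      // string(12) "小林武史"
var_dump(clean_username('bob/../etc'));  // NULL
```

The validated value should still be passed to the database through a prepared statement rather than concatenated into the query.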
