best way to escape and create a slug [duplicate] - php

I'm somewhat confused about which functions to use to escape a title and create a slug.
I used this:
$slug_title = mysql_real_escape_string($mtitle);
but someone told me not to use it and to use urlencode() instead.
Which one is better for slugs and for security?
As far as I can see, SO inserts - between words:
https://stackoverflow.com/questions/941270/validating-a-slug-in-django

Using either MySQL or URL escaping is not the way to go.
Here is an article that does it better:
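function toSlug($string, $space = "-") {
    // Transliterate accented characters to ASCII when iconv is available
    if (function_exists('iconv')) {
        $string = @iconv('UTF-8', 'ASCII//TRANSLIT', $string);
    }
    // Drop everything except letters, digits, spaces and hyphens
    $string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
    $string = strtolower($string);
    // Replace each space with the separator
    $string = str_replace(" ", $space, $string);
    return $string;
}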
This also works correctly for accented characters.

mysql_real_escape_string() has a different purpose than urlencode(), and neither is appropriate for creating a slug.
A slug is supposed to be a clear and meaningful phrase that concisely describes the page.
mysql_real_escape_string() escapes dangerous characters that could change the meaning of the original SQL query string.
urlencode() escapes invalid URL characters with "%" followed by 2 hex digits that represent their code (e.g. %3C for <); spaces become + with urlencode() and %20 with rawurlencode(). The resulting string is not clear or meaningful because of the unpleasant character sequences, e.g. http://www.domain.com/bad%20slug%20here%20%3C--
Thus any characters that would be affected by urlencode() should be omitted, except for spaces, which are usually replaced with -.
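To make the difference concrete, here is a minimal sketch that runs the sample title from the URL above through both encoders and through a simple slug expression. The slug expression is only one possible approach (an assumption for illustration, not something either encoder provides).

```php
<?php
$title = 'bad slug here <--';

// urlencode() keeps every character but encodes unsafe ones; spaces become "+"
$encoded = urlencode($title);       // bad+slug+here+%3C--

// rawurlencode() encodes spaces as %20 instead
$rawEncoded = rawurlencode($title); // bad%20slug%20here%20%3C--

// A slug replaces each run of non-alphanumeric characters with one hyphen,
// then strips hyphens left at either end
$slug = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-'); // bad-slug-here
```

Both encoded forms are valid in a URL, but only the last one reads as a phrase.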

Related

Remove apostrophe from a string using php [duplicate]

Is there any way to remove an apostrophe from a string in PHP?
Example: if the string is Mc'win, it should be shown as Mcwin.
$Str = "Mc'win";
/*
some code to remove the apostrophe
*/
echo $Str; // should display Mcwin
You can use str_replace.
$Str = str_replace('\'', '', $Str);
or
$Str = str_replace("'", '', $Str);
This will replace all apostrophes with nothing (the 2nd argument) from $Str. The first example escapes the apostrophe so str_replace will recognize it as the character to replace and not part of the enclosure.
If your variable has been sanitized, you may be frustrated to find you can't remove the apostrophe using $string = str_replace("'","",$string);
$string = "A'bcde";
$string = filter_var($string, FILTER_SANITIZE_STRING);
echo $string." = string after sanitizing (looks the same as origin)<br>";
$string = str_replace("'","",$string);
echo $string." ... that's odd, still has the apostrophe!<br>";
This is because sanitizing converts the apostrophe to the HTML entity &#39;, but you may not notice this, because when you echo the string in a browser it looks the same as the original string.
You need to change your search string to &#39;
which works after sanitizing.
$string = str_replace("&#39;","",$string);
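If you cannot be sure whether a string has already been sanitized, one option is to search for the literal apostrophe and its numeric entity forms in a single call. This is a minimal sketch; the helper name removeApostrophes() is hypothetical, and the list of entity spellings is an assumption that may need extending for your data.

```php
<?php
// Hypothetical helper: removes apostrophes in literal or entity-encoded form.
function removeApostrophes(string $s): string
{
    // &#39; and &#039; are both numeric HTML entities for the apostrophe
    return str_replace(["'", '&#39;', '&#039;'], '', $s);
}

echo removeApostrophes("Mc'win");      // literal apostrophe
echo removeApostrophes('A&#39;bcde');  // entity left behind by sanitizing
```

str_replace() accepts an array of search strings, so no regular expression is needed here.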
In my case, I ran into a single-quote problem when storing text in a database (MySQL). At first, I removed the single quotes using this method:
str_replace("'", "", trim($_GET["message"]))
But then a problem came up: some data needs to keep its single quotes. So, instead of removing the quotes, I escape them so that the text can be used later (in my case, on Android).
My idea is to replace ' with ''.
So here is the final version:
$content = str_replace("'", "''", trim($_GET["message"])); // double each single quote to escape it
This answer is for anyone who has the same problem I had. This solution worked better for me. Cheers!

Convert text to hyphen-separated string (slug) including other custom replacements

I want to make a hyphen-separated string (for use in the URL) based on the user-submitted title of the post.
Suppose if the user entered the title of the post as:
$title = "USA is going to deport indians -- Breaking News / News India";
I want to convert it as below
$slug = "usa-is-going-to-deport-indians-breaking-news-news-india";
There could be some more characters that I also want converted, for example '&' to 'and', and '#' and '%' to a hyphen (-).
One approach I tried was the str_replace() function, but with this method I have to call str_replace() many times, which is time-consuming.
Another problem is that the title string may contain more than one hyphen (-) in a row; I want to collapse consecutive hyphens into a single hyphen (-).
Is there any robust and efficient way to solve this problem?
You can use the preg_replace() function to do this:
Input :
$string = "USA is going to deport indians -- Breaking News / News India";
$string = preg_replace("/[^\w]+/", "-", $string);
echo strtolower($string);
Output :
usa-is-going-to-deport-indians-breaking-news-news-india
If you are using WordPress, I would suggest its sanitize_title() function;
check the documentation.
There are three steps in this task (creating a "slug" string); each requires a separate pass over the input string.
1. Cast all characters to lowercase.
2. Replace ampersand symbols with [space]and[space] to ensure that the symbol is not consumed by a later replacement AND the replacement "and" is not prepended or appended to its neighboring words.
3. Replace sequences of one or more non-alphanumeric characters with a literal hyphen.
Multibyte-safe Code: (Demo)
$title = "ÛŞÃ is going to dèport 80% öf indians&citizens are #concerned -- Breaking News / News India";
echo preg_replace(
    '/[^\pL\pN]+/u',
    '-',
    str_replace(
        '&',
        ' and ',
        mb_strtolower($title)
    )
);
Output:
ûşã-is-going-to-dèport-80-öf-indians-and-citizens-are-concerned-breaking-news-news-india
Note that the replacement in str_replace() could be done within the preg_replace() call by forming an array of find strings and an array of replacement strings. However, this may be false economy -- although there would be fewer function calls, the more expensive regex-based function call would make two passes over the entire string.
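For comparison, the array form mentioned above could look like the following sketch. The sample title is shortened from the one in this answer; the ASCII-only character class is an assumption made to keep the example independent of the mbstring extension.

```php
<?php
$title = 'indians&citizens are #concerned -- Breaking News';

// Patterns are applied in order: first "&" becomes " and ",
// then every run of non-alphanumeric characters becomes a single hyphen.
$slug = preg_replace(
    ['/&/', '/[^a-z\d]+/'],
    [' and ', '-'],
    strtolower($title)
);

echo $slug; // indians-and-citizens-are-concerned-breaking-news
```

The result is the same as with the nested str_replace() call; the trade-off is one function call against two regex passes.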
If you wish to convert accented characters to ASCII characters, then perhaps read the different techniques at Convert accented characters to their plain ascii equivalents.
If you aren't worried about multibyte characters, then the simpler version of the same approach would be:
echo preg_replace(
    '/[^a-z\d]+/',
    '-',
    str_replace(
        '&',
        ' and ',
        strtolower($title)
    )
);
To mop up any leading or trailing hyphens in the result string, it may be a good idea to unconditionally call trim($resultstring, '-'). Demo
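Putting the simpler version and the trim() call together, a minimal sketch might look like this. The function name makeSlug() is hypothetical, and the sketch assumes ASCII input.

```php
<?php
// Hypothetical wrapper around the ASCII-only approach shown above.
function makeSlug(string $title): string
{
    $slug = preg_replace(
        '/[^a-z\d]+/',                               // runs of non-alphanumerics
        '-',
        str_replace('&', ' and ', strtolower($title))
    );

    // Remove any hyphens left at the start or end of the result
    return trim($slug, '-');
}

echo makeSlug('-- Breaking News! --'); // breaking-news
```

Without the trim() call, the same input would produce -breaking-news- because the leading and trailing punctuation also collapses to hyphens.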
For a deeper dive on the subject of creating a slug string, read PHP function to make slug (URL string).

Is there a better way to strip/form this string in PHP?

I am currently using what appears to be a horribly complex and unnecessary solution to form a required string.
The string could have any punctuation and will include slashes.
As an example, this string:
Test Ripple, it\'s a comic book one!
Using my current method:
str_replace(" ", "-", trim(preg_replace('/[^a-z0-9]+/i', ' ', str_replace("'", "", stripslashes($string)))))
Returns the correct result:
Test-Ripple-its-a-comic-book-one
Here is a breakdown of what my current (poor) solution is doing in order to achieve the desired output:
1. Strip all slashes from the string.
2. Remove any apostrophes with str_replace.
3. Replace any remaining punctuation with whitespace using preg_replace.
4. Trim off any extra whitespace from the beginning/end of the string which may have been caused by punctuation.
5. Replace all whitespace with '-'.
But there must be a better and more efficient way. Can anyone help?
Personally, it looks fine to me; however, I would make one small change.
Change
preg_replace("/[^a-z0-9]+/i"
to the following
preg_replace("/[^a-zA-Z0-9\s]/"
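One possible simplification, sketched below, is to replace runs of unwanted characters with a hyphen directly and then trim the hyphens, which removes the separate whitespace step. The function name hyphenate() is hypothetical. Note that the + quantifier matters here: a pattern without it replaces each character separately, so "Ripple, it" would end up with two separators in a row.

```php
<?php
// Hypothetical helper; the regex is the one from the question.
function hyphenate(string $string): string
{
    // Undo the backslash escaping and drop apostrophes so "it\'s" becomes "its"
    $clean = str_replace("'", '', stripslashes($string));

    // Collapse each run of non-alphanumeric characters into one hyphen,
    // then strip hyphens from both ends
    return trim(preg_replace('/[^a-z0-9]+/i', '-', $clean), '-');
}

echo hyphenate("Test Ripple, it\\'s a comic book one!"); // Test-Ripple-its-a-comic-book-one
```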

Checking for special characters [duplicate]

As part of my registration system, I need to check for the existence of special characters in a variable. How can I perform this check? The person who gives the most precise answer gets best.
Assuming that you mean html entities when you say "special chars", you can use this:
<?php
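$table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'UTF-8');
// Escape regex metacharacters before placing the characters in a character class
$chars = preg_quote(implode('', array_keys($table)), '/');
// The u modifier makes the pattern match whole UTF-8 characters rather than single bytes
if (preg_match("/[{$chars}]+/u", $string) === 1) {
    // special chars in string
}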
get_html_translation_table gets all the possible html entities. If you only want the entities that the function htmlspecialchars converts, then you can pass HTML_SPECIALCHARS instead of HTML_ENTITIES. The return value of get_html_translation_table is an array of (html entity, escaped entity) pairs.
Next, we want to put all the html entities in a regular expression like [&"']+, which will match any substring containing one of the characters inside square brackets of length 1 or more. So we use array_keys to get the keys of the translation table (the unencoded html entities), and implode them together into a single string.
Then we put them into the regular expression and use preg_match to see if the string contains any of those characters. You can read more about regular expression syntax at the PHP docs.
$special_chars = '[^a-zA-Z0-9]'; // example pattern only: replace with all the special characters you want to check for
$string = 'user_name!';          // example input only: the string you want to check
if (preg_match('/' . $special_chars . '/', $string))
{
    // special characters exist in the string.
}
Check the manual of preg_match for more details
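As a minimal sketch of the same idea for an arbitrary list of characters, the helper below builds a character class from a plain string. The name containsAny() and the sample character list are assumptions for illustration; preg_quote() is used so that characters such as ], ^ or - are not interpreted as regex syntax.

```php
<?php
// Hypothetical helper: true if $string contains at least one character from $chars.
function containsAny(string $string, string $chars): bool
{
    if ($chars === '') {
        return false; // an empty character class would be an invalid pattern
    }

    // Escape regex metacharacters and the "/" delimiter before building the class
    $class = '[' . preg_quote($chars, '/') . ']';

    return preg_match('/' . $class . '/u', $string) === 1;
}

var_dump(containsAny('john_doe', '!@#$%^&*')); // bool(false)
var_dump(containsAny('john!doe', '!@#$%^&*')); // bool(true)
```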
A quick google search for "php special characters" brings up some good info:
htmlentities() - http://php.net/manual/en/function.htmlentities.php
htmlspecialchars() - http://php.net/manual/en/function.htmlspecialchars.php

What is the best way to clean a string for placement in a URL, like the question name on SO?

I'm looking to create a URL string like the one SO uses for the links to the questions. I am not looking at rewriting the url (mod_rewrite). I am looking at generating the link on the page.
Example: The question name is:
Is it better to use ob_get_contents() or $text .= ‘test’;
The URL ends up being:
http://stackoverflow.com/questions/292068/is-it-better-to-use-obgetcontents-or-text-test
The part I'm interested in is:
is-it-better-to-use-obgetcontents-or-text-test
So basically I'm looking to clean out anything that is not alphanumeric while still keeping the URL readable. I have the following created, but I'm not sure if it's the best way or if it covers all the possibilities:
$str = urlencode(
    strtolower(
        str_replace('--', '-',
            preg_replace(array('/[^a-z0-9 ]/i', '/[^a-z0-9]/i'), array('', '-'),
                trim($urlPart)))));
So basically:
1. trim
2. replace any character that is neither alphanumeric nor a space with nothing
3. then replace everything not alphanumeric with a dash
4. replace -- with -
5. strtolower()
6. urlencode() -- probably not needed, but just for good measure.
As you pointed out already, urlencode() is not needed in this case and neither is trim(). If I understand correctly, step 4 is to avoid multiple dashes in a row, but it will not prevent more than two dashes. On the other hand, dashes connecting two words (like in "large-scale") will be removed by your solution while they seem to be preserved on SO.
I'm not sure that this is really the best way to do it, but here's my suggestion:
$str = strtolower(
preg_replace( array('/[^a-z0-9\- ]/i', '/[ \-]+/'), array('', '-'),
$urlPart ) );
So:
1. remove any character that is neither space, dash, nor alphanumeric
2. replace any consecutive number of spaces or dashes with a single dash
3. strtolower()
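Wrapped in a function, the suggestion could be sketched as follows. The function name cleanUrlPart() is hypothetical, and the final trim() of dashes is an addition not present in the answer: without it, leading or trailing spaces in the input would turn into dashes at the ends of the result.

```php
<?php
// Hypothetical wrapper around the two-pattern preg_replace() suggested above.
function cleanUrlPart(string $urlPart): string
{
    $slug = strtolower(
        preg_replace(
            ['/[^a-z0-9\- ]/i', '/[ \-]+/'], // 1) drop other characters, 2) collapse spaces/dashes
            ['', '-'],
            $urlPart
        )
    );

    // Added step (an assumption): strip dashes left at either end
    return trim($slug, '-');
}

echo cleanUrlPart('Is it better to use ob_get_contents() or $text .= test;');
// is-it-better-to-use-obgetcontents-or-text-test
```

Dashes inside words, as in "large-scale", are preserved because the first pattern keeps them and the second only collapses repeated separators.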
