Best way to convert title into url compatible mode in PHP? - php

http://domain.name/1-As Low As 10% Downpayment, Free Golf Membership!!!
The URL above returns a 400 Bad Request error.
How can I convert such a title into a URL-friendly form that the server accepts?

You may want to use a "slug" instead. Rather than using the verbatim title as the URL, you strtolower() and replace all non-alphanumeric characters with hyphens, then remove duplicate hyphens. If you feel like extra credit, you can strip out stopwords, too.
So "1-As Low As 10% Downpayment, Free Golf Membership!!!" becomes:
1-as-low-as-10-downpayment-free-golf-membership
Something like this:
function sluggify($url)
{
    # Prep string with some basic normalization
    $url = strtolower($url);
    $url = strip_tags($url);
    $url = stripslashes($url);
    $url = html_entity_decode($url);
    # Remove quotes (can't, etc.)
    $url = str_replace('\'', '', $url);
    # Replace non-alphanumeric characters with hyphens
    $match = '/[^a-z0-9]+/';
    $replace = '-';
    $url = preg_replace($match, $replace, $url);
    $url = trim($url, '-');
    return $url;
}
You could probably shorten it with longer regexps but it's pretty straightforward as-is. The bonus is that you can use the same function to validate the query parameter before you run a query on the database to match the title, so someone can't stick silly things into your database.
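The validation idea mentioned above can be sketched roughly as follows. The sluggify() body repeats the steps from the answer so the snippet runs on its own; isValidSlug() is a hypothetical helper name, not something from the original answer. The assumption is that a request parameter counts as a valid slug only if slugging it again changes nothing.

```php
<?php
// Same steps as the sluggify() function in the answer, repeated so this sketch is self-contained.
function sluggify(string $url): string
{
    $url = strtolower($url);
    $url = html_entity_decode(stripslashes(strip_tags($url)));
    $url = str_replace("'", '', $url);
    return trim(preg_replace('/[^a-z0-9]+/', '-', $url), '-');
}

// Hypothetical helper: a value is a valid slug if slugging it again leaves it unchanged.
function isValidSlug(string $candidate): bool
{
    return $candidate !== '' && sluggify($candidate) === $candidate;
}

echo sluggify('1-As Low As 10% Downpayment, Free Golf Membership!!!'), "\n";
// 1-as-low-as-10-downpayment-free-golf-membership

var_dump(isValidSlug('as-low-as-10-downpayment')); // bool(true)
var_dump(isValidSlug("1'; DROP TABLE posts; --")); // bool(false)
```

A check like this only rejects malformed slugs; the database query itself should still use bound parameters.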

See the first answer to the question "URL Friendly Username in PHP?":
function Slug($string)
{
return strtolower(trim(preg_replace('~[^0-9a-z]+~i', '-', html_entity_decode(preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8')), ENT_QUOTES, 'UTF-8')), '-'));
}
$user = 'Alix Axel';
echo Slug($user); // alix-axel
$user = 'Álix Ãxel';
echo Slug($user); // alix-axel
$user = 'Álix----_Ãxel!?!?';
echo Slug($user); // alix-axel

You can use urlencode() or rawurlencode(). Wikipedia does this, for example:
http://en.wikipedia.org/wiki/Ichigo_100%25
Here the % character is percent-encoded as %25.
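As a small illustration of the difference between the two functions, here is a sketch that encodes the title from the question. The domain is the placeholder from the question, not a real endpoint.

```php
<?php
$title = '1-As Low As 10% Downpayment, Free Golf Membership!!!';

// rawurlencode() follows RFC 3986: spaces become %20, which suits path segments.
$path = rawurlencode($title);
echo 'http://domain.name/' . $path, "\n";
// http://domain.name/1-As%20Low%20As%2010%25%20Downpayment%2C%20Free%20Golf%20Membership%21%21%21

// urlencode() encodes spaces as +, which is meant for query strings.
echo urlencode('Ichigo 100%'), "\n";    // Ichigo+100%25
echo rawurlencode('Ichigo 100%'), "\n"; // Ichigo%20100%25
```

The encoded URL is valid but not very readable, which is why the slug approach is usually preferred for titles.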

I just created a gist with a useful slug function:
https://gist.github.com/ninjagab/11244087
You can use it to convert a title into an SEO-friendly URL.
<?php
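class SanitizeUrl {
    public static function slug($string, $space="-") {
        // utf8_encode() assumes ISO-8859-1 input and is deprecated since PHP 8.2,
        // so convert only when the input is not already valid UTF-8
        if (!mb_check_encoding($string, 'UTF-8')) {
            $string = mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
        }
        // Transliterate accented characters to ASCII when iconv is available
        if (function_exists('iconv')) {
            $string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
        }
        // Drop everything except letters, digits, spaces and hyphens
        $string = preg_replace("/[^a-zA-Z0-9 \-]/", "", $string);
        // Collapse whitespace runs and trim the ends
        $string = trim(preg_replace("/\\s+/", " ", $string));
        $string = strtolower($string);
        $string = str_replace(" ", $space, $string);
        return $string;
    }
}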
$title = 'Thi is a test string with some "strange" chars ò à ù...';
echo SanitizeUrl::slug($title);
//this will output:
//thi-is-a-test-string-with-some-strange-chars-o-a-u

You could use the rawurlencode() function

To keep it simple, just fill in the $to_change and $change_to lists:
<?php
// Fill in both arrays to make the replacement complete
// Here a space becomes _ and à becomes a plain a
$to_change = [
' ', 'à', 'à', 'â','é', 'è', 'ê', 'ç', 'ù', 'ô', 'ö' // and so on
];
$change_to = [
'_', 'a', 'a', 'a', 'e', 'e', 'e','c', 'u', 'o', 'o' // and so on
];
$texts = 'This is my slug in êlàb élaboré par';
$page_id = str_replace($to_change, $change_to, $texts);

Related

remove an utf-8 text in string with str_replace

I am trying to remove a piece of UTF-8 text from a string:
$old = array("سایت (english) :");
$new = array('');
$string = str_replace($old, $new, $string);
But it does not work. Can somebody please tell me what my mistake is?
Note: I can remove purely non-English or purely English text, but not a mix of both in one string.
Try this:
$string = mb_convert_encoding ($string, 'UTF-8');
$old = array (mb_convert_encoding ("سایت (english) :", 'UTF-8'));
$new = array ('');
$string = str_replace ($old, $new, $string);

Replacing characters in MIME encoded emails

I am looking for a way to simply replace characters with their ASCII counterparts in MIME encoded emails. I've written preliminary code below, but it seems like the str_replace commands I'm using will keep on going forever to catch all possible combinations. Is there a more efficient way to do this?
<?php
$strings = "=?utf-8?Q?UK=20Defence=20=2D=20Yes=2C=20Both=20Labour=20and=20Tory=20Need=20To=20Be=20Very=20Much=20Clearer=20On=20Defence?=";
function decodeString($input){
$space = array("=?utf-8?Q?","=?UTF-8?Q?", "=20","?=");
$hyphen = array("=E2=80=93","=2D");
$dotdotdot = "=E2=80=A6";
$pound = "=C2=A3";
$comma = "=2C";
$decode = str_replace($space, ' ', $input);
$decode = str_replace($hyphen, '-', $decode);
$decode = str_replace($pound, '£', $decode);
$decode = str_replace($comma, ',', $decode);
$decode = str_replace($dotdotdot, '...', $decode);
return $decode;
}
echo decodeString($strings);
?>
I figured it out - I have to pass $strings to the mb_decode_mimeheader() function.
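A minimal sketch of that approach, assuming the mbstring extension is loaded. The header value is a shortened version of the one in the question, and the variable names are illustrative.

```php
<?php
// Shortened Q-encoded header from the question (requires the mbstring extension).
$subject = '=?utf-8?Q?UK=20Defence=20=2D=20Yes=2C=20Both=20Labour=20and=20Tory?=';

// mb_decode_mimeheader() decodes the encoded-word into the internal encoding (UTF-8 by default).
$decoded = mb_decode_mimeheader($subject);

echo $decoded, "\n"; // UK Defence - Yes, Both Labour and Tory
```

This replaces the whole chain of str_replace() calls, since the function handles every =XX sequence rather than a fixed list.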

Extracting words from a text using php

Hello friends, I have a small problem. I need to extract only the words from an arbitrary text.
I tried to retrieve the words using strtok(), strstr(), and some regular expressions, but only managed to extract some of the words.
The problem is complex because of the number of characters and symbols that can accompany the words.
Here is the sample text from which the words must be extracted:
Main article: our 46,000 required, !but (1947-2011) mail#server.com March 8, 2014 Gutenberg's 34-DE 'a' 3,1415 Us: #unknown n go http://google.com or www.google.com and http://www.google.com (r) The 509th "composite" and; C-54 #dog v4.0 ¿as is done? ¿article... agriculture? x ¿cat? now! Hi!! (87 meters).
Sample text, for testing.
The result of extracting the text should be:
Main article our required but March Gutenberg's a go or and The composite and dog as is done article agriculture cat now Hi meters
Sample text for testing
This is the first function I wrote to make the work easier:
function PreText($text){
$text = str_replace("\n", ".", $text);
$text = str_replace("\r", ".", $text);
$text = str_replace("'", "", $text);
$text = str_replace("?", "", $text);
$text = str_replace("¿", "", $text);
$text = str_replace("(", "", $text);
$text = str_replace(")", "", $text);
$text = str_replace('"', "", $text);
$text = str_replace(';', "", $text);
$text = str_replace('!', "", $text);
$text = str_replace('<', "", $text);
$text = str_replace('>', "", $text);
$text = str_replace('#', "", $text);
$text = str_replace(",", "", $text);
$text = str_replace(".c", "", $text);
$text = str_replace(".C", "", $text);
return $text;
}
Split function:
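function SplitWords($text){
    $NewText = ""; // initialize so the .= below does not hit an undefined variable
    $words = explode(" ", $text);
    $ContWords = count($words);
    for ($i = 0; $i < $ContWords; $i++){
        // keep only tokens made up entirely of letters
        if (ctype_alpha($words[$i])) {
            $NewText .= $words[$i].", ";
        }
    }
    return $NewText;
}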
The program:
<?php
include_once ('functions.php');
$text = "Main article: our 46,000 ...";
$text = PreText($text);
$text = SplitWords($text);
echo $text;
?>
The code still has a long way to go. I would appreciate your help.
If I understand you correctly, you want to remove all non-letters from the string. I would use preg_replace:
$text = "Main article: our 46,000...";
$text = preg_replace("/[^a-zA-Z' ]/","",$text);
This should remove everything that is not a letter, apostrophe or a space.
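As a rough sketch of what this yields, the snippet below applies the same pattern to a shortened sample and then splits on spaces. The preg_split() step is an addition for illustration: removing characters leaves runs of spaces behind, and splitting with PREG_SPLIT_NO_EMPTY discards the empty pieces.

```php
<?php
// Shortened sample based on the text in the question.
$text = "Main article: our 46,000 required, !but Gutenberg's (87 meters).";

// Keep only letters, apostrophes and spaces.
$clean = preg_replace("/[^a-zA-Z' ]/", '', $text);

// Split on runs of spaces and drop empty entries left by removed tokens.
$words = preg_split('/ +/', $clean, -1, PREG_SPLIT_NO_EMPTY);

echo implode(' ', $words), "\n";
// Main article our required but Gutenberg's meters
```

Note that this does not drop URLs, e-mail addresses or mixed tokens such as "v4.0" as whole words; their letters survive, so the result only approximates the expected output in the question.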
Try this; it comes close to what you need:
<?php
$text = <<<HEREDOC
Main article: our 46,000 required, !but (1947-2011) mail#server.com March 8, 2014 Gutenberg's 34-DE 'a' 3,1415 Us: #unknown n go http://google.com or www.google.com and
http://www.google.com (r) The 509th composite" and; C-54 #dog v4.0 ¿as is done? ¿article... agriculture? x ¿cat? now! Hi!! (87 meters). Sample text, for testing.
HEREDOC;
//replace all kind of URLs and emails from text
$url_email = "((https?|ftp)\:\/\/)?"; // SCHEME
$url_email .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?#)?"; // User and Pass
$url_email .= "([a-z0-9-.]*)\.([a-z]{2,4})"; // Host or IP
$url_email .= "(\:[0-9]{2,5})?"; // Port
$url_email .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$url_email .= "(\?[a-z+&\$_.-][a-z0-9;:#&%=+\/\$_.-]*)?"; // GET Query
$url_email .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
$text = preg_replace("/$url_email/","",$text);
//replace anything like Us: #unknown
$text = preg_replace("/Us:.?#\\w+/","",$text);
//replace all Non-Alpha characters
$text = preg_replace("/[^a-zA-Z' ]/","",$text);
echo $text;
?>

Add http:// to a link if it doesn't have it

I have this very simple URL BBCode parser. I want to adjust it so that if the link does not contain http://, it gets added. How can I do this?
$find = array(
"/\[url\=(.+?)\](.+?)\[\/url\]/is",
"/\[url\](.+?)\[\/url\]/is"
);
$replace = array(
"$2",
"$1"
);
$body = preg_replace($find, $replace, $body);
You can use (http://)? to match the http:// if it exists, ignore that group in the replacement pattern, and add your own http:// instead. The slashes must be escaped because / is the pattern delimiter:
$find = array(
"/\[url\=(http:\/\/)?(.+?)\](.+?)\[\/url\]/is",
"/\[url\](http:\/\/)?(.+?)\[\/url\]/is"
);
$replace = array(
"$3",
"$2"
);
$body = preg_replace($find, $replace, $body);
if(strpos($string, 'http://') === FALSE) {
// add http:// to string
}
// I've added the http:// to the regex as an optional, non-capturing group
// (slashes escaped because / is the delimiter), then always add it back in the replacement
$find = array(
"/\[url\=(?:http:\/\/)?(.+?)\](.+?)\[\/url\]/is",
"/\[url\](?:http:\/\/)?(.+?)\[\/url\]/is"
);
$replace = array(
"$2",
"http://$1"
);
$body = preg_replace($find, $replace, $body);
If you use a callback function with preg_replace_callback(), you can normalize each link inside the callback.
You can do it this way: it always prepends 'http://' to the string after removing any existing 'http://':
$string = 'http://'. str_replace('http://', '', $string);
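The callback approach could look roughly like the sketch below. The helper names ensureHttp() and bbcodeUrls() are hypothetical, and the anchor-tag output is an assumption, since the replacement strings in the question show only $1 and $2.

```php
<?php
// Hypothetical helper: prepend a scheme only when the link does not already have one.
function ensureHttp(string $url): string
{
    return preg_match('~^https?://~i', $url) ? $url : 'http://' . $url;
}

// Hypothetical helper: convert [url=...]text[/url] and [url]...[/url] into anchors.
// The <a href> output format is an assumption, not taken from the question.
function bbcodeUrls(string $body): string
{
    // Using ~ as the delimiter avoids having to escape the slashes in the pattern.
    $body = preg_replace_callback(
        '~\[url=(.+?)\](.+?)\[/url\]~is',
        function (array $m): string {
            $href = htmlspecialchars(ensureHttp($m[1]), ENT_QUOTES, 'UTF-8');
            return '<a href="' . $href . '">' . $m[2] . '</a>';
        },
        $body
    );

    return preg_replace_callback(
        '~\[url\](.+?)\[/url\]~is',
        function (array $m): string {
            $href = htmlspecialchars(ensureHttp($m[1]), ENT_QUOTES, 'UTF-8');
            return '<a href="' . $href . '">' . $href . '</a>';
        },
        $body
    );
}

echo bbcodeUrls('[url=example.com]Example[/url] and [url]https://example.org[/url]'), "\n";
// <a href="http://example.com">Example</a> and <a href="https://example.org">https://example.org</a>
```

Compared with the str_replace() one-liner, this keeps an existing https:// scheme instead of forcing http://.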

Slug URL generation function overriding the Ç

I have the function below to create URL slugs from post titles. The problem is that the ç character is not converted to c; the function strips it out instead.
Example post title: Coração de Pelúcia
The slug generated: coraao-de-pelucia
How can I fix this function so that it generates the slug coracao-de-pelucia?
function generate_seo_link($input,$replace = '-',$remove_words = true,$words_array = array())
{
    //make it lowercase, remove punctuation, remove multiple/leading/ending spaces
    //(preg_replace is used for the space collapsing because ereg_replace was removed in PHP 7)
    $return = trim(preg_replace('/ +/',' ',preg_replace('/[^a-zA-Z0-9\s]/','',strtolower($input))));
    //remove words, if not helpful to seo
    //i like my defaults list in remove_words(), so I wont pass that array
    if($remove_words) { $return = remove_words($return,$replace,$words_array); }
    //convert the spaces to whatever the user wants
    //usually a dash or underscore..
    //...then return the value.
    return str_replace(' ',$replace,$return);
}
You should use the iconv module and a function such as this one to do the conversion:
function url_safe($string){
    $url = $string;
    setlocale(LC_ALL, 'pt_BR'); // change to the one of your language
    $url = iconv("UTF-8", "ASCII//TRANSLIT", $url);
    $url = preg_replace('~[^\\pL0-9_]+~u', '-', $url);
    $url = trim($url, "-");
    $url = strtolower($url);
    return $url;
}
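The key point is the order of operations: the question's function strips every non-ASCII character before anything can convert it, so the transliteration has to come first. Because the output of iconv() with //TRANSLIT depends on the platform and locale, here is a locale-independent sketch of the same idea using an explicit map. slugWithMap() is a hypothetical name, the map is deliberately partial, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: transliterate first, then strip, so accented letters survive as ASCII.
function slugWithMap(string $title): string
{
    // Partial map for illustration only; extend it for your language.
    $map = ['ç' => 'c', 'Ç' => 'c', 'ã' => 'a', 'ú' => 'u', 'é' => 'e'];

    $ascii = strtr($title, $map);  // replace mapped characters before any stripping
    $ascii = strtolower($ascii);   // safe here because only ASCII letters remain to be lowered
    $ascii = preg_replace('/[^a-z0-9]+/', '-', $ascii);

    return trim($ascii, '-');
}

echo slugWithMap('Coração de Pelúcia'), "\n"; // coracao-de-pelucia
```

Characters missing from the map are still dropped by the final preg_replace(), so the iconv approach above remains the more general option where its locale support is available.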
