How can I replace ":" with "/" in slugify function? - php

I have a function that slugifies text. It works well, except that I need to replace ":" with "/". Currently it replaces every run of characters that are not letters or digits with "-". Here it is:
function slugify($text)
{
// replace non letter or digits by -
$text = preg_replace('~[^\\pL\d]+~u', '-', $text);
// trim
$text = trim($text, '-');
// transliterate
if (function_exists('iconv'))
{
$text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
}
// lowercase
$text = strtolower($text);
// remove unwanted characters
$text = preg_replace('~[^-\w]+~', '', $text);
if (empty($text))
{
return 'n-a';
}
return $text;
}

I made just a couple of modifications. First, I passed preg_replace a pair of search/replace arrays, so that almost everything is still replaced with -, while : survives the first pattern and is then replaced with /:
$search = array( '~[^\\pL\d:]+~u', '~:~' );
$replace = array( '-', '/' );
$text = preg_replace( $search, $replace, $text);
Later on, the last preg_replace was replacing our / with an empty string, so I permitted forward slashes in its character class:
$text = preg_replace('~[^-\w\/]+~', '', $text);
Which outputs the following:
// antiques/antiquities
echo slugify( "Antiques:Antiquities" );
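
Putting the two changes from this answer back into the original function gives something like the sketch below. The name slugify_with_slash, the null/false guards, and the type declarations are my own additions and not part of the original post; the patterns are the ones shown above.

```php
<?php
// Sketch only: slugify_with_slash() is a hypothetical name, not from the original post.
function slugify_with_slash(string $text): string
{
    // Replace runs of anything that is not a letter, digit, or colon with "-",
    // then turn each remaining colon into "/".
    $text = preg_replace(['~[^\pL\d:]+~u', '~:~'], ['-', '/'], $text) ?? '';
    $text = trim($text, '-');

    // Transliterate to ASCII where iconv is available; keep the text if it fails.
    if (function_exists('iconv')) {
        $converted = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
        if ($converted !== false) {
            $text = $converted;
        }
    }

    $text = strtolower($text);

    // Drop anything that is not a word character, "-", or "/".
    $text = preg_replace('~[^-\w/]+~', '', $text) ?? '';

    return $text === '' ? 'n-a' : $text;
}

echo slugify_with_slash('Antiques:Antiquities'), "\n"; // antiques/antiquities
```

One thing to be aware of: a space after the colon is turned into its own dash, so 'Old Maps: Europe' comes out as old-maps/-europe. If that matters, the input would need an extra cleanup step.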

Related

Special characters showing invalid

I am using a way to compress HTML on fly. Below is the function
function compress_page($buffer) {
$search = array(
'/\>[^\S ]+/s', /*strip whitespaces after tags, except space*/
'/[^\S ]+\</s', /*strip whitespaces before tags, except space*/
'/(\s)+/s', /*shorten multiple whitespace sequences*/
);
$replace = array(
'>',
'<',
'\\1',
);
$buffer = preg_replace($search, $replace, $buffer);
return $buffer;
}
The function works, but after adding it, German characters no longer display correctly; they show up as "�". Can you please help me find the problem?
I tried other ways to minify HTML but got the same problem.
This probably happens because the regexes lack the Unicode (u) modifier.
Anyway, here is the code I wrote to minify output:
function sanitize_output($buffer, $type = null) {
$search = array(
'/\>[^\S ]+/s', // strip whitespaces after tags, except space
'/[^\S ]+\</s', // strip whitespaces before tags, except space
'/(\s)+/s', // shorten multiple whitespace sequences
'/<!--(.|\s)*?-->/', // Remove HTML comments
'#/\*(.|\s)*\*/#Uu' // Remove JS comments
);
$replace = array(
'>',
'<',
' ',
'',
''
);
if( $type == 'html' ){
// Remove quotes around short attribute values
$search[] = '#(\w+=)(?:"|\')((\S|\.|\-|/|_|\(|\)|\w){1,8})(?:"|\')#u';
$replace[] = '$1$2';
// Remove spaces between tags
$search[] = '#(>)\s+(<)#mu';
$replace[] = '$1$2';
}
$buffer = str_replace( PHP_EOL, '', preg_replace( $search, $replace, $buffer ) );
return $buffer;
}
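
As a smaller illustration of the Unicode-flag suggestion, here is the question's original function with the u modifier added to each pattern. This is only a sketch: compress_page_utf8 is a made-up name, it assumes the page is valid UTF-8, and whether it cures the "�" depends on where the corruption actually comes from. With u, preg_replace returns null on invalid UTF-8, so the sketch falls back to the untouched buffer in that case.

```php
<?php
// Sketch only: compress_page_utf8() is a hypothetical name; assumes UTF-8 input.
function compress_page_utf8(string $buffer): string
{
    $search = [
        '/>[^\S ]+/su',  // whitespace after tags, except plain spaces
        '/[^\S ]+</su',  // whitespace before tags, except plain spaces
        '/(\s)+/su',     // collapse whitespace runs to their last character
    ];
    $replace = ['>', '<', '\\1'];

    $result = preg_replace($search, $replace, $buffer);

    // preg_replace returns null on error (for example invalid UTF-8 with /u).
    return $result ?? $buffer;
}

echo compress_page_utf8("<p>\n\tGrüße</p>  <b>ä</b>"), "\n";
// <p>Grüße</p> <b>ä</b>
```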
After some research, I found this solution. It minifies the full HTML into a single line.
function pt_html_minyfy_finish( $html ) {
$html = preg_replace('/<!--(?!\s*(?:\[if [^\]]+]|!|>))(?:(?!-->).)*-->/s', '', $html);
$html = str_replace(array("\r\n", "\r", "\n", "\t"), '', $html);
while ( stristr($html, '  '))
$html = str_replace('  ', ' ', $html);
return $html;
}
Hope this will help someone!

preg_replace remove all unwanted characters

I want to block or remove all unwanted characters from my site,
characters like ᾄҭᾄ
or нєℓℓσ
ĤĔĹĹŐ
etc.
My code currently is:
class badWordsC
{
public function check($text)
{
$badwords = 'com|net|org|info|.name|.biz|.me|.tv|.tel|.mobi|.asia|.uk|.eu|.us|.in|.tk|.cc|.ws|.bz|.mn|.co|.tw|.vn|.es|.pw|.club|.ca|.cn|.email|.photography|.photos|.tips|.solutions|.center|.gallery|.kitchen|.land|.technology|.today|.academy|.computer|.shoes|.careers|.domains|.coffee|.link|.guru|.estate|.company|.bike|.clothing|.holdings|.plumbing|.singles|.ventures|.camera|.equipment|.graphics|.lighting|.construction|.contractors|.directory|.diamonds|.enterprises|.voyage|.recipes|.gift|.site|.ly|.gq|.cf|.ga|.ml|.tk|in|rb2';
$badwords .= 'type|ingoogle';
$badwords = explode('|', $badwords);
$goodwords = 'youtube.com|prntscr.com|az545221.vo.msecnd.net';
$goodwords .= 'wink|crying|fingerscrossed|blushing|wondering|inlove|evilgrin|yawning|puking|in';
$goodwords = explode('|', $goodwords);
$text = str_replace($goodwords, '', $text);
$text = trim(preg_replace('/\s\s+/', '', $text));
$text = preg_replace('/\P{L}+/u', '', $text);
foreach ($badwords as $word)
{
if (strpos($text, $word) !== false || strpos($text, strtoupper($word)) !== false)
{
return false;
}
}
$text = preg_replace("/[a-zA-Z0-9]/", '', $text);
$text = preg_replace(array('/)/','/(/','/;/','/-/','/+/','/لأ/','/لإ/','/لا/','/إ/','/أ/', '/ا/', '/ض/', '/ص/', '/ث/', '/ق/', '/ف/', '/غ/', '/ع/', '/ه/', '/خ/', '/ح/', '/ج/', '/د/', '/ش/', '/س/', '/ي/', '/ب/', '/ل/', '/ت/', '/ن/', '/م/', '/ك/', '/ط/', '/ئ/', '/ء/', '/ؤ/', '/ر/', '/ى/', '/ة/', '/و/', '/ز/', '/ظ/', '/ذ/', '/ـ/'), '', $text);
if($text != '')
{
return false;
}
return true;
}
}
It works, but it does not block or remove characters like н Ĕ Ő.
Any ideas?
You will need the u modifier, and you also need to expand your character class to include non-ASCII characters.
I'd use:
/[[:alnum:]]/u
Regex Demo: https://regex101.com/r/iS1yZ2/2
That is a POSIX bracket expression; you can see more of those at www.regular-expressions.info/posixbrackets.html.
Also, in your second expression the + needs to be escaped (or put in a character class; a few symbols, namely -, ], and ^, are not made literal just by putting them in a character class) because it is a quantifier. PHP has a function that escapes special characters: preg_quote.
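
A rough sketch of both points follows. The function names strip_literals and strip_letters_and_digits are invented for illustration, and I have written the letter/digit class as \p{L}\p{N} rather than [[:alnum:]], which is my substitution and not what the answer recommends. The arrow function needs PHP 7.4 or later.

```php
<?php
// Sketch only: both function names are hypothetical.

// Build one alternation from literal strings, letting preg_quote do the escaping.
function strip_literals(string $text, array $literals): string
{
    $quoted = array_map(
        static fn(string $s): string => preg_quote($s, '/'),
        $literals
    );
    $pattern = '/' . implode('|', $quoted) . '/u';

    return preg_replace($pattern, '', $text) ?? $text;
}

// Remove letters and digits from any script, not just ASCII ones.
function strip_letters_and_digits(string $text): string
{
    return preg_replace('/[\p{L}\p{N}]/u', '', $text) ?? $text;
}

echo strip_literals('a+b(c);-d', [')', '(', ';', '-', '+']), "\n"; // abcd
var_dump(strip_letters_and_digits('нĔŐabc123') === '');           // bool(true)
```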

Convert URL to Slug With PHP

I'm using the below code to try and convert to slug and for some reason it's not echoing anything. I know I'm missing something extremely obvious. Am I not calling the function?
<?php
$string = "Can't You Convert This To A Slug?";
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
echo $string;
}
?>
The echo comes after the return statement, so it never runs, and the function itself is never called.
Try it like this:
function clean_string($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}
$some = clean_string("Can't You Convert This To A Slug?");
echo $some;
Or like this:
function clean_me(&$string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}
$some = "Can't You Convert This To A Slug?";
clean_me($some);
echo $some;
<?php
$string = "Can't You Convert This To A Slug?";
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}
$string = clean($string);
echo $string;
?>

Remove non-ascii characters from string

I'm getting strange characters when pulling data from a website:
Â
How can I remove anything that isn't a non-extended ASCII character?
A more appropriate question can be found here:
PHP - replace all non-alphanumeric chars for all languages supported
A regex replace would be the best option. Using $str as an example string and matching it using :print:, which is a POSIX Character Class:
$str = 'aAÂ';
$str = preg_replace('/[[:^print:]]/', '', $str); // should be aA
What :print: does is look for all printable characters. The reverse, :^print:, looks for all non-printable characters. Any characters that are not part of the current character set will be removed.
Note: Before using this method, you must ensure that your current character set is ASCII. POSIX Character Classes support both ASCII and Unicode and will match only according to the current character set. As of PHP 5.6, the default charset is UTF-8.
Do you want only printable ASCII characters?
Use this:
<?php
header('Content-Type: text/html; charset=UTF-8');
$str = "abqwrešđčžsff";
$res = preg_replace('/[^\x20-\x7E]/','', $str);
echo "($str)($res)";
Or even better, convert your input to UTF-8 and use the phputf8 library to translate 'not normal' characters into their ASCII representation:
require_once('libs/utf8/utf8.php');
require_once('libs/utf8/utils/bad.php');
require_once('libs/utf8/utils/validation.php');
require_once('libs/utf8_to_ascii/utf8_to_ascii.php');
if(!utf8_is_valid($str))
{
$str=utf8_bad_strip($str);
}
$str = utf8_to_ascii($str, '' );
$clearstring=filter_var($rawstring, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH);
UPDATE:
FILTER_SANITIZE_STRING is deprecated since PHP 8.1
https://www.php.net/manual/en/migration81.deprecated.php#migration81.deprecated.filter
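
Since FILTER_SANITIZE_STRING is deprecated, one possible replacement for the byte-stripping part is FILTER_UNSAFE_RAW combined with the same FILTER_FLAG_STRIP_HIGH flag. This is a sketch under that assumption, with a made-up function name. Note that it is not a full equivalent: FILTER_SANITIZE_STRING also stripped tags and encoded quotes, which this does not do.

```php
<?php
// Sketch only: strip_high_bytes() is a hypothetical name.
function strip_high_bytes(string $raw): string
{
    // FILTER_FLAG_STRIP_HIGH removes every byte with a value above 127.
    $clean = filter_var($raw, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);

    return $clean === false ? '' : $clean;
}

echo strip_high_bytes('aAÂ'), "\n"; // aA
```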
Kind of related, we had a web application that had to send data to a legacy system that could only deal with the first 128 characters of the ASCII character set.
The solution we had to use was something that would "translate" as many characters as possible into close-matching ASCII equivalents, but leave anything that could not be translated alone.
Normally I would do something like this:
<?php
// transliterate
if (function_exists('iconv')) {
$text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
}
?>
... but that replaces everything that can't be translated into a question mark (?).
So we ended up doing the following. Check at the end of this function for (commented out) php regex that just strips out non-ASCII characters.
<?php
public function cleanNonAsciiCharactersInString($orig_text) {
$text = $orig_text;
// Single letters
$text = preg_replace("/[∂άαáàâãªä]/u", "a", $text);
$text = preg_replace("/[∆лДΛдАÁÀÂÃÄ]/u", "A", $text);
$text = preg_replace("/[ЂЪЬБъь]/u", "b", $text);
$text = preg_replace("/[βвВ]/u", "B", $text);
$text = preg_replace("/[çς©с]/u", "c", $text);
$text = preg_replace("/[ÇС]/u", "C", $text);
$text = preg_replace("/[δ]/u", "d", $text);
$text = preg_replace("/[éèêëέëèεе℮ёєэЭ]/u", "e", $text);
$text = preg_replace("/[ÉÈÊË€ξЄ€Е∑]/u", "E", $text);
$text = preg_replace("/[₣]/u", "F", $text);
$text = preg_replace("/[НнЊњ]/u", "H", $text);
$text = preg_replace("/[ђћЋ]/u", "h", $text);
$text = preg_replace("/[ÍÌÎÏ]/u", "I", $text);
$text = preg_replace("/[íìîïιίϊі]/u", "i", $text);
$text = preg_replace("/[Јј]/u", "j", $text);
$text = preg_replace("/[ΚЌК]/u", 'K', $text);
$text = preg_replace("/[ќк]/u", 'k', $text);
$text = preg_replace("/[ℓ∟]/u", 'l', $text);
$text = preg_replace("/[Мм]/u", "M", $text);
$text = preg_replace("/[ñηήηπⁿ]/u", "n", $text);
$text = preg_replace("/[Ñ∏пПИЙийΝЛ]/u", "N", $text);
$text = preg_replace("/[óòôõºöοФσόо]/u", "o", $text);
$text = preg_replace("/[ÓÒÔÕÖθΩθОΩ]/u", "O", $text);
$text = preg_replace("/[ρφрРф]/u", "p", $text);
$text = preg_replace("/[®яЯ]/u", "R", $text);
$text = preg_replace("/[ГЃгѓ]/u", "r", $text);
$text = preg_replace("/[Ѕ]/u", "S", $text);
$text = preg_replace("/[ѕ]/u", "s", $text);
$text = preg_replace("/[Тт]/u", "T", $text);
$text = preg_replace("/[τ†‡]/u", "t", $text);
$text = preg_replace("/[úùûüџμΰµυϋύ]/u", "u", $text);
$text = preg_replace("/[√]/u", "v", $text);
$text = preg_replace("/[ÚÙÛÜЏЦц]/u", "U", $text);
$text = preg_replace("/[Ψψωώẅẃẁщш]/u", "w", $text);
$text = preg_replace("/[ẀẄẂШЩ]/u", "W", $text);
$text = preg_replace("/[ΧχЖХж]/u", "x", $text);
$text = preg_replace("/[ỲΫ¥]/u", "Y", $text);
$text = preg_replace("/[ỳγўЎУуч]/u", "y", $text);
$text = preg_replace("/[ζ]/u", "Z", $text);
// Punctuation
$text = preg_replace("/[‚‚]/u", ",", $text);
$text = preg_replace("/[`‛′’‘]/u", "'", $text);
$text = preg_replace("/[″“”«»„]/u", '"', $text);
$text = preg_replace("/[—–―−–‾⌐─↔→←]/u", '-', $text);
$text = preg_replace("/[ ]/u", ' ', $text);
$text = str_replace("…", "...", $text);
$text = str_replace("≠", "!=", $text);
$text = str_replace("≤", "<=", $text);
$text = str_replace("≥", ">=", $text);
$text = preg_replace("/[‗≈≡]/u", "=", $text);
// Exciting combinations
$text = str_replace("ыЫ", "bl", $text);
$text = str_replace("℅", "c/o", $text);
$text = str_replace("₧", "Pts", $text);
$text = str_replace("™", "tm", $text);
$text = str_replace("№", "No", $text);
$text = str_replace("Ч", "4", $text);
$text = str_replace("‰", "%", $text);
$text = preg_replace("/[∙•]/u", "*", $text);
$text = str_replace("‹", "<", $text);
$text = str_replace("›", ">", $text);
$text = str_replace("‼", "!!", $text);
$text = str_replace("⁄", "/", $text);
$text = str_replace("∕", "/", $text);
$text = str_replace("⅞", "7/8", $text);
$text = str_replace("⅝", "5/8", $text);
$text = str_replace("⅜", "3/8", $text);
$text = str_replace("⅛", "1/8", $text);
$text = preg_replace("/[‰]/u", "%", $text);
$text = preg_replace("/[Љљ]/u", "Ab", $text);
$text = preg_replace("/[Юю]/u", "IO", $text);
$text = preg_replace("/[ﬁﬂ]/u", "fi", $text);
$text = preg_replace("/[зЗ]/u", "3", $text);
$text = str_replace("£", "(pounds)", $text);
$text = str_replace("₤", "(lira)", $text);
$text = preg_replace("/[‰]/u", "%", $text);
$text = preg_replace("/[↨↕↓↑│]/u", "|", $text);
$text = preg_replace("/[∞∩∫⌂⌠⌡]/u", "", $text);
//2) Translation CP1252.
$trans = get_html_translation_table(HTML_ENTITIES);
$trans['f'] = 'ƒ'; // Latin Small Letter F With Hook
$trans['-'] = array(
'…', // Horizontal Ellipsis
'˜', // Small Tilde
'–' // Dash
);
$trans["+"] = '†'; // Dagger
$trans['#'] = '‡'; // Double Dagger
$trans['M'] = '‰'; // Per Mille Sign
$trans['S'] = 'Š'; // Latin Capital Letter S With Caron
$trans['OE'] = 'Œ'; // Latin Capital Ligature OE
$trans["'"] = array(
'‘', // Left Single Quotation Mark
'’', // Right Single Quotation Mark
'›', // Single Right-Pointing Angle Quotation Mark
'‚', // Single Low-9 Quotation Mark
'ˆ', // Modifier Letter Circumflex Accent
'‹' // Single Left-Pointing Angle Quotation Mark
);
$trans['"'] = array(
'“', // Left Double Quotation Mark
'”', // Right Double Quotation Mark
'„', // Double Low-9 Quotation Mark
);
$trans['*'] = '•'; // Bullet
$trans['n'] = '–'; // En Dash
$trans['m'] = '—'; // Em Dash
$trans['tm'] = '™'; // Trade Mark Sign
$trans['s'] = 'š'; // Latin Small Letter S With Caron
$trans['oe'] = 'œ'; // Latin Small Ligature OE
$trans['Y'] = 'Ÿ'; // Latin Capital Letter Y With Diaeresis
$trans['euro'] = '€'; // euro currency symbol
ksort($trans);
foreach ($trans as $k => $v) {
$text = str_replace($v, $k, $text);
}
// 3) remove <p>, <br/> ...
$text = strip_tags($text);
// 4) &amp; => & &quot; => '
$text = html_entity_decode($text);
// transliterate
// if (function_exists('iconv')) {
// $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
// }
// remove non ascii characters
// $text = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $text);
return $text;
}
?>
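
The same idea can be written more compactly as a lookup table passed to strtr, which replaces every key it finds in a single pass. The sketch below uses a tiny made-up map and a hypothetical function name; it is not the author's method, just one way to keep such a table maintainable. Characters that are not in the map are left alone, which matches the "leave anything that could not be translated" requirement.

```php
<?php
// Sketch only: transliterate_with_map() and the sample map are illustrative.
function transliterate_with_map(string $text, array $map): string
{
    // strtr() with an array replaces each key by its value, longest key first,
    // and never re-scans text it has already replaced.
    return strtr($text, $map);
}

$map = [
    '“' => '"',
    '”' => '"',
    '…' => '...',
    'é' => 'e',
    'ï' => 'i',
];

echo transliterate_with_map('“naïve” café…', $map), "\n"; // "naive" cafe...
```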
I also think that the best solution might be to use a regular expression.
Here's my suggestion:
function convert_to_normal_text($text) {
$normal_characters = "a-zA-Z0-9\s`~!@#$%^&*()_+-={}|:;<>?,.\/\"\'\\\[\]";
$normal_text = preg_replace("/[^$normal_characters]/", '', $text);
return $normal_text;
}
Then you can use it like this:
$before = 'Some "normal characters": Abc123!+, some ASCII characters: ABC+ŤĎ and some non-ASCII characters: Ąąśćł.';
$after = convert_to_normal_text($before);
echo $after;
Displays:
Some "normal characters": Abc123!+, some ASCII characters: ABC+ and some non-ASCII characters: .
I just had to add the header
header('Content-Type: text/html; charset=UTF-8');
This should be pretty straightforward, with no need for the iconv function:
// Remove all characters that are not the separator, a-z, 0-9, or whitespace
$string = preg_replace('![^'.preg_quote('-').'a-z0-9\s]+!', '', strtolower($string));
// Replace all separator characters and whitespace by a single separator
$string = preg_replace('!['.preg_quote('-').'\s]+!u', '-', $string);
My problem is solved
$text = 'Châu Thái Nhân 12/09/2022';
echo preg_replace('/[\x00-\x1F\x7F]/', '', $text);
//Châu Thái Nhân 12/09/2022
I think the best way to do something like this is with the ord() function. That way you can keep characters written in any language; just remember to check your text's ord() results first. This works on single-byte charsets only, not on Unicode (multi-byte) text.
$name="βγδεζηΘKgfgebhjrf!@#$%^&";
//this function will clear all non greek and english characters on greek-iso charset
function replace_characters($string)
{
$new_string=''; // initialize so the concatenation below does not hit an undefined variable
$str_length=strlen($string);
for ($x=0;$x<$str_length;$x++)
{
$character=$string[$x];
if ((ord($character)>64 && ord($character)<91) || (ord($character)>96 && ord($character)<123) || (ord($character)>192 && ord($character)<210) || (ord($character)>210 && ord($character)<218) || (ord($character)>219 && ord($character)<250) || ord($character)==252 || ord($character)==254)
{
$new_string=$new_string.$character;
}
}
return $new_string;
}
//end function
$name=replace_characters($name);
echo $name;
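
For UTF-8 text, where ord() only sees individual bytes, a Unicode script property in a regex can express the same "keep Greek and English letters" filter. This is a sketch with a made-up function name, and it swaps in a different technique from the ord() loop above.

```php
<?php
// Sketch only: keep_greek_and_latin() is a hypothetical name; assumes UTF-8 input.
function keep_greek_and_latin(string $text): string
{
    // \p{Greek} matches characters of the Greek script; A-Za-z covers English letters.
    return preg_replace('/[^\p{Greek}A-Za-z]+/u', '', $text) ?? '';
}

echo keep_greek_and_latin('βγδεζηΘKgfgebhjrf!@#$%^&'), "\n"; // βγδεζηΘKgfgebhjrf
```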

PHP - a function to "sanitize" a string

Is there any PHP function available that replaces spaces and underscores in a string with dashes?
Like:
Some Word
Some_Word
Some___Word
Some Word
Some ) # $ ^ Word
=> some-word
Basically, the sanitized string should only contain a-z characters, numbers (0-9), and dashes (-).
This should produce the desired result:
$someword = strtolower(preg_replace("/[^a-z0-9]+/i", "-", $theword));
<?php
function sanitize($s) {
// This RegEx removes any group of non-alphanumeric or dash
// character and replaces it/them with a dash
return strtolower(preg_replace('/[^a-z0-9-]+/i', '-', $s));
}
echo sanitize('Some Word') . "\n";
echo sanitize('Some_Word') . "\n";
echo sanitize('Some___Word') . "\n";
echo sanitize('Some Word') . "\n";
echo sanitize('Some ) # $ ^ Word') . "\n";
Output:
some-word
some-word
some-word
some-word
some-word
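
A slightly stricter variant, sketched here under a made-up name, also trims dashes from both ends, so leading or trailing punctuation does not leave a stray "-". Unlike the function above, it collapses existing dashes together with the other separators.

```php
<?php
// Sketch only: sanitize_slug() is a hypothetical name.
function sanitize_slug(string $s): string
{
    $s = strtolower($s);

    // Every run of characters outside a-z and 0-9 becomes a single dash.
    $s = preg_replace('/[^a-z0-9]+/', '-', $s) ?? '';

    // Remove dashes left over at the start or end.
    return trim($s, '-');
}

echo sanitize_slug('Some ) # $ ^ Word'), "\n"; // some-word
echo sanitize_slug(' --Some Word!! '), "\n";   // some-word
```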
You might like to try preg_replace:
http://php.net/manual/en/function.preg-replace.php
Example from this page:
<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
//April1,2003
?>
You might like to try a search for "search friendly URLs with PHP", as there is quite a bit of documentation. For example:
function friendlyURL($string){
$string = preg_replace("`\[.*\]`U","",$string);
$string = preg_replace('`&(amp;)?#?[a-z0-9]+;`i','-',$string);
$string = htmlentities($string, ENT_COMPAT, 'utf-8');
$string = preg_replace( "`&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);`i","\\1", $string );
$string = preg_replace( array("`[^a-z0-9]`i","`[-]+`") , "-", $string);
return strtolower(trim($string, '-'));
}
and usage:
$myFriendlyURL = friendlyURL("Barca rejects FIFA statement on Olympics row");
echo $myFriendlyURL; // will echo barca-rejects-fifa-statement-on-olympics-row
Source: http://htmlblog.net/seo-friendly-url-in-php/
I found a few interesting solutions around the web. Note that none of this is my code; it is simply copied here in the hope of helping you build a custom function for your own app.
This has been copied from Chyrp. Should work well for your needs!
/**
* Function: sanitize
* Returns a sanitized string, typically for URLs.
*
* Parameters:
* $string - The string to sanitize.
* $force_lowercase - Force the string to lowercase?
* $anal - If set to *true*, will remove all non-alphanumeric characters.
*/
function sanitize($string, $force_lowercase = true, $anal = false) {
$strip = array("~", "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "=", "+", "[", "{", "]",
"}", "\\", "|", ";", ":", "\"", "'", "‘", "’", "“", "”", "–", "—",
"—", "–", ",", "<", ".", ">", "/", "?");
$clean = trim(str_replace($strip, "", strip_tags($string)));
$clean = preg_replace('/\s+/', "-", $clean);
$clean = ($anal) ? preg_replace("/[^a-zA-Z0-9]/", "", $clean) : $clean ;
return ($force_lowercase) ?
(function_exists('mb_strtolower')) ?
mb_strtolower($clean, 'UTF-8') :
strtolower($clean) :
$clean;
}
EDIT:
An even easier function I found! Just a few lines of code, fairly self-explanatory.
function slug($z){
$z = strtolower($z);
$z = preg_replace('/[^a-z0-9 -]+/', '', $z);
$z = str_replace(' ', '-', $z);
return trim($z, '-');
}
Not sure why @Dagon chose to leave a comment instead of an answer, but here's an expansion of his answer.
PHP's preg_replace function allows you to replace anything with anything else.
Here's an example for your case:
$input = "a word 435 (*^(*& HaHa";
$dashesOnly = preg_replace("#[^-a-zA-Z0-9]+#", "-", $input);
print $dashesOnly; // prints a-word-435-HaHa;
You could write this piece of code with the help of regular expressions.
However, I don't see any available function that directly replaces " " with "-" for you.
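
For the narrow case of literal spaces and underscores, str_replace may be worth a look, since it accepts an array of search strings. The sketch below uses a made-up function name; note that it does not collapse repeated separators or remove other punctuation, so it only covers part of what the question asks for.

```php
<?php
// Sketch only: spaces_to_dashes() is a hypothetical name.
function spaces_to_dashes(string $text): string
{
    // Each space and each underscore is replaced by one dash; runs are not collapsed.
    return str_replace([' ', '_'], '-', $text);
}

echo spaces_to_dashes('Some_Word here'), "\n"; // Some-Word-here
echo spaces_to_dashes('Some___Word'), "\n";    // Some---Word
```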
