This is very strange, and I can't find anything similar on the internet.
I have a table of strings in Greek characters that contain a lot of special characters, so I wanted to remove them.
function clean($string) {
$string = preg_replace('/([$#!\?!\+\#\%\^\*\[\]\<\>\;\:\'\"\`\~\,\?\_\=\«\»])+/', ' ' ,$string);
$string = preg_replace('/\s+/', ' ',$string);
return $string;
}
$prok=clean($row['name']);
echo $row['name'].'-'.$prok;
This works fine except when the character Π is in the string.
In that case the Π is replaced with a question mark.
Does anyone have an idea what the problem could be?
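You can try using mb_ereg_replace, which supports multibyte strings. Note that mb_ereg_replace patterns are written without the /.../ delimiters, and the regex encoding should be set to UTF-8:
function clean($string) {
mb_regex_encoding('UTF-8');
$string = mb_ereg_replace('[$#!?+%^*\[\]<>;:\'"`~,_=«»]+', ' ' ,$string);
$string = mb_ereg_replace('\s+', ' ',$string);
return $string;
}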
$prok=clean($row['name']);
echo $row['name'].'-'.$prok;
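Or add the /u modifier so that PCRE treats both the pattern and the subject as UTF-8. Without it the string is matched byte by byte, so one byte of a multibyte character such as Π (0xCE 0xA0 in UTF-8) can be replaced on its own, leaving an invalid sequence that is displayed as a question mark: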
function clean($string) {
$string = preg_replace('/([$#!\?!\+\#\%\^\*\[\]\<\>\;\:\'\"\`\~\,\?\_\=\«\»])+/u', ' ' ,$string);
$string = preg_replace('/\s+/u', ' ',$string);
return $string;
}
$prok=clean($row['name']);
echo $row['name'].'-'.$prok;
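As a minimal sketch of the /u approach (the name clean_utf8 and the sample string are made up for illustration, and the character class is a simplified version of the one above), something like this should keep Π intact while still stripping the listed symbols:

```php
<?php
// Hypothetical helper: strips a fixed set of symbols from a UTF-8 string.
function clean_utf8(string $string): string
{
    // /u: pattern and subject are treated as UTF-8, so multibyte
    // characters such as Π, « and » are matched as whole characters.
    $result = preg_replace('/[$#!?+%^*\[\]<>;:\'"`~,_=«»]+/u', ' ', $string);
    if ($result === null) {
        // preg_replace() returns null on error, e.g. invalid UTF-8 input.
        return $string;
    }
    // Collapse runs of whitespace into a single space.
    return preg_replace('/\s+/u', ' ', $result) ?? $result;
}

echo clean_utf8('Πάτρα «test»!'); // the Greek letters are left untouched
```

If the input might not be valid UTF-8 (for example because the database connection uses another charset), it is worth checking it with mb_check_encoding() first, since /u fails on invalid input.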
Related
I have this code to find and replace a substring if it is between {{ and }}.
$string = 'This is my {{important keyword}}.';
$string = preg_replace('/{{(.*?)}}/', '$1', $string);
return $string;
How can I change the $1 part to produce something like this:
<a href="https://example.com/important-keyword">important keyword</a>
So the href needs to contain the matched item converted to a slug (words separated by dashes, no accents or special characters).
Thanks.
You have to use preg_replace_callback(), which lets a function modify each match.
Note also use ($url), which gives the closure access to the external variable.
Code:
$url = 'https://example.com';
$string = 'This is my {{important keyword}}.';
$string = preg_replace_callback('/{{(.*?)}}/', function($matches) use ($url) {
$newURL = $url . '/' . str_replace(' ', '-', $matches[1]);
return '<a href="' . $newURL . '">' . htmlentities($matches[1]) . '</a>';
}, $string);
echo $string;
Output:
This is my <a href="https://example.com/important-keyword">important keyword</a>.
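The callback above only replaces spaces with dashes. To also drop accents and special characters, as the question asks, a slug helper could be plugged into the callback. This is only a sketch: slugify and linkify are hypothetical names, and it assumes the iconv extension is available (the result of //TRANSLIT varies between platforms, so accented input may come out differently on your system).

```php
<?php
// Hypothetical helper: lower-case ASCII slug with words separated by dashes.
function slugify(string $text): string
{
    // Assumption: iconv is available; //TRANSLIT output is platform dependent.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);
    if ($ascii === false) {
        $ascii = $text; // fall back to the raw text if the conversion fails
    }
    // Replace every run of characters other than a-z and 0-9 with one dash.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));
    return trim($slug, '-');
}

// Hypothetical wrapper: turns each {{...}} marker into a link under $baseUrl.
function linkify(string $string, string $baseUrl): string
{
    return preg_replace_callback('/{{(.*?)}}/', function (array $m) use ($baseUrl) {
        $href = $baseUrl . '/' . slugify($m[1]);
        return '<a href="' . htmlspecialchars($href) . '">'
            . htmlspecialchars($m[1]) . '</a>';
    }, $string);
}

echo linkify('This is my {{Important Keyword}}.', 'https://example.com');
```

Escaping both the href and the link text with htmlspecialchars() keeps the markup valid even if the keyword contains quotes or angle brackets.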
I'd like to pad each multibyte character with spaces on either side. I can strip them out just fine, but I'd like to leave them in and just pad them.
For example: 👉😀👈 to 👉 😀 👈.
Using underscores to represent spaces: 👉😀👈 to _👉__😀__👈_
Use this monstrous, ready-made regex:
$regex = "[\\x{fe00}-\\x{fe0f}\\x{2712}\\x{2714}\\x{2716}\\x{271d}\\x{2721}\\x{2728}\\x{2733}\\x{2734}\\x{2744}\\x{2747}\\x{274c}\\x{274e}\\x{2753}-\\x{2755}\\x{2757}\\x{2763}\\x{2764}\\x{2795}-\\x{2797}\\x{27a1}\\x{27b0}\\x{27bf}\\x{2934}\\x{2935}\\x{2b05}-\\x{2b07}\\x{2b1b}\\x{2b1c}\\x{2b50}\\x{2b55}\\x{3030}\\x{303d}\\x{1f004}\\x{1f0cf}\\x{1f170}\\x{1f171}\\x{1f17e}\\x{1f17f}\\x{1f18e}\\x{1f191}-\\x{1f19a}\\x{1f201}\\x{1f202}\\x{1f21a}\\x{1f22f}\\x{1f232}-\\x{1f23a}\\x{1f250}\\x{1f251}\\x{1f300}-\\x{1f321}\\x{1f324}-\\x{1f393}\\x{1f396}\\x{1f397}\\x{1f399}-\\x{1f39b}\\x{1f39e}-\\x{1f3f0}\\x{1f3f3}-\\x{1f3f5}\\x{1f3f7}-\\x{1f4fd}\\x{1f4ff}-\\x{1f53d}\\x{1f549}-\\x{1f54e}\\x{1f550}-\\x{1f567}\\x{1f56f}\\x{1f570}\\x{1f573}-\\x{1f579}\\x{1f587}\\x{1f58a}-\\x{1f58d}\\x{1f590}\\x{1f595}\\x{1f596}\\x{1f5a5}\\x{1f5a8}\\x{1f5b1}\\x{1f5b2}\\x{1f5bc}\\x{1f5c2}-\\x{1f5c4}\\x{1f5d1}-\\x{1f5d3}\\x{1f5dc}-\\x{1f5de}\\x{1f5e1}\\x{1f5e3}\\x{1f5ef}\\x{1f5f3}\\x{1f5fa}-\\x{1f64f}\\x{1f680}-\\x{1f6c5}\\x{1f6cb}-\\x{1f6d0}\\x{1f6e0}-\\x{1f6e5}\\x{1f6e9}\\x{1f6eb}\\x{1f6ec}\\x{1f6f0}\\x{1f6f3}\\x{1f910}-\\x{1f918}\\x{1f980}-\\x{1f984}\\x{1f9c0}\\x{3297}\\x{3299}\\x{a9}\\x{ae}\\x{203c}\\x{2049}\\x{2122}\\x{2139}\\x{2194}-\\x{2199}\\x{21a9}\\x{21aa}\\x{231a}\\x{231b}\\x{2328}\\x{2388}\\x{23cf}\\x{23e9}-\\x{23f3}\\x{23f8}-\\x{23fa}\\x{24c2}\\x{25aa}\\x{25ab}\\x{25b6}\\x{25c0}\\x{25fb}-\\x{25fe}\\x{2600}-\\x{2604}\\x{260e}\\x{2611}\\x{2614}\\x{2615}\\x{2618}\\x{261d}\\x{2620}\\x{2622}\\x{2623}\\x{2626}\\x{262a}\\x{262e}\\x{262f}\\x{2638}-\\x{263a}\\x{2648}-\\x{2653}\\x{2660}\\x{2663}\\x{2665}\\x{2666}\\x{2668}\\x{267b}\\x{267f}\\x{2692}-\\x{2694}\\x{2696}\\x{2697}\\x{2699}\\x{269b}\\x{269c}\\x{26a0}\\x{26a1}\\x{26aa}\\x{26ab}\\x{26b0}\\x{26b1}\\x{26bd}\\x{26be}\\x{26c4}\\x{26c5}\\x{26c8}\\x{26ce}\\x{26cf}\\x{26d1}\\x{26d3}\\x{26d4}\\x{26e9}\\x{26ea}\\x{26f0}-\\x{26f5}\\x{26f7}-\\x{26fa}\\x{26fd}\\x{2702}\\x{2705}\\x{2708}-\\x{270d}\\x{270f}]|\\x{23}\\x{20e3}|\\x{2a}\\x{20e3}|\\x{30}\\x{20
e3}|\\x{31}\\x{20e3}|\\x{32}\\x{20e3}|\\x{33}\\x{20e3}|\\x{34}\\x{20e3}|\\x{35}\\x{20e3}|\\x{36}\\x{20e3}|\\x{37}\\x{20e3}|\\x{38}\\x{20e3}|\\x{39}\\x{20e3}|\\x{1f1e6}[\\x{1f1e8}-\\x{1f1ec}\\x{1f1ee}\\x{1f1f1}\\x{1f1f2}\\x{1f1f4}\\x{1f1f6}-\\x{1f1fa}\\x{1f1fc}\\x{1f1fd}\\x{1f1ff}]|\\x{1f1e7}[\\x{1f1e6}\\x{1f1e7}\\x{1f1e9}-\\x{1f1ef}\\x{1f1f1}-\\x{1f1f4}\\x{1f1f6}-\\x{1f1f9}\\x{1f1fb}\\x{1f1fc}\\x{1f1fe}\\x{1f1ff}]|\\x{1f1e8}[\\x{1f1e6}\\x{1f1e8}\\x{1f1e9}\\x{1f1eb}-\\x{1f1ee}\\x{1f1f0}-\\x{1f1f5}\\x{1f1f7}\\x{1f1fa}-\\x{1f1ff}]|\\x{1f1e9}[\\x{1f1ea}\\x{1f1ec}\\x{1f1ef}\\x{1f1f0}\\x{1f1f2}\\x{1f1f4}\\x{1f1ff}]|\\x{1f1ea}[\\x{1f1e6}\\x{1f1e8}\\x{1f1ea}\\x{1f1ec}\\x{1f1ed}\\x{1f1f7}-\\x{1f1fa}]|\\x{1f1eb}[\\x{1f1ee}-\\x{1f1f0}\\x{1f1f2}\\x{1f1f4}\\x{1f1f7}]|\\x{1f1ec}[\\x{1f1e6}\\x{1f1e7}\\x{1f1e9}-\\x{1f1ee}\\x{1f1f1}-\\x{1f1f3}\\x{1f1f5}-\\x{1f1fa}\\x{1f1fc}\\x{1f1fe}]|\\x{1f1ed}[\\x{1f1f0}\\x{1f1f2}\\x{1f1f3}\\x{1f1f7}\\x{1f1f9}\\x{1f1fa}]|\\x{1f1ee}[\\x{1f1e8}-\\x{1f1ea}\\x{1f1f1}-\\x{1f1f4}\\x{1f1f6}-\\x{1f1f9}]|\\x{1f1ef}[\\x{1f1ea}\\x{1f1f2}\\x{1f1f4}\\x{1f1f5}]|\\x{1f1f0}[\\x{1f1ea}\\x{1f1ec}-\\x{1f1ee}\\x{1f1f2}\\x{1f1f3}\\x{1f1f5}\\x{1f1f7}\\x{1f1fc}\\x{1f1fe}\\x{1f1ff}]|\\x{1f1f1}[\\x{1f1e6}-\\x{1f1e8}\\x{1f1ee}\\x{1f1f0}\\x{1f1f7}-\\x{1f1fb}\\x{1f1fe}]|\\x{1f1f2}[\\x{1f1e6}\\x{1f1e8}-\\x{1f1ed}\\x{1f1f0}-\\x{1f1ff}]|\\x{1f1f3}[\\x{1f1e6}\\x{1f1e8}\\x{1f1ea}-\\x{1f1ec}\\x{1f1ee}\\x{1f1f1}\\x{1f1f4}\\x{1f1f5}\\x{1f1f7}\\x{1f1fa}\\x{1f1ff}]|\\x{1f1f4}\\x{1f1f2}|\\x{1f1f5}[\\x{1f1e6}\\x{1f1ea}-\\x{1f1ed}\\x{1f1f0}-\\x{1f1f3}\\x{1f1f7}-\\x{1f1f9}\\x{1f1fc}\\x{1f1fe}]|\\x{1f1f6}\\x{1f1e6}|\\x{1f1f7}[\\x{1f1ea}\\x{1f1f4}\\x{1f1f8}\\x{1f1fa}\\x{1f1fc}]|\\x{1f1f8}[\\x{1f1e6}-\\x{1f1ea}\\x{1f1ec}-\\x{1f1f4}\\x{1f1f7}-\\x{1f1f9}\\x{1f1fb}\\x{1f1fd}-\\x{1f1ff}]|\\x{1f1f9}[\\x{1f1e6}\\x{1f1e8}\\x{1f1e9}\\x{1f1eb}-\\x{1f1ed}\\x{1f1ef}-\\x{1f1f4}\\x{1f1f7}\\x{1f1f9}\\x{1f1fb}\\x{1f1fc}\\x{1f1ff}]|\\x{1f1fa}[\\x{1f1e6}\\x{1f1ec}\\x{1f1f2}\\x{1f1f8}\\x{1f1fe}\\x{1f1ff}]|\\x
{1f1fb}[\\x{1f1e6}\\x{1f1e8}\\x{1f1ea}\\x{1f1ec}\\x{1f1ee}\\x{1f1f3}\\x{1f1fa}]|\\x{1f1fc}[\\x{1f1eb}\\x{1f1f8}]|\\x{1f1fd}\\x{1f1f0}|\\x{1f1fe}[\\x{1f1ea}\\x{1f1f9}]|\\x{1f1ff}[\\x{1f1e6}\\x{1f1f2}\\x{1f1fc}]";
Inside a preg_replace_callback():
var_dump(preg_replace_callback("#$regex#u", function($match) {
return " ".$match[0]." ";
}, '👉😀👈'));
Outputs:
string(18) " 👉  😀  👈 "
I found this function that someone had added in the PHP docs that splits a multibyte string into an array of characters (like str_split) and modified it.
function addSpaces($string) {
$strlen = mb_strlen($string);
$new_string = '';
while ($strlen) {
$char = mb_substr($string,0,1,"UTF-8");
if (strlen($char) > 1) {
$new_string .= " $char ";
} else {
$new_string .= $char;
}
$string = mb_substr($string,1,$strlen,"UTF-8");
$strlen = mb_strlen($string);
}
return $new_string;
}
This question has other ways to do that split that could be similarly modified. The modification is, if strlen of one of the split characters is greater than 1, then it's multibyte, so add the spaces.
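One of those alternative splits, sketched here under the assumption that the input is valid UTF-8, is preg_split() with an empty pattern and the /u modifier. The function name padMultibyte is made up for this example; the strlen() check is the same as in addSpaces above.

```php
<?php
// Hypothetical variant of addSpaces() that splits with preg_split().
function padMultibyte(string $string): string
{
    // '//u' splits a UTF-8 string into single characters.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $string; // invalid UTF-8: return the input unchanged
    }
    $out = '';
    foreach ($chars as $char) {
        // strlen() counts bytes, so > 1 means a multibyte character.
        $out .= strlen($char) > 1 ? " $char " : $char;
    }
    return $out;
}

echo padMultibyte('a👉b');
```

This walks the string once instead of calling mb_substr() repeatedly on a shrinking copy, which should matter only for long inputs.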
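A simple regex replace could work as well. Note that this pattern pads every character that is neither a letter nor whitespace, so digits and punctuation are padded too: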
mb_regex_encoding("UTF-8");
echo mb_ereg_replace(
'([^\p{L}\s])',
' \\1 ',
'text 👉😀👈 other text 👉😀👈'
);
outputs: text 👉 😀 👈 other text 👉 😀 👈
function pad_emojis($string) {
$default_encoding = mb_regex_encoding();
mb_regex_encoding("UTF-8");
$string = mb_ereg_replace('([^\p{L}\s])', ' \\1 ', $string);
mb_regex_encoding($default_encoding);
return $string;
}
I have a clean function that removes special characters from a string, but it also removes Turkish characters (ı, ğ, ş, ç, ö).
function clean($string) {
$string = str_replace(' ', ' ', $string);
$string = preg_replace('/[^A-Za-z0-9\-]/', ' ', $string);
return preg_replace('/-+/', '-', $string);
}
How can I fix it ?
Add the characters you want to keep to the character class, including upper-case forms if needed, and use the /u modifier so that they are matched as whole UTF-8 characters rather than as individual bytes. I edited your code:
function clean($string) {
$string = str_replace(' ', ' ', $string);
$string = preg_replace('/[^A-Za-z0-9\-ığşçöüÖÇŞİĞÜ]/u', ' ', $string);
return preg_replace('/-+/', '-', $string);
}
Test:
$str='Merhaba=Türkiye 12345 çok çalış another one ! *, !#_';
var_dump(clean($str));
//Output: string(57) "Merhaba Türkiye 12345 çok çalış another one "
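Instead of listing every accented letter by hand, the Unicode properties \p{L} (any letter) and \p{N} (any number) can be used with the /u modifier. This is a sketch under the assumption that the input is valid UTF-8; the name clean_letters is made up for the example.

```php
<?php
// Hypothetical variant of clean() that keeps letters of every alphabet.
function clean_letters(string $string): string
{
    // Replace everything that is not a letter, a number or a dash.
    $result = preg_replace('/[^\p{L}\p{N}\-]/u', ' ', $string);
    if ($result === null) {
        return $string; // preg_replace() failed, e.g. invalid UTF-8
    }
    // Collapse repeated dashes as in the original function.
    return preg_replace('/-+/', '-', $result);
}

var_dump(clean_letters('Merhaba=Türkiye çok çalış!'));
```

The trade-off is that this keeps letters from all scripts, not just Turkish ones, which may or may not be what you want.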
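You can use iconv to replace special characters, e.g. à -> a, è -> e. The result of //TRANSLIT depends on the platform's iconv implementation, so the outputs shown in the comments may differ on your system.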
<?php
$string = "ʿABBĀSĀBĀD";
echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $string);
// output: [nothing, and you get a notice]
echo iconv('UTF-8', 'ISO-8859-1//IGNORE', $string);
// output: ABBSBD
echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $string);
// output: ABBASABAD
// Yay! That's what I wanted!
?>
Credits:
https://gist.github.com/swas/10643194
#dmp y
#Nisse Engström
Maybe you can try:
function clean($string) {
$string = str_replace(' ', ' ', $string);
// /u makes the Turkish letters match as whole UTF-8 characters
$string = preg_replace('/[^A-Za-z0-9ĞİŞığşçö\-]/u', ' ', $string);
return preg_replace('/-+/', '-', $string);
}
Which special characters do you want to replace?
It may be easier to switch the cleaning approach from "everything except ..." to an explicit list of characters to replace.
<?php
function garbagereplace($string) {
// Explicit list of characters to replace with a dash
$garbagearray = array('@','#','$','%','^','&','*');
// str_replace() accepts an array of search strings, so no loop is needed
return str_replace($garbagearray, '-', $string);
}
echo garbagereplace('text@#$text%^&*text');
?>
I have a function that replaces all hashtags with hrefs.
function hash_me($ret) {
$ret = preg_replace('/(\#)([^\s]+)/', ' #$2 ', $ret);
}
It works well: it returns the string with the hashtags as links and the other words unchanged.
However, I want to replace only hashtags that contain English characters; non-English hashtags should be ignored.
How can I merge:
preg_match('/#[^a-z\d]/i',$da_string)
with the above function?
Thank you!
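You can use the Unicode script class Latin (with the /u modifier so that the subject is treated as UTF-8):
function hash_me($ret) {
$ret = preg_replace('/#([\p{Latin}0-9]+)/u', ' $0 ', $ret);
}
But keep in mind that Latin and English are two different things: the Latin script also covers accented letters such as é or ñ.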
For only english characters:
function hash_me($ret) {
$ret = preg_replace('/#([a-z0-9]+)/i', ' $0 ', $ret);
}
or shorter:
function hash_me($ret) {
$ret = preg_replace('/#([^\W_]+)/', ' $0 ', $ret);
}
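One caveat with the patterns above is that they can match the ASCII prefix of a mixed hashtag such as #phpé. A possible refinement, sketched here with a made-up function name and a placeholder /tag/ URL (neither comes from the question), is to require whitespace or the string boundary on both sides of the tag:

```php
<?php
// Hypothetical helper: links only hashtags made entirely of A-Z, a-z, 0-9.
function link_english_hashtags(string $text): string
{
    // (?<!\S) and (?!\S) make sure the tag is a whole whitespace-delimited
    // token, so "#phpé" or "#καλημέρα" are left alone.
    $pattern = '/(?<!\S)#([A-Za-z0-9]+)(?!\S)/u';
    // Assumption: "/tag/..." is only a placeholder link target.
    $result = preg_replace($pattern, '<a href="/tag/$1">#$1</a>', $text);
    return $result ?? $text; // null means the input was not valid UTF-8
}

echo link_english_hashtags('learning #php and #καλημέρα');
```

The explicit A-Za-z range is used instead of the /i modifier because, combined with /u, case-insensitive matching can also accept a few non-ASCII look-alikes such as the Kelvin sign.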
I'm using the below code to try and convert to slug and for some reason it's not echoing anything. I know I'm missing something extremely obvious. Am I not calling the function?
<?php
$string = "Can't You Convert This To A Slug?";
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
echo $string;
}
?>
You are echoing after the function has already returned, so the echo is never reached. You also never call the function.
Try it like this:
function clean_string($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}
$some = clean_string("Can't You Convert This To A Slug?");
echo $some;
Or like this:
function clean_me(&$string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}
$some = "Can't You Convert This To A Slug?";
clean_me($some);
echo $some;
<?php
$string = "Can't You Convert This To A Slug?";
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}
$string = clean($string);
echo $string;
?>