preg_replace only for non-english characters


I have a function that replaces every hashtag with a link (href).
function hash_me($ret) {
$ret = preg_replace('/(\#)([^\s]+)/', ' #$2 ', $ret);
}
It works well: it returns the string with the hashtags turned into links and the remaining, non-hashtag words left alone.
However, I want to link only hashtags that consist of English characters. Non-English hashtags should be ignored.
How can I combine
preg_match('/#[^a-z\d]/i',$da_string)
with the function above?
Thank you!

You can use the Unicode script class \p{Latin} (the u modifier is needed so that the subject is treated as UTF-8):
function hash_me($ret) {
return preg_replace('/#([\p{Latin}0-9]+)/u', ' $0 ', $ret);
}
But keep in mind that Latin and English are not the same thing: the Latin script also covers accented letters such as é or ł.
For English characters only:
function hash_me($ret) {
return preg_replace('/#([a-z0-9]+)/i', ' $0 ', $ret);
}
or shorter:
function hash_me($ret) {
return preg_replace('/#([^\W_]+)/', ' $0 ', $ret);
}
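A minimal sketch of how the English-only pattern might be combined with link generation. The function name link_english_hashtags and the /tag/ URL scheme are assumptions made for illustration; they do not come from the question. The lookarounds make the pattern skip a hashtag entirely when it contains any other character, which also means a tag followed directly by punctuation is skipped.

```php
<?php
// Hypothetical helper: links only hashtags made of ASCII letters and digits.
// The "/tag/" URL scheme is an assumption for illustration.
function link_english_hashtags(string $text): string
{
    return preg_replace(
        // (?<!\S) and (?!\S) require whitespace or a string boundary
        // on both sides, so "#καλημέρα" or "#café" is left untouched.
        '/(?<!\S)#([a-z0-9]+)(?!\S)/i',
        '<a href="/tag/$1">#$1</a>',
        $text
    );
}

echo link_english_hashtags('Learning #php and #καλημέρα today');
```

If tags followed by punctuation should be linked as well, the trailing lookahead would need to be relaxed.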

Related

Regular expressions in php deletes dots, commas, etc

I have this code:
$allergens = ['mleko', 'masło']; // list of allergens to highlight
$words = explode(' ', $value); // create an array of words
foreach($words as $word) { //iterate through words
$word = preg_replace('/[^\w]/uis', '', $word);
if (in_array(mb_strtolower($word), $allergens)) {
$return .= "<b>" . $word . "</b> ";
} else {
$return .= $word . " ";
}
}
The above code works, but it deletes characters such as commas and periods.
How can I fix it? :)

The problem lies in the approach you took, not only in the line
$word = preg_replace('/[^\w]/uis', '', $word);
which can be extended with the characters of the Polish alphabet (without the u modifier, \w covers only [a-zA-Z0-9_]; remember to list the lowercase and uppercase characters separately), as in this line
$word = preg_replace('/[^\wąćęłńóśźżĄĆĘŁŃÓŚŹŻ]/uis', '', $word);
Moreover, I believe the result of that line is stored incorrectly. You should save it in a separate variable, as below
$rawWord = preg_replace('/[^\wąćęłńóśźżĄĆĘŁŃÓŚŹŻ]/uis', '', $word);
This way you have access to both the cleaned and the original value, which you can use like this
if (in_array(mb_strtolower($rawWord), $allergens)) {
$return .= str_replace($rawWord, "<b>{$rawWord}</b> ", $word);
} else {
$return .= $word;
}
With this approach, however, you will still lose some characters, including the spaces that explode() removed earlier. Instead of concatenating a string, build an array and join it with spaces at the end. Complete code below.
$allergens = ['jogurt', 'jaja', 'żytni', "jogurt", "banan"];
$value = 'Chleb żytni, masło z mleka, jogurt naturalny z mleka, jaja, pieczeń rzymska z kaszą gryczaną.';
$returns = [];
$words = explode(' ', $value); // create an array of words
foreach($words as $word) { //iterate through words
$rawWord = preg_replace('/[^\wąćęłńóśźżĄĆĘŁŃÓŚŹŻ]/uis', '', $word);
if (in_array(mb_strtolower($rawWord), $allergens)) {
$returns[] = str_replace($rawWord, "<b>{$rawWord}</b>", $word);
} else {
$returns[] = $word;
}
}
$return = implode(' ', $returns);
Look at this line
$returns[] = str_replace($rawWord, "<b>{$rawWord}</b>", $word);
It replaces the cleaned word inside the original word (which still contains the characters you want to ignore) with a bold version. This keeps characters such as commas attached to the word.
At the end, the $return variable will contain something like this
Chleb <b>żytni</b>, masło z mleka, <b>jogurt</b> naturalny z mleka, <b>jaja</b>, pieczeń rzymska z kaszą gryczaną.
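As an alternative sketch (not the approach from the answer above), the same result might be reached without splitting on spaces at all, by matching runs of letters with \p{L} and wrapping only the matches. The function name bold_allergens is hypothetical, and the sketch assumes valid UTF-8 input and the mbstring extension.

```php
<?php
// Hypothetical helper: wraps every word found in $allergens in <b> tags.
// Punctuation and spacing are untouched because only letter runs are matched.
function bold_allergens(string $text, array $allergens): string
{
    return preg_replace_callback(
        '/\p{L}+/u', // one or more Unicode letters
        function (array $m) use ($allergens): string {
            $word = $m[0];
            // Compare case-insensitively against the lowercase allergen list.
            return in_array(mb_strtolower($word, 'UTF-8'), $allergens, true)
                ? "<b>{$word}</b>"
                : $word;
        },
        $text
    );
}

echo bold_allergens('Chleb żytni, jaja.', ['żytni', 'jaja']);
```

Because the text is never split and rejoined, commas, periods and the original spacing survive unchanged.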

How can I pad each multibyte character / emoji with spaces around it in a string?

I'd like to pad each multibyte character with spaces on either side. I can strip them out just fine, but I'd like to leave them in and just pad them.
For example: 👉😀👈 to 👉 😀 👈.
Using underscores to represent spaces: 👉😀👈 to _👉__😀__👈_

Use this monstrous, ready-made regex:
$regex = "[\\x{fe00}-\\x{fe0f}\\x{2712}\\x{2714}\\x{2716}\\x{271d}\\x{2721}\\x{2728}\\x{2733}\\x{2734}\\x{2744}\\x{2747}\\x{274c}\\x{274e}\\x{2753}-\\x{2755}\\x{2757}\\x{2763}\\x{2764}\\x{2795}-\\x{2797}\\x{27a1}\\x{27b0}\\x{27bf}\\x{2934}\\x{2935}\\x{2b05}-\\x{2b07}\\x{2b1b}\\x{2b1c}\\x{2b50}\\x{2b55}\\x{3030}\\x{303d}\\x{1f004}\\x{1f0cf}\\x{1f170}\\x{1f171}\\x{1f17e}\\x{1f17f}\\x{1f18e}\\x{1f191}-\\x{1f19a}\\x{1f201}\\x{1f202}\\x{1f21a}\\x{1f22f}\\x{1f232}-\\x{1f23a}\\x{1f250}\\x{1f251}\\x{1f300}-\\x{1f321}\\x{1f324}-\\x{1f393}\\x{1f396}\\x{1f397}\\x{1f399}-\\x{1f39b}\\x{1f39e}-\\x{1f3f0}\\x{1f3f3}-\\x{1f3f5}\\x{1f3f7}-\\x{1f4fd}\\x{1f4ff}-\\x{1f53d}\\x{1f549}-\\x{1f54e}\\x{1f550}-\\x{1f567}\\x{1f56f}\\x{1f570}\\x{1f573}-\\x{1f579}\\x{1f587}\\x{1f58a}-\\x{1f58d}\\x{1f590}\\x{1f595}\\x{1f596}\\x{1f5a5}\\x{1f5a8}\\x{1f5b1}\\x{1f5b2}\\x{1f5bc}\\x{1f5c2}-\\x{1f5c4}\\x{1f5d1}-\\x{1f5d3}\\x{1f5dc}-\\x{1f5de}\\x{1f5e1}\\x{1f5e3}\\x{1f5ef}\\x{1f5f3}\\x{1f5fa}-\\x{1f64f}\\x{1f680}-\\x{1f6c5}\\x{1f6cb}-\\x{1f6d0}\\x{1f6e0}-\\x{1f6e5}\\x{1f6e9}\\x{1f6eb}\\x{1f6ec}\\x{1f6f0}\\x{1f6f3}\\x{1f910}-\\x{1f918}\\x{1f980}-\\x{1f984}\\x{1f9c0}\\x{3297}\\x{3299}\\x{a9}\\x{ae}\\x{203c}\\x{2049}\\x{2122}\\x{2139}\\x{2194}-\\x{2199}\\x{21a9}\\x{21aa}\\x{231a}\\x{231b}\\x{2328}\\x{2388}\\x{23cf}\\x{23e9}-\\x{23f3}\\x{23f8}-\\x{23fa}\\x{24c2}\\x{25aa}\\x{25ab}\\x{25b6}\\x{25c0}\\x{25fb}-\\x{25fe}\\x{2600}-\\x{2604}\\x{260e}\\x{2611}\\x{2614}\\x{2615}\\x{2618}\\x{261d}\\x{2620}\\x{2622}\\x{2623}\\x{2626}\\x{262a}\\x{262e}\\x{262f}\\x{2638}-\\x{263a}\\x{2648}-\\x{2653}\\x{2660}\\x{2663}\\x{2665}\\x{2666}\\x{2668}\\x{267b}\\x{267f}\\x{2692}-\\x{2694}\\x{2696}\\x{2697}\\x{2699}\\x{269b}\\x{269c}\\x{26a0}\\x{26a1}\\x{26aa}\\x{26ab}\\x{26b0}\\x{26b1}\\x{26bd}\\x{26be}\\x{26c4}\\x{26c5}\\x{26c8}\\x{26ce}\\x{26cf}\\x{26d1}\\x{26d3}\\x{26d4}\\x{26e9}\\x{26ea}\\x{26f0}-\\x{26f5}\\x{26f7}-\\x{26fa}\\x{26fd}\\x{2702}\\x{2705}\\x{2708}-\\x{270d}\\x{270f}]|\\x{23}\\x{20e3}|\\x{2a}\\x{20e3}|\\x{30}\\x{20e3}|\\x{31}\\x{20e3}|\\x{32}\\x{20e3}|\\x{33}\\x{20e3}|\\x{34}\\x{20e3}|\\x{35}\\x{20e3}|\\x{36}\\x{20e3}|\\x{37}\\x{20e3}|\\x{38}\\x{20e3}|\\x{39}\\x{20e3}|\\x{1f1e6}[\\x{1f1e8}-\\x{1f1ec}\\x{1f1ee}\\x{1f1f1}\\x{1f1f2}\\x{1f1f4}\\x{1f1f6}-\\x{1f1fa}\\x{1f1fc}\\x{1f1fd}\\x{1f1ff}]|\\x{1f1e7}[\\x{1f1e6}\\x{1f1e7}\\x{1f1e9}-\\x{1f1ef}\\x{1f1f1}-\\x{1f1f4}\\x{1f1f6}-\\x{1f1f9}\\x{1f1fb}\\x{1f1fc}\\x{1f1fe}\\x{1f1ff}]|\\x{1f1e8}[\\x{1f1e6}\\x{1f1e8}\\x{1f1e9}\\x{1f1eb}-\\x{1f1ee}\\x{1f1f0}-\\x{1f1f5}\\x{1f1f7}\\x{1f1fa}-\\x{1f1ff}]|\\x{1f1e9}[\\x{1f1ea}\\x{1f1ec}\\x{1f1ef}\\x{1f1f0}\\x{1f1f2}\\x{1f1f4}\\x{1f1ff}]|\\x{1f1ea}[\\x{1f1e6}\\x{1f1e8}\\x{1f1ea}\\x{1f1ec}\\x{1f1ed}\\x{1f1f7}-\\x{1f1fa}]|\\x{1f1eb}[\\x{1f1ee}-\\x{1f1f0}\\x{1f1f2}\\x{1f1f4}\\x{1f1f7}]|\\x{1f1ec}[\\x{1f1e6}\\x{1f1e7}\\x{1f1e9}-\\x{1f1ee}\\x{1f1f1}-\\x{1f1f3}\\x{1f1f5}-\\x{1f1fa}\\x{1f1fc}\\x{1f1fe}]|\\x{1f1ed}[\\x{1f1f0}\\x{1f1f2}\\x{1f1f3}\\x{1f1f7}\\x{1f1f9}\\x{1f1fa}]|\\x{1f1ee}[\\x{1f1e8}-\\x{1f1ea}\\x{1f1f1}-\\x{1f1f4}\\x{1f1f6}-\\x{1f1f9}]|\\x{1f1ef}[\\x{1f1ea}\\x{1f1f2}\\x{1f1f4}\\x{1f1f5}]|\\x{1f1f0}[\\x{1f1ea}\\x{1f1ec}-\\x{1f1ee}\\x{1f1f2}\\x{1f1f3}\\x{1f1f5}\\x{1f1f7}\\x{1f1fc}\\x{1f1fe}\\x{1f1ff}]|\\x{1f1f1}[\\x{1f1e6}-\\x{1f1e8}\\x{1f1ee}\\x{1f1f0}\\x{1f1f7}-\\x{1f1fb}\\x{1f1fe}]|\\x{1f1f2}[\\x{1f1e6}\\x{1f1e8}-\\x{1f1ed}\\x{1f1f0}-\\x{1f1ff}]|\\x{1f1f3}[\\x{1f1e6}\\x{1f1e8}\\x{1f1ea}-\\x{1f1ec}\\x{1f1ee}\\x{1f1f1}\\x{1f1f4}\\x{1f1f5}\\x{1f1f7}\\x{1f1fa}\\x{1f1ff}]|\\x{1f1f4}\\x{1f1f2}|\\x{1f1f5}[\\x{1f1e6}\\x{1f1ea}-\\x{1f1ed}\\x{1f1f0}-\\x{1f1f3}\\x{1f1f7}-\\x{1f1f9}\\x{1f1fc}\\x{1f1fe}]|\\x{1f1f6}\\x{1f1e6}|\\x{1f1f7}[\\x{1f1ea}\\x{1f1f4}\\x{1f1f8}\\x{1f1fa}\\x{1f1fc}]|\\x{1f1f8}[\\x{1f1e6}-\\x{1f1ea}\\x{1f1ec}-\\x{1f1f4}\\x{1f1f7}-\\x{1f1f9}\\x{1f1fb}\\x{1f1fd}-\\x{1f1ff}]|\\x{1f1f9}[\\x{1f1e6}\\x{1f1e8}\\x{1f1e9}\\x{1f1eb}-\\x{1f1ed}\\x{1f1ef}-\\x{1f1f4}\\x{1f1f7}\\x{1f1f9}\\x{1f1fb}\\x{1f1fc}\\x{1f1ff}]|\\x{1f1fa}[\\x{1f1e6}\\x{1f1ec}\\x{1f1f2}\\x{1f1f8}\\x{1f1fe}\\x{1f1ff}]|\\x{1f1fb}[\\x{1f1e6}\\x{1f1e8}\\x{1f1ea}\\x{1f1ec}\\x{1f1ee}\\x{1f1f3}\\x{1f1fa}]|\\x{1f1fc}[\\x{1f1eb}\\x{1f1f8}]|\\x{1f1fd}\\x{1f1f0}|\\x{1f1fe}[\\x{1f1ea}\\x{1f1f9}]|\\x{1f1ff}[\\x{1f1e6}\\x{1f1f2}\\x{1f1fc}]";
Inside a preg_replace_callback():
var_dump(preg_replace_callback("#$regex#u", function($match) {
return " ".$match[0]." ";
}, '👉😀👈'));
Outputs:
string(18) " 👉  😀  👈 "

I found a function that someone had added to the PHP docs; it splits a multibyte string into an array of characters (like str_split). I modified it:
function addSpaces($string) {
$strlen = mb_strlen($string);
$new_string = '';
while ($strlen) {
$char = mb_substr($string,0,1,"UTF-8");
if (strlen($char) > 1) {
$new_string .= " $char ";
} else {
$new_string .= $char;
}
$string = mb_substr($string,1,$strlen,"UTF-8");
$strlen = mb_strlen($string);
}
return $new_string;
}
Other ways of doing that split could be modified similarly. The modification is: if strlen() of a split character is greater than 1, the character is multibyte, so add the spaces.
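One such variant, as a sketch: preg_split() with an empty pattern and the u modifier splits a UTF-8 string into code points, and the same strlen() check then decides which ones to pad. The function name pad_multibyte is hypothetical, and the sketch assumes valid UTF-8 input (on invalid input it returns the string unchanged).

```php
<?php
// Hypothetical helper: pads every multibyte character with a space on each side.
function pad_multibyte(string $text): string
{
    // Split into single code points; returns false on invalid UTF-8.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $text;
    }

    $out = '';
    foreach ($chars as $char) {
        // strlen() counts bytes, so a result above 1 means a multibyte character.
        $out .= strlen($char) > 1 ? " {$char} " : $char;
    }
    return $out;
}

echo pad_multibyte('a👉b');
```

Like the function above, this pads every multibyte character, so accented letters are padded as well as emoji.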

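A simple regex replace could work as well. Note that this pattern pads every character that is neither a letter nor whitespace, so digits and punctuation are padded too: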
mb_regex_encoding("UTF-8");
echo mb_ereg_replace(
'([^\p{L}\s])',
' \\1 ',
'text 👉😀👈 other text 👉😀👈'
);
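outputs (runs of spaces shown collapsed): text 👉 😀 👈 other text 👉 😀 👈
The same replacement wrapped in a function that restores the previous regex encoding: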
function pad_emojis($string) {
$default_encoding = mb_regex_encoding();
mb_regex_encoding("UTF-8");
$string = mb_ereg_replace('([^\p{L}\s])', ' \\1 ', $string);
mb_regex_encoding($default_encoding);
return $string;
}

Regular expression change the text, when it shouldnt

This is very strange, and I can't find anything similar on the internet.
I have a table of strings in Greek characters that contain a lot of special characters, so I wanted to remove them.
function clean($string) {
$string = preg_replace('/([$#!\?!\+\#\%\^\*\[\]\<\>\;\:\'\"\`\~\,\?\_\=\«\»])+/', ' ' ,$string);
$string = preg_replace('/\s+/', ' ',$string);
return $string;
}
$prok=clean($row['name']);
echo $row['name'].'-'.$prok;
This works fine except when the character Π is in the string.
In that case, the Π is replaced with a question mark.
Does anyone have an idea what the problem could be?

You can try mb_ereg_replace, which supports multibyte strings (note that it takes the pattern without delimiters):
function clean($string) {
mb_regex_encoding('UTF-8');
$string = mb_ereg_replace('[$#!?+%^*\[\]<>;:\'"`~,_=«»]+', ' ', $string);
$string = mb_ereg_replace('\s+', ' ', $string);
return $string;
}
$prok=clean($row['name']);
echo $row['name'].'-'.$prok;
Or use the /u modifier for Unicode strings:
function clean($string) {
$string = preg_replace('/([$#!\?!\+\#\%\^\*\[\]\<\>\;\:\'\"\`\~\,\?\_\=\«\»])+/u', ' ' ,$string);
$string = preg_replace('/\s+/u', ' ',$string);
return $string;
}
$prok=clean($row['name']);
echo $row['name'].'-'.$prok;
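A short sketch of the /u variant applied to a Greek string. Without the u modifier the pattern is applied byte by byte, so a multibyte character can be partly replaced; one plausible trigger (an assumption, since it depends on the PCRE build and locale) is \s matching the byte 0xA0, which is the second byte of Π in UTF-8. The function name clean_utf8 and the sample string are illustrative only, and the character class is written without the redundant escapes.

```php
<?php
// Hypothetical helper: same idea as clean(), with both patterns in UTF-8 mode.
function clean_utf8(string $text): string
{
    // Replace each run of the listed special characters with one space.
    $text = preg_replace('/[$#!?+%^*\[\]<>;:\'"`~,_=«»]+/u', ' ', $text);
    // Collapse runs of whitespace into a single space.
    return preg_replace('/\s+/u', ' ', $text);
}

echo clean_utf8('«Πάτρα»!! #1');
```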

Regular expression for finding multiple patterns from a given string

I am using a regular expression to get multiple patterns from a given string.
Let me explain:
$string = "about us";
$newtag = preg_replace("/ /", "_", $string);
print_r($newtag);
The above is my code.
Here I find the space between the words and replace it with whatever special character I need.
Now I need a regular expression that gives me patterns like
about_us, about-us, aboutus as output if I give about us as input.
Is this possible?
Please help me with that.
Thanks in advance!

In the end, my answer is:
$string = "contact_us";
$a = array('-','_',' ');
foreach($a as $b){
if(strpos($string,$b) !== false){ // strict check: strpos() can return 0 for a match at the start
$separators = array('-','_','',' ');
$outputs = array();
foreach ($separators as $sep) {
$outputs[] = preg_replace("/".$b."/", $sep, $string);
}
print_r($outputs);
}
}
exit;

You need a loop to produce several outputs:
$separators = array('-','_','');
$string = "about us";
$outputs = array();
foreach ($separators as $sep) {
$outputs[] = preg_replace("/ /", $sep, $string);
}
print_r($outputs);

You can try without regex:
$string = 'about us';
$specialChar = '-'; // or any other
$newtag = implode($specialChar, explode(' ', $string));
If you put special characters into an array:
$specialChars = array('_', '-', '');
$newtags = array();
foreach ($specialChars as $specialChar) {
$newtags[] = implode($specialChar, explode(' ', $string));
}
You can also use just str_replace():
foreach ($specialChars as $specialChar) {
$newtags[] = str_replace(' ', $specialChar, $string);
}
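The loop above can be wrapped in a small function, sketched here. The name separator_variants and the default separator list are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns one variant of $text per separator.
function separator_variants(string $text, array $separators = ['_', '-', '']): array
{
    $variants = [];
    foreach ($separators as $sep) {
        // Plain string replacement is enough here; no regex is needed.
        $variants[] = str_replace(' ', $sep, $text);
    }
    return $variants;
}

print_r(separator_variants('about us'));
```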

Not knowing exactly what you want to do, I expect that you might want to replace any run of one or more non-word characters with a single dash,
e.g.
preg_replace('/\W+/', '-', $string);

If you just want to replace the space, use \s
<?php
$string = "about us";
$replacewith = "_";
$newtag = preg_replace("/\s/", $replacewith, $string);
print_r($newtag);
?>

I am not sure that regexes are the right tool for that. However, you can simply define a function like this:
function rep($str) {
return array( strtr($str, ' ', '_'),
strtr($str, ' ', '-'),
str_replace(' ', '', $str) );
}
$result = rep('about us');
print_r($result);

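\W matches any character that is not a word character:
$string = "about us";
$newtag = preg_replace("/(\W)/", "_", $string); // PHP has no g modifier; preg_replace already replaces all matches
print_r($newtag);
This works if that is all you need, but you may run into problems with a longer string, since every non-word character is replaced :)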

How to uppercase first letter after a hyphen, ie Adam Smith-Jones

I'm looking for a way to uppercase the first letter of each part of a name, including parts joined by a hyphen: adam smith-jones needs to become Adam Smith-Jones.
ucwords() (or ucfirst() if I split them into firstname, lastname) only produces Adam Smith-jones.

$string = implode('-', array_map('ucfirst', explode('-', ucwords($string))));
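As a sketch of a shorter route: in current PHP versions ucwords() accepts an optional second argument listing the characters that count as word delimiters, so the hyphen can simply be added to it. The variable names are illustrative.

```php
<?php
$name = 'adam smith-jones';

// Treat both the space and the hyphen as word delimiters.
$capitalized = ucwords($name, " -");

echo $capitalized;
```

This only uppercases the first letter of each part; it does not lowercase the rest, so input such as ADAM would need strtolower() first.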

What do you think about the following code?
mb_convert_case(mb_strtolower($value, "UTF-8"), MB_CASE_TITLE, "UTF-8");
Please note that this also handles accented characters (useful for some languages such as French).

Is this ok ?
function to_upper($name)
{
$name=ucwords($name);
$arr=explode('-', $name);
$name=array();
foreach($arr as $v)
{
$name[]=ucfirst($v);
}
$name=implode('-', $name);
return $name;
}
echo to_upper("adam smith-jones");

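Another way:
<?php
$str = 'adam smith-jones';
// The /e modifier was removed in PHP 7, so use preg_replace_callback() instead
echo preg_replace_callback('/-([a-z])/', function ($m) {
return '-' . strtoupper($m[1]);
}, ucwords($str));
?>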

/**
* Uppercase words including after a hyphen
*
* @param string $text lower-case text
* @return string Upper-Case text
*/
function uc_hyphenated_words($text)
{
return str_replace("- ","-",ucwords(str_replace("-","- ",$text)));
}

<?php
// note - this does NOT do what you want - but I think does what you said
// perhaps you can modify it to do what you want - or we can help if you can
// provide a bit more about the data you need to update
$string_of_text = "We would like to welcome Adam Smith-jones to our 3rd, 'I am addicted to stackoverflow-posting' event.";
// both Smith-Jones and Stackoverflow-Posting should result
// may be wrong
$words = explode(' ',$string_of_text);
foreach($words as $index=>$word) {
if(false !== strpos($word, '-')) { // haystack first, then needle
$parts = explode('-',$word);
$newWords = array();
foreach($parts as $wordIndex=>$part) {
$newWords[] = ucwords($part);
}
$words[$index] = implode('-',$newWords);
}
}
$words = implode(' ',$words);
?>
Something akin to this - untested - for the purposes of making sure I understand the question.

You can use ucwords() to capitalize all words at once, together with implode() and explode(), like this:
ucwords(implode(" ", explode("_", "my_concatinated_word_string")));

function capWords($string) {
$string = str_replace("-", " - ", $string);
$string = ucwords(strtolower($string));
$string = str_replace(" - ", "-", $string);
return $string;
}

Here is a simple function that can convert all the words in a string to title case:
function toTitleCase($string) {
return preg_replace_callback('/\w+/', function ($match) {
return ucfirst(strtolower($match[0]));
}, $string);
}
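A Unicode-aware variant of the same idea, as a sketch: it matches runs of letters with \p{L} and uses the mbstring functions so that accented first letters are uppercased too. The function name title_case_utf8 is hypothetical, and the sketch assumes valid UTF-8 input and the mbstring extension.

```php
<?php
// Hypothetical helper: title-cases every run of letters, including after hyphens.
function title_case_utf8(string $name): string
{
    return preg_replace_callback(
        '/\p{L}+/u', // a hyphen or space ends the run, so each part is handled separately
        function (array $m): string {
            $first = mb_strtoupper(mb_substr($m[0], 0, 1, 'UTF-8'), 'UTF-8');
            $rest  = mb_strtolower(mb_substr($m[0], 1, null, 'UTF-8'), 'UTF-8');
            return $first . $rest;
        },
        $name
    );
}

echo title_case_utf8('adam smith-jones');
```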
