PHP - How to modify matched pattern and replace

I have a string that contains stray spaces inside its HTML tags:
$mystr = "< h3> hello mom ?< / h3>";
So I wrote a regular expression to detect those spaces:
$pattern = '/(?<=<)\s\w+|\s\/\s\w+|\s\/(?=>)/mi';
Next, I want to modify the matches by removing the spaces from them and then replace them in the string. Any idea how this can be done, so that my string ends up like this?
"&lt;h3&gt; hello mom ?&lt;/h3&gt;"
I know there is the PHP function preg_replace, but I am not sure how I can modify the matches with it:
$result = preg_replace( $pattern, $replace , $mystr );

For specific tags like the ones you showed, you can use
$result = preg_replace_callback('/&lt;(?:\s*\/)?\s*\w+\s*&gt;/ui', function($m) {
    return preg_replace('/\s+/u', '', $m[0]);
}, $mystr);
The regex - note the u flag to deal with Unicode chars in the string - matches
&lt; - a literal string
(?:\s*\/)? - an optional sequence of zero or more whitespaces and a / char
\s* - zero or more whitespaces
\w+ - one or more word chars
\s* - zero or more whitespaces
&gt; - a literal string.
The preg_replace('/\s+/u', '', $m[0]) line in the anonymous callback function removes every run of whitespace inside the match (even non-breaking spaces).
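A minimal, self-contained sketch of the callback approach above. It assumes the input really contains the entity-encoded form (&lt; and &gt;) that the pattern targets; the sample string is adapted from the question and is not the asker's exact data.

```php
<?php
// Assumed input: the question's string with entity-encoded angle brackets.
$mystr = '&lt; h3&gt; hello mom ?&lt; / h3&gt;';

// Match each entity-encoded tag, then strip whitespace inside that match only.
$result = preg_replace_callback(
    '/&lt;(?:\s*\/)?\s*\w+\s*&gt;/ui',
    function ($m) {
        return preg_replace('/\s+/u', '', $m[0]);
    },
    $mystr
);

echo $result; // &lt;h3&gt; hello mom ?&lt;/h3&gt;
```

Because the inner preg_replace only sees $m[0], the spaces in " hello mom ?" are left alone.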

You could keep it simple and do:
$output = str_replace(['&lt; / ', '&lt; ', '&gt; '],
['&lt;/', '&lt;', '&gt;'], $input);
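As a quick check of the str_replace variant, here is a small sketch with an assumed entity-encoded input. Note one difference from the output requested in the question: the '&gt; ' rule also removes the space that follows the opening tag.

```php
<?php
// Assumed input, entity-encoded like the expected result in the question.
$input = '&lt; h3&gt; hello mom ?&lt; / h3&gt;';

// The pairs are applied in order, each one on the result of the previous one.
$output = str_replace(
    ['&lt; / ', '&lt; ', '&gt; '],
    ['&lt;/', '&lt;', '&gt;'],
    $input
);

echo $output; // &lt;h3&gt;hello mom ?&lt;/h3&gt;
```

If the space after the opening tag must be kept, dropping the third pair from both arrays would be one way to do it.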

Related

how to clean a dirty csv string using php regex

my string may be like this:
# *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?
in fact - it is a dirty csv string - having names of jpg images
I need to remove any non-alphanum chars - from both sides of the string
then - inside the resulting string - remove the same - except commas and dots
then - remove duplicates commas and dots - if any - replace them with single ones
so the final result should be:
lorem.jpg,ipsum.jpg,dolor.jpg
I firstly tried to remove any white space - anywhere
$str = str_replace(" ", "", $str);
then I used various forms of trim functions - but it is tedious and a lot of code
an additional problem is that duplicate commas and dots may come in runs of any length - for example - .. or ,,,,
is there a way to solve this using regex, please?
List of modeled steps following your words:
Step 1
"remove any non-alphanum chars from both sides of the string"
translated: remove leading and trailing consecutive [^a-zA-Z0-9] characters
regex: replace ^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$ with $1
Step 2
"inside the resulting string - remove the same - except commas and dots"
translated: remove any [^a-zA-Z0-9.,]
regex: replace [^a-zA-Z0-9.,] with empty string
Step 3
"remove duplicates commas and dots - if any - replace them with single ones"
translated: replace consecutive [,.] with a single instance
regex: replace (\.{2,}) with .
regex: replace (,{2,}) with ,
PHP Demo:
https://onlinephp.io/c/512e1
<?php
$subject = " # *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?";
$firstStep = preg_replace('/^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$/', '$1', $subject);
$secondStep = preg_replace('/[^a-zA-Z0-9.,]/', '', $firstStep);
$thirdStepA = preg_replace('(\.{2,})', '.', $secondStep);
$thirdStepB = preg_replace('(,{2,})', ',', $thirdStepA);
echo $thirdStepB; //lorem.jpg,ipsum.jpg,dolor.jpg
Look at
https://www.php.net/manual/en/function.preg-replace.php
It replaces anything inside a string based on a pattern. \s represents any whitespace character, but take care with NBSP (non-breaking space; \h matches it).
Example 4
$str = preg_replace('/\s\s+/', '', $str);
It will be something like that.
Can you try this:
$string = ' # *lorem.jpg,,,, ip sum.jpg,dolor .jpg,-/ ?';
// keep only alphanumerics, commas and dots
$result = preg_replace("/[^A-Za-z0-9,.]/", '', $string);
// collapse repeated commas and repeated dots
$result = preg_replace('/,+/', ',', $result);
$result = preg_replace('/\.+/', '.', $result);
// remove any , ; . and spaces left at the end
$result = preg_replace("/[ ,;.]*$/", '', $result);
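The steps from the answers above can be folded into one small helper. This is only a sketch: the name cleanImageList is made up for illustration, and it assumes file names consist of ASCII letters and digits only.

```php
<?php
// Hypothetical helper; mirrors the clean-up steps described above.
function cleanImageList(string $raw): string
{
    // Drop everything except ASCII letters, digits, commas and dots.
    $s = preg_replace('/[^a-zA-Z0-9.,]+/', '', $raw);
    // Collapse runs of commas and runs of dots into a single character.
    $s = preg_replace('/,{2,}/', ',', $s);
    $s = preg_replace('/\.{2,}/', '.', $s);
    // Remove commas and dots left over at either end.
    return trim($s, ',.');
}

$cleaned = cleanImageList(" # *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?");
echo $cleaned; // lorem.jpg,ipsum.jpg,dolor.jpg
```

Using trim() with a character list for the last step avoids a separate anchored regex for each end of the string.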

Looking for specific character in capture group

I need to replace all double quotes in any (variable) given string.
For example:
$text = 'data-caption="hello"world">';
$pattern = '/data-caption="[[\s\S]*?"|(")]*?">/';
$output = preg_replace($pattern, '&quot;', $text);
should result in:
"hello&quot;world"
(The above pattern is my attempt at getting it to work)
The problem is that I don't know in advance if and how many double quotes are going to be in the string.
How can I replace the " with &quot;?
You may match strings between data-caption=" and "> and then replace all " inside that match with &quot; using a mere str_replace:
$text = 'data-caption="&lt;element attribute1="wert" attribute2="wert"&gt;Name&lt;/element&gt;">';
$pattern = '/data-caption="\K.*?(?=">)/';
$output = preg_replace_callback($pattern, function($m) {
    return str_replace('"', '&quot;', $m[0]);
}, $text);
print_r($output);
// => data-caption="&lt;element attribute1=&quot;wert&quot; attribute2=&quot;wert&quot;&gt;Name&lt;/element&gt;">
See the PHP demo
Details
data-caption=" - starting delimiter
\K - match reset operator
.*? - any 0+ chars other than line break chars, as few as possible
(?=">) - a positive lookahead that requires the "> substring immediately to the right of the current location.
The match is passed to the anonymous function inside preg_replace_callback (accessible via $m[0]) and that is where it is possible to replace all " symbols in a convenient way.
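Applied to the short string from the question, the same idea can be sketched as follows. One assumption to keep in mind: the lazy .*? stops at the first "> it finds, so this only works when "> does not occur inside the attribute value itself.

```php
<?php
$text = 'data-caption="hello"world">';

// \K drops the prefix from the match, so only the value reaches the callback.
$output = preg_replace_callback(
    '/data-caption="\K.*?(?=">)/',
    function ($m) {
        return str_replace('"', '&quot;', $m[0]);
    },
    $text
);

echo $output; // data-caption="hello&quot;world">
```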

preg_replace vs trim PHP

I am working with a slug function, and I don't fully understand some of it, so I am looking for some help explaining it.
My first question is about this line in my slug function $string = preg_replace('# +#', '-', $string); Now I understand that this replaces all spaces with a '-'. What I don't understand is what the + sign is in there for which comes after the white space in between the #.
Which leads to my next problem. I want a trim function that gets rid of spaces, but only the spaces entered after the value. For example, someone accidentally entered "Arizona " with two spaces after the a, and it broke the pages linked to Arizona.
So basically I want to figure out how I can use trim to get rid of accidental spaces but still have preg_replace insert '-' between words.
For example, "Sun City West " should become "sun-city-west".
This is my full slug function-
function getSlug($string){
    if(isset($string) && $string <> ""){
        $string = strtolower($string);
        //var_dump($string); echo "<br>";
        $string = preg_replace('#[^\w ]+#', '', $string);
        //var_dump($string); echo "<br>";
        $string = preg_replace('# +#', '-', $string);
    }
    return $string;
}
You can try this:
function getSlug($string) {
return preg_replace('#\s+#', '-', trim($string));
}
It first trims extra spaces at the beginning and end of the string, and then replaces every remaining run of whitespace with a - character.
Here your regex is:
#\s+#
which is:
# = regex delimiter
\s = any space character
+ = match the previous character or group one or more times
# = regex delimiter again
so the regex here means: "match any sequence of one or more whitespace characters"
The + means at least one of the preceding character, so it matches one or more spaces. The # signs are one of the ways of marking the start and end of a regular expression's pattern block.
For a trim function, PHP handily provides trim() which removes all leading and trailing whitespace.
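Putting the pieces together, a sketch of the asker's slug function with trim() added could look like this. The name makeSlug is chosen here only to avoid clashing with the getSlug versions above; the regexes are the ones from the question.

```php
<?php
// Hypothetical variant of getSlug that trims before slugging.
function makeSlug(string $string): string
{
    // Remove accidental leading/trailing whitespace, then lowercase.
    $string = strtolower(trim($string));
    // Drop everything that is not a word character or a space.
    $string = preg_replace('#[^\w ]+#', '', $string);
    // Turn each run of one or more spaces into a single hyphen.
    return preg_replace('# +#', '-', $string);
}

echo makeSlug("Sun City West  "); // sun-city-west
```

Trimming first matters: without it, the trailing spaces would be turned into a trailing hyphen by the last replacement.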

Strip phpbb bbcode

I want to display the most recent posts from my phpbb3 forum on my website, but without the bbcode.
So I'm trying to strip the bbcode, but without success.
one of the posts for example could be:
[quote="SimonLeBon":3pwalcod]bladie bla bla[/quote:3pwalcod]bla bla bladie bla blaffsd
fsdjhgfd dgfgdffgdfg
To strip bbcodes I use a function I found via Google; I've tried several other similar functions as well:
<?php
function stripBBCode($text_to_search) {
$pattern = '|[[\/\!]*?[^\[\]]*?]|si';
$replace = '';
return preg_replace($pattern, $replace, $text_to_search);
}
?>
This however doesn't really have any effect.
This will strip bbcode that is valid (i.e. opening tags match closing tags).
$str = preg_replace('/\[(\w+)=.*?:(.*?)\](.*?)\[\/\1:\2\]/', '$3', $str);
CodePad.
Reusable Function
function stripBBCode($str) {
return preg_replace('/\[(\w+)=.*?:(.*?)\](.*?)\[\/\1:\2\]/', '$3', $str);
}
Explanation
\[ match literal [.
(\w+) Match 1 or more word characters and save in capturing group 1.
= Match literal =.
.*? Match ungreedily every character except \n between = and :.
: Match literal :.
(.*?) Match ungreedily every character except \n between : and ] and save in capturing group 2.
\] Match literal ].
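(.*?) Match ungreedily every character except \n between ] and the closing tag's [ (the tag's content) and save in capturing group 3.
\[ Match literal [.
\/\1:\2 Match literal /, then the text of capturing group 1, a literal : and the text of capturing group 2 again.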
\] Match literal ].
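Run against the sample post from the question, the function above behaves as sketched below. It only handles tags of the form [name=...:uid]...[/name:uid]; tags without an attribute or uid are left untouched by this pattern.

```php
<?php
function stripBBCode($str) {
    // $3 is the tag's content; the backreferences force a matching closing tag.
    return preg_replace('/\[(\w+)=.*?:(.*?)\](.*?)\[\/\1:\2\]/', '$3', $str);
}

// Sample post taken from the question (first line only).
$post = '[quote="SimonLeBon":3pwalcod]bladie bla bla[/quote:3pwalcod]bla bla bladie bla blaffsd';
$plain = stripBBCode($post);

echo $plain; // bladie bla blabla bla bladie bla blaffsd
```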
Here's the one from phpBB (slightly adjusted to be standalone):
/**
* Strips all bbcode from a text and returns the plain content
*/
function strip_bbcode(&$text, $uid = '')
{
if (!$uid)
{
$uid = '[0-9a-z]{5,}';
}
$text = preg_replace("#\[\/?[a-z0-9\*\+\-]+(?:=(?:&quot;.*&quot;|[^\]]*))?(?::[a-z])?(\:$uid)\]#", ' ', $text);
$match = array(
'#<!\-\- e \-\->.*?<!\-\- e \-\->#',
'#<!\-\- l \-\-><a (?:class="[\w-]+" )?href="(.*?)(?:(&|\?)sid=[0-9a-f]{32})?">.*?</a><!\-\- l \-\->#',
'#<!\-\- ([mw]) \-\-><a (?:class="[\w-]+" )?href="(.*?)">.*?</a><!\-\- \1 \-\->#',
'#<!\-\- s(.*?) \-\-><img src="\{SMILIES_PATH\}\/.*? \/><!\-\- s\1 \-\->#',
'#<!\-\- .*? \-\->#s',
'#<.*?>#s',
);
$replace = array('\1', '\1', '\2', '\1', '', '');
$text = preg_replace($match, $replace, $text);
}
Why don't you use the BBCode parsing facilities that are available for PHP as a PECL extension?
http://php.net/manual/en/book.bbcode.php
Nowadays, use phpbb's own function https://wiki.phpbb.com/Strip_bbcode

Selective string reduction

I would like to know how to strip all non-alphanumeric characters from a string except for underscores and dashes in PHP.
Use preg_replace with /[^a-zA-Z0-9_\-]/ as the pattern and '' as the replacement.
$string = preg_replace('/[^a-zA-Z0-9_\-]/', '', $string);
EDIT
As skippy said, you can use the i modifier for case insensitivity:
$string = preg_replace('/[^a-z0-9_\-]/i', '', $string);
Use preg_replace:
$str = preg_replace('/[^\w-]/', '', $str);
The first argument to preg_replace is a regular expression. This one contains:
/ - starting delimiter -- start the regex
[ - start character class -- define characters that can be matched
^ - negative -- make the character class match only characters that don't match the selection that follows
\w - word character -- so don't match word characters. These are A-Za-z0-9 and _ (underscore)
- - hyphen -- don't match hyphens either
] - close the character class
/ - ending delimiter -- close the regex
Note that this only matches hyphens (i.e. -). It does not match genuine dash characters (– or —).
Accepts a-z, A-Z, 0-9, '-', '_' and spaces:
$str = preg_replace("/[^a-z0-9\s_-]+/i", '', $str);
No spaces:
$str = preg_replace("/[^a-z0-9_-]+/i", '', $str);
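To compare the variants side by side, here is a short sketch with a made-up ASCII-only input string. Non-ASCII input is deliberately left out, since the behaviour of \w then depends on flags and locale.

```php
<?php
// Made-up sample input for illustration.
$input = 'Hello, World_2024 test-case!';

// Keep word characters and hyphens only.
$noSpaces = preg_replace('/[^\w-]/', '', $input);
// Same, but whitespace is kept as well.
$withSpaces = preg_replace('/[^a-z0-9\s_-]+/i', '', $input);

echo $noSpaces, "\n";   // HelloWorld_2024test-case
echo $withSpaces, "\n"; // Hello World_2024 test-case
```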
