Removing 'words' contained in strings with non-alphanumeric characters? - php

What is the recommended method in PHP for removing 'words' in strings with non-alphanumeric characters please?
$string = "Test let's test 123. https://youtu.be/dQw4w9WgXcQ EOTest.";
desired result:
"Test test 123. EOTest.";
Method 1 - regex
Method 2 - explode(), foreach() and str_replace or preg_replace

Try using the preg_split, preg_grep, and implode functions, like so:
$string = "Test let's test 123. https://youtu.be/dQw4w9WgXcQ EOTest.";
$words = preg_split('/\s+/', $string); // split on one or more spaces
$filter = preg_grep('/^[A-Za-z\d.]+$/', $words); // allow dot, letters, and numbers
$result = implode(' ', $filter); // turn it into a string
print_r($result); // -> Test test 123. EOTest.
I hope that helps!
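The same split, filter, and join steps can be wrapped in a small helper. This is only a minimal sketch: the function name keepPlainWords and the PREG_SPLIT_NO_EMPTY flag are additions that do not appear in the answer above, and the allowed character class is simply the one used there (ASCII letters, digits, and dots).

```php
<?php
// Hypothetical helper (name is an assumption): keeps only the
// whitespace-separated tokens made of ASCII letters, digits, and dots.
function keepPlainWords(string $text): string
{
    // Split on runs of whitespace; skip empty tokens caused by
    // leading or trailing whitespace.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Keep the tokens that consist solely of allowed characters.
    $kept = preg_grep('/^[A-Za-z\d.]+$/', $words);

    // preg_grep preserves keys, but implode ignores them.
    return implode(' ', $kept);
}

echo keepPlainWords("Test let's test 123. https://youtu.be/dQw4w9WgXcQ EOTest.");
// prints: Test test 123. EOTest.
```

If other punctuation such as commas should survive, it would have to be added to the character class.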

Related

highlight last 3 number in the string

How can I highlight the last 3 numbers in this string?
ab9c5lek94ke72koe8nsk9
I want this output:
ab9c5lek94ke72koe8nsk9
I tried the following:
$str = "ab9c5lek94ke72koe8nsk9";
$numbers = preg_replace('/[^0-9]/', '', $str );
$last3 = substr($numbers, -3);
$highlights = str_split($last3);
$output = preg_replace('/'.implode('|', $highlights).'/i', '<b>$0</b>', $str);
but it highlights every 2, 8, and 9 in the string:
ab<b>9</b>c5lek<b>9</b>4ke7<b>2</b>koe<b>8</b>nsk<b>9</b>
You can achieve that with a regular expression and PHP's preg_replace() function. Match a digit followed by any non-digits, three times, anchored at the end of the string. See the following code:
$str1 = 'ab9c5lek94ke2koe8nsk9';
$str2 = 'dag2vue41a89au76zhz30';
echo preg_replace('/(\d)([^\d]*)(\d)([^\d]*)(\d)([^\d]*)$/mui', '<b>$1</b>$2<b>$3</b>$4<b>$5</b>$6', $str1);
echo preg_replace('/(\d)([^\d]*)(\d)([^\d]*)(\d)([^\d]*)$/mui', '<b>$1</b>$2<b>$3</b>$4<b>$5</b>$6', $str2);
Outputs
ab9c5lek94ke<b>2</b>koe<b>8</b>nsk<b>9</b>
and
dag2vue41a89au7<b>6</b>zhz<b>3</b><b>0</b>
You can use a regex like this:
/(\d)([^\d]*)(\d)([^\d]*)(\d)([^\d]*)$/
which is basically (\d)([^\d]*) repeated 3 times with a $ anchor at the end. Each repetition matches one digit followed by zero or more non-digit characters. The $ anchor is needed so that only the last 3 digits match.
Snippet:
<?php
$str = "ab9c5lek94ke2koe8nsk9";
echo preg_replace('/(\d)([^\d]*)(\d)([^\d]*)(\d)([^\d]*)$/',"<b>$1</b>$2<b>$3</b>$4<b>$5</b>$6",$str);
Demo: https://3v4l.org/PoYCG
The replacement string wraps the digit groups ($1, $3, $5) in bold tags and puts back the non-digit groups ($2, $4, $6) unchanged.
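The same idea can be generalized to the last N digits with a lookahead instead of repeating the group pairs by hand. This is a sketch under assumptions: the function name highlightLastDigits and its $n parameter are hypothetical, and "number" is taken to mean a single digit, as in the two answers above.

```php
<?php
// Hypothetical helper: wraps each of the last $n digits of $str in <b> tags.
function highlightLastDigits(string $str, int $n = 3): string
{
    if ($n < 1) {
        return $str; // nothing to highlight
    }

    // A digit qualifies when at most ($n - 1) further digits follow it
    // before the end of the string. The D modifier makes $ match only
    // at the very end.
    $pattern = '/\d(?=(?:\D*\d){0,' . ($n - 1) . '}\D*$)/D';

    return preg_replace($pattern, '<b>$0</b>', $str);
}

echo highlightLastDigits('ab9c5lek94ke72koe8nsk9');
// prints: ab9c5lek94ke7<b>2</b>koe<b>8</b>nsk<b>9</b>
```

Because the lookahead does not consume the text after the digit, each digit is matched and wrapped on its own, so no numbered groups are needed in the replacement.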
You can split the string at the third-from-last number, highlight everything in the second part, and then recombine the two parts.
In my example, I collect all of the numbers in the string into an array. I then use that array to get the delimiter (72 in your example) and split the string on it into two parts. At that point you can highlight everything in $arr[1] and then join it back onto $arr[0].
$str = "ab9c5lek94ke72koe8nsk9";
// get all of the numbers in order and place them into their own array
$numbers = trim(preg_replace('/[^0-9]/', ' ', $str ));
$numbers = preg_split('/\s+/', $numbers);
// using that get the delimiter to where to split the string. and split.
$delim = $numbers[count($numbers)-3];
$arr = explode($delim, $str);
$arr[1] = $delim . $arr[1];
// then highlight everything in $arr[1]
// then combine $arr[0] and $arr[1]
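The last two steps are left as comments above. One possible way to complete them is sketched below. It is not the answerer's code: it swaps explode() for an offset-based split (so that an earlier occurrence of the delimiter cannot split the string in the wrong place), the function name highlightLastNumbers is hypothetical, and "number" here means a whole run of digits such as 72.

```php
<?php
// Hypothetical helper: highlights the last $count runs of digits in $str.
function highlightLastNumbers(string $str, int $count = 3): string
{
    // Find every run of digits together with its byte offset.
    $found = preg_match_all('/\d+/', $str, $m, PREG_OFFSET_CAPTURE);
    if ($found === false || $found < $count) {
        return $str; // assumed behaviour: too few numbers, leave unchanged
    }

    // Offset of the $count-th number from the end.
    $splitAt = $m[0][$found - $count][1];

    // Split there, highlight every number in the tail, and recombine.
    $head = substr($str, 0, $splitAt);
    $tail = substr($str, $splitAt);

    return $head . preg_replace('/\d+/', '<b>$0</b>', $tail);
}

echo highlightLastNumbers('ab9c5lek94ke72koe8nsk9');
// prints: ab9c5lek94ke<b>72</b>koe<b>8</b>nsk<b>9</b>
```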

Php make spaces in a word with a dash

I have the following string:
$thetextstring = "jjfnj 948";
At the end I want to have:
echo $thetextstring; // should print jjf-nj948
So basically what I am trying to do is join the separated parts of the string, then separate the first 3 letters from the rest with a -.
So far I have
$string = trim(preg_replace('/s+/', ' ', $thetextstring));
$result = explode(" ", $thetextstring);
$newstring = implode('', $result);
print_r($newstring);
I have been able to join the words, but how do I add the separator after the first 3 letters?
Use a regex with the preg_replace function. This can be a one-liner:
^.{3}\K([^\s]*) *
Breakdown:
^ # Assert start of string
.{3} # Match 3 characters
\K # Reset match
([^\s]*) * # Capture everything up to the first whitespace, then match any spaces that follow
PHP code:
echo preg_replace('~^.{3}\K([^\s]*) *~', '-$1', 'jjfnj 948');
Without knowing more about how your strings can vary, this is a working solution for your task:
Pattern:
~([a-z]{2}) ~ // 2 letters (contained in capture group1) followed by a space
Replace:
-$1
Code:
$thetextstring = "jjfnj 948";
echo preg_replace('~([a-z]{2}) ~','-$1',$thetextstring);
Output:
jjf-nj948
Note this pattern can easily be expanded to include characters beyond lowercase letters that precede the space. ~(\S{2}) ~
You can use str_replace to remove the unwanted space:
$newString = str_replace(' ', '', $thetextstring);
$newString:
jjfnj948
And then preg_replace to put in the dash:
$final = preg_replace('/^([a-z]{3})/', '\1-', $newString);
The meaning of this regex instruction is:
from the beginning of the line: ^
capture three a-z characters: ([a-z]{3})
replace this match with itself followed by a dash: \1-
$final:
jjf-nj948
$thetextstring = "jjfnj 948";
// replace all spaces with nothing
$thetextstring = str_replace(" ", "", $thetextstring);
// insert a dash after the third character
$thetextstring = substr_replace($thetextstring, "-", 3, 0);
echo $thetextstring;
This gives the requested jjf-nj948
Your approach is correct. For the last step, inserting a - after the third character, you can use the substr_replace function as follows:
$thetextstring = 'jjfnj 948';
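$string = trim(preg_replace('/\s+/', ' ', $thetextstring)); // collapse runs of whitespace
$result = explode(' ', $string); // split the normalized string, not the original
$newstring = substr_replace(implode('', $result), '-', 3, 0); // length 0 inserts without replacing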
If you are confident enough that your string will always have the same format (characters followed by a whitespace followed by numbers), you can also reduce your computations and simplify your code as follows:
$thetextstring = 'jjfnj 948';
$newstring = substr_replace(str_replace(' ', '', $thetextstring), '-', 3, 0); // length 0 inserts without replacing
Oldschool without regex
$test = "jjfnj 948";
$test = str_replace(" ", "", $test); // strip all spaces from string
echo substr($test, 0, 3)."-".substr($test, 3); // isolate first three chars, add hyphen, and concat all characters after the first three
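The approaches above can be folded into one reusable function. This is a minimal sketch with assumptions: the name dashAfter and the $position parameter are hypothetical, all whitespace (not only spaces) is removed, and the input is assumed to contain single-byte characters, since substr_replace works on bytes.

```php
<?php
// Hypothetical helper: removes all whitespace, then inserts a dash
// after the first $position characters.
function dashAfter(string $text, int $position = 3): string
{
    // Strip spaces, tabs, and newlines.
    $compact = preg_replace('/\s+/', '', $text);

    // Assumed behaviour: strings too short to split are returned as-is.
    if (strlen($compact) <= $position) {
        return $compact;
    }

    // A length of 0 inserts the dash without overwriting anything.
    return substr_replace($compact, '-', $position, 0);
}

echo dashAfter('jjfnj 948');
// prints: jjf-nj948
```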

PHP Regex: Remove words less than 3 characters

I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);
Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
Explanation
\b is a word boundary: it matches a position where one side is a word character (a letter, digit, or underscore) and the other side is not (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
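As a quick check, here is that second pattern applied to the sample text from the question. The variable names are taken from the question and the answer; the final trim() is an addition that only matters when a short word ends the string and leaves a space behind.

```php
<?php
$text = 'an of and then some an ee halved or or whenever';

// Remove every 1- or 2-letter lowercase word plus the whitespace after it.
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $text);

echo trim($replaced);
// prints: and then some halved whenever
```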
You can use the word boundary tag: \b:
Replace: \b[a-z]{1,2}\b with ''
Use this
preg_replace('/(\b.{1,2}\s)/','',$your_string);
Some of the solutions here worked, but they had a problem with my language's multi-character letters, such as "ch". A simple explode and implode worked for me.
$minWordLength = 3; // words shorter than this are removed
$string = "my super string";
$exploded = explode(" ", $string);
foreach ($exploded as $key => $word) {
    if (mb_strlen($word) < $minWordLength) {
        unset($exploded[$key]);
    }
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"
To me, it seems that this works fine with most PHP versions:
$string2 = preg_replace('/\b[a-zA-Z0-9]{1,2}\b/', '', trim($string1));
where [a-zA-Z0-9] is the accepted range of letters and digits.

Some confusion with preg_replace

I am very confused with preg_replace, I have this string and I would like to change ONLY the number before the _
$string = 'string/1_491107.jpg';
$newstring = preg_replace('#([0-9]+)_#', '666', $string);
But then I get "string/666491107.jpg" instead of "string/666_491107.jpg".
Thanks
Your pattern matches both the digits and the underscore, and preg_replace replaces the entire match, not just the part captured in parentheses. The parentheses only make the digits available as $1 in the replacement.
You could do it like this:
$string = 'string/1_491107.jpg';
$newstring = preg_replace('#[0-9]+_#', '666_', $string);
or you could use a positive lookahead (only match a number sequence followed by an underscore, but don't include the underscore in the match):
$string = 'string/1_491107.jpg';
$newstring = preg_replace('#[0-9]+(?=_)#', '666', $string);
You've got the underscore as part of the text to be replaced; so you also need to include it in the replacement:
$string = 'string/1_491107.jpg';
$newstring = preg_replace('#([0-9]+)_#', '666_', $string);
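A further variant, not taken from the answers above, is to capture the text you want to keep and put it back through backreferences. The sketch below assumes the number always sits between a slash and an underscore. Note the ${1} form: when a backreference is immediately followed by literal digits, a plain $1 would not be read as "group 1 followed by 666".

```php
<?php
$string = 'string/1_491107.jpg';

// Group 1 keeps the slash, group 2 keeps the underscore.
// '${1}666' separates the group number from the literal digits.
$newstring = preg_replace('#(/)[0-9]+(_)#', '${1}666$2', $string);

echo $newstring;
// prints: string/666_491107.jpg
```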

Replace all characters in string apart from PHP

I have a string, Trade Card Catalogue 1988 Edition, and I wish to remove everything apart from 1988.
I could have an array of all letters and do a str_replace and trim, but I wondered if there was a better solution?
$string = 'Trade Card Catalogue 1988 Edition';
$letters = array('a','b','c'....'x','y','z');
$string = strtolower($string);
$string = str_replace($letters, '', $string);
$string = trim($string);
Thanks in advance
Regular expression?
So assuming you want the number (and not the 4th word or something like that):
$str = preg_replace('#\D#', '', $str);
\D means every character that is not a digit. The same as [^0-9].
If there could be more numbers but you only want a four-digit number (a year), this will also work (but it obviously fails if there are several four-digit numbers and you want a specific one):
$str = preg_replace('#.*?(\d{4,4}).*#', '\1', $str);
You can actually just pass the entire set of characters to be trimmed as a parameter to trim:
$string = trim($string, 'abc...zABC...Z ' /* don't forget the space */);
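For illustration, trim() also accepts character ranges written with "..", so the letters do not have to be listed one by one. The sketch below uses that form and, as an alternative that is not from the answers above, pulls out the first standalone four-digit number with preg_match. The variable names $trimmed and $m are only for this example.

```php
<?php
$string = 'Trade Card Catalogue 1988 Edition';

// "a..z" and "A..Z" are ranges in trim()'s character list; the trailing
// space in the list strips spaces as well.
$trimmed = trim($string, 'a..zA..Z ');
echo $trimmed, "\n";
// prints: 1988

// Alternative: extract the first standalone four-digit number.
if (preg_match('/\b\d{4}\b/', $string, $m)) {
    echo $m[0], "\n";
    // prints: 1988
}
```

Keep in mind that trim() only strips characters from both ends, so a digit elsewhere in the title (for example "Vol 2 Catalogue 1988 Edition") would stop the trimming early, whereas the preg_match variant would still find 1988.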
