how to clean a dirty csv string using php regex - php

my string may be like this:
# *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?
in fact - it is a dirty csv string - having names of jpg images
I need to remove any non-alphanum chars - from both sides of the string
then - inside the resulting string - remove the same - except commas and dots
then - remove duplicates commas and dots - if any - replace them with single ones
so the final result should be:
lorem.jpg,ipsum.jpg,dolor.jpg
I firstly tried to remove any white space - anywhere
$str = str_replace(" ", "", $str);
then I used various forms of trim functions - but it is tedious and a lot of code
the additional problem is - duplicates commas and dots may have one or more instances - for example - .. or ,,,,
is there a way to solve this using regex, pls ?

List of modeled steps following your words:
Step 1
"remove any non-alphanum chars from both sides of the string"
translated: remove trailing and tailing consecutive [^a-zA-Z0-9] characters
regex: replace ^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$ with $1
Step 2
"inside the resulting string - remove the same - except commas and dots"
translated: remove any [^a-zA-Z0-9.,]
regex: replace [^a-zA-Z0-9.,] with empty string
Step 3
"remove duplicates commas and dots - if any - replace them with single ones"
translated: replace consecutive [,.] as a single
instance
regex: replace (\.{2,}) with .
regex: replace (,{2,}) with ,
PHP Demo:
https://onlinephp.io/c/512e1
<?php
$subject = " # *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?";
$firstStep = preg_replace('/^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$/', '$1', $subject);
$secondStep = preg_replace('/[^a-z,A-Z0-9.,]/', '', $firstStep);
$thirdStepA = preg_replace('(\.{2,})', '.', $secondStep);
$thirdStepB = preg_replace('(,{2,})', ',', $thirdStepA);
echo $thirdStepB; //lorem.jpg,ipsum.jpg,dolor.jpg

Look at
https://www.php.net/manual/en/function.preg-replace.php
It replace anything inside a string based on pattern. \s represent all space char, but care of NBSP (non breakable space, \h match it )
Exemple 4
$str = preg_replace('/\s\s+/', '', $str);
It will be something like that

Can you try this :
$string = ' # *lorem.jpg,,,, ip sum.jpg,dolor .jpg,-/ ?';
// this will left only alphanumirics
$result = preg_replace("/[^A-Za-z0-9,.]/", '', $string);
// this will remove duplicated dot and ,
$result = preg_replace('/,+/', ',', $result);
$result = preg_replace('/\.+/', '.', $result);
// this will remove ,;. and space from the end
$result = preg_replace("/[ ,;.]*$/", '', $result);

Related

PHP - How to modify matched pattern and replace

I have string which contain space in its html tags
$mystr = "&lt; h3&gt; hello mom ?&lt; / h3&gt;"
so i wrote regex expression for it to detect the spaces in it
$pattern = '/(?<=<)\s\w+|\s\/\s\w+|\s\/(?=>)/mi';
so next i want to modify the matches by removing space from it and replace it, so any idea how it can be done? so that i can fix my string like
"&lt;h3&gt; hello mom ?&lt;/h3&gt;"
i know there is php function pre_replace but not sure how i can modify the matches
$result = preg_replace( $pattern, $replace , $mystr );
For the specific tags like you showed, you can use
preg_replace_callback('/&lt;(?:\s*\/)?\s*\w+\s*&gt;/ui', function($m) {
return preg_replace('/\s+/u', '', $m[0]);
}, $mystr)
The regex - note the u flag to deal with Unicode chars in the string - matches
&lt; - a literal string
(?:\s*\/)? - an optional sequence of zero or more whitespaces and a / char
\s* - zero or more whitespaces
\w+ - one or more word chars
\s* - zero or more whitespaces
&gt; - a literal string.
The preg_replace('/\s+/u', '', $m[0]) line in the anonymous callback function removes all chunks of whitespaces (even those non-breaking spaces).
You could keep it simple and do:
$output = str_replace(['&lt; / ', '&lt; ', '&gt; '],
['&lt;/', '&lt;', '&gt;'], $input);

Php make spaces in a word with a dash

I have the following string:
$thetextstring = "jjfnj 948"
At the end I want to have:
echo $thetextstring; // should print jjf-nj948
So basically what am trying to do is to join the separated string then separate the first 3 letters with a -.
So far I have
$string = trim(preg_replace('/s+/', ' ', $thetextstring));
$result = explode(" ", $thetextstring);
$newstring = implode('', $result);
print_r($newstring);
I have been able to join the words, but how do I add the separator after the first 3 letters?
Use a regex with preg_replace function, this would be a one-liner:
^.{3}\K([^\s]*) *
Breakdown:
^ # Assert start of string
.{3} # Match 3 characters
\K # Reset match
([^\s]*) * # Capture everything up to space character(s) then try to match them
PHP code:
echo preg_replace('~^.{3}\K([^\s]*) *~', '-$1', 'jjfnj 948');
PHP live demo
Without knowing more about how your strings can vary, this is working solution for your task:
Pattern:
~([a-z]{2}) ~ // 2 letters (contained in capture group1) followed by a space
Replace:
-$1
Demo Link
Code: (Demo)
$thetextstring = "jjfnj 948";
echo preg_replace('~([a-z]{2}) ~','-$1',$thetextstring);
Output:
jjf-nj948
Note this pattern can easily be expanded to include characters beyond lowercase letters that precede the space. ~(\S{2}) ~
You can use str_replace to remove the unwanted space:
$newString = str_replace(' ', '', $thetextstring);
$newString:
jjfnj948
And then preg_replace to put in the dash:
$final = preg_replace('/^([a-z]{3})/', '\1-', $newString);
The meaning of this regex instruction is:
from the beginning of the line: ^
capture three a-z characters: ([a-z]{3})
replace this match with itself followed by a dash: \1-
$final:
jjf-nj948
$thetextstring = "jjfnj 948";
// replace all spaces with nothing
$thetextstring = str_replace(" ", "", $thetextstring);
// insert a dash after the third character
$thetextstring = substr_replace($thetextstring, "-", 3, 0);
echo $thetextstring;
This gives the requested jjf-nj948
You proceeding is correct. For the last step, which consists in inserting a - after the third character, you can use the substr_replace function as follows:
$thetextstring = 'jjfnj 948';
$string = trim(preg_replace('/\s+/', ' ', $thetextstring));
$result = explode(' ', $thetextstring);
$newstring = substr_replace(implode('', $result), '-', 3, false);
If you are confident enough that your string will always have the same format (characters followed by a whitespace followed by numbers), you can also reduce your computations and simplify your code as follows:
$thetextstring = 'jjfnj 948';
$newstring = substr_replace(str_replace(' ', '', $thetextstring), '-', 3, false);
Visit this link for a working demo.
Oldschool without regex
$test = "jjfnj 948";
$test = str_replace(" ", "", $test); // strip all spaces from string
echo substr($test, 0, 3)."-".substr($test, 3); // isolate first three chars, add hyphen, and concat all characters after the first three

Regex rules in an array

Maybe it can not be solved this issue as I want, but maybe you can help me guys.
I have a lot of malformed words in the name of my products.
Some of them has leading ( and trailing ) or maybe one of these, it is same for / and " signs.
What I do is that I am explode the name of the product by spaces, and examines these words.
So I want to replace them to nothing. But, a hard drive could be 40GB ATA 3.5" hard drive. I need to process all the word, but I can not use the same method for 3.5" as for () or // because this 3.5" is valid.
So I only need to replace the quotes, when it is at the start of the string AND at end of the string.
$cases = [
'(testone)',
'(testtwo',
'testthree)',
'/otherone/',
'/othertwo',
'otherthree/',
'"anotherone',
'anothertwo"',
'"anotherthree"',
];
$patterns = [
'/^\(/',
'/\)$/',
'~^/~',
'~/$~',
//Here is what I can not imagine, how to add the rule for `"`
];
$result = preg_replace($patterns, '', $cases);
This is works well, but can it be done in one regex_replace()? If yes, somebody can help me out the pattern(s) for the quotes?
Result for quotes should be this:
'"anotherone', //no quote at end leave the leading
'anothertwo"', //no quote at start leave the trailin
'anotherthree', //there are quotes on start and end so remove them.
You may use another approach: rather than define an array of patterns, use one single alternation based regex:
preg_replace('~^[(/]|[/)]$|^"(.*)"$~s', '$1', $s)
See the regex demo
Details:
^[(/] - a literal ( or / at the start of the string
| - or
[/)]$ - a literal ) or / at the end of the string
| - or
^"(.*)"$ - a " at the start of the string, then any 0+ characters (due to /s option, the . matches a linebreak sequence, too) that are captured into Group 1, and " at the end of the string.
The replacement pattern is $1 that is empty when the first 2 alternatives are matched, and contains Group 1 value if the 3rd alternative is matched.
Note: In case you need to replace until no match is found, use a preg_match with preg_replace together (see demo):
$s = '"/some text/"';
$re = '~^[(/]|[/)]$|^"(.*)"$~s';
$tmp = '';
while (preg_match($re, $s) && $tmp != $s) {
$tmp = $s;
$s = preg_replace($re, '$1', $s);
}
echo $s;
This works
preg_replace([[/(]?(.+)[/)]?|/\"(.+)\"/], '$1', $string)

How to match alphanumeric and symbols using PHP?

I'm working with text content in UTF8 encoding stored in variable $title.
Using preg_replace, how do I append an extra space if the $title string is ending with:
upper/lower case character
digit
symbol, eg. ? or !
This should do the trick:
preg_replace('/^(.*[\w?!])$/', "$1 ", $string);
In essence what it does is if the string ends in one of your unwanted characters it appends a single space.
If the string doesn't match the pattern, then preg_replace() returns the original string - so you're still good.
If you need to expand your list of unwanted endings you can just add them into the character block [\w?!]
Using a positive lookbehind before the end of the line.
And replace with a space.
$title = preg_replace('/(?<=[A-Za-z0-9?!])$/',' ', $title);
Try it here
You may want to try this Pattern Matching below to see if that does it for you.
<?php
// THE REGEX BELOW MATCHES THE ENDING LOWER & UPPER-CASED CHARACTERS, DIGITS
// AND SYMBOLS LIKE "?" AND "!" AND EVEN A DOT "."
// HOWEVER YOU CAN IMPROVISE ON YOUR OWN
$rxPattern = "#([\!\?a-zA-Z0-9\.])$#";
$title = "What is your name?";
var_dump($title);
// AND HERE, YOU APPEND A SINGLE SPACE AFTER THE MATCHED STRING
$title = preg_replace($rxPattern, "$1 ", $title);
var_dump($title);
// THE FIRST var_dump($title) PRODUCES:
// 'What is your name?' (length=18)
// AND THE SECOND var_dump($title) PRODUCES
// 'What is your name? ' (length=19) <== NOTICE THE LENGTH FROM ADDED SPACE.
You may test it out HERE.
Cheers...
You need
$title=preg_replace("/.*[\w?!]$/", "\\0 ", $title);

Don't show white spaces in regex output

I need to match everything but not the white spaces
For example in this string : 16790 - 140416 / 3300
I want to have the following result made by regex (and PHP) without white spaces: 140416/3300
I used : \s+\d+\s+-\s+(\d+\s+\/\s+\d+) and of course it gave me the results with the whitespaces:
140416 / 3300
How could I have a match 1 result without white spaces ?
Thank you
$subject = "16790 - 140416 / 3300";
$result = preg_replace('%.*?(\d+)\s+/\s+(\d+)%', '$1/$2', $subject);
echo $result;
// 140416/3300
http://regex101.com/r/oV4hN0
If you're just removing an unknown number of spaces and tabs from a string, you can use
$result = preg_replace('~\s*~', '', $subject);
If you're matching the whole pattern you gave (not just the division), you can use this:
$result = preg_replace('~(\d+)\s*-\s*(\d+)\s*/\s*(\d+)~', '\1-\2/\3', $subject);
Finally, and this might be the best, if you'd like to remove spaces around operators such as =,*,-,/ when they are surrounded by digits, you can use this:
$result = preg_replace('~(\d)\s*([+*/-])\s*(\d)~', '\1\2\3', $subject);

Categories