Using preg_replace()? - php

I'm trying to understand the function preg_replace(), but it looks rather cryptic to me. From looking at the documentation, I understand that it takes three things: the subject, the pattern for matching, and what to replace it with.
I'm currently trying to 'sanitize' numerical input by replacing anything that isn't a number. So far I know I would need to allow the numbers 0-9, but remove anything that isn't a number and replace it with: "".
Instead of escaping every character I need to, is there some way to simply not allow any other character than the numbers 0-9? Also if anyone could shed light on how the 'pattern for matching' part works...

If you want to sanitize a string by removing anything that isn't a number, you would write a regular expression that matches characters not in a list.
The pattern [0-9] would match all numerals. Placing a caret (^) at the beginning of the set matches everything that isn't in the set: [^0-9]
$result = preg_replace('/[^0-9]/', '', $input);
Note that this will also filter out periods/decimal points and other mathematical marks. You could include periods/decimal points (allowing floats) by making the period allowed:
$result = preg_replace('/[^0-9.]/', '', $input);
Note that the period (.) is the wildcard character in regular expressions. It doesn't need to be escaped in a bracket expression, but it would elsewhere in the pattern.
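As a small illustration of the two patterns above, here is a sketch run against a made-up input. The variable names and the sample value are assumptions for demonstration, not part of the question:

```php
<?php
// Hypothetical sample input mixing digits, letters and punctuation.
$input = 'abc 12.5 kg, item #7';

// Keep digits only: every character that is not 0-9 is removed.
$digitsOnly = preg_replace('/[^0-9]/', '', $input);

// Keep digits and periods: the dot is literal inside a bracket expression.
$digitsAndDots = preg_replace('/[^0-9.]/', '', $input);

echo $digitsOnly, "\n";    // 1257
echo $digitsAndDots, "\n"; // 12.57
```

Note that the second pattern keeps every period, so an input such as 1.2.3 would pass through unchanged; whether that counts as a valid number is a separate check.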

preg_replace() is easier than you think:
$p = "/[A-Z a-z]/";
$r = "";
$s = "12345t7890";
echo preg_replace($p, $r, $s);
This will output "123457890": the "t" is removed (note that the input contains no '6').

Wouldn't is_numeric() solve your problem?

Related

Looking to use preg_replace to remove characters from my strings

I have the right function, just not finding the right regex pattern to remove (ID:999999) from the string. This ID value varies but is all numeric. I like to remove everything including the brackets.
$string = "This is the value I would like removed. (ID:17937)";
$string = preg_replace('#(ID:['0-9']?)#si', "", $string);
Regex is not my forte, and I need help with this one.
Try this:
$string = preg_replace('# \(ID:[0-9]+\)#si', "", $string);
You need to escape the parenthesis using backslashes \.
You shouldn't use quotes around the number range.
You should use + (one or more) instead of ? (zero or one).
You can add a space at the start, to avoid having a space at the end of the resulting string.
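To see why the quantifier matters, the following sketch compares ? and + on the string from the question. The variable names are illustrative assumptions:

```php
<?php
$string = "This is the value I would like removed. (ID:17937)";

// [0-9]? allows at most one digit, so the five-digit ID never matches
// and the string comes back unchanged.
$withQuestionMark = preg_replace('# \(ID:[0-9]?\)#', '', $string);

// [0-9]+ allows one or more digits, so the whole " (ID:17937)" is removed.
$withPlus = preg_replace('# \(ID:[0-9]+\)#', '', $string);

var_export($withQuestionMark);
echo "\n";
var_export($withPlus);
```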
In PHP, / is the most common regex delimiter, but # is valid too. Parentheses create a capture group, so you must escape them to match them literally.
A capture group is only needed if you want to reference the match in the replacement, so it is optional here; /(\(ID:[0-9]+\))/si would be a suitable regular expression, and so would /\(ID:[0-9]+\)/.
Here are two options:
Code: (Demo)
$string = "This is the value I would like removed. (ID:17937)";
var_export(preg_replace('/ \(ID:\d+\)/',"",$string));
echo "\n\n";
var_export(strstr($string,' (ID:',true));
Output: (I used var_export() to show that the technique is "clean" and gives no trailing whitespaces)
'This is the value I would like removed.'
'This is the value I would like removed.'
Some points:
Regex is a better / more flexible solution if your ID substring can exist anywhere in the string.
Your regex pattern doesn't need a character class if you use the shorthand range character \d.
Regex generally speaking should only be used when standard string function will not suffice or when it is proven to be more efficient for a specific case.
If your ID substring always occurs at the end of the string, strstr() is an elegant/perfect function.
Both of my methods include the space before (ID: in what gets removed, which keeps the output clean.
You don't need either s or i modifiers on your pattern, because s only matters if you use a . (dot) and your ID is probably always uppercase so you don't need a case-insensitive search.
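The difference between the two options shows up when the ID is not at the end. Here is a sketch with a hypothetical input (the sample string and variable names are assumptions):

```php
<?php
// Hypothetical input where the ID sits in the middle of the string.
$string = "Order (ID:17937) has shipped.";

// The regex removes only the ID substring, wherever it occurs.
$viaRegex = preg_replace('/ \(ID:\d+\)/', '', $string);

// strstr() with true returns everything before the needle,
// so the text after the ID is lost as well.
$viaStrstr = strstr($string, ' (ID:', true);

var_export($viaRegex);  // 'Order has shipped.'
echo "\n";
var_export($viaStrstr); // 'Order'
```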

Using regex to extract numbers and symbols from string

I have a string with text, numbers, and symbols. I'm trying to extract the numbers and symbols from the string, with limited success. Instead of getting the entire run of numbers and symbols, I'm only getting part of it. I will explain my regex below to make it clearer.
\d : any number
[+,-,*,/,0-9]+ : 1 or more of any +,-,*,/, or number
\d : any number
Code:
$string = "text 1+1-1*1/1= text";
$regex = "~\d[+,-,*,/,0-9]+\d~siU";
preg_match_all($regex, $string, $matches);
echo $matches[0][0];
Expected Results
1+1-1*1/1
Actual Results
1+1
Remove the U flag. It's causing the + to be non-greedy in its matching. Also, you don't need commas between characters in your character list (you only need one , if you're trying to match a comma). You do need to escape the - so that it isn't interpreted as a range.
The problem here is that your regex mixes up quite a few unescaped metacharacters. In your character class you have [+,-,*,/,0-9]. You do not need to separate the characters with commas; that only tells the regex engine to include commas in the class. Furthermore, you need to escape the -, as it has a special meaning inside a character class. As written, ,-, is interpreted as "characters from , to ," instead of the literal character -. The / would need escaping only if / were also the pattern delimiter; with ~ as the delimiter it can stay as it is. The expression \d[+\-*/0-9]+\d should do the trick.
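A quick way to check this is to run the original and the corrected pattern side by side on the string from the question. This is only a sketch; the variable names are assumptions:

```php
<?php
$string = "text 1+1-1*1/1= text";

// Original pattern: the U flag makes + lazy, and ,-, is a range, not a literal "-".
preg_match_all('~\d[+,-,*,/,0-9]+\d~siU', $string, $before);

// Corrected pattern: no U flag, no commas, and the hyphen is escaped.
preg_match_all('~\d[+\-*/0-9]+\d~', $string, $after);

echo $before[0][0], "\n"; // 1+1
echo $after[0][0], "\n";  // 1+1-1*1/1
```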
Didn't test it with your code but should work :)
((?:[0-9]+[\+|\-|\*|\/]?)+)
More in details, if you want to understand my pattern : https://regex101.com/r/mF0zO8/2

PHP Regex to find a specific substring

So basically, I have a big string with some other information, and somewhere at the end, I have the following structure of a string:
62AC979D-5277D720
It is numbers and uppercase letters. I would like to extract this substring from many lines of the bigger strings which all contain it at different places. I have tried:
preg_match('/^[\w]+$/', $string);
But I really don't have much experience with regular expressions. Can someone provide the regex necessary or at least tell me where I am mistaken? Thank you for your time!
This regex should do it for you,
([A-Z\d]{8}-[A-Z\d]{8})
in use
<?php
$string = 'This is 62AC979D-5277D720 the whole string.';
preg_match_all('~([A-Z\d]{8}-[A-Z\d]{8})~', $string, $value);
print_r($value[1]);
Your current regex fails, I suspect, because of the ^ and $. These anchor the match to the start and end of the subject string (or of each line if the m modifier is used), so the pattern only succeeds when the whole string consists of word characters. \w also matches a-z, A-Z, 0-9 and _. I think you only care about capital letters and digits, and you want to allow only one dash. If each half of the target will always be 8 characters, you can use {8} in place of the +. The () capture the value that is found. The first found value in $string will be $value[1][0].
Demo: http://sandbox.onlinephpfunctions.com/code/c6b2c391d95c5454a3c7ea81d5ac4a3bb8e49aef
preg_match_all('/\\b[0-9A-Z]+-[0-9A-Z]+\\b/', $string, $matches);
This should do it for you.
preg_match('/\\b[0-9A-Z]{8}-[0-9A-Z]{8}\\b/', $string);
This works for the string you gave, i.e. 8 digits or uppercase letters, followed by -, then 8 more digits or uppercase letters.
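You can try this, but note that the ^ and $ anchors mean it only matches when the whole string is the identifier:
preg_match('/^[0-9A-Z]{8}-[0-9A-Z]{8}$/', $string);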
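To compare the word-boundary and anchored variants from the answers above, here is a sketch that runs them over a few lines. The second and third sample lines and all variable names are made up for illustration:

```php
<?php
// The first line is from the answers above; the other two are hypothetical.
$lines = [
    'This is 62AC979D-5277D720 the whole string.',
    'prefix data 0A1B2C3D-4E5F6A7B',
    'no identifier on this line',
];

$found = [];
foreach ($lines as $line) {
    // \b lets the identifier sit anywhere in the line.
    if (preg_match('/\b[0-9A-Z]{8}-[0-9A-Z]{8}\b/', $line, $m)) {
        $found[] = $m[0];
    }
}

// The anchored pattern requires the whole line to be the identifier,
// so it returns 0 for a line with surrounding text.
$anchored = preg_match('/^[0-9A-Z]{8}-[0-9A-Z]{8}$/', $lines[0]);

print_r($found);
var_dump($anchored); // int(0)
```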

remove whatever i want from string

I got a few keywords, symbols, letters etc I want to remove from my php string. I'm trying to add it but it doesn't work too well.
$string = preg_replace("/(?![=$'%-mp4mp3])\p{P}/u","", $check['title']);
Pretty much, I want to remove the words mp3, mp4, ./ and apples from the string.
Please help guide me, thanks in advance!
First: [] in a regular expression introduces a character class, and a hyphen between two symbols represents a character range. So the reason your regular expression would make the wrong erasures (as I suppose) is that [=$'%-mp4mp3] means =, $, ', everything from % to m (73 characters, actually!), p, 4, 3.
Second: your regular expression doesn't grab the "bad" characters/keywords. A negative lookahead is a zero-width assertion (it is not included in the match), so what you actually erase is every punctuation character that is not in the class.
Change your regex to:
"/[=$'%-]|mp3|mp4/u"
You don't need regex for that.
$string = "Your original string here";
$keywords = array('mp3', 'mp4');
echo str_replace($keywords, '', $string);
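Both suggestions, and the range pitfall described above, can be tried out with a short sketch. The sample titles and variable names are hypothetical:

```php
<?php
// The range %-m covers far more than intended: it includes uppercase "A".
$rangeHitsA = preg_match('/[%-m]/', 'A'); // 1
$rangeHitsZ = preg_match('/[%-m]/', 'z'); // 0, "z" comes after "m"

// Regex approach: a class for single characters plus alternation for keywords.
// The hyphen is last in the class, so it is a literal hyphen, not a range.
$cleaned = preg_replace('/[=$\'%-]|mp3|mp4/u', '', 'My-Song=mp3');

// Plain string approach: str_replace() accepts an array of keywords.
$keywords = ['mp3', 'mp4', './', 'apples'];
$plain = trim(str_replace($keywords, '', 'apples./track mp4'));

echo $cleaned, "\n"; // MySong
echo $plain, "\n";   // track
```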

Insert separators into a string in regular intervals

I have the following string in php:
$string = 'FEDCBA9876543210';
The string can have 2 or more hexadecimal characters.
I want to group the string in pairs, like this:
$output_string = 'FE:DC:BA:98:76:54:32:10';
I wanted to use regex for that, I think I saw a way to do like "recursive regex" but I can't remember it.
Any help appreciated :)
If you don't need to check the content, there is no use for regex.
Try this
$outputString = chunk_split($string, 2, ":");
// generates: FE:DC:BA:98:76:54:32:10:
You might need to remove the last ":".
Or this :
$outputString = implode(":", str_split($string, 2));
// generates: FE:DC:BA:98:76:54:32:10
Resources :
www.w3schools.com - chunk_split()
www.w3schools.com - str_split()
www.w3schools.com - implode()
On the same topic :
Split string into equal parts using PHP
Sounds like you want a regex like this:
/([0-9a-f]{2})/${1}:/gi
Which, in PHP is...
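<?php
$string = 'FEDCBA9876543210';
// PHP has no "g" modifier; preg_replace() already replaces every match.
$pattern = '/([0-9A-F]{2})/i';
$replacement = '${1}:';
// Prints FE:DC:BA:98:76:54:32:10: (note the trailing colon).
echo preg_replace($pattern, $replacement, $string);
?>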
Please note the above code is currently untested.
You can make sure there are two or more hex letters (A-F) by doing this:
if (preg_match('!^\d*[A-F]\d*[A-F][\dA-F]*$!i', $string)) {
...
}
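No need for a recursive regex. By the way, "recursive regex" is a contradiction in terms, since a regular language (which a regex parses) can't be recursive by definition (PCRE, which PHP uses, does offer recursive patterns such as (?R), but as an extension beyond regular languages).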
If you also want to require pairs of characters with colons in between, ignoring the two-hex-letter requirement for a moment, use:
if (preg_match('!^[\dA-F]{2}(?::[\dA-F]{2})*$!i', $string)) {
...
}
Now if you want to add the condition requiring two hex letters, use a positive lookahead:
if (preg_match('!^(?=[\d:]*[A-F][\d:]*[A-F])[\dA-F]{2}(?::[\dA-F]{2})*$!i', $string)) {
...
}
To explain how this works: the first thing it does is check (with a positive lookahead, i.e. (?=...)) that you have zero or more digits or colons, followed by a hex letter, followed by zero or more digits or colons, and then another hex letter. This ensures there are at least two hex letters in the string.
After the positive lookahead is the original expression that makes sure the string is pairs of hex digits.
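The combined check can be exercised with a few inputs. This sketch assumes each colon-separated group is exactly two hex digits, i.e. the pattern [\dA-F]{2}(?::[\dA-F]{2})* for the pairs; the sample strings and variable names are made up:

```php
<?php
// Lookahead: at least two hex letters; main pattern: colon-separated pairs.
// Assumption: every group after a colon is exactly two hex digits.
$pattern = '!^(?=[\d:]*[A-F][\d:]*[A-F])[\dA-F]{2}(?::[\dA-F]{2})*$!i';

$samples = ['FE:DC:BA', '1A:2B', '12:34', 'FE:D'];
$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$sample] = preg_match($pattern, $sample);
}

print_r($results);
```

Here '12:34' is rejected because it contains no hex letters, and 'FE:D' is rejected because its last group is not a full pair.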
Recursive regular expressions are usually not possible. You may use a regular expression recursively on the results of a previous regular expression, but most regular expression grammars will not allow recursivity. This is the main reason why regular expressions are almost always inadequate for parsing stuff like HTML. Anyways, what you need doesn't need any kind of recursivity.
What you want, simply, is to match a group multiple times. This is quite simple:
preg_match_all("/([a-f0-9]{2})/i", $string, $matches);
This will fill $matches with all occurrences of two hexadecimal digits (in a case-insensitive way). To replace them, use preg_replace:
echo preg_replace("/([a-f0-9]{2})/i", '\1:', $string);
There will be one ':' too many at the end; you can strip it with substr:
echo substr(preg_replace("/([a-f0-9]{2})/i", '\1:', $string), 0, -1);
While it is not horrible practice to use rtrim(chunk_split($string, 2, ':'), ':'), I prefer to use direct techniques that avoid "mopping up" after making modifications.
Code: (Demo)
$string = 'FEDCBA9876543210';
echo preg_replace('~[\dA-F]{2}(?!$)\K~', ':', $string);
Output:
FE:DC:BA:98:76:54:32:10
Don't be intimidated by the regex. The pattern says:
[\dA-F]{2} # match exactly two numeric or A through F characters
(?!$) # that is not located at the end of the string
\K # restart the fullstring match
When I say "restart the fullstring match" I mean "forget the previously matched characters and start matching from this point forward". Because there are no additional characters matched after \K, the pattern effectively delivers the zero-width position where the colon should be inserted. In this way, no original characters are lost in the replacement.
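For comparison, the approaches discussed in this thread can be run side by side on the sample string. This is only a sketch, and the variable names are assumptions:

```php
<?php
$string = 'FEDCBA9876543210';

// Split into 2-character chunks and join them with colons.
$viaSplit = implode(':', str_split($string, 2));

// Append a colon after every chunk, then trim the trailing one.
$viaChunk = rtrim(chunk_split($string, 2, ':'), ':');

// Insert a colon at each zero-width position found via \K.
$viaRegex = preg_replace('~[\dA-F]{2}(?!$)\K~', ':', $string);

echo $viaSplit, "\n"; // FE:DC:BA:98:76:54:32:10
echo $viaChunk, "\n"; // FE:DC:BA:98:76:54:32:10
echo $viaRegex, "\n"; // FE:DC:BA:98:76:54:32:10
```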
