Insert separators into a string in regular intervals - php

I have the following string in php:
$string = 'FEDCBA9876543210';
The string can have 2 or more hexadecimal characters (usually many more).
I want to group the characters in pairs, like this:
$output_string = 'FE:DC:BA:98:76:54:32:10';
I wanted to use a regex for that. I think I once saw a way to do it with something like a "recursive regex", but I can't remember it.
Any help appreciated :)

If you don't need to check the content, there is no use for regex.
Try this
$outputString = chunk_split($string, 2, ":");
// generates: FE:DC:BA:98:76:54:32:10:
You might need to remove the last ":".
Or this :
$outputString = implode(":", str_split($string, 2));
// generates: FE:DC:BA:98:76:54:32:10
Resources :
www.w3schools.com - chunk_split()
www.w3schools.com - str_split()
www.w3schools.com - implode()
On the same topic :
Split string into equal parts using PHP
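As a minimal sketch, the two non-regex approaches above could be wrapped in small helpers. The function names group_chars() and group_chars_chunked() are hypothetical and only used for illustration; the sketch assumes a single-character separator.

```php
<?php
// Hypothetical helper: insert $separator between every $size characters.
function group_chars(string $input, int $size = 2, string $separator = ':'): string
{
    // str_split() cuts the string into chunks of $size characters;
    // implode() joins them, so no trailing separator is produced.
    return implode($separator, str_split($input, $size));
}

// Hypothetical helper: same result via chunk_split().
function group_chars_chunked(string $input, int $size = 2, string $separator = ':'): string
{
    // chunk_split() appends the separator after every chunk, including the
    // last one, so the trailing separator is trimmed off afterwards.
    // rtrim() treats its second argument as a list of characters, which is
    // why this sketch assumes a one-character separator.
    return rtrim(chunk_split($input, $size, $separator), $separator);
}

echo group_chars('FEDCBA9876543210'), "\n";         // FE:DC:BA:98:76:54:32:10
echo group_chars_chunked('FEDCBA9876543210'), "\n"; // FE:DC:BA:98:76:54:32:10
```

With an odd number of characters, the last group is simply shorter (for example 'ABC' becomes 'AB:C').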

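Sounds like you want a regex substitution like this (pattern, replacement and flags in sed-like notation):
/([0-9a-f]{2})/${1}:/gi
Which, in PHP is...
<?php
$string = 'FEDCBA9876543210';
// PHP has no "g" modifier; preg_replace() replaces every match by default.
$pattern = '/([0-9A-F]{2})/i';
$replacement = '${1}:';
// rtrim() drops the colon that is added after the final pair.
echo rtrim(preg_replace($pattern, $replacement, $string), ':');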
?>
Please note the above code is currently untested.

You can make sure the string consists of hex characters and contains two or more hex letters (A-F) by doing this:
if (preg_match('!^\d*[A-F]\d*[A-F][\dA-F]*$!i', $string)) {
...
}
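No need for a recursive regex. Strictly speaking, "recursive regex" is a contradiction in terms, since a regular language (which a true regular expression describes) can't be recursive by definition; PCRE does offer recursion as an extension (for example (?R)), but it isn't needed here.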
If you also want to check that the characters are grouped in pairs with colons in between, ignoring the two-hex-letter condition for a moment, use:
if (preg_match('!^[\dA-F]{2}(?::[\dA-F]{2})*$!i', $string)) {
...
}
Now if you want to add the condition requiring two hex letters, use a positive lookahead:
if (preg_match('!^(?=[\d:]*[A-F][\d:]*[A-F])[\dA-F]{2}(?::[\dA-F]{2})*$!i', $string)) {
...
}
To explain how this works: the positive lookahead, (?=...), first checks that you have zero or more digits or colons, followed by a hex letter, followed by zero or more digits or colons, and then another hex letter. This ensures there are at least two hex letters in the string.
After the positive lookahead is the original expression that makes sure the string is pairs of hex digits.
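A minimal sketch of the pair check without the lookahead, assuming every group is exactly two hex digits and groups are separated by single colons. The helper name is_hex_pairs() is hypothetical.

```php
<?php
// Hypothetical helper: true when $value is one or more colon-separated
// pairs of hex digits, e.g. "FE:DC:BA".
function is_hex_pairs(string $value): bool
{
    // ^[\dA-F]{2}       the first pair of hex digits
    // (?::[\dA-F]{2})*  zero or more further pairs, each preceded by a colon
    // $                 end of the string
    // i                 case-insensitive, so a-f is accepted as well
    return preg_match('!^[\dA-F]{2}(?::[\dA-F]{2})*$!i', $value) === 1;
}

var_dump(is_hex_pairs('FE:DC:BA')); // bool(true)
var_dump(is_hex_pairs('fe:dc'));    // bool(true)
var_dump(is_hex_pairs('FEDC'));     // bool(false), missing colon
var_dump(is_hex_pairs('FE:D'));     // bool(false), incomplete pair
```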

Recursive regular expressions are usually not possible. You may use a regular expression recursively on the results of a previous regular expression, but most regular expression grammars will not allow recursivity. This is the main reason why regular expressions are almost always inadequate for parsing stuff like HTML. Anyways, what you need doesn't need any kind of recursivity.
What you want, simply, is to match a group multiple times. This is quite simple:
preg_match_all("/[a-f0-9]{2}/i", $string, $matches);
This will fill $matches[0] with all occurrences of two hexadecimal digits (in a case-insensitive way). To replace them, use preg_replace:
echo preg_replace("/([a-f0-9]{2})/i", '\1:', $string);
There will be one ':' too many at the end; you can strip it with substr:
echo substr(preg_replace("/([a-f0-9]{2})/i", '\1:', $string), 0, -1);

While it is not horrible practice to use rtrim(chunk_split($string, 2, ':'), ':'), I prefer to use direct techniques that avoid "mopping up" after making modifications.
Code: (Demo)
$string = 'FEDCBA9876543210';
echo preg_replace('~[\dA-F]{2}(?!$)\K~', ':', $string);
Output:
FE:DC:BA:98:76:54:32:10
Don't be intimidated by the regex. The pattern says:
[\dA-F]{2} # match exactly two numeric or A through F characters
(?!$) # that is not located at the end of the string
\K # restart the fullstring match
When I say "restart the fullstring match" I mean "forget the previously matched characters and start matching from this point forward". Because there are no additional characters matched after \K, the pattern effectively delivers the zero-width position where the colon should be inserted. In this way, no original characters are lost in the replacement.
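To illustrate what \K contributes, here is a small sketch comparing the pattern above with a variant that drops \K and writes the matched pair back through $0 instead. The variable names are only for illustration.

```php
<?php
$string = 'FEDCBA9876543210';

// \K version: the reported match is the empty position after each pair,
// so the colon is inserted there and nothing is overwritten.
$withK = preg_replace('~[\dA-F]{2}(?!$)\K~', ':', $string);

// Variant without \K: the pair itself is matched, so it has to be
// re-emitted in the replacement via $0 (the whole match).
$withBackref = preg_replace('~[\dA-F]{2}(?!$)~', '$0:', $string);

echo $withK, "\n";                    // FE:DC:BA:98:76:54:32:10
var_dump($withK === $withBackref);    // bool(true)
```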

Related

Looking to use preg_replace to remove characters from my strings

I have the right function, just not finding the right regex pattern to remove (ID:999999) from the string. This ID value varies but is all numeric. I like to remove everything including the brackets.
$string = "This is the value I would like removed. (ID:17937)";
$string = preg_replace('#(ID:['0-9']?)#si', "", $string);
Regex is not my forte, and I need help with this one.
Try this:
$string = preg_replace('# \(ID:[0-9]+\)#si', "", $string);
You need to escape the parenthesis using backslashes \.
You shouldn't use quotes around the number range.
You should use + (one or more) instead of ? (zero or one).
You can add a space at the start, to avoid having a space at the end of the resulting string.
In PHP the pattern delimiter can be / or #; either works. The real problem is that parentheses denote a capture group, so you must escape them to match literal parentheses.
Also, preg_replace does not need a capture group when the replacement is an empty string, so in your case /\(ID:[0-9]+\)/ will be a nice regular expression.
Here are two options:
Code: (Demo)
$string = "This is the value I would like removed. (ID:17937)";
var_export(preg_replace('/ \(ID:\d+\)/',"",$string));
echo "\n\n";
var_export(strstr($string,' (ID:',true));
Output: (I used var_export() to show that the technique is "clean" and gives no trailing whitespaces)
'This is the value I would like removed.'
'This is the value I would like removed.'
Some points:
Regex is a better / more flexible solution if your ID substring can exist anywhere in the string.
Your regex pattern doesn't need a character class if you use the shorthand range character \d.
Regex generally speaking should only be used when standard string function will not suffice or when it is proven to be more efficient for a specific case.
If your ID substring always occurs at the end of the string, strstr() is an elegant/perfect function.
Both of my methods write a (space) before ID to make the output clean.
You don't need either s or i modifiers on your pattern, because s only matters if you use a . (dot) and your ID is probably always uppercase so you don't need a case-insensitive search.
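A short sketch of the difference between the two options when the ID substring is not at the end of the string. The second sample string is made up for illustration.

```php
<?php
$atEnd  = 'This is the value I would like removed. (ID:17937)';
$inside = 'Order (ID:42) has shipped.'; // hypothetical input with the ID mid-string

// Regex: removes the ID wherever it occurs, together with one preceding space.
$regexAtEnd  = preg_replace('/ \(ID:\d+\)/', '', $atEnd);
$regexInside = preg_replace('/ \(ID:\d+\)/', '', $inside);

// strstr() with true as third argument returns only the part before the
// needle, so everything after a mid-string ID is lost.
// It returns false when the needle is not found at all.
$strstrInside = strstr($inside, ' (ID:', true);

var_export($regexAtEnd);   // 'This is the value I would like removed.'
echo "\n";
var_export($regexInside);  // 'Order has shipped.'
echo "\n";
var_export($strstrInside); // 'Order'
echo "\n";
```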

How to remove all non-uppercase characters in a string?

Yeah I'm basically just trying to explode a phrase like Social Inc. or David Jason to SI and DJ. I've tried using explode but couldn't figure out how to explode everything BUT the capital letters, do I need to use preg_match()?
You can use this regex (?![A-Z]). with preg_replace() to replace every char except the one in uppercase.
preg_replace("/(?![A-Z])./", "", $yourvariable)
The regex matches any character that is NOT an uppercase letter ((?!...) is a negative lookahead).
I've created a regex101 if you wish to test it with other cases.
EDIT: As an update to this thread, you could also use the ^ character inside the square brackets to negate the character class.
preg_replace("/[^A-Z]/", "", $yourvariable)
This will match every character that is not an uppercase letter and replace it with nothing.
Quick and easy:
$ucaseletters = preg_replace('/[^A-Z]/', '', $input);
This will replace everything that is not an uppercase Letter within the Range A-Z.
Explanation:
^ within [] (Character-Set) is the negation-Operator (=anything that is NOT...)
Nicholas and Bernhard have provided successful regex patterns but they are not as efficient as they could be.
Use /[^A-Z]+/ and an empty replacement string with preg_replace().
preg_replace('~[^A-Z]+~', '', $string)
The negated character class has a one or more quantifier, so longer substrings are matched and fewer replacements are required.
The multibyte/unicode equivalent would be: (Demo)
preg_replace('~[^\p{Lu}]+~u', '', 'Az+0ǻÉé') // outputs: AÉ
This is the best pattern to use with preg_split as well, but preg_split generates an array, so there is the extra step of calling implode.
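For comparison, a sketch of the preg_replace() and preg_split() routes side by side. The PREG_SPLIT_NO_EMPTY flag is an addition here to drop empty pieces; it is not part of the answer above.

```php
<?php
$phrase = 'Social Inc.';

// preg_replace(): delete every run of non-uppercase characters.
$viaReplace = preg_replace('~[^A-Z]+~', '', $phrase);

// preg_split(): split on the same runs, which leaves the uppercase letters
// as array elements; implode() glues them back into a string.
$pieces   = preg_split('~[^A-Z]+~', $phrase, -1, PREG_SPLIT_NO_EMPTY);
$viaSplit = implode('', $pieces);

echo $viaReplace, "\n"; // SI
echo $viaSplit, "\n";   // SI
```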
I've got a more complicated solution but it works too!
$s = str_split("Social Inc.");
foreach ($s as $idx => $char) {
if(preg_match("/[A-Z]/", $char))
{
echo $char;
}
}
It will echo the upper-case letters.

Using preg_replace()?

I'm trying to understand the function preg_replace(), but it looks rather cryptic to me. From looking at the documentation on the function, I understand that it consists of three things - the subject, the pattern for matching, and what to replace it with.
I'm currently trying to 'sanitize' numerical input by replacing anything that isn't a number. So far I know I would need to allow the numbers 0-9, but remove anything that isn't a number and replace it with: "".
Instead of escaping every character I need to, is there some way to simply not allow any other character than the numbers 0-9? Also if anyone could shed light on how the 'pattern for matching' part works...
If you want to sanitize a string to replace anything that isn't a number, you would write a regular expression that matches characters not in list.
The pattern [0-9] would match all numerals. Placing a caret (^) at the beginning of the set matches everything that isn't in the set: [^0-9]
$result = preg_replace('/[^0-9]/', '', $input);
Note that this will also filter out periods/decimal points and other mathematical marks. You could include periods/decimal points (allowing floats) by making the period allowed:
$result = preg_replace('/[^0-9.]/', '', $input);
Note that the period (.) is the wildcard character in regular expressions. It doesn't need to be escaped in a bracket expression, but it would elsewhere in the pattern.
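A minimal sketch of both patterns as helpers. The names digits_only() and digits_and_dots() are hypothetical, and note that the second pattern does not check that the result is a well-formed number.

```php
<?php
// Hypothetical helper: keep only the characters 0-9.
function digits_only(string $input): string
{
    // [^0-9] matches any single character that is not a digit.
    return preg_replace('/[^0-9]/', '', $input);
}

// Hypothetical helper: keep digits and periods.
function digits_and_dots(string $input): string
{
    // Inside a bracket expression the period is literal, so no escaping.
    return preg_replace('/[^0-9.]/', '', $input);
}

echo digits_only('abc 12,345.67 USD'), "\n";     // 1234567
echo digits_and_dots('abc 12,345.67 USD'), "\n"; // 12345.67
echo digits_and_dots('1.2.3'), "\n";             // 1.2.3 (not a valid float)
```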
preg_replace() is easier than you think:
$p = "/[A-Z a-z]/";
$r = "";
$s = "12345t7890";
echo preg_replace($p, $r, $s);
Will output "123457890" - note that the 't' has been removed.
Wouldn't is_numeric() solve your problem?

how to extract a certain digit from a String using regular expression in php?

I have a String (filename): s_113_2.3gp
How can I extract the number that appears after the second underscore? In this case it's '2' but in some cases that can be a few digits number.
Also the number of digits that appears after the first underscore can vary so the length of this String is not constant.
You can use a capturing group:
preg_match('/_(\d+)\.\w+$/', $str, $matches);
$number = $matches[1];
\d+ represents 1 or more digits. The parentheses around that capture it, so you can later retrieve it with $matches[1]. The . needs to be escaped, because otherwise it would match any character but line breaks. \w+ matches 1 or more word characters (digits, letters, underscores). And finally the $ represents the end of the string and "anchors" the regular expression (otherwise you would get problems with strings containing multiple .).
This also allows for arbitrary file extensions.
As Ωmega pointed out below there is another possibility, that does not use a capturing group. With the concept of lookarounds, you can avoid matching _ at the start and the \.\w+$ at the end:
preg_match('/(?<=_)\d+(?=\.\w+$)/', $str, $matches);
$number = $matches[0];
However, I would recommend profiling, before applying this rather small optimization. But it is something to keep in mind (or rather, to read up on!).
Using regex lookaround it is very short code:
$n = preg_match('/(?<=_)\d+(?=\.)/', $str, $m) ? $m[0] : "";
...which reads: find one or more digits \d+ that are between underscore (?<=_) and period (?=\.)
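Put together, the capturing-group approach could be sketched as a small function. The name trailing_number() and the choice to return null when nothing matches are assumptions made for this example.

```php
<?php
// Hypothetical helper: return the number between the last underscore and
// the file extension, or null when the filename does not fit that shape.
function trailing_number(string $filename): ?int
{
    // _(\d+)  an underscore followed by one or more digits (captured)
    // \.\w+$  a literal dot and the extension at the end of the string
    if (preg_match('/_(\d+)\.\w+$/', $filename, $matches) === 1) {
        return (int) $matches[1];
    }
    return null;
}

var_dump(trailing_number('s_113_2.3gp'));  // int(2)
var_dump(trailing_number('s_7_1234.mp4')); // int(1234)
var_dump(trailing_number('nofile.txt'));   // NULL
```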

Remove number then a space from the start of a string

How would I go about removing numbers and a space from the start of a string?
For example, from '13 Adam Court, Cannock' remove '13 '
Because everyone else is going the \d+\s route I'll give you the brain-dead answer
$str = preg_replace("#^[0-9]+ #","",$str);
Word to the wise, don't use / as your delimiter in regex, you will experience the dreaded leaning-toothpick-problem when trying to do file paths or something like http://
:)
Use the same regex I gave in my JavaScript answer, but apply it using preg_replace():
preg_replace('/^\d+\s+/', '', $str);
Try this one :
^\d+ (.*)$
Like this :
preg_replace('/^\d+ (.*)$/', '$1', $string);
Resources :
preg_replace
regular-expressions.info
On the same topic :
Regular expression to remove number, then a space?
regular expression for matching number and spaces.
I'd use
/^\d+\s+/
It looks for a number of any size in the beginning of a string ^\d+
Then looks for a patch of whitespace after it \s+
When you use a backslash before certain letters it represents something...
\d represents a digit 0,1,2,3,4,5,6,7,8,9.
\s represents a whitespace character (space, tab, newline).
Add a plus sign (+) to the end and you can have...
\d+ a series of digits (number)
\s+ multiple spaces (typos etc.)
The same regex I gave you on your other question still applies. You just have to use preg_replace() instead.
Search for /^[\s\d]+/ and replace with the empty string. Eg:
$str = preg_replace('/^[\s\d]+/', '', $str);
This will remove digits and spaces in any order from the beginning of the string. For something that removes only a number followed by spaces, see BoltClock's answer.
If the input strings all have the same expected format and you will receive the same result from left trimming all numbers and spaces (no matter the order of their occurrence at the front of the string), then you don't actually need to fire up the regex engine.
I love regex, but know not to use it unless it provides a valuable advantage over a non-regex technique. Regex is often slower than non-regex techniques.
Use ltrim() with a character mask that includes spaces and digits.
Code: (Demo)
var_export(
ltrim('420 911 90210 666 keep this part', ' 0..9')
);
Output:
'keep this part'
It wouldn't matter if the string started with a space either. ltrim() will greedily remove all instances of spaces or digits from the start of the string until it can't anymore.
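A short sketch of where ltrim() with a character mask and the anchored regex /^\d+\s+/ agree and where they differ. The sample addresses other than the one from the question are made up for illustration.

```php
<?php
$mask = ' 0..9'; // a space plus the character range 0 through 9

// Same result for the example from the question.
$a1 = ltrim('13 Adam Court, Cannock', $mask);
$a2 = preg_replace('/^\d+\s+/', '', '13 Adam Court, Cannock');

// ltrim() keeps stripping digits and spaces in any order;
// the regex removes only one number followed by whitespace.
$b1 = ltrim('4 2 Park Lane', $mask);
$b2 = preg_replace('/^\d+\s+/', '', '4 2 Park Lane');

// ltrim() also strips digits that are glued to a letter;
// the regex leaves the string alone because no whitespace follows "221".
$c1 = ltrim('221B Baker Street', $mask);
$c2 = preg_replace('/^\d+\s+/', '', '221B Baker Street');

var_export([$a1, $a2, $b1, $b2, $c1, $c2]);
```

Which behaviour is preferable depends on how uniform the input is, which is the precondition the answer above states for using ltrim().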
