Separate hex blocks in PHP - php

Does anyone know a way to "separate" the blocks of this hex code?
[49cd0d18] -> 1238175000
[00010000] -> 1
[0069] -> 105
[543ace68] -> timestamp
000000000000000000000000000000000000000
Complete:
49cd0d18000100000069543ace68000000000000000000000000000000000000000
Of course, these values can differ; I only know that they will not always be the same. So I need to know how to "count" the blocks and then "cut" them.
I'd be very grateful for your help!

Regexes are an easy solution for problems like this.
You can see the regex at this link: https://regex101.com/r/qP1bC7/1
Note: Don't forget to put delimiters (the slashes in my example) around your regex when you use it in your code:
/^(\w{8})(\w{8})(\w{4})(\w{8})(\w{39})$/
The caret and the dollar sign mark the beginning and the end of the string, respectively.
The parentheses are capturing groups.
\w matches any letter (a-z in lower and upper case), any digit (0-9), and the underscore (_).
{8} means that it must match exactly 8 characters.
And you can see an example of the code here:
http://sandbox.onlinephpfunctions.com/code/c3a0ec3a45c53eb2c1b8e21cb978253ea4a28e52
The third parameter of preg_match() is an array that receives the matches. It is passed by reference, so PHP creates it for you; initializing it beforehand is optional. Index 0 holds the whole match, and indexes 1 to 5 hold the results of the five capturing groups.
You could also extract the substrings with PHP's native string functions.
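As a minimal sketch of the regex approach described above (the linked sandbox code is not reproduced here, so this is an assumption about its shape), preg_match() can be called with the pattern and a matches array. The input string is assumed to end in 39 zeros, as the pattern implies, and hexdec() is used only to illustrate converting a block to a number.

```php
<?php
// Assumed input: four hex blocks followed by 39 padding zeros.
$string = '49cd0d18000100000069543ace68' . str_repeat('0', 39);
$pattern = '/^(\w{8})(\w{8})(\w{4})(\w{8})(\w{39})$/';

$matches = array();
if (preg_match($pattern, $string, $matches) === 1) {
    // $matches[0] is the whole match; $matches[1] to $matches[5] are the blocks.
    $first  = hexdec($matches[1]); // 0x49cd0d18
    $third  = hexdec($matches[3]); // 0x0069
} else {
    // The string did not have the expected layout.
    $first = $third = null;
}
```

If the block widths change, only the quantifiers in the pattern need to change.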
$string = "49cd0d18000100000069543ace68000000000000000000000000000000000000000";
$matches = array();
$matches[] = substr($string, 0, 8);
$matches[] = substr($string, 8, 8);
$matches[] = substr($string, 16, 4);
$matches[] = substr($string, 20, 8);
$matches[] = substr($string, 28, 39);
var_dump($matches);
You can test the code here: http://sandbox.onlinephpfunctions.com/code/18935d55feb86dffdc17f8854572e0935b4aab0e
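Since the question mentions that the block layout may vary, the substr() calls can be generalized into a small helper that walks a list of widths. This is only a sketch: splitByWidths() is a hypothetical name, and it assumes the caller knows the widths in advance.

```php
<?php
/**
 * Hypothetical helper: cut $hex into consecutive blocks of the given widths.
 * Characters beyond the summed widths are ignored.
 */
function splitByWidths(string $hex, array $widths): array
{
    $blocks = array();
    $offset = 0;
    foreach ($widths as $width) {
        $blocks[] = substr($hex, $offset, $width);
        $offset += $width; // the next block starts where this one ended
    }
    return $blocks;
}

$blocks = splitByWidths('49cd0d18000100000069543ace68', array(8, 8, 4, 8));
```

Keeping the widths in an array means a different layout only requires a different array, not different code.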
Additional note: PHP's native string functions are generally faster than regex. A regular expression has to be compiled before it can be used, although PHP caches compiled patterns (up to 4096 per process), so a repeated pattern is not recompiled each time. You can benchmark both solutions if performance matters. Otherwise, I'd say the two solutions are pretty much equivalent.
Good luck, and don't forget to like,
Jonathan Parent-Lévesque from Montreal

Related

Find and split a string by the first character that is not 0

I wanted to know how I could split a string based on the first character that is not 0, e.g.
$ID = 'ABC-000000160810';
I want to split the id so it looks like this:
$split_ID = 160810;
I tried to just take the last 6 digits, but the problem is that the number of digits might not always be six, so I need to split based on the first non-zero digit. What is the easiest way to achieve this?
Thanks.
Here's a way using a regular expression:
$id = 'ABC-000000160810';
preg_match('/-0*([1-9][0-9]*)/', $id, $matches);
$split_id = $matches[1];
You can use ltrim if you only want to remove leading zeroes, once the "ABC-" prefix has been cut off.
$ID = 'ABC-000000160810';
$split_ID = ltrim(substr($ID, 4), '0');
Use ltrim to remove leading characters.
$id = 'ABC-00001234';
$numeric = ltrim(mb_substr($id, mb_strpos($id, '-') + 1), '0');
echo $numeric; // 1234
The above requires the mbstring extension to be enabled. If you encounter an error, either enable the extension or use the non-multibyte functions substr and strpos. Probably you should get in the habit of using the mb_ string functions.
This should also work:
const CHAR_MASK = 'a..zA..Z-0';
$id = 'ABC-00001234';
$numeric = ltrim($id, CHAR_MASK);
echo $numeric; // 1234
For your example "ABC-000000160810" you might use a regex that matches the first part up to the first non-zero digit, and then use \K to exclude the previously consumed characters from the final match.
[^-]+-0+\K[1-9][0-9]*
[^-]+ Match any character except - one or more times, using a negated character class
- Match a literal hyphen
0+ Match a zero one or more times (if the value might have no leading zeroes, use 0* instead)
\K Reset the starting point of the reported match
[1-9][0-9]* Match your value, starting with a digit from 1 to 9
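A minimal sketch of how the \K pattern might be used from PHP. The function name extractNumber() is hypothetical, and the 0* variant is used so that IDs without leading zeroes also match. Because \K resets the start of the reported match, the value is read from index 0 rather than from a capturing group.

```php
<?php
/**
 * Hypothetical helper: return the digits after the hyphen without
 * leading zeroes, or null when the ID does not have the expected shape.
 */
function extractNumber(string $id): ?string
{
    if (preg_match('/[^-]+-0*\K[1-9][0-9]*/', $id, $m) === 1) {
        return $m[0]; // only the part after \K is reported
    }
    return null;
}

$split_id = extractNumber('ABC-000000160810');
```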
You can substr off the ABC part and multiply by 1 to make it a number.
$ID = "ABC-000000160810";
echo substr($ID, 4) * 1;

Match all substrings that end with 4 digits using regular expressions

I am trying to split a string in php, which looks like this:
ABCDE1234ABCD1234ABCDEF1234
Into an array of string which, in this case, would look like this:
ABCDE1234
ABCD1234
ABCDEF1234
So the pattern is "an undefined number of letters, and then 4 digits, then an undefined number of letters and 4 digits etc."
I'm trying to split the string using preg_split like this:
$pattern = "#[0-9]{4}$#";
preg_split($pattern, $stringToSplit);
And it returns an array containing the full string (not split) in the first element.
I'm guessing the problem here is my regex as I don't fully understand how to use them, and I am not sure if I'm using it correctly.
So what would be the correct regex to use?
You don't want preg_split, you want preg_match_all:
$str = 'ABCDE1234ABCD1234ABCDEF1234';
preg_match_all('/[a-z]+[0-9]{4}/i', $str, $matches);
var_dump($matches);
Output:
array(1) {
[0]=>
array(3) {
[0]=>
string(9) "ABCDE1234"
[1]=>
string(8) "ABCD1234"
[2]=>
string(10) "ABCDEF1234"
}
}
PHP uses PCRE-style regexes which let you do lookbehinds. You can use this to see if there are 4 digits "behind" you. Combine that with a lookahead to see if there's a letter ahead of you, and you get this:
(?<=\d{4})(?=[a-z])
Notice the dotted lines on the Debuggex Demo page. Those are the points you want to split on.
In PHP this would be:
var_dump(preg_split('/(?<=\d{4})(?=[a-z])/i', 'ABCDE1234ABCD1234ABCDEF1234'));
Use the principle of contrast:
\D+\d{4}
# requires at least one non digit
# followed by exactly four digits
See a demo on regex101.com.
In PHP this would be:
<?php
$string = 'ABCDE1234ABCD1234ABCDEF1234';
$regex = '~\D+\d{4}~';
preg_match_all($regex, $string, $matches);
?>
See a demo on ideone.com.
I'm no good at regex so here is the road less traveled:
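<?php
$s = 'ABCDE1234ABCD1234ABCDEF1234';
$nums = range(0, 9);   // digit lookup table: keys 0..9
$num_hit = 0;          // digits seen in the current chunk
$i = 0;                // index of the current chunk
$arr = array();
foreach (str_split($s) as $v)
{
    if (isset($nums[$v]))
    {
        ++$num_hit;
    }
    if (!isset($arr[$i]))
    {
        $arr[$i] = '';
    }
    $arr[$i] .= $v;
    if ($num_hit === 4)
    {
        // four digits collected: start the next chunk
        ++$i;
        $num_hit = 0;
    }
}
print_r($arr);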
First, why is your attempted pattern not delivering the desired output? Because the $ anchor tells the function to explode the string by using the final four digits as the "delimiter" (characters that are consumed while dividing the string into separate parts).
Your result:
array (
0 => 'ABCDE1234ABCD1234ABCDEF', // an element of characters before the last four digits
1 => '', // an empty element containing the non-existent characters after the four digits
)
In plain English, to fix your pattern, you must:
Not consume any characters while exploding and
Ensure that no empty elements are generated.
My snippet is at the bottom of this post.
Second, there seems to be some debate about which regex function to use (or even whether regex is a preferable tool).
My stance is that a non-regex method requires a long-winded block of lines that is as difficult to read as a regex pattern, if not more so. Regex lets you generate your result in one line, and not in an unsightly fashion. So let's dispose of iterated sets of conditions for this task.
Now the critical concern is whether this task is simply "extracting" data from a consistent and valid string (case "A"), or whether it is "validating AND extracting" data from a string (case "B") because the input cannot be 100% trusted to be consistent/correct.
In case A, you needn't concern yourself with producing valid elements in the output, so preg_split() or preg_match_all() are good candidates.
In case B, preg_split() would not be advisable, because it only hunts for delimiting substrings -- it remains ignorant of all other characters in the string.
Assuming this task is case A, then a decision is still pending about the better function to call. Well, both functions generate an array, but preg_match_all() creates a multidimensional array while you desire a flat array (like preg_split() provides). This means you would need to add a new variable to the global scope ($matches) and append [0] to the array to access the desired fullstring matches. To someone who doesn't understand regex patterns, this may border on the bad practice of using "magic numbers".
For me, I strive to code for Directness and Accuracy, then Efficiency, then Brevity and Clarity. Since you're not likely to notice any performance drop while performing such a small operation, efficiency isn't terribly important. I just want to make some comparisons to highlight the cost of a pattern that leverages only lookarounds or a pattern that misses an opportunity to greedily match predictable characters.
/(?<=\d{4})(?=[a-z])/i 79 steps
~\d{4}\K~ 25 steps
/[a-z]+[0-9]{4}\K/i 13 steps
~\D+[0-9]{4}\K~ 13 steps
~\D+\d{4}\K~ 13 steps
FYI, \K is a metacharacter that means "restart the fullstring match", in other words "forget/release all previously matched characters up to this point". This effectively ensures that no characters are lost while splitting.
Suggested technique:
var_export(
    preg_split(
        '~\D+\d{4}\K~',                // pattern
        'ABCDE1234ABCD1234ABCDEF1234', // input
        0,                             // make unlimited explosions
        PREG_SPLIT_NO_EMPTY            // exclude empty elements
    )
);
Output:
array (
0 => 'ABCDE1234',
1 => 'ABCD1234',
2 => 'ABCDEF1234',
)
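For case B (validating and extracting), one possible sketch is to check the whole string against an anchored pattern first and only then collect the pieces. The function name splitValidated() is hypothetical, and the sketch assumes "letters" means ASCII a-z in either case; it uses preg_match_all() for the extraction step rather than the preg_split() call above.

```php
<?php
/**
 * Hypothetical helper: return the "letters + 4 digits" chunks,
 * or null when the input is not made up entirely of such chunks.
 */
function splitValidated(string $input): ?array
{
    // \z anchors at the very end, so a trailing newline is rejected too.
    if (preg_match('~^(?:[a-z]+\d{4})+\z~i', $input) !== 1) {
        return null;
    }
    preg_match_all('~[a-z]+\d{4}~i', $input, $matches);
    return $matches[0]; // index 0 holds the full-string matches
}

$parts = splitValidated('ABCDE1234ABCD1234ABCDEF1234');
```

Returning null for invalid input keeps a malformed string from silently producing a partial result.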

preg_replace or regex string translation

I found some partial help but cannot seem to fully accomplish what I need. I need to be able to do the following:
I need a regular expression to replace any 1 to 3 character words between two words that are longer than 3 characters with a match-anything expression:
For example:
walk to the beach ==> walk(.*)beach
If the 1 to 3 character word is not preceded by a word that's longer than 3 characters then I want to translate that 1 to 3 letter word to '<word> ?'
For example:
on the beach ==> on ?the ?beach
The simpler the rule the better (of course, if there's an alternative, more complicated version that's more performant, then I'll take that as well, as I anticipate heavy usage eventually).
This will be used in a PHP context most likely with preg_replace. Thus, if you can put it in that context then even better!
By the way so far I have got the following:
$string = preg_replace('/\s+/', '(.*)', $string);
$string = preg_replace('/\b(\w{1,3})(\.*)\b/', '${1} ?', $string);
but that results in:
walk to the beach ==> 'walk(.*)to ?beach'
which is not what I want. 'on the beach' seems to translate correctly.
I think you will need two replacements for that. Let's start with the first requirement:
$str = preg_replace('/(\w{4,})(?: \w{1,3})* (?=\w{4,})/', '$1(.*)', $str);
Of course, you need to replace those \w (which match letters, digits and underscores) with a character class of what you actually want to treat as a word character.
The second one is a bit tougher, because matches cannot overlap and lookbehinds cannot be of variable length. So we have to run this multiple times in a loop:
do
{
$str = preg_replace('/^\w{0,3}(?: \w{0,3})* (?!\?)/', '$0?', $str, -1, $count);
} while($count);
Here we match everything from the beginning of the string, as long as it's only up-to-3-letter words separated by spaces, plus one trailing space (only if it is not already followed by a ?). Then we put all of that back in place, and append a ?.
Update:
After all the talk in the comments, here is an updated solution.
After running the first line, we can assume that the only words of three letters or fewer that are left will be at the beginning or at the end of the string. All others will have been collapsed to (.*). Since you want to append ? to all spaces between those, you do not even need a loop (in fact these are the only spaces left):
$str = preg_replace('/ /', ' ?', $str);
(Do this right after my first line of code.)
This would give the following two results (in combination with the first line):
let us walk on the beach now go => let ?us ?walk(.*)beach ?now ?go
let us walk on the beach there now go => let ?us ?walk(.*)beach(.*)there ?now ?go
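Put together, the updated solution could be sketched as a single function. The name toLoosePattern() is hypothetical; the two preg_replace() calls are the ones given in this answer, applied in order.

```php
<?php
/**
 * Hypothetical wrapper around the two replacements from the answer.
 */
function toLoosePattern(string $str): string
{
    // Step 1: collapse short words between two long words into (.*)
    $str = preg_replace('/(\w{4,})(?: \w{1,3})* (?=\w{4,})/', '$1(.*)', $str);
    // Step 2: every remaining space becomes an optional space
    return preg_replace('/ /', ' ?', $str);
}

$a = toLoosePattern('walk to the beach');
$b = toLoosePattern('on the beach');
$c = toLoosePattern('let us walk on the beach now go');
```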

Regular expression to match an exact number of occurrence for a certain character

I'm trying to check if a string has a certain number of occurrence of a character.
Example:
$string = '123~456~789~000';
I want to verify if this string has exactly 3 instances of the character ~.
Is that possible using regular expressions?
Yes
/^[^~]*~[^~]*~[^~]*~[^~]*$/
Explanation:
^ ... $ means the whole string in many regex dialects
[^~]* a string of zero or more non-tilde characters
~ a tilde character
The string can have as many non-tilde characters as necessary, appearing anywhere in the string, but must have exactly three tildes, no more and no less.
As a single character is technically a substring, and the task is to count the number of its occurrences, I suppose the most efficient approach lies in using a dedicated PHP function, substr_count:
$string = '123~456~789~000';
if (substr_count($string, '~') === 3) {
// string is valid
}
Obviously, this approach won't work if you need to count the number of pattern matches (for example, while you can count the number of '0' characters in your string with substr_count, you had better use preg_match_all to count digits).
Yet for this specific question it should be faster overall, as substr_count is optimized for one specific goal, counting substrings, whereas preg_match_all is more general-purpose.
I believe this should work for a variable number of characters:
^(?:[^~]*~[^~]*){3}$
The advantage here is that you just replace 3 with however many you want to check.
To make it more efficient, it can be written as
^[^~]*(?:~[^~]*){3}$
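A sketch of how the parameterized pattern might be built in PHP. hasExactly() is a hypothetical name, and the sketch assumes $char is a single single-byte character; preg_quote() is used so that characters with special meaning in a pattern are escaped.

```php
<?php
/**
 * Hypothetical helper: true when $subject contains exactly $n
 * occurrences of the single character $char.
 */
function hasExactly(string $subject, string $char, int $n): bool
{
    $c = preg_quote($char, '/'); // escape regex-special characters and the delimiter
    $pattern = '/^[^' . $c . ']*(?:' . $c . '[^' . $c . ']*){' . $n . '}$/';
    return preg_match($pattern, $subject) === 1;
}

$ok = hasExactly('123~456~789~000', '~', 3);
```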
This is what you are looking for:
EDIT based on comment below:
<?php
$string = '123~456~789~000';
$total = preg_match_all('/~/', $string);
echo $total; // Shows 3

Extract characters occurring before one of several forbidden characters

I want to discard all remaining characters in a string as soon as one of several unwanted characters is encountered.
As soon as a blacklisted character is encountered, the string before that point should be returned.
For instance, if I have an array:
$chars = array("a", "b", "c");
How would I go through the following string...
log dog hat bat
...and end up with:
log dog h
The strcspn function is what you are looking for.
<?php
$mask = "abc";
$string = "log dog hat bat";
$result = substr($string,0,strcspn($string,$mask));
var_dump($result);
?>
There is certainly nothing wrong with Vinko's answer and I might be more inclined to recommend that technique in a professional script because regex is likely to perform slower, but purely for a point of difference for researchers, regex could be used.
For the record, to convert the array of ['a', 'b', 'c'] to abc, just call implode($array) -- an empty glue string is not necessary.
Code: split the string in two on the first occurrence of a, b, or c, then access the first element
echo preg_split('~[abc]~', $string, 2)[0];
Code: match the leading substring of characters other than a, b, or c, then access the full match
echo preg_match('~^[^abc]+~', $string, $match) ? $match[0] : '';
I should state that if any of your blacklisted characters have special meaning to the regex engine while inside of a character class, then they will need to be escaped.
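The escaping concern can be handled with preg_quote() when the character class is built from the array. The following is a sketch only: takeUntilForbidden() is a hypothetical name, and it assumes single-byte blacklist characters.

```php
<?php
/**
 * Hypothetical helper: return the part of $string before the first
 * occurrence of any character in $chars.
 */
function takeUntilForbidden(string $string, array $chars): string
{
    // preg_quote() escapes ], ^, - and \ as well as the ~ delimiter.
    $class = preg_quote(implode('', $chars), '~');
    if ($class === '') {
        return $string; // nothing is blacklisted
    }
    return preg_split('~[' . $class . ']~', $string, 2)[0];
}

$result = takeUntilForbidden('log dog hat bat', array('a', 'b', 'c'));
```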
