Javascript lastIndex regex property to PHP regex - php

I'm trying to translate a JavaScript script to PHP. So far it's going well, but I stumbled across some code that leaves me clueless:
while (match = someRegex.exec(text)) {
m = match[0];
if (m === "-") {
var lastIndex = someRegex.lastIndex,
nextToken = someRegex.exec(parts.content);
if (nextToken) {
...
}
someRegex.lastIndex = lastIndex;
}
}
The someRegex variable looks like this:
/[^\\-]+|-|\\(?:[0-3][0-7]{0,2}|[4-7][0-7]?|x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|c[A-Za-z]|[\S\s]?)/g
exec should be equivalent to preg_match_all in PHP:
preg_match_all($someRegex, $text, $match);
$match = $match[0]; // I get the same results so it works
foreach($match as $m){
if($m === '-'){
// here I don't know how to handle lastIndex and the 2nd exec :(
}
}

I wouldn't use that lastIndex magic at all - essentially you're executing the regex twice on each index. If you really want to do that, you'd need to set the PREG_OFFSET_CAPTURE flag in preg_match_all so that you can get the position, add capture length and use it as the next preg_match offset.
Better use something like this:
preg_match_all($someRegex, $text, $match);
$match = $match[0]; // get all matches (no groups)
$len = count($match);
foreach($match as $i=>$m){
if ($m === '-') {
if ($i+1 < $len) {
$nextToken = $match[$i+1];
…
}
…
}
…
}
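The offset-based route mentioned at the start of this answer can be sketched as follows. This is only an illustration: the pattern and input are simplified stand-ins rather than the regex from the question, and the 'lastIndex' key is a made-up name for the end offset of each match.

```php
<?php
// Simplified stand-in pattern and input, chosen only for illustration.
$pattern = '/[^-]+|-/';
$text = 'a-z';

// PREG_OFFSET_CAPTURE makes every match a [value, byteOffset] pair.
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

$tokens = [];
foreach ($matches[0] as [$value, $offset]) {
    $tokens[] = [
        'value' => $value,
        'start' => $offset,
        // End offset of the match: the value JavaScript would hold in lastIndex.
        'lastIndex' => $offset + strlen($value),
    ];
}

print_r($tokens);
```

The end offset can then be passed as the fifth argument of preg_match to continue searching from that position.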

Actually, exec is not equivalent to preg_match_all, as exec stops at the first match (the g modifier only sets the lastIndex value to loop through the string). It's equivalent to preg_match.
So you find the first match, get the value thanks to the $array argument, the offset of this value (contained in $flags) and continue your search by setting the offset (last argument).
I guess the second execution won't be a problem as you'll do exactly the same thing as in the javascript version.
Note that I haven't tried the loop, but it should be pretty straightforward once you've figured out how preg_match works exactly with the optional arguments (I'll run some test).
$lastIndex = 0;
while(preg_match($someRegex, $text, $match, PREG_OFFSET_CAPTURE, $lastIndex)) {
$m = $match[0][0];
$lastIndex = $match[0][1] + strlen($m); //thanks Bergi for the correction
if($m === '-') {
// assuming the $otherText relate to the parts.content thing
if(preg_match($someRegex, $otherText, $secondMatch, 0, $lastIndex)) {
$nextToken = $secondMatch[0];
...
}
}
}
I guess that should be it (excuse any small error, haven't done php for a while).

Related

How to find ALL substrings in string using starting and ending words arrays PHP

I've spent my last 4 hours figuring out how to ... I got to ask for your help now.
I'm trying to extract from a text all substrings that start with an item of my starting_words_array and end with an item of my ending_words_array.
$str = "Do you see that ? Indeed, I can see that, as well as this." ;
$starting_words_array = array('do','I');
$ending_words_array = array('?',',');
expected output : array ([0] => 'Do you see that ?' [1] => 'I can see that,')
I managed to write a first function that finds the first substring matching items from both arrays, but I can't work out how to loop it in order to get all the substrings matching my requirement.
function SearchString($str, $starting_words_array, $ending_words_array ) {
forEach($starting_words_array as $test) {
$pos = strpos($str, $test);
if ($pos===false) continue;
$found = [];
forEach($ending_words_array as $test2) {
$posStart = $pos+strlen($test);
$pos2 = strpos($str, $test2, $posStart);
$found[] = ($pos2!==false) ? $pos2 : INF;
}
$min = min($found);
if ($min !== INF)
return substr($str,$pos,$min-$pos) .$str[$min];
}
return '';
}
Do you guys have any idea about how to achieve such thing ?
I use preg_match for my solution. However, the start and end strings must be escaped with preg_quote. Without that, the solution will be wrong.
function searchString($str, $starting_words_array, $ending_words_array ) {
$resArr = [];
forEach($starting_words_array as $i => $start) {
$end = $ending_words_array[$i] ?? "";
$regEx = '~'.preg_quote($start,"~").".*".preg_quote($end,"~").'~iu';
if(preg_match($regEx,$str,$match)){
$resArr[] = $match[0];
}
}
return $resArr;
}
The result is what the questioner expects.
If the expressions can occur more than once, preg_match_all must be used instead, and the regex must be modified to use a lazy quantifier (.*?).
function searchString($str, $starting_words_array, $ending_words_array ) {
$resArr = [];
forEach($starting_words_array as $i => $start) {
$end = $ending_words_array[$i] ?? "";
$regEx = '~'.preg_quote($start,"~").".*?".preg_quote($end,"~").'~iu';
if(preg_match_all($regEx,$str,$match)){
$resArr = array_merge($resArr,$match[0]);
}
}
return $resArr;
}
The result for the second variant:
array (
0 => "Do you see that ?",
1 => "Indeed,",
2 => "I can see that,",
)
I would definitely use regex and preg_match_all(). I won't give you a full working code example here but I will outline the necessary steps.
First, build a regex from your start-end-pairs like that:
$parts = array_map(
function($start, $end) {
return $start . '.+' . $end;
},
$starting_words_array,
$ending_words_array
);
$regex = '/' . join('|', $parts) . '/i';
The /i part means case insensitive search. Some characters like the ? have a special purpose in regex, so you need to extend above function in order to escape it properly.
You can test your final regex here
Then use preg_match_all() to extract your substrings:
preg_match_all($regex, $str, $matches); // $matches is passed by reference, no need to declare it first
print_r($matches);
The exact structure of your $matches array will be slightly different from what you asked for but you will be able to extract your desired data from it
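The escaping step left open above could look roughly like this. It is a sketch, not a finished solution: the word boundaries (\b) and the lazy .+? are my own additions, chosen so that 'I' does not match inside "Indeed" and each match stops at the first end marker. They assume the start items are plain words.

```php
<?php
$str = "Do you see that ? Indeed, I can see that, as well as this.";
$starting_words_array = array('do', 'I');
$ending_words_array = array('?', ',');

$parts = array_map(
    function ($start, $end) {
        // preg_quote escapes regex metacharacters such as "?" (and the "/" delimiter).
        return '\b' . preg_quote($start, '/') . '\b.+?' . preg_quote($end, '/');
    },
    $starting_words_array,
    $ending_words_array
);
$regex = '/' . implode('|', $parts) . '/i';

preg_match_all($regex, $str, $matches);
print_r($matches[0]);
```

With the sample sentence, $matches[0] holds the two substrings from the question.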
Benni's answer is the best way to go, but let me point out the problems in your code in case you want to fix them:
strpos is case sensitive and also finds parts of words, so you need to change $starting_words_array = array('do','I'); to $starting_words_array = array('Do','I ');
When a substring is found you use return, which exits the function, so you won't find any other substring. To fix that, define $res = []; at the beginning of the function, replace return substr($str,$pos,... with $res[] = substr($str,$pos,... and return $res at the end.
You can see example in 3v4l - in that example you get the output you wanted
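For reference, a strpos-only variant that collects every match might look like the sketch below. The function name searchAllSubstrings is hypothetical, and the inner while loop is an addition of mine so that a start word occurring several times is found each time. It assumes the case-sensitive start words suggested above.

```php
<?php
// Hypothetical helper: pairs each start word with the nearest following end marker.
function searchAllSubstrings(string $str, array $starts, array $ends): array
{
    $res = [];
    foreach ($starts as $start) {
        $offset = 0;
        // Keep searching after the previous match until the start word no longer occurs.
        while (($pos = strpos($str, $start, $offset)) !== false) {
            $best = null; // [position, length] of the closest end marker
            foreach ($ends as $end) {
                $endPos = strpos($str, $end, $pos + strlen($start));
                if ($endPos !== false && ($best === null || $endPos < $best[0])) {
                    $best = [$endPos, strlen($end)];
                }
            }
            if ($best === null) {
                break; // no end marker after this start word
            }
            $res[] = substr($str, $pos, $best[0] + $best[1] - $pos);
            $offset = $best[0] + $best[1];
        }
    }
    return $res;
}

$str = "Do you see that ? Indeed, I can see that, as well as this.";
print_r(searchAllSubstrings($str, ['Do', 'I '], ['?', ',']));
```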

PHP Regex for similarity check

Can you think of any regular expression that resolves these similarities in PHP? The idea is to get a match without considering the last letters.
<?php
$word1 = 'happyness';
$word2 = 'happys';
if (substr($word1, 0, -4) == substr($word2, 0, -1))
{
echo 'same word1';
}
$word1 = 'kisses';
$word2 = 'kiss';
if (substr($word1, 0, -2) == $word2)
{
echo 'same word2';
}
$word1 = 'consonant';
$word2 = 'consonan';
if (substr($word1, 0, -1) == $word2)
{
echo 'same word3';
}
Put the words together, like happys happyness, and capture as many word characters from the start of word 1 as word 2 also starts with. See this demo at regex101. Use it with the i flag for caseless matching.
^(\w+)\w* \1
To use this in PHP with preg_match see this PHP demo at tio.run
preg_match('/^(\w+)\w* \1/i', preg_quote($word1,'/')." ".preg_quote($word2,'/'), $out);
where $out[1] holds the captures or $out would be an empty array if there wasn't a match.
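Wrapped in a function, the idea might look like this. It is a minimal sketch: sharedPrefix is a made-up name, and it assumes both words contain only \w characters, so preg_quote is left out here.

```php
<?php
// Hypothetical helper: returns the leading part both words share, or '' if there is none.
// Assumes the words contain only \w characters (no spaces or punctuation).
function sharedPrefix(string $word1, string $word2): string
{
    if (preg_match('/^(\w+)\w* \1/i', $word1 . ' ' . $word2, $out)) {
        return $out[1];
    }
    return '';
}

var_dump(sharedPrefix('happyness', 'happys'));
var_dump(sharedPrefix('kisses', 'kiss'));
var_dump(sharedPrefix('abc', 'xyz'));
```

Comparing the length of the returned prefix with the length of the shorter word is one possible way to decide whether the two words count as the same.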
You could use a small helper function, the first function just matches up to the length of the second string, so doesn't care how many characters it truncates. The main code works similar to your code except it uses the length of the second value as the length of the substring to take...
function match( string $a, string $b ) {
return substr($a, 0, strlen($b)) === $b;
}
This function is slightly more complicated as it takes into account a maximum gap length...
function match( string $a, string $b, int $length = 3 ) {
$len = max(strlen($a)-$length, strlen($b));
return substr($a, 0, $len) === $b;
}
So call it something along the lines of
$word1 = 'happyness';
$word2 = 'happys';
if (match($word1,$word2))
{
echo 'same word1';
}
You can use preg_match to match these data with regex as /^word2/ against word1. So regex would check if word1 starts with word2 or not, because of ^ symbol at the start.
It's always better to preg_quote() before matching to escape regex meta characters for accurate results.
<?php
$tests = [
[
'happyness',
'happys'
],
[
'kisses',
'kiss'
],
[
'consonant',
'consonan'
]
];
$filtered = array_filter($tests,function($values){
$values[1] = preg_quote($values[1], '/'); // also escape the "/" delimiter
return preg_match("/^$values[1]/",$values[0]) === 1;
});
print_r($filtered);
Demo: https://3v4l.org/SLf15
You could also do a small function to find the similarity between the given 2 words. It could look like:
function similarity($word1, $word2)
{
$splittedWord1 = str_split($word1);
$splittedWord2 = str_split($word2);
$similarChars = array_intersect_assoc($splittedWord1, $splittedWord2);
return count($similarChars) / max(count($splittedWord1), count($splittedWord2));
}
var_dump(similarity('happyness', 'happys'));
var_dump(similarity('happyness', 'testhappys'));
var_dump(similarity('kisses', 'kiss'));
var_dump(similarity('consonant', 'consonan'));
The result would look like:
float(0.55555555555556)
int(0)
float(0.66666666666667)
float(0.88888888888889)
Based on the resulted percentage you could decide if the given words should be considered the same or not.
I'm not sure regex is the answer here.
You could try similar_text(), which returns the number of similar characters (and optionally sets a percentage value to a variable). Maybe if you consider the last two letters as non-important, you can see if the strlen() - $skippedCharacters is the same as what is matched. For example:
$skippedCharacters = 2;
$word1 = 'kisses';
$word2 = 'kiss';
$match = similar_text($word1, $word2);
if ($match + $skippedCharacters >= strlen($word1))
{
echo 'same word2';
}
You could use the PHP levenshtein function.
The levenshtein() function returns the Levenshtein distance between two strings. The Levenshtein distance is the number of characters you have to replace, insert or delete to transform string1 into string2.
$lev = levenshtein($word1, $word2);
The lower the number the bigger the similarity.
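As a rough illustration with the word pairs from the question: the helper name sameWord and the default threshold of 3 edits are assumptions made for this sketch, not part of the levenshtein() API.

```php
<?php
// Hypothetical helper: treats two words as "the same" when at most $maxDistance
// single-character edits separate them. The threshold is an arbitrary choice.
function sameWord(string $word1, string $word2, int $maxDistance = 3): bool
{
    return levenshtein($word1, $word2) <= $maxDistance;
}

var_dump(levenshtein('happyness', 'happys'));   // three deletions
var_dump(levenshtein('kisses', 'kiss'));        // two deletions
var_dump(levenshtein('consonant', 'consonan')); // one deletion
var_dump(sameWord('kisses', 'kiss'));
```

A fixed threshold is crude for very short words, so scaling it with the word length may be worth considering.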

str replace - replace the x-value

Assuming I have a string
$str="0000,1023,1024,1025,1024,1023,1027,1025,1024,1025,0000";
there are three 1024, I want to replace the third with JJJJ, like this :
output :
0000,1023,1024,1025,1024,1023,1027,1025,JJJJ,1025,0000
how to make str_replace can do it
thanks for the help
As your question asks, you want to use str_replace to do this. It's probably not the best option, and note that the fourth argument of str_replace is a by-reference output count, not a limit, so the same idea needs preg_replace, whose fourth argument limits the number of replacements. Assuming you have no other instances of "JJJJ" throughout the string, you could do this:
$str = "0000,1023,1024,1025,1024,1023,1027,1025,1024,1025,0000";
$str = preg_replace('/1024/', 'JJJJ', $str, 3);
$str = preg_replace('/JJJJ/', '1024', $str, 2);
Here is what I would do and it should work regardless of values in $str:
function replace_str($str,$search,$replace,$num) {
$pieces = explode(',',$str);
$counter = 0;
foreach($pieces as $key=>$val) {
if($val == $search) {
$counter++;
if($counter == $num) {
$pieces[$key] = $replace;
}
}
}
return implode(',',$pieces);
}
$str="0000,1023,1024,1025,1024,1023,1027,1025,1024,1025,0000";
echo replace_str($str, '1024', 'JJJJ', 3);
I think this is what you are asking in your comment:
function replace_element($str,$search,$replace,$num) {
$num = $num - 1;
$pieces = explode(',',$str);
if($pieces[$num] == $search) {
$pieces[$num] = $replace;
}
return implode(',',$pieces);
}
$str="0000,1023,1024,1025,1024,1023,1027,1025,1024,1025,0000";
echo replace_element($str,'1024','JJJJ',9);
strpos has an offset, detailed here: http://php.net/manual/en/function.strrpos.php
So you want to do the following:
1) strpos with 1024, keep the offset
2) strpos with 1024 starting at offset+1, keep newoffset
3) strpos with 1024 starting at newoffset+1, keep thirdoffset
4) finally, we can use substr to do the replacement - get the string leading up to the third instance of 1024, concatenate it to what you want to replace it with, then get the substr of the rest of the string afterwards and concatenate it to that. http://www.php.net/manual/en/function.substr.php
You can either use strpos() three times to get the position of the third 1024 in your string and then replace it, or you could write a regex to use with preg_replace() that matches the third 1024.
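The repeated-strpos approach generalises to a small loop. The following is a sketch under the assumption that occurrences are counted from 1. replaceNthOccurrence is a hypothetical name, not a built-in function.

```php
<?php
// Hypothetical helper: replaces only the $n-th occurrence (1-based) of $needle.
function replaceNthOccurrence(string $haystack, string $needle, string $replacement, int $n): string
{
    if ($n < 1 || $needle === '') {
        return $haystack;
    }
    $offset = 0;
    $pos = 0;
    for ($i = 0; $i < $n; $i++) {
        $pos = strpos($haystack, $needle, $offset);
        if ($pos === false) {
            return $haystack; // fewer than $n occurrences: leave the string unchanged
        }
        $offset = $pos + 1; // continue searching just past this occurrence
    }
    return substr_replace($haystack, $replacement, $pos, strlen($needle));
}

$str = "0000,1023,1024,1025,1024,1023,1027,1025,1024,1025,0000";
echo replaceNthOccurrence($str, '1024', 'JJJJ', 3);
```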
If you want to find the last occurrence of your string, you can use strrpos.
Do it like this:
$newstring = substr_replace($str,'JJJJ', strrpos($str, '1024'), strlen('1024') );
See working demo
Here's a solution with fewer function calls and without having to explode, iterate over the array and implode again. It uses preg_replace, because the fourth argument of str_replace is an output count rather than a limit.
// replace the first three occurrences
$replaced = preg_replace('/1024/', 'JJJJ', $str, 3);
// now restore the first two, which you wanted to keep
$final = preg_replace('/JJJJ/', '1024', $replaced, 2);

PHP Extract numbers from a string

I want to extract numbers from a string in PHP like following :
If the string is 'make1to6', I would like to extract the numeric characters before and after the 'to' substring, i.e. 1 and 6 are to be extracted.
The length of the string is not fixed and can be a maximum of 10 characters. The number can be at most two digits on either side of 'to' in the string.
Some example string values :
sure1to3
ic3to9ltd
anna1to6
joy1to4val
make6to12
ext12to36
thinking of something like :
function beforeTo(string) {
return numeric_value_before_'to'_in_the_string;
}
function afterTo(string) {
return numeric_value_after_'to'_in_the_string;
}
i will be using these returned values for some calculations.
You could use preg_match_all to achieve this:
function getNumbersFromString($str) {
$matches = array();
preg_match_all('/([0-9]+)/', $str, $matches);
return $matches;
}
$matches = getNumbersFromString('hej 12jippi77');
Use preg_match with a regex that will extract the numbers for you. Something like this should do the trick for you:
$matches = null;
$returnValue = preg_match('/(\d+)to(\d+)/uis', 'ic3to9ltd', $matches);
After this $matches will look like:
array (
0 => '3to9',
1 => '3',
2 => '9',
);
You should read somewhat on regular expressions, it's not hard to do stuff like this if you know how they work. Will make your life easier. ;-)
You can use a regular expression as such, it should match exactly your specification:
$string = 'make6to12';
preg_match('{^.*?(?P<before>\d{1,2})to(?P<after>\d{1,2})}m', $string, $match);
echo $match['before'].', '.$match['after']; // 6, 12
You can use this:
// $str holds the string in question
if (preg_match('/(\d+)to(\d+)/', $str, $matches)) {
$number1 = $matches[1];
$number2 = $matches[2];
}
You can use regular expressions.
$string = 'make1to6';
if (preg_match('/(\d{1,10})to(\d{1,10})/', $string, $matches)) {
$number1 = (int) $matches[1];
$number2 = (int) $matches[2];
} else {
// Not found...
}
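Applied to the sample strings from the question, this could be wrapped up as below. It is a sketch: numbersAroundTo and the 'before'/'after' keys are made-up names, and \d{1,2} reflects the stated two-digit limit.

```php
<?php
// Hypothetical helper: returns the numbers on either side of "to", or null if absent.
function numbersAroundTo(string $input): ?array
{
    if (preg_match('/(\d{1,2})to(\d{1,2})/', $input, $m)) {
        return ['before' => (int) $m[1], 'after' => (int) $m[2]];
    }
    return null;
}

$samples = ['sure1to3', 'ic3to9ltd', 'anna1to6', 'joy1to4val', 'make6to12', 'ext12to36'];
foreach ($samples as $sample) {
    $numbers = numbersAroundTo($sample);
    echo $sample, ': ', $numbers['before'], ', ', $numbers['after'], "\n";
}
```

Returning both values from one call avoids running the same regex twice, which separate beforeTo() and afterTo() functions would do.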
<?php
$data = <<<EOF
sure1to3
ic3to9ltd
anna1to6
joy1to4val
make6to12
ext12to36
EOF;
preg_match_all('#(\d+)to(\d+)#s', $data, $matches, PREG_SET_ORDER); // one entry per match, so the loop below sees [full, before, after]
header('Content-Type: text/plain');
//print_r($matches);
foreach($matches as $match)
{
echo sprintf("%d, %d\n", $match[1], $match[2]);
}
?>
This is what Regular Expressions are for - you can match multiple instances of very specific patterns and have them returned to you in an array. It's pretty awesome, truth be told :)
Take a look here for how to use the built in regular expression methods in php : LINK
And here is a fantastic tool for testing regular expressions: LINK
<?php
list($before, $after) = explode('to', 'sure1to3');
$before_to = extract_ints($before);
$after_to = extract_ints($after);
function extract_ints($string) {
$ints = array();
$len = strlen($string);
for($i=0; $i < $len; $i++) {
$char = $string[$i]; // curly-brace offsets ($string{$i}) were removed in PHP 8
if(is_numeric($char)) {
$ints[] = intval($char);
}
}
return $ints;
}
?>
A regex seems really unnecessary here since all you are doing is checking is_numeric() against a bunch of characters.

Replace multiple occurrences of a string with different values

I have a script that generates content containing certain tokens, and I need to replace each occurrence of a token, with different content resulting from a separate loop.
It's simple to use str_replace to replace all occurrences of the token with the same content, but I need to replace each occurrence with the next result of the loop.
I did see this answer: Search and replace multiple values with multiple/different values in PHP5?
however it is working from pre-defined arrays, which I don't have.
Sample content:
This is an example of %%token%% that might contain multiple instances of a particular
%%token%%, that need to each be replaced with a different piece of %%token%% generated
elsewhere.
I need to replace each occurrence of %%token%% with content generated, for argument's sake, by this simple loop:
for($i=0;$i<3;$i++){
$token = rand(100,10000);
}
So replace each %%token%% with a different random number value $token.
Is this something simple that I'm just not seeing?
Thanks!
I don't think you can do this using any of the search and replace functions, so you'll have to code up the replace yourself.
It looks to me like this problem works well with explode(). So, using the example token generator you provided, the solution looks like this:
$shrapnel = explode('%%token%%', $str);
$newStr = '';
for ($i = 0; $i < count($shrapnel); ++$i) {
// The last piece of the string has no token after it, so we special-case it
if ($i == count($shrapnel) - 1)
$newStr .= $shrapnel[$i];
else
$newStr .= $shrapnel[$i] . rand(100,10000);
}
I know this is an old thread, but I stumbled across it while trying to achieve something similar. If anyone else sees this, I think this is a little nicer:
Create some sample text:
$text="This is an example of %%token%% that might contain multiple instances of a particular
%%token%%, that need to each be replaced with a different piece of %%token%% generated
elsewhere.";
Find the search string with regex:
$new_text = preg_replace_callback("|%%token%%|", "_rand_preg_call", $text);
Define a callback function to change the matches
function _rand_preg_call($matches){
return rand(100,10000);
}
Echo the results:
echo $new_text;
So as a function set:
function _preg_replace_rand($text,$pattern){
return preg_replace_callback("|$pattern|", "_rand_preg_call", $text);
}
function _rand_preg_call($matches){
return rand(100,10000);
}
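If the replacement values come from a list rather than rand(), the same callback technique can hand them out one at a time. This is a sketch only: the values and the sample text are invented for illustration, and tokens beyond the end of the list are deliberately left untouched.

```php
<?php
$text = 'a %%token%% b %%token%% c %%token%% d %%token%%';
$values = [11, 22, 33]; // hypothetical replacement values, in order of use
$index = 0;

$newText = preg_replace_callback(
    '/%%token%%/',
    function (array $matches) use ($values, &$index) {
        // Take the next value; once the list is exhausted, keep the token as it is.
        return (string) ($values[$index++] ?? $matches[0]);
    },
    $text
);

echo $newText;
```

Capturing $index by reference is what lets each call of the closure advance to the next value.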
I had a similar issue where I had a file that I needed to read. It had multiple occurrences of a token, and I needed to replace each occurrence with a different value from an array.
This function will replace each occurrence of the "token"/"needle" found in the "haystack" and will replace it with a value from an indexed array.
function mostr_replace($needle, $haystack, $replacementArray, $needle_position = 0, $offset = 0)
{
    $counter = 0;
    // Stop when the replacements run out or no needle is left after $offset
    while (isset($replacementArray[$counter])
        && ($needle_position = strpos($haystack, $needle, $offset)) !== false) {
        $replacement = (string) $replacementArray[$counter];
        $haystack = substr_replace($haystack, $replacement, $needle_position, strlen($needle));
        // Resume after the inserted replacement, since the haystack length has changed
        $offset = $needle_position + strlen($replacement);
        $counter++;
    }
    return $haystack;
}
By the way, 'mostr_replace' is short for "Multiple Occurrence String Replace".
You can use the following code:
$content = "This is an example of %%token%% that might contain multiple instances of a particular %%token%%, that need to each be replaced with a different piece of %%token%% generated elsewhere.";
while (true)
{
$needle = "%%token%%";
$pos = strpos($content, $needle);
$token = rand(100, 10000);
if ($pos === false)
{
break;
}
else
{
$content = substr($content, 0, $pos).$token.substr($content, $pos + strlen($needle));
}
}
