Find numbers on a string and order by them - php

I have this string
$s = "red2 blue5 black4 green1 gold3";
I need to order the words by the number, but can't show the numbers in the result.
The number will always appear at the end of the word.
The result should be like:
$s = "green red gold black blue";
Thanks!

Does it always follow this pattern - separated by spaces?
I would break down the problem as such:
I would first start with parsing the string into an array where the key is the number and the value is the word. You can achieve this with a combination of preg_match_all and array_combine.
Then you could use ksort in order to sort by the keys we set with the previous step.
Finally, if you wish to return your result as a string, you could implode the resulting array, separating by spaces again.
An example solution could then be:
<?php
$x = "red2 blue5 black4 green1 gold3";
function sortNumberedWords(string $input) {
preg_match_all('/([a-zA-Z]+)([0-9]+)/', $input, $splitResults);
$combined = array_combine($splitResults[2], $splitResults[1]);
ksort($combined);
return implode(' ', $combined);
}
echo sortNumberedWords($x);
The regex I'm using here matches two separate groups (indicated by the brackets):
The first group is any length of a string of characters a-z (or capitalised). It's worth noting this only works on the Latin alphabet; it won't match ö, for example.
The second group matches any length of a string of numbers that appears directly after that string of characters.
The results of these matches are stored in $splitResults, which will be an array of 3 elements:
[0] A combined list of all the matches.
[1] A list of all the matches of group 1.
[2] A list of all the matches of group 2.
We use array_combine to then combine these into a single associative array. We wish for group 2 to act as the 'key' and group 1 to act as the 'value'.
Finally, we sort by the key, and then implode it back into a string.
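Two limitations of the version above may be worth a sketch: the [a-zA-Z] class skips non-Latin letters, and array_combine silently drops a word when two words carry the same number. The variant below is only an illustration (the name sortNumberedWordsUnicode is made up here). It assumes UTF-8 input and ASCII digits, matches letters with \p{L}, and sorts the match sets with usort instead of using the numbers as array keys:

```php
<?php
// Hypothetical variant of the function above; the name is an assumption.
function sortNumberedWordsUnicode(string $input): string
{
    // \p{L}+ matches letters of any script; the u modifier treats the
    // pattern and the subject as UTF-8. PREG_SET_ORDER gives one array per word.
    preg_match_all('/(\p{L}+)([0-9]+)/u', $input, $sets, PREG_SET_ORDER);

    // Compare the captured numbers as integers; words with equal numbers are both kept.
    usort($sets, function (array $a, array $b): int {
        return (int) $a[2] <=> (int) $b[2];
    });

    // Column 1 of every set is the word without its number.
    return implode(' ', array_column($sets, 1));
}

echo sortNumberedWordsUnicode("red2 blue5 grün4 green1 gold3"), "\n";
```

Sorting the match sets rather than keying by number is a design choice: it avoids losing entries, at the cost of a comparison callback.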

$s = "red2 blue5 black4 green1 gold3";
$a=[];
preg_replace_callback('/[a-z0-9]+/',function($m) use (&$a){
$a[(int)ltrim($m[0],'a..z')] = rtrim($m[0],'0..9');
},$s);
ksort($a);
print " Version A: ".implode(' ',$a);
$a=[];
foreach(explode(' ',$s) as $m){
$a[(int)ltrim($m,'a..z')] = rtrim($m,'0..9');
}
ksort($a);
print " Version B: ".implode(' ',$a);
$a=[];
preg_match_all("/([a-z0-9]+)/",$s,$m);
foreach($m[1] as $i){
// note: substr($i,-1,1) reads only the last character, so this version assumes single-digit numbers
$a[(int)substr($i,-1,1)] = rtrim($i,'0..9');
}
ksort($a);
print " Version C: ".implode(' ',$a);
Use one of them, but also try to understand what's going on here.

PHP count instances of array items in a string

In PHP 7.3:
Given this array of low relevancy keywords...
$low_relevancy_keys = array('guitar','bass');
and these possible strings...
$keywords_db = "red white"; // desired result 0
$keywords_db = "red bass"; // desired result 1
$keywords_db = "red guitar"; // desired result 1
$keywords_db = "bass guitar"; // desired result 2
I need to know the number of matches as described above. A tedious way is to convert the string to a new array ($keywords_db_array), loop through $keywords_db_array, and then loop through $low_relevancy_keys while incrementing a count of matches. Is there a more direct method in PHP?
The way you described in your question but using array_* functions:
echo count(array_intersect(explode(' ', $keywords_db), $low_relevancy_keys));
(note that you can replace explode with preg_split if you need to be more flexible)
or using preg_match_all (that returns the number of matches):
$pattern = '~\b' . implode('\b|\b', $low_relevancy_keys) . '\b~';
echo preg_match_all($pattern, $keywords_db);
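If the keywords may contain characters that are special in a regex, it is probably safer to escape them before building the pattern. A minimal sketch, assuming a helper name (countKeywordMatches) that is not in the question, run against the four sample strings:

```php
<?php
// Hypothetical helper: counts whole-word occurrences of any keyword.
function countKeywordMatches(string $haystack, array $keywords): int
{
    if ($keywords === []) {
        return 0; // avoid building an empty alternation
    }
    // Escape regex metacharacters, including the ~ delimiter.
    $quoted = array_map(function (string $k): string {
        return preg_quote($k, '~');
    }, $keywords);
    $pattern = '~\b(?:' . implode('|', $quoted) . ')\b~';
    // preg_match_all returns the number of full matches (or false on error).
    return (int) preg_match_all($pattern, $haystack);
}

$low_relevancy_keys = array('guitar', 'bass');
foreach (array('red white', 'red bass', 'red guitar', 'bass guitar') as $keywords_db) {
    echo $keywords_db, ' => ', countKeywordMatches($keywords_db, $low_relevancy_keys), "\n";
}
```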

How to implode a multi-dimensional array?

I have an array of arrays like:
$array = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]]
Now, I want to implode this array such that it would return something like this:
COTV_LITE(1800)
COTV_PREMIUM(2200)
How do I achieve this? Calling just the implode() function did not work:
implode ('<br>', $array);
You can call array_map() to implode the nested arrays:
echo implode('<br>', array_map(function($a) { return implode(' ', $a); }, $array));
output:
1. COTV_LITE(1800)<br>2. COTV_PREMIUM(2200)
You can use argument unpacking (the ... "splat" operator), available in PHP >= 5.6.
Option1
$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
echo implode(' ',array_merge(...$items));
Output
1. COTV_LITE(1800) 2. COTV_PREMIUM(2200)
This is more of a precursor for the next option.
Option2
If you want to get a bit more creative you can use preg_replace too:
$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
$replace = [
'/^(\d+\.)$/' => '<li>\1 ',
'/^(\w+\(\d+\))$/' => '\1</li>'
];
echo '<ul>'.implode(preg_replace(array_keys($replace),$replace,array_merge(...$items))).'</ul>';
Output
<ul><li>1. COTV_LITE(1800)</li><li>2. COTV_PREMIUM(2200)</li></ul>
Option3
And lastly, using an ol (ordered list), which does the numbering for you. In this case we only need the second item from each array (index 1):
$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
echo '<ol><li>'.implode('</li><li>',array_column($items,1)).'</li></ol>';
Output
<ol><li>COTV_LITE(1800)</li><li>COTV_PREMIUM(2200)</li></ol>
Personally, I would put it in the ol; that way you don't have to worry about the order of the numbers, and you can let HTML + CSS handle them. It's also probably the easiest and most semantically correct way, but I don't know if the numbering in the array has any special meaning or not.
In any case I would most definitely put this into a list to render it to HTML. This will give you a lot more options for styling it, later.
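If the labels can come from user input, it may be worth escaping them before putting them into the list. Here is a rough sketch of the ol option wrapped in a function; renderOrderedList is a name I made up, and I'm assuming the label is always at index 1:

```php
<?php
// Hypothetical helper: renders column 1 of each row as an HTML-escaped <ol>.
function renderOrderedList(array $items): string
{
    $labels = array_column($items, 1);
    if ($labels === []) {
        return '<ol></ol>';
    }
    // Escape each label so characters like < and & are shown literally.
    $escaped = array_map(function ($label): string {
        return htmlspecialchars((string) $label, ENT_QUOTES, 'UTF-8');
    }, $labels);
    return '<ol><li>' . implode('</li><li>', $escaped) . '</li></ol>';
}

$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
echo renderOrderedList($items), "\n";
```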
Update
I want to use option 1. But how do I put each option on a different line using <br>?
That one will put the <br> between each array element:
echo implode('<br>',array_merge(...$items));
Output
1.<br>COTV_LITE(1800)<br>2.<br>COTV_PREMIUM(2200)
The only way to easily fix that (while keeping the array_merge) is with preg_replace, which is the second one. So I will call this:
Option 1.2
$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
echo implode(preg_replace('/^(\w+\(\d+\))$/',' \1<br>',array_merge(...$items)));
Output
1. COTV_LITE(1800)<br>2. COTV_PREMIUM(2200)<br>
Basically there is no way to tell where the end item is after merging them. That operation effectively flattens the array out and gives us something like this:
["1.","COTV_LITE(1800)","2.","COTV_PREMIUM(2200)"]
So that regex turns 'COTV_PREMIUM(2200)' into ' COTV_PREMIUM(2200)<br>'. This is just a way of changing the item without having to dip into the array with extra logic. We wind up with this modification to the array:
["1."," COTV_LITE(1800)<br>","2."," COTV_PREMIUM(2200)<br>"]
Then with implode we just flatten it again into a string:
"1. COTV_LITE(1800)<br>2. COTV_PREMIUM(2200)<br>"
The Regex ^(\w+\(\d+\))$
^ - Match start of string
(...) - capture group 1
\w+ - match any word character a-zA-Z0-9_ one or more times, e.g. COTV_PREMIUM
\( - match the ( literally
\d+ - match digits 0-9 one or more, eg 2200
\) - match the ) literally
$ - match end of string
So this matches the pattern of the second (or even) items in the array, then we replace that with this:
The Replacement ' \1<br>'
{space} - adds a leading space
\1 - the value of capture group 1 (from above)
<br> - append a line break
Hope that makes sense. This should work as long as they meet that pattern. Obviously we can adjust the pattern, but with such a small sample size it's hard for me to know what variations will be there.
For example, something as simple as (.+\))$ will work. This one just looks for the ending ). We just need something to capture all of the even ones, while not matching the odd ones. Regular expressions can be very confusing the first few times you see them, but they are extremely powerful.
PS - I added a few links to the function names; these go to the PHP documentation pages for them.
Cheers!
Try this
$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
$imploded = [];
foreach($items as $item) {
$item_entry = implode(' ', $item);
echo $item_entry . '<br/>'; // display items
$imploded[] = $item_entry;
}
// your desired result is in $imploded variable for further use

preg_replace with a word in an array

I have an array called keywords, and I am trying to replace each of its words in a string with "as".
for($i = 0; $i<sizeof($this->keywords[$this->lang]); $i++)
{
$word = $this->keywords[$this->lang][$i];
$a = preg_replace("/\b$word\b/i", "as",$this->code);
}
It works if I replace the variable $word with a literal, as in /\bhello\b/i, which then replaces all hello words with "as".
Is the approach I am using even possible?
Before it becomes a pattern, it is a double-quoted string, so the variable is interpolated; that is not the problem.
The problem is that you use a loop to change several words and you store the result in $a:
in the first iteration, all the occurrences of the first word in $this->code are replaced and the new string is stored in $a.
but the next iteration doesn't reuse $a as the third parameter to replace the next word; it always uses the original string $this->code.
Result: after the for loop, $a contains the original string with only the occurrences of the last word replaced with as.
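For illustration, a minimal sketch of the loop with that one change, feeding each result back into the next call. The $code and $keywords variables are stand-ins I assume for $this->code and $this->keywords[$this->lang], and preg_quote is added as a precaution:

```php
<?php
// Assumed stand-ins for $this->code and $this->keywords[$this->lang].
$code = 'Hello world, hello there';
$keywords = array('hello', 'there');

$result = $code;
foreach ($keywords as $word) {
    // Reuse the previous result as the subject, so earlier replacements are kept.
    $result = preg_replace('/\b' . preg_quote($word, '/') . '\b/i', 'as', $result);
}

echo $result, "\n";
```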
When you want to replace several words with the same string, one way is to build an alternation: word1|word2|word3.... It can easily be done with implode:
$alternation = implode('|', $this->keywords[$this->lang]);
$pattern = '~\b(?:' . $alternation . ')\b~i';
$result = preg_replace($pattern, 'as', $this->code);
So, when you do that, the string is parsed only once and all the words are replaced in one shot.
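The same idea as a standalone function, in case a keyword ever contains a regex metacharacter. This is only a sketch: replaceKeywords and its parameters are names I'm assuming, and each keyword is passed through preg_quote before the alternation is built:

```php
<?php
// Hypothetical helper: replaces every whole-word keyword in one pass.
function replaceKeywords(string $code, array $keywords, string $replacement = 'as'): string
{
    if ($keywords === []) {
        return $code; // nothing to replace
    }
    // Escape metacharacters and the ~ delimiter in every keyword.
    $quoted = array_map(function (string $w): string {
        return preg_quote($w, '~');
    }, $keywords);
    $pattern = '~\b(?:' . implode('|', $quoted) . ')\b~i';
    return preg_replace($pattern, $replacement, $code);
}

echo replaceKeywords('Hello world, hello there; othello', array('hello', 'there')), "\n";
```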
If you have a lot of words and a very long string:
Testing a long alternation has a significant cost. Even though the pattern starts with \b, which greatly reduces the number of positions where a match can start, the regex engine still has to try the alternatives one by one at every candidate position before it succeeds, and all of them before it fails.
Only in this particular case, you can use another approach:
First you define a placeholder (a character or a short string that can't occur in your string, let's say §) that will be inserted at each word-boundary position.
$temp = preg_replace('~\b~', '§', $this->code);
Then you wrap each keyword in the placeholder, like this: §word1§, §word2§ ..., and you build an associative array in which every value is the replacement string:
$trans = [];
foreach ($this->keywords[$this->lang] as $word) {
$trans['§' . $word . '§'] = 'as';
}
Once you have done that, you add an empty string with the placeholder as key. You can now use the fast strtr function to perform the replacement:
$trans['§'] = '';
$result = strtr($temp, $trans);
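Put together, the three steps could look like the sketch below. The function name replaceKeywordsStrtr and the up-front placeholder check are my own additions, not something required by strtr:

```php
<?php
// Hypothetical helper combining the placeholder steps above.
function replaceKeywordsStrtr(string $code, array $keywords, string $replacement = 'as', string $placeholder = '§'): string
{
    // The trick only works if the placeholder never occurs in the input.
    if (strpos($code, $placeholder) !== false) {
        throw new InvalidArgumentException('Placeholder occurs in the input string.');
    }
    // Step 1: mark every word boundary with the placeholder.
    $temp = preg_replace('~\b~', $placeholder, $code);

    // Step 2: map each delimited keyword to the replacement string.
    $trans = array();
    foreach ($keywords as $word) {
        $trans[$placeholder . $word . $placeholder] = $replacement;
    }
    // Step 3: leftover placeholders are removed; strtr tries the longest keys first.
    $trans[$placeholder] = '';

    return strtr($temp, $trans);
}

echo replaceKeywordsStrtr('foo bar foobar', array('foo', 'bar')), "\n";
```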
The only limitation of this technique is that it is case-sensitive.
It will work if you write it like below:
$a = preg_replace("/\b".$word."\b/i", "as",$this->code);

Efficient way to parse this string into array in PHP?

Background
I have an array which I create by splitting a string based on every occurrence of 0d0a using preg_split('/(?<=0d0a)(?!$)/').
For example:
$string = "78781110d0a78782220d0a";
will be split into:
Array ( [0] => 78781110d0a [1] => 78782220d0a )
A valid array element has to start with 7878 and end with 0d0a.
The Problem
But sometimes, there's an additional 0d0a in the string which splits into an extra and invalid array element, i.e., that doesn't begin with 7878.
Take this string for example:
$string = "78781110d0a2220d0a78783330d0a";
This is split into:
Array ( [0] => 78781110d0a [1] => 2220d0a [2] => 78783330d0a )
But it should actually be:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a)
My Solution
I've written the following (messy) code to get around this:
$data = Array('78781110d0a','2220d0a','78783330d0a');
$i = 0; //count for $data array;
$j = 0; //count for $dataFixed array;
$dataFixed = $data;
foreach($data as $packet) {
if (substr($packet,0,4) != "7878") { //if packet doesn't start with 7878, do some fixing
if ($i != 0) { //its the first packet, can't help it!
$j++;
if ((substr(strtolower($packet), -4, 4) == "0d0a")) { //if the packet doesn't end with 0d0a, its 'mostly' not valid, so discard it
$dataFixed[$i-$j] = $dataFixed[$i-$j] . $packet;
}
unset($dataFixed[$i-$j+1]);
$dataFixed = array_values($dataFixed);
}
}
$i++;
}
Description
I first copy the array to another array $dataFixed. In a foreach loop of the $data array, I check whether it starts with 7878. If it doesn't, I join it with the previous array in $data. I then unset the current array in $dataFixed and reset the array elements with array_values.
But I'm not very confident about this solution.. Is there a better, more efficient way?
UPDATE
What if the input string doesn't end in 0d0a like it's supposed to? It will stick to the previous array element.
For example, in the string 78781110d0a2220d0a78783330d0a0000, 0000 should be separated as another array element.
Use another positive lookahead (?=7878) to form:
preg_split('/(?<=0d0a)(?=7878)/',$string)
Note: I removed (?!$) because I wasn't sure what that was for, based on your example data.
For example, this code:
$string = "78781110d0a2220d0a78783330d0a";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
print_r($array);
Results in:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a )
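Since the question defines a valid element as one that starts with 7878 and ends with 0d0a, the split can be paired with a small check. A sketch only; splitPackets and isValidPacket are names I'm assuming for illustration:

```php
<?php
// Hypothetical helper: split only where 0d0a is directly followed by 7878.
function splitPackets(string $data): array
{
    $parts = preg_split('/(?<=0d0a)(?=7878)/', $data);
    return $parts === false ? array() : $parts;
}

// Hypothetical helper: the validity rule stated in the question.
function isValidPacket(string $packet): bool
{
    return strncmp($packet, '7878', 4) === 0 && substr($packet, -4) === '0d0a';
}

$packets = splitPackets('78781110d0a2220d0a78783330d0a');
print_r($packets);
var_dump(array_map('isValidPacket', $packets));
```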
UPDATE:
Based on your revised question of having possible random characters at the end of the input string, you can add three lines to make a complete program of:
$string = "78781110d0a2220d0a787830d0a330d0a0000";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
$temp = preg_split('/(7878.*0d0a)/',$array[count($array)-1],-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$array[count($array)-1] = $temp[0];
if(count($temp)>1) { $array[] = $temp[1]; }
print_r($array);
We basically do the initial splitting, then split the last element of the resulting array by the expected data format, keeping the delimiter using PREG_SPLIT_DELIM_CAPTURE. The PREG_SPLIT_NO_EMPTY ensures we won't get an empty array element if the input string doesn't end in random characters.
UPDATE 2:
Based on your comment below where it seems you're implying there might be random characters between any of the desired matches, and you want these random characters preserved, you could do this:
$string = "0078781110d0a2220d0a2220d0a0000787830d0a330d0a000078781110d0a2220d0a0000787830d0a330d0a0000";
$split1 = preg_split('/(7878.*?0d0a)/',$string,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach($split1 as $e){
$split2 = preg_split('/(.*0d0a)/',$e,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
foreach($split2 as $el){
// test if $el doesn't start with 7878 and ends with 0d0a
if(strpos($el,'7878') !== 0 && substr($el,-4) == '0d0a'){
//if(preg_match('/^(?!7878).*0d0a$/',$el) === 1){
$result[ count($result)-1 ] = $result[ count($result)-1 ] . $el;
} else {
$result[] = $el;
}
}
}
print_r($result);
The strategy employed here is different than above. First we split the input string based on the delimiter that matches your desired data, using the nongreedy regex .*?. At this point we have some strings that contain the ending of a desired value and some garbage at the end, so we split again based on the last occurrence of "0d0a" with the greedy regex .*0d0a. We then append any of those resulting values that don't start with "7878" but end with "0d0a" to the previous value, as this should repair the first and second halves that got split because it contained an extra "0d0a".
I provided two methods for the innermost if statement, one using regular expressions. The regex one is marginally slower in my testing, so I've left that one commented out.
I might still not have your full requirements, so you'll have to let me know if it works and perhaps provide your full dataset.
I think you are using a delimiter, "0d0a", which also happens to be part of the content! It's not possible to avoid junk data as long as the delimiter can also appear in the content. The delimiter must somehow be unique.
Possible solutions:
Change the delimiter to something that doesn't occur in your data (000000, #!.;).
If you know for certain the length that each array item can have, use it. Going by your examples, that isn't possible.
The solutions given in the other answers only consider the sample data you have shared. If you are confident about what the string will contain, they are pretty good to use. Otherwise they can't give you any guarantee!
Best solution: fix the delimiter first, then use regex or explode, whichever you prefer.
Why don't you use preg_match_all instead? You can avoid the lookaround assertions (the lookaheads and lookbehinds) that are needed to split the string (splitting without them removes the matched delimiters), and just find the matches you're looking for:
Updated
<?php
$string = "00787817878110d0a22278780d0a78783330d0a00";
preg_match_all('/7878.*?0d0a(?=7878|[^(7878)]*?$)/', $string, $arr);
print_r($arr);
?>
Gives an array $arr[0] => ( [0] => 787817878110d0a22278780d0a, [1] => 78783330d0a ). Strips leading and trailing garbage characters (whatever doesn't start with 7878 or end with 7878 or 0d0a).
So $arr[0] would be the array of values that you are looking for.
See example on ideone
Works with multiple 7878 values and multiple 0d0a values (even though that's ridiculous).
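One caveat, as far as I can tell: [^(7878)] is a character class, so it excludes the single characters (, 7, 8 and ) rather than the sequence 7878, and trailing garbage that contains a 7 or an 8 can make the last packet go unmatched. A possible variant, offered as a sketch (extractPackets is an assumed name), uses a negative lookahead for the tail instead:

```php
<?php
// Hypothetical helper: returns every 7878...0d0a packet found in $data.
function extractPackets(string $data): array
{
    // After the closing 0d0a, require either the next 7878 header or a tail
    // that contains no further 7878 up to the end of the string.
    preg_match_all('/7878.*?0d0a(?=7878|(?:(?!7878).)*$)/', $data, $m);
    return $m[0];
}

// Same sample as above, but with trailing garbage that contains 8 and 7.
print_r(extractPackets('00787817878110d0a22278780d0a78783330d0a0087'));
```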
Update
If splitting is more your style, why not avoid regular expressions altogether?
<?php
$string = "787817878110d0a22278780d0a78783330d0a";
$arr = explode('0d0a7878', $string);
$string = implode('0d0a,7878', $arr);
$arr = explode(',', $string);
print_r($arr);
?>
Here we split the string by the delimiter 0d0a7878, which is what @CharlieGorichanaz's solution is doing, and props to him for the quick, accurate solution. We then add a comma, because who doesn't love comma-separated values? And we explode again on the commas for an array of desired values. Performance-wise, this ought to be faster than using regular expressions.

PHP Using str_word_count with strsplit to form array after x words

I've got a large string that I want to put into an array, with a new element after every 50 words. I thought about using str_split to cut it, but realised that won't take the words into consideration; it just splits when it gets to x characters.
I've read about str_word_count but can't work out how to put the two together.
What I've got at the moment is:
$outputArr = str_split($output, 250);
foreach($outputArr as $arOut){
echo $arOut;
echo "<br />";
}
But I want to substitute that to form each item of the array at 50 words instead of 250 characters.
Any help will be much appreciated.
Assuming that str_word_count is sufficient for your needs¹, you can simply call it with 1 as the second parameter and then use array_chunk to group the words in groups of 50:
$words = str_word_count($string, 1);
$chunks = array_chunk($words, 50);
You now have an array of arrays; to join every 50 words together and make it an array of strings you can use
foreach ($chunks as &$chunk) { // important: iterate by reference!
$chunk = implode(' ', $chunk);
}
unset($chunk); // break the reference so later use of $chunk cannot overwrite the last element
¹ Most probably it is not. If you want to get what most humans consider acceptable results when processing written language you will have to use preg_split with some suitable regular expression instead.
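As a rough illustration of that footnote, here is one way the preg_split route could look. It treats any run of whitespace as a word separator, which is my assumption about what "word" should mean here; chunkWords is a made-up name and $size is assumed to be at least 1:

```php
<?php
// Hypothetical helper: groups whitespace-separated words into strings of $size words.
function chunkWords(string $text, int $size): array
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces
    // caused by leading or trailing whitespace.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return array(); // e.g. the input was not valid UTF-8
    }
    // Join each group of words back into a single string.
    return array_map(function (array $chunk): string {
        return implode(' ', $chunk);
    }, array_chunk($words, $size));
}

print_r(chunkWords("one two  three\nfour five", 2));
```

Unlike str_word_count, this keeps punctuation attached to the neighbouring word, which may or may not be what you want.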
There's another way:
<?php
$someBigString = <<<SAMPLE
This, actually, is a nice' old'er string, as they said, "divided and conquered".
SAMPLE;
// change this to whatever you need to:
$number_of_words = 7;
$arr = preg_split("#([a-z]+[a-z'-]*(?<!['-]))#i",
$someBigString, $number_of_words + 1, PREG_SPLIT_DELIM_CAPTURE);
$res = implode('', array_slice($arr, 0, $number_of_words * 2));
echo $res;
I consider preg_split a better tool (than str_word_count) here. Not because the latter is inflexible (it is not: you can define what symbols can make up a word with its third param), but because preg_split will essentially stop processing the string after getting N items.
The trick, as is quite common with this function, is to capture the delimiters as well, then use them to reconstruct the string with the first N words (where N is given) AND the punctuation marks preserved.
(Of course, the regex used in my example does not strictly comply with str_word_count's locale-dependent behavior. But it still restricts words to alphabetic characters, ' and -, with the latter two not allowed at the beginning or the end of any word.)
