Searching contents of a file using keywords inside array : PHP - php

I have 3 keywords inside array $val[1], e.g. one, two, three,
and the following code:
foreach ($bus as $val){
    $val = preg_split('/-./', $val, -1, PREG_SPLIT_NO_EMPTY); // split on the "-." separator
    $val[1] = trim(preg_replace('/\s*\(.*/', '', $val[1])); // remove () if found
    $val[1] = trim(preg_replace("/\s*\.\.\.*/", '', $val[1])); // remove ... if found
    $pattern = "/.*$val[1].*/i";
    $data3 = file_get_contents("foo.txt");
    preg_match_all($pattern, $data3, $matches);
    foreach ($matches[0] as $v){
        echo $pattern."<BR>".$v."<BR>";
    }
}
However, when I echo $pattern inside the inner loop, it only prints /.*two.*/i (meaning only the second keyword is used to search the file), whereas outside that loop it prints all the keywords:
/.*one.*/i
/.*two.*/i
/.*three.*/i
Where did I go wrong in the code? (I should only get 3 lines of text as the result, since each keyword returns just one match.)
EDIT: I think I left out an important part of the code. I've edited it to show it more accurately, so I don't think I've been overwriting $val.
Examples of what $bus will print:
Ayer Rajah Avenue-.Opp JVC Electron...
Ayer Rajah Avenue-.JVC Electronics ...
Portsdown Road-.Opp Portsdown Camp ...

You are overwriting your $val array!
foreach($val[1] as $keyword) // with "as $val", $val[1] no longer exists on the second pass!
{
    $pattern = "/.*$keyword.*/i"; ...

Use PREG_SET_ORDER as the 4th argument in preg_match_all.
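To make the effect of that flag concrete, here is a minimal sketch comparing the default ordering with PREG_SET_ORDER. The sample text and the variable names $byPattern and $bySet are invented for illustration and are not from the question.

```php
<?php
// Hypothetical sample text; not from the original question.
$text = "a=1 b=2";

// Default (PREG_PATTERN_ORDER): one sub-array per capture group.
preg_match_all('/(\w)=(\d)/', $text, $byPattern);
// $byPattern[0] holds every full match.

// PREG_SET_ORDER: one sub-array per match.
preg_match_all('/(\w)=(\d)/', $text, $bySet, PREG_SET_ORDER);
// $bySet[0] holds the first match together with its groups.

print_r($byPattern[0]);
print_r($bySet[0]);
```

With the default flag, iterating over $matches[0] already visits every full match, so the flag mainly changes how the results are grouped.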

Try using different names for your array and the element in your foreach statement.
Also, I'd recommend you put file_get_contents outside your loop for efficiency reasons.
Finally, check what your last foreach loop iterates over: with the default flags, $matches[0] is the list of all full-pattern matches, whereas with PREG_SET_ORDER it would hold only the first match.

You want to use something like:
foreach ($val as $keyword) {
    // replace usage of $val with $keyword
}
You are taking the second element of the array, $val[1], which is two, and then applying a foreach against two. Then you are storing it right back on top of your original variable, $val.
Edit
You are only working with $val[1]. Don't you need to apply all of your logic to all of the entries in $val?
$val = preg_split('/-./', $val, -1, PREG_SPLIT_NO_EMPTY); // split on the "-." separator
for ($i = 0; $i < count($val); $i++) {
    $val[$i] = trim(preg_replace('/\s*\(.*/', '', $val[$i])); // remove () if found
    $val[$i] = trim(preg_replace("/\s*\.\.\.*/", '', $val[$i])); // remove ... if found
    $pattern = "/.*$val[$i].*/i";
    // ...
}
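The suggestions in these answers (distinct variable names, reading the file once, one pattern per keyword) can be combined roughly as follows. This is only a sketch: the $bus entries are the samples from the question, but the contents of $data3 are invented to stand in for foo.txt, and $line, $parts, $keyword and $results are names chosen for illustration. The keyword is also passed through preg_quote(), which the original code does not do.

```php
<?php
// Sample $bus entries copied from the question.
$bus = [
    'Ayer Rajah Avenue-.Opp JVC Electron...',
    'Ayer Rajah Avenue-.JVC Electronics ...',
    'Portsdown Road-.Opp Portsdown Camp ...',
];

// Stand-in for file_get_contents("foo.txt"), read once outside the loop.
// The contents are invented for this sketch.
$data3 = "10 Opp JVC Electronics stop\n11 JVC Electronics main gate\n12 Opp Portsdown Camp";

$results = [];
foreach ($bus as $line) {                        // $line, not $val: nothing is overwritten
    $parts = preg_split('/-\./', $line, -1, PREG_SPLIT_NO_EMPTY);
    if (!isset($parts[1])) {
        continue;                                // no "-." separator on this line
    }
    $keyword = trim(preg_replace('/\s*\(.*/', '', $parts[1]));    // drop "(...)" if found
    $keyword = trim(preg_replace('/\s*\.\.\.*/', '', $keyword));  // drop "..." if found
    $pattern = '/.*' . preg_quote($keyword, '/') . '.*/i';
    preg_match_all($pattern, $data3, $matches);
    $results[$keyword] = $matches[0];            // every matching line for this keyword
}
print_r($results);
```

Because "." does not match a newline by default, each entry in $matches[0] is one whole line of the searched text.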

Related

What am I doing wrong in this php regular expression?

I have a problem and I don't know what could be causing it. I want to know whether an element of an array contains a certain word, so I use the regular expression /.*example.*/ in this code:
$array = ['example1', '2example', 'no'];
$matches = [];
$var = "example";
foreach($array as $element)
{
    preg_match("/.*$var.*/", $element, $matches);
}
But when I run the above code and inspect $matches, it is an empty array. What am I doing wrong?
That's because you loop through $array and probably print the result after the loop.
So $matches only holds the matches for the last item in $array.
Because 'no' is the last element and it doesn't match the regex, $matches is empty.
To better understand what is happening, call print_r($matches) inside the loop, right after preg_match().
Then call it after the loop and compare the difference.
You need two variables. One is the result of the current match, and another is a list of all the matches. After each call you can push the result of the current match to the list.
$array = ['example1', '2example', 'no'];
$matches = [];
$var = "example";
foreach($array as $element)
{
    if (preg_match("/.*$var.*/", $element, $match)) {
        $matches[] = $match[0];
    }
}
print_r($matches);
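If the goal is only to know which elements contain the word, the loop can possibly be replaced by a single call. The sketch below is an alternative to the answers above rather than a restatement of them; $matching and $plain are names made up for this example.

```php
<?php
$array = ['example1', '2example', 'no'];
$var = 'example';

// preg_grep() returns the elements that match the pattern, keeping their original keys.
$matching = preg_grep('/' . preg_quote($var, '/') . '/', $array);
print_r($matching);

// For a plain substring no regex is needed at all.
$plain = array_filter($array, function ($element) use ($var) {
    return strpos($element, $var) !== false;
});
print_r($plain);
```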

How to find ALL substrings in string using starting and ending words arrays PHP

I've spent my last 4 hours figuring out how to ... I have to ask for your help now.
I'm trying to extract from a text multiple substrings that match my starting_words_array and ending_words_array.
$str = "Do you see that ? Indeed, I can see that, as well as this." ;
$starting_words_array = array('do','I');
$ending_words_array = array('?',',');
expected output: array([0] => 'Do you see that ?', [1] => 'I can see that,')
I managed to write a first function that finds the first substring matching an item from each array, but I can't figure out how to loop it to get all the substrings that match my requirement.
function SearchString($str, $starting_words_array, $ending_words_array ) {
    forEach($starting_words_array as $test) {
        $pos = strpos($str, $test);
        if ($pos===false) continue;
        $found = [];
        forEach($ending_words_array as $test2) {
            $posStart = $pos+strlen($test);
            $pos2 = strpos($str, $test2, $posStart);
            $found[] = ($pos2!==false) ? $pos2 : INF;
        }
        $min = min($found);
        if ($min !== INF)
            return substr($str,$pos,$min-$pos) .$str[$min];
    }
    return '';
}
Do you guys have any idea how to achieve this?
I use preg_match for my solution. However, the start and end strings must be escaped with preg_quote; without that, the solution will be wrong.
function searchString($str, $starting_words_array, $ending_words_array ) {
    $resArr = [];
    forEach($starting_words_array as $i => $start) {
        $end = $ending_words_array[$i] ?? "";
        $regEx = '~'.preg_quote($start,"~").".*".preg_quote($end,"~").'~iu';
        // preg_match: first match only, so $match[0] is a string
        if(preg_match($regEx,$str,$match)){
            $resArr[] = $match[0];
        }
    }
    return $resArr;
}
The result is what the questioner expects.
If the expressions can occur more than once, preg_match_all must be used instead, and the regex must be modified to use a lazy quantifier (.*?).
function searchString($str, $starting_words_array, $ending_words_array ) {
    $resArr = [];
    forEach($starting_words_array as $i => $start) {
        $end = $ending_words_array[$i] ?? "";
        $regEx = '~'.preg_quote($start,"~").".*?".preg_quote($end,"~").'~iu';
        if(preg_match_all($regEx,$str,$match)){
            $resArr = array_merge($resArr,$match[0]);
        }
    }
    return $resArr;
}
The result for the second variant:
array (
0 => "Do you see that ?",
1 => "Indeed,",
2 => "I can see that,",
)
I would definitely use regex and preg_match_all(). I won't give you a full working code example here but I will outline the necessary steps.
First, build a regex from your start-end pairs like this:
$parts = array_map(
    function($start, $end) {
        return $start . '.+' . $end;
    },
    $starting_words_array,
    $ending_words_array
);
$regex = '/' . join('|', $parts) . '/i';
The /i part means case-insensitive search. Some characters, like the ?, have a special meaning in regex, so you need to extend the above function to escape them properly (for example with preg_quote()).
You can test your final regex in an online regex tester.
Then use preg_match_all() to extract your substrings:
preg_match_all($regex, $str, $matches); // $matches is passed by reference, no need to declare it first
print_r($matches);
The exact structure of your $matches array will be slightly different from what you asked for but you will be able to extract your desired data from it
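One possible way to fill in the escaping step is sketched below. It is not the answerer's code: preg_quote() handles the special characters, and two further changes are assumptions made for this sketch, namely the \b word boundaries (so that "I" does not match inside "Indeed") and the lazy .+? (so that each match stops at the first end marker). The \b anchors only make sense when the start words consist of word characters.

```php
<?php
$str = "Do you see that ? Indeed, I can see that, as well as this.";
$starting_words_array = ['do', 'I'];
$ending_words_array = ['?', ','];

$parts = array_map(
    function ($start, $end) {
        // Escape both ends; \b and the lazy quantifier are additions for this sketch.
        return '\b' . preg_quote($start, '/') . '\b.+?' . preg_quote($end, '/');
    },
    $starting_words_array,
    $ending_words_array
);
$regex = '/' . join('|', $parts) . '/i';

preg_match_all($regex, $str, $matches);
print_r($matches[0]);
```

Here $matches[0] is the flat list of matched substrings, which is the part of the result closest to the format asked for in the question.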
Benni's answer is the best way to go, but let me point out the problems in your code in case you want to fix those:
strpos is case sensitive and also finds parts of words, so you need to change $starting_words_array = array('do','I'); to $starting_words_array = array('Do','I ');
When a substring is found you use return, which exits the function, so you won't find any other substring. To fix that, define $res = []; at the beginning of the function, replace return substr($str,$pos,... with $res[] = substr($str,$pos,... and return $res at the end.
You can see an example on 3v4l; it produces the output you wanted.
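Applied to the question's function, those two fixes might look like the sketch below. The name searchAll is made up for this example, and like the original it assumes each end marker is a single one-byte character and only looks at the first occurrence of each start word.

```php
<?php
// Hypothetical rewrite of the question's SearchString() that collects every result.
function searchAll($str, $starting_words_array, $ending_words_array) {
    $res = [];
    foreach ($starting_words_array as $start) {
        $pos = strpos($str, $start);              // case-sensitive search
        if ($pos === false) {
            continue;
        }
        $found = [];
        foreach ($ending_words_array as $end) {
            $pos2 = strpos($str, $end, $pos + strlen($start));
            if ($pos2 !== false) {
                $found[] = $pos2;
            }
        }
        if ($found) {
            $min = min($found);                   // nearest end marker
            $res[] = substr($str, $pos, $min - $pos + 1);   // +1 keeps the end marker
        }
    }
    return $res;
}

$str = "Do you see that ? Indeed, I can see that, as well as this.";
print_r(searchAll($str, ['Do', 'I '], ['?', ',']));
```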

Shortcode style parsing

Looking at how WP uses shortcodes, I thought I could implement the same structure in a project. I assumed this would be available somewhere, but I have yet to track it down.
I started parsing it myself, beginning with a preg_match_all:
preg_match_all('/\[[^\]]*\]/', $content, $match);
That returns the array with all the shortcodes inside the content as expected, but parsing the name, variables, or array keys with values quickly gets heavy.
My current thought is to break up on spaces, then parse each piece, but then I run into spaces in the values even though they are in quotes. Parsing the quoted data first and then the spaces to reconstruct it seems very wasteful. I don't need to reinvent the wheel here, so any input is fantastic.
example
[shortcodename key1="this is a value" key2="34"]
would like to have
Array
(
[shortcodename] => Array
(
[key1] => this is a value
[key2] => 34
)
)
Here is the complete function that is working, in case anyone else is looking to do the same. Obviously this is not meant to run on user content; the called function should do any checks, as this only replaces the shortcode if the function has a return value.
function processShortCodes($content){
    // Locate data inside [ ], process the output, place it back into content, and return it
    preg_match_all('/\[[^\]]*\]/', $content, $match);
    $regex = '~"[^"]*"(*SKIP)(*F)|\s+~';
    foreach ($match[0] as $key => $val){
        $valOrig = $val; // keep uncleaned value to replace later
        $val = trim(substr($val, 1, -1));
        $replaced = preg_replace($regex,":",$val);
        $exploded = explode(':',$replaced);
        if (is_array($exploded)){
            $fCall = array(); // reset the arguments for each shortcode
            $fcallName = array_shift($exploded); // function name
            if (function_exists($fcallName)){ // if the function exists, go
                foreach ($exploded as $aKey => $aVal){
                    $arr = explode("=", $aVal);
                    if (substr($arr[1], 0, 1) == '&'){
                        $fCall[$arr[0]]=substr($arr[1], 6, -6); // quotes can be HTML-encoded as &quot;
                    }else{
                        $fCall[$arr[0]]=substr($arr[1], 1, -1);
                    }
                }
                if ( is_array($fCall) && $fcallName ){
                    $replace = call_user_func($fcallName, $fCall);
                    if ($replace){
                        $content = str_replace($valOrig,$replace,$content);
                    }
                }
            }
        }
    }
    return $content;
}
You can try this: change all spaces not wrapped in quotes to, let's say, a semicolon, then explode by semicolon.
$regex = '~"[^"]*"(*SKIP)(*F)|\s+~';
$subject = 'hola hola "pepsi cola" yay';
$replaced = preg_replace($regex,";",$subject);
$exploded = explode(';', $replaced);
Credits
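For the array structure asked for in the question, another option is to match the key="value" pairs directly instead of splitting on spaces. The following is a minimal sketch under the assumption that values are always wrapped in plain double quotes; parseShortcode is a hypothetical helper name, not part of WordPress or of the code above.

```php
<?php
// Hypothetical helper: returns [name => [key => value, ...]] for one shortcode string.
function parseShortcode($shortcode) {
    // The name is the first run of non-space characters after "[".
    if (!preg_match('/^\[\s*([^\s\]]+)/', $shortcode, $m)) {
        return [];
    }
    $name = $m[1];

    // Each attribute is key="value"; the value may contain spaces.
    preg_match_all('/([\w-]+)\s*=\s*"([^"]*)"/', $shortcode, $pairs, PREG_SET_ORDER);

    $attrs = [];
    foreach ($pairs as $pair) {
        $attrs[$pair[1]] = $pair[2];
    }
    return [$name => $attrs];
}

print_r(parseShortcode('[shortcodename key1="this is a value" key2="34"]'));
```

Matching the pairs as units avoids the intermediate step of replacing unquoted spaces, at the cost of ignoring attributes that are not in key="value" form.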

Search a letter in a list of words?

I have an array called 'words' storing many words.
For example:
I have 'systematic', 'سلام','gear','synthesis','mysterious', etc.
NB: we have utf8 words too.
How to query efficiently to see which words include letters 's','m','e' (all of them) ?
The output would be:
systematic,mysterious
I have no idea how to do such a thing in PHP. It should be efficient because our server would suffer otherwise.
Use a regular expression to split each string into an array of characters, and use array_intersect() to check whether all the characters in your search array are present in the split array:
header('Content-Type: text/plain; charset=utf8');
$words = array('systematic', 'سلام','gear','synthesis','mysterious');
$search = array('s','m','e');
foreach ($words as $word) {
    $char_array = utf8_str_split($word);
    $contains = array_intersect($search, $char_array) == $search;
    echo sprintf('%s : %s', $word, (($contains) ? 'True' : 'False'). PHP_EOL);
}
function utf8_str_split($str) {
    return preg_split('/(?!^)(?=.)/u', $str);
}
Output:
systematic : True
سلام : False
gear : False
synthesis : False
mysterious : True
UPDATE: Or, alternatively, you could use array_filter() with preg_match():
$array = array_filter($words, function($item) {
    return preg_match('~(?=[^s]*s)(?=[^m]*m)(?=[^e]*e)~u', $item);
});
Output:
Array
(
[0] => systematic
[4] => mysterious
)
This worked for me:
$words = array('systematic', 'سلام','gear','synthesis','mysterious');
$letters=array('s','m', 'e');
foreach ($words as $w) {
    //print "lets check word $w<br>";
    $n=0;
    foreach ($letters as $l) {
        if (strpos($w, $l)!==false) $n++;
    }
    if ($n>=3) print "$w<br>";
}
It returns
systematic
mysterious
Explanation
It uses nested foreach loops: one for the words and one for the letters to be matched.
Whenever a letter is matched, the counter is incremented.
Once the letters loop is over, it checks how many matches there were and prints the word if the count is 3.
Something like this:
$words = array('systematic', 'سلام','gear','synthesis','mysterious');
$result=array();
foreach($words as $word){
    if(strpos($word, 's') !== false &&
       strpos($word, 'm') !== false &&
       strpos($word, 'e') !== false){
        $result[] = $word;
    }
}
echo implode(',',$result); // will output 'systematic,mysterious'
Your question is a little broad.
From what I understand, those words are stored in a database table, so you could filter the words before loading them into the array, using the SQL LIKE operator.
If you want to search for letters in an array of words, you could loop over the array using foreach and pass each value to the strpos function.
http://www.php.net/function.strpos
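The foreach-plus-strpos approach described here can be generalized to any list of letters, roughly as below. wordsContainingAll is a hypothetical helper name chosen for this sketch; it returns early on the first missing letter so that no unnecessary searches are made.

```php
<?php
// Hypothetical helper: keep only the words that contain every one of $letters.
function wordsContainingAll(array $words, array $letters) {
    return array_values(array_filter($words, function ($word) use ($letters) {
        foreach ($letters as $letter) {
            // strpos() compares bytes, which is safe for valid UTF-8 input.
            if (strpos($word, $letter) === false) {
                return false;   // stop at the first missing letter
            }
        }
        return true;
    }));
}

$words = ['systematic', 'سلام', 'gear', 'synthesis', 'mysterious'];
echo implode(',', wordsContainingAll($words, ['s', 'm', 'e']));
```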
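Why not use preg_grep()? Note that a character class like /[sme]/ matches words containing any one of the letters; to require all of them, use one lookahead per letter:
$your_array = preg_grep('/^(?=.*s)(?=.*m)(?=.*e)/u', $array);
print_r($your_array);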

regular expression word preceded by char

I want to grab a specific string only if a certain word is followed by a = sign.
Also, I want to get all the info after that = sign until a / is reached or the string ends.
Let's take into example:
somestring.bla/test=123/ohboy/item/item=capture
I want to get item=capture but not item alone.
I was thinking about using lookaheads but I'm not sure if this is the way to go. I appreciate any help as I'm trying to grasp more and more about regular expressions.
[^/=]*=[^/]*
will give you all the pairs that match your requirements.
So from your example it should return:
test=123
item=capture
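As a quick check of that pattern in PHP, the sketch below runs it with preg_match_all(). The "#" delimiters are a choice made here so that "/" needs no escaping inside the character classes.

```php
<?php
$string = 'somestring.bla/test=123/ohboy/item/item=capture';

// Key: anything except "/" and "="; value: everything up to the next "/".
preg_match_all('#[^/=]*=[^/]*#', $string, $matches);
print_r($matches[0]);
```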
If you want to capture item=capture, it is straightforward:
/item=[^\/]*/
If you want to also extract the value,
/item=([^\/]*)/
If you only want to match the value, then you need to use a look-behind.
/(?<=item=)[^\/]*/
EDIT: too many errors due to insomnia. Also, screw PHP and its failure to disregard separators in a character group as separators.
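The difference between the three patterns is easiest to see side by side. This sketch uses the example string from the question; $whole, $grouped and $value are names picked for illustration.

```php
<?php
$string = 'somestring.bla/test=123/ohboy/item/item=capture';

preg_match('/item=[^\/]*/', $string, $whole);        // the whole pair
preg_match('/item=([^\/]*)/', $string, $grouped);    // the pair, plus the value in group 1
preg_match('/(?<=item=)[^\/]*/', $string, $value);   // the value only, via look-behind

echo $whole[0], "\n";     // item=capture
echo $grouped[1], "\n";   // capture
echo $value[0], "\n";     // capture
```

The bare "item" segment is skipped in all three cases because it is not followed by "=".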
Here is a function I wrote some time ago. I modified it a little, and added the $keys argument so that you can specify valid keys:
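function getKeyValue($string, ?array $keys = null) {
    $keys = (empty($keys) ? '[\w\d]+' : implode('|', $keys));
    // A key must follow a "/" or start the string; its value runs up to the next "/" or the end
    $pattern = "/(?<=\/|^)(?P<key>{$keys})\s*=\s*(?P<value>.+?)(?=\/|$)/";
    preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);
    foreach ($matches as & $match) {
        // Keep only the named groups ("key" and "value")
        foreach ($match as $key => $value) {
            if (is_int($key)) {
                unset($match[$key]);
            }
        }
    }
    unset($match); // break the reference left over from the loop
    return $matches ?: FALSE;
}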
Just throw in the string and valid keys:
$string = 'somestring.bla/test=123/ohboy/item/item=capture';
$keys = array('test', 'item');
$keyValuePairs = getKeyValue($string, $keys);
var_dump($keyValuePairs);
