PHP finding "words" within a string - php

I need to compare 2 lists of strings against each other and output strings which contain the strings searched for. should be very easy, i just can't figure it out.
to overly simplify it, let's use arrays. I am accessing an API with SOAP and running it against my own list contained in a table, but.... let's use arrays. the comparison is what i'm having trouble with.
hit submit button on listsearch.php and it executes.
ARRAY Mylist: TED, DEAD, FIRST, LAST, PUPPY
ARRAY TheirList: teddybearnoose, hauntedhouse, hehasdeparted, deadmouse, walkingdead, thegratefuldead, firstkiss, thinkfirst, firsttobelast, firstmanonthemoon, firstreattempted, somecrap, something, notdisplayed, 50000otherwords, miscjunk
outputs as:
TEDdybearnoose
haunTEDhouse
hehasdeparTED
DEADmouse
walkingDEAD
thegratefulDEAD
FIRSTkiss
thinkFIRST
FIRSTtobeLAST <--- note
FIRSTmanonthemoon
FIRSTreattempTED <--- note
only outputs strings which contain a string in my list, in any position. CAPS is just to make the words stand out to you. not important.
now, part 2?
same "TheirList", except i type a keyword into a text area, and select whether i want it at the beginning end or anywhere from a dropdown.
keywordsearch.php
search for: [ TED ] at: [beginning / end / anywhere] of string.
how would you make that one work?
Thanks in advance. This should be a breeze for most of you. I appreciate it. i'll try to answer questions promptly

You can use strpos() to find the position of a substring.
It makes it easy to check whether the substring occurs anywhere in the string or at the beginning. For the end, compare the tail of the string instead, because strpos() only reports the first occurrence and would miss a string like "deaddead":
// String contains substring
strpos($string, $substring) !== false;
// String starts with substring
strpos($string, $substring) === 0;
// String ends with substring
substr($string, -strlen($substring)) === $substring;
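Putting the pieces together for both parts of the question, a minimal sketch could look like the following. The function names matchesKeyword() and filterList() are made up for illustration, and the comparison is assumed to be case-insensitive because the question says the capitals don't matter.

```php
<?php
// Hypothetical helper: does $haystack contain $keyword at the given position?
// $where is one of 'beginning', 'end' or 'anywhere' (the dropdown values).
function matchesKeyword(string $haystack, string $keyword, string $where = 'anywhere'): bool
{
    if ($keyword === '') {
        return false; // an empty keyword would match everything
    }
    // Assumption: case does not matter, so compare lowercased copies.
    $haystack = strtolower($haystack);
    $keyword = strtolower($keyword);

    switch ($where) {
        case 'beginning':
            return strpos($haystack, $keyword) === 0;
        case 'end':
            return substr($haystack, -strlen($keyword)) === $keyword;
        default:
            return strpos($haystack, $keyword) !== false;
    }
}

// Hypothetical helper for part 1: keep every string in $theirList that
// contains at least one of the words in $myList.
function filterList(array $theirList, array $myList, string $where = 'anywhere'): array
{
    $result = [];
    foreach ($theirList as $candidate) {
        foreach ($myList as $word) {
            if (matchesKeyword($candidate, $word, $where)) {
                $result[] = $candidate;
                break; // one hit is enough, avoid duplicates
            }
        }
    }
    return $result;
}

$myList = ['TED', 'DEAD', 'FIRST', 'LAST', 'PUPPY'];
$theirList = ['teddybearnoose', 'walkingdead', 'firsttobelast', 'somecrap'];
print_r(filterList($theirList, $myList));
```

For part 2, the same filterList() call works with a one-element list holding the typed keyword and the dropdown value passed as $where.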

Related

Delete multiple file for/while

I have a php pull down that I select an item and delete
all files associated with it.
It works well if there are only 5 or 6. After I put in the
first 4 to test and get it working I realized it could
take a very long time to enter a couple hundred and
would bloat the script.
Not knowing enough about for and while loops is there
anyone that might have a way to help?
There will never be more than one set deleted at a time.
Thanks in advance.
<?php
$workitem = $_POST["workitem"];
$workdirPath = "/var/work.files/";
if($workitem == 'item1.php')
{
unlink("$workdirPath/page1.php");
unlink("$workdirPath/temp1.php");
unlink("$workdirPath/all1.php");
}
if($workitem == 'item2.php')
{
unlink("$workdirPath/page2.php");
unlink("$workdirPath/temp2.php");
unlink("$workdirPath/all2.php");
}
if($workitem == 'item3.php')
{
unlink("$workdirPath/page3.php");
unlink("$workdirPath/temp3.php");
unlink("$workdirPath/all3.php");
}
if($workitem == 'item4.php')
{
unlink("$workdirPath/page4.php");
unlink("$workdirPath/temp4.php");
unlink("$workdirPath/all4.php");
}
?>
Some simple pattern matching and substitution is all you need here.
First, the code:
if (preg_match('/^item(\d+)\.php$/', $workitem, $matches)) {
    $number = $matches[1];
    foreach (array('page', 'temp', 'all') as $base) {
        unlink("$workdirPath/$base$number.php");
    }
} else {
    # unrecognized work item value; complain to user or whatever
}
The preg_match function takes a pattern, a string, and an array. If the string matches the pattern, the parts that match are stored in the array. The particular type of pattern is a *p*erl5-compatible *reg*ular expression, which is where the preg_ part of the name comes from.
Regular expressions are scary-looking to the uninitiated, but they're a handy way to scan a string and get some values out of it. Most characters just represent themselves; the string "foo" matches the regular expression /foo/. But some characters have special meanings that let you make more general patterns to match a whole set of strings where you don't have to know ahead of time exactly what's in them.
The /s just mark the beginning and end of the actual regular expression; they're there because you can stick additional modifier flags after the closing slash, inside the same string as the expression itself.
The ^ and $ represent the beginning and end of the string. /foo/ matches "foo", but also "foobar", "bunnyfoofoo", and so on - any string that contains "foo" will match. But /^foo$/ matches only "foo" exactly.
\d means "any digit". + means "one or more of that last thing". So \d+ means "one or more digits".
The period (.) is special; it matches any character at all. Since we want a literal period, we have to escape it with a backslash; \. just matches a period.
So our regular expression is '/^item\d+\.php$/', which will match any itemnumber.php filename. But that's not quite enough. The preg_match function is basically a binary test: does the string match the pattern or not, yes or no? In this case, it's not enough to just say "yup, the string is valid"; we need to know which items specifically the user specified. That's what capture groups are for. We use parentheses to say "remember what matched this part", and provide an array name that gets filled with those remembrances.
The part of the string that matches the whole regular expression (which may not be the whole string, if the regular expression isn't anchored with ^...$ like this one is) is always put in element 0 of the array. If you use parentheses in the regular expression, then the part of the string that matches the part of the regular expression inside the first pair of parentheses is stored in element 1 of the array; if there's a second set of parentheses, the matching part of the string goes in element 2 of the array, and so on.
So we put parentheses around our number ((\d+)) and then the actual number will be remembered in element 1 of our $matches array.
Great, we have a number. Now we just need to use it to build up the filenames we want to delete.
In each case, we want to delete three files: page$n.php, temp$n.php, and all$n.php, where $n is the number we extracted above. We could just put three unlink calls, but since they're all so similar, we can use a loop instead.
Take the different prefixes that are the same no matter the number, and make an array out of them. Then loop over that array. In the body of the loop, the variable $base will contain whichever element of the array it's currently on. Stick that between the $workdirPath prefix and the $number we got from the match, append .php, and that's your file. unlink it and go back to the top of the loop to grab the next one.
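To see the match and the loop working together without deleting anything, here is a small sketch that only builds the list of paths. The helper name workItemFiles() is hypothetical, and the rtrim() call is an added assumption so that a trailing slash on the directory doesn't produce a double slash.

```php
<?php
// Hypothetical helper: returns the three paths that belong to a work item,
// or an empty array if the value does not look like "item<number>.php".
function workItemFiles(string $workitem, string $workdirPath): array
{
    if (!preg_match('/^item(\d+)\.php$/', $workitem, $matches)) {
        return []; // unrecognized work item value
    }

    $number = $matches[1];           // first capture group: the digits
    $dir = rtrim($workdirPath, '/'); // avoid "//" when the path ends in "/"

    $files = [];
    foreach (['page', 'temp', 'all'] as $base) {
        $files[] = "$dir/$base$number.php";
    }
    return $files;
}

print_r(workItemFiles('item12.php', '/var/work.files/'));
```

Each returned path can then be handed to unlink(). Because anything that doesn't match the pattern yields an empty array, a crafted value such as "../something" never reaches the filesystem.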

Find nth character except if its enclosed in brackets php

I use the following function to find the nth occurrence of a character in a string, which works well. However, there is one exception. Let's say the character is a comma: if the comma is within ( and ), it shouldn't be counted.
function strposnth($haystack, $needle, $nth=1, $insenstive=0)
{
//if its case insenstive, convert strings into lower case
if ($insenstive) {
$haystack=strtolower($haystack);
$needle=strtolower($needle);
}
//count number of occurances
$count=substr_count($haystack,$needle);
//first check if the needle exists in the haystack, return false if it does not
//also check if asked nth is within the count, return false if it doesnt
if ($count<1 || $nth > $count) return false;
//run a loop to nth number of occurrence
//start $pos and $len at 0, so the very first search
//begins at offset 0
for($i=0,$pos=0,$len=0;$i<$nth;$i++)
{
//get the position of needle in haystack
//provide starting point 0 for first time ($pos=0, $len=0)
//provide starting point as position + length of needle for next time
$pos=strpos($haystack,$needle,$pos+$len);
//check the length of needle to specify in strpos
//do this only first time
if ($i==0) $len=strlen($needle);
}
//return the number
return $pos;
}
So ive got the regex working that only captures the comma when outside of () which is:
'/,(?=[^)]*(?:[(]|$))/'
and you can see a live example working here:
http://regex101.com/r/xE4jP8
but I'm not sure how to make it work within the strpos loop. I know what I need to do, which is tell it the needle has this regex exception, but I am not sure how to make it work. Maybe I should ditch the function and use another method?
Just to mention my end result i want is to split the string after every 6 commas before the next string starts, example:
rttr,ertrret,ertret(yes,no),eteert,ert ert,rtrter,0 rttr,ert(yes,no)rret,ert ret,eteert,ertert,rtrter,1 rttr,ertrret,ert ret,eteert,ertert,rtrter,0 rttr,ertrret,ert ret,eteert,ertert,rtrter,2 rttr,ert(white,black)rret,ert ret,eteert,ertert,rtrter,0 rttr,ertrret,ert ret,eteert,ertert,rtrter,0 rttr,ertrret,ert ret,et(blue,green)eert,ertert,rtrter,1
Note that there is always a 1 digit number (1-3) and a space after the 6th comma before the next part of the string begins. I can't rely on that digit-and-space pattern alone, since it could also occur earlier in the string, but I can always rely on needing to split at the first digit and space after the 6th comma. So I want to split the string directly after this.
For example the above string would be split like this:
rttr,ertrret,ertret(yes,no),eteert,ert ert,rtrter,0
rttr,ert(yes,no)rret,ert ret,eteert,ertert,rtrter,1
rttr,ertrret,ert ret,eteert,ertert,rtrter,0
rttr,ertrret,ert ret,eteert,ertert,rtrter,2
rttr,ert(white,black)rret,ert ret,eteert,ertert,rtrter,0
rttr,ertrret,ert ret,eteert,ertert,rtrter,0
rttr,ertrret,ert ret,et(blue,green)eert,ertert,rtrter,1
I can do that myself pretty easily if I know how to get the position of the character, since I can then use substr to split it. An easier way might be preg_split, but I'm not sure how that would work until I figure this part out.
I hope i wasnt too confusing in explaining, i bet i was :)
For this kind of nesting problem, regex usually is not the right tool. However, when the problem is actually not that complicated, as yours seems to be, regex will do just fine.
Try this:
(?:^|,)((?:[^,(]*(?:\([^)]*\))?)*)

(?:^|,)           start the search at a comma or at the start of the string
(                 start of the capture group
(?:               start of a non-capturing group
[^,(]*            search until a comma or an open parenthesis
(?:\([^)]*\))?    if a parenthesis is found, consume up to the closing parenthesis
)*                end of the non-capturing group, repeat if necessary
)                 end of the capture group
See it in action: http://regex101.com/r/eS0cX4
As you can see, this will capture everything between the commas outside of the parentheses. If you get all these matches into an array using preg_match_all, you can split it any which way you like.
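As an alternative to a regex, the split described in the question can also be done with a plain character scan that tracks whether it is inside parentheses. This is only a sketch under the question's stated assumptions (a record ends at the first space after the 6th comma outside parentheses); the function name splitRecords() is made up for illustration.

```php
<?php
// Hypothetical helper: splits $input into records, where a record ends at the
// first space that follows the Nth comma found outside of parentheses.
function splitRecords(string $input, int $commasPerRecord = 6): array
{
    $records = [];
    $start = 0;   // offset where the current record begins
    $depth = 0;   // how many "(" are currently open
    $commas = 0;  // top-level commas seen in the current record
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === '(') {
            $depth++;
        } elseif ($char === ')' && $depth > 0) {
            $depth--;
        } elseif ($char === ',' && $depth === 0) {
            $commas++;
        } elseif ($char === ' ' && $depth === 0 && $commas === $commasPerRecord) {
            // First space after the Nth top-level comma: the record ends here.
            $records[] = substr($input, $start, $i - $start);
            $start = $i + 1;
            $commas = 0;
        }
    }

    if ($start < $length) {
        $records[] = substr($input, $start); // last record has no trailing space
    }
    return $records;
}

print_r(splitRecords('a,b(x,y),c,d,e e,f,0 g,h,i,j,k,l,1'));
```

Spaces inside a field, such as "e e" above, are ignored because fewer than six top-level commas have been counted at that point.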

How to find if two characters are in an array php

I am looking to develop a search function that allows users to just search for the item, or modify their search with a price range in brackets. So that is to say if they are looking for a car, then they can enter either car and receive all cars in the database or they can enter car (100, 299) or car(100, 299) and receive only cars in the database with the price range of 100 to 299.
Before, what I did was three different explode function calls, but that was cumbersome and looked ridiculously ugly. I also tried to put the brackets in an array and then compare that against the word searched (a word is basically an array of characters) but that didn't work. Finally I have been reading up on strpos and substr but they don't seem to fit the requirements, as strpos returns the first occurrence of the character and substr returns the characters within a specified length after a specific occurrence.
So for example the problem with strpos is the user can just enter ( and no ) bracket and I'll make a call to my search function with who knows what. And for example the problem with substr is that the price range can vary wildly.
You can use preg_match to parse the search string - I'm assuming that's the part you're having issues with.
if (preg_match('/car ?\(([^,]+), ?([^\)]+)\)/', $search_text, $matches)) {
$low_price = $matches[1];
$high_price = $matches[2];
//do your price filtering here
}
The regular expression may need a little tweaking. Parentheses don't need to be escaped inside character classes, though escaping them there is harmless.
Yes, Sam is right. You should do this with regular expressions.
Look up preg_match() in the documentation.
To complete his answer, the regular expression for your case is:
$regex = '/^([a-zA-Z]+)\s*\(\s*([0-9]+)\s*,\s*([0-9]+)\s*\)$/';
if (preg_match($regex, $search_text, $matches)) {
$type = $matches[1];
$low_price = $matches[2];
$high_price = $matches[3];
//do your price filtering here
}
Be careful: $matches[0] holds the whole matched string, so the capture groups start at index 1, not 0.
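Since the question wants both "car" and "car (100, 299)" to work, one way to handle it is a single pattern with an optional price group. The sketch below assumes search terms are plain letters and prices are whole numbers; the function name parseSearch() and the shape of the returned array are made up for illustration.

```php
<?php
// Hypothetical helper: parses "car", "car (100, 299)" or "car(100, 299)".
// Returns null when the input is malformed, e.g. an unclosed bracket.
function parseSearch(string $searchText): ?array
{
    // Group 1: the term. Groups 2 and 3: the optional low and high price.
    $pattern = '/^\s*([a-zA-Z]+)\s*(?:\(\s*(\d+)\s*,\s*(\d+)\s*\))?\s*$/';
    if (!preg_match($pattern, $searchText, $matches)) {
        return null;
    }

    // When the optional part did not take part in the match, PHP leaves
    // indexes 2 and 3 out of $matches entirely.
    $hasRange = isset($matches[2], $matches[3]);

    return [
        'term' => $matches[1],
        'low'  => $hasRange ? (int) $matches[2] : null,
        'high' => $hasRange ? (int) $matches[3] : null,
    ];
}

var_dump(parseSearch('car (100, 299)'));
var_dump(parseSearch('car'));
var_dump(parseSearch('car (100'));
```

Because the pattern is anchored at both ends, a lone "(" with no ")" is rejected instead of being passed on to the search function.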

Any faster, simpler alternative to php preg_match

I am using cakephp 1.3 and I have a textarea where users submit articles. On submit, I want to look into the article for certain keywords and add respective tags to the article.
I was thinking of preg_match, but the preg_match pattern has to be a string, so I would have to loop through a (big) array.
Is there an easier way to plug in the keywords array for the pattern?
I appreciate all your help.
Thanks.
I suggest treating your array of keywords like a hash table. Lowercase the article text, explode by spaces, then loop through each word of the exploded array. If the word exists in your hash table, push it to a new array while keeping track of the number of times it's been seen.
I ran a quick benchmark comparing regex to hash tables in this scenario. To run it with regex 1000 times, it took 17 seconds. To run it with a hash table 1000 times, it took 0.4 seconds. It should be an O(n+m) process.
$keywords = array("computer", "dog", "sandwich");
$article = "This is a test using your computer when your dog is being a dog";
$arr = explode(" ", strtolower($article));
$tracker = array();
foreach($arr as $word){
if(in_array($word, $keywords)){
if(isset($tracker[$word]))
$tracker[$word]++;
else
$tracker[$word] = 1;
}
}
The $tracker array would output: "computer" => 1, "dog" => 2. You can then do the process to decide what tags to use. Or if you don't care about the number of times the keyword appears, you can skip the tracker part and add the tags as the keywords appear.
EDIT: The keyword array should be an inverted index array to ensure the fastest lookup. in_array() performs a linear search through the array, so the loop above isn't as fast as it could be. An inverted index array would look like
array("computer" => 1, "dog" => 1, "sandwich" => 1); // "1" can be any value
Then you would do isset($keywords[$word]) to check if the word matches a keyword, instead of in_array(), which gives you an O(1) lookup on average.
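A compact sketch of the inverted-index idea, using array_flip() to build the lookup table from a plain keyword list. The function name countKeywords() is made up for illustration, and str_word_count() is swapped in for explode() so that punctuation next to a word doesn't prevent a match.

```php
<?php
// Hypothetical helper: counts how often each keyword occurs in the article.
function countKeywords(string $article, array $keywords): array
{
    // array_flip turns ["computer", "dog"] into ["computer" => 0, "dog" => 1],
    // so isset() can answer "is this a keyword?" without scanning the list.
    $lookup = array_flip(array_map('strtolower', $keywords));

    $counts = [];
    foreach (str_word_count(strtolower($article), 1) as $word) {
        if (isset($lookup[$word])) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    return $counts;
}

$keywords = ['computer', 'dog', 'sandwich'];
$article = 'This is a test using your computer when your dog is being a dog';
print_r(countKeywords($article, $keywords));
```

The keys of the returned array are the tags to apply; the counts are there only in case frequency matters.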
If you don't need the power of regular expressions, you should just use strpos().
You will still need to loop through the array of words, but strpos is much, much faster than preg_match.
Of course, you could try matching all the keywords using one single regexp, like /word1|word2|word3/, but I'm not sure it is what you are looking for. And also I think it would be quite heavy and resource-consuming.
Instead, you can try a different approach, such as splitting the text into words and checking if the words are interesting or not. I would make use of str_word_count() using something like:
$text = 'this is my string containing some words, some of the words in this string are duplicated, some others are not.';
$words_freq = array_count_values(str_word_count($text, 1));
that splits the text into words and counts occurrences. Since the words are the keys of $words_freq, you can then check with isset($words_freq[$keyword]) or array_intersect(array_keys($words_freq), $my_keywords).
If, as I guess, you don't care about the case of the keywords, you can strtolower() the whole text before splitting it into words.
Of course, the only way to determine which approach is the best is to setup some testing, by running various search functions against some "representative" and quite long text and measuring the execution time and resource usage (try microtime(TRUE) and memory_get_peak_usage() to benchmark this).
EDIT: I cleaned up a bit the code and added a missing semi-colon :)
If you want to look for multiple words from an array, then combine said array into an regular expression:
$regex_array = implode("|", array_map("preg_quote", $array));
preg_match_all("/($regex_array)/", $src, $tags);
This converts your array into /(word|word|word|word|word|...)/. The array_map and preg_quote part is optional, only needed if the $array might contain special characters.
Avoid strpos and loops for this case. preg_match is faster for searching among alternatives.
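A slightly fuller sketch of the combined-pattern approach follows. The word boundaries (\b), the case-insensitive flag and the function name findTags() are additions for illustration, not part of the answer above; preg_quote() is given the delimiter so that a "/" inside a keyword is escaped too.

```php
<?php
// Hypothetical helper: returns the distinct keywords that occur in $text.
function findTags(string $text, array $keywords): array
{
    if ($keywords === []) {
        return []; // an empty alternation would match everywhere
    }

    // Escape each keyword, including the "/" delimiter used below.
    $quoted = array_map(function ($keyword) {
        return preg_quote($keyword, '/');
    }, $keywords);

    // Assumption: keywords should match whole words only, ignoring case.
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    preg_match_all($pattern, $text, $matches);

    // Normalise case and drop duplicates, keeping first-seen order.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

print_r(findTags('My Dog chased another dog past the computer.', ['computer', 'dog', 'sandwich']));
```

Whether this beats a hash lookup depends on the size of the keyword list and the text, so it is worth timing both on representative articles.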
strtr()
If given two arguments, the second
should be an array in the form
array('from' => 'to', ...). The return
value is a string where all the
occurrences of the array keys have
been replaced by the corresponding
values. The longest keys will be tried
first. Once a substring has been
replaced, its new value will not be
searched again.
Add tags manually? Just like we add tags here at SO.

How to split a string and find the occurrence of one string in another?

I need to figure out how to do some C# code in php, and im not sure exactly how.
so first off i need the Split function, im going to have a string like
"identifier 82asdjka271akshjd18ajjd"
and i need to split the identifier word from the rest. so in C#, i used string.Split(new char{' '}); or something like that (working off the top of my head) and got two strings, the first word, and then the second part.. i understand that the php split function has been deprecated as of PHP 5.3.0.. so thats not an option, what are the alternatives?
and im also looking for a IndexOf function, so if i had the above code again as an example, i would need the location of 271 in the string, so i can generate a substring.
you can use explode for splitting and strpos for finding the index of one string inside another.
$a = "identifier 82asdjka271akshjd18ajjd";
$arr = explode(' ',$a); // split on space..to get an array of size 2.
$pos = strpos($arr[1],'271'); // search for '271' in the 2nd ele of array.
echo $pos; // prints 8
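For the "generate a substring" part of the question, a short sketch along the same lines. The limit argument of 2 to explode() is an assumption that only the first space separates the identifier from the rest, and the variable names are made up for illustration.

```php
<?php
$a = "identifier 82asdjka271akshjd18ajjd";

// Split on the first space only, so the remainder stays in one piece.
list($identifier, $rest) = explode(' ', $a, 2);

// strpos returns false when the needle is missing, and 0 is a valid
// position, so compare with !== false rather than a loose check.
$pos = strpos($rest, '271');
$tail = ($pos !== false) ? substr($rest, $pos) : null;

echo $identifier, "\n"; // identifier
echo $tail, "\n";       // 271akshjd18ajjd
```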
