I'm currently looking to implement a search function on my website.
I have it working with 1 word/name, but I can't seem to figure out how to split and identify certain parts of the search string.
Example:
I have a user in my database with the name "Steve de Vette"
(My country has words in between almost all of the first and last names but not always, and sometimes more than one. ex: "Kees van der Berg") But his name is of course split up in multiple parts. "vNaam", "Tvoegsel"(meaning the "de" or "van der") and "aNaam".
This complicates things a bit for me, since I now have to split the search string, which on its own isn't a big deal. But I need to know how I can get the correct results every time.
So I guess it comes down to this: how can I split the name up the way it should be? Or maybe there's a way to strip these things out altogether, but for the life of me I can't seem to figure it out.
Any help would be greatly appreciated!
EDIT:
I have tried just exploding the name and searching with multiple OR_LIKE clauses. This works until I have no "tussenvoegsel" and one of the Like statements reads "OR anaam LIKE '%%'"
Split the string with explode() and search for the first and last items.
$string1 = "Steve de Vette";
$string2 = "Kees van der Berg";
$ex1 = explode(" ", $string2);
$nr = count($ex1);
echo $ex1[0]; //firstname
echo ' ';
echo $ex1[$nr-1]; //lastname
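The same idea can be stretched a little to keep the middle part as well. This is only a sketch: the function name splitName is made up, the array keys are borrowed from the column names in the question, and it assumes the first word is the first name and the last word is the last name.

```php
<?php
// Hypothetical helper: split a full name into first name, tussenvoegsel and last name.
// Assumes the first word is the first name and the last word is the last name.
function splitName(string $fullName): array
{
    // Split on any run of whitespace and drop empty pieces.
    $parts = preg_split('/\s+/', trim($fullName), -1, PREG_SPLIT_NO_EMPTY);
    if (count($parts) === 0) {
        return ['vNaam' => '', 'Tvoegsel' => '', 'aNaam' => ''];
    }
    $first = array_shift($parts);
    $last = count($parts) > 0 ? array_pop($parts) : '';
    return [
        'vNaam' => $first,
        'Tvoegsel' => implode(' ', $parts), // whatever is left in the middle
        'aNaam' => $last,
    ];
}

print_r(splitName('Kees van der Berg'));
```

If you only add a LIKE clause for the parts that are not empty, the "LIKE '%%'" problem from the edit should not come up.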
Well, you can use PHP's string searching function.
$pos = strpos($string, $character);
You could use this to find the first space in the name. So if you take "Steve de Vette", you could first find Steve as the first name, then the rest of the string you could search again or keep the rest of it as the last name.
This is a snippet of code taken from my own site.
$fname = strstr($entry, " ", true); // finds the first name (all characters up to the first space)
$len = strlen($fname) + 1; // skips over the space to the last name
$entrylen = strlen($entry); // gets the length of the search string used
$sname = substr($entry, $len, $entrylen); // gets the rest of the string (last name)
Hope you find this helpful
What I do is strip out any spaces altogether. I store spaces in my database like normal, but use the REPLACE function when searching to strip them out. Then I strip the spaces out of the search field as well and use LIKE with the wildcard on the right-hand side. I try to make the search as simple as possible. Searching with one word seems to work better overall, so forcing one word seems to be the thing that works for me.
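Roughly, that could look like the sketch below. The helper name normalizeSearch, the table name users and the use of a bound :term parameter are all assumptions for illustration; the column names are the ones from the question.

```php
<?php
// Hypothetical helper: remove all whitespace from the search input.
function normalizeSearch(string $input): string
{
    return preg_replace('/\s+/', '', $input);
}

$term = normalizeSearch('Steve de  Vette'); // "StevedeVette"

// Assumed table name. CONCAT_WS() skips NULL values, so a missing tussenvoegsel
// does not turn the whole expression into NULL.
$sql = "SELECT * FROM users
        WHERE REPLACE(CONCAT_WS('', vNaam, Tvoegsel, aNaam), ' ', '') LIKE :term";

// Wildcard on the right-hand side only; the value is bound, not concatenated into the SQL.
$params = [':term' => $term . '%'];
```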
Related
I'm trying to stop users searching with terms that give far too many results.
For example, I'd like users to be able to search for "Big Island" but not search for "Island".
I tried this:
$array = array("island", "islands", "island's", "islet", "islets", "reef", "reefs", "shoal", "shoals");
if (0 < count(array_intersect(array_map('strtolower', explode(' ', $searchTerm)), $array)))
{
echo "No results. The search term used was too general.";
exit();
}
But that stops the search for any phrase with the stop words in it.
I guess I'm looking for something that goes like:
if the string contains this word or that word (and only one of those words!) stop what you're doing, else carry on...
Look at the strstr function, it may help you: http://php.net/manual/en/function.strstr.php
Maybe look at searching after removing "stop words" which are words like "the" "and" etc. So remove the "stop words", then check if the search string is empty. Also make sure to add your other words to the stop words list.
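A minimal sketch of that idea, assuming a made-up helper name isTooGeneral and a shortened stop word list: the search is only refused when nothing is left once the stop words are removed.

```php
<?php
// Hypothetical helper: true when nothing is left after removing stop words.
function isTooGeneral(string $searchTerm, array $stopWords): bool
{
    $words = preg_split('/\s+/', strtolower(trim($searchTerm)), -1, PREG_SPLIT_NO_EMPTY);
    // array_diff() keeps only the words that are not in the stop word list.
    $remaining = array_diff($words, $stopWords);
    return count($remaining) === 0;
}

$stopWords = ['the', 'and', 'island', 'islands', 'reef', 'reefs'];
var_dump(isTooGeneral('Island', $stopWords));     // bool(true)
var_dump(isTooGeneral('Big Island', $stopWords)); // bool(false)
```

Note that an empty search string also counts as too general here.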
Try looking at in_array()
That way you can just say
if (in_array(strtolower(trim($searchTerm)), $array))
{
// stop searching
}
Maybe let the user search and then present a "Related searches" list à la Google...
Why don't you try with regular expressions - here is a simple demo. You can of course generate the character groups (big|small) etc. from predefined arrays
$regex = '/^(big|small)\s+(island|islet|reef)$/i';
if (1 === preg_match($regex, 'big reef')) {...}
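Generating the groups from arrays could look something like this. The two lists are only example data, and the i modifier is my addition so that "Big Reef" and "big reef" are treated the same.

```php
<?php
// Assumed example lists; in practice these would come from your own configuration.
$sizes = ['big', 'small'];
$features = ['island', 'islet', 'reef'];

// preg_quote() escapes any regex metacharacters in the words.
$quote = function (string $w): string {
    return preg_quote($w, '/');
};

$regex = '/^(' . implode('|', array_map($quote, $sizes)) . ')\s+('
       . implode('|', array_map($quote, $features)) . ')$/i';

var_dump(preg_match($regex, 'Big Reef')); // int(1)
var_dump(preg_match($regex, 'reef'));     // int(0)
```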
I found the following code which removes everything from a php var other than the letters and numbers, perfect!
$string = "remove ever^&thing but *&^*&%£ letters & numbers*&^*";
$cleansedstring = preg_replace('#\W#', '', $string);
echo $cleansedstring;
But I want to query a column in mysql using the same rule. The aim is to kinda remove all ampersands, apostrophes, hyphens etc from the equation.
Right now I use some ugly REPLACE statements!
AND Replace(Replace(Replace(Replace(Replace(MYColumn, '&', ''), '-', ''), ' ', ''), '(', ''), ')', '') =
I would like a bullet proof query going forward that only looks at numbers and letters!
:)
EDIT: I guess I should explain my problem further, perhaps one of you can suggest something else....
I generate links across my website by pulling out long company names and cleaning them up for a seo friendly URLs. So say we have "Bill & Ben's Flower Pot/Garden Service" my eventual URL to their page looks like /bill-bens-flower-pot-garden-service.htm
That URL is a rewritten URL so the actual page is like company.php?name=bill-bens-flower-pot-garden-service
I need to grab "name" and query the database to return their company details. But getting:
bill-bens-flower-pot-garden-service
to return the details of:
Bill & Ben's Flower Pot/Garden Service
Isn't always bullet proof it seems.
It just feels like my code is a little messy in how many replaces I'm doing all over the place. My theory was to just strip all non letters and numeric data out and compare the strings that way:
WHERE Company = 'billbensflowerpotgardenservice' - but I can't seem to do that on the field name itself?
Any suggestions?
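MySQL supports regular expressions through the REGEXP operator. Note that in 5.1 REGEXP only tests whether a value matches a pattern; it cannot remove or replace characters. Here is the documentation page for that: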
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
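Another option is to do the clean-up on the PHP side and compare two cleaned values. The sketch below assumes a made-up helper companyKey; the idea would be to store its result in an extra column (say company_key, also an assumption) whenever a company is saved, and then query on that column with the cleaned URL value.

```php
<?php
// Hypothetical helper: reduce a string to lowercase ASCII letters and digits only.
function companyKey(string $value): string
{
    return preg_replace('/[^a-z0-9]/', '', strtolower($value));
}

$fromDatabase = companyKey("Bill & Ben's Flower Pot/Garden Service");
$fromUrl = companyKey('bill-bens-flower-pot-garden-service');

var_dump($fromDatabase === $fromUrl); // bool(true)
```

With that in place the WHERE clause becomes a plain equality check and no nested REPLACE calls are needed. Company names with accented letters would need extra handling, since this version simply drops them.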
This is a homework assignment and my first experience using RegEx. I am starting to grasp the syntax and symbols used and can do some simple pattern matching/manipulation, but can't quite foresee how to achieve some of the goals of this assignment.
I have been given a text file that is formatted like this:
Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Betty Boop:245-836-8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500
Igor Chevsky:385-375-8395:3567 Populus Place, Caldwell, NJ 23875:6/18/68:23400
Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
There are about 50 lines of names and corresponding info, each entry is on a new line and each 'field' is separated by a colon. Mostly I need to find specific things from the file and print them on a webpage but I don't quite understand.
Here is one problem I solved:
$myFile = "datebook.txt";
$data = file($myFile);//I have used this to place all data in an array, but it may be necessary to place the data into a string?
//1) Print all lines containing the pattern Street (case insensitive).
$pattern = "/street/i";
$linesFound = preg_grep($pattern, $data);
echo "<pre>", print_r($linesFound, true), "</pre>";
Here are some I have not and specific questions regarding them:
2) Print the first and last names in which the first name starts with a letter ‘B’.
How do I only search for first names and not last names, city names, etc?
How do I print the full name and only the full name?
5) Print Lori Gortz’s name and address.
I understand how to find the pattern 'Lori Gortz' but how do I return her address as well?
11) Print lines that end in exactly five digits.
12) Print the file with the first and last names reversed.
14) Give everyone a $250.00 raise.
Don't know how to do any of these. I assume the last number for each entry is their salary.
Any help is appreciated. Please respond with an explanation of the code as well, thank you.
Check the RegEx quick reference, I think you'll figure out most of your tasks there. For example, Lori's address would be a string after the number after the second colon and before the second comma (in her line, of course).
The best way to do all the tasks would be to go over each line and make an array with all the elements. That way you could easily replace names, increase salaries, check if it ends with 5 digits, etc.
You can also try this online tester. Good luck.
Edit:
Little help for a start:
^[A-Za-z ]* this gets full names
^[A-Za-z]* this gets first names
etc...
Edit2:
See what this code does:
$line = "Betty Boop:245-836-8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500";
$regex = "/\s|:/";
$result = preg_split($regex, $line);
:)
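To make the "array per line" idea a bit more concrete, here is a rough sketch for tasks 11, 12 and 14. It assumes the last field is the salary (as the question guesses), that every name has exactly two parts, and it uses two sample lines instead of reading datebook.txt.

```php
<?php
// Sample lines in the format from the question; normally these come from file('datebook.txt').
$data = [
    "Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300",
    "Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700",
];

$updated = [];
foreach ($data as $line) {
    // Fields: name, phone, address, birthday, salary (the last one is an assumption).
    list($name, $phone, $address, $birthday, $salary) = explode(':', rtrim($line));

    // 12) Swap first and last name (assumes exactly two name parts).
    list($first, $last) = explode(' ', $name, 2);
    $name = $last . ' ' . $first;

    // 14) Add 250 to the salary.
    $salary = (int) $salary + 250;

    $updated[] = implode(':', [$name, $phone, $address, $birthday, $salary]);
}

// 11) Lines that end in exactly five digits: a non-digit, then five digits, then end of line.
$fiveDigits = preg_grep('/\D\d{5}\s*$/', $data);

print_r($updated);
print_r($fiveDigits);
```

explode(':') does the splitting here because the fields are plain colon-separated text. The regex is only needed for the "exactly five digits" check, where the \D in front stops a six-digit salary from matching.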
I don't want to do all of them, but here's some hints.. For question 2:
^[A-Z]* B.*$
^ anchors the match at the start of the line.
[A-Z]* means any number of uppercase characters from A to Z (use [A-Za-z]* to allow lowercase letters as well)
Next we match a space
Next we match a B
The .* means any number of other characters.
Lastly, we match with an end of line using $
This can definitely be improved and made more flexible, but I'll let you do that..
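For what it's worth, one possible way to tighten this up for question 2 is sketched below. It relies on the name being the first field, so anchoring at the start of the line only ever looks at the first name, and a capture group returns just the full name. The sample lines are taken from the question; the variable names are mine.

```php
<?php
$data = [
    "Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300",
    "Betty Boop:245-836-8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500",
];

$names = [];
foreach ($data as $line) {
    // ^B      the line (and so the first name) starts with B
    // [a-z]*  the rest of the first name
    // " "     the space between first and last name
    // [^:]+   the last name: everything up to the first colon
    if (preg_match('/^(B[a-z]* [^:]+):/', $line, $m)) {
        $names[] = $m[1]; // only the captured full name
    }
}

print_r($names);
```

"Steve Blenheim" is skipped even though the last name starts with B, because the B has to be the very first character of the line.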
I am using CakePHP 1.3 and I have a textarea where users submit articles. On submit, I want to look through the article for certain keywords and add the respective tags to the article.
I was thinking of preg_match, but the preg_match pattern has to be a string, so I would have to loop through a (big) array.
Is there an easier way to plug the keywords array into the pattern?
I appreciate all your help.
Thanks.
I suggest treating your array of keywords like a hash table. Lowercase the article text, explode by spaces, then loop through each word of the exploded array. If the word exists in your hash table, push it to a new array while keeping track of the number of times it's been seen.
I ran a quick benchmark comparing regex to hash tables in this scenario. To run it with regex 1000 times, it took 17 seconds. To run it with a hash table 1000 times, it took 0.4 seconds. It should be an O(n+m) process.
$keywords = array("computer", "dog", "sandwich");
$article = "This is a test using your computer when your dog is being a dog";
$arr = explode(" ", strtolower($article));
$tracker = array();
foreach ($arr as $word) {
    if (in_array($word, $keywords)) {
        if (isset($tracker[$word])) {
            $tracker[$word]++;
        } else {
            $tracker[$word] = 1;
        }
    }
}
The $tracker array would output: "computer" => 1, "dog" => 2. You can then do the process to decide what tags to use. Or if you don't care about the number of times the keyword appears, you can skip the tracker part and add the tags as the keywords appear.
EDIT: The keyword array may need to be an inverted index array to ensure the fastest lookup. I am not sure how in_array() works, but if it searches, then this isn't as fast as it should be. An inverted index array would look like
array("computer" => 1, "dog" => 1, "sandwich" => 1); // "1" can be any value
Then you would do isset($keywords[$word]) to check if the word matches a keyword, instead of in_array(), which should give you O(1). Someone else may be able to clarify this for me though.
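For reference, in_array() does walk through the list element by element, so the inverted index is worth having for a big keyword list. A small sketch of that version, using array_flip() to build the index from the existing $keywords array:

```php
<?php
$keywords = array("computer", "dog", "sandwich");
$article = "This is a test using your computer when your dog is being a dog";

// array_flip() turns the values into keys: array("computer" => 0, "dog" => 1, "sandwich" => 2).
$lookup = array_flip($keywords);

$tracker = array();
foreach (explode(" ", strtolower($article)) as $word) {
    // isset() on an array key is a hash lookup rather than a scan of the list.
    if (isset($lookup[$word])) {
        $tracker[$word] = isset($tracker[$word]) ? $tracker[$word] + 1 : 1;
    }
}

print_r($tracker);
```

isset() still returns true for the key whose value is 0, because it only treats null as "not set".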
If you don't need the power of regular expressions, you should just use strpos().
You will still need to loop through the array of words, but strpos is much, much faster than preg_match.
Of course, you could try matching all the keywords using one single regexp, like /word1|word2|word3/, but I'm not sure it is what you are looking for. And also I think it would be quite heavy and resource-consuming.
Instead, you can try a different approach, such as splitting the text into words and checking whether the words are interesting or not. I would make use of str_word_count(), using something like:
$text = 'this is my string containing some words, some of the words in this string are duplicated, some others are not.';
$words_freq = array_count_values(str_word_count($text, 1));
that splits the text into words and counts occurrences. Because the words are the array keys and the counts are the values, you can then check with isset($words_freq[$keyword]) or array_intersect(array_keys($words_freq), $my_keywords).
If, as I guess, you are not interested in the case of the keywords, you can strtolower() the whole text before splitting it into words.
Of course, the only way to determine which approach is best is to set up some tests, running the various search functions against some "representative" and fairly long text and measuring the execution time and resource usage (try microtime(TRUE) and memory_get_peak_usage() to benchmark this).
EDIT: I cleaned up the code a bit and added a missing semicolon :)
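A test setup along those lines might look like the sketch below. The function names tagsByRegex and tagsByWords, the repeated sample text and the loop count are all made up for illustration, and the numbers it prints will depend entirely on your machine and data.

```php
<?php
$keywords = ['computer', 'dog', 'sandwich'];
$text = str_repeat('This is a test using your computer when your dog is being a dog. ', 200);

// Approach A: one alternation regex over the whole text.
function tagsByRegex(string $text, array $keywords): array
{
    $quoted = array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keywords);
    preg_match_all('/\b(' . implode('|', $quoted) . ')\b/i', $text, $m);
    return array_values(array_unique(array_map('strtolower', $m[1])));
}

// Approach B: split into words, count them, and intersect with the keyword list.
function tagsByWords(string $text, array $keywords): array
{
    $freq = array_count_values(str_word_count(strtolower($text), 1));
    return array_values(array_intersect($keywords, array_keys($freq)));
}

foreach (['tagsByRegex', 'tagsByWords'] as $fn) {
    $start = microtime(true);
    for ($i = 0; $i < 100; $i++) {
        $tags = $fn($text, $keywords);
    }
    printf("%s: %.4f s, peak memory %d bytes\n", $fn, microtime(true) - $start, memory_get_peak_usage());
}
```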
If you want to look for multiple words from an array, then combine said array into a regular expression:
$regex_array = implode("|", array_map("preg_quote", $array));
preg_match_all("/($regex_array)/", $src, $tags);
This converts your array into /(word|word|word|word|word|...)/. The array_map and preg_quote part is optional, only needed if the $array might contain special characters.
Avoid strpos and loops for this case. preg_match is faster when searching for alternatives.
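Here is a small runnable version of the same thing. The sample data is made up; the closure passes "/" as the delimiter to preg_quote(), and the i modifier is an addition so that "C++" in the text matches "c++" in the list.

```php
<?php
$array = array("computer", "dog", "c++");
$src = "My dog chewed the computer manual and the C++ book";

// preg_quote() escapes characters such as "+" so they are matched literally.
$quote = function ($word) {
    return preg_quote($word, '/');
};
$regex_array = implode("|", array_map($quote, $array));

// The i modifier makes the match case-insensitive.
preg_match_all("/($regex_array)/i", $src, $tags);

print_r($tags[1]);
```

There are no word boundaries in this pattern, so a keyword like "dog" would also be found inside "dogma". Wrapping the group in \b...\b helps for keywords made of plain letters.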
strtr()
If given two arguments, the second should be an array in the form array('from' => 'to', ...). The return value is a string where all the occurrences of the array keys have been replaced by the corresponding values. The longest keys will be tried first. Once a substring has been replaced, its new value will not be searched again.
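A tiny example of that behaviour, with made-up input: because a replaced substring is not searched again, the two words simply swap places instead of being replaced twice.

```php
<?php
$trans = array("hi" => "hello", "hello" => "hi");

// Each part of the input is replaced at most once, so the two words are swapped.
$result = strtr("hi all, I said hello", $trans);

echo $result; // hello all, I said hi
```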
Add tags manually? Just like we add tags here at SO.
I'm looking for the best reliable way to return the first and last name of a person given the full name, so far the best I could think of is the following regular expression:
$name = preg_replace('~\b(\p{L}+)\b.+\b(\p{L}+)\b~i', '$1 $2', $name);
The expected output should be something like this:
William -> William // Regex Fails
William Henry -> William Henry
William Henry Gates -> William Gates
I also want it to support accents, for instance "João".
EDIT: I understand that some names will not be properly identified, but this isn't a problem for me, since this is going to be used on a local site where the last word is the last name (might not be the whole surname though) but this isn't a problem since all I want is a quick way to say "Dear FIRST_NAME LAST_NAME"... So all this discussion, while totally valid, is useless to me.
Can someone help me with this?
This might not be what you want to hear, but I don't think this problem is suited to a regular expression since names are not regular. I don't think they are even context-sensitive or context-free. If anything, they are unrestricted (I would have to sit down and think that through more than I did before I say that for sure, though) and no regular expression engine can parse an unrestricted grammar.
Instead of a regex you might find it easier to do something like:
$parts = explode(" ", $name);
$first = $parts[0];
$last = "";
if (count($parts) > 1) {
    $last = $parts[count($parts) - 1];
}
You might want to replace multiple consecutive bits of whitespace with a single space first, so you don't get empty bits, and get rid of trailing/leading whitespace:
$name = preg_replace('/\s+/', ' ', trim($name)); // ereg_replace() is deprecated and was removed in PHP 7
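Putting the two steps together could look like the sketch below. The function name firstAndLast is made up, and it uses preg_split() to do the whitespace clean-up and the splitting in one go.

```php
<?php
// Hypothetical helper: return "First Last", or just the single name when only one word is given.
function firstAndLast(string $name): string
{
    // Split on runs of whitespace; the u modifier treats the input as UTF-8,
    // and PREG_SPLIT_NO_EMPTY drops empty pieces.
    $parts = preg_split('/\s+/u', trim($name), -1, PREG_SPLIT_NO_EMPTY);
    if (count($parts) === 0) {
        return '';
    }
    $first = $parts[0];
    $last = count($parts) > 1 ? $parts[count($parts) - 1] : '';
    return trim($first . ' ' . $last);
}

echo firstAndLast('William'), "\n";                   // William
echo firstAndLast('  William   Henry  Gates '), "\n"; // William Gates
echo firstAndLast('João Silva Santos'), "\n";         // João Santos
```

Accented names are not a problem here because the string is only ever cut at whitespace.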
As is, you're requiring a last name -- which, of course, your first example doesn't have.
Use clustered grouping, (?:...), and 0-or-1 count, ?, for the middle and last names as a whole to allow them to be optional:
'~\b(\p{L}+)\b (?: .+\b(\p{L}+)\b )?~ix' # x for spacing
This should allow the first name to be captured whether middle/last names are given or not.
$name = preg_replace('~\b(\p{L}+)\b(?:.+\b(\p{L}+)\b)?~iu', '$1 $2', $name); // u: treat pattern and subject as UTF-8 so names like "João" match
Depending on how clean your data is, I think you are going to have a tough time finding a single regex that does what you want. What different formats do you expect the names to be in? I've had to write similar code and there can be a lot of variations:
- first last
- last, first
- first middle last
- last, first middle
And then you have things like suffixes (Junior, Senior, III, etc.) and prefixes (Mr., Mrs., etc.), as well as combined names (e.g. John and Mary Smith). As some others have already mentioned, you also have to deal with multi-part last names (e.g. Victor de la Hoya).
I found I had to deal with all of those possibilities before I could reliably pull out the first and last names.
If you're defining first and last name as the text before the first space and after the last space, then just split the string on spaces and grab the first and last elements of the array.
However, depending on the context/scope of what you're doing, you may need to re-evaluate things - not all names around the world will meet this pattern.
I think your best option is to simply treat everything after the first name as the surname i.e.
William Henry Gates
Forename: William
Surname: Henry Gates
It's the safest mechanism, as not everyone will enter their middle name anyway. You can't simply extract William, ignore Henry, and extract Gates, as for all you know Henry is part of the surname.
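In code, that split is just an explode() with a limit of 2. A minimal sketch, with variable names of my own choosing:

```php
<?php
// Split at the first space only: everything after it is treated as the surname.
$parts = explode(' ', trim('William Henry Gates'), 2);
$forename = $parts[0];
$surname = isset($parts[1]) ? $parts[1] : '';

echo "Forename: $forename\n"; // Forename: William
echo "Surname: $surname\n";   // Surname: Henry Gates
```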
Here is a simple non-regex way:
$name=explode(" ",$name);
$first_name=reset($name);
$last_name=end($name);
$result=$first_name.' '.$last_name;