Problem with stristr text matching in PHP - php

I'm running a fairly simple script which tries to match strings from a csv file with potential matches in a mysql table (collation: utf8_general_ci). For each row in the csv file, I pull out the string (haystack) I want, which looks something like this:
"Full Cmte. Member City of Rutland Rutland VT"
For each string, I pull the list of candidate matches from my db and cycle through them until stristr finds a match. (I'm using stristr instead of regex because it's simpler and (I think?) quicker.) Some of the matching strings don't make grammatical or syntactic sense because they're constructed as aliases particular to this data set. One of them is "City of Rutland Rutland VT" (an alias for "City of Rutland (VT)"), which should, but doesn't, match the string above. More than 90% of these matches work without any problems. However, certain text matching doesn't seem to work.
Here is a list of those that fail to produce a match:
Haystack => Needle
"Full Cmte. Member City of Ocala Ocala FL" => "City of Ocala Ocala FL"
"Full Board Member Water and Sanitation District Anthony NM" => "Water and Sanitation District Anthony"
"Energy Clean Air & Climate Change Subcmte Member Consol Inc." => "Consol Inc."
"Full Council Member; Sr. VP Integrated Services Burke Inc. Cincinnati OH" => "Burke Inc."
"City of San Antonio TX" => "City of San Antonio TX"
"Full Cmte member United National Indian Tribal Youth Inc. (UNITY)" => "United National Indian Tribal Youth Inc."
"ECA&CC Sub. Member Cyprus Amax Minerals Inc." => "Cyprus Amax Minerals Inc."
"Silcon Valley Manufacturing Group" => "Silcon Valley Manufacturing Group"
"President Global Environmental Resources Inc. Washington DC" => "Global Environmental Resources Inc."
"Lancaster Laboratories Inc." => "Lancaster Laboratories Inc."
I'm not sure what to make of this, unless it's something very basic that I've totally missed. It seems that most of the failures have "Inc." in the needle, but I'm not sure that's what's causing it.
Here's the code (though the answer below fit the bill):
$patterns = array();
$patterns[0] = '/\s+/';
$patterns[1] = '/&/';
$replacement = array();
$replacement[0] = ' ';
$replacement[1] = 'and';
$name = trim(preg_replace($patterns,$replacement,$name));
if(stristr($name,trim(preg_replace($patterns,$replacement,$org->org_name)))) {
// code here
}
It's not terribly graceful right now and I would appreciate any additional insight as to how to normalize strings for matching.

My guess is that you view this through a browser, as html, so that (multiple) whitespace all condenses to a single space. This way it looks like it should match, but it doesn't.
A convenient way to prevent this, with few side effects, is to preprocess both the needle and the haystack:
$needle = trim(preg_replace('/\s+/',' ',$needle));
$haystack = trim(preg_replace('/\s+/',' ',$haystack));
The trim() is to solve issues caused by leading or trailing whitespace.
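If the strings still refuse to match after this, the cause may be an invisible character such as a non-breaking space, which `\s` does not necessarily cover. A minimal sketch, assuming UTF-8 input; the helper name `normalize_for_match` is hypothetical and not part of the original code:

```php
<?php
// Hypothetical helper: collapse any run of whitespace (including the
// UTF-8 non-breaking space U+00A0) into a single ASCII space.
function normalize_for_match(string $s): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $collapsed = preg_replace('/[\s\x{00A0}]+/u', ' ', $s);
    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    return trim($collapsed ?? $s);
}

$haystack = "Full Cmte. Member  City of Ocala\tOcala FL";
$needle   = "City of Ocala Ocala FL ";

// stripos() returns an offset or false, so compare strictly against false.
var_dump(stripos(normalize_for_match($haystack), normalize_for_match($needle)) !== false);

// bin2hex() shows the raw bytes when two strings look identical on screen.
echo bin2hex("a b"), PHP_EOL;
```

Comparing the `bin2hex()` output of a failing needle and haystack side by side is usually the quickest way to see which byte differs.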

Related

How do you uppercase only certain parts of a single string that's in format "Town comma Initials"?

I have a single location field where people can enter whatever they want, but generally they will enter something in the format of "Town, Initials". So for example, these entries...
New york, Ny
columbia, sc
charleston
washington, DC
BISMARCK, ND
would ideally become...
New York, NY
Columbia, SC
Charleston
Washington, DC
Bismarck, ND
Obviously I can use ucfirst() on the string to handle the first character, but these are things I'm not sure how to do (if they can be done at all)...
Capitalizing everything after a comma
Lowercasing everything before the comma (aside from the first character)
Is this easily doable or do I need to use some sort of regex function?
You could simply chop it up and fix it.
<?php
$geo = 'New york, Ny
columbia, sc
charleston
washington, DC
BISMARCK, ND';
$geo = explode(PHP_EOL, $geo);
foreach ($geo as $str) {
// chop
$str = explode(',', $str);
// fix
echo
(!empty($str[0]) ? ucwords(strtolower(trim($str[0]))) : null).
(!empty($str[1]) ? ', '.strtoupper(trim($str[1])) : null).PHP_EOL;
}
https://3v4l.org/ojl2M
That said, you should not trust the user to enter the correct format. Instead, find a complete list of states and autocomplete from it. Perhaps something like https://gist.github.com/maxrice/2776900 - then validate the input against it.
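The chop-and-fix idea and the validation step can be combined in one function. This is only a sketch: `format_location` and the four-entry `KNOWN_STATES` table are hypothetical, and a real table would list every state.

```php
<?php
// Hypothetical, deliberately tiny lookup table; a real one would list every state.
const KNOWN_STATES = ['NY', 'SC', 'DC', 'ND'];

// Hypothetical helper: "new york, ny" -> "New York, NY".
function format_location(string $raw): string
{
    // Split on the first comma only, so any extra commas stay in the second part.
    $parts = explode(',', $raw, 2);
    $town  = ucwords(strtolower(trim($parts[0])));
    if (!isset($parts[1])) {
        return $town;
    }
    $state = strtoupper(trim($parts[1]));
    // Only append the state when it is one we recognise.
    return in_array($state, KNOWN_STATES, true) ? "$town, $state" : $town;
}

foreach (['New york, Ny', 'columbia, sc', 'charleston', 'BISMARCK, ND'] as $entry) {
    echo format_location($entry), PHP_EOL;
}
```

Dropping an unrecognised state silently is just one possible choice; rejecting the input or asking the user to correct it may suit a form better.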

How to put a comma before and after a word in PHP?

Let's say this is my address: 537 Great North Road Grey Lynn Auckland City Auckland.
I want to put a comma (,) after Great North Road, Grey Lynn and Auckland City.
Then the address will be: 537 Great North Road, Grey Lynn, Auckland City, Auckland
How can I do this in PHP when the length is not fixed?
This is not a perfect solution but you can get an idea how you deal with it.
By using PHP :
$t = "537 Great North Road Grey Lynn Auckland City Auckland";
$t = str_replace(
["Road", "Lynn", "City"], // needle
["Road,", "Lynn,", "City,"], // replace
$t
);
echo $t;
I would suggest you look at Regular Expressions (RegEx) to achieve this.
In that way you could loop through each address and use the regex pattern to replace where a comma is required.
However, I believe due to the format of the data it might be very hard to actually achieve this. The only thing you have to detect where a comma needs to go is a space, and that isn't reliable as you can have spaces between road names etc where you don't want commas to be placed!
If you can I would suggest splitting the data up, so rather than having the address in one string you have it split in separate columns / variables, for "house number", "street", "town" etc.. That way you could then use a simple string concatenation to place the commas where they should go.
E.g.:
$houseNumber . " " . $street . ", " . $town . ",";
I hope that helps!
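Once the parts are stored separately, joining them is a one-liner. A small sketch, assuming the parts arrive as an array; the keys and values here are hypothetical:

```php
<?php
// Hypothetical address parts; in practice they would come from separate columns.
$parts = [
    'street' => '537 Great North Road',
    'suburb' => 'Grey Lynn',
    'city'   => 'Auckland City',
    'region' => 'Auckland',
];

// array_filter() without a callback drops empty values, so a missing
// part does not leave a dangling comma.
echo implode(', ', array_filter($parts)), PHP_EOL;
```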
Try this: you can put a comma before and after each variable.
<?php
$GreyLynn = "Grey Lynn";
$AucklandCity = "Auckland City";
echo ' , '.$GreyLynn.' , '.$AucklandCity;
?>
$seperate = "537 Great North Road Grey Lynn Auckland City Auckland";
$replace = str_replace ("Grey Lynn", ",Grey Lynn, ",$seperate);
$location = str_replace("Auckland City", "Auckland City, ",$replace);
Result:
537 Great North Road ,Grey Lynn, Auckland City, Auckland

How do I reformat imported 'addresses' using an Array?

I need a strategy (and help) to accomplish the following;
I import addresses into the DB in this format:
[ 111 SW 22ND RD, 11111 NE 224TH ST ]
What I need is this:
[111 SW 22nd Road, 11111 NE 224th Street]
So my objective here is twofold:
to lowercase the suffix in the street number [22nd / 224th]
to replace the abbreviated street-type with the full word (first letter capitalized), ie. ST -> Street / RD -> Road
I thought the best way to solve this is to;
Lowercase everything first => 1111 sw 22nd st
Then target the 'direction' (sw) back to capitalized, and
Finally use an Array within an Array to identify and replace specific text. Ie.
Way = [way, WAY, wy, WY]
Street = [street, STREET, st, ST]
Road = [road, ROAD, rd, RD]
Is this the best approach?
If so, how do I approach (#2) targeting and capitalizing the 'direction' (SW, NE, etc), and (#3) what is the array that can identify and replace the abbreviated street-type?
This took care of it all. Worked great!
<?php
$data= $row['ADDRESS'];
$find= array(
'Way'=> array('WY','WAY','Wy'),
'Court'=> array('CT') ,
'Street'=> array(' ST'),
'th'=> array('TH'),
'rd'=> array('RD'),
'nd'=> array('ND'),
'1st'=> array('1ST')
);
foreach($find as $key => $value){
foreach($value as $r){
$data= str_replace($r,$key,$data);
}
}
echo $data;
?>
Noy Hadar
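Plain `str_replace()` cannot tell the "RD" in "22ND RD" from a street type, or the "TH" in a word like "NORTH" from an ordinal suffix. A regex-based sketch that anchors each replacement may be more robust. The helper name `format_address` and its short suffix map are assumptions, and it only expands a street type when it is the last word:

```php
<?php
// Hypothetical helper; the suffix map is an assumption and far from complete.
function format_address(string $address): string
{
    $types = ['ST' => 'Street', 'RD' => 'Road', 'WY' => 'Way', 'WAY' => 'Way', 'CT' => 'Court'];

    // Lowercase ordinal suffixes only when they follow digits: 22ND -> 22nd.
    $address = preg_replace_callback(
        '/\b(\d+)(ST|ND|RD|TH)\b/i',
        function (array $m): string {
            return $m[1] . strtolower($m[2]);
        },
        trim($address)
    );

    // Expand the street type only when it is the last word of the address.
    return preg_replace_callback(
        '/\b([A-Za-z]+)$/',
        function (array $m) use ($types): string {
            $key = strtoupper($m[1]);
            return $types[$key] ?? $m[1];
        },
        $address
    );
}

echo format_address('111 SW 22ND RD'), PHP_EOL;    // 111 SW 22nd Road
echo format_address('11111 NE 224TH ST'), PHP_EOL; // 11111 NE 224th Street
```

Because nothing is lowercased wholesale, directions such as SW and NE keep their capitals without a separate step.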

Proper Case'ing without breaking things like IBM, NASA etc

Does anyone have a PHP solution to this?
The goal is to have a function that takes these
HELLO WORLD
hello world
Hello IBM
and return these
Hello World
Hello World
Hello IBM
respectively.
Mr MacDonald from Scotland prefers his name capitalized that way, while Mr Macdonald from Ireland prefers it thus. It is kinda hard to know which is 'correct' without knowing in advance which gentleman you are referring to, which takes more context than just the words in the file.
Also, the BBC (or is that the Bbc?) has taken to spelling some names like Nasa and Nato. It jars on me; I dislike it intensely. But that's what they do these days. When does an acronym (or 'initialism' as some prefer to call it) become a word in its own right?
Though this is a bit of a hack, you could store a list of acronyms that you want to keep uppercase and then compare the words within the string against the list of $exceptions.
While Jonathan is correct, if it's names you're working with and not acronyms then this solution is useless. But obviously, if Mr MacDonald from Scotland is already in the correct case then it won't change.
See it in action
<?php
$exceptions = array("to", "a", "the", "of", "by", "and","on","those","with",
"NASA","FBI","BBC","IBM","TV");
$string = "While McBeth and Mr MacDonald from Scotland
was using her IBM computer to watch a ripped tv show from the BBC,
she was being watched by the FBI, Those little rascals were
using a NASA satellite to spy on her.";
echo titleCase($string, $exceptions);
/*
While McBeth and Mr MacDonald from Scotland
was using her IBM computer to watch a ripped TV show from the BBC,
she was being watched by the FBI, Those little rascals were
using a NASA satellite to spy on her.
*/
/*Your case example
Hello World Hello World Hello IBM, BBC and NASA.
*/
echo titleCase('HELLO WORLD hello world Hello IBM, BBC and NASA.', $exceptions,true);
function titleCase($string, $exceptions = array(), $ucfirst=false) {
$words = explode(' ', $string);
$newwords = array();
$i=0;
foreach ($words as $word){
// trim white space or newlines from string
$word=trim($word);
// trim trailing comma or period, if any
if (in_array(strtoupper(trim($word,',.')), $exceptions)){
// check exceptions list for any words that should be in upper case
$word = strtoupper($word);
} else{
// convert to uppercase if $ucfirst = true
if($ucfirst==true){
// check exceptions list for should not be upper case
if(!in_array(trim($word,','), $exceptions)){
$word = strtolower($word);
$word = ucfirst($word);
}
}
}
// upper case the first word in the string
if($i==0){$word = ucfirst($word);}
array_push($newwords, $word);
$i++;
}
$string = join(' ', $newwords);
return $string;
}
?>

How can I match a string between two other known strings and nothing else with REGEX?

I want to extract a string between two other strings. The strings happen to be within HTML tags but I would like to avoid a conversation about whether I should be parsing HTML with regex (I know I shouldn't and have solved the problem with stristr(), but would like to know how to do it with regular expressions).
A string might look like this:
...uld select “Apply” below.<br/><br/><b>Primary Location</b>: United States-Washington-Seattle<br/><b>Travel</b>: Yes, 75 % of the Time <br/><b>Job Type</b>: Standard<br/><b>Region</b>: US Service Lines: ASL - Business Intelligence<br/><b>Job</b>: Business Intelligence<br/><b>Capability Group</b>: Con/Sol - BI&C<br/><br/>LOC:USA
I am interested in <b>Primary Location</b>: United States-Washington-Seattle<br/> and want to extract 'United States-Washington-Seattle'
I tried '(?<=<b>Primary Location</b>:)(.*?)(?=<br/>)' which worked in RegExr but not PHP:
preg_match("/(?<=<b>Primary Location</b>:)(.*?)(?=<br/>)/", $description,$matches);
You used / as the regex delimiter, so you need to either escape it wherever you want to match it literally or use a different delimiter. Change
preg_match("/(?<=<b>Primary Location</b>:)(.*?)(?=<br/>)/", $description,$matches);
to
preg_match("/(?<=<b>Primary Location<\/b>:)(.*?)(?=<br\/>)/", $description,$matches);
or this
preg_match("~(?<=<b>Primary Location</b>:)(.*?)(?=<br/>)~", $description,$matches);
Update
I just tested it on www.writecodeonline.com/php and
$description = "uld select “Apply” below.<br/><br/><b>Primary Location</b>: United States-Washington-Seattle<br/><b>Travel</b>: Yes, 75 % of the Time <br/><b>Job Type</b>: Standard<br/><b>Region</b>: US Service Lines: ASL - Business Intelligence<br/><b>Job</b>: Business Intelligence<br/><b>Capability Group</b>: Con/Sol - BI&C<br/><br/>LOC:USA";
preg_match("~(?<=<b>Primary Location</b>:)(.*?)(?=<br/>)~", $description, $matches);
print_r($matches);
is working. Output:
Array ( [0] => United States-Washington-Seattle [1] => United States-Washington-Seattle )
You can also get rid of the capturing group and do
$description = "uld select “Apply” below.<br/><br/><b>Primary Location</b>: United States-Washington-Seattle<br/><b>Travel</b>: Yes, 75 % of the Time <br/><b>Job Type</b>: Standard<br/><b>Region</b>: US Service Lines: ASL - Business Intelligence<br/><b>Job</b>: Business Intelligence<br/><b>Capability Group</b>: Con/Sol - BI&C<br/><br/>LOC:USA";
preg_match("~(?<=<b>Primary Location</b>:).*?(?=<br/>)~", $description, $matches);
print($matches[0]);
Output
United States-Washington-Seattle
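For the general case of two arbitrary marker strings, `preg_quote()` can do the escaping so the delimiter problem does not come up again. A minimal sketch; the helper name `extract_between` is hypothetical, and it uses a capturing group rather than lookarounds:

```php
<?php
// Hypothetical helper: return the text between $start and $end, or null.
function extract_between(string $start, string $end, string $subject): ?string
{
    // preg_quote() escapes regex metacharacters and, via its second
    // argument, the chosen delimiter '~' as well.
    $pattern = '~' . preg_quote($start, '~') . '(.*?)' . preg_quote($end, '~') . '~s';

    return preg_match($pattern, $subject, $m) === 1 ? trim($m[1]) : null;
}

$description = '<b>Primary Location</b>: United States-Washington-Seattle<br/><b>Travel</b>: Yes';
echo extract_between('<b>Primary Location</b>:', '<br/>', $description), PHP_EOL;
```

The `trim()` removes the space that follows the colon, and the `s` modifier lets `.` cross line breaks in case the value spans more than one line.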
