I've created a regular expression in C# but now I'm struggling when trying to run it in PHP. I presumed they'd work the same but obviously not. Does anyone know what needs to be changed below to get it working?
The idea is to make sure that the string is in the format "Firstname Lastname (Company Name)" and then to extract the various parts of the string.
C# code:
string patternName = @"(\w+\s*)(\w+\s+)+";
string patternCompany = @"\((.+\s*)+\)";
string data = "Firstname Lastname (Company Name)";
Match name = Regex.Match(data, patternName);
Match company = Regex.Match(data, patternCompany);
Console.WriteLine(name.ToString());
Console.WriteLine(company.ToString());
Console.ReadLine();
PHP code (not working as expected):
$patternName = "/(\w+\s*)(\w+\s+)+/";
$patternCompany = "/\((.+\s*)+\)/";
$str = "Firstname Lastname (Company Name)";
preg_match($patternName, $str, $nameMatches);
preg_match($patternCompany, $str, $companyMatches);
print_r($nameMatches);
print_r($companyMatches);
Seems to work here. What you should realize is that when you capture matches with a regex, the array PHP produces contains both the full text matched by the pattern as a whole and each individual capture group.
For your name/company name, you'd need to use
$nameMatches[1] -> Firstname
$nameMatches[2] -> Lastname
and
$companyMatches[1] -> Company Name
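These are the values matched by the capture groups. The [0] element of each array is the full text matched by the whole pattern. (The name groups also include the trailing space, because \s sits inside the parentheses; wrap the values in trim() if that matters.)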
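Since the goal is also to check that the whole string has the expected shape, one option is a single anchored pattern with named groups. This is a minimal sketch, not the asker's code: parseContact() is a hypothetical helper, and it assumes one-word first and last names and a company name without a closing parenthesis.

```php
<?php
// Hypothetical helper: validates "Firstname Lastname (Company Name)" and
// returns its parts, or null when the string doesn't have that shape.
// Assumes one-word first/last names and no ")" inside the company name.
function parseContact(string $data): ?array
{
    $pattern = '/^(?<first>\w+)\s+(?<last>\w+)\s+\((?<company>[^)]+)\)$/';
    if (preg_match($pattern, $data, $m) !== 1) {
        return null;
    }
    // Named groups appear in $m under their names as well as their numbers.
    return [
        'first'   => $m['first'],
        'last'    => $m['last'],
        'company' => $m['company'],
    ];
}

print_r(parseContact("Firstname Lastname (Company Name)"));
var_dump(parseContact("Firstname (Company Name)")); // NULL: no last name
```

Because the pattern is anchored with ^ and $, a string that only partly fits the format is rejected instead of yielding a partial match.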
It could be because you're using double quotes. PHP interprets the escape sequences it recognizes inside double-quoted strings before the regex engine ever sees the pattern, so single quotes are the safer choice for regular expressions.
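One way to test the double-quote theory is to compare each double-quoted pattern from the question with a single-quoted copy. PHP only rewrites escape sequences it knows (such as \n, \t, \$ or octal codes like \1), so \w, \s and \( should pass through untouched. A small sketch of that check:

```php
<?php
// Each pair holds a pattern from the question in double quotes and the same
// text in single quotes. \w, \s and \( are not PHP string escapes, so the
// two versions should be byte-for-byte identical.
$pairs = [
    ["/(\w+\s*)(\w+\s+)+/", '/(\w+\s*)(\w+\s+)+/'],
    ["/\((.+\s*)+\)/", '/\((.+\s*)+\)/'],
];
foreach ($pairs as [$double, $single]) {
    var_dump($double === $single); // bool(true) for both pairs
}

// By contrast, a backreference such as \1 IS rewritten in double quotes
// (it becomes the control character chr(1)), so these two differ.
var_dump("/(a)\1/" === '/(a)\1/'); // bool(false)
```

So for these particular patterns the quoting is not the culprit, but it becomes one as soon as a pattern uses backreferences.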
Your patterns do appear to extract the information you want. Try replacing the two print_r() lines with:
print "Firstname: " . $nameMatches[1] . "\n";
print "Lastname: " . $nameMatches[2] . "\n";
print "Company Name: " . $companyMatches[1] . "\n";
Is there anything wrong with this output?
Firstname: Firstname
Lastname: Lastname
Company Name: Company Name
I have code where I am extracting a name from a database, trying to reorder the words, and then changing it from all uppercase to title case (first letter of each word capitalized). Everything I find online suggests my code should work, but it does not. Here is my code and the output:
$subjectnameraw = "SMITH, JOHN LEE";
$subjectlname = substr($subjectnameraw, 0, strpos($subjectnameraw, ",")); // Get the last name
$subjectfname = substr($subjectnameraw, strpos($subjectnameraw, ",") + 1) . " "; // Get first name and middle name
$subjectname = ucwords(strtolower($subjectfname . $subjectlname)); // Reorder the name and make it lower case / upper word case
However, the output looks like this:
John Lee smith
The last name is ALWAYS lowercase no matter what I do. How can I get the last name to be uppercase as well?
The code above gives wrong results when names contain multibyte characters, such as the É in RENÉ. The following solution uses the multibyte-aware function mb_convert_case().
$subjectnameraw = "SMITH, JOHN LEE RENÉ";
list($lastName, $firstName) = explode(', ', mb_convert_case($subjectnameraw, MB_CASE_TITLE, 'UTF-8'));
echo $firstName . " " . $lastName;
Demo : https://3v4l.org/ekTQA
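If some names might lack the comma, a slightly more defensive version of the same mb_convert_case() approach could look like this. reorderName() is a hypothetical helper; the sketch assumes UTF-8 input and needs the mbstring extension.

```php
<?php
// Hypothetical helper: "LAST, FIRST MIDDLE" -> "First Middle Last".
// Title-cases with mb_convert_case() so characters like É are handled,
// splits on the first comma only, and returns comma-less names title-cased
// but otherwise unchanged.
function reorderName(string $raw): string
{
    $titled = mb_convert_case(trim($raw), MB_CASE_TITLE, 'UTF-8');
    $parts = array_map('trim', explode(',', $titled, 2));
    if (count($parts) < 2) {
        return $parts[0]; // no comma: nothing to reorder
    }
    return $parts[1] . ' ' . $parts[0];
}

echo reorderName("SMITH, JOHN LEE RENÉ"), "\n";
echo reorderName("DOE"), "\n";
```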
How can I separate firstname and surname from a string like this:
Pietro DE GIOVANNI
(Pietro being the firstname and DE GIOVANNI the surname)
I used to do it with an explode() on the spaces, but obviously it doesn't work on a person like that.
Thanks in advance.
You can explode the name on spaces as before, then loop over the result one piece at a time. Check each piece with ctype_upper() to see whether it is entirely uppercase, and append it to the appropriate variable.
Put into a function, it might look like this:
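function split_name($fullname) {
    $firstname = "";
    $surname = "";
    $pieces = explode(" ", $fullname);
    foreach ($pieces as $name) {
        if (ctype_upper($name)) {
            $surname .= $name . " "; // all-caps piece: part of the surname
        } else {
            $firstname .= $name . " "; // anything else: part of the first name
        }
    }
    // trim() removes the trailing space left by the loop
    return array("firstname" => trim($firstname), "surname" => trim($surname));
}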
You can then use it like this:
$name = "Pietro DE GIOVANNI";
$split = split_name($name);
echo "Firstname: ".$split['firstname']."\nSurname: ".$split['surname'];
Note
This doesn't work for names such as James O'RILEY, John-Paul JOHNSON or John F. KENNEDY. The first two can be handled by stripping away any characters that aren't a-zA-Z before testing with ctype_upper(). The last one can't be resolved reliably: there isn't enough data to say whether a lone initial like F. belongs to the first name or the surname. You can assume it's always part of the first name (for instance), and/or check whether it comes after you've started collecting surnames (that is, whether a name in capital letters has already been found). You can take care of the first two cases by checking for
if (ctype_upper(preg_replace('/[^a-zA-Z]/', '', $name)))
instead of using the if statement in the original code block. This removes the apostrophe and every other character outside a-zA-Z before the test.
Here's a live demo where I've stripped away any characters besides a-zA-Z, which accounts for the first two issues.
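Folding both refinements from the note into the function gives something like the sketch below. split_name_strict() is a hypothetical name, and two of its rules are assumptions rather than facts about real names: a one-letter piece such as F. is treated as part of the first name, and every piece after the first all-caps piece is treated as surname.

```php
<?php
// Hypothetical variant of split_name() with the refinements from the note:
// - pieces are tested after removing everything except a-zA-Z, so O'RILEY
//   counts as an uppercase (surname) piece;
// - a one-letter piece such as "F." is assumed to belong to the first name;
// - once a surname piece has been seen, every later piece is surname too.
function split_name_strict(string $fullname): array
{
    $first = [];
    $surname = [];
    foreach (preg_split('/\s+/', trim($fullname)) as $piece) {
        $letters = preg_replace('/[^a-zA-Z]/', '', $piece);
        $isSurname = strlen($letters) > 1 && ctype_upper($letters);
        if ($surname !== [] || $isSurname) {
            $surname[] = $piece;
        } else {
            $first[] = $piece;
        }
    }
    return [
        'firstname' => implode(' ', $first),
        'surname'   => implode(' ', $surname),
    ];
}

print_r(split_name_strict("John F. KENNEDY"));
```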
|affffc100|Hitem:bb:101:1:1:1:1:48:-30:47:18:5:2:6:6:0:0:0:0:0:0:0:0|h[Subject Name]|h|r
My usual printed-out variable is ^
|cffffc700|Hitem:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x|h[SUBJECT_NAME]|h|r
My pattern is ^
All x's can be a-z, A-Z or 0-9.
In one column I have many variables like that (up to 8),
and all of them are mixed with other text, like this:
|affffc100|Hitem:bb:101:1:1:1:1:48:-30:47:18:5:2:6:6:0:0:0:0:0:0:0:0|h[Gold]|h|r NEW SOLD |affffc451|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Copper]|h|r maximum price 15k|affffx312|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Silver]|h|r
In one variable I want to clean out all these unnecessary patterns and leave only the subject name from the brackets [].
So:
|cffffc700|Hitem:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x|h[SUBJECT_NAME]|h|r
should leave only SUBJECT_NAME in my variable.
Just to remind you, I always have more than one of these patterns in every variable (up to 8).
I've searched everywhere but couldn't find any reasonable answers or good patterns. I tried to write it myself; I guess I need to collect all these patterns into an array, clean them and leave only the subject names, but I don't know exactly how to do it.
How do I convert this:
|affffc100|Hitem:bb:101:1:1:1:1:48:-30:47:18:5:2:6:6:0:0:0:0:0:0:0:0|h[Gold]|h|r NEW SOLD |affffc451|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Copper]|h|r maximum price 15k|affffx312|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Silver]|h|r
into this:
Gold NEW SOLD Copper maximum price 15k Silver
What should I use, preg_replace?
One more thing: when I have a string without my special pattern, I get an empty result from the function, e.g.:
$str = "15KKK sold, 20KK updated";
expected result:
"15KKK sold, 20KK updated" // same without any pattern
but ^ that one returns an EMPTY result.
Another string:
$str = "|affffc100|Hitem:bb:101:1:1:1:1:48:-30:47:18:5:2:6:6:0:0:0:0:0:0:0:0|h[Uranium]|h|r 155kk |affffc451|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Metal]|h|r is sold";
expected result:
"Uranium 155kk Metal is sold"
If I use that function with a non-pattern string it returns an empty result; that's my problem now.
Thank you very much
I'd do:
$str = '|affffc100|Hitem:bb:101:1:1:1:1:48:-30:47:18:5:2:6:6:0:0:0:0:0:0:0:0|h[Gold]|h|r NEW SOLD |affffc451|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Copper]|h|r maximum price 15k|affffx312|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Silver]|h|r';
$res = ''; // initialise the accumulator to avoid an undefined-variable warning
preg_match_all('/h(\[.+?\])\|h\|r([^|]*)/', $str, $m);
for($i=0; $i<count($m[0]); $i++) {
$res .= $m[1][$i] . ' ' . $m[2][$i] . ' ';
}
echo $res,"\n";
Output:
[Gold] NEW SOLD [Copper] maximum price 15k [Silver]
If you want to keep the strings that don't match, test the result of preg_match_all:
$res = '';
if (preg_match_all('/h(\[.+?\])\|h\|r([^|]*)/', $str, $m)) {
for($i=0; $i<count($m[0]); $i++) {
$res .= $m[1][$i] . ' ' . $m[2][$i] . ' ';
}
} else {
$res = $str;
}
echo $res,"\n";
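Since the question asks about preg_replace, here is a sketch of that route. Replacing each link with its name leaves text that contains no links untouched, so the empty-result problem doesn't arise, and text before the first link is kept too. The link shape (a 9-character code after the first |, then the Hitem fields, then |h[Name]|h|r) is inferred from the samples in the question, and stripItemLinks() is a hypothetical helper.

```php
<?php
// Hypothetical helper: replaces every |xxxxxxxxx|Hitem...|h[Name]|h|r link
// with " Name ", then collapses runs of whitespace so "15k|a...|h[Silver]|h|r"
// becomes "15k Silver". Note this also squeezes any original double spaces
// or newlines; drop the second step if exact spacing matters.
function stripItemLinks(string $str): string
{
    $link = '/\|\w{9}\|Hitem[^|]*\|h\[([^\]]*)\]\|h\|r/';
    $named = preg_replace($link, ' $1 ', $str);
    return trim(preg_replace('/\s+/', ' ', $named));
}

echo stripItemLinks("15KKK sold, 20KK updated"), "\n"; // unchanged
```

Because [^\]]* is used for the name, item names containing spaces (like SUBJECT NAME) are kept as well.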
Try this regex:
\|\w{9}\|Hitem(?::-?\w+)+\|h\[(?<SUBJECTNAME>\w+)\]\|h\|r
It will match each item link and capture the item's name in the named group SUBJECTNAME.
see the demo here
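From PHP, the named group can be read straight out of preg_match_all()'s result. A minimal sketch with itemNames() as a hypothetical helper; note that \w+ in the name group won't match names containing spaces, so it would need widening (for example to [^\]]+) for those.

```php
<?php
// Hypothetical helper: returns the item names captured by SUBJECTNAME.
function itemNames(string $str): array
{
    $pattern = '/\|\w{9}\|Hitem(?::-?\w+)+\|h\[(?<SUBJECTNAME>\w+)\]\|h\|r/';
    preg_match_all($pattern, $str, $m);
    // With the default PREG_PATTERN_ORDER, $m['SUBJECTNAME'] holds one entry
    // per link found; ?? [] guarantees an empty array when there are none.
    return $m['SUBJECTNAME'] ?? [];
}

print_r(itemNames('no links here'));
```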
Here's the deal: I'm processing an OCR text document and grabbing UPC information from it with RegEx. That part I've figured out. Then I query a database, and if I don't have a record of that UPC I need to go back to the text document and get the description of the product.
The format on the receipt is:
NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456
So, when I go back the second time to find the name of the item I am at a complete loss. I know how to get to the line where the UPC is, but how can I use something like regex to get the name that precedes the UPC? Or some other method. I was thinking of somehow storing the entire line and then parsing it with PHP, but not sure how to get the line either.
Using PHP.
Get all of the names of the items indexed by their UPCs with a regex and preg_match_all():
$str = 'NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456';
preg_match_all( '/^(.*?)\s+(\d+)/m', $str, $matches);
$items = array();
foreach ($matches[2] as $k => $upc) {
    // Map each UPC to the text that precedes it on the same line
    $items[$upc] = $matches[1][$k];
}
This forms $items so it looks like:
Array (
[123456789012] => NAME OF ITEM
[987654321098] => OTHER NAME
[567890123456] => NAME
)
Now you can look up any item name by its UPC in O(1) time:
echo $items['987654321098']; // OTHER NAME
You can find the string preceding a value you know with the following regex:
$receipt = "NAME OF ITEM 123456789012\n" .
"OTHER NAME 987654321098\n" .
"NAME 567890123456";
$upc = '987654321098';
if (preg_match("/^(.*?) *{$upc}/m", $receipt, $matches)) {
$name = $matches[1];
var_dump($name);
}
The /m flag makes ^ match at the start of every line, not just at the start of the whole string.
The ? in (.*?) makes that part non-greedy, so it doesn't grab the spaces before the UPC.
It would be simpler if you grabbed both the name and the number at the same time during the initial pass. Then, when you check the database to see if the number is present, you already have the name if you need to use it. Consider:
// order_number_in_database() and save_new_order() stand in for your own database code
preg_match_all('/^([A-Za-z ]+) (\d+)$/m', $document, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    $name = $match[1];
    $number = $match[2];
    if (!order_number_in_database($number)) {
        save_new_order($number, $name);
    }
}
You can use a lookahead assertion to match the string preceding the UPC without including the UPC itself in the match.
http://php.net/manual/en/regexp.reference.assertions.php
Something like this, with the m modifier so that ^ matches at the start of each line: ^.*?(?=\s*123456789012), substituting the UPC of the item you want to find.
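A PHP sketch of the lookahead approach. nameBeforeUpc() is a hypothetical helper; it assumes one item per line with \n line endings, uses preg_quote() so the UPC is treated literally, and requires the UPC to be a whole space-separated token at the end of the line, so that a shorter number can't match the tail of a longer one.

```php
<?php
// Hypothetical helper: returns the text before $upc on the line where it
// appears, or null if no line ends with that UPC.
function nameBeforeUpc(string $receipt, string $upc): ?string
{
    // ^ and $ work per line because of the m modifier; the lookahead checks
    // for spaces, the UPC and end of line without consuming them.
    $pattern = '/^.*?(?=[ \t]+' . preg_quote($upc, '/') . '[ \t]*$)/m';
    return preg_match($pattern, $receipt, $m) === 1 ? $m[0] : null;
}

$receipt = "NAME OF ITEM 123456789012\n" .
           "OTHER NAME 987654321098\n" .
           "NAME 567890123456";
var_dump(nameBeforeUpc($receipt, '987654321098'));
```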
I'm lazy, so I would just use one regex that gets both parts in one shot using matching groups. Then, I would call it every time and put each capture group into name and upc variables. For cases in which you need the name, just reference it.
Use this type of regex:
/([a-zA-Z ]+)\s*(\d*)/
Then you will have the name in the $1 matching group and the UPC in the $2 matching group. Sorry, it's been a while since I've used PHP, so I can't give you an exact code snippet.
Note: the suggested regex assumes your "names" contain only letters and spaces; if that's not the case, you'll have to expand the character class.
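A PHP version of this one-regex, two-group approach might look like the sketch below. parseReceipt() is a hypothetical helper, and the pattern is tightened from the one above: it is anchored per line (m modifier), the name group is lazy so it doesn't capture the trailing space, and \d+ requires at least one digit. It assumes \n line endings.

```php
<?php
// Hypothetical helper: returns one ['name' => ..., 'upc' => ...] row per
// receipt line of the form "NAME WORDS 123456789012".
function parseReceipt(string $text): array
{
    preg_match_all('/^([a-zA-Z ]+?)\s*(\d+)$/m', $text, $m, PREG_SET_ORDER);
    $rows = [];
    foreach ($m as $match) {
        // $match[1] is the $1 group (name), $match[2] the $2 group (UPC)
        $rows[] = ['name' => $match[1], 'upc' => $match[2]];
    }
    return $rows;
}

print_r(parseReceipt("NAME OF ITEM 123456789012\nOTHER NAME 987654321098"));
```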
I am having a problem with regular expressions at the moment.
What I'm trying to do is that for each line through the iteration, it checks for this type of pattern: Lastname, Firstname
If it finds the name, then it will take the first letter of the first name, and the first six letters of the lastname and form it as an email.
I have the following:
$checklast = "[A-z],";
$checkfirst = "[A-z]";
if (ereg($checklast, $parts[1])||ereg($checkfirst, $parts[2])){
$first = preg_replace($checkfirst, $checkfirst{1,1}, $parts[2]);
print "<a href='mailto:$first.$last@email.com;'> $parts[$i] </a>";
}
This one obviously broke the code. I was initially trying to take only the first letter of the first name, then the first six letters of the last name, followed by @email.com. This didn't work out too well, and I'm not sure what to do at this point.
Any help is much appreciated.
How about something like this:
$name = 'Smith, John';
$email = preg_replace('/([a-z]{1,6})[a-z]*?,[\\s]([a-z])[a-z]*/i',
'\\2.\\1@email.com', $name);
echo $email; // J.Smith@email.com
Cheers
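Since ereg() was removed in PHP 7.0, here is a preg_match() sketch of the whole task for one line, including the mailto link from the question. nameToEmail() is a hypothetical helper; lowercasing the address is an assumption, and email.com is the placeholder domain from the question.

```php
<?php
// Hypothetical helper: "Smith, John" -> "j.smith@email.com".
// Uses the first letter of the first name and at most six letters of the
// last name; lowercasing is an assumption, email.com is a placeholder.
function nameToEmail(string $line): ?string
{
    if (!preg_match('/^\s*([a-z]+)\s*,\s*([a-z]+)/i', $line, $m)) {
        return null; // not in "Lastname, Firstname" form
    }
    $last = substr($m[1], 0, 6);
    $initial = $m[2][0];
    return strtolower($initial . '.' . $last) . '@email.com';
}

foreach (['Smith, John', 'Johnson, Mary', 'not a name'] as $line) {
    $email = nameToEmail($line);
    if ($email !== null) {
        $safe = htmlspecialchars($line);
        echo "<a href='mailto:$email'>$safe</a>\n";
    }
}
```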