Making one line Regular Expression - php

I'm making cover letters for mailings of books and magazines. All recipient data is in a database, and a PHP script fetches that data and generates the cover letters. The user who writes a cover letter uses special tags to mark where the name and other fields should go.
For example, to compose a cover letter the user writes:
Dear [[last_name]],
please find attached book...
Then the PHP script parses it and replaces the [[last_name]] tag with a real name. When 1000 addresses are selected for mailing, the script produces 1000 cover letters, each with a different name.
Now, in Russian, the word "Dear" has different endings for male and female recipients. It is like saying "Dear Mr." or "Dear Mrs." in English.
To mark that in the cover letter, the user writes the possible endings for the word:
Dear[[oy/aya]] [[last_name]]
or it could be something like:
Dear[[ie/oe]]... etc.
I'm trying to figure out the regular expression and replacement call for my PHP script to parse these kinds of lines.
For the last_name tags I use:
$text = ...; // the text of the cover letter with all the tags
$res = $mysqli->query("SELECT * FROM `addresses` WHERE `flag` = 1");
while ($row = $res->fetch_assoc()) {
    // start from the untouched template each time; otherwise the tag is gone after the first row
    $letter = str_replace('[[last_name]]', $row['lname'], $text);
    echo $letter;
}
For the word endings as I understand it should be something like:
$text = preg_replace('/\w{2-3}\//\w{2-3}/', ($row['gender']==1)
? 'regexp(first_half)'
: 'regext(second_half)', $text);
I could do all of this by looping over the tags, parsing each one and replacing it, but that would take 5-10 lines of code. I'm sure it can be done with a single call like the one above, but I can't figure out how.

See http://www.phpliveregex.com/p/2BC, then replace with $1 for male and $2 for female:
...
$text = preg_replace('~\[\[(.+?)/(.+?)\]\]~', $row['gender'] == 1 ? '$1' : '$2', $text);

$gender = ($row['gender'] == 1) ? 1 : 2;
$text = preg_replace_callback('#\[\[([^\]/]+)(?:/([^\]/]+))?\]\]#',
    function ($match) use ($row, $gender) {
        // $match contains the current match info
        // $match[1] contains a field name, or the first part of a he/she pair
        // $match[2] is non-empty only for he/she pairs
        return empty($match[2]) ? $row[$match[1]] : $match[$gender];
    },
    $text // the subject string was missing from the original call
);
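To see the callback approach end to end, here is a minimal self-contained sketch. The function name fill_letter, the sample rows and the row keys (last_name, gender) are assumptions made for illustration; in the question the column is called lname, so the keys would need to match whatever the query actually returns. It also assumes gender 1 means male, as in the answers above.

```php
<?php
// Hypothetical helper: fills [[field]] tags from $row and picks one side
// of a [[male/female]] ending pair. Unknown field tags are left untouched.
function fill_letter(string $template, array $row): string
{
    // Assumption: gender 1 selects the first ending, anything else the second.
    $index = ((int) $row['gender'] === 1) ? 1 : 2;

    return preg_replace_callback(
        '~\[\[([^\]/]+)(?:/([^\]/]+))?\]\]~u',
        function (array $m) use ($row, $index): string {
            if (isset($m[2]) && $m[2] !== '') {
                // Ending pair such as [[oy/aya]]
                return $m[$index];
            }
            // Plain field tag such as [[last_name]]
            return isset($row[$m[1]]) ? (string) $row[$m[1]] : $m[0];
        },
        $template
    );
}

$template = 'Dear[[oy/aya]] [[last_name]],';
$rows = [
    ['last_name' => 'Ivanov', 'gender' => 1],
    ['last_name' => 'Petrova', 'gender' => 2],
];

foreach ($rows as $row) {
    echo fill_letter($template, $row), "\n";
}
// prints:
// Dearoy Ivanov,
// Dearaya Petrova,
```

Because the template is passed in unchanged for every row, the tags are still present on each iteration of the loop.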

Related

Compare two sentences sequence wise

The scenario: there is a given sentence. The user has to type that sentence into an input field, and it is saved to the database.
Now I want to compare both sentences and find out how many words the user typed correctly and incorrectly (missed a word, or added new words by mistake).
So far I have done something like this:
$Str1 = "i am suraj roy i am having my lunch";
$Str2 = "i suraj roy i am having my dinner";
$st1 = (explode(" ", $Str1));
$st2 = (explode(" ", $Str2));
$result1 = array_diff($st1, $st2);
$result2 = array_diff($st2, $st1);
$word_count1 = count($result1); // words not present in typed sentence from original sentence - lunch
$word_count2 = count($result2); // newly added words in typed sentence - dinner
$total_error_count = $word_count1 + $word_count2;
echo 'Total words not matching between two sentences - '.$total_error_count;
This works as I expected. Good.
But the problem is the word "am" in both sentences. Because the comparison is done between arrays, the single occurrence of "am" in the typed sentence is matched against every occurrence of "am" in the original sentence, so the missing "am" is never counted.
Any help would be appreciated. Thanks.
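One possible way around the repeated-word problem is to compare word counts instead of word sets. The sketch below is only an illustration: the function name count_word_errors is made up, and it deliberately ignores word order, so it tells you how many occurrences are missing or extra but not where they are.

```php
<?php
// Hypothetical helper: counts missing and added words, treating each
// repeated word as a separate occurrence (word positions are ignored).
function count_word_errors(string $original, string $typed): array
{
    $a = array_count_values(preg_split('/\s+/', $original, -1, PREG_SPLIT_NO_EMPTY));
    $b = array_count_values(preg_split('/\s+/', $typed, -1, PREG_SPLIT_NO_EMPTY));

    $missing = 0; // occurrences present in the original but not typed
    foreach ($a as $word => $n) {
        $missing += max(0, $n - ($b[$word] ?? 0));
    }

    $added = 0; // occurrences typed but not present in the original
    foreach ($b as $word => $n) {
        $added += max(0, $n - ($a[$word] ?? 0));
    }

    return ['missing' => $missing, 'added' => $added];
}

$result = count_word_errors(
    'i am suraj roy i am having my lunch',
    'i suraj roy i am having my dinner'
);
echo 'Missing: ', $result['missing'], ', added: ', $result['added'], "\n";
// prints: Missing: 2, added: 1
```

Here the second "am" is counted as missing along with "lunch", and "dinner" is counted as added.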

Find and highlight word starting with Uppercase letter except the first word in a string using PHP?

For example, I have a string:
Liaqat Fayyaz delete data on module Users
I want to highlight Users, like this:
Liaqat Fayyaz delete data on module <strong>Users</strong>
Is there a way to do this using PHP?
This should work in this case -
$s = 'Liaqat Fayyaz delete data on module Users';
preg_match_all('([A-Z][^\s]*)', $s, $matches); // match all title case words
array_shift($matches[0]); // Remove first word
foreach($matches[0] as $w) {
$s = str_replace($w, "<strong>$w</strong>", $s); // highlight each of them
}
echo $s;
Output
Liaqat <strong>Fayyaz</strong> delete data on module <strong>Users</strong>
Code
preg_match_all()
str_replace()
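The same result can probably be had in one pass with a lookbehind, which avoids str_replace() touching other occurrences of a word. This is a sketch under two assumptions: the function name highlight_capitalized is invented here, and the string does not begin with whitespace (otherwise the first word would be preceded by whitespace and highlighted too).

```php
<?php
// Hypothetical helper: wraps every word that starts with an uppercase
// ASCII letter and is preceded by whitespace, so the first word is skipped.
function highlight_capitalized(string $s): string
{
    return preg_replace('/(?<=\s)[A-Z]\S*/', '<strong>$0</strong>', $s);
}

echo highlight_capitalized('Liaqat Fayyaz delete data on module Users'), "\n";
// prints: Liaqat <strong>Fayyaz</strong> delete data on module <strong>Users</strong>
```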
Oh, it's very simple: find your word and replace it with a strong-tagged version in PHP, like this:
$str = 'Liaqat Fayyaz delete data on module Users';
echo str_replace('Users', '<strong>Users</strong>', $str);
Let me know if you have any issues and I will correct it.

RegEx or Similar - Grab string preceding matched value

Here's the deal: I am handling an OCR text document and grabbing UPC information from it with regex. That part I've figured out. Then I query a database, and if I don't have a record of that UPC I need to go back to the text document and get the description of the product.
The format on the receipt is:
NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456
So, when I go back the second time to find the name of the item I am at a complete loss. I know how to get to the line where the UPC is, but how can I use something like regex to get the name that precedes the UPC? Or some other method. I was thinking of somehow storing the entire line and then parsing it with PHP, but not sure how to get the line either.
Using PHP.
Get all of the names of the items indexed by their UPCs with a regex and preg_match_all():
$str = 'NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456';
preg_match_all( '/^(.*?)\s+(\d+)/m', $str, $matches);
$items = array();
foreach( $matches[2] as $k => $upc) {
if( !isset( $items[$upc])) {
$items[$upc] = array( 'name' => $matches[1][$k], 'count' => 0);
}
$items[$upc]['count']++;
}
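This forms $items so that each UPC maps to the item's name and a count of how often it appeared:
Array (
[123456789012] => Array ( [name] => NAME OF ITEM [count] => 1 )
[987654321098] => Array ( [name] => OTHER NAME [count] => 1 )
[567890123456] => Array ( [name] => NAME [count] => 1 )
)
Now you can look up any item name in O(1) time:
echo $items['987654321098']['name']; // OTHER NAME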
You can find the string preceding a value you know with the following regex:
$receipt = "NAME OF ITEM 123456789012\n" .
"OTHER NAME 987654321098\n" .
"NAME 567890123456";
$upc = '987654321098';
if (preg_match("/^(.*?) *{$upc}/m", $receipt, $matches)) {
$name = $matches[1];
var_dump($name);
}
The /m flag on the regex makes the ^ work properly with multi-line input.
The ? in (.*?) makes that part non-greedy, so it doesn't grab the spaces before the UPC.
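If the UPC comes from elsewhere (for example, from the earlier extraction pass), it may be worth wrapping the lookup in a small function that escapes the value with preg_quote() before building the pattern. The following is a sketch, not a drop-in: find_name_for_upc is a made-up name, and it assumes one item per line with the UPC at the end of the line.

```php
<?php
// Hypothetical helper: returns the text preceding $upc on its line,
// or null when the UPC does not appear at the end of any line.
function find_name_for_upc(string $receipt, string $upc): ?string
{
    // [ \t]+ instead of \s+ keeps the match on a single line;
    // \r? tolerates Windows line endings.
    $pattern = '/^(.*?)[ \t]+' . preg_quote($upc, '/') . '[ \t]*\r?$/m';

    if (preg_match($pattern, $receipt, $m)) {
        return trim($m[1]);
    }
    return null;
}

$receipt = "NAME OF ITEM 123456789012\n" .
           "OTHER NAME 987654321098\n" .
           "NAME 567890123456";

var_dump(find_name_for_upc($receipt, '987654321098')); // string(10) "OTHER NAME"
var_dump(find_name_for_upc($receipt, '000000000000')); // NULL
```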
It would be simpler if you grabbed both the name and the number at the same time during the initial pass. Then, when you check the database to see if the number is present, you already have the name if you need to use it. Consider:
preg_match_all('/^([A-Za-z ]+) (\d+)$/m', $document, $matches, PREG_SET_ORDER); // delimiters and the /m flag are required
foreach ($matches as $match) {
$name = $match[1];
$number = $match[2];
if (!order_number_in_database($number)) {
save_new_order($number, $name);
}
}
You can use a lookahead assertion to match the string preceding the UPC.
http://php.net/manual/en/regexp.reference.assertions.php
Something like this: /^.*?(?=\s+123456789012)/m, substituting the UPC of the item you want to find. (.*? is used rather than \S* so that names containing spaces, such as NAME OF ITEM, are matched in full.)
I'm lazy, so I would just use one regex that gets both parts in one shot using matching groups. Then, I would call it every time and put each capture group into name and upc variables. For cases in which you need the name, just reference it.
Use this type of regex:
/([a-zA-Z ]+)\s*(\d*)/
Then you will have the name in the $1 matching group and the UPC in the $2 matching group. Sorry, it's been a while since I've used PHP, so I can't give you an exact code snippet.
Note: the suggested regex assumes you'll only have letters or spaces in your "names". If that's not the case, you'll have to expand the character class.

PHP how to split text that have same delimiter in the text?

I am having trouble separating some text.
This is the scenario:
$check = "Apple|Orange|Animal|Dog|Grape";
Normally I could use explode to split the words on "|", but "Animal|Dog" in the value I retrieved should stay together as one word. What would be the solution in this case? I can't use the limit parameter either, because the position and number of items can differ.
The only way to tell the pieces apart is the Animal keyword. Is there any function in PHP similar to the MySQL LIKE syntax?
Case 2:
$check = "Apple|Orange|Animal:Dog|Cat|Grape";
OR
$check = "Apple|Orange|Animal:Fish|Bird|Grape";
where the names of the animals can vary.
Output
"Apple|Orange|Animal:Dog,Cat|Grape" or "Apple|Orange|Animal:Fish,Bird|Grape"
Thanks.
If all you want to do is replace "Animal|" with "Animal:" then you can do a simple str_replace:
$check = "Apple|Orange|Animal|Dog|Grape";
$newCheck = str_replace("Animal|", "Animal:", $check); // will be set to 'Apple|Orange|Animal:Dog|Grape'
Is that what you meant?
EDIT, FOR CASE 2:
I assume you have a string like "Apple|Orange|Animal:Dog|Cat|Grape", which has the category followed by 2 members of the category. From what you've said, you want to transform this string into "Apple|Orange|Animal:Dog,Cat|Grape" with a comma separating the two group members instead of a pipe. This is more complicated than the first case - the category name could vary, and you can't do a simple str_replace starting with the colon because the first member of the group could vary as well. For this case, you'll need to use a regular expression to match and replace the pattern of the string. Here's the code:
$check = "Apple|Orange|Animal:Dog|Cat|Grape";
$newCheck = preg_replace("#(Animal:\w+)\|#", "$1,", $check); // will be set to "Apple|Orange|Animal:Dog,Cat|Grape"
Let me explain what this does, in case you're not familiar with regular expressions. The first argument of the preg_replace function, "#(Animal:\w+)\|#", tells PHP to look for every substring of $check that begins with the text "Animal" followed by a colon, then one or more word characters, and ends with a pipe. This matches the category name together with the first member of that category. The second argument, "$1,", tells PHP to put back the captured text and replace the pipe that follows it with a comma. If you have a different category name, simply change the pattern you pass as the first argument to the preg_replace function:
$check = "Apple|Orange|Animal1:Fish|Bird|Grape";
$newCheck = preg_replace("#(Animal1:\w+)\|#", "$1,", $check); // will be set to "Apple|Orange|Animal1:Fish,Bird|Grape"
Let me know if this is hard to follow!
The way I would handle this would be to use a / instead of a | between categories and items. Or use a different delimiter if you really want to keep the | in between categories & items.
$check = "Apple|Orange|Animal/Dog|Grape";
$ex=explode("|", $check);
But if you don't want to do that, and you know what your category names are (like "Animal"), you could explode the string on | and assume that if the current value is "Animal", then the next element in the array is going to be "Dog", "Cat", whatever. This is not a good solution though, and would not work for multiple category levels.
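For what it's worth, the "next element belongs to the category" idea could look roughly like this. It is a sketch only: group_after_category is an invented name, the category name has to be known in advance, and it attaches exactly one following element to the category.

```php
<?php
// Hypothetical helper: splits on "|" and glues the element that follows
// $category onto it as "Category:Member".
function group_after_category(string $check, string $category): array
{
    $parts = explode('|', $check);
    $out = [];

    for ($i = 0, $n = count($parts); $i < $n; $i++) {
        if ($parts[$i] === $category && isset($parts[$i + 1])) {
            $out[] = $category . ':' . $parts[$i + 1];
            $i++; // skip the member that was just consumed
        } else {
            $out[] = $parts[$i];
        }
    }

    return $out;
}

echo implode('|', group_after_category('Apple|Orange|Animal|Dog|Grape', 'Animal')), "\n";
// prints: Apple|Orange|Animal:Dog|Grape
```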
It looks like you have spaces around the pipe between Apple and Orange, like this: Apple | Orange, but no spaces between Animal and Dog, like this: Animal|Dog. If that is the situation, you can explode it like this:
$check = "Apple | Orange | Animal|Dog | Grape";
$ex=explode(" | ", $check);
which will return an array in this format:
array("Apple","Orange","Animal|Dog","Grape");
You can then process that array again to get Animal.
I hope this is what you meant.
Edit: a rough solution could be:
<?php
$check = "Apple|Orange|Animal|Dog|Grape";
$ex=explode("|",$check);
if(in_array("Animal",$ex))
{
echo "Animal:";
}
if(in_array("Dog",$ex)){
echo "Dog";
}
?>
So in this case the position of Animal and Dog does not matter.
$check = str_replace("Animal|", "Animal:", $check);
Then call explode().

Using regex to fix phone numbers in a CSV with PHP

My new phone does not recognize a phone number unless its area code matches the incoming call. Since I live in Idaho where an area code is not needed for in-state calls, many of my contacts were saved without an area code. Since I have thousands of contacts stored in my phone, it would not be practical to manually update them. I decided to write the following PHP script to handle the problem. It seems to work well, except that I'm finding duplicate area codes at the beginning of random contacts.
<?php
//the script can take a while to complete
set_time_limit(200);
function validate_area_code($number) {
//digits are taken one by one out of $number and appended to $numString
$numString = "";
for ($i = 0; $i < strlen($number); $i++) {
$curr = substr($number,$i,1);
//only copy from $number to $numString when the character is numeric
if (is_numeric($curr)) {
$numString = $numString . $curr;
}
}
//add area code "208" to the beginning of any phone number of length 7
if (strlen($numString) == 7) {
return "208" . $numString;
//remove country code (none of the contacts are outside the U.S.)
} else if (strlen($numString) == 11) {
return preg_replace("/^1/","",$numString);
} else {
return $numString;
}
}
//matches any phone number in the csv
$pattern = "/((1? ?\(?[2-9]\d\d\)? *)? ?\d\d\d-?\d\d\d\d)/";
$csv = file_get_contents("contacts2.CSV");
preg_match_all($pattern,$csv,$matches);
foreach ($matches[0] as $key1 => $value) {
/*create a pattern that matches the specific phone number by adding slashes before possible special characters*/
$pattern = preg_replace("/\(|\)|\-/","\\\\$0",$value);
//create the replacement phone number
$replacement = validate_area_code($value);
//add delimiters
$pattern = "/" . $pattern . "/";
$csv = preg_replace($pattern,$replacement,$csv);
}
echo $csv;
?>
Is there a better approach to modifying the CSV? Also, is there a way to minimize the number of passes over the CSV? In the script above, preg_replace is called thousands of times on a very large string.
If I understand you correctly, you just need to prepend the area code to any 7-digit phone number anywhere in this file, right? I have no idea what kind of system you're on, but if you have some decent tools, here are a couple options. And of course, the approaches they take can presumably be implemented in PHP; that's just not one of my languages.
So, how about a sed one-liner? Just look for 7-digit phone numbers, bounded by either beginning of line or comma on the left, and comma or end of line on the right.
sed -r 's/(^|,)([0-9]{3}-[0-9]{4})(,|$)/\1208-\2\3/g' contacts.csv
Or if you want to only apply it to certain fields, perl (or awk) would be easier. Suppose it's the second field:
perl -F, -ane '$"=","; $F[1]=~s/^[0-9]{3}-[0-9]{4}$/208-$&/; print "@F";' contacts.csv
The -F, indicates the field separator, the $" is the output field separator (yes, it gets assigned once per loop, oh well), the arrays are zero-indexed so second field is $F[1], there's a run-of-the-mill substitution, and you print the results.
Ah programs... sometimes a 10-min hack is better.
If it were me... I'd import the CSV into Excel, sort it by something - maybe the length of the phone number or something. Make a new col for the fixed phone number. When you have a group of similarly-fouled numbers, make a formula to fix. Same for the next group. Should be pretty quick, no? Then export to .csv again, omitting the bad col.
A little more digging on my own revealed the issues with the regex in my question. The problem is with duplicate contacts in the csv.
Example:
(208) 555-5555, 555-5555
after the first pass becomes:
2085555555, 208555555
and after the second pass becomes:
2082085555555, 2082085555555
I worked around this by changing the replacement regex to:
//add escapes for special characters
$pattern = preg_replace("/\(|\)|\-|\./","\\\\$0",$value);
//add delimiters, and optional area code
$pattern = "/(\(?[0-9]{3}\)?)? ?" . $pattern . "/";
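On the question of reducing the number of passes: one option is a single preg_replace_callback() call, so each number is matched and rewritten exactly once and an already-fixed number is never matched again. The sketch below is an assumption-laden illustration, not the code from the question: the function names are invented, the pattern is a simplified variant of the original, and any bare run of seven digits in the file would be treated as a phone number.

```php
<?php
// Hypothetical helper mirroring validate_area_code() from the question.
function normalize_phone(string $raw, string $areaCode = '208'): string
{
    $digits = preg_replace('/\D/', '', $raw); // keep digits only

    if (strlen($digits) === 7) {
        return $areaCode . $digits;           // add the missing area code
    }
    if (strlen($digits) === 11 && $digits[0] === '1') {
        return substr($digits, 1);            // drop the U.S. country code
    }
    return $digits;
}

// Hypothetical helper: rewrites every phone number in one pass over $csv.
function fix_phone_numbers(string $csv): string
{
    // The lookarounds keep the match from starting or ending inside a longer digit run.
    $pattern = '/(?<!\d)(?:1 ?)?(?:\(?[2-9]\d\d\)? ?)?\d{3}-?\d{4}(?!\d)/';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            return normalize_phone($m[0]);
        },
        $csv
    );
}

echo fix_phone_numbers('(208) 555-5555, 555-5555'), "\n";
// prints: 2085555555, 2085555555
```

Since the replacement text is never rescanned, the doubled area code described above should not occur with this approach.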
