PHP: how to split text when a value contains the delimiter itself?

I am having trouble separating the text.
This is the scenario:
$check = "Apple|Orange|Animal|Dog|Grape";
I could split the words on "|" using explode, but the value "Animal|Dog" should stay together as one item. What would be the solution in this case? I cannot use
the limit parameter either, because the position and number of items can differ.
The only way to tell the items apart is the Animal keyword. Is there any function in PHP similar to MySQL's LIKE syntax?
Case 2:
$check = "Apple|Orange|Animal:Dog|Cat|Grape";
OR
$check = "Apple|Orange|Animal:Fish|Bird|Grape";
where the animal names can vary.
Expected output:
"Apple|Orange|Animal:Dog,Cat|Grape" or "Apple|Orange|Animal:Fish,Bird|Grape"
Thanks.

If all you want to do is replace "Animal|" with "Animal:" then you can do a simple str_replace:
$check = "Apple|Orange|Animal|Dog|Grape";
$newCheck = str_replace("Animal|", "Animal:", $check); // will be set to 'Apple|Orange|Animal:Dog|Grape'
Is that what you meant?
EDIT, FOR CASE 2:
I assume you have a string like "Apple|Orange|Animal:Dog|Cat|Grape", which has a category followed by two members of that category. From what you've said, you want to transform it into "Apple|Orange|Animal:Dog,Cat|Grape", with a comma instead of a pipe separating the two members. This is more complicated than the first case: the category name could vary, and you can't do a simple str_replace starting at the colon because the first member of the group can vary as well. For this case, you'll need a regular expression to match the pattern and replace it. Here's the code:
$check = "Apple|Orange|Animal:Dog|Cat|Grape";
$newCheck = preg_replace("#(Animal:\w+)\|#", "$1,", $check); // will be set to "Apple|Orange|Animal:Dog,Cat|Grape"
Let me explain what this does, in case you're not familiar with regular expressions. The first argument of preg_replace, "#(Animal:\w+)\|#", tells PHP to look for every substring of $check that starts with "Animal" followed by a colon, continues with one or more word characters (\w+), and ends with a pipe. This matches the category name plus the first member of that category, and the parentheses capture everything except the pipe as group 1. The second argument, "$1,", replaces each match with that captured text followed by a comma, so the pipe after the first member becomes a comma. If you have a different category name, simply change the pattern you pass as the first argument to preg_replace:
$check = "Apple|Orange|Animal1:Fish|Bird|Grape";
$newCheck = preg_replace("#(Animal1:\w+)\|#", "$1,", $check); // will be set to "Apple|Orange|Animal1:Fish,Bird|Grape"
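If the category name is not known in advance, one option is to match any "Name:Member|" sequence instead of hard-coding "Animal". This is a minimal sketch, assuming a category is always written as word characters, a colon and its first member, followed by exactly one more member after a pipe; the helper name joinCategoryMembers is hypothetical.

```php
<?php
// Assumption: a category looks like "Word:Member" and is followed by one more
// member after a pipe, e.g. "Animal:Dog|Cat".
function joinCategoryMembers(string $check): string
{
    // (\w+:\w+) captures "Category:FirstMember"; the pipe after it becomes a comma.
    return preg_replace('#(\w+:\w+)\|#', '$1,', $check);
}

echo joinCategoryMembers("Apple|Orange|Animal:Dog|Cat|Grape"), "\n";
// Apple|Orange|Animal:Dog,Cat|Grape
echo joinCategoryMembers("Apple|Orange|Animal1:Fish|Bird|Grape"), "\n";
// Apple|Orange|Animal1:Fish,Bird|Grape
```

Plain values such as "Apple" never contain a colon, so they are left untouched.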
Let me know if this is hard to follow!

The way I would handle this is to use a / instead of a | between a category and its item, or to use a different top-level delimiter if you really want to keep the | between categories and items.
$check = "Apple|Orange|Animal/Dog|Grape";
$ex=explode("|", $check);
But if you don't want to do that, and you know what your category names are (such as "animal"), you could explode the string on | and assume that whenever the current value is "animal", the next element in the array is its item ("dog", "cat", whatever). This is not a good solution, though, and it would not work for multiple category levels.
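The "known category, then its item" idea could be sketched as follows, assuming a hypothetical $categories list of category names that you maintain yourself. As noted, it only handles a single level of nesting.

```php
<?php
// Hypothetical list of known category names (an assumption; adjust to your data).
$categories = ['Animal'];

function groupCategories(string $check, array $categories): array
{
    $parts = explode('|', $check);
    $result = [];
    for ($i = 0, $n = count($parts); $i < $n; $i++) {
        // A known category followed by another piece: fold them into "Category:Item".
        if (in_array($parts[$i], $categories, true) && $i + 1 < $n) {
            $result[] = $parts[$i] . ':' . $parts[$i + 1];
            $i++; // skip the item that was just consumed
        } else {
            $result[] = $parts[$i];
        }
    }
    return $result;
}

print_r(groupCategories('Apple|Orange|Animal|Dog|Grape', $categories));
// ['Apple', 'Orange', 'Animal:Dog', 'Grape']
```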

If your data has spaces around the pipes between items, like Apple | Orange, but no spaces between Animal and Dog, like Animal|Dog, you can explode it like this:
$check = "Apple | Orange | Animal|Dog | Grape";
$ex=explode(" | ", $check);
which returns an array like this:
array("Apple","Orange","Animal|Dog","Grape");
You can then split the "Animal|Dog" element again to separate Animal from Dog.
I hope this is what you meant.
Edit: a rough solution could be:
<?php
$check = "Apple|Orange|Animal|Dog|Grape";
$ex=explode("|",$check);
if(in_array("Animal",$ex))
{
echo "Animal:";
}
if(in_array("Dog",$ex)){
echo "Dog";
}
?>
So in this case the positions of Animal and Dog do not matter.

$check = str_replace("Animal|", "Animal:", $check);
then explode the result on "|".
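Put together, the replace-then-explode approach from this answer looks like this (a sketch using the question's sample string):

```php
<?php
$check = "Apple|Orange|Animal|Dog|Grape";
// Glue the category to its item first, so the pipe between them is no longer a separator.
$parts = explode('|', str_replace('Animal|', 'Animal:', $check));
print_r($parts);
// $parts is ['Apple', 'Orange', 'Animal:Dog', 'Grape']
```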

Related

ucwords(strtolower()) not working together php

I have code where I am extracting a name from a database, trying to reorder the words, and then changing it from all uppercase to title case. Everything I find online suggests my code should work, but it does not... Here is my code and the output:
$subjectnameraw = "SMITH, JOHN LEE";
$subjectlname = substr($subjectnameraw, 0, strpos($subjectnameraw, ",")); // Get the last name
$subjectfname = substr($subjectnameraw, strpos($subjectnameraw, ",") + 1) . " "; // Get first name and middle name
$subjectname = ucwords(strtolower($subjectfname . $subjectlname)); // Reorder the name and make it lower case / upper word case
However, the output looks like this:
John Lee smith
The last name is ALWAYS lowercase no matter what I do. How can I get the last name to be uppercase as well?
The code above gives wrong results when names contain multibyte characters such as RENÉ, because strtolower() and ucwords() work byte by byte. The following solution uses the multibyte-aware function mb_convert_case() instead.
$subjectnameraw = "SMITH, JOHN LEE RENÉ";
list($lastName, $firstName) = explode(', ', mb_convert_case($subjectnameraw, MB_CASE_TITLE, 'UTF-8'));
echo $firstName . " " . $lastName;
Demo : https://3v4l.org/ekTQA
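A slightly more tolerant variant is sketched below, assuming the raw value always contains exactly one comma in the form "LAST, FIRST MIDDLE" and that the mbstring extension is installed. Splitting on the bare comma and trimming both halves means it also works when the space after the comma is missing; reorderName is a hypothetical helper name.

```php
<?php
// Assumes "LAST, FIRST MIDDLE" with a single comma, and the mbstring extension.
function reorderName(string $raw): string
{
    // Split on the first comma only and trim the spaces around each half.
    [$last, $first] = array_map('trim', explode(',', $raw, 2));
    // Title-case with the multibyte-aware function so letters like É are handled.
    return mb_convert_case("$first $last", MB_CASE_TITLE, 'UTF-8');
}

echo reorderName("SMITH, JOHN LEE RENÉ"), "\n"; // John Lee René Smith
```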

search for words with sql greater than 3

I am using explode in PHP to split up words. I then need to search for each word in my category list and, if there is a match, use that category id to create the listing. The only problem is that I come across words of only one letter, e.g. "E Cigarettes & Vape Mods".
I don't want to use the "E", because that would match way too many categories. What is the best solution? Only search words with a length of 3 or 4 or more? Just thinking aloud. Thanks in advance.
You can use the LEN() function (that is SQL Server syntax; MySQL uses LENGTH() or CHAR_LENGTH()): https://www.w3schools.com/sql/sql_func_len.asp
For example:
select len('word') returns 4.
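I used strlen() and only search for a word if it is longer than 3 characters:
$categoryName = str_replace(",", "", $categoryName);
$wordsplit = explode(' ', $categoryName);
$found = false; // set to true once a matching category has been found
foreach ($wordsplit as $wordCat)
{
    if ($found) break;
    if (strlen($wordCat) > 3) {
        // DO SOMETHING: search for $wordCat and set $found = true on a match
    }
}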
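A self-contained sketch of the same filter-then-search loop is below. The $categories array (category name to id) is a hypothetical stand-in for your category table, and the minimum length of 4 mirrors the "greater than 3" check. mb_strlen() is used so that accented letters count as one character rather than several bytes (requires mbstring).

```php
<?php
// Hypothetical category list mapping lowercase category name => id.
$categories = ['cigarettes' => 12, 'vape' => 7, 'mods' => 9];

function findCategoryId(string $title, array $categories, int $minLength = 4): ?int
{
    // Drop commas and split into words, ignoring empty pieces from repeated spaces.
    $words = preg_split('/\s+/', str_replace(',', '', $title), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Skip short words such as "E" or "&".
        if (mb_strlen($word) < $minLength) {
            continue;
        }
        $key = mb_strtolower($word);
        if (isset($categories[$key])) {
            return $categories[$key]; // first match wins, like the break in the loop above
        }
    }
    return null;
}

var_dump(findCategoryId('E Cigarettes & Vape Mods', $categories)); // int(12)
```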

PHP tag system with preg_match and foreach

I'm trying to build a tag system for my website that checks the written article (400-1000 words) for specific words and makes a string of all the keywords from the array that were found.
The one I made works all right, but there are some problems I would like to fix.
$a = "This is my article and it's about apples and pears. I like strawberries as well though.";
$targets = array('apple', 'apples','pear','pears','strawberry','strawberries','grape','grapes');
foreach($targets as $t)
{
if (preg_match("/\b" . $t . "\b/i", $a)) {
$b[] = $t;
}
}
echo $b[0].",".$b[1].",".$b[2].",".$b[3];
$tags = $b[0].",".$b[1].",".$b[2].",".$b[3];
First of all, I would like to know if there is any way I can make this more efficient. I have a database with around 5,000 keywords, and it is expanding day by day.
As you can see, I don't know how to get ALL the matches; I'm writing $b[0], $b[1], etc.
I would like it to just make a string with ALL the matches, but only once per match. If apples is mentioned 5 times, it should go in the string only once.
As I said, this works, but I don't feel that this is the best solution.
EDIT:
I'm now trying this, but I can't get it to work at all.
$a = "This is my article and it's about apples and pears. I like strawberries as well though.";
$targets = array('apple', 'apples','pear','pears','strawberry','strawberries','grape','grapes');
$targets = implode('|', $targets);
$b = [];
preg_match("/\b(" . $targets . ")\b/i", $a, $b);
echo $b;
First, I'd like to provide a non-regex method; then I'll get into some long-winded regex considerations.
Because your search "needles" are whole words, you can leverage the magic of str_word_count() like so:
Code:
$targets=['apple','apples','pear','pears','strawberry','strawberries','grape','grapes']; // all lowercase
$input="Apples, pears, and strawberries are delicious. I probably favor the flavor of strawberries most. My brother's favorites are crabapples and grapes.";
$lowercase_input=strtolower($input); // eliminate case-sensitive issue
$words=str_word_count($lowercase_input,1); // split into array of words, permitting: ' and -
$unique_words=array_flip(array_flip($words)); // faster than array_unique()
$targeted_words=array_intersect($targets,$unique_words); // retain matches
$tags=implode(',',$targeted_words); // glue together with commas
echo $tags;
echo "\n\n";
// or as a one-liner
echo implode(',',array_intersect($targets,array_flip(array_flip(str_word_count(strtolower($input),1)))));
Output:
apples,pears,strawberries,grapes
apples,pears,strawberries,grapes
Now about the regex...
While matiaslauriti's answer may get you a correct result, it makes very little attempt to provide any big gains in efficiency.
I'll make two points:
Do NOT use preg_match() in a loop when preg_match_all() was specifically designed to capture multiple occurrences in a single call. (code to be supplied later in answer)
Condense your pattern logic as much as possible...
Let's say you have an input like this:
$input="Today I ate an apple, then a pear, then a strawberry. This is my article and it's about apples and pears. I like strawberries as well though.";
If you use this array of tags:
$targets=['apple','apples','pear','pears','strawberry','strawberries','grape','grapes'];
to generate a simple piped regex pattern like:
/\b(?:apple|apples|pear|pears|strawberry|strawberries|grape|grapes)\b/i
It will take the regex engine 677 steps to match all of the fruit in $input.
In contrast, if you condense the tag elements using the ? quantifier like this:
/\b(?:apples?|pears?|strawberry|strawberries|grapes?)\b/i
your pattern gains brevity AND efficiency, giving the same expected result in just 501 steps.
Generating this condensed pattern can be done programmatically for simple associations (including pluralization and verb conjugations).
Here is a method for handling singular/plural relationships:
foreach($targets as $v){
if(substr($v,-1)=='s'){ // if tag ends in 's'
if(in_array(substr($v,0,-1),$targets)){ // if the same word without the trailing 's' exists in the tag list
$condensed_targets[]=$v.'?'; // add '?' quantifier to end of tag
}else{
$condensed_targets[]=$v; // add tag that is not plural (e.g. 'dress')
}
}elseif(!in_array($v.'s',$targets)){ // if tag doesn't end in 's' and no regular plural form
$condensed_targets[]=$v; // add tag with irregular pluralization (e.g. 'strawberry')
}
}
echo '/\b(?:',implode('|',$condensed_targets),")\b/i\n";
// /\b(?:apples?|pears?|strawberry|strawberries|grapes?)\b/i
This technique will only handle the simplest cases. You can really ramp up performance by scrutinizing the tag list and identifying related tags and condensing them.
Performing my above method to condense the piped pattern on every page load is going to cost your users load time. My very strong recommendation is to keep a database table of your ever-growing tags which are stored as regex-ified tags. When new tags are encountered/generated, add them individually to the table automatically. You should periodically review the ~5000 keywords and seek out tags that can be merged without losing accuracy.
It may also make the table easier to maintain if you keep one column for regex patterns and another column holding a CSV of the tags that each row's pattern covers:
+----------------------+----------------------------------+
| Pattern              | Tags                             |
+----------------------+----------------------------------+
| apples?              | apple,apples                     |
| walk(?:s|er|ed|ing)? | walk,walks,walker,walked,walking |
| strawberry           | strawberry                       |
| strawberries         | strawberries                     |
+----------------------+----------------------------------+
To improve efficiency, you can update your table data by merging the strawberry and strawberries rows like this:
+----------------------+----------------------------------+
| strawberr(?:y|ies)   | strawberry,strawberries          |
+----------------------+----------------------------------+
With such a simple improvement, if you only check $input for these two tags, the steps required drop from 59 to 40.
Because you are dealing with >5000 tags, the performance improvement will be very noticeable. This kind of refinement is best handled on a human level, but you might use some programmatic techniques to identify tags that share an internal substring.
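One simple programmatic technique is to compare shared prefixes. The sketch below uses hypothetical helper names and only handles a shared prefix, not an arbitrary internal substring; it turns two related tags into a single non-capturing pattern of the kind shown in the table.

```php
<?php
// Length of the longest common prefix of two strings (byte-wise).
function commonPrefixLength(string $a, string $b): int
{
    $max = min(strlen($a), strlen($b));
    $i = 0;
    while ($i < $max && $a[$i] === $b[$i]) {
        $i++;
    }
    return $i;
}

// Merge two related tags into one regex fragment using a non-capturing group,
// e.g. "strawberry" + "strawberries" -> "strawberr(?:y|ies)".
function mergeTags(string $a, string $b): string
{
    $n = commonPrefixLength($a, $b);
    $prefix = preg_quote(substr($a, 0, $n), '/');
    // Keep only the non-empty remainders after the shared prefix.
    $rests = array_values(array_filter(
        [substr($a, $n), substr($b, $n)],
        fn(string $r): bool => $r !== ''
    ));
    if ($rests === []) {
        return $prefix; // the two tags are identical
    }
    $group = '(?:' . implode('|', array_map(fn(string $r): string => preg_quote($r, '/'), $rests)) . ')';
    // If one tag is exactly the prefix (e.g. "apple"/"apples"), the suffix is optional.
    return count($rests) === 1 ? $prefix . $group . '?' : $prefix . $group;
}

echo mergeTags('strawberry', 'strawberries'), "\n"; // strawberr(?:y|ies)
echo mergeTags('apple', 'apples'), "\n";            // apple(?:s)?
```

Any merge suggested this way should still be reviewed by hand, as the answer recommends, since a shared prefix does not guarantee the words are related.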
When you want to use your Pattern column values, just pull them from your database, pipe them together, and place them inside preg_match_all().
*Keep in mind that you should use non-capturing groups (?:...) when condensing tags into a single pattern; the code below avoids capture groups to reduce memory usage.
Code:
$input="Today I ate an apple, then a pear, then a strawberry. This is my article and it's about apples and pears. I like strawberries as well though.";
$targets=['apple','apples','pear','pears','strawberry','strawberries','grape','grapes'];
//echo '/\b(?:',implode('|',$targets),")\b/i\n";
// condense singulars & plurals forms using ? quantifier
foreach($targets as $v){
if(substr($v,-1)=='s'){ // if tag ends in 's'
if(in_array(substr($v,0,-1),$targets)){ // if the same word without the trailing 's' exists in the tag list
$condensed_targets[]=$v.'?'; // add '?' quantifier to end of tag
}else{
$condensed_targets[]=$v; // add tag that is not plural (e.g. 'dress')
}
}elseif(!in_array($v.'s',$targets)){ // if tag doesn't end in 's' and no regular plural form
$condensed_targets[]=$v; // add tag with irregular pluralization (e.g. 'strawberry')
}
}
echo '/\b(?:',implode('|',$condensed_targets),")\b/i\n\n";
// use preg_match_all and call it just once without looping!
$tags=preg_match_all("/\b(?:".implode('|',$condensed_targets).")\b/i",$input,$out)?$out[0]:null;
echo "Found tags: ";
var_export($tags);
Output:
/\b(?:apples?|pears?|strawberry|strawberries|grapes?)\b/i
Found tags: array (
  0 => 'apple',
  1 => 'pear',
  2 => 'strawberry',
  3 => 'apples',
  4 => 'pears',
  5 => 'strawberries',
)
...if you've managed to read this far down my post, you've likely got a problem like the OP's and you want to move forward without regrets/mistakes. Please go to my related Code Review post for more information about fringe case considerations and method logic.
preg_match() can already save the matches for you. Its signature is:
int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] )
The third parameter stores the matches, so change this:
if (preg_match("/\b" . $t . "\b/i", $a)) {
$b[] = $t;
}
To this:
$matches = [];
preg_match("/\b" . $t . "\b/i", $a, $matches);
$b = array_merge($b, $matches);
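But if you are only checking for the exact word, the documentation recommends using strpos() instead. Note that strpos() ignores word boundaries, so "apple" would also be found inside "crabapples".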
Tip
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() instead as it will be faster.
EDIT
If you still want to use a regular expression, you could improve the performance of your code by replacing this:
$targets = array('apple', 'apples','pear','pears','strawberry','strawberries','grape','grapes');
foreach($targets as $t)
{
if (preg_match("/\b" . $t . "\b/i", $a)) {
$b[] = $t;
}
}
With this:
$targets = array('apple', 'apples','pear','pears','strawberry','strawberries','grape','grapes');
$targets = implode('|', $targets);
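preg_match_all("/\b(" . $targets . ")\b/i", $a, $matches);
Here you join all your $targets with | (pipe), so your regex looks like (target1|target2|target3|targetN) and you run only one search instead of the foreach loop. preg_match_all() is needed because preg_match() stops after the first match; the matched words end up in $matches[1].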
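The single-search idea, combined with the "only once per match" requirement from the question, could be sketched like this. Escaping each tag with preg_quote() and lowercasing before de-duplicating are additions of this sketch, not part of the answer; the arrow function needs PHP 7.4 or later.

```php
<?php
$a = "This is my article and it's about apples and pears. I like strawberries as well though. Apples again!";
$targets = ['apple', 'apples', 'pear', 'pears', 'strawberry', 'strawberries', 'grape', 'grapes'];

// Escape each tag so characters such as "." or "+" are taken literally, then join with |.
$escaped = array_map(fn(string $t): string => preg_quote($t, '/'), $targets);
$pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

$tags = [];
// A single preg_match_all() call collects every occurrence in $m[0].
if (preg_match_all($pattern, $a, $m)) {
    // Lowercase before de-duplicating so "Apples" and "apples" are counted once.
    $tags = array_values(array_unique(array_map('strtolower', $m[0])));
}

echo implode(',', $tags), "\n"; // apples,pears,strawberries
```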

Making one line Regular Expression

I'm making cover letters for mailings of books and magazines. I have all the recipients' data in the database, and a PHP script fetches that data and makes the cover letters. The user who writes the cover letter uses special markers to indicate where the name should go, etc.
For example, to compose a cover letter a user writes:
Dear [[last_name]],
please find attached book...
Then it gets parsed by the PHP script, and the [[last_name]] tag gets replaced with a real name. When 1000 addresses are selected for mailing, the script produces 1000 cover letters, each with a different name.
Now, in Russian the word "Dear" has different endings for male and female recipients. It is like saying "Dear Mr." or "Dear Mrs." in English.
To mark that in the cover letter, the user writes the possible endings for the word:
Dear[[oy/aya]] [[last_name]]
or it could be something like:
Dear[[ie/oe]]... etc.
I'm trying to figure out the regular expression and replacement call my PHP script needs to parse lines like these.
For the last_name tags I use:
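$template = ...; // this is the text of the cover letter with all the tags
$res = $mysqli->query("SELECT * FROM `addresses` WHERE `flag` = 1");
while ($row = $res->fetch_assoc()) {
    // start from the untouched template for every recipient;
    // otherwise the tags are gone after the first letter
    $text = str_replace('[[last_name]]', $row['lname'], $template);
    echo $text;
}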
For the word endings, as I understand it, it should be something like:
$text = preg_replace('/\w{2-3}\//\w{2-3}/', ($row['gender']==1)
? 'regexp(first_half)'
: 'regext(second_half)', $text);
I could implement this by looping through the tags, parsing each one and replacing it, but that would take 5-10 lines of code. I'm sure this can be done with a single line like the one above, but I can't figure out how.
See http://www.phpliveregex.com/p/2BC; then replace with $1 for male and $2 for female:
...
$text = preg_replace('~\[\[([^\]/]+)/([^\]/]+)\]\]~', $row['gender'] == 1 ? '$1' : '$2', $text);
$gender = ($row['gender'] == 1) ? 1 : 2;
$text = preg_replace_callback('#\[\[(?:([^\]/]+)(?:/([^\]/]+))?\]\]#',
    function($match) use ($row, $gender) {
        // $match will contain the current match info
        // $match[1] will contain a field name (a column of $row), or the first part of a he/she pair
        // $match[2] will be non-empty only in cases of he/she etc
        return (empty($match[2])) ? $row[$match[1]] : $match[$gender];
    },
    $text
);
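Here is the preg_replace_callback() approach wrapped in a function and run against sample data. The row values, the renderLetter name and the transliterated Russian "Dorog[[oy/aya]]" (Dorogoy/Dorogaya, "Dear") are illustrative assumptions; as in that answer, tag names such as [[lname]] are assumed to match column names, and gender 1 means male. Unknown field tags are left untouched rather than triggering a warning.

```php
<?php
// Sample row standing in for $res->fetch_assoc() (hypothetical values).
$row = ['lname' => 'Ivanov', 'gender' => 1];
$template = "Dorog[[oy/aya]] [[lname]],\nplease find attached book...";

function renderLetter(string $template, array $row): string
{
    $gender = ($row['gender'] == 1) ? 1 : 2;
    return preg_replace_callback(
        '#\[\[(?:([^\]/]+)(?:/([^\]/]+))?\]\]#',
        function (array $match) use ($row, $gender): string {
            // One part: a column name. Two parts: male/female endings.
            if (!isset($match[2]) || $match[2] === '') {
                return $row[$match[1]] ?? $match[0]; // keep unknown tags as-is
            }
            return $match[$gender];
        },
        $template
    );
}

echo renderLetter($template, $row), "\n"; // Dorogoy Ivanov, ...
```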

RegEx or Similar - Grab string preceding matched value

Here's the deal: I am handling an OCR text document and grabbing UPC information from it with a regex. That part I've figured out. Then I query a database, and if I don't have a record of that UPC, I need to go back to the text document and get the description of the product.
The format on the receipt is:
NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456
So, when I go back the second time to find the name of the item, I am at a complete loss. I know how to get to the line where the UPC is, but how can I use something like a regex to get the name that precedes the UPC? Or is there some other method? I was thinking of somehow storing the entire line and then parsing it with PHP, but I'm not sure how to get the line either.
Using PHP.
Get all of the names of the items indexed by their UPCs with a regex and preg_match_all():
$str = 'NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456';
preg_match_all( '/^(.*?)\s+(\d+)/m', $str, $matches);
$items = array();
foreach( $matches[2] as $k => $upc) {
    $items[$upc] = $matches[1][$k]; // map each UPC to the name captured before it
}
This forms $items so it looks like:
Array (
[123456789012] => NAME OF ITEM
[987654321098] => OTHER NAME
[567890123456] => NAME
)
Now you can look up any item name in O(1) time:
echo $items['987654321098']; // OTHER NAME
You can find the string preceding a value you know with the following regex:
$receipt = "NAME OF ITEM 123456789012\n" .
"OTHER NAME 987654321098\n" .
"NAME 567890123456";
$upc = '987654321098';
if (preg_match("/^(.*?) *{$upc}/m", $receipt, $matches)) {
$name = $matches[1];
var_dump($name);
}
The /m flag on the regex makes ^ match at the start of each line of multi-line input.
The ? in (.*?) makes that part non-greedy, so it doesn't grab the spaces before the UPC.
It would be simpler if you grabbed both the name and the number at the same time during the initial pass. Then, when you check the database to see if the number is present, you already have the name if you need to use it. Consider:
preg_match_all('/^([A-Za-z ]+) (\d+)$/m', $document, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$name = $match[1];
$number = $match[2];
if (!order_number_in_database($number)) {
save_new_order($number, $name);
}
}
You can use a lookahead assertion to match the string preceding the UPC:
http://php.net/manual/en/regexp.reference.assertions.php
For example, ^.*?(?=\s*123456789012) with the m modifier, substituting the UPC of the item you want to find.
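The lookahead idea can be wrapped in a small helper. This is a sketch: nameBeforeUpc is a hypothetical name, the lazy .*? lets names contain spaces, and requiring [ \t]+ before the UPC plus $ after it (with the m modifier) is an assumption that the UPC is the last thing on its line, separated from the name by spaces or tabs.

```php
<?php
$receipt = "NAME OF ITEM 123456789012\nOTHER NAME 987654321098\nNAME 567890123456";

function nameBeforeUpc(string $receipt, string $upc): ?string
{
    // ^ (with /m) anchors at each line start; the lazy .*? stops as soon as the
    // lookahead sees horizontal whitespace followed by the UPC at the end of the line.
    $pattern = '/^.*?(?=[ \t]+' . preg_quote($upc, '/') . '$)/m';
    return preg_match($pattern, $receipt, $m) ? $m[0] : null;
}

var_dump(nameBeforeUpc($receipt, '987654321098')); // string(10) "OTHER NAME"
```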
I'm lazy, so I would just use one regex that gets both parts in one shot using capture groups. Then I would call it every time and put each capture group into name and upc variables. When you need the name, just reference it.
Use this type of regex:
/^([a-zA-Z ]+?)\s+(\d+)$/m
Then you will have the name in capture group $1 and the UPC in group $2. Sorry, it's been a while since I've used PHP, so I can't give you an exact code snippet.
Note: the suggested regex assumes your names contain only letters or spaces; if that's not the case, you'll have to expand the character class.
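A PHP sketch of the capture-group idea is below, using preg_match_all() with PREG_SET_ORDER. The pattern is a per-line variant anchored with ^...$ and the m modifier, with a lazy name group so trailing spaces stay out of the name; like the original suggestion, it assumes names contain only letters and spaces.

```php
<?php
$receipt = "NAME OF ITEM 123456789012\nOTHER NAME 987654321098\nNAME 567890123456";

// Group 1 = name (letters and spaces, lazy so trailing spaces are excluded), group 2 = UPC digits.
preg_match_all('/^([a-zA-Z ]+?)\s+(\d+)$/m', $receipt, $matches, PREG_SET_ORDER);

$items = [];
foreach ($matches as [, $name, $upc]) {
    $items[$upc] = $name; // index each name by its UPC for quick lookups
}

echo $items['987654321098'], "\n"; // OTHER NAME
```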
