PHP str_ireplace for array - php

I am trying to remove certain strings from an array of strings
$replace = array(
're: ', 're:', 're',
'fw: ', 'fw:', 'fw',
'[Ticket ID: #'.$ticket["ticketnumber"].'] ',
);
$available_subjects = array($ticket["subject"], $update["subject"]);
I tried using these loops
This replaced words like "You're" because of the "re"
foreach($replace as $r) {
$available_subjects = str_ireplace($r, '', $available_subjects);
}
And the same with this one
foreach($replace as $r) {
$available_subjects = preg_replace('/\b'.$r.'\b/i', '', $available_subjects);
}
So I want to match the whole word, and not part of words

First, I would replace Ticket IDs statically like you did:
$ticket_prefix = '[Ticket ID: #' . $ticket["ticketnumber"] . '] ';
$available_subjects = str_ireplace($ticket_prefix, '', $available_subjects);
Then, I would use a regular expression to replace re and fw:
$available_subjects = preg_replace('#\b(fw|re):?\s*\b#i', '', $available_subjects);
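One caveat: \b also matches between an apostrophe and a letter, so a pattern built on \b can still strip the "re" from "You're". Below is a minimal sketch of one possible alternative, assuming the prefixes only ever appear at the start of the subject and are always followed by a colon. The function name strip_subject_prefixes and the sample subjects are made up for illustration.

```php
<?php
// Hypothetical helper: removes the ticket prefix, then any run of
// leading "re:" / "fw:" markers. Anchoring at ^ keeps words such as
// "You're" or "Refund" untouched.
function strip_subject_prefixes(string $subject, string $ticketNumber): string
{
    $ticketPrefix = '[Ticket ID: #' . $ticketNumber . '] ';
    $subject = str_ireplace($ticketPrefix, '', $subject);

    // One or more "re:" / "fw:" prefixes, case-insensitive, start of string only
    return preg_replace('/^\s*(?:(?:re|fw)\s*:\s*)+/i', '', $subject);
}

// Sample data (assumed, not from the question)
$subjects = array("Re: FW: You're welcome", "[Ticket ID: #42] re: Refund request");
$cleaned = array_map(function ($s) {
    return strip_subject_prefixes($s, '42');
}, $subjects);
```

Because the pattern requires the colon, a bare "re" or "fw" without one is left alone, which is a deliberate trade-off in this sketch.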

Related

PHP cleaning up a string of duplicated commas etc

I have a problem - hope you can help.
My users will enter a string like this:
'dt', 'time_hour', 'loc', 'protocol_category', 'service_identifier', 'mcc', 'imsi', 'service_dl_bytes', 'service_ul_ bytes'
Depending on their other inputs, I then remove two fields from this list (for example, service_identifier and service_dl_bytes)
This leaves lots of stray commas in the string.
I need some logic that says:
- there can only be one consecutive comma
- there should not be a comma as the last character
- comma, space comma is also not permitted
Basically, the format has to be 'input', 'input2', 'input3'
Can anyone help me .. i tried with the below, but it doesn't work in all use cases
elseif ($df3aggrequired == "YES" and $df3agg2required == "NO" ) {
#remove spaces from select statement
$df3group0v2 = str_replace($df3aggfield, '', $df3select);
#replace aggfield1 with null
$df3group1v2 = str_replace(' ', '', $df3group0v2);
#replace instances of ,, with ,
$df3group3v2 = preg_replace('/,,+/', ',', $df3group1v2);
$finalstring0df3v2 = rtrim($df3group3v2, ',');
$finalstring1df3v2 = str_replace('\'\'', '', $finalstring0df3v2);
$finalstringdf3v2 = str_replace('.,', '', $finalstring1df3v2);
$finalstringdf31v2 = str_replace(',,', ',', $finalstringdf3v2);
$finalcleanup = preg_replace('/,,+/', ',', $finalstringdf31v2);
echo"\\<br> .groupBy(";
echo "$finalcleanup";
echo ")";
I also think that string replacements can cause issues, so I went for a process similar to geoidesic's. This code keeps treating the fields as a CSV list: it removes the quotes around them as part of the decoding, and puts them back again in the result.
$fields = "'dt', 'time_hour', 'loc', 'protocol_category', 'service_identifier', 'mcc', 'imsi', 'service_dl_bytes', 'service_ul_ bytes'";
$removeFields = true ;
$fieldList = str_getcsv($fields, ",", "'");
if ( $removeFields == true ) {
if ( ($key = array_search('loc', $fieldList)) !== false ) {
unset ( $fieldList[$key] );
}
$fields = "'".implode("', '", $fieldList)."'";
}
echo $fields;
The example removes the 'loc' field, but this can be modified to remove any fields required.
The final output is (assuming 'loc' is removed)...
'dt', 'time_hour', 'protocol_category', 'service_identifier', 'mcc', 'imsi', 'service_dl_bytes', 'service_ul_ bytes'
Whatever your string after removing process is, you may want to run a single preg_replace over it using a positive lookahead:
^\s*(,\s*)*|,\s*(?=,|$)
Live demo
Breakdown:
^\s*(,\s*)* Match unwanted commas at beginning
| Or
,\s*(?=,|$) Match consecutive commas or trailing ones
PHP code:
$string = preg_replace('~^\s*(,\s*)*|,\s*(?=,|$)~', '', $string);
You'd better consider removing unwanted commas while removing words.
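To see what the pattern above does, here is a small sketch that applies it to a list with leading, doubled, and trailing commas. The helper name tidy_commas and the sample input are assumptions for illustration only.

```php
<?php
// Hypothetical wrapper around the preg_replace call shown above.
function tidy_commas(string $list): string
{
    // First alternative: commas at the very start.
    // Second alternative: a comma that is followed by another comma
    // or by the end of the string.
    return preg_replace('~^\s*(,\s*)*|,\s*(?=,|$)~', '', $list);
}

// Sample input as it might look after two fields were blanked out
$afterRemoval = ", 'dt', 'time_hour', , 'mcc', 'imsi', , ";
$clean = tidy_commas($afterRemoval);
```

The commas that separate two real fields are kept because the lookahead only succeeds when the next non-space character is another comma or the end of the string.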
If it is a comma-separated list, then don't use str_replace. Turn the string into an array first by splitting it on the commas:
$myArray = explode(',', $myString);
Then remove the parts of the array you don't want and make it a string again:
$fieldsToDelete = ["service_identifier", "service_dl_bytes"];
$newArray = [];
foreach ($myArray as $value) {
    // Strip the surrounding spaces and quotes before comparing
    if (!in_array(trim($value, " '"), $fieldsToDelete)) {
        $newArray[] = trim($value);
    }
}
$newString = implode(', ', $newArray);
The following code works on the index of the fields you want to remove.
It works the same way as the answer written by @geoidesic, but uses the index of the field in the string instead of a string match.
$user_input = "'dt', 'time_hour', 'loc', 'protocol_category', 'service_identifier', 'mcc', 'imsi', 'service_dl_bytes', 'service_ul_ bytes'";
$input_array = explode(',', $user_input);
// 4 = 'service_identifier'
// 8 = 'service_ul_ bytes'
$remove_input = array(4, 8);
foreach($remove_input as $array_index) {
unset($input_array[$array_index]);
}
$parsed_input = implode(',', $input_array);
var_dump($parsed_input);
// Output: string(80) "'dt', 'time_hour', 'loc', 'protocol_category', 'mcc', 'imsi', 'service_dl_bytes'"
You could try this:
$var = preg_replace(["/'[\s,]*,[\s,]*'/", "/^[\s,]*|[\s,]*$/"], ["', '", ""], $var);
It can help!

Php how to get the first word without space that occurred before specific keyword?

I have a complete MySQL query in a string. Inside the WHERE clause of that query string I have defined some variable names that I should replace with the values passed from the interface, but if the user has not filled in a filter, I should replace the whole relevant sub-statement inside the query with 1.
Suppose I have an HTML select element in a form named provinces_province_code, and the same string inside the query.
I have developed the code blocks below to remove the extra spaces from the query and get everything after the WHERE. Here $input_key is provinces_province_code, and $input_val is the value of the select element (0, 01, 02, ..., etc.). If $input_val is 0, I should replace users.province_code = provinces_province_code with 1.
if (preg_match('/= ' . $input_key . ' /', trim(preg_replace('/\s+/', ' ', $main_query))) or preg_match('/=' . $input_key . '/', trim(preg_replace('/\s+/', ' ', $main_query))) or preg_match('/= \'' . $input_key . '\'/', trim(preg_replace('/\s+/', ' ', $main_query))) or preg_match('/=\'' . $input_key . '\'/', trim(preg_replace('/\s+/', ' ', $main_query))) or preg_match('/= "' . $input_key . '"/', trim(preg_replace('/\s+/', ' ', $main_query))) or preg_match('/="' . $input_key . '"/', trim(preg_replace('/\s+/', ' ', $main_query)))) {
if($input_val == 0){
$main_query = preg_replace('/ where /', ' WHERE ', $main_query);
$main_query = preg_replace('/ Where /', ' WHERE ', $main_query);
$separatedByWhere = explode("WHERE", $main_query);
$last_where_clause = end($separatedByWhere);
Here I have removed all backticks and single/double quotes before and after provinces_province_code.
$string = preg_replace('/\`'.$prm_val.'\`/', $prm_val, $last_where_clause);
$string = preg_replace('/\''.$prm_val.'\'/', $prm_val, $string);
$string = preg_replace('/\"'.$prm_val.'\"/', $prm_val, $string);
$string = preg_replace('/\"'.$prm_key.'\"/', $prm_val, $string);
$string = preg_replace('/\"'.$prm_key.'\"/', $prm_val, $string);
$string = preg_replace('/\"'.$prm_key.'\"/', $prm_val, $string);
Now I want to define the start and end of the string and call another function to get all substrings that occur between them. In my case the end of the string is always the name of the HTML element that I use as a filter (provinces_province_code), but the start of the string is unknown. What I need is the first word that occurs before = $end_string, to use as $start_string.
$string is:
users.district_code IS NOT NULL AND users.district_code <> '' AND users.province_code = srs.province_code AND users.province_code = provinces_province_code
GROUP BY users.district_code
$start_string = '';
$end_string = $prm_val;
$statment2beReplaced = self::get_string_between($string, $start_string, $end_string);
$statment2beReplaced = preg_replace('/ and /', ' AND ',$statment2beReplaced);
if(preg_match('/ AND /',$statment2beReplaced)){
$searchString = $statment2beReplaced.''.$prm_val;
$statment2beReplaced = preg_replace('/'.$prm_key.' =/',1,$statment2beReplaced);
$string = preg_replace('/'.$searchString.'/',$statment2beReplaced,$string);
}
else {
$string = preg_replace('/' . $prm_key . $statment2beReplaced . $end_string . '/', 1, $string);
}
$separatedByWhere[key($separatedByWhere)] = $string;
$main_query = implode('WHERE', $separatedByWhere);
}
Now I have checked whether provinces_province_code exists inside the above string or not.
1- If it exists, I want to get the first word that occurs before = provinces_province_code, which in the above string is users.province_code.
The question now is: how can I get that?
Thanks for any help.
Sorry, I am not very good with preg_match yet, so I would look at exploding the string into an array, something like:
<?php
$string = "users.district_code IS NOT NULL AND users.district_code <> '' AND users.province_code = srs.province_code AND users.province_code = provinces_province_code GROUP BY users.district_code";
$subs = explode(" ", $string);
foreach ($subs as $item) {
echo "<li>$item</li>";
}
?>
Then, depending on the delimiter you use and how your string is formatted, you can search the array and find the preceding word. In the above example the preceding word is separated from the keyword by space = space, so it will always be two elements before it (the = counts as a word as well). Or you could strip the = and it will be the element directly before. This assumes each word is preceded and followed by a space.
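For comparison, a regular expression can capture the preceding word directly. This is only a sketch under the assumption that the comparison always has the form column = placeholder, with optional whitespace and optional quotes or backticks around the placeholder. The function name column_before_placeholder is hypothetical.

```php
<?php
// Hypothetical helper: returns the token to the left of "= <placeholder>",
// or null when the placeholder is not used in such a comparison.
function column_before_placeholder(string $whereClause, string $placeholder): ?string
{
    // (\S+)  the word before the equals sign
    // \s*=\s* the equals sign with optional whitespace
    // ['"`]? an optional opening quote or backtick
    $pattern = '/(\S+)\s*=\s*[\'"`]?' . preg_quote($placeholder, '/') . '\b/';

    return preg_match($pattern, $whereClause, $m) ? $m[1] : null;
}

$string = "users.district_code IS NOT NULL AND users.district_code <> '' "
        . "AND users.province_code = srs.province_code "
        . "AND users.province_code = provinces_province_code "
        . "GROUP BY users.district_code";

$startString = column_before_placeholder($string, 'provinces_province_code');
```

The placeholder is passed through preg_quote so that any regex metacharacters in a field name are treated literally.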

Extract a sentence from a text based on another sentence that lacks some words

So, for my university graduation thesis I chose to build a web app that extracts the main idea from an article (a summarization app). It's built in PHP.
But I have reached a situation to which I see no possible solutions, maybe you guys can give me an idea or a solution to the problem.
So basically the app relies on extractive algorithms, what I do:
First, I "sanitize" the text: I remove all stop words, stem the words, and remove any abbreviations or initials containing a '.' that could prevent the text from being split into sentences correctly.
After that I break the text into sentences by exploding the text by . token and I get all sentences in an array.
Now comes the process in which I "give" the sentences a rating, basically this is how I spot the most relevant sentence in the article, the one that has the highest rating is usually the one that contains the article's main idea.
But my problem starts now, the sentences that I have rated are the ones on which I applied all the 'sanitization' and are not in their original form. I want to take the highest rated sentence and based on that I want to extract the original sentence from the text to which this rated sentence matches. I have tried matching it with regex but it doesn't always work. I need a 100% working method that extracts the original sentence from the article based on the highest rated sentence.
I have no idea how to achieve this, since the rated sentence misses words from it.
I hope you understand my point. Thank you.
EDIT:
This is the function that I now use to match the original sentence in the article, but it doesn't always work:
private function get_original_sentence($s, $t)
{
$s = preg_replace("/[^A-Za-z0-9 ]/", '', $s);
$s = trim($s);
$arr = explode(" ",$s);
$f_word = $arr[0];
$l_word = $arr[count($arr)-1];
preg_match('~(?<=\.)([a-zA-Z ]*)'.$f_word.'(.*?)'.$l_word.'([a-zA-Z ]*)(?=\.)~i', $t, $matches);
if(empty($string))
{
preg_match('~(?<=\.)([^\.]*)'.$f_word.'(.*?)'.$l_word.'([^\.]*)(?=\.)~i', $t, $matches);
}
return $matches[0] ? $matches[0] : false;
}
The $s parameter is the rated sentence after the summarization and $t is the full original article.
EDIT 2: The abbreviation removal function, which practically sanitizes the whole text, not just abbreviations.
static private function _remove_abbrev($subject)
{
$domains = '\.ro|\.com|\.edu|\.org|\.gov';
foreach(self::$abrv as $abrv)
{
$not.= strtolower(str_replace('.', '\.', $abrv)).'|';
$not.= strtolower(trim(str_replace('.', '\.', $arbv))).'|';
}
$not = substr($not, 0, -1);
//$subject = preg_replace('~(\".*?\")~u', '', strtolower($subject));//replaces " " from text.
$subject = preg_replace('~(?<=\.|^)(?![^\.]{60,})[^\.&]*\&[^\.]*\.?~u', '', strtolower($subject));
$subject = preg_replace('~\b\s?[\dA-za-z\-\.]+('.$domains.')~u', '', strtolower($subject));
$subject = preg_replace('~\s*\(.*?\)\s*~u', '', strtolower($subject));
$subject = preg_replace('~\b('.$not.')~u', '', strtolower($subject));
$subject = preg_replace('~(?<=[^a-z])[A-Za-z]{1,5}+\.[\s\,]*(?=[a-z]|[0-9])~u', '', strtolower($subject));
$subject = preg_replace('~(?<=[\s\,\.\:])([A-Za-z]*(\.)){2,}+(.)(?=.*)~u', '', strtolower($subject));
$subject = preg_replace('~(\d)+\.(\d)*(\s)~u', '', strtolower($subject));
return $subject;
}
This is the abbreviation array collection:
static public $abrv = array(
' alin.', ' art.', ' A.N.P', ' A.V.A.B', ' A.V.A.S.', ' B.N.R', ' c.', ' C.A.S', ' C.civ.', ' C.com.', ' C.fam.', ' C.pen.', ' C.pr.civ.', ' C.pr.pen', ' C.N.C.D', ' C.N.V.M', ' C.N.S.A.S', ' C.S.M', ' C.S.J', ' D.G.F.P', ' D.G.P.M.B', ' D.N.A', ' D.S.V', 'Ed.', ' etc.', ' H.G.', ' I.G.P.F', ' I.G.P.R', ' I.N.M.L.', ' I.P.J', ' I.C.C.J', ' lit.', ' M.Ap.N.', ' art.', ' M.J.', ' M.Of.', ' nr.', ' O.G.', ' O.U.G.', ' p.', ' P.N.A.', ' par.', ' pct.', ' R.A.A.P.P.S.', ' subl. ns.', ' S.C.', ' S.A.', ' S.P.P.', ' S.R.I.', ' S.R.L.', 'U.N.B.R.', ' urm.', ' str.', ' sec.', ' pag.', ' a.c.', ' dv.', ' dvs.', ' prof.', ' conf.', ' dr.', ' drd.', ' mrd.', ' s.a.m.d'
);
How about this approach:
You first extract all the matches with preg_match_all into a numerically indexed array, $substitutions.
Then you replace them with a unique marker, using the fifth parameter of preg_replace, $count, whose value points into the $substitutions array.
A rough code sketch:
$count = 0;
$substitutions = array();
foreach ($patterns as $pattern) {
$matches = array();
preg_match_all($pattern, $subject, $matches);
// Argument order is pattern, replacement, subject; the result must be assigned
$subject = preg_replace($pattern, '__'.$count.'__', $subject, -1, $count);
foreach ($matches[???] as $match) {
$substitutions[] = $match;
}
}
I'm not sure if I messed up the syntax for passing $count by reference (e.g. &$count in the documentation); at the call site it is written as plain $count.
I think the crux of this approach is to extract the right value from the $matches array. There are some options, how the matches are extracted. Maybe another approach could be not to use $count from preg_replace but from the according sub-array of $matches
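A runnable variation on the marker idea, swapping preg_replace plus $count for preg_replace_callback so that each match is stored at the moment it is replaced. This is a sketch, not the answerer's exact method: the function names, the __N__ marker format, and the sample pattern are all assumptions, and it presumes the markers cannot be matched by a later pattern.

```php
<?php
// Replace every match of every pattern with a numbered marker and
// remember the original text in $substitutions.
function mask_matches(array $patterns, string $subject, array &$substitutions): string
{
    foreach ($patterns as $pattern) {
        $subject = preg_replace_callback($pattern, function ($m) use (&$substitutions) {
            $substitutions[] = $m[0];                        // full match
            return '__' . (count($substitutions) - 1) . '__'; // marker with its index
        }, $subject);
    }
    return $subject;
}

// Put the stored originals back in place of their markers.
function unmask_matches(string $subject, array $substitutions): string
{
    return preg_replace_callback('/__(\d+)__/', function ($m) use ($substitutions) {
        return $substitutions[(int) $m[1]] ?? $m[0];
    }, $subject);
}

$stored = array();
$text = "The S.R.L. was founded. It grew.";
$masked = mask_matches(array('/\b(?:[A-Z]\.){2,}/'), $text, $stored);
$restored = unmask_matches($masked, $stored);
```

With the abbreviation masked, splitting on "." no longer cuts the first sentence in the wrong place, and the markers let you restore the original wording afterwards.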
The _remove_abbrev function doesn't seem to work very well. It removes words like "art" at the end of sentences but doesn't remove abbreviations like "C.A.S." (because it has already removed "c."). It also has at least one typo ($arbv) and doesn't define $not before concatenating to it.
Nevertheless, how about instead of removing the abbreviations, URLs, and so on, you replace them with space characters? That way, when you split the text into sentences, they would still have the same length as in the original text so you could store the position the sentences start and end at. If necessary, you could convert multiple spaces to a single space at this point but you'd still know where they came from in the original text.
You just need a callback function to achieve this:
$f = function($m){ return str_repeat(" ", strlen($m[0])); };
$subject = preg_replace_callback('~(?<=\.|^)(?![^\.]{60,})[^\.&]*\&[^\.]*\.?~u', $f, strtolower($subject));
$subject = preg_replace_callback('~\b\s?[\dA-za-z\-\.]+('.$domains.')~u', $f, $subject);
$subject = preg_replace_callback('~\s*\(.*?\)\s*~u', $f, $subject);
$subject = preg_replace_callback('~\b('.$not.')~u', $f, $subject);
$subject = preg_replace_callback('~(?<=[^a-z])[A-Za-z]{1,5}+\.[\s\,]*(?=[a-z]|[0-9])~u', $f, $subject);
$subject = preg_replace_callback('~(?<=[\s\,\.\:])([A-Za-z]*(\.)){2,}+(.)(?=.*)~u', $f, $subject);
$subject = preg_replace_callback('~(\d)+\.(\d)*(\s)~u', $f, $subject);
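To illustrate how the preserved length pays off, here is a small sketch that blanks out abbreviations, splits the masked text into sentences, and then reads each sentence back from the original text by offset. The helper name blank_out, the simplified abbreviation pattern, and the sample text are assumptions, and the sentence splitter is deliberately naive.

```php
<?php
// Replace every match with spaces of the same byte length,
// so that offsets in the masked text stay valid for the original.
function blank_out(string $pattern, string $text): string
{
    return preg_replace_callback($pattern, function ($m) {
        return str_repeat(' ', strlen($m[0]));
    }, $text);
}

$original = "Founded by X S.R.L. in 2001. It grew fast.";

// Simplified stand-in for the abbreviation patterns: two or more "letter." groups
$masked = blank_out('~\b(?:[A-Za-z]\.){2,}~', $original);

// Split the masked text on "." and record where each sentence starts
preg_match_all('~[^.]+\.?~', $masked, $found, PREG_OFFSET_CAPTURE);

$sentences = array();
foreach ($found[0] as $hit) {
    list($text, $offset) = $hit;
    // Same offset and length, but read from the untouched original
    $sentences[] = trim(substr($original, $offset, strlen($text)));
}
```

Rating can then run on the masked (or further sanitized) sentences while the array index ties each rating back to an original sentence.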

Trim comma and whitespacing from number

I am using this function to convert square meters to square feet.
$sinput = rtrim(get_field('fl_area'), ", \t\n");
if(trim($sinput) == "0"){echo ' ' ;} else {$soutput = metersToSquareFeet($sinput); echo $soutput . ' sq. m (' . number_format($sinput ) . ' sq. f)' ;}
function metersToSquareFeet($meters, $echo = true)
{
$m = $meters;
$valInFeet = $m*10.7639;
$valFeet = (int)$valInFeet;
if($echo == true)
{
echo $valFeet;
} else {
return $valFeet;
}
}
Problem I have is with line:
rtrim(get_field('fl_area'), ", \t\n");
The user enters the number in the format 3,246 and I want to convert this to 3246 for my function to work.
Of course I could also modify the function somehow and not use rtrim in the first place
rtrim only removes characters from the end of the string, not the middle. Use preg_replace:
$sinput = preg_replace('/[,\s]+/', '', get_field('fl_area'));
You have a few options but probably your easiest one is to remove all non-numeric characters.
Not sure what your get_field does but assuming it just gets the field from the Input you could use regex like so.
$sinput = preg_replace( '/[^0-9]/', '', get_field('fl_area') );
Also see: PHP regular expression - filter number only
You can use str_replace() to remove all occurrences of commas and spaces from the string:
$m = str_replace(array(',',' '), '', $m);
Or even strtr():
$m = strtr($m, array(',' => '', ' ' => ''));
This is likely to be faster than regular expressions. However, if the number of function calls is small, the difference won't be noticeable.
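Putting the pieces together, a minimal sketch of cleaning the input and converting it, assuming the comma is a thousands separator (not a decimal comma) and the input is a whole number of square meters. The function names parse_area and square_meters_to_square_feet are made up for this example.

```php
<?php
// Hypothetical helper: "3,246 " -> 3246
function parse_area(string $raw): int
{
    // Drop thousands separators and spaces, then cast to int
    return (int) str_replace(array(',', ' '), '', trim($raw));
}

// Same factor and integer truncation as in the question's function,
// but returning the value instead of echoing it
function square_meters_to_square_feet(int $squareMeters): int
{
    return (int) ($squareMeters * 10.7639);
}

$area = parse_area("3,246 ");
$feet = square_meters_to_square_feet($area);
```

Returning the value rather than echoing it inside the function keeps the formatting decision (for example number_format) at the call site.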
Try this:
$sinput = str_replace(array(',',' '), '', get_field('fl_area'));

PHP regex with preg_replace and the "." character

I have scientific names in the following format:
S. daemon
A. cacatuoides
B. splendens
Etc, etc.
I'm having difficulty with the "." character.
This code works for full species names (i.e. Satanoperca daemon):
foreach ($species as $term) {
$term_norm = preg_replace('/\s+/', ' ', strtoupper(trim($term)));
$pattern[] = preg_replace('/ /', '\\s+', preg_quote($term_norm));
$urls[$term_norm] = '/dev/species/' . str_replace(" ", "-", rawurlencode($term));
$rels[$term_norm] = $urls[$term_norm] . '?preview=true';
$title[$term_norm] = $term;
But I can't get it to work for the aforementioned examples:
$genus_species = explode(" ", $term);
$genus = $genus_species[0];
$species = $genus_species[1];
$initial = substr($genus, 0, 1);
$shortened = $initial . '. ' . $species;
$term_norm = preg_replace('/\s+/', ' ', strtoupper(trim($shortened)));
$pattern[] = preg_replace('/ /', '\\s+', preg_quote($term_norm));
$urls[$term_norm] = '/dev/species/' . rawurlencode($term);
$rels[$term_norm] = $urls[$term_norm] . '?preview=true';
$title[$term_norm] = $term;
If I use this code, nearly all of my source, i.e. every word/character, ends up wrapped in a link. If I comment the code out, the full-name linking works perfectly and no such problem occurs.
A little more info...
$pattern is echoing out as: /\b(SATANOPERCA\s+DAEMON|S(\.)\s+DAEMON)\b/i
The input is a list of species names, such as the ones I previously mentioned. The source is a species profile, which often refers to other species.
What I'd like the code to do is replace any mention of these species names with a link to that species profile.
Thanks in advance,
While looking into your issue I ran over the way you initially build the regular expression. I thought, why not simplify it? Here is what I've come up with:
foreach ($terms as $term) {
list($genus, $species) = explode(' ', $term);
$pattern = sprintf('~\b((?:%s[.]|%s) %s)~i', $genus[0], $genus, $species);
Which gives the following
~\b((?:S[.]|Satanoperca) daemon)~i
I'm making use of list here in combination with explode, which is often less code and therefore more readable.
To build the regular expression I use sprintf, which often makes it easier to formulate complex strings that need substitutions. It allows the use of a mask.
Finally, $genus[0] is the first character of $genus. You might need to replace it if you're using a multibyte character set.
The pattern itself is streamlined as well:
~\b((?:S[.]|Satanoperca) daemon)~i
The first subgroup is non-capturing (?:) and offers both variants: the short form with . or the long genus. It is followed by the space and finally the species. I use [.] to express the dot here, but \. works as well:
~\b((?:S\.|Satanoperca) daemon)~i
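As a quick check of the pattern, here is a small sketch that builds it from a term and tries it against a few inputs. The wrapper function species_pattern is hypothetical, and the preg_quote calls are an addition not present in the answer, meant to guard against regex metacharacters in a name.

```php
<?php
// Hypothetical helper: builds the "S. daemon | Satanoperca daemon" pattern.
function species_pattern(string $term): string
{
    list($genus, $species) = explode(' ', $term, 2);

    return sprintf(
        '~\b((?:%s\.|%s) %s)~i',
        preg_quote($genus[0], '~'), // initial, e.g. "S"
        preg_quote($genus, '~'),    // full genus
        preg_quote($species, '~')   // species epithet
    );
}

$pattern = species_pattern('Satanoperca daemon');

$matchesShort = preg_match($pattern, 'S. daemon');          // abbreviated form
$matchesLong  = preg_match($pattern, 'satanoperca daemon'); // full form, any case
$matchesOther = preg_match($pattern, 'Sx daemon');          // neither form
```

Because the dot is escaped, "Sx daemon" is not treated as an abbreviation of the genus.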
What's left is the replacement procedure. I opted for a callback function here. As the link only needs to be built once per term, I add that at the top of the foreach. Again I'm using sprintf to format it:
foreach ($terms as $term) {
$termSlug = strtolower(strtr($term, array(' ' => '-')));
$termHref = sprintf('/dev/species/%s', rawurlencode($termSlug));
list($genus, $species) = explode(' ', $term);
$pattern = sprintf('~\b((?:%s\.|%s) %s)~i', $genus[0], $genus, $species);
What's left is the callback function that replaces every match with the link:
$string = preg_replace_callback($pattern, function($match) use($term, $termHref)
{
    return sprintf('<a href="%s" title="%s">%s</a>', $termHref
        , htmlspecialchars($term), htmlspecialchars($match[1]));
}, $string);
And that's it. The full example:
$string = <<<STR
S. daemon
Satanoperca daemon
A. cacatuoides
B. splendens
STR;
$terms = array(
    'Satanoperca daemon',
);
foreach ($terms as $term) {
    $termSlug = strtolower(strtr($term, array(' ' => '-')));
    $termHref = sprintf('/dev/species/%s', rawurlencode($termSlug));
    list($genus, $species) = explode(' ', $term);
    $pattern = sprintf('~\b((?:%s\.|%s) %s)~i', $genus[0], $genus, $species);
    echo $pattern, "\n";
    $string = preg_replace_callback($pattern, function($match) use($term, $termHref)
    {
        return sprintf('<a href="%s" title="%s">%s</a>', $termHref
            , htmlspecialchars($term), htmlspecialchars($match[1]));
    }, $string);
}
echo $string;
And its output:
~\b((?:S\.|Satanoperca) daemon)~i
<a href="/dev/species/satanoperca-daemon" title="Satanoperca daemon">S. daemon</a>
<a href="/dev/species/satanoperca-daemon" title="Satanoperca daemon">Satanoperca daemon</a>
A. cacatuoides
B. splendens
I hope this is helpful, even though it's completely new code everywhere.
Validate Terms:
// validate terms
$valid = '/^\w+ \w+$/';
foreach ($terms as $index => $term) {
if ($result = preg_match($valid, $term))
continue;
printf("Invalid Term: (%d) %s\n", $index, $term);
}
Do you also want to strip the . like this?
$term_norm = preg_replace('/[\s\.]+/', ' ', strtoupper(trim($shortened)));
