Regular expression in PHP deletes dots, commas, etc.

I have this code:
$allergens = ['mleko', 'masło']; // words that should be highlighted
$words = explode(' ', $value); // create an array of words
foreach($words as $word) { //iterate through words
$word = preg_replace('/[^\w]/uis', '', $word);
if (in_array(mb_strtolower($word), $allergens)) {
$return .= "<b>" . $word . "</b> ";
} else {
$return .= $word . " ";
}
}
The above code works fine, but it also deletes characters such as commas and periods.
How can I fix it? :)

The problem lies in the approach you have taken, and not only in the line
$word = preg_replace('/[^\w]/uis', '', $word);
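which should be extended with the characters of the Polish alphabet (without the u modifier, \w covers only [a-zA-Z0-9_]; with u, PHP's PCRE also treats Unicode letters as word characters, so the explicit list mainly acts as a safeguard; remember to list lowercase and uppercase characters separately), as in this line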
$word = preg_replace('/[^\wąćęłńóśźżĄĆĘŁŃÓŚŹŻ]/uis', '', $word);
Moreover, I believe that the above line is used incorrectly. In my opinion, you should save the result of this operation in another variable as below
$rawWord = preg_replace('/[^\wąćęłńóśźżĄĆĘŁŃÓŚŹŻ]/uis', '', $word);
Thanks to this, you have access to both the cleaned and the original value, which you can use like this
if (in_array(mb_strtolower($rawWord), $allergens)) {
$return .= str_replace($rawWord, "<b>{$rawWord}</b> ", $word);
} else {
$return .= $word;
}
With this approach, however, you will still lose some characters, including the spaces that explode removed earlier. Instead of concatenating a string, I suggest building an array and joining it with spaces at the end. Complete code below.
$allergens = ['jogurt', 'jaja', 'żytni', "jogurt", "banan"];
$value = 'Chleb żytni, masło z mleka, jogurt naturalny z mleka, jaja, pieczeń rzymska z kaszą gryczaną.';
$returns = [];
$words = explode(' ', $value); // create an array of words
foreach($words as $word) { //iterate through words
$rawWord = preg_replace('/[^\wąćęłńóśźżĄĆĘŁŃÓŚŹŻ]/uis', '', $word);
if (in_array(mb_strtolower($rawWord), $allergens)) {
$returns[] = str_replace($rawWord, "<b>{$rawWord}</b>", $word);
} else {
$returns[] = $word;
}
}
$return = implode(' ', $returns);
Look at this line
$returns[] = str_replace($rawWord, "<b>{$rawWord}</b>", $word);
It replaces the cleaned word inside the original token with its bold version, so any characters attached to the word (such as commas) are kept.
At the end, the $return variable will contain something like this
Chleb <b>żytni</b>, masło z mleka, <b>jogurt</b> naturalny z mleka, <b>jaja</b>, pieczeń rzymska z kaszą gryczaną.
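The same result can also be reached without splitting on spaces at all. Below is a minimal sketch, assuming the mbstring extension is available; the helper name highlightAllergens is hypothetical and not part of the answer above. preg_replace_callback visits each run of letters and wraps it in <b> when its lowercase form is in the list, so the commas, periods and spaces between words are never touched.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the original answer).
function highlightAllergens(string $text, array $allergens): string
{
    // \p{L}+ matches a run of Unicode letters; the u modifier enables UTF-8 mode.
    return preg_replace_callback('/\p{L}+/u', function (array $m) use ($allergens): string {
        $isAllergen = in_array(mb_strtolower($m[0], 'UTF-8'), $allergens, true);
        return $isAllergen ? "<b>{$m[0]}</b>" : $m[0];
    }, $text);
}

$allergens = ['jogurt', 'jaja', 'żytni', 'banan'];
$value = 'Chleb żytni, masło z mleka, jogurt naturalny z mleka, jaja, pieczeń rzymska z kaszą gryczaną.';
$highlighted = highlightAllergens($value, $allergens);
echo $highlighted, "\n";
```

Because only the matched letters are replaced, there is no need to keep a separate cleaned copy of each word.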

Related

Check if a word occurs in a string but is not the first or last word

I am trying to check whether a word occurs in a string but is not the first or last word. If so, I want to replace the spaces before and after the word with underscores.
Input:
$str = 'This is a cool area';
Output:
$str = 'This is a_cool_area';
I want to check that the word 'cool' is inside the string but is not the first or last word. If it is, remove the surrounding spaces and replace them with '_'.

You can use preg_replace to do this job, using this regex:
/(?<=\w)\s+(' . $word . ')\s+(?=\w)/i
which looks for the word, surrounded by at least one word character on either side (to prevent matching at the beginning or ending of the sentence). Usage in PHP:
$str = 'This is a cool area';
$word = 'cool';
$str = preg_replace('/(?<=\w)\s+(' . $word . ')\s+(?=\w)/i', '_$1_', $str);
echo $str . "\n";
$str = ' Cool areas are cool ';
$str = preg_replace('/(?<=\w)\s+(' . $word . ')\s+(?=\w)/i', '_$1_', $str);
echo $str . "\n";
Output:
This is a_cool_area
Cool areas are cool
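If $word can contain regex metacharacters (a dot, a plus sign and so on), it is safer to escape it before building the pattern. The following is a minimal sketch of that idea; the wrapper name joinInnerWord is an assumption made for illustration.

```php
<?php
// Hypothetical wrapper around the pattern above (the function name is an assumption).
function joinInnerWord(string $str, string $word): string
{
    // preg_quote escapes regex metacharacters in $word; '/' is the delimiter used here.
    $pattern = '/(?<=\w)\s+(' . preg_quote($word, '/') . ')\s+(?=\w)/i';
    return preg_replace($pattern, '_$1_', $str);
}

echo joinInnerWord('This is a cool area', 'cool'), "\n";
echo joinInnerWord('cool areas are cool', 'cool'), "\n";
```

The first call joins the word to its neighbours, while the second leaves the string alone because the word only appears at the start and the end.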

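function checkWord($str, $word)
{
    $arr = explode(" ", $str);
    // Search only the inner words, so the first and last word never match.
    $key = array_search($word, array_slice($arr, 1, -1));
    if ($key === false) {
        return $str;
    }
    // $key is relative to the inner slice, so in $arr it points at the word before the match.
    $joined = implode('_', array_slice($arr, $key, 3));
    // Put the joined triple back so the rest of the sentence is kept.
    array_splice($arr, $key, 3, [$joined]);
    return implode(' ', $arr);
}
echo checkWord('This is a cool area', 'cool'); // This is a_cool_area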

Matching string in 2 different patterns in PHP. The code must return TRUE

Objective: strings with ' should match the string without it.
Example:
$first_string = "alex ern o'brian";
$second_string = "alex-ern o brian";
$pattern = array("/(-|\.| )/", "/(')/");
$replace = array(' ', '(\s|)');
$first_string = preg_replace($pattern, $replace, $first_string);
$second_string = preg_replace($pattern, $replace, $second_string);
$first_string_split = preg_split("/(-|\.| )/", $first_string);
$first_string_split[] = $first_string;
$second_string_split = preg_split("/(-|\.| )/", $second_string);
$second_string_split[] = $second_string;
$first_string = array_slice($first_string_split, -1)[0];
$second_string = array_slice($second_string_split, -1)[0];
if(in_array($first_string, $second_string_split) || in_array($second_string, $first_string_split))
{
echo 'true';
} else {
echo 'false';
}

I think this is what you are looking for.
Solution 1:
Regex: (\s|) matches either a whitespace character or nothing.
<?php
ini_set('display_errors', 1);
$string = "o'brian";
$string=str_replace("'", "(\s|)",$string);
$list = array("o'neal", "o brian", "obrian");
$result=array();
foreach($list as $value)
{
if(preg_match("/$string/", $value))
{
$result[]=$value;
}
}
print_r($result);
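Solution 2:
Regex: [a-z]+ matches a run of lowercase letters from a to z.
$string = "o'brian";
// Assumed step (missing from the snippet): collect the letter runs, giving ['o', 'brian'].
preg_match_all("/[a-z]+/", $string, $matches);
$string1="o brian";
$string2="obrian";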
if(preg_match("/".implode(" ", $matches[0])."/", $string1))
{
echo "matched";
}
if( preg_match("/".implode("", $matches[0])."/", $string2))
{
echo "matched";
}

I'm not sure if I got your question right, but this should do it:
(?<=\w)'(?=\w)
It matches every ' character that is both preceded and followed by a word character. Without the u modifier, the word character class \w is equal to [a-zA-Z0-9_].
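One way to apply this pattern to the question is to normalize both names and then compare them. This is only a sketch: the helper name normalizeName and the rule that hyphens, dots and inner apostrophes all count as a single space are assumptions, not the asker's stated requirements.

```php
<?php
// Hypothetical helper: the normalization rules below are assumptions.
function normalizeName(string $name): string
{
    // Turn an apostrophe that sits between two word characters into a space.
    $name = preg_replace("/(?<=\w)'(?=\w)/", ' ', $name);
    // Treat hyphens, dots and runs of whitespace as a single space.
    $name = preg_replace('/[-.\s]+/', ' ', $name);
    return strtolower(trim($name));
}

$first_string  = "alex ern o'brian";
$second_string = "alex-ern o brian";
echo normalizeName($first_string) === normalizeName($second_string) ? 'true' : 'false';
```

Under these rules "o'brian" equals "o brian" but not "obrian"; if the unspaced form should match as well, removing the separators instead of replacing them with a space would be one option.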

ucwords function with exceptions

I need some help. I have this code that uppercases the first character of each word in a string, with exceptions. I need the function to ignore an exception if it is at the beginning of the string:
function ucwordss($str, $exceptions) {
$out = "";
foreach (explode(" ", $str) as $word) {
$out .= (!in_array($word, $exceptions)) ? strtoupper($word[0]) . substr($word, 1) . " " : $word . " "; // $word[0]: the $word{0} syntax was removed in PHP 8
}
return rtrim($out);
}
$string = "my cat is going to the vet";
$ignore = array("is", "to", "the");
echo ucwordss($string, $ignore);
// Prints: My Cat is Going to the Vet
This is what I'm doing:
$string = "my cat is going to the vet";
$ignore = array("my", "is", "to", "the");
echo ucwordss($string, $ignore);
// Prints: my Cat is Going to the Vet
// NEED TO PRINT: My Cat is Going to the Vet

- return rtrim($out);
+ return ucfirst(rtrim($out));

Something like this:
function ucwordss($str, $exceptions) {
$out = "";
foreach (explode(" ", $str) as $key => $word) {
$out .= (!in_array($word, $exceptions) || $key == 0) ? strtoupper($word[0]) . substr($word, 1) . " " : $word . " "; // $key == 0 forces the first word to be capitalized
}
return rtrim($out);
}
Or, even easier, uppercase the first letter of the result just before the return in your function.

You can do this really cheaply by always uppercasing the first word:
function ucword($word){
    // Return the word only; the caller appends the separating space.
    return strtoupper($word[0]) . substr($word, 1);
}
function ucwordss($str, $exceptions) {
    $out = "";
    $words = explode(" ", $str);
    // The first word is always capitalized, even if it is in the exception list.
    $words[0] = ucword($words[0]);
    foreach ($words as $word) {
        $out .= ((!in_array($word, $exceptions)) ? ucword($word) : $word) . " ";
    }
    return rtrim($out);
}

How about uppercasing the first letter of the string before calling the function, so that it is capitalized regardless of the exception list?
$string = "my cat is going to the vet";
$string = ucfirst($string);
$ignore = array("is", "to", "the");
echo ucwordss($string, $ignore);
This way the first letter of the string will always be uppercase.

preg_replace_callback() will allow you to express your conditional replacement logic in a loopless and dynamic fashion. Consider this approach that will suitably modify your sample data:
Code:
$string = "my cat is going to the vet";
$ignore = array("my", "is", "to", "the");
$pattern = "~^[a-z]+|\b(?|" . implode("|", $ignore) . ")\b(*SKIP)(*FAIL)|[a-z]+~";
echo "$pattern\n---\n";
echo preg_replace_callback($pattern, function($m) {return ucfirst($m[0]);}, $string);
Output:
~^[a-z]+|\b(?|my|is|to|the)\b(*SKIP)(*FAIL)|[a-z]+~
---
My Cat is Going to the Vet
You see, the three piped portions of the pattern (in order) make these demands:
If the start of the string is a word, capitalize the first letter.
If a "whole word" (leveraging the \b word boundary metacharacter) is found in the "blacklist", disqualify the match and keep traversing the input string.
Else capitalize the first letter of every word.
Now, if you want to get particular about contractions and hyphenated words, then you only need to add ' and - to the [a-z] character classes like this: [a-z'-]
If anyone has fringe cases that break my snippet (like "words" with special characters that need to be escaped by preg_quote()), you can offer them and I can offer a patch, but my original solution adequately serves the posted question.
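For reference, here is a sketch of what the hardened variant mentioned above could look like. The function name ucwordsExcept is hypothetical; the sketch escapes every exception with preg_quote(), uses the [a-z'-] character class, and leaves out the (*SKIP)(*FAIL) branch when the exception list is empty.

```php
<?php
// Hypothetical variant of the snippet above; the function name is an assumption.
function ucwordsExcept(string $str, array $exceptions): string
{
    // Escape each exception so characters such as . or + are matched literally.
    $quoted = array_map(function (string $w): string {
        return preg_quote($w, '~');
    }, $exceptions);

    // Only add the "skip this word" branch when there is something to skip.
    $skip = $quoted ? '\b(?:' . implode('|', $quoted) . ')\b(*SKIP)(*FAIL)|' : '';
    $pattern = "~^[a-z'-]+|" . $skip . "[a-z'-]+~";

    return preg_replace_callback($pattern, function (array $m): string {
        return ucfirst($m[0]);
    }, $str);
}

echo ucwordsExcept("my cat is going to the vet", ["my", "is", "to", "the"]), "\n";
```

As in the original pattern, the first alternative always capitalizes the opening word, so an exception at the start of the string is still uppercased.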

Before deploying php files, I want to optimize/encrypt them, how?

When my scripts are done, I want to optimize them: make them smaller and harder to understand if the files are ever stolen.
$c = file_get_contents('source.php');
$newStr = '';
$commentTokens = array(T_COMMENT);
if (defined('T_DOC_COMMENT'))
$commentTokens[] = T_DOC_COMMENT; // PHP 5
if (defined('T_ML_COMMENT'))
$commentTokens[] = T_ML_COMMENT; // PHP 4
$tokens = token_get_all($c);
foreach ($tokens as $token) {
if (is_array($token)) {
if (in_array($token[0], $commentTokens))
continue;
$token = $token[1];
}
$newStr .= $token;
}
$newStr = str_replace (chr(13), '', $newStr);
$newStr = str_replace (chr(10), '', $newStr);
$newStr = preg_replace('/\s+/', ' ', $newStr);
Now $newStr contains the "compressed" code. It is almost OK, but it removes too much whitespace. If there is whitespace in code like this:
if (true)
{
codeeee();
}
it converts to:
if (true)
{
codeeee();
}
and that's OK. But in the case of a string literal that contains several spaces, like this:
$a = '    var    ';
it produces:
$a = ' var ';
which is unwanted. How can I do this optimization correctly? Are there any ideas? I am also thinking of renaming class names, etc.

With help from this answer I was able to create this regex which trims all whitespace (including line breaks) down to single spaces, but preserves the whitespace between quotes (either ' or ")
preg_replace('/\G(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)*\K\s+/', ' ', $string);
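An alternative to a regex is to collapse whitespace at the token level, since the question already uses token_get_all(). The sketch below is a rough illustration under stated assumptions: the helper name minifySource is made up, and edge cases such as heredocs or inline HTML are not handled. PHP also ships php_strip_whitespace(), which returns a file's source with comments and whitespace stripped and may be the simpler choice.

```php
<?php
// Hypothetical helper (the name is an assumption): strip comments and
// collapse whitespace between tokens, leaving string literals untouched.
function minifySource(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token; // one-character token such as ";" or "{"
            continue;
        }
        [$id, $text] = $token;
        // Comments and whitespace both become a single space; a quoted string
        // is its own token, so the spaces inside it pass through unchanged.
        $isFiller = in_array($id, [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true);
        $out .= $isFiller ? ' ' : $text;
    }
    return $out;
}

$source = "<?php\n\$a = '   var   ';   // note\nif (true)\n{\n    codeeee();\n}\n";
$minified = minifySource($source);
echo $minified, "\n";
```

Replacing a comment with a space rather than nothing avoids accidentally gluing two neighbouring tokens together, at the cost of an occasional double space.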

Uppercase the first character of each word in a string except 'and', 'to', etc

How can I uppercase the first character of each word in a string, except for a few words that I don't want to transform, such as 'and', 'to', etc.?
For instance, I want ucwords('art and design') to output the string below:
'Art and Design'
Is it possible to do this the way strip_tags($text, '<p><a>') works, where we list what is allowed, so that 'and' is left alone in the string?
Or should I use something else? Please advise!
Thanks.

None of these are really UTF8 friendly, so here's one that works flawlessly (so far)
function titleCase($string, $delimiters = array(" ", "-", ".", "'", "O'", "Mc"), $exceptions = array("and", "to", "of", "das", "dos", "I", "II", "III", "IV", "V", "VI"))
{
/*
* Exceptions in lower case are words you don't want converted
* Exceptions all in upper case are any words you don't want converted to title case
* but should be converted to upper case, e.g.:
* king henry viii or king henry Viii should be King Henry VIII
*/
$string = mb_convert_case($string, MB_CASE_TITLE, "UTF-8");
foreach ($delimiters as $dlnr => $delimiter) {
$words = explode($delimiter, $string);
$newwords = array();
foreach ($words as $wordnr => $word) {
if (in_array(mb_strtoupper($word, "UTF-8"), $exceptions)) {
// check exceptions list for any words that should be in upper case
$word = mb_strtoupper($word, "UTF-8");
} elseif (in_array(mb_strtolower($word, "UTF-8"), $exceptions)) {
// check exceptions list for any words that should be in lower case
$word = mb_strtolower($word, "UTF-8");
} elseif (!in_array($word, $exceptions)) {
// convert to uppercase (non-utf8 only)
$word = ucfirst($word);
}
array_push($newwords, $word);
}
$string = join($delimiter, $newwords);
}//foreach
return $string;
}
Usage:
$s = 'SÃO JOÃO DOS SANTOS';
$v = titleCase($s); // 'São João dos Santos'

Since we all love regexps, here is an alternative that also works with punctuation (unlike the explode(" ",...) solution):
$newString = preg_replace_callback("/[a-zA-Z]+/",'ucfirst_some',$string);
function ucfirst_some($match)
{
$exclude = array('and','not');
if ( in_array(strtolower($match[0]),$exclude) ) return $match[0];
return ucfirst($match[0]);
}
Edit: added strtolower(), so that "Not" is also recognized as an excluded word and returned unchanged.
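The [a-zA-Z]+ pattern only sees ASCII letters. A Unicode-aware variant of the same idea might look like the sketch below, assuming the mbstring extension is available; the function name titleCaseExcept and its default exclusion list are made up for illustration.

```php
<?php
// Hypothetical helper (name and default list are assumptions); requires mbstring.
function titleCaseExcept(string $text, array $exclude = ['and', 'to']): string
{
    // \p{L}+ matches runs of Unicode letters; the optional group keeps
    // contractions such as "don't" together as one word.
    $pattern = "/\\p{L}+(?:'\\p{L}+)*/u";
    return preg_replace_callback($pattern, function (array $m) use ($exclude): string {
        $word = $m[0];
        if (in_array(mb_strtolower($word, 'UTF-8'), $exclude, true)) {
            return $word; // excluded words are returned unchanged
        }
        $first = mb_substr($word, 0, 1, 'UTF-8');
        $rest  = mb_substr($word, 1, null, 'UTF-8');
        return mb_strtoupper($first, 'UTF-8') . $rest;
    }, $text);
}

echo titleCaseExcept('art and design'), "\n";
echo titleCaseExcept('łódź and żółw, to go'), "\n";
```

Note that an excluded word at the very start of the string is also left as it is; wrapping the result in a first-letter uppercase step would be one way to change that.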

How about this ?
$string = str_replace(' And ', ' and ', ucwords($string));

You will have to use ucfirst and loop through every word, checking e.g. an array of exceptions for each one.
Something like the following:
$exclude = array('and', 'not');
$words = explode(' ', $string);
foreach($words as $key => $word) {
if(in_array($word, $exclude)) {
continue;
}
$words[$key] = ucfirst($word);
}
$newString = implode(' ', $words);

I know it is a few years after the question, but I was looking for a way to ensure proper English in the titles of a CMS I am programming, and I wrote a lightweight function from the ideas on this page, so I thought I would share it:
function makeTitle($title){
$str = ucwords($title);
$exclude = 'a,an,the,for,and,nor,but,or,yet,so,such,as,at,around,by,after,along,for,from,of,on,to,with,without';
$excluded = explode(",",$exclude);
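// \b restricts the replacement to whole words, so "A" does not also lowercase the start of "Apple".
foreach($excluded as $noCap){$str = preg_replace('/\b' . ucwords($noCap) . '\b/', strtolower($noCap), $str);}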
return ucfirst($str);
}
The excluded list was found at:
http://www.superheronation.com/2011/08/16/words-that-should-not-be-capitalized-in-titles/
USAGE: makeTitle($title);
