Regex, PHP - finding words that need correction - php

I have a long string with words. Some of the words have special letters.
For example a string "have now a rea$l problem with$ dolar inp$t"
and i have a special letter "$".
I need to find and return all the words with special letters in a quickest way possible.
What I did is a function that parse this string by space and then using “for” going over all the words and searching for special character in each word. When it finds it—it saves it in an array. But I have been told that using regexes I can have it with much better performance and I don’t know how to implement it using them.
What is the best approach for it?
I am a new to regex but I understand it can help me with this task?
My code: (forbiden is a const)
The code works for now, only for one forbidden char.
function findSpecialChar($x){
$special = "";
$exploded = explode(" ", $x);
foreach ($exploded as $word){
if (strpos($word,$forbidden) !== false)
$special .= $word;
}
return $special;
}

You could use preg_match like this:
// Set your special word here.
$special_word = "café";
// Set your sentence here.
$string = "I like to eat food at a café and then read a magazine.";
// Run it through 'preg_match''.
preg_match("/(?:\W|^)(\Q$special_word\E)(?:\W|$)/i", $string, $matches);
// Dump the results to see it working.
echo '<pre>';
print_r($matches);
echo '</pre>';
The output would be:
Array
(
[0] => café
[1] => café
)
Then if you wanted to replace that, you could do this using preg_replace:
// Set your special word here.
$special_word = "café";
// Set your special word here.
$special_word_replacement = " restaurant ";
// Set your sentence here.
$string = "I like to eat food at a café and then read a magazine.";
// Run it through 'preg_replace''.
$new_string = preg_replace("/(?:\W|^)(\Q$special_word\E)(?:\W|$)/i", $special_word_replacement, $string);
// Echo the results.
echo $new_string;
And the output for that would be:
I like to eat food at a restaurant and then read a magazine.
I am sure the regex could be refined to avoid having to add spaces before and after " restaurant " like I do in this example, but this is the basic concept I believe you are looking for.

Related

Most efficient way to find matching words from paragraph

I have a Paragraph that I have to parse for different keywords. For example, Paragraph:
"I want to make a change in the world. Want to make it a better place to live. Peace, Love and Harmony. It is all life is all about. We can make our world a good place to live"
And my keywords are
"world", "earth", "place"
I should report whenever I have a match and how many times.
Output should be:
"world" 2 times and "place" 1 time
Currently, I am just converting Paragraph strings to array of characters and then matching each keyword with all of the array contents.
Which is wasting my resources.
Please guide me for an efficient way.( I am using PHP)
As #CasimiretHippolyte commented, regex is the better means as word boundaries can be used. Further caseless matching is possible using the i flag. Use with preg_match_all return value:
Returns the number of full pattern matches (which might be zero), or FALSE if an error occurred.
The pattern for matching one word is: /\bword\b/i. Generate an array where the keys are the word values from search $words and values are the mapped word-count, that preg_match_all returns:
$words = array("earth", "world", "place", "foo");
$str = "at Earth Hour the world-lights go out and make every place on the world dark";
$res = array_combine($words, array_map( function($w) USE (&$str) { return
preg_match_all('/\b'.preg_quote($w,'/').'\b/i', $str); }, $words));
print_r($res); test at eval.in outputs to:
Array
(
[earth] => 1
[world] => 2
[place] => 1
[foo] => 0
)
Used preg_quote for escaping the words which is not necessary, if you know, they don't contain any specials. For the use of inline anonymous functions with array_combine PHP 5.3 is required.
<?php
Function woohoo($terms, $para) {
$result ="";
foreach ($terms as $keyword) {
$cnt = substr_count($para, $keyword);
if ($cnt) {
$result .= $keyword. " found ".$cnt." times<br>";
}
}
return $result;
}
$terms = array('world', 'earth', 'place');
$para = "I want to make a change in the world. Want to make it a better place to live.";
$r = woohoo($terms, $para);
echo($r);
?>
I will use preg_match_all(). Here is how it would look in your code. The actual function returns the count of items found, but the $matches array will hold the results:
<?php
$string = "world";
$paragraph = "I want to make a change in the world. Want to make it a better place to live. Peace, Love and Harmony. It is all life is all about. We can make our world a good place to live";
if (preg_match_all($string, $paragraph, &$matches)) {
echo 'world'.count($matches[0]) . "times";
}else {
echo "match NOT found";
}
?>

Matching a substring (an apostrophe) in a given word using regex

I have a server application which looks up where the stress is in Russian words. The end user writes a word жажда. The server downloads a page from another server which contains the stresses indicated with apostrophes for each case/declension like this жа'жда. I need to find that word in the downloaded page.
In Russian the stress is always written after a vowel. I've been using so far a regex that is a grouping of all possible combinations (жа'жда|жажда'). Is there a more elegant solution using just a regex pattern instead of making a PHP script which creates all these combinations?
EDIT:
I have a word жажда
The downloaded page contains the string жа'жда. (notice the
apostrophe, I do not before-hand know where the apostrophe in the
word is)
I want to match the word with apostrophe (жа'жда).
P.S.: So far I have a PHP script creating the string (жа'жда|жажда') used in regex (apostrophe is only after vowels) which matches it. My goal is to get rid of this script and use just regex in case it's possible.
If I understand your question,
have these options (d'isorder|di'sorder|dis'order|diso'rder|disor'der|disord'er|disorde'r|disorder‌​') and one of these is in the downloaded page and I need to find out which one it is
this may suit your needs:
<pre>
<?php
$s = "d'isorder|di'sorder|dis'order|diso'rder|disor'der|disord'er|disorde'r|disorder'|disorde'";
$s = explode("|",$s);
print_r($s);
$matches = preg_grep("#[aeiou]'#", $s);
print_r($matches);
running example: https://eval.in/207282
Uhm... Is this ok with you?
<?php
function find_stresses($word, $haystack) {
$pattern = preg_replace('/[aeiou]/', '\0\'?', $word);
$pattern = "/\b$pattern\b/";
// word = 'disorder', pattern = "diso'?rde'?r"
preg_match_all($pattern, $haystack, $matches);
return $matches[0];
}
$hay = "something diso'rder somethingelse";
find_stresses('disorder', $hay);
// => array(diso'rder)
You didn't specify if there can be more than one match, but if not, you could use preg_match instead of preg_match_all (faster). For example, in Italian language we have àncora and ancòra :P
Obviously if you use preg_match, the result would be a string instead of an array.
Based, on your code, and the requirements that no function is called and disorder is excluded. I think this is what you want. I have added a test vector.
<pre>
<?php
// test code
$downloadedPage = "
there is some disorde'r
there is some disord'er in the example
there is some di'sorder in the example
there also' is some order in the example
there is some disorder in the example
there is some dso'rder in the example
";
$word = 'disorder';
preg_match_all("#".preg_replace("#[aeiou]#", "$0'?", $word)."#iu"
, $downloadedPage
, $result
);
print_r($result);
$result = preg_grep("#'#"
, $result[0]
);
print_r($result);
// the code you need
$word = 'also';
preg_match("#".preg_replace("#[aeiou]#", "$0'?", $word)."#iu"
, $downloadedPage
, $result
);
print_r($result);
$result = preg_grep("#'#"
, $result
);
print_r($result);
Working demo: https://eval.in/207312

Function which searches for a word in a text and highlights all the words which contain it

This function searches for words (from the $words array) inside a text and highlights them.
function highlightWords(Array $words, $text){ // Loop through array of words
foreach($words as $word){ // Highlight word inside original text
$text = str_replace($word, '<span class="highlighted">' . $word . '</span>', $text);
}
return $text; // Return modified text
}
Here is the problem:
Lets say the $words = array("car", "drive");
Is there a way for the function to highlight not only the word car, but also words which contain the letters "car" like: cars, carmania, etc.
Thank you!
What you want is a regular expression, preg_replace or peg_replace_callback more in particular (callback in your case would be recommended)
<?php
$searchString = "The car is driving in the carpark, he's not holding to the right lane.\n";
// define your word list
$toHighlight = array("car","lane");
Because you need a regular expression to search your words and you might want or need variation or changes over time, it's bad practice to hard code it into your search words. Hence it's best to walk over the array with array_map and transform the searchword into the proper regular expression (here just enclosing it with / and adding the "accept everything until punctuation" expression)
$searchFor = array_map('addRegEx',$toHighlight);
// add the regEx to each word, this way you can adapt it without having to correct it everywhere
function addRegEx($word){
return "/" . $word . '[^ ,\,,.,?,\.]*/';
}
Next you wish to replace the word you found with your highlighted version, which means you need a dynamic change: use preg_replace_callback instead of regular preg_replace so that it calls a function for every match it find and uses it to generate the proper result. Here we enclose the found word in its span tags
function highlight($word){
return "<span class='highlight'>$word[0]</span>";
}
$result = preg_replace_callback($searchFor,'highlight',$searchString);
print $result;
yields
The <span class='highlight'>car</span> is driving in the <span class='highlight'>carpark</span>, he's not holding to the right <span class='highlight'>lane</span>.
So just paste these code fragments after the other to get the working code, obviously. ;)
edit: the complete code below was altered a bit = placed in routines for easy use by original requester. + case insensitivity
complete code:
<?php
$searchString = "The car is driving in the carpark, he's not holding to the right lane.\n";
$toHighlight = array("car","lane");
$result = customHighlights($searchString,$toHighlight);
print $result;
// add the regEx to each word, this way you can adapt it without having to correct it everywhere
function addRegEx($word){
return "/" . $word . '[^ ,\,,.,?,\.]*/i';
}
function highlight($word){
return "<span class='highlight'>$word[0]</span>";
}
function customHighlights($searchString,$toHighlight){
// define your word list
$searchFor = array_map('addRegEx',$toHighlight);
$result = preg_replace_callback($searchFor,'highlight',$searchString);
return $result;
}
I haven't tested it, but I think this should do it:-
$text = preg_replace('/\W((^\W)?$word(^\W)?)\W/', '<span class="highlighted">' . $1 . '</span>', $text);
This looks for the string inside a complete bounded word and then puts the span around the whole lot using preg_replace and regular expressions.
function replace($format, $string, array $words)
{
foreach ($words as $word) {
$string = \preg_replace(
sprintf('#\b(?<string>[^\s]*%s[^\s]*)\b#i', \preg_quote($word, '#')),
\sprintf($format, '$1'), $string);
}
return $string;
}
// courtesy of http://slipsum.com/#.T8PmfdVuBcE
$string = "Now that we know who you are, I know who I am. I'm not a mistake! It
all makes sense! In a comic, you know how you can tell who the arch-villain's
going to be? He's the exact opposite of the hero. And most times they're friends,
like you and me! I should've known way back when... You know why, David? Because
of the kids. They called me Mr Glass.";
echo \replace('<span class="red">%s</span>', $string, [
'mistake',
'villain',
'when',
'Mr Glass',
]);
Sine it's using an sprintf format for the surrounding string, you can change your replacement accordingly.
Excuse the 5.4 syntax

PHP: Bolding of overlapping keywords in string

This is a problem that I have figured out how to solve, but I want to solve it in a simpler way... I'm trying to improve as a programmer.
Have done my research and have failed to find an elegant solution to the following problem:
I have a hypothetical array of keywords to search for:
$keyword_array = array('he','heather');
and a hypothetical string:
$text = "What did he say to heather?";
And, finally, a hypothetical function:
function bold_keywords($text, $keyword_array)
{
$pattern = array();
$replace = array();
foreach($keyword_array as $keyword)
{
$pattern[] = "/($keyword)/is";
$replace[] = "<b>$1</b>";
}
$text = preg_replace($pattern, $replace, $text);
return $text;
}
The function (not too surprisingly) is returning something like this:
"What did <b>he</b> say to <b>he</b>ather?"
Because it is not recognizing "heather" when there is a bold tag in the middle of it.
What I want the final solution to do is, as simply as possible, return one of the two following strings:
"What did <b>he</b> say to <b>heather</b>?"
"What did <b>he</b> say to <b><b>he</b>ather</b>?"
Some final conditions:
--I would like the final solution to deal with a very large number of possible keywords
--I would like it to deal with the following two situations (lines represent overlapping strings):
One string engulfs the other, like the following two examples:
-- he, heather
-- sanding, and
Or one string does not engulf the other:
-- entrain, training
Possible way to solve:
-A regex that ignores tags in keywords
-Long way (that I am trying to avoid):
*Search string for all occurrences of each keyword, store an array of positions (start and end) of keywords to be bolded
*Process this array recursively to combine overlapping keywords, so there is no redundancy
*Add the bold tags (starting from the end of the string, to avoid the positions of information shifting from the additional characters)
Many thanks in advance!
Example
$keyword_array = array('he','heather');
$text = "What did he say to heather?";
$pattern = array();
$replace = array();
sort($keyword_array, SORT_NUMERIC);
foreach($keyword_array as $keyword)
{
$pattern[] = "/ ($keyword)/is";
$replace[] = " <b>$1</b>";
}
$text = preg_replace($pattern, $replace, $text);
echo $text; // What did <b>he</b> say to <b>heather</b>?
need to change your regex pattern to recognize that each "term" you are searching for is followed by whitespace or punctuation, so that it does not apply the pattern match to items followed by an alpha-numeric.
Simplistic and lazy-ish Approach off The Top of My head:
Sort your initial Array by Item length, descending! No more "Not recognized because there's already a Tag in The Middle" issues!
Edit: The nested tags issue is then easily fixed by extending your regex in a Way that >foo and foo< isn't being matched anymore.

Does anyone have a PHP snippet of code for grabbing the first "sentence" in a string?

If I have a description like:
"We prefer questions that can be answered, not just discussed. Provide details. Write clearly and simply."
And all I want is:
"We prefer questions that can be answered, not just discussed."
I figure I would search for a regular expression, like "[.!\?]", determine the strpos and then do a substr from the main string, but I imagine it's a common thing to do, so hoping someone has a snippet lying around.
A slightly more costly expression, however will be more adaptable if you wish to select multiple types of punctuation as sentence terminators.
$sentence = preg_replace('/([^?!.]*.).*/', '\\1', $string);
Find termination characters followed by a space
$sentence = preg_replace('/(.*?[?!.](?=\s|$)).*/', '\\1', $string);
<?php
$text = "We prefer questions that can be answered, not just discussed. Provide details. Write clearly and simply.";
$array = explode('.',$text);
$text = $array[0];
?>
My previous regex seemed to work in the tester but not in actual PHP. I have edited this answer to provide full, working PHP code, and an improved regex.
$string = 'A simple test!';
var_dump(get_first_sentence($string));
$string = 'A simple test without a character to end the sentence';
var_dump(get_first_sentence($string));
$string = '... But what about me?';
var_dump(get_first_sentence($string));
$string = 'We at StackOverflow.com prefer prices below US$ 7.50. Really, we do.';
var_dump(get_first_sentence($string));
$string = 'This will probably break after this pause .... or won\'t it?';
var_dump(get_first_sentence($string));
function get_first_sentence($string) {
$array = preg_split('/(^.*\w+.*[\.\?!][\s])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
// You might want to count() but I chose not to, just add
return trim($array[0] . $array[1]);
}
Try this:
$content = "My name is Younas. I live on the pakistan. My email is **fromyounas#gmail.com** and skype name is "**fromyounas**". I loved to work in **IOS development** and website development . ";
$dot = ".";
//find first dot position
$position = stripos ($content, $dot);
//if there's a dot in our soruce text do
if($position) {
//prepare offset
$offset = $position + 1;
//find second dot using offset
$position2 = stripos ($content, $dot, $offset);
$result = substr($content, 0, $position2);
//add a dot
echo $result . '.';
}
Output is:
My name is Younas. I live on the pakistan.
current(explode(".",$input));
I'd probably use any of the multitudes of substring/string-split functions in PHP (some mentioned here already).
But also look for ". " OR ".\n" (and possibly ".\n\r") instead of just ".". Just in case for whatever reason, the sentence contains a period that isn't followed by a space. I think it will harden the likelihood of you getting genuine results.
Example, searching for just "." on:
"I like stackoverflow.com."
Will get you:
"I like stackoverflow."
When really, I'm sure you'd prefer:
"I like stackoverflow.com."
And once you have that basic search, you'll probably come across one or two occasions where it may miss something. Tune as you run with it!
Try this:
reset(explode('.', $s, 2));

Categories