preg_replace_callback: highlight only the pattern characters within the match - php

I have this code:
$string = 'The quick brown fox jumped over the lazy dog and lived to tell about it to his crazy moped.';
$text = explode("#", str_replace(" ", " #", $string)); //ugly trick to preserve space when exploding, but it works (faster than preg_split)
foreach ($text as $value) {
echo preg_replace_callback("/(.*p.*e.*d.*|.*a.*y.*)/", function ($matches) {
return " <strong>".$matches[0]."</strong> ";
}, $value);
}
The point of it is to be able to enter a sequence of characters (in the code above it's a fixed pattern), and it finds and highlights those characters in the matched word. The code I have now highlights the entire word. I'm looking for the most efficient way of highlighting the characters.
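The result of the current code:
The quick brown fox <strong>jumped</strong> over the <strong>lazy</strong> dog and lived to tell about it to his <strong>crazy</strong> <strong>moped.</strong>
What I would like to have:
The quick brown fox jum<strong>ped</strong> over the l<strong>a</strong>z<strong>y</strong> dog and lived to tell about it to his cr<strong>a</strong>z<strong>y</strong> mo<strong>ped</strong>.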
Did I take the wrong approach? It would be awesome if someone could point me in the right way, I've been searching for hours and didn't find what I was looking for.
EDIT 2:
Divaka's been a great help. Almost there.. I apologize if I haven't been clear enough on what my goal is. I will try to explain further.
- Part A -
One of the things I will be using this code for is a phone book. A simple example:
When the following characters are entered:
Jan
I need it to match the following examples:
Jan Verhoeven
Arjan Peters
Raj Naren
Jered Von Tran
The problem is that I will be iterating over the entire phone book, one person record at a time. Each person also has email addresses, a postal address, maybe a website, an extra note, etc. This means that the text I'm actually searching can contain anything: letters, numbers, special characters (&#()%_- etc.), newlines, and most importantly spaces. So an entire record (CSV) might contain the following info:
Name;Address;Email address;Website;Note
Jan Verhoeven;Veldstraat 2a, 3209 Herkstad;jan@werk.be;www.janophetwerk.be,jan@telemet.be;Jan die ik ontmoet heb op de bouwbeurs.\n Zelfstandige vertegenwoordiger van bouwmaterialen.
Raj Naren;Kerklaan 334, 5873 Biep;raj@werk.be;;Rechtstreekse contactpersoon bij Werk.be (#654 intern)
The \n is meant to be an actual newline. So if I search for @werk.be, I'd like to see both these records as a result.
- Part B -
Something else I want to use this for is searching song lyrics. When I'm looking for a song and can only remember that it had something to do with ducks or docks and a circle, I would enter dckcircle and get the following result:
... and the ducks were all dancing in a great big circle, around the great big bonfire ...
To be able to fine-tune the searching I'd like to be able to limit the number of spaces (or any other character), because I would imagine it finding a simple pattern like eve in every song while I'm only looking for a song that has the exact word eve in it.
- Conclusion -
If I summarize this in pseudo-regex, for a search pattern abc with a max of 3 spaces in-between it would be something like this: (I might be totally off here)
(a)(any character, max 3 spaces)(b)(any character, max 3 spaces)(c)
Or more generic:
(a)({any character}{these characters with a limit of 3})(b)({any character}{these characters with a limit of 3})(c)
This can even be extended to this fairly easily I'm guessing:
(a)({any character}{these characters with a limit of 3}{not these characters})(b)({any character}{these characters with a limit of 3}{not these characters})(c)
(I know the ´{}´ brackets are not to be used that way in a regular expression, but I don't know how else to put it without using a character that has a meaning in regular expressions.)
In case anyone wonders: I know the SQL LIKE statement would be able to do 80% (I'm guessing, it might even be more) of what I'm trying to do, but I'm trying to avoid using a database to make this as portable as possible.
When the correct answer has been found, I'll clean this question (and the code) up and post the resulting php-class here (maybe I'll even put it up on github if that would be useful), so anyone looking for the same will have a fully working class to work with :).

I came up with this. Tell me if it's what you want!
//$string = "The quick brown fox jumped over the lazy dog and lived to tell about it to his crazy moped.";
$string = "abcdefo";
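// The characters must be quoted strings; bare words are undefined constants (an Error in PHP 8)
//$pattern_array1 = array('a', 'y');
//$pattern_array2 = array('p', 'e', 'd');
$pattern_array1 = array('e', 'f');
//$pattern_array2 = array('o');
$pattern_array2 = array('a', 'f');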
$number_of_patterns = 2;
$regexp1 = generate_regexp($pattern_array1, 1);
$regexp2 = generate_regexp($pattern_array2, 2);
$string = preg_replace($regexp1["pattern"], $regexp1["replacement"], $string);
$string = preg_replace($regexp2["pattern"], $regexp2["replacement"], $string);
$string = transform_multimatched_chars($string);
// transforming other chars after transforming the multimatched ones
for($i = 1; $i <= $number_of_patterns; $i++) {
$string = str_replace("#{$i}", "<strong>", $string);
$string = str_replace("#/{$i}", "</strong>", $string);
}
echo $string;
function generate_regexp($pattern_array, $pattern_num) {
$regexp["pattern"] = "/";
$regexp["replacement"] = "";
$i = 0;
foreach($pattern_array as $key => $char) {
$regexp["pattern"] .= "({$char})";
$regexp["replacement"] .= "#{$pattern_num}\$". ($key + $i+1) . "#/{$pattern_num}";
if($key < count($pattern_array) - 1) {
$regexp["pattern"] .= "(?s)((?:(?!{$pattern_array[$key + 1]})(?!\s).)*)";
$regexp["replacement"] .= "\$".($key + $i+2) . "";
}
$i = $key + 1;
}
$regexp["pattern"] .= "/";
return $regexp;
}
function transform_multimatched_chars($string)
{
preg_match_all("/((#[0-9]){2,})(.*)((#\/[0-9]){2,})/", $string, $matches);
// change this for your purposes
$start_replacement = '<span style="color:red;">';
$end_replacement = '</span>';
foreach($matches[1] as $key => $match)
{
$string = str_replace($match, $start_replacement, $string);
$string = str_replace($matches[4][$key], $end_replacement, $string);
}
return $string;
}
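For comparison, the pseudo-regex from the question can also be sketched with a single preg_replace_callback call. This is only a minimal sketch, not the method used in the answer above: the function name highlight_chars and the $maxSpaces parameter are made up for illustration, and it assumes the gap between two searched characters may contain any characters but at most $maxSpaces whitespace characters.

```php
<?php
// Hypothetical helper: wraps each searched character in <strong>,
// leaving the characters in between untouched.
function highlight_chars(string $text, string $search, int $maxSpaces = 0): string
{
    // Split the search string into single (UTF-8) characters.
    $chars = preg_split('//u', $search, -1, PREG_SPLIT_NO_EMPTY);
    if (!$chars) {
        return $text;
    }
    $groups = array();
    foreach ($chars as $char) {
        $groups[] = '(' . preg_quote($char, '/') . ')';
    }
    // Gap between two searched characters: as short as possible,
    // containing at most $maxSpaces whitespace characters.
    $gap = '(\S*?(?:\s\S*?){0,' . max(0, $maxSpaces) . '}?)';
    $pattern = '/' . implode($gap, $groups) . '/iu';

    $result = preg_replace_callback($pattern, function ($m) {
        $out = '';
        // Odd groups hold the searched characters, even groups hold the gaps.
        for ($i = 1; $i < count($m); $i++) {
            $out .= ($i % 2 === 1) ? '<strong>' . $m[$i] . '</strong>' : $m[$i];
        }
        return $out;
    }, $text);

    // preg_replace_callback returns null on error (e.g. invalid UTF-8 input).
    return $result ?? $text;
}

echo highlight_chars('his crazy moped.', 'ped'), "\n";
echo highlight_chars('Raj Naren', 'Jan', 1), "\n";
```

With $maxSpaces left at 0 a match has to stay inside one word; raising it lets a match span several words, as in the phone book example.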

Related

Add <span> after 3 sentences only once and then add <p></p> after every 3 sentences

I have some long, simple text. I need to insert a <span id='colorme'></span> tag after the first 5 sentences, only once, and after that insert <p...></p> tags after every 5 sentences until the end of the text. If the full text has fewer than 5 sentences, the script must do nothing.
For example:
Without false modesty, we state that we have the best staff possible. And it's not some kind of farce, fiction or someone's evil joke. No, no - this is the most sincere truth. All our employees are incredibly welcoming, smiling, polite, tidy and competent in their work. Thanks to this, our sauna has been working successfully for many years, bringing pleasure to all its customers, both permanent and new. Come, we will be glad to see you. With respect to you, Alina.
And I need:
Without false modesty, we state that we have the best staff possible. And it's not some kind of farce, fiction or someone's evil joke. No, no - this is the most sincere truth. <span id='colorme'></span> All our employees are incredibly welcoming, smiling, polite, tidy and competent in their work. Thanks to this, our sauna has been working successfully for many years, bringing pleasure to all its customers, both permanent and new. Come, we will be glad to see you. <p style='color:red'>www.example.com</p> With respect to you, Alina.
It's just an example. So far I have something like this, but it doesn't work properly: it adds <span> after every 3 sentences, while I need it only once. I don't know what I should do.
<?php
$long_text = 'long long text';
$str = $long_text;
$arr = explode(".", $str);
$new_str = "";
$j = 1;
foreach($arr as $arr_el) {
$new_str .= $arr_el.".";
if($j % 3 == 0) {
$new_str .= "<span id=colorme></span>";
};
$j++;
}
echo $new_str;?>
Change it like below:
<?php
$long_text = 'long long text';
$str = $long_text;
$arr = explode(".", $str);
$new_str = "";
$j = 1;
foreach($arr as $arr_el) {
$new_str .= $arr_el.".";
if($j == 3) { // add span after first 3 sentences
$new_str .= "<span id=colorme></span>";
}else{
if($j %3 == 0) { // now after each 3rd sentence add paragraph
$new_str .= "<p class=colorme></p>";
}
}
$j++;
}
echo $new_str;
?>
Note: since <p></p> is going to repeat multiple times, I changed id to class, because using the same id more than once is not valid.
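The same idea can be sketched as a reusable function that also covers the "do nothing when the text is too short" requirement. This is only a sketch under assumptions: the name insert_markers and the $every parameter are hypothetical, and a sentence is assumed to end with ., ! or ? followed by whitespace.

```php
<?php
// Hypothetical helper: adds a <span> after the first $every sentences
// and a <p> after every further group of $every sentences.
function insert_markers(string $text, int $every = 3): string
{
    // Split after sentence-ending punctuation, keeping the punctuation.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // Fewer sentences than one full group: leave the text untouched.
    if (count($sentences) < $every) {
        return $text;
    }

    $out = array();
    foreach ($sentences as $i => $sentence) {
        $out[] = $sentence;
        $count = $i + 1;
        if ($count === $every) {
            $out[] = "<span id='colorme'></span>";   // only once
        } elseif ($count % $every === 0) {
            $out[] = "<p class='colorme'></p>";      // every further group
        }
    }
    return implode(' ', $out);
}

echo insert_markers('A. B. C. D. E. F. G.');
```

Splitting with a lookbehind instead of explode(".") keeps the punctuation attached and avoids the extra "." that the loop above appends after the last piece.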

php str_replace function issue

I am trying to have this ORIGINAL string converted to the RESULT below using php.
ORIGINAL: "The quick <font color="brown">brown</font> fox jumps over the lazy dog"
RESULT:"god yzal eht revo spmuj xof <font color="brown">nworb</font> kciuq ehT"
What I have done so far is explained below.
First, strip the HTML tag from the ORIGINAL.
$originalStr = "The quick <font color='brown'>brown</font> fox jumps over the lazy dog";
$stripTags = strip_tags($originalStr);
This results in: The quick brown fox jumps over the lazy dog
Second, I reverse the result and the word "brown" by using the strrev function:
$reverseStr = strrev($stripTags);
$brown = strrev("brown");
This results in: god yzal eht revo spmuj xof nworb kciuq ehT
Third, I try to use the str_replace function to find $brown in $reverseStr and replace it with $openFont $brown $closeFont, like below.
$openFont = "<font color='brown'>";
$closeFont = "</font>";
$result = str_replace($brown, $openFont.$brown.$closeFont, $reverseStr);
echo "result -->" . $result . "<br/><br/><br/>";
This results in god yzal eht revo spmuj xof kciuq ehT, which is NOT the same as the RESULT.
It seems the special characters in the font tag may be the problem, preventing str_replace from replacing the string.
$result = str_replace($brown, "TEST", $reverseStr);
echo "result -->" . $result . "<br/><br/><br/>";
This results in: god yzal eht revo spmuj xof TEST kciuq ehT
Does anyone know whether str_replace does not accept special characters, and how I should solve this problem?
If there is another way to solve the problem, I would also appreciate hearing your suggestion.
(* This is one of the practice questions that I am trying to solve on an algorithm test website)
UPDATED: I feel so dumb for not realizing where the font tag was. Since the tag is meant to change the font color, it was working perfectly from the beginning. Thank you very much everyone for your time!
If it was me, I'd do this (fully tested):
// Original string
$str = 'The quick <font color="brown">brown</font> fox jumps over the lazy dog';
// Strip the font tag
$str = strip_tags( $str );
// Convert string to array
$arr = str_split( $str );
// Reverse the array
$rra = array_reverse( $arr );
// Convert array back to string
$str = implode( $rra );
// Add font tag back in
$str = str_replace('nworb', '<font color="brown">nworb</font>', $str);
// Result
echo $str;
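The snippet above hard-codes 'nworb'. A more general version can be sketched by splitting the string into tags and text, reversing the token order, and swapping each closing tag back with its opening tag. The function name reverse_keep_tags is made up for illustration; the sketch assumes simple paired tags (no void tags such as <br>) and single-byte text, since strrev works byte-wise.

```php
<?php
// Hypothetical helper: reverses the text of an HTML fragment while keeping
// simple paired tags wrapped around the same (now reversed) text.
function reverse_keep_tags(string $html): string
{
    // Split into tag tokens and text tokens, keeping the tags.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $tokens = array_reverse($tokens);
    $closing = array(); // indexes of closing tags still waiting for their opening tag

    foreach ($tokens as $i => $token) {
        if (!preg_match('/^<[^>]+>$/', $token)) {
            $tokens[$i] = strrev($token);   // plain text: reverse it
        } elseif ($token[1] === '/') {
            $closing[] = $i;                // closing tag now comes before its opening tag
        } elseif ($closing) {
            $j = array_pop($closing);       // swap the pair back into open/close order
            $tokens[$i] = $tokens[$j];
            $tokens[$j] = $token;
        }
    }
    return implode('', $tokens);
}

echo reverse_keep_tags('The quick <font color="brown">brown</font> fox jumps over the lazy dog');
```

The stack of closing-tag indexes is what lets nested pairs be swapped back in the right order.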
Parse the HTML with something that will give you a DOM API to it.
Write a function that loops over the child nodes of an element.
If a node is a text node, get the data as a string, split it on words, reverse each one, then assign it back.
If a node is an element, recurse into your function.
Use preg_match_all() function.
$originalStr = "The quick <font color='brown'>brown</font> fox jumps over the lazy dog";
preg_match_all('|<[^>]+>(.*)</[^>]+>|U', $originalStr, $matches, PREG_SET_ORDER, 0);
$_tag = $matches[0][0];
$_txt = $matches[0][1];
$newString = str_replace($_tag,$_txt,$originalStr);

Find a pattern within two or more sets of text

I have lots of data that I need to search through for certain patterns.
The problem is that when looking for these patterns I have no reference for what I'm looking for.
In other words, I have two paragraphs, each on similar topics. I need to be able to compare both paragraphs and find patterns: phrases that appear in both paragraphs, and how many times each appears.
I can't seem to find a solution, because with preg_match and other functions you're required to supply the things you're looking for.
Example paragraphs
Paragraph 1:
Bee Pollen is made by honeybees, and is the food of the young bee. It
is considered one of nature's most completely nourishing foods as it
contains nearly all nutrients required by humans. Bee-gathered pollens
are rich in proteins (approximately 40% protein), free amino acids,
vitamins, including B-complex, and folic acid.
Paragraph 2:
Bee Pollen is made by honeybees. It is required for the fertilization
of the plant. The tiny particles consist of 50/1,000-millimeter
corpuscles, formed at the free end of the stamen in the heart of the
blossom, nature's most completely nourishing foods. Every variety of
flower in the universe puts forth a dusting of pollen. Many orchard
fruits and agricultural food crops do, too.
So from those examples these patterns:
Bee Pollen is made by honeybees
and:
nature's most completely nourishing foods
Both phrases are found in both paragraphs.
This is potentially a complex question depending on whether you're looking for similar phrases or phrases that match word for word.
Finding exact word-for-word matches is quite simple: all you need to do is split on common breaks like punctuation marks (e.g. .,;:) and perhaps on conjunctions as well (e.g. and, or). However, the problem comes when you get to, for example, adjectives: two phrases might be exactly the same but differ by one word, like so:
The world is spinning around its axis at a tremendous speed.
The world is spinning around its axis at a magnificent speed.
This won't match because tremendous and magnificent are used in place of one another. Potentially you could work around this; however, that would be a more complex question.
Answer
If we stick to the simple side of things we can achieve phrase matching with just a few lines of code (4 in this example; not including the formatting for comments/readability).
$wordSplits = 'and or on of as'; //List of words to split on
preg_match_all('/(?<m1>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para1, $matches1);
preg_match_all('/(?<m2>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para2, $matches2);
$commonPhrases = array_filter( //Removes blank $key=>$value pairs
array_intersect( //Finds matching patterns
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para1 values - removes leading and following spaces
}, $matches1['m1']),
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para2 values - removes leading and following spaces
}, $matches2['m2'])
)
);
var_dump($commonPhrases);
/**
OUTPUT:
array(2) {
[0]=>
string(31) "bee pollen is made by honeybees"
[5]=>
string(41) "nature's most completely nourishing foods"
}
*/
The above code finds matches by splitting on punctuation (defined in the [...] of the preg_match_all pattern) and on the words in the word list (matching only words with a preceding and following space).
Wordlist
You can change the word list to include any breaks you like, editing the list until you get the phrases you are after, examples:
$wordSplits = 'and or';
$wordSplits = 'and but if or';
$wordSplits = 'a an as and by but because if in is it of off on or';
Punctuation
You can add any punctuation marks you like into the list (between [ and ]), however remember that some characters do have special meanings and may need to be escaped (or placed appropriately): - and ^ should become \- and \^ or be placed where their special meaning doesn't come into play.
You may consider changing:
([.,;:\-]|
To:
([.,;:\-] | //Adding a space before the pipe
So that you only split punctuation marks which are followed by a space. For example: this would mean that items like 50,000 won't be split.
Spaces and breaks
You may also consider changing the spaces to \s so that tabs and newlines etc are included and not just spaces. Like so:
'/(?<m1>.*?)([.,;:\-]|\s'.str_replace(' ', '\s|\s', trim($wordSplits)).'\s)/i'
This would also apply to:
([.,;:\-]\s|
If you decide to go down that route.
I've been working on this code, don't know if it suits your needs... Feel free to expand it!
$p1 = "Bee Pollen is made by honeybees, and is the food of the young bee. It is considered one of nature's most completely nourishing foods as it contains nearly all nutrients required by humans. Bee-gathered pollens are rich in proteins (approximately 40% protein), free amino acids, vitamins, including B-complex, and folic acid.";
$p2 = "Bee Pollen is made by honeybees. It is required for the fertilization of the plant. The tiny particles consist of 50/1,000-millimeter corpuscles, formed at the free end of the stamen in the heart of the blossom, nature's most completely nourishing foods. Every variety of flower in the universe puts forth a dusting of pollen. Many orchard fruits and agricultural food crops do, too.";
// Strip strings of periods etc.
$p1 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p1));
$p2 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p2));
// Extract words from first paragraph
$w1 = explode(" ", $p1);
// Build search string
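$search = '';
$old_search = '';
$num_occured = 0;
$add = FALSE; // initialised so the first miss does not read an undefined variable
$found = array();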
foreach ($w1 as $word) {
//echo 'Word: ' . $word . "<br />";
$search .= ' ' . $word;
$search = trim($search);
//echo '. . Search string: '. $search . "<br /><br />";
if (substr_count($p2, $search)) {
$old_search = $search;
$num_occured = substr_count($p2, $search);
//echo " . . . found!" . "<br /><br /><br />";
$add = TRUE;
} else {
//echo " . . . not found! Generating new search string: " . $word . '<br />';
if ($add) {
$found[] = array('pattern' => $old_search, 'occurences' => $num_occured);
$add = FALSE;
}
$old_search = '';
$search = $word;
}
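}
// Record a match that was still open when the loop ended
if (!empty($add)) {
$found[] = array('pattern' => $old_search, 'occurences' => $num_occured);
}
print_r($found);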
The above code finds occurrences of patterns from the first string in the second one.
I'm sure it can be written better, but since it's past midnight (local time), I'm not as "fresh" as I'd like to be...
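Another way to look at the problem is to compare fixed-length word sequences (n-grams) instead of growing a search string. The following is only a rough sketch: the function names word_ngrams and common_ngrams are hypothetical, and it assumes that words are runs of letters, digits and apostrophes and that case does not matter.

```php
<?php
// Hypothetical helper: all sequences of $n consecutive words in $text.
function word_ngrams(string $text, int $n): array
{
    // Lower-case the text and split on anything that is not a letter, digit or apostrophe.
    $words = preg_split('/[^\p{L}\p{N}\']+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $grams = array();
    for ($i = 0; $i + $n <= count($words); $i++) {
        $grams[] = implode(' ', array_slice($words, $i, $n));
    }
    return $grams;
}

// Hypothetical helper: the $n-word sequences that occur in both texts.
function common_ngrams(string $a, string $b, int $n): array
{
    $common = array_intersect(word_ngrams($a, $n), word_ngrams($b, $n));
    return array_values(array_unique($common));
}

print_r(common_ngrams(
    'Bee Pollen is made by honeybees, and is food.',
    'Bee Pollen is made by honeybees. It is required.',
    6
));
```

Running it for several values of $n, from large to small, gives the longer shared phrases first; shorter n-grams contained in an already reported phrase can then be ignored.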

PHP trying to split paragraph into sentences. Keep punctuation

Basically I'm taking in a paragraph filled with all kinds of punctuation, such as ! ? . ; ", and splitting it into sentences.
The issue I'm facing is coming up with a way to split it into sentences with the punctuation intact while at the same time accounting for quotations in dialogue.
For instance the paragraph:
One morning, when Gregor Samsa woke from troubled dreams, he found
himself transformed in his bed into a horrible vermin. "What has
happened!?" he asked himself. "I... don't know." said Samsa, "Maybe
this is a bad dream." He lay on his armour-like back, and if he lifted
his head a little he could see his brown belly, slightly domed and
divided by arches into stiff sections.
It would need to be split up like this:
[0] One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin.
[1] "What has happened!?" he asked himself.
[2] "I... don't know." said Samsa, "Maybe this is a bad dream."
And so on.
Currently I am just using explode
$sentences = explode(".", $sourceWork);
and only splitting by the periods and appending one at the end. I know this is far from what I want, but I'm not quite sure where to even start handling this. If someone could at least point me in the right direction of where to look for ideas, that would be amazing.
Thanks in advance!
Here's what I have:
<?php
/**
* @param string $str String to split
* @param string $end_of_sentence_characters Characters which represent the end of the sentence. Should be a string with no spaces (".,!?")
*
* @return array
*/
function split_sentences($str, $end_of_sentence_characters) {
$inside_quotes = false;
$buffer = "";
$result = array();
for ($i = 0; $i < strlen($str); $i++) {
$buffer .= $str[$i];
if ($str[$i] === '"') {
$inside_quotes = !$inside_quotes;
}
if (!$inside_quotes) {
if (preg_match("/[$end_of_sentence_characters]/", $str[$i])) {
$result[] = $buffer;
$buffer = "";
}
}
}
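// Keep any trailing text that does not end with an end-of-sentence character
if (trim($buffer) !== "") {
$result[] = $buffer;
}
return $result;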
}
$str = <<<STR
One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. "What has happened!?" he asked himself. "I... don't know." said Samsa, "Maybe this is a bad dream." He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections.
STR;
var_dump(split_sentences($str, "."));
preg_split('/[.?!]/',$sourceWork);
It's a very simple regular expression, but I think your task is impossible.
You need to go through your string manually and do the splitting yourself. Keep track of the quotation-mark count; if it is an odd number, do not break. Here is a simple idea:
<?php
//$str = 'AAA. BBB. "CCC." DDD. EEE. "FFF. GGG. HHH".';
$str = 'One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. "What has happened!?" he asked himself. "I... don\'t know." said Samsa, "Maybe this is a bad dream." He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections.';
$last_dot=0;
$quotation=0;
$explode_list = Array();
for($i=0;$i < strlen($str);$i++)
{
$char = substr($str,$i,1);//get the current character
if($char == '"') $quotation++;//track quotation
if($quotation%2==1) continue;//nothing to do so go back
if($char == '.')
{
echo "char is $char $last_dot<br/>";
$explode_list[]=(substr($str,$last_dot,$i+1-$last_dot));
$last_dot = $i+1;
}
}
echo "testing:<pre>";
print_r($explode_list);
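For the sample paragraph, a lookaround-based preg_split can also get close to the desired output. This is a heuristic sketch only: the function name split_dialogue_sentences is made up, and it assumes a sentence ends with ., ! or ? (optionally followed by a closing double quote) and that the next sentence starts with a capital letter A-Z or an opening double quote. Abbreviations such as "Mr. Smith" would be split incorrectly.

```php
<?php
// Hypothetical helper: splits on whitespace that follows sentence-ending
// punctuation (optionally inside a closing quote) and precedes a capital
// letter or an opening quote. The punctuation itself is kept.
function split_dialogue_sentences(string $text): array
{
    $pattern = '/(?<=[.!?]|[.!?]")\s+(?=["A-Z])/';
    return preg_split($pattern, trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

$sample = 'It broke. "What happened!?" he asked. "I... don\'t know." said Sam, "Maybe." He left.';
print_r(split_dialogue_sentences($sample));
```

Because a lowercase word after a closing quote (as in "he asked") does not satisfy the lookahead, quoted speech stays attached to the rest of its sentence.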

PHP Wordwrap on an angle, increasing indentation each break

Is it possible with PHP's wordwrap to add increased indentation at each line break, to essentially create word wrapping on an angle?
If I understand your question correctly, you would like to produce an output like:
xxxxxxx
xxxxxxxxx
xxxxxxxxxxx
xxxxxxxxxxxxx
xxxxxxxxxxxxxxx
Obviously, replacing x with your text.
The built-in wordwrap function does not support this, but you can still write your own with a simple loop. Change the max length (or the indentation) on each iteration, and break your initial string where you find a space or wherever you want, depending on your needs.
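A minimal sketch of such a loop, breaking at spaces and indenting each new line a little further, could look like this. The function name angled_wrap and its $width and $step parameters are made up for illustration; the sketch assumes plain-text output with "\n" line breaks and single-byte characters.

```php
<?php
// Hypothetical helper: wraps $text at word boundaries so that no line holds
// more than $width characters of text, indenting each new line by $step more spaces.
function angled_wrap(string $text, int $width, int $step = 2): string
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $lines = array();
    $line = '';
    $indent = 0;

    foreach ($words as $word) {
        $candidate = ($line === '') ? $word : $line . ' ' . $word;
        if ($line !== '' && strlen($candidate) > $width) {
            // Current line is full: emit it and start a new, further indented one.
            $lines[] = str_repeat(' ', $indent) . $line;
            $indent += $step;
            $line = $word;
        } else {
            $line = $candidate;
        }
    }
    if ($line !== '') {
        $lines[] = str_repeat(' ', $indent) . $line;
    }
    return implode("\n", $lines);
}

echo angled_wrap('The quick brown fox jumped over the lazy dog.', 10, 2);
```

For HTML output the leading spaces would collapse, so the result would need to be shown inside <pre> or the indentation replaced with CSS or &nbsp;.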
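<?php
$text = "The quick brown fox jumped over the lazy dog.";
echo $return = customwrap($text, 10);
function customwrap($text, $len)
{
    $str = '';
    // Each line is one character shorter than the previous one;
    // stop when the length reaches zero so the loop cannot run forever
    for ($i = 0; $i < strlen($text) && $len > 0; $len--)
    {
        $str .= substr($text, $i, $len).'<br />';
        $i += $len; // advance by the length just emitted, so lines do not overlap
    }
    return $str;
}
?>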
live demo http://codepad.org/v3ysqV5A.
Note that the <br /> tags are only visible in the demo output; in a browser they will be rendered as line breaks.
