I want to highlight text in a given string with given keywords and add a random number of surrounding words.
Example sentence:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed.
Example keyword:
dolore magna
Desired result:
(mark 0-4 words before and after the keyword)
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et **dolore magna** aliquyam erat, sed.
What did I try?
( [\w,\.-\?]+){0,5} ".$myKeyword." (.+ ){2,5}
and
([a-zA-Z,. ]+){1,3} ".$n." ([a-zA-Z,. ]+){1,3}
Any ideas how to improve this and make it more robust?
For highlighting, use the preg_replace function. Here's an idea:
$s = "dolore magna";
$str = preg_replace(
'/\b(?>[\'\w-]+\W+){0,4}'.preg_quote($s, "/").'(?:\W+[\'\w-]+){0,4}/i',
'<b>$0</b>', $str);
Test the pattern at regex101 or the PHP test at eval.in.
echo $str;
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor <b>invidunt ut labore et dolore magna aliquyam erat, sed</b>.
Using i flag for caseless matching - drop if not wanted. First group ?> atomic for performance.
As word character I used ['\w-] (\w shorthand for word character, ' and -)
\W matches a character, that is not a word character (negated \w)
\b matches a word boundary. Used it for better performance.
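To make the pieces above easier to reuse, here is a minimal sketch that wraps the same pattern in a function. The name highlightWithContext and the $before/$after parameters are my own, not part of the answer; they only replace the fixed {0,4} quantifiers.

```php
<?php
// Hypothetical helper: wraps the keyword plus up to $before/$after
// neighbouring words in <b> tags, using the pattern from the answer.
function highlightWithContext(string $text, string $keyword, int $before = 4, int $after = 4): string
{
    $pattern = '/\b(?>[\'\w-]+\W+){0,' . $before . '}'
             . preg_quote($keyword, '/')
             . '(?:\W+[\'\w-]+){0,' . $after . '}/i';

    return preg_replace($pattern, '<b>$0</b>', $text);
}

$text = 'tempor invidunt ut labore et dolore magna aliquyam erat, sed.';
echo highlightWithContext($text, 'dolore magna', 2, 1), "\n";
// tempor invidunt ut <b>labore et dolore magna aliquyam</b> erat, sed.
```

For the "random number of surrounding words" from the question, the counts could be drawn per call, e.g. highlightWithContext($text, $s, random_int(0, 4), random_int(0, 4)).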
I think this would accomplish what you are after. Please see the demo for an explanation of everything the regex is doing, or post a comment if you have a question.
Regex:
((?:[\w,.\-?]+\h){0,5})\b' . $term . '\b((?:.+\h){2,5})
Demo: https://regex101.com/r/vG8qT2/1
PHP:
<?php
$string = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed.';
$term = 'dolore magna';
$min = 0;
$max = 5;
preg_match('~((?:[\w,.\-?]+\h){'.$min.','.$max. '})\b' . preg_quote($term, '~') . '\b((?:.+\h){'.$min.','.$max.'})~', $string, $matches);
print_r($matches);
Demo: https://eval.in/410063
Note the captured values will be in $matches[1] and $matches[2].
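As a rough sketch of how the two captures could be put back together into a highlighted snippet: the helper name contextAround is made up for illustration, and I swapped .+ for \S+ (my assumption, not the answer's pattern) so that each repetition is exactly one whitespace-delimited word.

```php
<?php
// Hypothetical helper: returns [words before, words after] around $term,
// or null if $term does not occur in $string.
function contextAround(string $string, string $term, int $min, int $max): ?array
{
    $range = '{' . $min . ',' . $max . '}';
    // \S+ keeps each repetition to a single whitespace-delimited word.
    $pattern = '~((?:\S+\h+)' . $range . ')\b' . preg_quote($term, '~')
             . '\b((?:\h+\S+)' . $range . ')~';

    if (preg_match($pattern, $string, $matches) !== 1) {
        return null;
    }
    return [$matches[1], $matches[2]];
}

$string = 'sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed.';
$context = contextAround($string, 'dolore magna', 0, 2);
if ($context !== null) {
    [$before, $after] = $context;
    echo '<b>' . $before . 'dolore magna' . $after . '</b>', "\n";
    // <b>labore et dolore magna aliquyam erat,</b>
}
```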
I want to replace "\n" by " in first line and last line of email
$chekclist = $_POST['emaillist'];
$rwina = explode("\n", "$chekclist");
$i = 0;
$count = 1;
foreach ($rwina as $key => $email[i])
Actually you cannot do that because \n is where the line ends.
I'm assuming that you want your email to be formatted like:
"Lorem ipsum dolor sit amet
consectetur adipiscing elit
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua."
But the text you'll get from $_POST['emaillist'] will be formatted like this:
Lorem ipsum dolor sit amet \n
consectetur adipiscing elit\n
sed do eiusmod tempor incididunt \n
ut labore et dolore magna aliqua. \n
So if you want to replace \n with " it will be like this:
Lorem ipsum dolor sit amet"
consectetur adipiscing elit
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua."
But there is a way to achieve what you are looking for if I'm assuming it right :p
So here's the code:
$chekclist = $_POST['emaillist']; // Get the email text
$rwina = explode("\n", $chekclist); // Split it into an array of lines
$count = count($rwina); // Number of lines
for ($i = 0; $i < $count; $i++) {
    if ($i == 0) {
        echo '"' . $rwina[$i] . '<br>'; // Opening quote before the first line
    } else if ($i == ($count - 1)) {
        echo $rwina[$i] . '"<br>'; // Closing quote after the last line
    } else {
        echo $rwina[$i] . '<br>';
    }
}
Let me know if this is what you are looking for :)
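A more compact variant of the same idea, as a sketch only: quoteLines is a hypothetical name, and unlike the loop above it joins the lines with <br> (no trailing <br>), handles \r\n line endings, and escapes the text before output. Those are my assumptions about what is wanted.

```php
<?php
// Hypothetical helper: wraps a multi-line string in double quotes
// and joins the lines with <br>.
function quoteLines(string $text): string
{
    // \R matches \r\n, \n or \r, so Windows line endings from a textarea work too.
    $lines = preg_split('/\R/', rtrim($text, "\r\n"));

    // Escape each line, since the text comes from user input.
    $lines = array_map(function (string $line): string {
        return htmlspecialchars($line, ENT_QUOTES, 'UTF-8');
    }, $lines);

    return '"' . implode('<br>', $lines) . '"';
}

echo quoteLines("Lorem ipsum dolor sit amet\r\nconsectetur adipiscing elit\nut labore et dolore magna aliqua.\n");
// "Lorem ipsum dolor sit amet<br>consectetur adipiscing elit<br>ut labore et dolore magna aliqua."
```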
I have a problem matching HTML tags with a regex. Can anyone please help me?
Thanks. These are my cases; I have searched and thought about it but could not work it out.
Case 1
// My input to regex
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit <br/><img src="img.jpg/> sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua<p>
// Output after regex
Lorem ipsum dolor sit amet, consectetur adipisicing elit <br/><img src="img.jpg/> sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua
Case 2
// My input to regex
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit</p>
// Output after regex
Lorem ipsum dolor sit amet, consectetur adipisicing elit
Case 3
// My input to regex
<p><ul>...</ul><p>
// Output after regex
NULL
I'm guessing something like this is what you're after (example in javascript).
function checkParagraph(str)
{
var result = str.match(/^<p>([^<].*[^>])<\/p>$/i);
if (result) return result[1];
else return null;
}
alert(checkParagraph("<p>Lorem ipsum <br/><img src=\"img.jpg\"/> magna aliqua</p>"));
alert(checkParagraph("<p>Lorem ipsum magna aliqua</p>"));
alert(checkParagraph("<p><img src=\"img.jpg\"/></p>"));
With the additional information about only allowing BR, IMG, A and IMG-inside-A tags, the regex is quite different:
function checkParagraph(str)
{
var result = str.match(/^<p>(([^<>]+|<br\/>|<img[^>]+>|<a[^>]+>[^<>]*<\/a>|<a[^>]+><img[^>]+><\/a>)*)<\/p>$/i);
if (result) return result[1];
else return null;
}
alert(checkParagraph("Lorem ipsum magna aliqua"));
alert(checkParagraph("<p>Lorem ipsum magna aliqua</p>"));
alert(checkParagraph("<p>Lorem ipsum <br/> magna aliqua</p>"));
alert(checkParagraph("<p>Lorem ipsum magna aliqua</p>"));
alert(checkParagraph("<p>Lorem ipsum <img src=\"img.jpg\"/> magna aliqua</p>"));
alert(checkParagraph("<p>Lorem ipsum <br/><img src=\"img.jpg\"/> magna aliqua</p>"));
alert(checkParagraph("<p><br/><img src=\"img.jpg\"/></p>"));
alert(checkParagraph("<p><span>magna</span> aliqua</p>"));
alert(checkParagraph("<p><span>magna</span> aliqua</p>"));
alert(checkParagraph("<p><br/><img src=\"img.jpg\"/><span>magna</span> aliqua</p>"));
Break-down of the regex:
/.../i -> case insensitive for upper and lower case tags
^<p>...<\/p>$ -> input is enclosed in P tag
(...) -> the capture group between the brackets will become result[1]
(...|...)* -> any number of the following options:
[^<>]+ -> option 1: any text without tags
<br\/> -> option 2: a BR tag
<img[^>]+> -> option 3: an IMG tag
<a[^>]+>[^<>]*<\/a> -> option 4: an A tag with text inside
<a[^>]+><img[^>]+><\/a> -> option 5: an A tag with an IMG tag inside
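Since the question itself did not name a language, here is a rough PHP port of the same check, in case that is what is needed. It is a sketch: the function name is carried over from the JavaScript above, and the possessive [^<>]++ is my own addition to limit backtracking on inputs that do not match.

```php
<?php
// Returns the content of the <p> if it only contains text, BR, IMG,
// A and IMG-inside-A tags; returns null otherwise.
function checkParagraph(string $str): ?string
{
    $pattern = '~^<p>((?:[^<>]++|<br/>|<img[^>]+>|<a[^>]+>[^<>]*</a>|<a[^>]+><img[^>]+></a>)*)</p>$~i';

    return preg_match($pattern, $str, $m) === 1 ? $m[1] : null;
}

var_dump(checkParagraph('<p>Lorem ipsum <br/><img src="img.jpg"/> magna aliqua</p>'));
// string(49) "Lorem ipsum <br/><img src="img.jpg"/> magna aliqua"
var_dump(checkParagraph('<p><span>magna</span> aliqua</p>'));
// NULL
```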
I have code on a site that looks like the one below and is generated by a CMS.
The user can generate a table, but I have to put a <div> around it.
<p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et
dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo
dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem</p>
<table>
<thead>
<tr><td></td></tr>
...
</tbody>
</table>
<p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et
dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo
dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem</p>
<table>
<thead>
<tr><td></td></tr>
...
</tbody>
</table>
...
My goal now is to wrap every <table> in a <div class="table">.
I've tried it with regex and got this result:
function smarty_modifier_table($string) {
preg_match_all('/<table.*?>(.*?)<\/table>/si', $string, $matches);
echo "<pre>";
var_dump($matches);
}
/* result
array(2) {
[0]=> string(949) "<table>...</table>"
[1]=> string(934) "<thead>...</tbody>"
}
array(2) {
[0]=> string(949) "<table>...</table>"
[1]=> string(934) "<thead>...</tbody>"
}
*/
First of all, I do not understand why the second element, [1]=> string(934) "<thead>...</tbody>", appears,
and second, how to fit the modified array back into the string in the right place.
If your html is really simple like this, the following would probably work:
print preg_replace('~<table.+?</table>~si', "<div class='table'>$0</div>", $html);
If, however, you can have nested tables:
<table>
<tr><td> <table>INNER!</table> </td></tr>
</table>
this expression will fail miserably - that's why using regexes to parse html is not recommended. To handle complex html it's better to use a parser library, for example, XML DOM:
$doc = new DOMDocument();
$doc->loadHTML($html);
$body = $doc->getElementsByTagName('body')->item(0);
foreach($body->childNodes as $s) {
if($s->nodeType == XML_ELEMENT_NODE && $s->tagName == 'table') {
$div = $doc->createElement("div");
$div->setAttribute("class", "table");
$body->replaceChild($div, $s);
$div->appendChild($s);
}
}
This one handles nested tables correctly.
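For completeness, a self-contained sketch of the DOM approach that returns the modified markup. Note the differences, which are my own choices: wrapTables is a hypothetical name, it wraps every table (nested ones included) rather than only direct children of body, and it snapshots the node list before changing the tree.

```php
<?php
// Hypothetical helper: wraps every <table> in <div class="table">.
function wrapTables(string $html): string
{
    $doc = new DOMDocument();
    // libxml may warn about imperfect markup; collect warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Snapshot the live node list first, since the loop modifies the tree.
    $tables = iterator_to_array($doc->getElementsByTagName('table'));
    foreach ($tables as $table) {
        $div = $doc->createElement('div');
        $div->setAttribute('class', 'table');
        $table->parentNode->replaceChild($div, $table);
        $div->appendChild($table);
    }

    // loadHTML adds doctype/html/body, so this returns a full document.
    return $doc->saveHTML();
}

$html = '<p>Intro</p><table><tr><td><table><tr><td>inner</td></tr></table></td></tr></table>';
echo wrapTables($html);
```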
$buffer = preg_replace('%<table>(.*?)</table>%sim', '<div class="table"><table>$1</table></div>', $buffer);
Thank you all for your incredibly fast and perfect help!
So it works for me.
$result = preg_replace('~<table.+?</table>~si', "<div class='table'>$0</div>", $string);
return $result;
regards
Torsten
How would I use PHP's preg_replace() to return only the value inside the <h1> in the following string (it's HTML text loaded in a variable called $html):
<h1>I'm Header</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque tincidunt porttitor magna, quis molestie augue sagittis quis.</p>
<p>Pellentesque tincidunt porttitor magna, quis molestie augue sagittis quis. Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
I've tried this: preg_replace('#<h1>([.*])</h1>.*#', '$1', $html), but to no avail. Am I regex-ing this correctly? And is there a better PHP function that I should be using instead of preg_replace?
Here is how you do it using preg_replace:
$header = preg_replace('/^.*?<h1>(.*?)<\/h1>.*$/is', '$1', $html);
You can also use preg_match:
$matches = array();
preg_match('/<h1>(.*)<\/h1>.*/iU', $html, $matches);
print_r($matches);
([.*]) means a literal dot OR a literal asterisk (a single character).
What you need is (.*?), which means any number of any characters, ungreedy,
or
([^<]*), which means any number of any characters except <.
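A small sketch comparing the three groups on a shortened version of the question's $html (the shortened string and the variable names $lazy and $negated are just for illustration):

```php
<?php
$html = "<h1>I'm Header</h1>\n<p>Lorem ipsum dolor sit amet.</p>";

// Lazy group: stops at the first closing </h1>.
preg_match('~<h1>(.*?)</h1>~is', $html, $lazy);

// Negated class: everything up to the next '<'.
preg_match('~<h1>([^<]*)</h1>~i', $html, $negated);

// Character class: exactly one literal '.' or '*', so no match here.
$classMatches = preg_match('~<h1>([.*])</h1>~', $html);

echo $lazy[1], "\n";      // I'm Header
echo $negated[1], "\n";   // I'm Header
var_dump($classMatches);  // int(0)
```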
How could I convert everything inside a <code> tag to HTML entities:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore
magna aliquyam erat, sed diam voluptua.
<code class="highlight sql">
CREATE TABLE `comments`
</code>
<h1>Next step</h1>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et
dolore magna aliquyam erat, sed diam voluptua.
At vero eos et accusam et justo duo dolores et ea rebum.
<b>Stet clita kasd gubergren, no sea takimata sanctus</b> est Lorem
dolor sit amet. Lorem ipsum dolor sit amet, consetetur
sadipscing elitr, sed diam nonumy eirmod tempor invidunt
ut labore et dolore magna aliquyam erat, sed diam voluptua:
<code class="highlight php">
<?php
$host = "localhost";
?>
</code>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr.
Note: That example above is a string which I could convert in PHP.
This comes down to a regex for me. And before you start shouting: it is possible to reliably match and replace subsets of HTML, as long as there are no nested tags.
This is the easy way, to be honest: a regex that matches from the opening tag to the closing tag, with a function applied to the matches that encodes what we need before replacing it.
Here's the code:
<?php
$string = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore
magna aliquyam erat, sed diam voluptua.
<code class="highlight sql">
CREATE TABLE `comments`&
</code>
<h1>Next step</h1>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et
dolore magna aliquyam erat, sed diam voluptua.
At vero eos et accusam et justo duo dolores et ea rebum.
<b>Stet clita kasd gubergren&, no sea takimata sanctus</b> est Lorem
dolor sit amet. Lorem ipsum dolor sit amet, consetetur
sadipscing elitr, sed diam nonumy " eirmod " tempor invidunt
ut labore et dolore magna aliq&uyam erat, sed diam voluptua:
<code class="highlight php">
<?php
* $host = "localhost";
?>&
</code>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr.';
echo preg_replace("/(<code[^>]*?>)(.*?)(<\/code>)/se", "
stripslashes('$1').
htmlentities(stripslashes('$2')).
stripslashes('$3')
", $string);
And here's a working test case on codepad:
http://codepad.org/MhKwfOQl
This will work as long as there are no nasty nested tags / corrupted html.
I would still advise you to try and make sure you save the data as you want to make it visible, encoded where needed.
If you want to replace between a different set of tags change the regex.
Update: It seemed that $host was being parsed by PHP, and of course we don't want this. This happened because PHP evaluates the replacement string as PHP code, which then executes the given functions and passes the matched strings into those functions; if such a string is enclosed in double quotes, PHP will interpolate variables in it too. What a hassle.
Another problem then arises: PHP escapes single and double quotes in the matches so they won't generate parse errors, which meant that the slashes had to be stripped from any quotes in the matches too, resulting in the pretty long replacement string.
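One caveat for anyone reading this on a current PHP version: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the snippet above only runs on older installations. A sketch of the same idea with preg_replace_callback follows; encodeCodeBlocks is a hypothetical name. Because the matches arrive as an array instead of being evaluated as code, neither the stripslashes calls nor the $host problem apply.

```php
<?php
// Hypothetical helper: HTML-encodes everything between <code ...> and </code>.
function encodeCodeBlocks(string $html): string
{
    return preg_replace_callback(
        '~(<code[^>]*>)(.*?)(</code>)~s',
        function (array $m): string {
            // $m[1] = opening tag, $m[2] = content, $m[3] = closing tag
            return $m[1] . htmlentities($m[2], ENT_QUOTES, 'UTF-8') . $m[3];
        },
        $html
    );
}

$input = '<b>bold</b> <code class="highlight php">$host = "localhost" & 1 < 2;</code>';
echo encodeCodeBlocks($input), "\n";
// <b>bold</b> <code class="highlight php">$host = &quot;localhost&quot; &amp; 1 &lt; 2;</code>
```

As with the original, this assumes the code tags are not nested.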
Although a regular expression or parser may give you a solution to this puzzle, I think you may be going about your goal the wrong way.
Taken from the comments below the question:
@Poru How is that string generated?
@Phil: Fetched from database. It's the content of a tutorial. It's a self-developed "CMS".
If you are storing this string in a database, and its function is to return HTML content, you should be storing the content ready to serve as HTML, which means you must escape the appropriate characters with their equivalent HTML entities.
This was the advice already offered to you in this question: https://stackoverflow.com/questions/7059776/include-source-code-in-html-valid/7059834
The characters that must be escaped are explained here (among other various references):
http://php.net/manual/en/function.htmlspecialchars.php
The translations performed are:
'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
"'" (single quote) becomes '&#039;' only when ENT_QUOTES is set.
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'
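A quick sketch of those translations in action. The sample string and variable names are made up; the flags are passed explicitly because the default flags changed in PHP 8.1.

```php
<?php
$snippet = '<a href="x">Tom & Jerry\'s</a>';

// ENT_QUOTES converts both double and single quotes.
$escaped = htmlspecialchars($snippet, ENT_QUOTES, 'UTF-8');

// ENT_NOQUOTES leaves both kinds of quotes untouched.
$noQuotes = htmlspecialchars($snippet, ENT_NOQUOTES, 'UTF-8');

echo $escaped, "\n";
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#039;s&lt;/a&gt;
echo $noQuotes, "\n";
// &lt;a href="x"&gt;Tom &amp; Jerry's&lt;/a&gt;
```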
If in fact this is the case, and this string is supposed to be HTML output and has no other function, it doesn't make any sense to save it as invalid HTML, or at least not what you intend it to be.
If you must store your code examples unescaped, consider a separate database table for these snippets, and simply run htmlspecialchars() on them before outputting it to the HTML document. You could even assign a language to each record, and use the appropriate syntax highlighting tool for each case automatically.
What you are attempting, in my opinion, is not the appropriate solution to this particular problem, in this context. Escaping the characters and having your HTML content ready to be output to screen in its current form is the way to go.
$dom = new DOMDocument;
$dom->loadHTML(...);
$tags = $dom->getElementsByTagName('tag');
foreach($tags as $tag) {
$tag->nodeValue = htmlentities($tag->nodeValue);
}
echo $dom->saveHTML();