Detect quotes with in quotes using RegEx - php

I am looking for a way to detect and drop quotes with in quotes, for example: something "something "something something" something" something.
In the above example the italic something something is wrapped in double-quotes as you can see. I want to strip the string inside from these outer quotes.
So, the expression should simply look for quotes with a text between them plus a another set of text-wrapping text, and then drop the quotes wrapping the last.
This is my current code (php):
preg_match_all('/".*(".*").*"/', $text, $matches);
if(is_array($matches[0])){
foreach($matches[0] as $match){
$text = str_replace($match, '"' . str_replace('"', '', $match) . '"', $text);
}
}

If the string starts with a " and the double quotes inside the string are always balanced you might use:
^"(*SKIP)(*F)|"([^"]*)"
That would match a double quote at the start of the string and then skips that match using SKIP FAIL. Then it would match ", capture in a group what is between the " and match a " again.
In the replacement you could use capturing group 1 $1
$pattern = '/^"(*SKIP)(*F)|"([^"]+)"/';
$str = "\"something \"something something\" and then \"something\" something\"";
echo preg_replace($pattern, "$1", $str);
"something something something and then something something"
Demo

You could leverage strpos() with the third parameter (offset) to look up all quotes and replace every quote from 1 to n-1:
<?php
$data = <<<DATA
something "something "something something" something" something
DATA;
# set up the needed variables
$needle = '"';
$lastPos = 0;
$positions = array();
# find all quotes
while (($lastPos = strpos($data, $needle, $lastPos)) !== false) {
$positions[] = $lastPos;
$lastPos = $lastPos + strlen($needle);
}
# replace them if there are more than 2
if (count($positions) > 2) {
for ($i=1;$i<count($positions)-1;$i++) {
$data[$positions[$i]] = "";
}
}
# check the result
echo $data;
?>
This yields
something "something something something something" something
You could even hide it in a class:
class unquote {
# set up the needed variables
var $data = "";
var $needle = "";
var $positions = array();
function cleanData($string="", $needle = '"') {
$this->data = $string;
$this->needle = $needle;
$this->searchPositions();
$this->replace();
return $this->data;
}
private function searchPositions() {
$lastPos = 0;
# find all quotes
while (($lastPos = strpos($this->data, $this->needle, $lastPos)) !== false) {
$this->positions[] = $lastPos;
$lastPos = $lastPos + strlen($this->needle);
}
}
private function replace() {
# replace them if there are more than 2
if (count($this->positions) > 2) {
for ($i=1;$i<count($this->positions)-1;$i++) {
$this->data[$this->positions[$i]] = "";
}
}
}
}
And call it with
$q = new unquote();
$data = $q->cleanData($data);

Related

Php search replacer

I have a search String: $str (Something like "test"), a wrap string: $wrap (Something like "|") and a text string: $text (Something like "This is a test Text").
$str is 1 Time in $text. What i want now is a function that will wrap $str with the wrap defined in $wrap and output the modified text (even if $str is more than one time in $text).
But it shall not output the whole text but just 1-2 of the words before $str and then 1-2 of the words after $str and "..." (Only if it isn`t the first or last word). Also it should be case insensitive.
Example:
$str = "Text"
$wrap = "<span>|</span>"
$text = "This is a really long Text where the word Text appears about 3 times Text"
Output would be:
"...long <span>Text</span> where...word <span>Text</span> appears...times <span>Text</span>"
My Code (Obviusly doesnt works):
$tempar = preg_split("/$str/i", $text);
if (count($tempar) <= 2) {
$result = "... ".substr($tempar[0], -7).$wrap.substr($tempar[1], 7)." ...";
} else {
$amount = substr_count($text, $str);
for ($i = 0; $i < $amount; $i++) {
$result = $result.".. ".substr($tempar[$i], -7).$wrap.substr($tempar[$i+1], 0, 7)." ..";
}
}
If you have a tipp or a solution dont hesitate to let me know.
I have taken your approach and made it more flexible. If $str or $wrap changes you could have escaping issues within the regex pattern so I have used preg_quote.
Note that I added $placeholder to make it clearer, but you can use $placeholder = "|" if you don't like [placeholder].
function wrapInString($str, $text, $element = 'span') {
$placeholder = "[placeholder]"; // The string that will be replaced by $str
$wrap = "<{$element}>{$placeholder}</{$element}>"; // Dynamic string that can handle more than just span
$strExp = preg_quote($str, '/');
$matches = [];
$matchCount = preg_match_all("/(\w+\s+)?(\w+\s+)?({$strExp})(\s+\w+)?(\s+\w+)?/i", $text, $matches);
$response = '';
for ($i = 0; $i < $matchCount; $i++) {
if (strlen($matches[1][$i])) {
$response .= '...';
}
if (strlen($matches[2][$i])) {
$response .= $matches[2][$i];
}
$response .= str_replace($placeholder, $matches[3][$i], $wrap);
if (strlen($matches[4][$i])) {
$response .= $matches[4][$i];
}
if (strlen($matches[5][$i]) && $i == $matchCount - 1) {
$response .= '...';
}
}
return $response;
}
$text = "text This is a really long Text where the word Text appears about 3 times Text";
string(107) "<span>text</span> This...long <span>text</span> where...<span>text</span> appears...times <span>text</span>"
To make the replacement case insensitive you can use the i regex option.
If I understand your question correct, just a little bit of implode and explode magic needed
$text = "This is a really long Text where the word Text appears about 3 times Text";
$arr = explode("Text", $text);
print_r(implode('<span>Text</span>', $arr));
If you specifically need to render the span tags using HTML, just write it that way
$arr = explode("Text", $text);
print_r(implode('<span>Text</span>', $arr));
Use patern below to get your word and 1-2 words before and after
/((\w+\s+){1,2}|^)text((\s+\w+){1,2}|$)/i
demo
In PHP code it can be:
$str = "Text";
$wrap = "<span>|</span>";
$text = "This is a really long Text where the word Text appears about 3 times Text";
$temp = str_replace('|', $str, $wrap); // <span>Text</span>
// find patern and 1-2 words before and after
// (to make it casesensitive, delete 'i' from patern)
if(preg_match_all('/((\w+\s+){1,2}|^)text((\s+\w+){1,2}|$)/i', $text, $match)) {
$res = array_map(function($x) use($str, $temp) { return '... '.str_replace($str, $temp, $x) . ' ...';}, $match[0]);
echo implode(' ', $res);
}

PHP - Check if string contains illegal chars in string

In JS you can do:
var chs = "[](){}";
var str = "hello[asd]}";
if (str.indexOf(chs) != -1) {
alert("The string can't contain the following characters: " + chs.split("").join(", "));
}
How can you do this in PHP (replacing alert with echo)?
I do not want to use a regex for the simplicity of what I think.
EDIT:
What I've tried:
<?php
$chs = /[\[\]\(\)\{\}]/;
$str = "hella[asd]}";
if (preg_match(chs, str)) {
echo ("The string can't contain the following characters: " . $chs);
}
?>
Which obviously doesn't work and idk how to do it without regex.
In php you should do this:
$string = "Sometring[inside]";
if(preg_match("/(?:\[|\]|\(|\)|\{|\})+/", $string) === FALSE)
{
echo "it does not contain.";
}
else
{
echo "it contains";
}
The regex says check to see any of the characters are inside the string. you can read more about it here:
http://en.wikipedia.org/wiki/Regular_expression
And about PHP preg_match() :
http://php.net/manual/en/function.preg-match.php
Update:
I have written an updated regex for this, which captures the letters inside:
$rule = "/(?:(?:\[([\s\da-zA-Z]+)\])|\{([\d\sa-zA-Z]+)\})|\(([\d\sa-zA-Z]+)\)+/"
$matches = array();
if(preg_match($rule, $string, $matches) === true)
{
echo "It contains: " . $matches[0];
}
It returnes something like this:
It contains: [inside]
I have changed the regex only which becomes:
$rule = "/(?:(?:(\[)(?:[\s\da-zA-Z]+)(\]))|(\{)(?:[\d\sa-zA-Z]+)(\}))|(\()(?:[\d\sa-zA-Z]+)(\))+/";
// it returns an array of occurred illegal characters
It now returns [] for this "I am [good]"
Why not you try str_replace.
<?php
$search = array('[',']','{','}','(',')');
$replace = array('');
$content = 'hella[asd]}';
echo str_replace($search, $replace, $content);
//Output => hellaasd
?>
Instead of regex we can use string replace for this case.
here is a simple solution without using regex:
$chs = array("[", "]", "(", ")", "{", "}");
$string = "hello[asd]}";
$err = array();
foreach($chs AS $key => $val)
{
if(strpos($string, $val) !== false) $err[]= $val;
}
if(count($err) > 0)
{
echo "The string can't contain the following characters: " . implode(", ", $err);
}

PHP: How to find text NOT between particular tags?

Example input string: "[A][B][C]test1[/B][/C][/A] [A][B]test2[/B][/A] test3"
I need to find out what parts of text are NOT between the A, B and C tags. So, for example, in the above string it's 'test2' and 'test3'. 'test2' doesn't have the C tag and 'test3' doesn't have any tag at all.
If can also be nested like this:
Example input string2: "[A][B][C]test1[/B][/C][/A] [A][B]test2[C]test4[/C][/B][/A] test3"
In this example "test4" was added but "test4" has the A,B and C tag so the output wouldn't change.
Anyone got an idea how I could parse this?
This solution is not clean but it does the trick
$string = "[A][B][C]test1[/B][/C][/A] [A][B]test2[/B][/A] test3" ;
$string = preg_replace('/<A[^>]*>([\s\S]*?)<\/A[^>]*>/', '', strtr($string, array("["=>"<","]"=>">")));
$string = trim($string);
var_dump($string);
Output
string 'test3' (length=5)
Considering the fact that everyone of you tags is in [A][/A] What you can do is: Explode the [/A] and verify if each array contains the [A] tag like so:
$string = "[A][B][C]test1[/B][/C][/A] [A][B]test2[/B][/A] test3";
$found = ''; // this will be equal to test3
$boom = explode('[/A]', $string);
foreach ($boom as $val) {
if (strpos($val, '[A] ') !== false) { $found = $val; break; }
}
echo $found; // test3
try the below code
$str = 'test0[A]test1[B][C]test2[/B][/C][/A] [A][B]test3[/B][/A] test4';
$matches = array();
// Find and remove the unneeded strings
$pattern = '/(\[A\]|\[B\]|\[C\])[^\[]*(\[A\]|\[B\]|\[C\])[^\[]*(\[A\]|\[B\]|\[C\])([^\[]*)(\[\/A\]|\[\/B\]|\[\/C\])[^\[]*(\[\/A\]|\[\/B\]|\[\/C\])[^\[]*(\[\/A\]|\[\/B\]|\[\/C\])/';
preg_match_all( $pattern, $str, $matches );
$stripped_str = $str;
foreach ($matches[0] as $key=>$matched_pattern) {
$matched_pattern_str = str_replace($matches[4][$key], '', $matched_pattern); // matched pattern with text between A,B,C tags removed
$stripped_str = str_replace($matched_pattern, $matched_pattern_str, $stripped_str); // replace pattern string in text with stripped pattern string
}
// Get required strings
$pattern = '/(\[A\]|\[B\]|\[C\]|\[\/A\]|\[\/B\]|\[\/C\])([^\[]+)(\[A\]|\[B\]|\[C\]|\[\/A\]|\[\/B\]|\[\/C\])/';
preg_match_all( $pattern, $stripped_str, $matches );
$required_strings = array();
foreach ($matches[2] as $match) {
if (trim($match) != '') {
$required_strings[] = $match;
}
}
// Special case, possible string on start and end
$pattern = '/^([^\[]*)(\[A\]|\[B\]|\[C\]).*(\[\/A\]|\[\/B\]|\[\/C\])([^\[]*)$/';
preg_match( $pattern, $stripped_str, $matches );
if (trim($matches[1]) != '') {
$required_strings[] = $matches[1];
}
if (trim($matches[4]) != '') {
$required_strings[] = $matches[4];
}
print_r($required_strings);

How do I return a part of text with a certain word in the middle?

If this is the input string:
$input = 'In biology (botany), a "fruit" is a part of a flowering
plant that derives from specific tissues of the flower, mainly one or
more ovaries. Taken strictly, this definition excludes many structures
that are "fruits" in the common sense of the term, such as those
produced by non-flowering plants';
And now I want to perform a search on the word tissues and consequently return only a part of the string, defined by where the result is, like this:
$output = '... of a flowering plant that derives from specific tissues of the flower, mainly one or more ovaries ...';
The search term may be in the middle.
How do I perform the aforementioned?
An alternative to my other answer using preg_match:
$word = 'tissues'
$matches = array();
$found = preg_match("/\b(.{0,30}$word.{0,30})\b/i", $string, $matches);
if ($found == 0) {
// string not found
} else {
$output = $matches[1];
}
This may be better as it uses word boundaries.
EDIT: To surround the search term with a tag, you'll need to slightly alter the regex. This should do it:
$word = 'tissues'
$matches = array();
$found = preg_match("/\b(.{0,30})$word(.{0,30})\b/i", $string, $matches);
if ($found == 0) {
// string not found
} else {
$output = $matches[1] . "<strong>$word</strong>" . $matches[2];
}
User strpos to find the location of the word and substr to extract the quote. For example:
$word = 'tissues'
$pos = strpos($string, $word);
if ($pos === FALSE) {
// string not found
} else {
$start = $pos - 30;
if ($start < 0)
$start = 0;
$output = substr($string, $start, 70);
}
Use stripos for case insensitive search.

highlight the word in the string, if it contains the keyword

how write the script, which menchion the whole word, if it contain the keyword? example: keyword "fun", string - the bird is funny, result - the bird is * funny*. i do the following
$str = "my bird is funny";
$keyword = "fun";
$str = preg_replace("/($keyword)/i","<b>$1</b>",$str);
but it menshions only keyword. my bird is funny
Try this:
preg_replace("/\w*?$keyword\w*/i", "<b>$0</b>", $str)
\w*? matches any word characters before the keyword (as least as possible) and \w* any word characters after the keyword.
And I recommend you to use preg_quote to escape the keyword:
preg_replace("/\w*?".preg_quote($keyword)."\w*/i", "<b>$0</b>", $str)
For Unicode support, use the u flag and \p{L} instead of \w:
preg_replace("/\p{L}*?".preg_quote($keyword)."\p{L}*/ui", "<b>$0</b>", $str)
You can do the following:
$str = preg_replace("/\b([a-z]*${keyword}[a-z]*)\b/i","<b>$1</b>",$str);
Example:
$str = "Its fun to be funny and unfunny";
$keyword = 'fun';
$str = preg_replace("/\b([a-z]*${keyword}[a-z]*)\b/i","<b>$1</b>",$str);
echo "$str"; // prints 'Its <b>fun</b> to be <b>funny</b> and <b>unfunny</b>'
<?php
$str = "my bird is funny";
$keyword = "fun";
$look = explode(' ',$str);
foreach($look as $find){
if(strpos($find, $keyword) !== false) {
if(!isset($highlight)){
$highlight[] = $find;
} else {
if(!in_array($find,$highlight)){
$highlight[] = $find;
}
}
}
}
if(isset($highlight)){
foreach($highlight as $replace){
$str = str_replace($replace,'<b>'.$replace.'</b>',$str);
}
}
echo $str;
?>
Here by am added multi search in a string for your reference
$keyword = ".in#.com#dot.com#1#2#3#4#5#6#7#8#9#one#two#three#four#five#Six#seven#eight#nine#ten#dot.in#dot in#";
$keyword = implode('|',explode('#',preg_quote($keyword)));
$str = "PHP is dot .com the amazon.in 123455454546 dot in scripting language of choice.";
$str = preg_replace("/($keyword)/i","<b>$0</b>",$str);
echo $str;
Basically, since this is HTML, what you have to do is iterate over text nodes and split those containing the search string into up to three nodes (before match, after match and the highlighted match). If "after match" node exist, it must be processed too. Here is a PHP7 example using PHP DOM extension. The following function accepts preg_quoted UTF-8 search string (or regex-conpatible expression like apple|orange). It will enclose every match in a given tag with a given class.
function highlightTextInHTML($regex_compatible_text, $html, $replacement_tag = 'span', $replacement_class = 'highlight') {
$d = new DOMDocument('1.0','utf-8');
$d->loadHTML('<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>' . $html);
$xpath = new DOMXPath($d);
$process_node = function(&$node) use($regex_compatible_text, $replacement_tag, $replacement_class, &$d, &$process_node) {
$i = preg_match("~(?<before>.*?)(?<search>($regex_compatible_text)+)(?<after>.*)~ui", $node->textContent, $m);
if($i) {
$x = $d->createElement($replacement_tag);
$x->setAttribute('class', $replacement_class);
$x->textContent = $m['search'];
$parent_node = $node->parentNode;
$before = null;
$after = null;
if(!empty($m['after'])) {
$after = $d->createTextNode($m['after']);
$parent_node->replaceChild($after, $node);
$parent_node->insertBefore($x, $after);
} else {
$parent_node->replaceChild($x, $node);
}
if(!empty($m['before'])) {
$before = $d->createTextNode($m['before']);
$parent_node->insertBefore($before, $x);
}
if($after) {
$process_node($after);
}
}
};
$node_list = $xpath->query('//text()');
foreach ($node_list as $node) {
$process_node($node);
}
return preg_replace('~(^.*<body>)|(</body>.*$)~mis', '', $d->saveHTML());
}
Search and highlight the word in your string, text, body and paragraph:
<?php $body_text='This is simple code for highligh the word in a given body or text'; //this is the body of your page
$searh_letter = 'this'; //this is the string you want to search for
$result_body = do_Highlight($body_text,$searh_letter); // this is the result with highlight of your search word
echo $result_body; //for displaying the result
function do_Highlight($body_text,$searh_letter){ //function for highlight the word in body of your page or paragraph or string
$length= strlen($body_text); //this is length of your body
$pos = strpos($body_text, $searh_letter); // this will find the first occurance of your search text and give the position so that you can split text and highlight it
$lword = strlen($searh_letter); // this is the length of your search string so that you can add it to $pos and start with rest of your string
$split_search = $pos+$lword;
$string0 = substr($body_text, 0, $pos);
$string1 = substr($body_text,$pos,$lword);
$string2 = substr($body_text,$split_search,$length);
$body = $string0."<font style='color:#FF0000; background-color:white;'>".$string1." </font> ".$string2;
return $body;
} ?>

Categories