Extract var value from preg_replace function - php

I'm trying to simulate a bbcode tag, like code below:
[code]this is code to render[/code]
[code attributeA=arg]this is code to render[/code]
[code attribute C=arg anotherAtributte=anotherArg]this is code to render[/code]
As you can see, the code tag can take as many attributes as needed, and there can be many code tags in the same post. So far I have only dealt with the simpler tags like img, b, a and i. For example:
$result = preg_replace('#\[link\=(.+)\](.+)\[\/link\]#iUs', '$2', $publishment);
That works fine since it returns the final markup. For the code tag, though, I need the attributes and their values in an array so that I can build the markup myself according to those attributes, something like this:
$code_tag = someFunction("[code ??=?? ...] content [/code]", $array );
//build the markup myself
$attribute1 = array_key_exists("attribute1", $array) ? $array["attribute1"] : "";
echo "<pre {$attribute1}>" . $array['content'] . "</pre>";
I don't expect you to do it all for me; I just need a push in the right direction, because I have never used regex.
Thank you in advance

I like to use preg_replace_callback for such things:
function codecb($matches)
{
$original=$matches[0];
$parameters=$matches[1];
$content=$matches[2];
return "<pre>". $content ."</pre>";
}
$result = preg_replace_callback("#\[code(.*)\](.+)\[\/code\]#iUs", "codecb", $str);
So when you have [code argA=test argB=test]This is content[/code], then inside the function "codecb" you will have:
$original = "[code argA=test argB=test]This is content[/code]"
$parameters = " argA=test argB=test"
$content = "This is content"
You can then preg_match the arguments inside the callback and return the replacement for the whole tag.
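As a rough sketch of that last step, the callback can split the parameter string into an array before building the markup. The helper name parse_code_attributes and the choice to map argA to a class attribute are hypothetical, and the pattern assumes simple name=value pairs without spaces or quotes:

```php
<?php
// Hypothetical helper: turns " argA=test argB=test" into
// ['argA' => 'test', 'argB' => 'test'].
function parse_code_attributes(string $parameters): array
{
    $attributes = [];
    // Assumption: values contain no whitespace and no closing bracket.
    preg_match_all('/(\w+)=([^\s\]]+)/', $parameters, $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $attributes[$pair[1]] = $pair[2];
    }
    return $attributes;
}

function codecb(array $matches): string
{
    $attributes = parse_code_attributes($matches[1]);
    // Illustrative only: use argA as a CSS class when it is present.
    $class = isset($attributes['argA'])
        ? ' class="' . htmlspecialchars($attributes['argA']) . '"'
        : '';
    return '<pre' . $class . '>' . $matches[2] . '</pre>';
}

$str = '[code argA=test argB=test]This is content[/code]';
$result = preg_replace_callback('#\[code(.*)\](.+)\[/code\]#iUs', 'codecb', $str);
echo $result, "\n";
```

Quoted values or attribute names containing spaces would need a more careful pattern than the one assumed here.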

Related

PHP Simple Html Dom get the plain text of div,but avoiding all other tags

I use PHP Simple Html Dom to get some HTML. I now have a DOM like the following, and I need to fetch the plain text inside the div while skipping the p tags and their content (returning only 111111). Can anyone help? Thanks in advance!
<div>
<p>00000000</p>
111111
<p>22222222</p>
</div>
It depends on what you mean by "avoiding the p tags".
If you just want to remove the tags, then just running strip_tags() on it should work for what you want.
If you actually want to return just "111111" (i.e. strip the tags and their contents), then this isn't a viable solution. For that, something like this may work:
$myDiv = $html->find('div', 0); // the div you want; index 0 returns a single element instead of an array
$children = $myDiv->children; // get an array of children
foreach ($children AS $child) {
$child->outertext = ''; // This removes the element, but MAY NOT remove it from the original $myDiv
}
echo $myDiv->innertext;
If your text is always at the same position, try this:
$html->find('text', 2)->plaintext; // should return 111111
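If picking the text by position feels fragile, here is a minimal sketch that uses PHP's built-in DOMDocument instead of Simple HTML DOM (a swapped-in technique, not what the answers here use). It keeps only the text nodes that are direct children of the div; $markup is just the sample from the question:

```php
<?php
// Sample markup from the question.
$markup = "<div>\n<p>00000000</p>\n111111\n<p>22222222</p>\n</div>";

$dom = new DOMDocument();
$dom->loadHTML($markup);
$div = $dom->getElementsByTagName('div')->item(0);

$text = '';
foreach ($div->childNodes as $node) {
    // Keep direct text children only; the <p> elements are skipped entirely.
    if ($node->nodeType === XML_TEXT_NODE) {
        $text .= $node->nodeValue;
    }
}
$text = trim($text);
echo $text, "\n";
```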
Here is my solution
I want to get the Primary Text part only.
$title_obj = $article->find(".ofr-descptxt",0); //Store the Original Tree ie) h3 tag
$title_obj->children(0)->outertext = ""; //Unset <br/>
$title_obj->children(1)->outertext = ""; //Unset the last Span
echo $title_obj; //It has only first element
Edited:
If you get PHP errors, wrap the calls in if/else, or try my lazy code:
($title_obj->children(0))?$title_obj->children(0)->outertext="":"";
($title_obj->children(1))?$title_obj->children(1)->outertext = "":"";
Official Documentation
$wordlist = array("<p>", "</p>");
foreach($wordlist as $word)
$string = str_replace($word, "", $string);

Get second item with preg_match in php

I am getting the text between two tags from an HTML page with PHP.
Here is the code I use:
function GDes($url) {
$fp = file_get_contents($url);
if (!$fp) return false;
$res = preg_match("/<description>(.*)<\/description>/siU", $fp, $title_matches);
if (!$res) return false;
$description = preg_replace('/\s+/', ' ', $title_matches[1]);
$description = trim($description);
return $description;
}
It returns the text between the description tags, but my problem is that if the page has two description tags, it returns the first one, which I don't need.
I need to get the second one.
For example, If my HTML is this :
<description>No need to this</description>
<description>I NEED THIS ONE</description>
I need the function above to return the second description tag.
What changes does the function need?
Use preg_match_all instead. It will create an array with all matches.
You can keep your code as is, just replace preg_match with preg_match_all.
Then use $title_matches[1][1] instead of $title_matches[1] in your preg_replace call, since $title_matches is now a multidimensional array: the first index is the capture group and the second is the zero-based match number.
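Put together, the change might look like the sketch below. The function name second_description is hypothetical, and it takes the markup as a string instead of fetching a URL so that it can run offline:

```php
<?php
// Hypothetical offline variant of GDes(): receives the markup directly.
function second_description(string $markup)
{
    $count = preg_match_all('/<description>(.*)<\/description>/siU', $markup, $title_matches);
    // Bail out when there is no second match (or the regex failed).
    if ($count < 2) {
        return false;
    }
    // [1] is the capture group, [1] again is the second match (zero-based).
    $description = preg_replace('/\s+/', ' ', $title_matches[1][1]);
    return trim($description);
}

$markup = "<description>No need to this</description>\n"
        . "<description>I NEED   THIS ONE</description>";
$second = second_description($markup);
echo $second, "\n";
```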

How to use htmlspecialchars only on <code></code> tags.

I'm using WordPress and would like to create a function that applies the PHP function htmlspecialchars only to code contained between <code></code> tags. I appreciate this may be fairly simple but I'm new to PHP and can't find any references on how to do this.
So far I have the following:
function FilterCodeOnSave( $content, $post_id ) {
return htmlspecialchars($content, ENT_NOQUOTES);
}
Obviously the above is very simple and performs htmlspecialchars on the entire content of my page. I need to limit the function to only apply to the HTML between code tags (there may be multiple code tags on each page).
Any help would be appreciated.
Thanks,
James
EDIT: updated to avoid multiple CODE tags
Try this:
<?php
// test data
$textToScan = "Hi <code>test12</code><br>
Line 2 <code><br>
Test <b>Bold</b><br></code><br>
Test
";
// debug:
echo $textToScan . "<hr>";
// the regex pattern (i = case insensitive, s = the dot also matches newlines)
$search = "~<code>(.*?)</code>~is";
// first look for all CODE tags and their content
preg_match_all($search, $textToScan, $matches);
//print_r($matches);
// now replace all the CODE tags and their content with a htmlspecialchars() content
foreach($matches[1] as $match){
$replace = htmlspecialchars($match);
// now replace the previously found CODE block
$textToScan = str_replace($match, $replace, $textToScan);
}
// output result
echo $textToScan;
?>
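Note that the loop above relies on str_replace, which replaces every occurrence of the matched text, including identical text outside <code> tags. A minimal sketch of a variant that matches and replaces in a single pass with preg_replace_callback follows; the function name escape_code_blocks is hypothetical, and it rewrites the tags themselves in lowercase:

```php
<?php
// Hypothetical helper: escapes HTML only inside <code>...</code> blocks.
function escape_code_blocks(string $content): string
{
    return preg_replace_callback(
        '~<code>(.*?)</code>~is',
        function (array $m): string {
            // $m[1] is the text between the tags; everything else is left alone.
            return '<code>' . htmlspecialchars($m[1], ENT_NOQUOTES) . '</code>';
        },
        $content
    );
}

$text = 'Hi <b>x</b> <code>Test <b>Bold</b></code>';
$escaped = escape_code_blocks($text);
echo $escaped, "\n";
```

Running it twice on the same content would escape the ampersands a second time, so it is best applied once, for example on save.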
Use DOMDocument to get all <code> tags;
// in this example $dom is an instance of DOMDocument
$code_tags = $dom->getElementsByTagName('code');
if ($code_tags) {
foreach ($code_tags as $node) {
// [...]
}
// [...]
}
I know this is a little late, but you can call the htmlspecialchars function first and then call the htmlspecialchars_decode function when outputting.

Regex in PHP to extract data from website

I am new to PHP. As part of a course homework assignment, I am required to extract data from a website and render a table from that data.
P.S.: I know regex is not a good option, but we are not allowed to use any library like DOM, jQuery, etc.
Char set is UTF-8.
$searchURL = "http://www.allmusic.com/search/artists/the+beatles";
$html = file_get_contents($searchURL);
$patternform = '/<form(.*)<\/form>/sm';
preg_match_all($patternform ,$html,$matches);
Here the regex works fine, but when I apply the same regex to the table tag, it returns an empty array. Does it have something to do with the whitespace in $html?
What is wrong here?
The following code produces a good result:
$searchURL = "http://www.allmusic.com/search/artists/the+beatles";
$html = file_get_contents($searchURL);
$patternform = '/(<table.*<\/table>)/sm';
preg_match_all($patternform ,$html,$matches);
echo $matches[0][0];
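One thing worth knowing about this pattern: .* is greedy, so on a page with several tables it captures everything from the first <table to the last </table> as a single match. A small sketch on made-up markup (not the real page) shows the difference between the greedy and the lazy form:

```php
<?php
// Made-up markup with two tables, only to illustrate greedy vs lazy matching.
$html = '<table><tr><td>A</td></tr></table><p>gap</p>'
      . '<table><tr><td>B</td></tr></table>';

// Greedy: one match spanning both tables and the paragraph between them.
preg_match_all('/<table.*<\/table>/s', $html, $greedy);

// Lazy: stops at the first closing tag, so each table is its own match.
preg_match_all('/<table.*?<\/table>/s', $html, $lazy);

echo count($greedy[0]), "\n";
echo count($lazy[0]), "\n";
```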

Using preg_replace_callback to identify and manipulate latex code

I have latex + html code somewhere in the following form:
...some text1.... \[latex-code1\]....some text2....\[latex-code2\]....etc
Firstly, I want to collect the latex codes in an array codes[] so that I can send them to a server for rendering, so that
codes[0]=latex-code1, codes[1]=latex-code2, etc
Secondly, I want to modify this text so that it looks like:
...some text1.... <img src="root/1.png">....some text2....<img src="root/2.png">....etc
i.e., the i-th latex code fragment is replaced by the link to the i-th rendered image.
I have been trying to do this with preg_replace_callback and preg_match_all but, being new to PHP, I haven't been able to make it work. Please advise.
If you're looking for codez:
$html = '...some text1.... \[latex-code1\]....some text2....\[latex-code2\]....etc';
$codes = array();
$count = 0;
$replace = function($matches) use (&$codes, &$count) {
list(, $codes[]) = $matches;
return sprintf('<img src="root/%d.png">', ++$count);
};
$changed = preg_replace_callback('~\\\\\\[(.+?)\\\\\\]~', $replace, $html);
echo "Original: $html\n";
echo "Changed : $changed\n\nLatex Codes: ", print_r($codes, 1), "Count: ", $count;
I don't know which part gave you problems. If it's the regex pattern: the characters in your markers need heavy escaping, once for the PHP string and once for PCRE, which is why there are so many backslashes.
Another tricky part is the callback function, because it needs to collect the codes as well as keep a counter. The example does this with an anonymous function that takes variable aliases (references) in its use clause. This makes the variables $codes and $count available inside the callback.
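To see the two escaping layers separately, it can help to print the pattern after PHP has processed the string, and to compare it with one assembled by preg_quote. This is only an illustrative sketch; the variable names are made up:

```php
<?php
// Six backslashes in a single-quoted PHP string become three in the pattern:
// "\\" matches a literal backslash and "\[" matches a literal bracket.
$pattern = '~\\\\\\[(.+?)\\\\\\]~';
echo $pattern, "\n";

// The same pattern, letting preg_quote do the regex-level escaping.
$open  = preg_quote('\[', '~');
$close = preg_quote('\]', '~');
$built = '~' . $open . '(.+?)' . $close . '~';

preg_match($built, 'text \[latex-code1\] more', $m);
echo $m[1], "\n";
```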
