Tricky detection of href attribute with Regex and PHP

Given the following HTML, I want to detect the real href of the link. Note that the div also carries a "fake" href (its data-href attribute).
$html = '
<a class="test">simple text</a>
<div data-href="yahoo.com">yahoo in div</div>
<a class="blabla" href="google.com">google</a>';
preg_match("'<a.*?href=[\'\"](.*?)[\'\"]'si", $html, $output);
What I get now is yahoo.com, but that is not what I need. I want to receive google.com.
Do you have any ideas?

You can try this:
(?<=href=")(\w+)\.\w+(?=">\1[^ ])
Check: https://regex101.com/r/nB1wP4/5

I would try to simplify. Try it: https://regex101.com/r/oU6kR8/1
\shref="([a-z.\/:]+)"
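
Another option, as a minimal sketch: the pattern in the question fails because .*? with the s modifier can run from the first <a past the closing tag and into the div's data-href. Using [^>]* keeps the match inside a single opening tag, and requiring whitespace before href rules out data-href. The variable $href is a name chosen for this example, and markup with unquoted attributes or a literal > inside an attribute value would still need a real HTML parser.

```php
<?php
$html = '
<a class="test">simple text</a>
<div data-href="yahoo.com">yahoo in div</div>
<a class="blabla" href="google.com">google</a>';

$href = null;
// <a\b    : an opening "a" tag only (not <abbr>, <article>, ...)
// [^>]*   : stay inside that one tag
// \shref= : whitespace before href, so data-href is not accepted
// (["'])  : remember the opening quote and close with the same one via \1
if (preg_match('~<a\b[^>]*\shref=(["\'])(.*?)\1~si', $html, $output)) {
    $href = $output[2];
}

echo $href, PHP_EOL;
```

The first <a> has no href inside its own tag, so the engine skips it and the match comes from the second link.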

Related

Extract content with PHP Simple HTML DOM

I'm trying to extract "XXXXXXX" with PHP Simple HTML DOM:
<h2 class="title">XXXXXXX</h2>
I tried
$ret = $html->find('h2[class="title"]') ;
but I don't know the next instruction because there is no attribute. How can I do this?
I also need to extract "XX" from this code. I think it's the same problem, no?
<a id="likeScore" appName='videos' object="video" objectid="96" direction="up" class="button like icon-heart youLike not-active">XX</a>
Thank you !

For the first one, I think this could work (the h2 contains only text, so there is no nested a element to select):
$text = $html->find('h2[class="title"]', 0)->innertext;
For tags with ID you can use something more direct:
$text1 = $html->getElementById("likeScore")->innertext;
or using the #selector syntax
$text1 = $html->find('#likeScore',0)->innertext;
Documentation:
https://simplehtmldom.sourceforge.io/manual.htm#section_access
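
If Simple HTML DOM is not available, roughly the same lookups can be sketched with PHP's built-in DOMDocument and DOMXPath. The markup below is a shortened copy of the question's snippets, and $source, $title and $score are names chosen for this example only.

```php
<?php
// Shortened sample markup based on the question (attributes trimmed).
$source = '<h2 class="title">XXXXXXX</h2>'
    . '<a id="likeScore" class="button like icon-heart">XX</a>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// item(0) is the first matching node; textContent is its text without tags.
$title = $xpath->query('//h2[@class="title"]')->item(0)->textContent;
$score = $xpath->query('//a[@id="likeScore"]')->item(0)->textContent;

echo $title, ' / ', $score, PHP_EOL;
```

In real code it is worth checking that item(0) is not null before reading textContent, since the element may be missing from the page.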

Edit iframe content using PHP, and preg_replace()

I need to load some 3rd party widget onto my website. The only way they distribute it is by means of clumsy old <iframe>.
I don't have much choice so what I do is get an iframe html code, using a proxy page on my website like so:
$iframe = file_get_contents('http://example.com/page_with_iframe_html.php');
Then I have to remove some specific parts in iframe like this:
$iframe = preg_replace('~<div class="someclass">[\s\S]*<\/div>~ix', '', $iframe);
In this way I intend to remove the unwanted section. In the end I simply output the iframe like so:
echo ($iframe);
The iframe gets output alright, however the unwanted section is still there. The regex itself was tested using regex101, but it doesn't work.

Try it this way; I hope it helps. Here I use sample HTML and remove the div with the given class name: first I load the document, then I query for the node and remove it from its parent.
Try this code snippet:
<?php
ini_set('display_errors', 1);
//sample HTML content
$string1='<html>'
. '<body>'
. '<div>This is div 1</div>'
. '<div class="someclass"> <span class="hot-line-text"> hotline: </span> <a id="hot-line-tel" class="hot-line-link" href="tel:0000" target="_parent"> <button class="hot-line-button"></button> <span class="hot-line-number">0000</span> </a> </div>'
. '</body>'
. '</html>';
$object= new DOMDocument();
$object->loadHTML($string1);
$xpathObj= new DOMXPath($object);
// XPath selects attributes with @, not #
$result=$xpathObj->query('//div[@class="someclass"]');
foreach($result as $node)
{
    $node->parentNode->removeChild($node);
}
echo $object->saveHTML();
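
As for why the original preg_replace removes nothing, one likely cause is the x (extended) modifier: with it, PCRE ignores unescaped whitespace in the pattern, so the space between div and class is never matched. The sketch below compares the same pattern with and without x on a made-up sample string; $iframe here is hypothetical content, not the real widget markup, and the lazy [\s\S]*? is my substitution for the greedy [\s\S]* so that the match stops at the first closing div.

```php
<?php
// Hypothetical stand-in for the fetched iframe HTML.
$iframe = '<div>keep</div><div class="someclass"><span>hotline</span></div><p>after</p>';

// With x, the literal space in the pattern is dropped, so the pattern
// effectively looks for "<divclass=..." and never matches.
$withX = preg_replace('~<div class="someclass">[\s\S]*?</div>~ix', '', $iframe);

// Without x, the space is matched literally and the div is removed.
$withoutX = preg_replace('~<div class="someclass">[\s\S]*?</div>~i', '', $iframe);

var_dump($withX === $iframe); // nothing was removed
echo $withoutX, PHP_EOL;
```

Even without x, a regex like this stops at the first closing div and so breaks on nested divs, which is why the DOM approach above is the more robust choice.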

How to extract HTML element from a source file

I need to replace an HTML section, identified by a tag id, in a source file that combines HTML and PHP, and I need to do it using PHP. If it were pure HTML, a DOM parser could be used; if there were no DIV inside a DIV, I can imagine how to use preg_match. This is what I am trying to do. I have code (loaded into a string) like:
<div>
<img >
</div>
<? include(); ?>
<div id="mydiv">
<div>
<div>
<img >
</div>
</div>
</div>
and my task is to replace content of "mydiv" DIV with a new one e.g.
<div id="newdiv">
some text
</div>
so the string will look like this after the change:
<div>
<img >
</div>
<? include(); ?>
<div id="mydiv">
<div id="newdiv">
some text
</div>
</div>
I have already tried:
1) parsing the code using DOMDocument's loadHTML => it produces a lot of errors when PHP code is included.
2) I played around a bit with regexes like preg_match_all('/<div id="myid"([^<]*)<\/div>/', $src, $matches), which fails in case more child divs are included.
The best approach I have found so far is:
1) find id="mydiv" string
2) search for '<' and '>' chars and count them like '<'=1 and '>'=-1 (not exactly, but it gives the idea)
3) once I get sum == 0 I should be on position of the closing tag, so I know, which portion string I should exchange
This is quite a "heavy" solution, which can stop working when the code is different (e.g. the on-page PHP code contains those characters as well, instead of just a simple "include"). So I am looking for a better solution.

You could try something like this:
$file = 'filename.php';
$content = file_get_contents($file);
$array_one = explode( '<div id="mydiv">' , $content );
$my_div_content = explode("</div>" , $array_one[1] )[0];
Or use preg_match like you said:
preg_match('/<div id="mydiv"(.*?)<\/div>/s', $content, $matches);

Yes, there is. First you need a function that gets the content of the file. Let's call the file homepage.php:
$homepageString = file_get_contents('homepage.php');
Now you have a string with all the content. Next, use the preg_replace() function to take out the part of the code that you want removed:
$newHomepageString = preg_replace('/id="mydiv"/',"", $homepageString);
Now you overwrite the existing homepage.php file with the new source code:
file_put_contents("homepage.php", $newHomepageString);
Let me know if it worked for you! :)
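
The counting idea from the question can also be sketched by tracking div tags instead of single < and > characters, which makes it less sensitive to other markup. This is only a sketch under assumptions: replaceDivContent is a hypothetical helper, the id is expected in double quotes, and embedded PHP code that itself contains div tags in strings would still confuse it.

```php
<?php
// Hypothetical helper: replace everything between <div id="$id"> and its
// matching </div>, using a nesting counter. Returns null if no match.
function replaceDivContent(string $src, string $id, string $newContent): ?string
{
    $open = '~<div\b[^>]*\bid="' . preg_quote($id, '~') . '"[^>]*>~i';
    if (!preg_match($open, $src, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $contentStart = $m[0][1] + strlen($m[0][0]);
    $depth = 1;               // we are inside the target div
    $offset = $contentStart;

    // Visit each later opening or closing div tag and adjust the depth.
    while (preg_match('~<(/?)div\b[^>]*>~i', $src, $t, PREG_OFFSET_CAPTURE, $offset)) {
        $depth += ($t[1][0] === '/') ? -1 : 1;
        if ($depth === 0) {
            // $t[0][1] is where the matching closing tag starts.
            return substr($src, 0, $contentStart) . $newContent . substr($src, $t[0][1]);
        }
        $offset = $t[0][1] + strlen($t[0][0]);
    }
    return null; // unbalanced markup
}

$src = "<div>\n<img >\n</div>\n<? include(); ?>\n<div id=\"mydiv\">\n"
    . "<div>\n<div>\n<img >\n</div>\n</div>\n</div>";
$out = replaceDivContent($src, 'mydiv', "\n<div id=\"newdiv\">\nsome text\n</div>\n");
echo $out, PHP_EOL;
```

Because only div tags are counted, the include line and the img tags are left untouched and do not affect the depth.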

Replace pagebreak tag with regular expression

Here's what I am trying to accomplish. The CMS editor of our Magento webshop has a button to insert a <!-- pagebreak --> tag. I would like to use this to create a "read more" feature. I thought I would search for this tag and replace it.
I want to search inside <p> tags, and I want people to be able to use this tag as often as they want.
Suppose this is my original HTML:
<p>This is my example text, but<!-- pagebreak --> this should be readable after 'click more'<!-- pagebreak --> with even more click more possible</p>
I would like to convert it to something like this. I think the first one is the easiest to accomplish, maybe by doing a preg_replace in a while loop? The second one is probably cleaner/better HTML (less nesting).
<p>This is my example text, but <a href="#" onClick='#'>read more</a><div class='hiddenreadmore' id='hiddenreadmore-1'> this should be readable after 'click more'<a href="#" onClick='#'>read more</a><div class='hiddenreadmore' id='hiddenreadmore-2'> with even more click more possible</div></div></p>
or
<p>This is my example text, but <a href="#" onClick='#'>read more</a><div class='hiddenreadmore' id='hiddenreadmore-1'> this should be readable after 'click more'<a href="#" onClick='#'>read more</a></div><div class='hiddenreadmore' id='hiddenreadmore-2'> with even more click more possible</div></p>
So I came up with this, but I think there should be a way to do it with one replace.
$pattern = '#\<p\>(.+?)\<\!-- pagebreak --\>(.+?)\<\/p\>#s';
$count = true;
while ($count) {
$text = preg_replace($pattern, '<p>$1 read more<div class="hidden">$2</div></p>', $text, -1, $count);
}

Well, if you don't need to check whether it's inside a <p> tag, you can use something like this:
str_replace ( "<!-- pagebreak -->" , '<p>$1 read more<div class="hidden">$2</div></p>' , $text, $count );
It's a lot lighter on the system.

I guess this would do the job:
$pattern = '#\<p>(.*?)\<!-- pagebreak -->(.*?)\</p>#s';
$text = "<p>some test <!-- pagebreak --> hidden content</p> second test <p>lolo <!-- pagebreak --> more hidden content</p>";
echo preg_replace($pattern, '<p>$1 read more<div class="hidden">$2</div></p>', $text, -1, $count);
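
To handle any number of pagebreak tags per paragraph in a single pass, one possible sketch is preg_replace_callback combined with explode. It produces the second (flat) layout from the question. The function name addReadMore and the bare href="#" link are assumptions made for illustration; the onClick handler from the question is left out.

```php
<?php
// Hypothetical helper: turn each <!-- pagebreak --> inside a <p> into a
// "read more" link followed by a hidden div holding the next segment.
function addReadMore(string $html): string
{
    $n = 0; // running counter so ids stay unique across paragraphs
    $link = '<a href="#">read more</a>';

    return preg_replace_callback('~<p>(.*?)</p>~s', function (array $m) use (&$n, $link): string {
        $segments = explode('<!-- pagebreak -->', $m[1]);
        $last = count($segments) - 1;
        $out = '';
        foreach ($segments as $k => $segment) {
            // Every segment except the last ends with a link to the next one.
            $body = $segment . ($k < $last ? $link : '');
            if ($k === 0) {
                $out .= $body; // first segment stays visible
            } else {
                $n++;
                $out .= '<div class="hiddenreadmore" id="hiddenreadmore-' . $n . '">' . $body . '</div>';
            }
        }
        return '<p>' . $out . '</p>';
    }, $html);
}

$result = addReadMore('<p>a<!-- pagebreak -->b<!-- pagebreak -->c</p>');
echo $result, PHP_EOL;
```

Paragraphs without a pagebreak tag come back unchanged, so the function can be applied to the whole CMS output.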

PHP or Javascript: Simply Remove and Replace HTML Code

I have this code on my page, but the link has different names and ids:
<div class="myclass">
<a href="http://www.example.com/?vstid=00575000&veranstaltung=http://www.example.com/page.html">
Example Text</a>
</div>
How can I remove the link and replace it with this:
<div class="myclass">Sorry no link</div>
With PHP or Javascript? I tried it with str.replace
Thank you!

I assume you mean dynamically? You won't be able to do this with PHP, because it runs on the server and has nothing to do with the HTML once it has been sent to the browser.
See: http://www.tizag.com/javascriptT/javascript-innerHTML.php for the JavaScript.
Or you could use jQuery, which is simpler than writing cross-browser JavaScript by hand.
$('.myclass').html('Sorry...');

If the page is still on the server before you need to make the replacement, do this:
<?php if (allowed_to_see_link()) { ?>
<div class="myclass">
<a href="http://www.example.com/?vstid=00575000&veranstaltung=http://www.example.com/page.html">
Example Text</a>
</div>
<?php } else { ?>
non-link-text
<?php } ?>
and also write the named functions...

You might want to clarify what you are trying to do. If that is your file, you can simply open it in an editor and remove those portions. If you want to modify HTML with PHP, you can use the native DOM extension:
$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$xPath = new DOMXPath($dom);
foreach( $xPath->query('//div[@class="myclass"]/a') as $link) {
$link->parentNode->replaceChild(new DOMText('Sorry no link'), $link);
}
echo $dom->saveHTML();
The above code would replace any direct <a> element children of any <div> elements that have a class attribute of myclass with the Textnode "Sorry no link".
