Extract content with PHP Simple HTML DOM - php

I'm trying to extract "XXXXXXX" with PHP Simple HTML DOM from this markup:
<h2 class="title">XXXXXXX</h2>
I tried
$ret = $html->find('h2[class="title"]') ;
but I don't know the next step because there is no attribute to read. How can I do this?
I also need to extract "XX" from this code. I think it's the same problem, isn't it?
<a id="likeScore" appName='videos' object="video" objectid="96" direction="up" class="button like icon-heart youLike not-active">XX</a>
Thank you!

For the first one, this should work (the h2 contains no <a> element, so select the h2 itself):
$text = $html->find('h2[class="title"]', 0)->innertext;
For tags with an ID you can use something more direct:
$text1 = $html->getElementById("likeScore")->innertext;
or use the #id selector syntax:
$text1 = $html->find('#likeScore',0)->innertext;
Documentation:
https://simplehtmldom.sourceforge.io/manual.htm#section_access
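If installing Simple HTML DOM is not an option, the same two values can probably be read with PHP's bundled DOM extension. This is only a sketch under that assumption: it swaps in DOMDocument and DOMXPath for the library used above, and the $markup string is just the two snippets from the question joined together.

```php
<?php
// Assumption: the two snippets from the question, joined into one string.
$markup = '<h2 class="title">XXXXXXX</h2>'
    . '<a id="likeScore" appName="videos" object="video" objectid="96" '
    . 'direction="up" class="button like icon-heart youLike not-active">XX</a>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Text inside <h2 class="title">.
$title = $xpath->query('//h2[@class="title"]')->item(0)->textContent;

// Text inside the element whose id is "likeScore".
$score = $xpath->query('//*[@id="likeScore"]')->item(0)->textContent;

echo $title, "\n", $score, "\n";
```

Here textContent plays the role that innertext plays in the Simple HTML DOM examples, since neither element has child tags.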

Related

php regex get text between html tags

I want to scrape information about products from another site, and the tags that hold the price look like this:
<span class="text10black">Price: <strong style="color:#000000;">15.90 $</strong></span>
In this case I need to extract only 15.90.
I have tried this:
$site_content = file_get_contents('url');
preg_match_all('#<span class="text10black">Price: <strong style="color:#000000;">(.*?) $</strong></span>#', $site_content, $product_prices);
Here 'url' is the URL from which I scrape the products, but when I check the $product_prices variable with var_dump() it says NULL.
Using Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) seems like the best way to do what you need:
$html = file_get_html($url);
foreach ($html->find('.text10black strong') as $element) {
    var_dump($element->plaintext);
}
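For completeness, the regex from the question most likely fails for two reasons: the # delimiter also appears inside the pattern (in #000000), so PHP treats the rest as modifiers and the call fails, and the $ sign is an unescaped end-of-line anchor. A minimal sketch of a corrected pattern follows; the ~ delimiter is my choice, and $site_content is hard-coded here instead of being fetched from a URL.

```php
<?php
// Assumption: the markup from the question, used as a stand-in for the fetched page.
$site_content = '<span class="text10black">Price: '
    . '<strong style="color:#000000;">15.90 $</strong></span>';

// "~" as delimiter avoids clashing with the "#" in the colour value;
// "\$" matches a literal dollar sign instead of acting as an anchor.
$pattern = '~<span class="text10black">Price: '
    . '<strong style="color:#000000;">(.*?) \$</strong></span>~';

preg_match_all($pattern, $site_content, $product_prices);

// Index 1 holds the text captured by the first (and only) group.
$prices = $product_prices[1];
var_dump($prices);
```

This still depends on the exact markup, so the parser-based answer above is less brittle if the site changes its HTML.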

How to extract HTML element from a source file

I need to use PHP to replace an HTML section, identified by a tag id, in source code that is a combination of HTML and PHP. If it were pure HTML, a DOM parser could be used; if there were no DIV inside a DIV, I can imagine how to use preg_match. This is what I am trying to do. I have code (loaded into a string) like:
<div>
<img >
</div>
<? include(); ?>
<div id="mydiv">
<div>
<div>
<img >
</div>
</div>
</div>
and my task is to replace the content of the "mydiv" DIV with new content, e.g.
<div id="newdiv">
some text
</div>
so the string will look like this after the change:
<div>
<img >
</div>
<? include(); ?>
<div id="mydiv">
<div id="newdiv">
some text
</div>
</div>
I have already tried:
1) parsing the code using DOMDocument's loadHTML => it produces a lot of errors when PHP code is included.
2) I played around a bit with regexes like preg_match_all('/<div id="myid"([^<]*)<\/div>/', $src, $matches), which fails when more child divs are included.
The best approach I have found so far is:
1) find the id="mydiv" string
2) search for '<' and '>' chars and count them like '<'=1 and '>'=-1 (not exactly, but it gives the idea)
3) once I get sum == 0 I should be at the position of the closing tag, so I know which portion of the string I should exchange
This is quite a "heavy" solution, which can stop working in cases where the code is different (e.g. the on-page PHP code contains those chars as well, instead of just a simple "include"). So I am looking for a better solution.
You could try something like this:
$file = 'filename.php';
$content = file_get_contents($file);
$array_one = explode( '<div id="mydiv">' , $content );
$my_div_content = explode("</div>" , $array_one[1] )[0];
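Or use preg_match like you said (note that the lazy match stops at the first </div>, so it only captures the full content when there are no nested divs):
preg_match('/<div id="mydiv"(.*?)<\/div>/s', $content, $matches);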
Yes, there is. First you need to use a function that will get the content of the file. Let's call the file homepage.php:
$homepageString = file_get_contents('homepage.php');
Now you have a string with all the content. Next, use the preg_replace() function to take out the part of the code that you want removed (as written, the pattern below removes only the id="mydiv" attribute text, so adapt it to the part you need):
$newHomepageString = preg_replace('/id="mydiv"/',"", $homepageString);
Now overwrite the existing homepage.php file with the new source code:
file_put_contents("homepage.php", $newHomepageString);
Let me know if it worked for you! :)
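The counting idea described in the question can be narrowed to div tags only, which avoids counting every '<' and '>' character. The following is a minimal sketch, not a full parser: replaceDivContent is a hypothetical helper name, it assumes the id is written with double quotes, and it would still be confused by PHP code inside the target div that itself contains "<div" or "</div>" strings.

```php
<?php
// Hypothetical helper: replace the inner content of <div id="$id"> in $src,
// tracking nested <div> tags so the matching closing tag is found.
function replaceDivContent(string $src, string $id, string $newInner): string
{
    // Step 1: locate the opening tag of the target div.
    $open = '/<div\b[^>]*\bid="' . preg_quote($id, '/') . '"[^>]*>/i';
    if (!preg_match($open, $src, $m, PREG_OFFSET_CAPTURE)) {
        return $src; // target not found: leave the source untouched
    }
    $innerStart = $m[0][1] + strlen($m[0][0]);

    // Step 2: walk over each later <div ...> or </div> and track the depth.
    $depth = 1;
    $offset = $innerStart;
    while (preg_match('/<(\/?)div\b[^>]*>/i', $src, $t, PREG_OFFSET_CAPTURE, $offset)) {
        $depth += ($t[1][0] === '/') ? -1 : 1;
        if ($depth === 0) {
            // Step 3: $t[0][1] is where the matching </div> starts.
            return substr($src, 0, $innerStart) . $newInner . substr($src, $t[0][1]);
        }
        $offset = $t[0][1] + strlen($t[0][0]);
    }
    return $src; // unbalanced markup: leave the source untouched
}

// Usage with the structure from the question.
$src = "<div>\n<img >\n</div>\n<? include(); ?>\n<div id=\"mydiv\">\n"
    . "<div>\n<div>\n<img >\n</div>\n</div>\n</div>";
$result = replaceDivContent($src, 'mydiv', "\n<div id=\"newdiv\">\nsome text\n</div>\n");
echo $result, "\n";
```

Because the search only starts after the opening tag of the target div, PHP code placed before it (such as the include above) does not affect the count.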

How can I insert text in an html element stored in a variable?

I have a program which copies the text from another website and shows it.
It stores the text in a variable $string.
The variable contains HTML tags, and I want to add text before an HTML tag stored in the variable.
For example: $string="<div id='1'><div id='game'></div>"; I want to add text before the div whose id is game.
To add the text before the div whose id is 'game', simply use:
$string = "<div id='1'><div id='game'></div>";
$new = "texttoinsert";
$pos = "<div id='game'></div>";
echo str_replace($pos, $new.$pos ,$string);
In PHP the easiest way to do this would be using str_replace (http://www.php.net/manual/en/function.str-replace.php):
$textToInsert = "test";
$string = str_replace("<div id='game'>", $textToInsert."<div id='game'>" ,$string);
For that particular case the following works:
$($string).prepend("text");
Using jQuery the solution is simple:
var text = $("<div id='1'><div id='game'></div></div>");
text.find('#game').before('text-to-insert');
and the resulting HTML inside the outer div can be obtained like this: text.html()
I hope this helps.

How to chain in phpquery (almost everything can be a chain)

Good day everyone,
I'm very new to phpQuery, and this is my first post here on Stack Overflow because I can't find the correct syntax for phpQuery chaining. I hope someone knows what I have been looking for.
I only want to remove a certain div inside a div.
<div id = "content">
<p>The text that i want to display</p>
<div class="node-links">Stuff i want to remove</div>
</div>
These few lines of code work perfectly:
pq('div.node-links')->remove();
$text = pq('div#content');
print $text; //output: The text that i want to display
But when I tried
$text = pq('div#content')->removeClass('div.node-links'); //or
$text = pq('div#content')->remove('div.node-links');
//output: The text that i want to display (+) Stuff i want to remove
Can someone tell me why the second block of code is not working?
Thanks!
The first line of code will only work if you're trying to remove the class from div.node-links; it won't remove the node.
If you are trying to remove the class you need to change it from:
$text = pq('div#content')->removeClass('div.node-links');
// to
$text = pq('div#content')->find('.node-links')->removeClass('node-links')->end();
which will output:
<div id="content">
<p>The text that i want to display</p>
<div>Stuff i want to remove</div>
</div>
As for the second line of code, I'm not exactly sure why it is not working. It seems like you're not selecting .node-links, but I was able to get the desired result using these:
// $markup = file_get_contents('test.html');
// $doc = phpQuery::newDocumentHTML($markup);
$text = $doc->find('div#content')->children()->remove('.node-links')->end();
// or
$text = pq('div#content')->find('.node-links')->remove()->end();
// or
$text = pq('div#content > *')->remove('.node-links')->parent();
Hope that helps
Since remove() does not take any parameter, you can do:
$text = pq('div#content div.node-links')->remove();
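As a library-free point of comparison, the same "select the child, then remove it" step can be sketched with PHP's built-in DOM extension. This is not phpQuery code: DOMDocument and DOMXPath are swapped in, and the $markup string is the question's snippet with a closing </div>.

```php
<?php
// Assumption: the markup from the question, with the outer div properly closed.
$markup = '<div id="content"><p>The text that i want to display</p>'
    . '<div class="node-links">Stuff i want to remove</div></div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$content = $xpath->query('//div[@id="content"]')->item(0);

// Select descendants of #content whose class list contains "node-links".
$query = './/div[contains(concat(" ", normalize-space(@class), " "), " node-links ")]';

// Copy the matches into an array first, then remove each node from its parent.
foreach (iterator_to_array($xpath->query($query, $content)) as $node) {
    $node->parentNode->removeChild($node);
}

$text = trim($content->textContent);
echo $text, "\n";
```

The key point mirrors the working phpQuery variants above: the removal is applied to the .node-links elements themselves, not to div#content.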

how to remove links from a html content using php

I have the following html content:
<p>My name is <a href="...">way2project</a></p>
Now I want this text as <p>My name is way2project</p>
Is there any way to do this? Please help me thanks
I used preg_replace but in vain.
Thanks again
You can use the strip_tags function:
$string = '<p>My name is <a href="...">way2project</a></p>';
echo strip_tags($string,'<p>');
Note that the second parameter is the list of allowed tags, which are kept in the output.
This seems strange, but not knowing the complete scope of your issue and seeing that you want to do this in PHP, you can try:
$origstring = '<p>My name is <a href="...">way2project</a></p>';
$newstring = str_replace('<a href="...">way2project</a>', 'way2project', $origstring);
echo $newstring;
Check out Simple HTML DOM Parser:
$html = str_get_html('<html><body>Hello!<a href="...">SO</a></body></html>');
echo $html->find('a',0)->innertext; //prints "SO"
You can also use strip_tags to remove HTML tags.
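Since the question mentions that preg_replace did not work, here is a sketch of one pattern that removes only the opening and closing <a> tags and keeps everything else, without having to list the allowed tags. The example.com URL is a placeholder of mine, not taken from the question.

```php
<?php
// Assumption: a paragraph containing one link; the URL is a placeholder.
$string = '<p>My name is <a href="https://example.com/">way2project</a></p>';

// Match "<a ...>" and "</a>"; \b keeps tags such as <abbr> from matching.
$clean = preg_replace('~</?a\b[^>]*>~i', '', $string);

echo $clean, "\n";
```

The link text stays in place because only the tags are removed, not the content between them.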
