I like the solution from Get DIV content from external Website, but I need to read the value of a DIV whose class contains a specific string, in my case btx764839 (for example). The class attribute is not that string alone; the element looks like <div class="dark btx764839">76</div>, for example.
I need the script to search for the DIV whose class contains btx764839 and then read from that specific one. I've been trying many different things with strpos(), without success. Below is the script I found on the other post (without my adjustments). I would appreciate some help. Thank you very much!
$url = 'url';
$content = file_get_contents($url);
$first_step = explode( '<div class="dark btx764839">' , $content );
$second_step = explode("</div>" , $first_step[1] );
echo $second_step[0];
This is my latest attempt:
$url = 'url';
$content = file_get_contents($url);
foreach($content as $div) {
// Loop through the DIVs looking for one with an id of "content"
// Then echo out its contents (pardon the pun)
// get class attribute of a div, it's a string
$class = $div->getAttribute('class');
// find substring 'btx764839' in $class
if (strpos($class, 'btx764839') !== false) {
echo $div->nodeValue;
}
}
Code taken from provided example:
foreach($divs as $div) {
// Loop through the DIVs looking for one with an id of "content"
// Then echo out its contents (pardon the pun)
// get class attribute of a div, it's a string
$class = $div->getAttribute('class');
// find substring 'btx764839' in $class
if (strpos($class, 'btx764839') !== false) {
echo $div->nodeValue;
}
}
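The loop above expects $divs to be a list of DOM elements, whereas file_get_contents() only returns a string. A minimal sketch of how the two pieces might fit together with PHP's built-in DOMDocument follows. The sample markup in $content is a hypothetical stand-in for the fetched page, and $found is an illustrative variable name:

```php
<?php
// Hypothetical sample markup standing in for file_get_contents($url).
$content = '<div class="dark btx000001">12</div><div class="dark btx764839">76</div>';

// Real-world pages rarely validate cleanly, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($content);
libxml_clear_errors();

$found = null;
// getElementsByTagName() yields the DOMElement objects the loop needs.
foreach ($doc->getElementsByTagName('div') as $div) {
    // The class attribute is a plain string such as "dark btx764839".
    $class = $div->getAttribute('class');
    if (strpos($class, 'btx764839') !== false) {
        $found = trim($div->nodeValue);
        break; // stop at the first matching DIV
    }
}

echo $found === null ? 'not found' : $found;
```

Note that strpos() is a substring test, so it would also match a longer class name that merely starts with or contains btx764839.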
I previously built some HTML pages where several paragraphs are inside divs, and now I want to move them into WordPress to understand theming and the structure of the CMS, but there are some things I have trouble understanding. For example, in the loop I can't just add the opening div at the start and the closing div at the end (obviously), because in the middle there are different elements wrapped in other divs. For example, I wanted to take the last paragraph and wrap it in a div with a custom class, and my solution was this. I am sure it is a totally messed-up solution. What am I doing wrong?
// First function
function addDivLastP1( $content ) {
$pattern = '/[\s\S]*\K(<p>)/i';
// Add the opening div tag here; it is closed later in another function
$replacement = '<div class="my_class">$1';
$content = preg_replace( $pattern, $replacement, $content );
return $content;
}
add_filter( 'the_content', 'addDivLastP1' );
// Second function
function addDivLastP2( $content ) {
$pattern = '/[\s\S]*\K(<\/p>)/i';
// Closing div previously open
$replacement = '</div>';
$content = preg_replace( $pattern, $replacement, $content );
return $content;
}
add_filter( 'the_content', 'addDivLastP2' );
It is not a good idea to parse HTML (or any XML) with regular expressions.
In your case it is much better to use DOMDocument:
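function addDivLastP( $content ) {
    $doc = new DOMDocument();
    // loadHTML() wraps a fragment in <html><body>; "@" silences warnings on imperfect markup
    if ($content === '' || !@$doc->loadHTML($content)) {
        // cannot parse HTML content
        return $content;
    }
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $content;
    }
    for ($i = $body->childNodes->length - 1; $i >= 0; $i--) {
        $child = $body->childNodes->item($i);
        if ($child->nodeName === 'p') { // got last top-level paragraph
            $div = $doc->createElement('div');
            $div->setAttribute('class', 'my_class');
            // replace paragraph by new empty div
            $body->replaceChild($div, $child);
            // insert paragraph inside div
            $div->appendChild($child);
            // return only the body's children, not the <html><body> wrapper
            $html = '';
            foreach ($body->childNodes as $node) {
                $html .= $doc->saveHTML($node);
            }
            return $html;
        }
    }
    return $content;
}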
HTML
<article class="movie-summary" data-slug="slug-goes-here" data-title="This is a Title">
...
...
</article>
PHP
$html = file_get_html( 'example.com' );
foreach( $html->find('article') as $data) {
$property = 'data-title';
echo $data->$property;
}
Hey all, so I want to be able to get the data-title from every article on a particular site. When I use data-slug I get data back, yet when I use data-title I get nothing. I wrote this with the help of this post.
If you look at the actual HTML you are trying to parse (the link provided in the comments), you will see that it is not valid:
<article class="movie-summary hero" data-slug="aiyaary-hindi"data-title="Aiyaary">
...
</article>
That is, there is no space between the data-slug and data-title attributes. To fix this, I suggest adding the missing space, like so:
function placeNeccessarySpaces($contents) {
return preg_replace('/"data-title/', '" data-title', $contents);
}
This is similar to this answer. Then:
$contents = placeNeccessarySpaces(file_get_contents('http://example.com'));
$html = str_get_html($contents);
foreach( $html->find('article') as $data) {
$property = 'data-title';
echo $data->$property;
}
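The same idea can be generalised so that it does not depend on the attribute being data-title. The sketch below is only an illustration under stated assumptions: separateGluedAttributes() is a hypothetical helper name, the sample markup is made up from the snippet above, and PHP's built-in DOMDocument is used instead of simple_html_dom so the example runs without extra files. It inserts a space after any closing double quote that is immediately followed by something shaped like name=".

```php
<?php
// Hypothetical helper: put a space between a closing quote and an attribute
// name that was glued to it, e.g. ..."aiyaary-hindi"data-title="...
function separateGluedAttributes(string $html): string
{
    // The lookahead requires a name followed by =" so ordinary values are left alone.
    return preg_replace('/"(?=[A-Za-z][A-Za-z0-9-]*=")/', '" ', $html);
}

// Made-up sample based on the invalid markup shown above.
$broken = '<article class="movie-summary hero" data-slug="aiyaary-hindi"data-title="Aiyaary">text</article>';
$fixed = separateGluedAttributes($broken);

// Read the attribute back with DOMDocument to confirm it is now reachable.
libxml_use_internal_errors(true); // <article> is unknown to the older libxml HTML parser
$doc = new DOMDocument();
$doc->loadHTML($fixed);
libxml_clear_errors();
$title = $doc->getElementsByTagName('article')->item(0)->getAttribute('data-title');

echo $title;
```

A regular expression like this is a repair step for known-bad input, not a general HTML fixer; text content that happens to contain a quote followed by name=" would be changed as well.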
This simply works for me; I verified the result:
<?php
include 'simple_html_dom.php';
$html = str_get_html('<article class="movie-summary" data-slug="slug-goes-here" data-title="This is a Title"></article>');
foreach( $html->find('article') as $data) {
$property = 'data-title';
echo $data->$property;
}
?>
I got the file 'simple_html_dom.php' from https://sourceforge.net/projects/simplehtmldom/files/
Output:
This is a Title
I want to grab the "What's New" text from WhatsApp's Play Store page. I am trying the code below, and it works well.
<?php
$url = 'https://play.google.com/store/apps/details?id=com.whatsapp&hl=en';
$content = file_get_contents($url);
$first_step = explode( '<div class="recent-change">' , $content );
$second_step = explode("</div>" , $first_step[1] );
echo $second_step[0];
?>
The issue is that this code only shows the text from the first div with the recent-change class. The page has multiple divs with that class name. How do I get the content from all of them?
As already suggested in the comments, you should really use the DOM for this. But if you want to display the text of every div with the recent-change class, you can use a loop. Here is a solution that follows the same approach you are already using:
$url = 'https://play.google.com/store/apps/details?id=com.whatsapp&hl=en';
$content = file_get_contents($url);
$first_step = explode( '<div class="recent-change">' , $content );
foreach ($first_step as $key => $value) {
if($key > 0)
{
$second_step = explode("</div>" , $value );
echo $second_step[0];
echo "<br>";
}
}
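For comparison, the DOM-based route mentioned in the comments could look roughly like the sketch below. It is an illustration only: extractTextsByClass() is a hypothetical helper name, the sample markup is made up, and the class name is assumed to contain no quote characters because it is placed directly into the XPath expression.

```php
<?php
// Hypothetical helper: return the text of every <div> carrying the given class.
function extractTextsByClass(string $html, string $class): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the space-separated class list.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up sample standing in for the downloaded page.
$sample = '<div class="recent-change">Fix A</div><p>x</p>'
        . '<div class="recent-change highlighted">Fix B</div>';

foreach (extractTextsByClass($sample, 'recent-change') as $text) {
    echo $text, "<br>";
}
```

Unlike splitting on the exact string <div class="recent-change">, this also finds a div that carries additional classes, and it does not cut the text short when the div contains nested divs.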
I am somewhat new with PHP, but can't really wrap my head around what I am doing wrong here given my situation.
Problem: I am trying to get the href of a certain HTML element within a string of characters inside an XML object/element via Reddit (if you visit this page, it would be the actual link of the video - not the reddit link but the external youtube link or whatever - nothing else).
Here is my code so far (code updated):
Update: Loop-mania! Got all of the hrefs, but am now trying to store them inside a global array to access a random one outside of this function.
function getXMLFeed() {
echo "<h2>Reddit Items</h2><hr><br><br>";
//$feedURL = file_get_contents('https://www.reddit.com/r/videos/.xml?limit=200');
$feedURL = 'https://www.reddit.com/r/videos/.xml?limit=200';
$xml = simplexml_load_file($feedURL);
//define each xml entry from reddit as an item
foreach ($xml -> entry as $item ) {
foreach ($item -> content as $content) {
$newContent = (string)$content;
$html = str_get_html($newContent);
foreach($html->find('table') as $table) {
$links = $table->find('span', '0');
//echo $links;
foreach($links->find('a') as $link) {
echo $link->href;
}
}
}
}
}
XML Code:
http://pasted.co/0bcf49e8
I've also included JSON if it can be done this way; I just preferred XML:
http://pasted.co/f02180db
That is pretty much all of the code. Though, here is another piece I tried to use with DOMDocument (scrapped it).
foreach ($item -> content as $content) {
$dom = new DOMDocument();
$dom -> loadHTML($content);
$xpath = new DOMXPath($dom);
$classname = "/html/body/table[1]/tbody/tr/td[2]/span[1]/a";
foreach ($dom->getElementsByTagName('table') as $node) {
echo $dom->saveHtml($node), PHP_EOL;
//$originalURL = $node->getAttribute('href');
}
//$html = $dom->saveHTML();
}
I can parse the table fine, but when it comes to getting certain element's values (nothing has an ID or class), I can only seem to get ALL anchor tags or ALL table rows, etc.
Can anyone point me in the right direction? Let me know if there is anything else I can add here. Thanks!
Added HTML:
I am specifically trying to extract <span>[link]</span> from each table/item.
http://pastebin.com/QXa2i6qz
The following code extracts all the YouTube links from each content element.
function extract_youtube_link($xml) {
$entries = $xml['entry'];
$videos = [];
foreach($entries as $entry) {
$content = html_entity_decode($entry['content']);
// [^"]* stops at the closing quote, so the match cannot run past the href value
preg_match_all('/<span><a href="([^"]*)">\[link\]/', $content, $matches);
if(!empty($matches[1][0])) {
$videos[] = array(
'entry_title' => $entry['title'],
'author' => preg_replace('/\/(.*)\//', '', $entry['author']['name']),
'author_reddit_url' => $entry['author']['uri'],
'video_url' => $matches[1][0]
);
}
}
return $videos;
}
$xml = simplexml_load_file('reddit.xml');
$xml = json_decode(json_encode($xml), true);
$videos = extract_youtube_link($xml);
foreach($videos as $video) {
echo "<p>Entry Title: {$video['entry_title']}</p>";
echo "<p>Author: {$video['author']}</p>";
echo "<p>Author URL: {$video['author_reddit_url']}</p>";
echo "<p>Video URL: {$video['video_url']}</p>";
echo "<br><br>";
}
The code returns a multidimensional array in which each element has the keys entry_title, author, author_reddit_url and video_url. Hope it helps you!
If you're looking for a specific element, you don't need to parse the whole thing. One way of doing it is to use the DOMXPath class and query the XML directly. The documentation should guide you through it:
http://php.net/manual/es/class.domxpath.php
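As a rough sketch of that suggestion, the snippet below runs a single XPath query against one entry's HTML and returns the href of the anchor whose text is [link]. The sample markup is a made-up stand-in shaped like the span/anchor structure described in the question, and extractLinkHref() is a hypothetical helper name:

```php
<?php
// Made-up stand-in for the decoded HTML of one feed entry.
$content = '<table><tr>'
    . '<td><a href="https://www.reddit.com/r/videos/comments/abc/"><img src="t.jpg"></a></td>'
    . '<td>submitted by <a href="https://www.reddit.com/user/someone">/u/someone</a><br/>'
    . '<span><a href="https://www.youtube.com/watch?v=XXXX">[link]</a></span> '
    . '<span><a href="https://www.reddit.com/r/videos/comments/abc/">[comments]</a></span></td>'
    . '</tr></table>';

// Hypothetical helper: return the href of the "[link]" anchor, or null if absent.
function extractLinkHref(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    libxml_use_internal_errors(true); // feed HTML is a fragment, not a full page
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Select by the anchor's visible text rather than by position in the table.
    $nodes = $xpath->query('//span/a[normalize-space(.) = "[link]"]/@href');

    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

echo extractLinkHref($content);
```

Selecting on the anchor text avoids positional paths such as table[1]/tbody/tr/td[2]/span[1]/a, which break as soon as the markup gains or loses an element.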
I need to perform a series of tests on a URL. The first test is a word count; I have that working perfectly, and the code is below:
if (isset($_GET['article_url'])) {
    $title = 'This is an example title';
    // "@" suppresses the warning when the URL cannot be fetched; $str is then false
    $str = @file_get_contents($_GET['article_url']);
    // Treat a failed download as zero words
    $test1 = ($str === false) ? 0 : str_word_count(strip_tags(strtolower($str)));
    if ($test1 > 550) {
        echo '<div><i class="fa fa-check-square-o" style="color:green"></i> This article has '.$test1.' words.';
    } else {
        echo '<div><i class="fa fa-times-circle-o" style="color:red"></i> This article has '.$test1.' words. You are required to have a minimum of 500 words.';
    }
}
Next I need to get all h1 and h2 tags from $str and test them to see if any contain the text $title and echo yes if so and no if not. I am not really sure how to go about doing this.
I am looking for a pure php means of doing this without installing php libraries or third party functions.
Please try the code below.
if (isset($_GET['article_url'])) {
    $title = 'This is an example title';
    // "@" suppresses the warning when the URL cannot be fetched; $str is then false
    $str = @file_get_contents($_GET['article_url']);
    $found = false;
    if ($str !== false && $str !== '') {
        $document = new DOMDocument();
        // "@" silences the parser warnings that real-world HTML usually triggers
        @$document->loadHTML($str);
        $tags = array('h1', 'h2');
        foreach ($tags as $tag) {
            // Fetch every element with this tag name from the DOM
            $elementList = $document->getElementsByTagName($tag);
            foreach ($elementList as $element) {
                // Case-insensitive check: does the heading text contain the title?
                if (stripos($element->textContent, $title) !== false) {
                    $found = true;
                    break 2;
                }
            }
        }
    }
    echo $found ? "yes" : "no";
}