Explode twice doesn't index properly - php

I'm using the Simple HTML DOM Parser to retrieve a specific div from a website. I remove the part of the div that I don't want by using explode(). I then want to explode the kept part into a new array, but for some reason it doesn't get indexed as intended.
Why doesn't my last row with "echo $content[0];" print "Overall" while "echo $content[5];" does, when Overall is the first string? How do I fix this?
<?php
include_once('simple_html_dom.php');
$html = file_get_html('http://services.runescape.com/m=hiscore_oldschool/hiscorepersonal.ws?user1=Pur');
$content = $html->find('div[id=contentHiscores]', 0)->plaintext;
echo $content;
echo "<br><br><br><br>";
$content = explode("SkillRankLevelXP", $content);
$content = $content[1];
echo $content;
echo "<br><br><br><br>";
$content = explode(" ", $content);
echo $content[0];
?>

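Between SkillRankLevelXP and Overall there is a run of several spaces, though the browser displays them as one. Use the "View Source" menu and you'll see what I mean. Because explode(" ") returns an empty string between every pair of adjacent spaces, the first elements of your array are empty and "Overall" only shows up at a later index.
You can use a regular expression to collapse runs of two or more spaces into a single space, which should get you closer to what you want.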
<?php
include_once('simple_html_dom.php');
$html = file_get_html('http://services.runescape.com/m=hiscore_oldschool/hiscorepersonal.ws?user1=Pur');
$content = $html->find('div[id=contentHiscores]', 0)->plaintext;
$content=preg_replace('/ {2,}/', ' ', trim($content));
$content = explode('SkillRankLevelXP ', $content);
$content = $content[1];
$content = explode(' ', $content);
print_r($content);
?>
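To see why the extra spaces shift the indices, here is a minimal self-contained sketch. It uses a hypothetical sample string in place of the scraped hiscore text, so no network access or Simple HTML DOM is needed. It also shows an alternative to the regex-plus-explode approach: preg_split() on runs of whitespace with the PREG_SPLIT_NO_EMPTY flag, which drops the empty pieces directly.

```php
<?php
// Hypothetical sample resembling the scraped text: five spaces follow the
// header before the first skill name. The numbers are made up.
$sample = "SkillRankLevelXP" . str_repeat(" ", 5) . "Overall 12345 1500 5000000";

// Keep only the part after the header, as in the question.
$parts = explode("SkillRankLevelXP", $sample);
$rest = $parts[1];

// explode() on a single space yields an empty string for each extra space,
// so indexes 0 to 4 are empty and "Overall" lands at index 5.
$naive = explode(" ", $rest);
echo var_export($naive[0], true), "\n"; // prints ''
echo $naive[5], "\n";                   // prints Overall

// preg_split() on runs of whitespace, discarding empty pieces,
// puts "Overall" at index 0.
$fields = preg_split('/\s+/', $rest, -1, PREG_SPLIT_NO_EMPTY);
echo $fields[0], "\n"; // prints Overall
```

The preg_split() variant avoids relying on exactly how many spaces the page happens to contain, and it also treats tabs and newlines as separators.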

Related

PHP Parse content from url

I need some help with a study script I'm building, which tries to fetch articles from a website.
Currently I'm able to get the article from one element but fail to get all of them. This is an example of the HTML at the URL I'm trying to fetch:
<div class="entry-content">
</div>
<div class="entry-content">
</div>
<div class="entry-content">
</div>
This is my PHP code to get the content of the first div:
function getArticle($url){
$content = file_get_contents($url);
$first_step = explode( '<div class="entry-content">' , $content );
$separate_news = explode("</div>" , $first_step[1] );
$article = $separate_news[0];
echo $article;
}
You should really use PHP's DOMDocument class for parsing HTML. As for your example code, the problem is that you only use $first_step[1] instead of processing every element of the $first_step array. You could try something like this:
$first_steps = explode( '<div class="entry-content">' , $content );
// Skip element 0: it holds everything before the first matching div.
foreach (array_slice($first_steps, 1) as $first_step) {
if (strpos($first_step, '</div>') === false) continue;
$separate_news = explode("</div>" , $first_step );
$article = $separate_news[0];
echo $article;
}
Here's a small demo on 3v4l.org
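A self-contained version of the loop above, run against a hypothetical inline HTML string instead of file_get_contents($url), might look like the sketch below. It collects the articles into an array so the result can be checked. Like the original explode() approach, it assumes the article body contains no nested </div>; that limitation is the main reason DOMDocument is the safer choice.

```php
<?php
// Hypothetical page content standing in for file_get_contents($url).
$content = '<div class="header">Menu</div>'
    . '<div class="entry-content">First article</div>'
    . '<div class="entry-content">Second article</div>'
    . '<div class="entry-content">Third article</div>';

$articles = [];
$chunks = explode('<div class="entry-content">', $content);
// Element 0 holds everything before the first matching div (here the
// header, which also contains a closing div tag), so skip it.
foreach (array_slice($chunks, 1) as $chunk) {
    if (strpos($chunk, '</div>') === false) {
        continue;
    }
    // Everything up to the first closing tag is the article body.
    $articles[] = explode('</div>', $chunk)[0];
}
print_r($articles);
```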
I have used this library before: http://simplehtmldom.sourceforge.net/ (full documentation: http://simplehtmldom.sourceforge.net/manual.htm).
It's very easy to use and does a lot more.
You could select your articles like:
$html = file_get_html($url);
$articles = $html->find(".entry-content");
foreach($articles as $article) echo $article->plaintext;
You should use DOMDocument. Although it is a bit tricky to select nodes by CSS class, you can do it with DomXPath like this:
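$dom = new DOMDocument();
// load() expects well-formed XML; loadHTMLFile() tolerates real-world HTML.
// The @ suppresses warnings about malformed markup.
@$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);
$classname = "entry-content";
// Match elements whose class list contains $classname as a whole word.
$nodes = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " ' . $classname . ' ")]');
foreach ($nodes as $node) {
    echo $node->textContent . "\n";
}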
Another advantage is that HTML entities and any HTML markup inside the article content are converted as expected: for example, &amp; becomes &, and <b>bold</b> becomes just bold.
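To try the XPath class selection without fetching a page, the sketch below uses loadHTML() on a hypothetical inline HTML string in place of loadHTMLFile($url). It shows that an element with several classes still matches, that a different class is ignored, and that textContent decodes entities and drops inner tags.

```php
<?php
// Hypothetical HTML standing in for the fetched page.
$html = '<html><body>'
    . '<div class="entry-content">Fish &amp; chips</div>'
    . '<div class="sidebar">Ignore me</div>'
    . '<div class="post entry-content">Some <b>bold</b> text</div>'
    . '</body></html>';

$dom = new DOMDocument();
// loadHTML() accepts real-world HTML; @ silences warnings about sloppy markup.
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$classname = 'entry-content';
// Pad the normalized class attribute with spaces so the class is matched
// as a whole word, not as a substring of e.g. "entry-content-wide".
$query = sprintf(
    '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
    $classname
);

$texts = [];
foreach ($xpath->query($query) as $node) {
    $texts[] = $node->textContent; // entities decoded, tags stripped
}
print_r($texts);
```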

Separating post text and images in different locations on the same page

I'm using the following code to remove images (and output just the text):
<?php
$content = get_the_content();
$content = preg_replace("/<img[^>]+\>/i", " ", $content);
$content = apply_filters('the_content', $content);
$content = str_replace(']]>', ']]&gt;', $content);
echo $content;
?>
To bring in the images (in a different area of the same page), I am trying to use
preg_match('#(<img.*?>)#', $content, $results);
It doesn't work. I'm also wondering whether there is a better way than running the function twice, or whether I can separate the text and the images and output them in two different divs/locations.
To get only the text-based content:
function custom_strip_image($text) {
$text = preg_replace("/<img[^>]+\>/i", "", $text);
return $text;
}
echo custom_strip_image(get_the_content());
Then write a second function to get the list of images.
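A minimal sketch of that second function follows. The name custom_get_images() is hypothetical, and a plain string stands in for get_the_content() so the example runs outside WordPress. It reuses the same regex as custom_strip_image() but collects the matches with preg_match_all() instead of removing them. In a template you could fetch the content once into a variable and pass it to both functions, which avoids calling get_the_content() twice.

```php
<?php
// Hypothetical helper: returns every <img> tag found in $text, in order.
// Uses the same regex as custom_strip_image() so the two stay consistent.
function custom_get_images($text) {
    preg_match_all("/<img[^>]+\>/i", $text, $matches);
    return $matches[0]; // full matches only
}

// Plain-string stand-in for get_the_content().
$post = 'Intro <img src="a.jpg" alt="A"> middle <IMG src="b.png"> end';

$images = custom_get_images($post);
echo implode("\n", $images), "\n";
```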

PHP - Find and replace enclosed text in an external HTML file

I have a PHP file that renders an HTML file. Inside this HTML file I have this piece of code:
<div class="app">{: echo $this->content :}</div>
And I want to replace the opening {: and closing :} tags with the traditional <?php ?> tags to make it look something like this:
<div class="app"><?php echo $this->content ?></div>
$contents = file_get_contents ("file.php");
$contents = str_replace(array('{:', ':}'), array('<?php', '?>'), $contents);
file_put_contents("file.php", $contents);
You could do a simple string replacement if you are doing this within PHP:
str_replace(array('{:', ':}'), array('<?php', '?>'), $file_content)
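As a quick check of the array form of str_replace(), the sketch below applies it to the template line from the question held in a string, rather than reading and rewriting file.php. Each search string is replaced by the replacement at the same index.

```php
<?php
// Template line from the question (single quotes, so $this is not interpolated).
$template = '<div class="app">{: echo $this->content :}</div>';

// str_replace() with two arrays replaces each search string with the
// replacement at the same position in the second array.
$converted = str_replace(array('{:', ':}'), array('<?php', '?>'), $template);
echo $converted, "\n";
```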
Try this:
$string = '<div class="app">{: echo $this->content :}</div>';
$string = str_replace('{:','<?php',$string);
$string = str_replace(':}','?>',$string);
If you want to capture only the content inside the div, you can use this:
$string = '<div class="app">{: echo $this->content :}</div>';
preg_match('/<div class="app">(.+)<\/div>/',$string,$preg_array);
$string = str_replace('{:','<?php',$preg_array[1]);
$string = str_replace(':}','?>',$string);
Output:
<?php echo $this->content ?>

how to print url from html code in php when url contain spaces

I have a URL in some HTML code:
<a href="http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3">play</a>
Now I want to print this URL exactly as it is written, in a PHP page:
http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3
As you can see, the URL contains spaces (in "05 Dabangg Reloaded"). I wrote this program to print the URL from this HTML code:
$str = '<a href="http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3">play</a>';
$pattern = '`.*?((http|ftp)://[\w#$&+,\/:;=?#.-]+)[^\w#$&+,\/:;=?#.-]*?`i';
if (preg_match_all($pattern,$str,$matches))
foreach($matches[1] as $data)
{
$str=$data;
echo $str;
}
But I only get this:
http://b48.ve.vc/b/data/48/3746/05
Please don't comment on the foreach($matches[1] as $data) line, because I'm using it with many URLs. I just want to know how to print the whole URL in this format:
http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3
The spaces have become a huge problem, and I don't know how to fix it.
What do I need to change in
$pattern = '`.*?((http|ftp)://[\w#$&+,\/:;=?#.-]+)[^\w#$&+,\/:;=?#.-]*?`i';
to make it work completely?
Please suggest any ideas.
$str = '<a href="http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3">play</a>';
$arr = explode("\"", $str);
$pattern = '`.*?((http|ftp)://[\w#$&+,\/:;=?#.-]+)[^\w#$&+,\/:;=?#.-]*?`i';
$url = preg_grep($pattern,$arr);
$url = implode('',$url);
Output: $url = 'http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3'
Update: 2nd Solution [Reference-DOMElement].
$str = '<a href="http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3">play</a>';
$DOM = new DOMDocument;
$DOM->loadHTML($str);
$search_items = $DOM->getElementsByTagName('a');
foreach ($search_items as $search_item) {
$url = $search_item->getAttribute('href');
}
echo $url; //Output: http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3
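Since the question mentions processing many URLs, here is a minimal sketch of the DOMDocument approach that collects every href into an array instead of keeping only the last one. The anchor markup is assumed from the question (an href containing spaces). getAttribute() returns the attribute value exactly as written, spaces included.

```php
<?php
// Anchor markup assumed from the question: the href contains spaces.
$str = '<a href="http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3">play</a>';

$dom = new DOMDocument();
// @ silences warnings about the fragment not being a full HTML document.
@$dom->loadHTML($str);

// Collect every href rather than overwriting a single variable,
// so the same code works when the HTML holds many links.
$urls = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    $urls[] = $link->getAttribute('href');
}
echo implode("\n", $urls), "\n";
```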
You can use str_replace() to replace each space with %20, which encodes the URL:
<?php
$url_org = 'http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3';
$url_edited = str_replace(" ", '%20', $url_org);
?>
HERE
This will work.

Replacing words in php with preg_replace in a given div area

I want to replace some words on the fly on my website:
$content = preg_replace('/\bWord\b/i', 'Replacement', $content);
That works so far, but now I only want to change the words inside the div with id="content". How do I do that?
$dom = new DOMDocument();
$dom->loadHTML($html);
$x = new DOMXPath($dom);
$pattern = '/foo/';
foreach($x->query("//div[@id='content']//text()") as $text){
preg_match_all($pattern,$text->wholeText,$occurances,PREG_OFFSET_CAPTURE);
$occurances = array_reverse($occurances[0]);
foreach($occurances as $occ){
$text->replaceData($occ[1],strlen($occ[0]),'oof');
}
//alternative if you want to do it in one go:
//$text->parentNode->replaceChild(new DOMText(preg_replace($pattern,'oof',$text->wholeText)),$text);
}
echo $dom->saveHTML();
//replaces all occurrences of 'foo' with 'oof'
//if you don't really need a regex to match a word, you can limit the text nodes
//searched by altering the XPath to "//div[@id='content']//text()[contains(.,'searchword')]"
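The same idea applied to the question's original regex might look like the sketch below. It uses a hypothetical inline page and a simpler variant of the loop above: it assigns the preg_replace() result to each text node's nodeValue in one go instead of using offset-based replaceData(). Only text nodes below div#content are visited, so the header and all markup stay untouched.

```php
<?php
// Hypothetical page: only the words inside div#content should change.
$html = '<html><body><div id="header">Word here</div>'
    . '<div id="content"><p>A Word and a word.</p></div></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Visit every text node below <div id="content"> and run the question's
// regex on it; markup and text outside the div are never touched.
foreach ($xpath->query("//div[@id='content']//text()") as $text) {
    $text->nodeValue = preg_replace('/\bWord\b/i', 'Replacement', $text->nodeValue);
}

$out = $dom->saveHTML();
echo $out;
```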
Use the the_content filter. You can place it in your theme's functions.php file:
add_filter('the_content', 'your_custom_filter');
function your_custom_filter($content) {
$pattern = '/\bWord\b/i';
$content = preg_replace($pattern,'Replacement', $content);
return $content;
}
UPDATE: This applies only if you are using WordPress of course.
If the content is dynamically generated, just echo the value of $content inside the div with the id content. If the content is static, you'll have to either run this PHP snippet on the text and echo the result into the div, or use JavaScript (the dirty method!).
<?php
$content = "Your string of text goes here";
$content = preg_replace('/\bWord\b/i', 'Replacement', $content);
?>
<div id="content">
<?php echo $content; ?>
</div>
