Simple HTML DOM cannot get file - php

I have no clue what the solution might be.
I simply cannot get the HTML of this Charizard page; I don't get any response even though the link is correct. Bulbasaur works fine, but I want this lovely Charizard...
include("simple_html_dom.php");
$html = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)');
$html2 = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Bulbasaur_(Pok%C3%A9mon)');
echo $html;
echo $html2;
Does this page have any protection or is Charizard only harder to catch?
I'd appreciate if you are able to help me with this.
Jonas :)

There are two problems here:
The length of the content fetched from this URL exceeds MAX_FILE_SIZE (defined in simple_html_dom.php).
The bug that was pointed out in the comments (https://github.com/sunra/php-simple-html-dom-parser/issues/37). This bug seems to be resolved in the forked repository that is maintained on GitHub, but it still exists in the original version (which does not seem to be maintained anymore).
To solve the first problem, edit simple_html_dom.php and change define('MAX_FILE_SIZE', 600000); to use a bigger number.
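For example, the definition in simple_html_dom.php could be changed to something like this (10 MB here is just an arbitrary value that is comfortably larger than the Charizard page):
// in simple_html_dom.php
define('MAX_FILE_SIZE', 10000000);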
As a workaround for the second problem, pass the correct parameters to file_get_html(), and by that I mean pass 0 for $offset:
$html = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)',
false,
null,
0); // this last one is the offset
var_dump($html);
Alternatively you can use the forked version of the library.
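If you go that route, a minimal sketch (assuming you install the fork with Composer as sunra/php-simple-html-dom-parser; the class name below comes from that package):
require 'vendor/autoload.php';
use Sunra\PhpSimple\HtmlDomParser;

// the fork's static wrapper around file_get_html()
$html = HtmlDomParser::file_get_html('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)');
echo $html->find('h1', 0)->plaintext;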

I'm going to suggest an alternative library, because I don't think you will get this with simple_html_dom:
include 'advanced_html_dom.php';
$html = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)');
echo $html->find('h1', 0)->text() . PHP_EOL;
echo $html->find('big a[title*="Pokédex number"]', 0)->text() . PHP_EOL;
This gives:
Charizard (Pokémon)
#006

Since I haven't found file_get_html() in the PHP docs, maybe you'd prefer using file_get_contents($url) instead.
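If you only need the raw markup, something like this would do; and if you still want DOM-style access afterwards, str_get_html() (the simple_html_dom helper that parses a string) can be fed the result, though if I remember correctly it checks the same MAX_FILE_SIZE constant:
$raw = file_get_contents('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)');
// parse the string afterwards if you still need the DOM
$html = str_get_html($raw);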

Related

How to change a specific XML tag value, where the tag has an id, using PHP

I am trying to change a value in an XML file using PHP. I am loading the XML file into an object like this:
if(file_exists('../XML/example.xml')) {
$example = simplexml_load_file('../XML/example.xml');
}
else {
exit ("can't load the file");
}
Then, once it is loaded, I am changing values within tags by assigning them the contents of another variable, like this:
$example->first_section->second_section->third_section->title = $var['data'];
Then, once I've made the necessary changes, the file is saved. So far this process is working well, but I have now hit a stumbling block.
I want to change a value within a particular tag in my XML file, a tag which has an id. In the XML file it looks like this:
<first_section>
<second_section>
<third_section id="2">
<title>Mrs</title>
</third_section>
</second_section>
</first_section>
How can I change this value using similar syntax to what I've been using?
Doing
$example->first_section->second_section->third_section id="2" ->title = $var['data']
doesn't work, as the syntax is wrong.
I've been scanning through stack overflow, and all over the net for an example of doing it this way but come up empty.
Is it possible to target and change a value in an xml like this, or do I need to change the way I am amending this file?
Thanks.
Some dummy code, as your provided XML is surely not the original one:
$xml = simplexml_load_file('../XML/example.xml');
$section = $xml->xpath("//third_section[@id='2']")[0];
// runs a query on the xml tree
// xpath() always gives back an array, so pick the first element directly
$section["id"] = "3";
// check if it has indeed changed
echo $xml->asXML();
As @Muhammed M. already said, check the SimpleXML documentation for more information. Check the corresponding demo on ideone.com.
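To actually change the <title> the question asks about and write the result back, a sketch along the same lines (the file path and $var['data'] are taken from the question):
$xml = simplexml_load_file('../XML/example.xml');
$section = $xml->xpath("//third_section[@id='2']")[0]; // first match of the query
$section->title = $var['data'];                        // overwrite the <title> text
$xml->asXML('../XML/example.xml');                     // save the modified tree back to the file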
Figured it out after much messing around. Thanks to your contributions, I indeed needed to use XPath. However, the reason it wasn't working for me was because I wasn't specifying the entire path to the node I wanted to edit.
For example, after loading the xml file into an object ($xml):
foreach($xml->xpath("/first_section/second_section/third_section[#id='2']") as $entry ) {
$entry->title = "mr";
}
This will work, because the whole path to the node is included in the parenthesis.
But in our above examples eg:
foreach($xml->xpath("//third_section[#id='2']" as $entry ) {
$entry->title = "mr";
}
This wouldn't work, even though it was my understanding that the double // makes the query drill down, and I assumed that XPath would search the whole XML structure and return the nodes where id=2. After spending hours testing, it appears that isn't the case. You must include the entire path to the node. As soon as I did that, it worked.
Also, on a side note:
$section = $xml->xpath("//third_section[@id='2']")[0];
is incorrect syntax. You don't need to specify the index "[0]" at the end. Including it flags up Dreamweaver's syntax checker, and ignoring Dreamweaver and uploading anyway breaks the code. All you need is:
$section = $xml->xpath(" entire path to node in here [@id='2']");
Thanks for helping and suggesting xpath. It works very well... once you know how to use it.

Get ajax generated content from another website

I have an automated archive of several (media) websites' front pages, written in PHP. Specifically, I copy the HTML in the <body> tag twice a day, and I keep a copy of all their CSS and JS files, so I can recreate the front page from any point in the past. Now I have hit a problem with one of those websites, as they load the main slider content (the most important news) with an ajax call. I would like this ajax call to be executed before I parse the data, so I don't just get a blank div. By looking around, I found out they use a WordPress plugin named lof-jslidernews2, but I can't find the specific ajax call to see the URL and make a cURL request. Any ideas how to achieve this?
The website: http://fokus.mk/
My code (I had to parse manually like this because of some problems with DOMDocument and invalid HTML):
// ...
if($html = file_get_contents($row['page_url'])) {
    // keep only the <body> ... </body> part of the page
    $content = strstr($html, '<body');
    $content = str_before($content, '</body>') . '</body>';
    // timestamped file name for the archived copy
    $filename = date('YmdHis') . $row['page_name'];
    if($success = file_put_contents('app/webroot/files/' . $filename, $content)) {
        // ....
** There is nothing illegal about my project, I am not stealing content, just freezing frontpages for later comparison. I have consulted a lawyer about this. :)
I don't know why, but the guy that actually solved my problem deleted his answer. So, here it is:
He suggested using an emulator, specifically Mink. It was easy to install (using composer) and did the job on the first try. Awesome library.
Mink is an open source browser controller/emulator for web applications, written in PHP 5.3.
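For reference, a minimal sketch of how the fetch can look with Mink (assuming it is installed via Composer together with a JavaScript-capable driver such as Selenium2Driver; the wait condition and selector are placeholders):
require 'vendor/autoload.php';

use Behat\Mink\Mink;
use Behat\Mink\Session;
use Behat\Mink\Driver\Selenium2Driver;

$mink = new Mink(array('browser' => new Session(new Selenium2Driver('firefox'))));
$mink->setDefaultSessionName('browser');

$session = $mink->getSession();
$session->visit('http://fokus.mk/');
// give the slider's ajax call time to finish (the condition is just a placeholder)
$session->wait(5000, "document.querySelectorAll('.lof-main-wapper').length > 0");

$html = $session->getPage()->getContent(); // full HTML after the ajax content was injected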

file_get_html() doesnt work [duplicate]

I used the following code to parse the HTML of another site, but it displays a fatal error:
$html=file_get_html('http://www.google.co.in');
Fatal error: Call to undefined function file_get_html()
Are you sure you have downloaded and included the PHP Simple HTML DOM parser?
You are calling a function that does not belong to PHP itself.
Download the simple_html_dom class here and use the methods it includes as you like. It is really great, especially when you are working with email newsletters:
include_once('simple_html_dom.php');
$html = file_get_html('http://www.google.co.in');
As everyone has told you, you are seeing this error because you didn't download and include the simple_html_dom class after copy-pasting that third-party code.
Now you have two options. Option one is what all the other developers have provided in their answers along with mine.
However, my friend,
option two is to not use that third-party PHP class at all, and instead use PHP's built-in class to perform the same task. That class is always loaded with PHP, so this method is also efficient, as well as original and secure!
Instead of file_get_html, which is not a function defined by the PHP developers, use:
$doc = new DOMDocument();
$doc->loadHTMLFile("filename.html");
echo $doc->saveHTML();
That's indeed defined by them. Check it on php.net/manual (the original PHP manual by its devs).
This puts the HTML into a DOM object which can be parsed by individual tags, attributes, etc. Here is an example of getting all the 'href' attributes and corresponding node values out of the 'a' tags. Very cool...
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
echo $tag->getAttribute('href').' | '.$tag->nodeValue."\n";
}
It looks like you're looking for simplexml_load_file which will load a file and put it into a SimpleXML object.
Of course, if it is not well-formatted, that might cause problems. Your other option is DOMDocument::loadHTMLFile. That is a good deal more forgiving of badly formed documents.
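A short sketch of that approach, with libxml warnings suppressed so real-world HTML doesn't flood the output:
libxml_use_internal_errors(true);   // keep malformed-HTML warnings quiet
$doc = new DOMDocument();
$doc->loadHTMLFile('http://www.google.co.in');
libxml_clear_errors();
echo $doc->getElementsByTagName('title')->item(0)->nodeValue;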
If you don't care about the XML and just want the data, you can use file_get_contents.
$html = file_get_contents('http://www.google.co.in');
to get the HTML content of the page.
In simple words:
download simple_html_dom.php from here (Click here),
now write this line in your PHP file:
include_once('simple_html_dom.php');
and start your coding after that:
$html = file_get_html('http://www.google.co.in');
No error will be displayed.
Try file_get_contents.
http://www.php.net/manual/en/function.file-get-contents.php

simple_html_dom not finding my element

I'm trying to pull prices off of Amazon for an exercise.
<?php
require('simple_html_dom.php');
$get = $_GET['id'];
$get_url = 'http://www.amazon.co.uk/s/field-keywords='.$get;
echo $get_url;
// Retrieve the DOM from a given URL
$html = file_get_html($get_url);
foreach($html->find('li[class=newp]') as $e)
echo $e->plaintext . '<br>';
I tried a few different selectors:
li[class=newp]
.price
ul[class=rsltL]
but it doesn't return anything. What am I doing wrong?
I tried returning the titles as well:
.lrg.bold
Tried Xpath, nothing.
Thanks
Your code is fine. It is very likely that your PHP settings are the culprit.
Put
error_reporting(E_ALL);
ini_set('display_errors', '1');
at the beginning of your PHP script and see if it prints out any useful errors.
Also, note that simple_html_dom uses the file_get_contents() function internally to grab page content. So, you may want to run file_get_contents($get_url) to see what happens.
If that function does not work, then it is definitely your PHP settings. In that case, I recommend starting another thread with that issue in the title.
This might help though:
PHP file_get_contents does not work on localhost
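A quick way to check (the URL construction is taken from the question; the rest is just a generic debugging sketch):
error_reporting(E_ALL);
ini_set('display_errors', '1');

$get_url = 'http://www.amazon.co.uk/s/field-keywords=' . $_GET['id'];
$raw = file_get_contents($get_url);

if ($raw === false) {
    echo 'file_get_contents() failed - check allow_url_fopen and your network settings';
} else {
    echo 'Fetched ' . strlen($raw) . ' bytes';
}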

How to manually call MediaWiki to convert wiki text to HTML?

I have a MediaWiki installation and I'm writing a custom script that reads some database entries and produces a custom output for client.
However, the texts are in wiki format, and I need to convert them to HTML. Is there some PHP API I could call -- well, there must be, but what and how exactly?
What files to include and what to call?
You use the global object $wgParser to do this:
<?php
require(dirname(__FILE__) . '/includes/WebStart.php');
$output = $wgParser->parse(
"some ''wikitext''",
Title::newFromText('Some page title'),
new ParserOptions());
echo $output->getText();
?>
Although I have no idea whether doing it this way is a good practice, or whether there is some better way.
All I found is dumpHTML.php, which will dump your whole MediaWiki; or, maybe better, API:Parsing wikitext, which says:
If you are interested in simply getting the rendered content of a
page, you can bypass the api and simply add action=render to your url,
like so: /w/index.php?title=API:Parsing_wikitext&action=render
Once you add action=render, it seems you can get the HTML page; don't you think?
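For example, something along these lines should be enough to pull the rendered HTML of one page (the base URL and page title are placeholders for your own wiki):
// fetch the rendered HTML of a single page via action=render
$wikiBase = 'http://your-wiki.example/w';
$html = file_get_contents($wikiBase . '/index.php?title=Some_page&action=render');
echo $html;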
Hope this helps.
Regards.
