PHP DomDocument failing to handle quotes in a url - php

When I open a URL like this one, which contains a quote, in the browser:
http://api.anghami.com/rest/v1/GETsearch.view?sid=11754134061397734622103190992&query=Can't Remember to Forget You Shakira&searchtype=SONG&ook&songCount=1
everything works fine and the output is returned as XML.
But when I try to load it from a PHP file:
$url = "http:/api.anghami.com/rest/v1/GETsearch.view?sid=11754134061397734622103190992&query=Can't Remember to Forget You Shakira&searchtype=SONG&ook&songCount=1";
//using DOMDocument for parsing.
$data = new DOMDocument();
// loading the xml from Anghami API.
if ($data->load("$url")) {
    // Getting the Tag song.
    foreach ($data->getElementsByTagName('song') as $searchNode) {
        $count++;
        $n++;
        //Getting the information of Anghami Song from the XML file.
        $valueID = $searchNode->getAttribute('id');
        $titleAnghami = $searchNode->getAttribute('title');
        $album = $searchNode->getAttribute('album');
        $albumID = $searchNode->getAttribute('albumID');
        $artistAnghami = $searchNode->getAttribute('artist');
        $track = $searchNode->getAttribute('track');
        $year = $searchNode->getAttribute('year');
        $coverArt = $searchNode->getAttribute('coverArt');
        $ArtistArt = $searchNode->getAttribute('ArtistArt');
        $size = $searchNode->getAttribute('size');
    }
}
I get this error:
'Warning: DOMDocument::load(): I/O warning : failed to load external entity /var/www/html/http:/api.anghami.com/rest/v1/GETsearch.view?sid=11754134061397734622103190992&query=Can't Remember to Forget You Shakira&searchtype=SONG&ook&songCount=1" in /var/www/html/search.php on line 93'
Can anyone help please?

@Fracsi is correct: the URL needs to start with http://, not http:/
The other problem is that the XML has a default namespace (defined with the xmlns attribute on the root element), so you need to use
$data->getElementsByTagNameNS('http://api.anghami.com/rest/v1', 'song')
to select all the "song" elements.
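To illustrate the namespace point, here is a minimal sketch. It parses an inline XML string instead of calling the API, so the sample response (the root element name and the song attributes) is made up for the example and the real output may look different. Only the namespace URI is taken from the answer above.

```php
<?php
// Hypothetical sample response; the real API output may differ.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://api.anghami.com/rest/v1">
  <song id="1" title="Can't Remember to Forget You" artist="Shakira"/>
  <song id="2" title="Another Song" artist="Someone"/>
</response>
XML;

$data = new DOMDocument();
$data->loadXML($xml);

// Select the elements by namespace URI plus local name.
$titles = [];
foreach ($data->getElementsByTagNameNS('http://api.anghami.com/rest/v1', 'song') as $song) {
    $titles[] = $song->getAttribute('title');
}

print_r($titles);
```

The same loop should work on a document loaded with $data->load($url) once the URL itself is valid.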

Related

PHP How to avoid this warning: DOMDocument::loadHTML(): Invalid char in CDATA

I'm trying to collect some info from a web service, but I'm having trouble with the CDATA sections of a page. Everything works when I use something like this:
$url = 'http://www.example.com';
$content = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($content);
foreach ($doc->getElementsByTagName('h3') as $subtitle) {
    echo $subtitle->textContent; //The output is the Subtitle/s.
}
But when the page contains CDATA sections, the line $doc->loadHTML($content) raises this warning:
Warning: DOMDocument::loadHTML(): Invalid char in CDATA
I found a solution on this site and tried to implement it, without success:
function sanitize_html($content) {
    if (!$content) return '';
    $invalid_characters = '/[^\x9\xa\x20-\xD7FF\xE000-\xFFFD]/';
    return preg_replace($invalid_characters,'', $content);
}
$url = 'http://www.example.com';
$content = file_get_contents($url);
$cleanContent = sanitize_html($content);
$doc = new DOMDocument();
$doc->loadHTML($cleanContent); //Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity
But I got this other error:
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity
What could be a good way to deal with the CDATA sections of a page? Greetings.
The solution is to replace the & symbol with &amp;
or, if you must keep that & as it is, maybe you could enclose it in: <![CDATA[ - ]]>
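A rough sketch of the first suggestion, assuming the page contains bare ampersands. The sample markup and the regular expression are my own illustration, not something from the question. The pattern only escapes an & that does not already start an entity reference, so existing entities such as &amp; are left alone.

```php
<?php
// Hypothetical fragment with one bare ampersand and one that is already escaped.
$content = '<html><body><h3>Tom & Jerry &amp; friends</h3></body></html>';

// Escape only ampersands that are not followed by "name;" or "#number;".
$escaped = preg_replace('/&(?!#?[A-Za-z0-9]+;)/', '&amp;', $content);

$doc = new DOMDocument();
$doc->loadHTML($escaped);

$subtitle = $doc->getElementsByTagName('h3')->item(0)->textContent;
echo $subtitle, "\n";
```

The lookahead is deliberately simple. It does not check that an entity name is actually defined, so treat it as a starting point rather than a complete sanitizer.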
Try adding PCLZIP before load IOFactory as shown:
require_once '/Classes/PHPExcel.php';
\PHPExcel_Settings::setZipClass(\PHPExcel_Settings::PCLZIP);
Add libxml_use_internal_errors(true) and libxml_clear_errors(). This worked for me; see the screenshot below for the code:
https://i.stack.imgur.com/6MN4H.png
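Since the screenshot may not be readable here, this is a minimal sketch of how those two functions are typically combined. It is not a copy of the screenshot, and the sample markup is invented. Parser messages are buffered instead of being emitted as PHP warnings, and the buffer is cleared afterwards.

```php
<?php
// Hypothetical markup; the unknown tag may make the libxml HTML parser report an error.
$content = '<html><body><h3>Subtitle</h3><madeuptag>x</madeuptag></body></html>';

$previous = libxml_use_internal_errors(true); // buffer errors instead of emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($content);
$errors = libxml_get_errors();                // inspect them here if needed
libxml_clear_errors();                        // empty the error buffer
libxml_use_internal_errors($previous);        // restore the earlier setting

foreach ($doc->getElementsByTagName('h3') as $subtitle) {
    echo $subtitle->textContent, "\n";
}
```

Note that this hides the warnings rather than repairing the markup, so it is worth looking at $errors at least once during development.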

Google Calendar layout

I have been working with this PHP code, which should modify Google Calendar's layout. But when I put the code on a page, everything below it disappears. What's wrong with it?
<?php
$your_google_calendar=" PAGE ";
$url= parse_url($your_google_calendar);
$google_domain = $url['scheme'].'://'.$url['host'].dirname($url['path']).'/';
// Load and parse Google's raw calendar
$dom = new DOMDocument;
$dom->loadHTMLfile($your_google_calendar);
// Change Google's CSS file to use absolute URLs (assumes there's only one element)
$css = $dom->getElementByTagName('link')->item(0);
$css_href = $css->getAttributes('href');
$css->setAttributes('href', $google_domain . $css_href);
// Change Google's JS file to use absolute URLs
$scripts = $dom->getElementByTagName('script')->item(0);
foreach ($scripts as $script) {
$js_src = $script->getAttributes('src');
if ($js_src) { $script->setAttributes('src', $google_domain . $js_src); }
}
// Create a link to a new CSS file called custom_calendar.css
$element = $dom->createElement('link');
$element->setAttribute('type', 'text/css');
$element->setAttribute('rel', 'stylesheet');
$element->setAttribute('href', 'custom_calendar.css');
// Append this link at the end of the element
$head = $dom->getElementByTagName('head')->item(0);
$head->appendChild($element);
// Export the HTML
echo $dom->saveHTML();
?>
When I tested your code, I got some errors because of wrong method calls:
->getElementByTagName should be ->getElementsByTagName, with an s after Element,
and
->setAttributes and ->getAttributes should be ->setAttribute and ->getAttribute, without the s at the end.
I'm guessing that you don't have error_reporting turned on, and because of that you didn't know anything went wrong?
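Here is a small sketch of the corrected calls, run against an inline HTML string rather than the real calendar page. The markup and the $google_domain value are placeholders I made up. It also loops over the whole list of script elements instead of a single item(0), which is a further change not mentioned above.

```php
<?php
// Show every error so that a misspelled method name is reported
// instead of silently producing a blank page.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical stand-ins for the calendar page and its base URL.
$google_domain = 'https://calendar.example.com/embed/';
$html = '<html><head><link rel="stylesheet" href="style.css">'
      . '<script src="a.js"></script><script>var inline = 1;</script></head>'
      . '<body></body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);

// getElementsByTagName() (plural) returns a list; item(0) picks the first node.
$css = $dom->getElementsByTagName('link')->item(0);
$css->setAttribute('href', $google_domain . $css->getAttribute('href'));

// Iterate over the list of scripts; inline scripts have no src and are skipped.
foreach ($dom->getElementsByTagName('script') as $script) {
    $js_src = $script->getAttribute('src');
    if ($js_src) {
        $script->setAttribute('src', $google_domain . $js_src);
    }
}

echo $dom->saveHTML();
```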

Trying to scrape images from reddit, having trouble cleaning up strings

I'm not asking you to fix my script; if you know the answer, I would appreciate it if you just pointed me in the right direction. This is a script I found, and I'm trying to edit it for a project.
I believe that what's going on is that the formatting of $reddit causes problems when I put that string into $url. I am not sure how to filter the string.
Right after I posted this, I had the idea of using concatenation on $reddit to get the desired result instead of filtering the string. Not sure.
Thanks!
picgrabber.php
include("RIS.php");
$reddit = "pics/top/?sort=top&t=all";
$pages = 5;
$t = new RIS($reddit, $pages);
$t->getImagesOnPage();
$t->saveImage();
RIS.php
class RIS {
    var $after = "";
    var $reddit = "";
    public function __construct($reddit, $pages) {
        $this->reddit = preg_replace('/[^A-Za-z0-9\-]/', '' , $reddit);
        if(!file_exists($this->reddit)) {
            mkdir($this->reddit, 0755);
        }
        $pCounter = 1;
        while($pCounter <= $pages) {
            $url = "http://reddit.com/r/$reddit/.json?limit=100&after=$this->after";
            $this->getImagesOnPage($url);
            $pCounter++;
        }
    }
    private function getImagesOnPage($url) {
        $json = file_get_contents($url);
        $js = json_decode($json);
        foreach($js->data->children as $n) {
            if(preg_match('(jpg$|gif$|png$)', $n->data->url, $match)) {
                echo $n->data->url."\n";
                $this->saveImage($n->data->url);
            }
            $this->after = $js->data->after;
        }
    }
    private function saveImage($url) {
        $imgName = explode("/", $url);
        $img = file_get_contents($url);
        //if the file doesnt already exist...
        if(!file_exists($this->reddit."/".$imgName[(count($imgName)-1)])) {
            file_put_contents($this->reddit."/".$imgName[(count($imgName)-1)], $img);
        }
    }
}
Notice: Trying to get property of non-object in C:\Program Files (x86)\EasyPHP-DevServer-13.1VC9\data\localweb\RIS.php on line 33
Warning: Invalid argument supplied for foreach() in C:\Program Files (x86)\EasyPHP-DevServer-13.1VC9\data\localweb\RIS.php on line 33
Fatal error: Call to private method RIS::getImagesOnPage() from context '' in C:\Program Files (x86)\EasyPHP-DevServer-13.1VC9\data\localweb\vollyeballgrabber.php on line 23
line 33:
foreach($js->data->children as $n) {
var_dump($url);
returns:
string(78) "http://reddit.com/r/pics/top/?sort=top&t=all/.json?limit=100&after=" NULL
$reddit in picgrabber.php contains GET parameters.
In the class RIS, you're embedding that value into a string that has another set of GET parameters in it, with the ".json" token in between.
The resulting URL is:
http://reddit.com/r/pics/top/?sort=top&t=all/.json?limit=100&after=
The ".json" token needs to come after the end of the path portion of the URL and before the GET parameters. I would also change any additional "?" tokens to "&" (ampersands), so that any further GET parameters you concatenate to the URL string become additional parameters.
Like this:
http://reddit.com/r/pics/top/.json?sort=top&t=all&limit=100&after=
The difference is that your URL returns HTML, because the reddit server doesn't know how to parse what you're sending, so you're trying to parse HTML with a JSON decoder. My URL returns actual JSON data, which should get json_decode() returning an actual object.
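One way to get that URL without hand-editing strings is to split $reddit into its path and query parts and rebuild the query with http_build_query(). The helper below is only a sketch. The function name buildRedditJsonUrl is made up and is not part of the RIS class.

```php
<?php
// Hypothetical helper (not part of the original RIS class): puts ".json" after
// the path and merges all GET parameters behind a single "?".
function buildRedditJsonUrl(string $reddit, int $limit, string $after): string
{
    // Split "pics/top/?sort=top&t=all" into its path and query parts.
    $parts = explode('?', $reddit, 2);
    $path = trim($parts[0], '/');

    $params = [];
    if (isset($parts[1])) {
        parse_str($parts[1], $params);
    }
    $params['limit'] = $limit;
    $params['after'] = $after;

    return "http://reddit.com/r/$path/.json?" . http_build_query($params, '', '&');
}

echo buildRedditJsonUrl('pics/top/?sort=top&t=all', 100, ''), "\n";
```

Inside the constructor, the $url line could then call this helper with $reddit, 100 and $this->after.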

Xml Php Warning Extra Content at end of document

I'm trying to write an Android app that sends and receives XML between the app and a web service. When I try to run the following code:
$dom = new domDocument;
$dom = simplexml_load_file('php://input');
$xml = simplexml_import_dom($dom);
$messages = Messages::find_by_sql("SELECT * FROM messages WHERE reciever = '$xml->userName'");
$xmlString = "";
if($messages)
{
    foreach($messages as $message)
    {
        $ts = strtotime($message->ts);
        $xmlString=$xmlString."<Message><sender>".$message->sender."</sender><reciever>".$message->reciever."</reciever><timestamp>"."123"."</timestamp><text>".$message->text."</text></Message>";
    }
}
else
{
    //do something
}
$xmlReturn = new DOMDocument('1.0', 'UTF-8');
$xmlReturn->loadXML($xmlString);
echo($xmlReturn->saveXML());
?>
I get the warning "Extra content at the end of the document".
The error comes from this line: $xmlReturn->loadXML($xmlString);
I'm not 100% sure that you can create an XML document by loading a string, but I've seen similar things done, and if you look here you can see what it outputs, which looks like valid XML to me.
An XML document can have only one root element. You are stringing together multiple <Message>…</Message> elements here, so a root element enclosing them is missing.
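As a sketch of what that could look like, the example below builds the document node by node under one root element instead of concatenating strings. The root name "Messages" and the sample rows are assumptions. Any single wrapper element would satisfy the parser.

```php
<?php
// Hypothetical rows standing in for the database result.
$messages = [
    ['sender' => 'alice', 'reciever' => 'bob', 'text' => 'Hi & hello'],
    ['sender' => 'carol', 'reciever' => 'bob', 'text' => 'Lunch?'],
];

$xmlReturn = new DOMDocument('1.0', 'UTF-8');
// One root element (the name "Messages" is an assumption) wraps all <Message> nodes.
$root = $xmlReturn->appendChild($xmlReturn->createElement('Messages'));

foreach ($messages as $message) {
    $node = $root->appendChild($xmlReturn->createElement('Message'));
    foreach ($message as $name => $value) {
        $child = $node->appendChild($xmlReturn->createElement($name));
        // createTextNode() takes care of escaping characters such as & and <.
        $child->appendChild($xmlReturn->createTextNode($value));
    }
}

echo $xmlReturn->saveXML();
```

Wrapping the concatenated string in a root element before calling loadXML() would also remove the warning. Building nodes has the added benefit that message text containing & or < cannot break the document.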

beginner attempting to read xml into php

I have an XML feed located here that I am trying to read into a PHP script, then cycle through the <packages> and sum the <downloads>. I've attempted to do this using DOMDocument, but have thus far failed.
The basic method I've been trying to use is as follows:
<?php
$dom = new DomDocument;
$dom->loadXML('http://www.phogue.net/feed');
$packages = $dom->getElementsByTagName('package');
foreach($packages as $item)
{
    echo $item->getAttribute('uid').'<br>';
}
?>
The above code is meant to just print out the name of each item, but it's not working. I am currently getting the following error:
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Start tag expected, '<' not found in Entity, line: 1 in /home/a8744502/public_html/userbar.php on line 3
WORKING CODE:
<?php
$dom = new DomDocument;
$dom->load('http://www.phogue.net/feed/');
$package = $dom->getElementsByTagName('package');
$value=0;
foreach ($package as $plugin) {
    $downloads = $plugin->getElementsByTagName("downloads");
    $download = $downloads->item(0)->nodeValue;
    $authors = $plugin->getElementsByTagName("author");
    $author = $authors->item(0)->nodeValue;
    if($author == "Zaeed")
    {
        $value += $download;
    }
}
echo $value;
?>
DOMDocument::loadXML() expects a string of XML. Try DOMDocument::load() instead - http://www.php.net/manual/en/domdocument.load.php
Keep in mind that to open an XML file via HTTP, you will need the appropriate wrapper enabled.
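To make the difference concrete, here is a sketch that uses a temporary file in place of the remote feed so that it runs offline. The feed contents are invented. As far as I know, allow_url_fopen is the setting that governs whether http:// URLs can be opened this way, but check your own configuration.

```php
<?php
// Hypothetical feed contents written to a temporary file, so the sketch runs offline.
$xml = '<packages><package><author>Zaeed</author><downloads>3</downloads></package>'
     . '<package><author>Other</author><downloads>5</downloads></package></packages>';
$file = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($file, $xml);

// loadXML() parses a string that already contains XML.
$fromString = new DOMDocument;
$fromString->loadXML($xml);

// load() takes a filename or URL and reads the XML from there.
$fromFile = new DOMDocument;
$fromFile->load($file);
unlink($file);

// Opening http:// URLs depends on the URL wrappers being enabled.
$httpEnabled = (bool) ini_get('allow_url_fopen');

$total = 0;
foreach ($fromFile->getElementsByTagName('downloads') as $downloads) {
    $total += (int) $downloads->nodeValue;
}
echo $total, "\n";
```

Passing a URL to loadXML() makes the parser treat the URL text itself as the document, which is why it complains that no start tag was found.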
You have an open parenthesis at the beginning of your echo.
