PHP: parse a sitemap with SimpleXMLElement

I am trying to build a function that reads a sitemap and follows the links to any inner sitemaps. It works, but not for every site: some sitemaps (with the same syntax) fail with an error.
function download_page($path){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $path);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36',
        'Content-type: application/xml'
    ]);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $retValue = curl_exec($ch);
    curl_close($ch);
    return $retValue;
}
function getAllLinks($sitemapUrl) {
    $links = array();
    $i = 0;
    // $context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36')));
    // $xml = file_get_contents($sitemapUrl, false, $context);
    $sitemap = $this->download_page($sitemapUrl);
    // dd($sitemap);
    // Load the sitemap XML file
    $sitemapXml = new \SimpleXMLElement($sitemap);
    // $sitemapXml = simplexml_load_file($sitemap);
    // $sitemapXml = simplexml_load_string($sitemap);
    // Loop through the <url> and <sitemap> elements
    foreach ($sitemapXml->children() as $child) {
        if ($child->getName() === 'url') {
            $i++;
            $links[$i]['url'] = (string)$child->loc;
            $links[$i]['lastmod'] = (string)$child->lastmod;
        }
        elseif ($child->getName() === 'sitemap') {
            $links = array_merge($links, $this->getAllLinks((string)$child->loc));
        }
    }
    return $links;
}
In the commented-out lines I tried several other methods (file_get_contents() with a stream context, simplexml_load_file(), simplexml_load_string()).
Example of a working link: https://rulepingpong.com/sitemap_index.xml
Example of a link that fails: https://majesticgaragedoorfl.com/sitemap_index.xml
With the failing link I get the error "String could not be parsed as XML".
I am really lost.
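A possible explanation, which is an assumption since I have not checked what the failing site actually returns: download_page() sets CURLOPT_FAILONERROR, so any 4xx/5xx answer (for example a 403 from a firewall or bot filter) makes curl_exec() return false, and new \SimpleXMLElement(false) throws exactly "String could not be parsed as XML". The same error appears when the body is an HTML page or compressed bytes instead of XML. The sketch below is a minimal, hypothetical rewrite (fetchSitemap(), parseSitemap() and getAllLinksSafe() are made-up names) that records the HTTP status or libxml error per sitemap instead of throwing, so you can see why a given URL fails. CURLOPT_ENCODING => '' asks curl to accept and decompress any encoding it supports; the User-Agent string is a placeholder.

```php
<?php
// Fetch a URL and keep the HTTP status, instead of hiding it behind
// CURLOPT_FAILONERROR (which makes curl_exec() return false on 4xx/5xx).
function fetchSitemap(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT => 15,
        CURLOPT_ENCODING => '', // let curl decompress gzip/deflate responses
        CURLOPT_USERAGENT => 'Mozilla/5.0 (compatible; SitemapReader/1.0)', // placeholder UA
    ]);
    $body = curl_exec($ch);
    $result = [
        'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'body' => $body === false ? '' : $body,
        'error' => curl_error($ch),
    ];
    curl_close($ch);
    return $result;
}

// Parse a <urlset> or <sitemapindex> document without throwing.
function parseSitemap(string $body): array
{
    $result = ['error' => null, 'urls' => [], 'sitemaps' => []];
    if (trim($body) === '') {
        $result['error'] = 'empty response body';
        return $result;
    }
    $previous = libxml_use_internal_errors(true); // collect errors instead of warnings
    $xml = simplexml_load_string($body);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        $result['error'] = $errors ? trim($errors[0]->message) : 'invalid XML';
        return $result;
    }
    foreach ($xml->children() as $child) {
        if ($child->getName() === 'url') {
            $result['urls'][] = ['url' => (string) $child->loc, 'lastmod' => (string) $child->lastmod];
        } elseif ($child->getName() === 'sitemap') {
            $result['sitemaps'][] = (string) $child->loc;
        }
    }
    return $result;
}

// Walk a sitemap index recursively. $fetch is any callable shaped like
// fetchSitemap(); failures are recorded in $errors keyed by URL.
function getAllLinksSafe(string $url, callable $fetch, array &$errors = []): array
{
    $res = $fetch($url);
    if ($res['status'] !== 200) {
        $errors[$url] = "HTTP {$res['status']} {$res['error']}";
        return [];
    }
    $parsed = parseSitemap($res['body']);
    if ($parsed['error'] !== null) {
        $errors[$url] = $parsed['error'];
        return [];
    }
    $links = $parsed['urls'];
    foreach ($parsed['sitemaps'] as $childUrl) {
        $links = array_merge($links, getAllLinksSafe($childUrl, $fetch, $errors));
    }
    return $links;
}
```

Calling getAllLinksSafe('https://majesticgaragedoorfl.com/sitemap_index.xml', 'fetchSitemap', $errors) and printing $errors should show whether the problem is the server refusing the request (an entry such as "HTTP 403") or the body not being XML.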

Related

I have made a proxy scraper in PHP but I don't know how to check if the proxy is live

The code below scrapes proxies from the website, but I want the program to check each proxy, one by one, to see whether it is alive, and then save the live ones to a file. Can someone help me do that?
<?php
header('Content-Type: application/json');
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/'.rand(111,999).'.36 (KHTML, like Gecko) Chrome/88.0.'.rand(1111,9999).'.104 Safari/'.rand(111,999).'.36');
$proxies = array();
$firstPage = 1;
$lastPage = 10;
for ($i = $firstPage; $i <= $lastPage; $i++) {
    curl_setopt($ch, CURLOPT_URL, "https://www.my-proxy.com/free-proxy-list-$i.html");
    $result = curl_exec($ch);
    if ($result === false) {
        continue; // skip pages that failed to download
    }
    // Get proxies, e.g. ">102.64.122.214:8085#U" (dots escaped, port of 2-5 digits)
    preg_match_all('!\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{2,5}!', $result, $matches);
    $proxies = array_merge($proxies, $matches[0]);
}
curl_close($ch);
// the Content-Type header promises JSON, so output JSON instead of print_r()
echo json_encode(array_values(array_unique($proxies)));
?>
There are multiple ways to test a proxy; the easiest is to set the proxy option in the stream context of a file_get_contents() request:
$options = array(
    'http' => array(
        'proxy' => 'tcp://' . $prox, // $prox is "IP:PORT", e.g. 8.8.8.8:2222
        'timeout' => 2,
        'request_fulluri' => true,
        'method' => "GET",
        'header' => "Accept-language: en\r\n" .
            "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.76 Safari/537.36\r\n"
    )
);
$context = stream_context_create($options);
$base_url = 'http://lotsofrandomstuff.com/1.php'; // URL that simply returns '1' each time
// @ suppresses the warning file_get_contents() emits when the proxy is dead
$web = @file_get_contents($base_url, false, $context);
if ($web == '1') {
    echo "proxy is good";
} else {
    echo "proxy is dead";
}
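A minimal sketch of the whole check-then-save step, assuming a proxy counts as alive when a plain HTTP request through it succeeds within a few seconds. It uses curl's CURLOPT_PROXY rather than the stream context above; extractProxies(), isProxyAlive() and saveLiveProxies() are hypothetical names, and http://example.com/ is only a placeholder test URL (a small page you control is a better choice). The proxies are checked one at a time, as asked; with a long list, the curl_multi_* functions could check them in parallel.

```php
<?php
// Pull "IP:PORT" pairs out of scraped HTML. Dots are escaped so they match
// a literal ".", and the port may have 2-5 digits.
function extractProxies(string $html): array
{
    preg_match_all('/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{2,5}\b/', $html, $matches);
    return array_values(array_unique($matches[0]));
}

// Send one request through the proxy; any 2xx/3xx answer counts as "alive".
// The default test URL is a placeholder assumption.
function isProxyAlive(string $proxy, string $testUrl = 'http://example.com/', int $timeout = 5): bool
{
    $ch = curl_init($testUrl);
    curl_setopt_array($ch, [
        CURLOPT_PROXY => $proxy, // "IP:PORT" as scraped
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT => $timeout,
    ]);
    curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $ok = curl_errno($ch) === 0 && $status >= 200 && $status < 400;
    curl_close($ch);
    return $ok;
}

// Check proxies one by one and write the live ones to $file, one per line.
// $isAlive defaults to isProxyAlive(); pass another callable to test offline.
function saveLiveProxies(array $proxies, string $file, ?callable $isAlive = null): array
{
    $isAlive = $isAlive ?? 'isProxyAlive';
    $live = [];
    foreach ($proxies as $proxy) {
        if ($isAlive($proxy)) {
            $live[] = $proxy;
        }
    }
    file_put_contents($file, $live ? implode(PHP_EOL, $live) . PHP_EOL : '');
    return $live;
}
```

For example, saveLiveProxies(extractProxies($result), 'live_proxies.txt') would write each working proxy from one scraped page to its own line (the file name is arbitrary).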

How to get class value from website using DOMDocument PHP

I'm trying to get the text of an element with a specific class from a website. I tried the code below, but there is nothing useful to pass to loadHTML() because the server returns a 503 response.
// <span class="_1n0q8zmp">AUD - $</span>
$url = 'https://www.airbnb.com/rooms/19844318';
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36');
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
$html = file_get_contents_curl($url);
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$spans = $xpath->query("//*[contains(@class, '_1n0q8zmp')]");
// query() returns a DOMNodeList, which cannot be echoed directly
foreach ($spans as $span) {
    echo $span->textContent;
}
// result should be: AUD - $
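A 503 status means the server refused or could not serve the request, so there is nothing useful to parse; it is worth checking the status code before calling loadHTML(). The sketch below (fetchHtml() and textByClass() are hypothetical names) does that, and matches the class as a whole token with the common concat(' ', normalize-space(@class), ' ') XPath idiom, so class="a _1n0q8zmp" matches but class="x_1n0q8zmpy" does not. It cannot get past a server that blocks scripted requests, and auto-generated class names such as _1n0q8zmp may change over time.

```php
<?php
// Return the page body only when the server answered 200, otherwise null.
function fetchHtml(string $url, string $userAgent): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT => $userAgent,
    ]);
    $html = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return ($html !== false && $status === 200) ? $html : null;
}

// Return the trimmed text of every element whose class list contains $class.
function textByClass(string $html, string $class): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($dom);
    // Whole-token match on the class attribute.
    $query = sprintf("//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]", $class);
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```

Usage would be along the lines of: $html = fetchHtml($url, 'Mozilla/5.0 ...'); if ($html === null), stop and report the failure; otherwise print textByClass($html, '_1n0q8zmp').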

Can't Return When Looping

Why can't I return while looping in a function? I get only one result, as if there were no loop. Here is my code:
function search($get){
    $i = 0;
    //print_r($get);
    foreach ($get->itemlist as $song) {
        $i++;
        $ch = curl_init('');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_COOKIE, 'wmid=14997771; user_type=2; country=id; session_key=96870dd03ab9280c905566cad439c904;');
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/537.36');
        $json = curl_exec($ch);
        $json = str_replace('MusicInfoCallback(', '', $json);
        $json = str_replace(')', '', $json);
        $json = json_decode($json);
        $songurl = $json->mp3Url;
        //print_r($json);
        return array($i => array("song" => $json->msong,
            "singer" => $json->msinger,
            "url" => $song->songid));
    }
}
print_r(search("key"));
Is there any alternative?
Untested Code:
function search($get){
    $results = []; // so an empty itemlist returns [] instead of an undefined variable
    foreach ($get->itemlist as $song) {
        $ch = curl_init('');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_COOKIE, 'wmid=14997771; user_type=2; country=id; session_key=96870dd03ab9280c905566cad439c904;');
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/537.36');
        $json = curl_exec($ch);
        curl_close($ch);
        // strip the 18-character "MusicInfoCallback(" prefix and the final ")"
        $json = json_decode(substr($json, 18, -1), true);
        $results[] = ['songurl' => $json['mp3Url'],
            'song' => $json['msong'],
            'singer' => $json['msinger'],
            'url' => $song->songid
        ];
    }
    return $results;
}
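I don't have any sample data to verify my code with. I am assuming that "MusicInfoCallback(" and ")" are the start and end of the curl response string, so substr($json, 18, -1) removes exactly those characters. You only get one result because return ends the function immediately, on the first pass through the loop. Instead, collect every item in an (automatically) indexed array and return it after the loop.
$songurl was also "trapped" inside the function: it was assigned but never returned, so I added it to the result as songurl.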
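The same rule in a minimal, testable sketch: return ends the function, so rows are collected inside the loop and returned after it. unwrapJsonp(), collectSongs() and the $fetchBody callable are hypothetical; $fetchBody stands in for the curl request, whose URL is not shown in the question. Unlike str_replace(')', '', ...), which removes every ")" in the data, the unwrapping here removes only the outer one, assuming the response looks like MusicInfoCallback({...}).

```php
<?php
// Remove a JSONP wrapper such as MusicInfoCallback({...}) and decode the JSON.
// Only the final ")" is stripped, so ")" characters inside the data survive.
function unwrapJsonp(string $body, string $callback): ?array
{
    $body = rtrim(trim($body), ';');
    $prefix = $callback . '(';
    if (strncmp($body, $prefix, strlen($prefix)) !== 0 || substr($body, -1) !== ')') {
        return null;
    }
    $data = json_decode(substr($body, strlen($prefix), -1), true);
    return is_array($data) ? $data : null;
}

// Collect one row per song and return only after the loop has finished.
// $fetchBody is a hypothetical callable standing in for the curl request.
function collectSongs(iterable $itemlist, callable $fetchBody): array
{
    $results = [];
    foreach ($itemlist as $song) {
        $json = unwrapJsonp($fetchBody($song->songid), 'MusicInfoCallback');
        if ($json === null) {
            continue; // skip responses that are not in the expected format
        }
        $results[] = [
            'songurl' => $json['mp3Url'] ?? null,
            'song' => $json['msong'] ?? null,
            'singer' => $json['msinger'] ?? null,
            'url' => $song->songid,
        ];
    }
    return $results; // outside the loop, so every item is included
}
```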

Make request with custom headers (POST)

My code isn't working. I've tried a few things, but I'm new to PHP. Here's what I have; it always returns a blank page.
<?php
ini_set('display_errors',1);
error_reporting(E_ALL);
$rnd = $_GET['rnd'];
$ch = curl_init("http://chat.website.com/script/login.php?rnd=".$rnd);
$request_headers = array();
$request_header[] = (
'User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36',
'Content-Type: application/x-www-form-urlencoded',
'onLoad: [type Function]',
'p: password',
'u: username',
'owner: [object Object]
');
curl_setopt($ch, CURLOPT_HTTPHEADER, $request_headers);
$userdata = curl_exec($ch);
echo $userdata;
?>
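You are passing $request_headers to curl_setopt(), but you put the data into $request_header (without the "s"). Also check your array syntax: ( 'a', 'b', ... ) is not a valid array, so PHP cannot even parse the script. A parse error happens before ini_set('display_errors', 1) runs, which is likely why you see a blank page.
Or try something like this (CURLOPT_HTTPHEADER expects a flat list of "Name: value" strings, not an associative array):
$request_headers = array(
    'User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36',
    'Content-Type: application/x-www-form-urlencoded',
    'onLoad: [type Function]',
    'p: password',
    'u: username',
    'owner: [object Object]',
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request_headers);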
I found my error: I wasn't sending the request as a POST.
Here's the working code, in case anyone needs it:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
// fall back to 1 when ?rnd= is missing from the query string
$rnd = isset($_GET['rnd']) ? $_GET['rnd'] : 1;
$ch = curl_init("http://chat.website.com/scripts/login.php?rnd=" . urlencode($rnd));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "onLoad=%5Btype%20Function%5D&p=password&u=username&owner=%5Bobject%20Object%5D");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$userdata = curl_exec($ch);
curl_close($ch);
echo $userdata;
?>
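A minimal, hypothetical sketch that combines both answers (buildFormBody(), headerLines() and postForm() are illustrative names): the form fields go into the POST body, and custom headers are converted from an associative array into the "Name: value" lines that CURLOPT_HTTPHEADER expects. http_build_query() with PHP_QUERY_RFC3986 encodes spaces as %20, so it produces the same body string as the hand-encoded one in the working code.

```php
<?php
// Encode form fields as application/x-www-form-urlencoded (spaces as %20).
function buildFormBody(array $fields): string
{
    return http_build_query($fields, '', '&', PHP_QUERY_RFC3986);
}

// CURLOPT_HTTPHEADER wants a flat list of "Name: value" strings,
// so convert an associative array into that shape.
function headerLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Send a POST request with custom headers; returns the body, or false on failure.
function postForm(string $url, array $fields, array $headers = [])
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => buildFormBody($fields),
        CURLOPT_HTTPHEADER => headerLines($headers),
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

For example, postForm('http://chat.website.com/scripts/login.php?rnd=' . urlencode($rnd), ['onLoad' => '[type Function]', 'p' => 'password', 'u' => 'username', 'owner' => '[object Object]'], ['User-Agent' => 'Mozilla/5.0 ...']) should send the same body as the working code, plus the extra header.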

Simple html dom div downloading issue

Here's some PHP code that I wrote, mostly based on the docs.
It uses Simple HTML DOM.
The problem is that it doesn't work and I don't know why.
<?php
include("simple_html_dom.php");
$context = stream_context_create();
stream_context_set_params($context, array('user_agent' => "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36"));
$html = file_get_html('http://www.ask.fm', 0, $context);
$elem = $html->find('div[id=heads]', 0);
var_dump($elem);
?>
What I want is to set the user agent, which I tried to do above, and then get the div with id "heads". That's not much, but I couldn't figure it out.
<?php
include "simplehtmldom_1_5/simple_html_dom.php";
function curl($url)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // the user agent from the question
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36");
    // follow redirects if there are any (I didn't get any in your case)
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    $content = curl_exec($curl);
    curl_close($curl);
    return $content;
}
$html = str_get_html(curl("http://www.ask.fm"));
// str_get_html() returns false for an empty response
if ($html) {
    echo $html->find('div[id=heads]', 0);
}
?>
I hope this is useful for you.
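For the original stream-context approach: as far as I know, stream_context_set_params() only understands the 'notification' and 'options' keys, so a top-level 'user_agent' entry is ignored. The user agent belongs under the 'http' options passed to stream_context_create(). The sketch below (uaContext() and outerHtmlById() are hypothetical names) builds such a context and, swapping in PHP's built-in DOMDocument for Simple HTML DOM, extracts the element with a given id.

```php
<?php
// Build a stream context whose HTTP wrapper sends a custom User-Agent.
function uaContext(string $userAgent, int $timeout = 15)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout' => $timeout,
        ],
    ]);
}

// Return the outer HTML of the element with the given id, or null.
// Uses the built-in DOM extension instead of Simple HTML DOM.
function outerHtmlById(string $html, string $id): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // ignore invalid-HTML warnings
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $node = $dom->getElementById($id);
    if ($node === null) {
        return null;
    }
    $out = $dom->saveHTML($node);
    return $out === false ? null : $out;
}
```

With Simple HTML DOM the context is passed as before, file_get_html('http://www.ask.fm', false, uaContext('Mozilla/5.0 ...')); with plain PHP, file_get_contents('http://www.ask.fm', false, uaContext('Mozilla/5.0 ...')) followed by outerHtmlById($html, 'heads') should give the same div.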
