How to get property attribute from Facebook profile?


I'm trying to read the al:android:url meta property from the page https://www.facebook.com/tobiasz.mencfel.
With my current code, $link_id is empty and nothing is printed.
This is what I have so far:
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36');
    $html = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$html = file_get_contents_curl("https://www.facebook.com/tobiasz.mencfel");
$doc = new DOMDocument();
@$doc->loadHTML($html);
$metas = $doc->getElementsByTagName('meta');
for ($i = 0; $i < $metas->length; $i++) {
    $meta = $metas->item($i);
    if ($meta->getAttribute('property') == 'al:android:url') {
        $link_id = $meta->getAttribute('content');
    }
}
// output should be: fb://profile/100025596917906
echo $link_id;
This is what the meta tag looks like:
<meta property="al:android:url" content="fb://profile/100025596917906" />

// file_get_contents_curl() returns $data, which is never set; the response is stored in $html.
// Change
return $data;
// to
return $html;
// Result: fb://profile/100025596917906
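To check the parsing step separately from the download, the lookup can be run against a fixed HTML string. The following is a minimal sketch under that assumption; find_meta_property() is a hypothetical helper name, and the sample string is built from the meta tag shown in the question.

```php
<?php
// Sketch: return the content of the first <meta> tag whose property matches.
// find_meta_property() is a hypothetical helper name, not part of the question.
function find_meta_property(string $html, string $property): ?string
{
    if ($html === '') {
        return null; // e.g. the download failed or returned nothing
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->getAttribute('property') === $property) {
            return $meta->getAttribute('content');
        }
    }
    return null; // no matching tag
}

$sample = '<html><head><meta property="al:android:url" content="fb://profile/100025596917906" /></head><body></body></html>';
echo find_meta_property($sample, 'al:android:url'), "\n";
```

Returning null when nothing matches makes an empty download easy to tell apart from a page that simply lacks the tag.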

Related

PHP render sitemap with SimpleXMLElement

I am trying to build a function that reads a sitemap and follows the links to any nested sitemaps. It works for some sites, but for other sitemaps with the same syntax it fails with an error.
function download_page($path){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $path);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36',
        'Content-type: application/xml'
    ]);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $retValue = curl_exec($ch);
    curl_close($ch);
    return $retValue;
}
function getAllLinks($sitemapUrl) {
    $links = array();
    $i = 0;
    // $context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36')));
    // $xml = file_get_contents($sitemapUrl, false, $context);
    $sitemap = $this->download_page($sitemapUrl);
    // dd($sitemap);
    // Load the sitemap XML file
    $sitemapXml = new \SimpleXMLElement($sitemap);
    // $sitemapXml = simplexml_load_file($sitemap);
    // $sitemapXml = simplexml_load_string($sitemap);
    // Loop through the <url> and <sitemap> elements
    foreach ($sitemapXml->children() as $child) {
        if ($child->getName() === 'url') {
            $i++;
            $links[$i]['url'] = (string)$child->loc;
            $links[$i]['lastmod'] = (string)$child->lastmod;
        }
        elseif ($child->getName() === 'sitemap') {
            $links = array_merge($links, $this->getAllLinks((string)$child->loc));
        }
    }
    return $links;
}
The commented-out lines show the other methods I tried.
Example of a working link: https://rulepingpong.com/sitemap_index.xml
Example of a link that does not work: https://majesticgaragedoorfl.com/sitemap_index.xml
The error is "String could not be parsed as XML".
I am really lost.
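One way to narrow this down is to look at what was actually downloaded before handing it to SimpleXML. curl_exec() returns false when the request fails, which includes HTTP error responses when CURLOPT_FAILONERROR is set, and a body that is not XML (for example an HTML error or block page) cannot be parsed either. The sketch below is an assumption about how such a check could look, not a confirmed diagnosis for the failing site; parse_sitemap() is a hypothetical helper name.

```php
<?php
// Sketch: parse a sitemap body defensively and report why parsing failed.
// parse_sitemap() is a hypothetical helper; $body is whatever curl_exec() returned.
function parse_sitemap($body): array
{
    // curl_exec() returns false on failure, which includes HTTP errors
    // when CURLOPT_FAILONERROR is set.
    if (!is_string($body) || trim($body) === '') {
        throw new RuntimeException('Download failed or returned an empty body');
    }

    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    if ($xml === false) {
        $messages = [];
        foreach (libxml_get_errors() as $error) {
            $messages[] = trim($error->message);
        }
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        throw new RuntimeException('Invalid XML: ' . implode('; ', $messages));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    $sitemaps = [];
    foreach ($xml->children() as $child) {
        if ($child->getName() === 'url') {
            $urls[] = ['url' => (string) $child->loc, 'lastmod' => (string) $child->lastmod];
        } elseif ($child->getName() === 'sitemap') {
            $sitemaps[] = (string) $child->loc; // nested sitemap to fetch next
        }
    }
    return ['urls' => $urls, 'sitemaps' => $sitemaps];
}

// Made-up sitemap index used only to exercise the function.
$index = '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>'
    . '</sitemapindex>';
print_r(parse_sitemap($index));

try {
    parse_sitemap(false); // what curl_exec() hands back after a failed request
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Logging the exception message for the failing URL should show whether the problem is the download itself or the content that came back.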

How to get class value from website using DOMDocument PHP

I'm trying to get the text of an element with a specific class from a web page. I tried the code below, but loadHTML has nothing to parse because the request returns a 503 response.
// <span class="_1n0q8zmp">AUD - $</span>
$url = 'https://www.airbnb.com/rooms/19844318';
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36');
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
$html = file_get_contents_curl($url);
$dom = new DOMDocument();
libxml_use_internal_errors(1);
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$script = $dom->getElementsByTagName('span');
$script = $xpath->query("//*[contains(@class, '_1n0q8zmp')]");
echo $script;
// result should be: AUD - $
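Separately from the 503 response, note that DOMXPath::query() returns a DOMNodeList, which cannot be printed with echo; the text has to be read from the individual nodes. The sketch below shows that step on a fixed HTML string built from the span in the question. It does not address the 503 itself, and texts_by_class() is a hypothetical helper name.

```php
<?php
// Sketch: collect the text of every element carrying a given class token.
// texts_by_class() is a hypothetical helper name.
// Assumption: $class is a plain class name without quote characters.
function texts_by_class(string $html, string $class): array
{
    if ($html === '') {
        return []; // nothing was downloaded, so nothing to parse
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match the class as a whole token, so "abc" does not match "abcd".
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$sample = '<div><span class="_1n0q8zmp">AUD - $</span></div>';
print_r(texts_by_class($sample, '_1n0q8zmp'));
```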

Get Paginated Links With php and simple html dom

I have this code to collect the pagination links using PHP, but the result is not quite right. Could anyone help me?
What I get back is just the first link, repeated.
<?php
include_once('simple_html_dom.php');
function dlPage($href) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_URL, $href);
    curl_setopt($curl, CURLOPT_REFERER, $href);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4");
    $str = curl_exec($curl);
    curl_close($curl);
    // Create a DOM object
    $dom = new simple_html_dom();
    // Load HTML from a string
    $dom->load($str);
    $Next_Link = array();
    foreach ($dom->find('a[title=Next]') as $element) {
        $Next_Link[] = $element->href;
    }
    print_r($Next_Link);
    $next_page_url = $Next_Link[0];
    if ($next_page_url != '') {
        echo '<br>' . $next_page_url;
        $dom->clear();
        unset($dom);
        //load the next page from the pagination to collect the next link
        dlPage($next_page_url);
    }
}
$url = 'https://www.jumia.com.gh/phones/';
$data = dlPage($url);
//print_r($data)
?>
What I want to get is:
mySiteUrl/?facet_is_mpg_child=0&viewType=gridView&page=2
mySiteUrl/?facet_is_mpg_child=0&viewType=gridView&page=3
.
.
.
and so on up to the last link in the pagination. Please help.

Here it is. Note that I run htmlspecialchars_decode on the link, because the href passed to cURL must contain a plain & rather than the &amp; entity used in the markup. I assumed that dlPage should return the last link in the pagination.
<?php
include_once('simple_html_dom.php');
function dlPage($href, $already_loaded = array()) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_URL, $href);
    curl_setopt($curl, CURLOPT_REFERER, $href);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4");
    $htmlPage = curl_exec($curl);
    curl_close($curl);
    echo "Loading From URL:" . $href . "<br/>\n";
    // Remember every URL already fetched so a repeated "Next" link ends the recursion
    $already_loaded[$href] = true;
    // Create a DOM object (the page was already downloaded with cURL above)
    $dom = new simple_html_dom();
    // Load HTML from a string
    $dom->load($htmlPage);
    $next_page_url = null;
    $items = $dom->find('ul[class="osh-pagination"] li[class="item"] a[title="Next"]');
    foreach ($items as $item) {
        $link = htmlspecialchars_decode($item->href);
        if (!isset($already_loaded[$link])) {
            $next_page_url = $link;
            break;
        }
    }
    if ($next_page_url !== null) {
        $dom->clear();
        unset($dom);
        //load the next page from the pagination to collect the next link
        return dlPage($next_page_url, $already_loaded);
    }
    return $href;
}
$url = 'https://www.jumia.com.gh/phones/';
$data = dlPage($url);
echo "DATA:" . $data . "\n";
And the output is:
Loading From URL:https://www.jumia.com.gh/phones/<br/>
Loading From URL:https://www.jumia.com.gh/phones/?facet_is_mpg_child=0&viewType=gridView&page=2<br/>
Loading From URL:https://www.jumia.com.gh/phones/?facet_is_mpg_child=0&viewType=gridView&page=3<br/>
Loading From URL:https://www.jumia.com.gh/phones/?facet_is_mpg_child=0&viewType=gridView&page=4<br/>
Loading From URL:https://www.jumia.com.gh/phones/?facet_is_mpg_child=0&viewType=gridView&page=5<br/>
DATA:https://www.jumia.com.gh/phones/?facet_is_mpg_child=0&viewType=gridView&page=5
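The same idea can also be written as a loop that returns every page URL instead of only the last one. This is a sketch under several assumptions, not the answer's method: it swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath so that it runs without the library, it assumes the "Next" links are absolute URLs, and $fetch is a hypothetical callable standing in for the cURL download.

```php
<?php
// Sketch: follow "Next" links in a loop and return every URL visited.
// $fetch is a hypothetical callable that takes a URL and returns the page HTML.
function collect_page_urls(string $startUrl, callable $fetch, int $maxPages = 100): array
{
    $visited = [];
    $url = $startUrl;

    while ($url !== null && !isset($visited[$url]) && count($visited) < $maxPages) {
        $visited[$url] = true;
        $html = $fetch($url);
        $url = null;
        if (!is_string($html) || $html === '') {
            break; // download failed, stop here
        }

        $dom = new DOMDocument();
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        // DOMDocument decodes &amp; in attribute values, so no extra decoding is needed.
        $xpath = new DOMXPath($dom);
        $next = $xpath->query('//a[@title="Next"]/@href')->item(0);
        if ($next !== null && $next->nodeValue !== '') {
            $url = $next->nodeValue;
        }
    }
    return array_keys($visited);
}

// In-memory stand-in for a site with three pages (made-up URLs and markup).
$pages = [
    'https://example.com/phones/' => '<a title="Next" href="https://example.com/phones/?view=grid&amp;page=2">Next</a>',
    'https://example.com/phones/?view=grid&page=2' => '<a title="Next" href="https://example.com/phones/?view=grid&amp;page=3">Next</a>',
    'https://example.com/phones/?view=grid&page=3' => '<p>Last page</p>',
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url] ?? '';
};
print_r(collect_page_urls('https://example.com/phones/', $fetch));
```

The $maxPages limit and the visited set both guard against a site whose last page links back to itself.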

Curl with Simple HTML DOM using Link Pagination

I want to combine cURL and Simple HTML DOM.
Both work fine separately.
I want to download a site with cURL and then read the inner data with Simple HTML DOM,
following the numbered pagination pages.
This is the code I am using:
<?php
include 'simple_html_dom.php';
function dlPage($href) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_URL, $href);
    curl_setopt($curl, CURLOPT_REFERER, $href);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4");
    $str = curl_exec($curl);
    curl_close($curl);
    // Create a DOM object
    $dom = new simple_html_dom();
    // Load HTML from a string
    $dom->load($str);
    return $dom;
}
$url = 'http://example.com/';
$data = dlPage($url);
// echo $data;
#######################################################
$startpage = 1;
$endpage = 3;
for ($p = $startpage; $p <= $endpage; $p++) {
    $html = file_get_html('http://example.com/page/$p.html');
    // connect to main page links
    foreach ($html->find('div#link a') as $link) {
        $linkHref = $link->href;
        //loop through each link
        $linkHtml = file_get_html($linkHref);
        // parsing inner data
        foreach ($linkHtml->find('h1') as $title) {
            echo $title;
        }
        foreach ($linkHtml->find('div#data') as $description) {
            echo $description;
        }
    }
}
?>
How can I combine this to make it work as one single script?
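One detail worth checking when combining the two parts: in the loop above the page URL is written in single quotes, and PHP only expands variables such as $p inside double-quoted strings. The sketch below shows the difference and builds the page URLs with sprintf; page_urls() and the %d pattern are hypothetical names for illustration. Each resulting URL could then be passed to dlPage(), which already returns a parsed simple_html_dom object, in place of file_get_html().

```php
<?php
// Sketch: build the numbered page URLs up front.
// page_urls() and the %d pattern are assumptions for illustration.
function page_urls(string $pattern, int $start, int $end): array
{
    $urls = [];
    for ($p = $start; $p <= $end; $p++) {
        $urls[] = sprintf($pattern, $p);
    }
    return $urls;
}

$p = 2;
echo 'http://example.com/page/$p.html', "\n"; // single quotes: $p is printed literally
echo "http://example.com/page/$p.html", "\n"; // double quotes: $p is replaced by 2

print_r(page_urls('http://example.com/page/%d.html', 1, 3));
```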

Parsing xml to store in mysql using php

<?php
header("Content-type: text/xml");
$xml = new SimpleXMLElement("<noresult>1</noresult>");
$fn = urlencode($_REQUEST['fn']);
$ln = urlencode($_REQUEST['ln']);
$co = $_REQUEST['co'];
if (empty($fn) || empty($ln)):
    echo $xml->asXML();
    exit();
endif;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.linkedin.com/pub/dir/?first={$fn}&last={$ln}&search=Search");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($ch, CURLOPT_TIMEOUT, 8);
$res = curl_exec($ch);
preg_match("/<div id=\"content\".*?<\/div>\s*<\/div>/ms", $res, $match);
if (!empty($match)):
    $dom = new DOMDocument();
    $dom->loadHTML($match[0]);
    $ol = $dom->getElementsByTagName('ol');
    $vcard = $dom->getElementsByTagName('li');
    $co_match_node = false;
    for ($i = 0; $i < $vcard->length; $i++):
        if (!empty($co) && stripos($vcard->item($i)->nodeValue, $co) !== false) $co_match_node = $vcard->item($i);
    endfor;
    if (!empty($co_match_node)):
        echo $dom->saveXML($co_match_node);
        // my idea is to put code here to save in the database.
    else:
        echo (string)$dom->saveXML($ol->item(0));
    endif;
else:
    echo $xml->asXML();
endif;
curl_close($ch);
exit();
I'm trying to save the XML into a MySQL database.
However, I don't know how to parse $dom or how to separate the individual li elements.
There are 5 fields needed in the database:
span.given-name
span.family-name
span.location
span.industry
dd.current-content span
These fields are available in the XML.
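A possible approach is to run an XPath query relative to each li and collect the five fields into one row per person, then insert the rows with a prepared statement. The following is only a sketch: the sample markup, the helper names, and the people table with its column names are all assumptions for illustration, and the real page structure may differ.

```php
<?php
// Sketch: turn each <li> into a row holding the five fields listed above.
// The helper names, sample markup, and table layout are assumptions.
function first_text(DOMXPath $xpath, string $query, DOMNode $context): string
{
    $node = $xpath->query($query, $context)->item(0);
    return $node === null ? '' : trim($node->textContent);
}

function extract_people(string $html): array
{
    if ($html === '') {
        return [];
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // XPath test for "has this class token", e.g. class="vcard given-name".
    $has = function (string $class): string {
        return 'contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")';
    };

    $rows = [];
    foreach ($xpath->query('//li') as $li) {
        $rows[] = [
            'given_name'       => first_text($xpath, './/span[' . $has('given-name') . ']', $li),
            'family_name'      => first_text($xpath, './/span[' . $has('family-name') . ']', $li),
            'location'         => first_text($xpath, './/span[' . $has('location') . ']', $li),
            'industry'         => first_text($xpath, './/span[' . $has('industry') . ']', $li),
            'current_position' => first_text($xpath, './/dd[' . $has('current-content') . ']//span', $li),
        ];
    }
    return $rows;
}

// Hypothetical table and column names; $pdo is an already opened PDO connection.
function save_people(PDO $pdo, array $rows): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO people (given_name, family_name, location, industry, current_position)
         VALUES (:given_name, :family_name, :location, :industry, :current_position)'
    );
    foreach ($rows as $row) {
        $stmt->execute($row); // values are bound, never concatenated into the SQL
    }
}

// Made-up sample markup using the class names from the question.
$sample = '<ol><li class="vcard"><span class="given-name">Ada</span> '
    . '<span class="family-name">Lovelace</span>'
    . '<span class="location">London</span><span class="industry">Computing</span>'
    . '<dl><dd class="current-content"><span>Analyst</span></dd></dl></li></ol>';
print_r(extract_people($sample));
```

In the script above, extract_people() could be given $match[0] at the point where the comment about saving to the database sits, and the resulting rows passed to save_people().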
