Retrieve the contents of a div from an external site - PHP

I am trying to retrieve the contents of a div from an external site with PHP and XPath.
This is an excerpt from the page, showing the relevant code:
Note: I have tried several variations, including adding @ on the class and a at the end of my query. After that, I use saveHTML() to get the result. See my test below.
By the way:
This is my XPath: //*[@id="post-15991"]/div[4]/div[1]
This is the URL: https://wordpress.org/plugins/wp-job-manager/
See the following code:
<?php
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$dom = new DOMDocument();
@$dom->loadHTMLFile($url);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
$link = $dom->saveHTML($elements->item(0));
echo $link;
?>
Output: the output is empty.
Background:
How I get the XPath: I use Google Chrome. These are the pages I want to get data from:
https://wordpress.org/plugins/wp-job-manager/
https://wordpress.org/plugins/participants-database/
https://wordpress.org/plugins/amazon-link/
https://wordpress.org/plugins/simple-membership/
https://wordpress.org/plugins/scrapeazon/
Goal: I need the following data:
Version:
Last updated:
Active installations:
Tested up to:
See, for example, view-source:https://wordpress.org/plugins/wp-job-manager/
Version: 1.29.3
Last updated: 5 days ago
Active installations: 100,000+
<li>Requires WordPress Version: <strong>4.3.1</strong></li>
<li>Tested up to: <strong>4.9.2</strong></li>
Background: I need this data for all my favorite plugins and want to store it in a database or a spreadsheet. There are approximately 70 pages to scrape.
Here is the list for the example, with the full XPath:
//*[@id="post-15991"]/div[4]/div[1]
and
job-board-manager:
//*[@id="post-519"]/div[4]/div[1]/ul/li[1]
//*[@id="post-519"]/div[4]/div[1]/ul/li[2]
//*[@id="post-519"]/div[4]/div[1]/ul/li[3]
//*[@id="post-519"]/div[4]/div[1]/ul/li[7]
I used the method from "Is there a way to get the xpath in google chrome?":
Right click "Inspect" on the item whose XPath you are trying to find.
Right click on the highlighted area in the console.
Go to Copy XPath.

You are calling loadHTMLFile(), which expects a file path. If you have warnings enabled, you will see the following:
E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Attribute class redefined in https://wordpress.org/plugins/wp-job-manager/, line: 190 -- at line 5
E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag header invalid in https://wordpress.org/plugins/wp-job-manager/, line: 201 -- at line 5
E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag nav invalid in https://wordpress.org/plugins/wp-job-manager/, line: 205 -- at line 5
E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag main invalid in https://wordpress.org/plugins/wp-job-manager/, line: 224 -- at line 5
Instead, use loadHTML().
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$dom = new DOMDocument();
@$dom->loadHTML($url);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
$link = $dom->saveHTML($elements->item(0));
echo $link;
And the result would be:
https://wordpress.org/plugins/wp-job-manager/
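One point worth keeping in mind: loadHTML() parses a string of markup and does not fetch a URL, so the page source has to be obtained first (for example with file_get_contents($url)). The following is only a sketch of that idea, run against a small inline sample modelled on the <li> lines quoted in the question. The helper name extractPluginMeta() and the sample markup are assumptions for illustration, and the real page structure may differ.

```php
<?php
// Hypothetical helper: collects "Label: <strong>value</strong>" list items
// from a string of HTML and returns them as label => value pairs.
function extractPluginMeta(string $html): array
{
    // Collect parser warnings (e.g. for HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html); // expects markup, not a URL
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $meta = [];
    foreach ($xpath->query('//li[strong]') as $li) {
        // First text node of the <li> is the label, <strong> holds the value.
        $label = rtrim(trim($xpath->evaluate('string(text()[1])', $li)), ':');
        $value = trim($xpath->evaluate('string(strong)', $li));
        $meta[$label] = $value;
    }
    return $meta;
}

// Assumed sample; for a live page this would be the fetched page source.
$sample = '<ul>'
    . '<li>Requires WordPress Version:<strong>4.3.1</strong> </li>'
    . '<li>Tested up to: <strong>4.9.2</strong></li>'
    . '</ul>';

$meta = extractPluginMeta($sample);
echo $meta['Tested up to'], PHP_EOL;
```

Matching on the label text rather than on a positional path such as div[4]/div[1] is a design choice here: positional paths copied from the browser tend to break when the page layout or the post id changes between plugins.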

Related

How do I use the output from a php file in a TemplaVoila FCE?

I am trying to use the output from a php file in a TemplaVoila FCE.
According to the articles, etc, I have found on the subject, I seem to be doing it right. But it does not work.
I have reduced my implementation to a very simple test, and I hope that someone here can tell me what I am doing wrong.
The php code is in fileadmin/php/test.php
The file contains this code:
<?php
function getBeechgroveTest($content, $conf)
{
return 'B';
}
//echo getBeechgroveTest(0,0);
?>
In the main template (template module - not TemplaVoila) I have added this line:
includeLibs.beechgroveTest = fileadmin/php/test.php
I have tried to put it at the root level and inside a PAGE object. Both gave the same result.
If I uncomment the 'echo' line I get a 'B' at the top of my HTML page, so the php must be read at some point.
My FCE has one field of type 'None (TypoScript only)' and contains this code:
10 = TEXT
10 {
value = A
}
20 = USER
20 {
userFunc = getBeechgroveTest
}
30 = TEXT
30 {
value = C
}
I was expecting the FCE to output 'ABC', but I only get 'AC'.
What am I doing wrong?
I use TYPO3 version 4.5.30 and TemplaVoila 1.8.0.
This is probably a caching problem; try using USER_INT instead of USER. If you create this object as USER_INT, it is rendered non-cached, outside the main page rendering.
20 = USER_INT
20 {
userFunc = getBeechgroveTest
}

Parsing HTML Table - PHP

I am trying to parse many HTML tables, with the URLs stored in the database. The current problem with my code is that it will fail on a different table every time. Here is the part of the code that gets the error:
while ($sqlrow = mysqli_fetch_row($res)) {
echo "Started Processing Table " . $tables . PHP_EOL;
$tables++;
$data = file_get_contents($sqlrow[1]);
$dom->loadHTML($data);
$dom->preserveWhiteSpace = false;
$teamtable = $dom->getElementById("reTeamTable");
$teamrows = $teamtable->getElementsByTagName('tr');
The line that usually fails is either the "getElementById" call or the "getElementsByTagName" call. The error I am getting is: "PHP Fatal error: Call to a member function getElementsByTagName() on a non-object in /scouting/teamlist.php on line 20". I don't understand why this fails on a different URL every time.
It means that $dom did not find an element with id="reTeamTable", so $teamtable is null. Before calling getElementsByTagName, check whether $teamtable is empty.
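A minimal sketch of that check, using inline HTML strings so it runs without network access. The function name countTeamRows() and both sample strings are assumptions for illustration. It also guards against an empty download, since file_get_contents() returns false when a request fails.

```php
<?php
// Hypothetical helper: returns the number of <tr> rows inside #reTeamTable,
// or null when the table (or the whole document) is missing.
function countTeamRows(string $html): ?int
{
    if (trim($html) === '') {
        return null; // nothing was downloaded
    }

    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $teamtable = $dom->getElementById('reTeamTable');
    if ($teamtable === null) {
        return null; // this response does not contain the table
    }
    return $teamtable->getElementsByTagName('tr')->length;
}

// Assumed samples: one page with the table, one error page without it.
$withTable = '<table id="reTeamTable"><tr><td>1</td></tr><tr><td>2</td></tr></table>';
$withoutTable = '<p>Service temporarily unavailable</p>';

var_dump(countTeamRows($withTable));
var_dump(countTeamRows($withoutTable));
```

In the original loop, a null result would be the place to log the URL and continue with the next row instead of stopping with a fatal error.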

PHP parsing table does not return the full list of elements

I tried to parse a web page with PHP, but it does not return the whole table, only part of it.
The last player returned is RAMSEY, Aaron... The other names are loaded after the web page.
Can anyone help me? I want the full list of players.
$data1=file_get_contents('http://it.soccerwiki.org/squad.php?clubid=1');
$doc=new DOMDocument();
@$doc->loadHTML($data1);
$doc->preserveWhiteSpace=false;
$table=$doc->getElementsByTagName('table');
for($i=0;$i<$table->item(2)->childNodes->length;$i++)
echo $table->item(2)->childNodes->item($i)->textContent;
The wrong result:
Info Naz Giocatore Pos Età Val - MARTÍNEZ, Damián1PO2175- VERMAELEN, Thomas2D(SC)2791- ROSICKÝ, Tomáš256C(C),CO(DSC)3288- BENDTNER, Nicklas16384A(C)2588- WALCOTT, Theo4096CO(D),A(DC)2491- GIBBS, Kieran2D,MD,C(S)2388- WILSHERE, Jack256C,CO(C)2190- OXLADE-CHAMBERLAIN, Alex1024CO(DSC),A(DS)2088- AFOBE, Benik16384A(C)2082- JENKINSON, Carl8D(D)2185- YENNARIS, Nicholas8D(D),MD(C)2075- JEFFREY, Anthony1024CO,A(S)1875- ARTETA, Mikel32MD,C(C)3191- CAZORLA, Santi256C(C),CO(DSC)2892- MONREAL, Nacho2D(S)2789- FLAMINI, Mathieu32MD,C(C)2988- SAGNA, Bacary8D(D)3091- KOSCIELNY, Laurent4D(C)2790- DIABY, Abou32MD,C,CO(C)2789- GIROUD, Olivier16384A(C)2690- SANOGO, Yaya16384A(C)2082- PODOLSKI, Lukas1024CO,A(SC)2891- MERTESACKER, Per4D(C)2891- ÖZIL, Mesut1024CO(DSC)2494- GNABRY, Serge1024CO(DSC)1877- EISFELD, Thomas256C,CO(C)2075- FRIMPONG, Emmanuel32MD,C(C)2183Prs VIVIANO, Emiliano1PO2789- MIYAICHI, Ryo1024CO,A(DS)2084- PARK, Chu-Young1024CO,A(DSC)2887- FABIAŃSKI, Lukasz1PO2886- SZCZĘSNY, Wojciech1PO2389- RAMSEY, Aaron256C,CO(DC)2288
Actually, your data is correct. The page applies jQuery sorting once it has loaded, so the player rows simply appear in a different order.
I copied the HTML and removed the JS files, and the last player was RAMSEY, Aaron.

PHP, FACEBOOK, PAGES, RSS: Help turning page feed into RSS (getting errors)

OK, so following some instructions I found in another post here on StackOverflow, I have constructed a script to get a fan page's feed and turn it into an RSS2 feed. However, the script required a few changes and I'm not the best programmer in the world, so I need a little help.
I'm getting this error:
Warning: Invalid argument supplied for foreach() in feed.php on line 48
I'm not sure what the invalid argument is all about.
<?
// error reporting
echo '<pre>';
ini_set('display_errors', 'on');
error_reporting(E_ALL);
// require your facebook php sdk
require('./facebook/facebook.php');
// include the feed generator feedwriter file
include("./feed/FeedWriter.php");
// config secret key and appid
$config = array(
'appId' => '',
'secret'=> ''
);
// Initialize
$facebook = new Facebook($config);
// Set Apps Permissions Request
$permission_scope = "";
// get users access token
$access_token = $facebook->getAccessToken();
// get page post
$feed_url = 'https://www.facebook.com/Ritualdubstep/feed?access_token='.$access_token;
$feed_json = file_get_contents($feed_url);
$feed_data = json_decode($feed_json);
// create the feedwriter object
$feed = new FeedWriter(RSS2);
$feed->setTitle('Ritual Dubstep'); // set your feed title
$feed->setLink('https://www.facebook.com/Ritualdubstep'); // set the url to the feed page you're generating
$feed->setChannelElement('updated', date(DATE_RSS , time()));
$feed->setChannelElement('author', array('name'=>'Ritual Dubstep SF')); // set the author name
// iterate through the facebook response to add items to the feed
foreach($feed_data['data'] as $entry){
if(isset($entry["message"])){
$item = $feed->createNewItem();
$item->setTitle($entry["from"]["name"]);
$item->setDate($entry["updated_time"]);
$item->setDescription($entry["message"]);
if(isset($entry["link"]))
$item->setLink(htmlentities($entry["link"]));
$feed->addItem($item);
}
}
// generate feed
$feed->genarateFeed();
?>
Generally, it means that the first argument in the foreach call (in this case $feed_data['data']) is not a valid array.
Make sure that $feed_data['data'] exists (isset($feed_data['data'])) and that it is an array (is_array($feed_data['data'])) before entering the foreach loop.
And, as shapeshifter mentioned in the comments, you might want to start troubleshooting with var_dump($feed_data['data']) right before the foreach loop to see what's being generated.
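A sketch of those checks, with one further point spelled out: json_decode() returns objects unless its second argument is true, so array-style access such as $feed_data['data'] needs the associative form. The helper name feedEntries() and the sample JSON are hypothetical and do not reflect the actual Facebook response.

```php
<?php
// Hypothetical helper: returns the list stored under "data",
// or an empty array when the response cannot be used.
function feedEntries($json): array
{
    if (!is_string($json)) {
        return []; // file_get_contents() returns false on failure
    }

    $decoded = json_decode($json, true); // true => associative arrays
    if (!is_array($decoded) || !isset($decoded['data']) || !is_array($decoded['data'])) {
        return []; // not JSON, or no usable "data" key
    }
    return $decoded['data'];
}

// Assumed sample response, for illustration only.
$sample = '{"data":[{"message":"Hello","from":{"name":"Page"}},{"story":"no message"}]}';

foreach (feedEntries($sample) as $entry) {
    if (isset($entry['message'])) {
        echo $entry['from']['name'], ': ', $entry['message'], PHP_EOL;
    }
}
```

Returning an empty array on failure keeps the foreach loop safe, at the cost of hiding why the feed was empty, so logging the raw response in the failure branches may be worthwhile while debugging.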

unable to parse xml data

I am unable to parse an XML document.
My task is as follows:
I get an XML page from cURL which contains IP info:
<ip_address>209.59.194.20</ip_address><ip_type>Mapped</ip_type><Network><organization>thoughtconvergence.com</organization><carrier>whidbey internet services</carrier><asn>6295</asn><connection_type/><line_speed/><ip_routing_type>fixed</ip_routing_type><Domain><tld>com</tld><sld>trafficz</sld></Domain></Network><Location><continent>north america</continent><latitude>34.03708</latitude><longitude>-118.42789</longitude><CountryData><country>united states</country><country_code>us</country_code><country_cf>99</country_cf></CountryData><region>southwest</region><StateData><state>california</state><state_code>ca</state_code><state_cf>80</state_cf></StateData><dma>803</dma><msa>31100</msa><CityData><city>los angeles</city><postal_code>90064</postal_code><time_zone>-8</time_zone><area_code>323</area_code><city_cf>61</city_cf></CityData></Location></ipinfo>
I try to parse it:
$book = simplexml_load_string($datax);
$ipadd = $book->ip_address;
$ipatype = $book->ip_type;
$ip_routing_type = $book->Network->ip_routing_type;
$state = $book->Location->StateData->state;
$country = $book->Location->CountryData->country;
$continent = $book->Location->continent;
$region = $book->Location->region;
Now I am getting a few errors:
1) Entity: line 2: parser error : Extra content at the end of the document
2) Trying to get property of non-object in line #
Your XML is invalid. It has a closing tag for a root element at the end of the data:
</ipinfo>
but there is no matching opening tag. If you tack:
<ipinfo>
onto the front of it, I'll bet it will work.
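A minimal sketch of that suggestion on a shortened copy of the data. The variable names $fragment and $broken and the truncation are assumptions for illustration. libxml_use_internal_errors() is used so that a failed parse returns false quietly instead of emitting warnings.

```php
<?php
// Shortened copy of the response: it ends with </ipinfo>
// but has no opening <ipinfo> tag.
$fragment = '<ip_address>209.59.194.20</ip_address><ip_type>Mapped</ip_type>'
    . '<Location><continent>north america</continent></Location></ipinfo>';

libxml_use_internal_errors(true);

// Parsing the data as received fails because it is not well-formed.
$broken = simplexml_load_string($fragment);

// Prepending the missing opening tag gives the parser a single root element.
$book = simplexml_load_string('<ipinfo>' . $fragment);
libxml_clear_errors();

if ($book === false) {
    echo 'Still not valid XML', PHP_EOL;
} else {
    echo (string) $book->ip_address, PHP_EOL;
    echo (string) $book->Location->continent, PHP_EOL;
}
```

Checking the return value against false before reading properties also avoids the "Trying to get property of non-object" notice when a response is malformed.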
