I have a temperature monitor set up, and I would like to use some of the data for other things (cron jobs, etc). The data from the sensor can be accessed from our local network (192.168.123.123). The element in question is:
<td id="1E5410ECC9D90FC3-entity-0-measurement-0" class="">69.08</td>
<!-- I NEED THE 69.08 -->
I can't do it via AJAX since I get an Access-Control-Allow-Origin (CORS) error.
I tried this:
$url = 'http://192.168.123.123';
$content = file_get_contents($url);
$first = explode( '<div id="1E5410ECC9D90FC3-entity-0-measurement-0">' , $content );
$second = explode("</div>" , $first[0] );
echo $second[0];
but I got this:
��UMS�0��+��$���94С�2����؋-�%#Ʉ�뻲���Bۓ%����ݷr��m4�yyF*_+ry���ӈP������S��|��&�ȵ�2���}��V�7ǜO��dz�[�� (�!�_2��$�/�p/ g�=B� D����<��1�#�=h���J�˨�'��I^ ��g7��=�=��^�0��ϔ����p�Q��L��I�%TF�Q�) ������;c��o$��a����g��mWr�ܹ��;�(��bE��O�i� ��y�҉)f=�6=�,2� �#I��s����>����kNƕt/W2^��# Xp�3^݅$ѵ��T U�ʲ�#f��db�ԁ%��b�`G|��D�{sι1�� ]#2ZH�(1;&�h8��^0er��3���D�Q�5B�u� ^!5X:�{a U\:߰0�~Ɍ�3+S�^1��qB:�g����C>�.�P~n��$\֢D����%J+�b�ELc�Gq���K �]��xV��j�[���Ԧ��nAɍ��<�ZT#���zc�Q(f܁�(~�^�ZKwk:8�·n>��(=�"aB)�Fl5�b]/�_�$���_��ɴ��9�H}��B [#�V�ԅp��r�g�A�j���2����Ju*������{�bY�,O4�����M��B�#�e���,� ��_֔���o����
How can I properly get the 'td' text within the specific div id?
You are trying to retrieve data from <td id="1E5410ECC9D90FC3-entity-0-measurement-0" class="">, not from a <div id="1E5410ECC9D90FC3-entity-0-measurement-0">, so just change it to:
$url = 'http://192.168.123.123';
$content = file_get_contents($url);
$first = explode( '<td id="1E5410ECC9D90FC3-entity-0-measurement-0" class="">' , $content );
$second = explode("</td>" , $first[1] );
echo $second[0];
Or am I crazy?
Step 1:
I suggest using PHP's cURL library to manage and configure your web request/response.
Using this mechanism allows you to better manage/control encoding, compression, and encryption.
http://php.net/manual/en/book.curl.php
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "http://192.168.123.123");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
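As a side note, the unreadable bytes in the question look like a compressed (gzip) response; here is a minimal, hedged variant of the snippet above that asks cURL to negotiate and transparently decode compression:
// Hedged variant: request gzip/deflate and let cURL decode it automatically,
// so the sensor page comes back as readable HTML instead of compressed bytes
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://192.168.123.123");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_ENCODING, "");   // "" = accept any encoding cURL supports
$output = curl_exec($ch);
curl_close($ch);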
Step 2:
Let's extract the details out of the returned response string from the web server. I suggest PHP's PCRE function preg_match to extract the needed data.
http://php.net/manual/en/ref.pcre.php
// Looking for <td id="1E5410ECC9D90FC3-entity-0-measurement-0" class="">69.08</td>
$pattern = '/id="1E5410ECC9D90FC3-entity-0-measurement-0".*>([\d]{1,2}?\.[\d]{1,2})<\//';
// run the regex match and collect the hit
preg_match($pattern, $output, $matches);
// print_r of the array
/*
Array
(
[0] => id="1E5410ECC9D90FC3-entity-0-measurement-0" class="">69.08</
[1] => 69.08
)
*/
// Print out the result to check
echo $matches[1];
Related
I want to get the whole <article> element which represents one listing, containing the image + title + its link + description, but it doesn't work. Can someone help me please?
<?php
$url = 'http://www.polkmugshot.com/';
$content = file_get_contents($url);
$first_step = explode( '<article>' , $content );
$second_step = explode("</article>" , $first_step[3] );
echo $second_step[0];
?>
You should definitely be using cURL for this type of request.
function curl_download($url){
    // is cURL installed?
    if (!function_exists('curl_init')){
        die('cURL is not installed!');
    }
    $ch = curl_init();
    // URL to download
    curl_setopt($ch, CURLOPT_URL, $url);
    // User agent
    curl_setopt($ch, CURLOPT_USERAGENT, "Set your user agent here...");
    // Include header in result? (1 = yes, 0 = no)
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Should cURL return or print out the data? (true = return, false = print)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Timeout in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    // Download the given URL, and return output
    $output = curl_exec($ch);
    // Close the cURL resource, and free system resources
    curl_close($ch);
    return $output;
}
For best results, combine it with an HTML DOM parser such as PHP Simple HTML DOM Parser.
Use it like:
$html = str_get_html(curl_download('http://www.polkmugshot.com/'));
// Find all images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';
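And to grab the whole <article> block the question actually asks for, a minimal sketch continuing from the $html object above (hedged: it assumes the page really uses <article> elements; outertext returns the element together with its own tags):
// Print each whole <article> (image + title + link + description)
foreach ($html->find('article') as $article) {
    echo $article->outertext . '<br>';
}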
Good Luck!
I'm not sure I understand you correctly, but I guess you need a PHP DOM parser. I suggest this one (a great PHP library for parsing HTML).
You can also get the whole HTML code like this:
$url = 'http://www.polkmugshot.com/';
$html = file_get_html($url);
echo $html;
Probably a better way would be to parse the document and run some xpath queries over it afterwards, like so:
$url = 'http://www.polkmugshot.com/';
$xml = simplexml_load_file($url);
$articles = $xml->xpath("//article");
foreach ($articles as $article) {
// do something useful here
}
Read about SimpleXML here.
Extract the articles with DOMDocument. Working example:
<?php
$url = 'http://www.polkmugshot.com/';
$content = file_get_contents($url);
$domd = new DOMDocument();
@$domd->loadHTML($content);
foreach($domd->getElementsByTagName("article") as $article){
    var_dump($domd->saveHTML($article));
}
And as pointed out by @Guns, you'd better use cURL, for several reasons:
1: file_get_contents will fail if allow_url_fopen is not enabled in php.ini
2: until around PHP 5.5.0, file_get_contents kept reading from the connection until it was actually closed, which for many servers can be many seconds after all the content has been sent, whereas cURL only reads until it has received the number of bytes given in the Content-Length header, making for much faster transfers (luckily this was fixed)
3: cURL supports gzip and deflate compressed transfers, which again makes for much faster transfers (when the content is compressible, such as HTML), while file_get_contents will always transfer uncompressed
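For illustration, here is a hedged cURL fetch of the same page with compression enabled (setting CURLOPT_ENCODING to an empty string asks for gzip/deflate and decodes it transparently, per point 3; the timeout guards against the slow-close problem in point 2):
$ch = curl_init('http://www.polkmugshot.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_ENCODING, '');  // accept gzip/deflate, decoded automatically
curl_setopt($ch, CURLOPT_TIMEOUT, 10);   // don't hang on a slow connection
$content = curl_exec($ch);
curl_close($ch);
// $content can then be passed to DOMDocument::loadHTML() exactly as above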
I need to get the resulting data from a website query. For example:
http://www.uniprot.org/uniprot/?query=organism:9606+AND+gene:AEBP1+AND+reviewed:yes&sort=score&format=tab&columns=entry%20name
The resulting page shows:
Entry name
AEBP1_HUMAN
I need the result, in this case "AEBP1_HUMAN", to be displayed on my website. I'm confused about how to get it. Thanks.
The point is that you can read the content of any URL like a file, because PHP supports stream wrappers for a variety of protocols.
The first example uses the function file, which reads the entire content and splits it by lines into an array.
<?php
$content = file($url);
echo $content[1];
?>
In the second example you get the whole content as a string, so you have to split it by line endings with the explode function.
<?php
$content = file_get_contents($url);
$lines = explode("\n", $content);
echo $lines[1];
?>
The third example uses a standard file open in combination with the function fgets, which reads the content line by line.
<?php
$fp = fopen($url, 'r');
$line = fgets($fp); // first line: the "Entry name" header
$line = fgets($fp); // second line: the value
echo $line;
?>
The last example shows usage of cURL. Don't forget to set the right options.
<?php
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($ch);
$lines = explode("\n", $content);
echo $lines[1];
?>
Sometimes you may experience problems on public hosting servers, where reading remote content is blocked.
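A quick, hedged check for that case before falling back to cURL:
<?php
// Is reading remote URLs via the http:// wrapper allowed on this host?
if (!ini_get('allow_url_fopen')) {
    echo "allow_url_fopen is disabled on this server - use cURL instead.\n";
}
?>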
I'm currently in the process of converting all of my Beautiful Soup code into PHP, just to get used to PHP. However, I've run into a bit of a problem: my PHP code will only work when the wiki page has 'External links' after the original run in the HTML (such as the True Detective wiki page). I just found out that this won't always be the case, because there may not always be an 'External links' section. I was wondering if there was any way to convert my Beautiful Soup code into PHP code using the same technique my Beautiful Soup code uses?
import requests, re
from bs4 import BeautifulSoup

def get_date(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    date = soup.find_all("table", {"class": "infobox"})
    for item in date:
        dates = item.find_all("th")
        for item2 in dates:
            if item2.text == "Original run":
                test2 = item2.find_next("td").text.encode("utf-8")
                mysub = re.sub(r'\([^)]*\)', '', test2)
                return mysub
And here is my PHP code currently:
<?php
// Defining the basic cURL function
function curl($url) {
    $ch = curl_init(); // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}
?>
<?php
// Defining the basic scraping function
function scrape_between($data, $start, $end){
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start)); // Stripping $start
    $stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
    return $data; // Returning the scraped data from the function
}
?>
<?php
$scraped_page = curl("http://en.wikipedia.org/wiki/The_Walking_Dead_(TV_series)"); // Downloading the Wikipedia page to the variable $scraped_page
$scraped_data = scrape_between($scraped_page, "<table class=\"infobox vevent\" style=\"width:22em\">", "</table>"); // Scraping the downloaded data in $scraped_page for the content of the infobox table
$original_run = mb_substr($scraped_data, strpos($scraped_data, "Original run")-2, strpos($scraped_data, "External links") - strpos($scraped_data, "Original run")-2);
echo $original_run;
?>
Have you considered simply using the Wikipedia API? Autogenerated wiki markup is generally incredibly terrible to deal with and may change at any time.
Additionally, instead of trying to regex-parse HTML in PHP, just use the phpQuery library (installable with Composer); you can then simply search for the selector table.infobox.vevent.
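If you'd rather not pull in a library, here is a minimal, hedged sketch with PHP's built-in DOMXPath that finds the "Original run" row directly, so it no longer depends on an 'External links' section being present. It assumes the rendered infobox keeps its th/td structure and reuses the curl() function from the question:
<?php
$html = curl("http://en.wikipedia.org/wiki/The_Walking_Dead_(TV_series)"); // curl() as defined in the question
$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings about Wikipedia's HTML5 markup
$xpath = new DOMXPath($doc);
// The <th> labelled "Original run" and its sibling <td> inside the infobox
$nodes = $xpath->query('//table[contains(@class,"infobox")]//th[contains(.,"Original run")]/following-sibling::td');
if ($nodes->length > 0) {
    // Strip parenthesised notes, mirroring the re.sub in the Python version
    echo preg_replace('/\([^)]*\)/', '', trim($nodes->item(0)->textContent));
}
?>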
How can I find the total number of inbound and outbound links of a website using PHP?
To count outbound links:
parse the HTML of the webpage
extract all links using a regex
filter out the links which start with your domain or "/" (those are internal)
To count inbound links:
grab the Google results page
http://www.google.ca/search?sourceid=chrome&ie=UTF-8&q=site:
and parse it similarly
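A minimal, hedged sketch of the outbound-link count described above ("example.com" stands in for the site being analysed):
// Hedged sketch: count outbound links on a page
$html = file_get_contents('http://example.com/');
preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $m);
$outbound = 0;
foreach ($m[1] as $href) {
    // skip internal links: relative paths or links back to our own domain
    if ($href === '' || $href[0] === '/' || strpos($href, 'example.com') !== false) {
        continue;
    }
    if (preg_match('#^https?://#i', $href)) {
        $outbound++;
    }
}
echo $outbound;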
For outbound links, you will have to parse the HTML code of the website as some here have suggested.
For inbound links, I suggest using the Google Custom Search API; sending direct requests to Google can get your IP banned. You can view the search API here. Here is a function I use in my code for this API:
function doGoogleSearch($searchTerm)
{
    $referer = 'http://your-site.com';
    $args['q'] = $searchTerm;
    $endpoint = 'web';
    $url = "http://ajax.googleapis.com/ajax/services/search/".$endpoint;
    $args['v'] = '1.0';
    $key = 'your-api-key';
    $url .= '?'.http_build_query($args, '', '&');

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    $body = curl_exec($ch);
    curl_close($ch);

    //decode and return the response
    return json_decode($body);
}
After calling this function as: $result = doGoogleSearch('link:site.com'), the variable $result->cursor->estimatedResultCount will have the number of results returned.
PHP can't determine the inbound links of a page through some trivial action. You either have to monitor all incoming visitors and check what their referrer is, or parse the entire internet for links that point to that site. The first method will miss links not getting used, and the second method is best left to Google.
On the other hand, the outbound links from a site is doable. You can read in a page and analyze the text for links with a regular expression, counting up the total.
function getGoogleLinks($host)
{
    $request = "http://www.google.com/search?q=" . urlencode("link:" . $host) . "&hl=en";
    $data = getPageData($request);
    preg_match('/<div id=resultStats>(About )?([\d,]+) result/si', $data, $l);
    $value = ($l[2]) ? $l[2] : "n/a";
    $string = "" . $value . "";
    return $string;
}
//$host means the domain name
I'm trying to extract the price from the below HTML page/link using PHP cURL and preg_match. Basically I'm expecting this code to output 4,550, but for some reason I get:
Notice: Undefined offset: 1 in C:\wamp\www\test.php on line 22
I think that the pattern is correct, because if I put the HTML itself in a variable and escape the quotes, it works!
Also, if I output (echo $result;) it displays the HTML properly grabbed from the Foxtons website, so I just can't figure out why the whole thing doesn't work. I need to make this work, and I would also appreciate it if you would tell me why that notice is generated and why my current script doesn't work.
$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_exec($ch);
curl_close($ch);
$result2 = str_replace('"', '\"', $result);
$tagname1= ");</script>
";
$tagname2= "</noscript>
per month</a>";
$pattern = "/$tagname1(.*?)$tagname2/";
preg_match($pattern, $result, $matches);
$prices = $matches[1];
print_r($prices);
?>
I rewrote the script a bit to account for more than one <noscript> on the page. You needed to use preg_match_all, which looks for all the matches instead of stopping at the first one.
$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
preg_match_all("/<noscript>(.*)<\/noscript>/", $result, $matches);
print_r($matches);
Outputs
Array
(
    [0] => Array
        (
            [0] => £1,050
            [1] => 4,550
        )

    [1] => Array
        (
            [0] => £1,050
            [1] => 4,550
        )

)
I tried this on my box and it worked - let me know if it worked for you
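If you only need the numeric figure the question asks for (4,550), here is a hedged follow-up on the same $matches array, assuming the per-month value is the second captured entry as in the print_r output above:
// Hedged follow-up: pull just the per-month figure out of the matches above
$perMonth = isset($matches[1][1]) ? $matches[1][1] : null;
echo $perMonth; // 4,550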
Don't use regex to parse HTML; use an HTML DOM parser instead, like PHP Simple HTML DOM Parser:
include("simple_html_dom.php") ;
$html = file_get_html("http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717");
foreach($html->find('noscript') as $noscript)
{
echo $noscript->innertext."<br>";
}
echoes:
£1,600
6,934
£1,500
6,500
£1,350
5,850
£950
4,117
£925
4,009
£850
3,684
£795
3,445
£795
3,445
£775
3,359
£750
3,250