Get text from script output - php

everyone, I've been using this code for quite a long time
<?php
$url = 'http://www.smn.gov.ar/mensajes/index.php?observacion=metar&operacion=consultar&87582=on&87641=on&87750=on&87765=on&87222=on&87761=on&87860=on&87395=on&87344=on&87166=on&87904=on&87571=on&87347=on&87803=on&87576=on&87162=on&87532=on&87497=on&87097=on&87046=on&87548=on&87217=on&87506=on&87692=on&87418=on&87574=on&87715=on&87374=on&87289=on&87852=on&87178=on&87896=on&87823=on&87270=on&87155=on&87453=on&87925=on&87934=on&87480=on&87047=on&87553=on&87311=on&87909=on&87436=on&87509=on&87912=on&87623=on&87444=on&87129=on&87371=on&87645=on&87022=on&87127=on&87828=on&87121=on&87938=on&87791=on&87448=on';
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile($url);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
// search for td's containing METAR
$metars = $xpath->query('//td[contains(text(), "METAR SA")]');
if($metars->length <= 0) {
echo 'no metars found';
exit;
}
$data = array();
foreach($metars as $metar) {
$data[] = $metar->nodeValue;
}
echo '<pre>';
print_r($data);
Well, this was working fine, until the program in charge to read the output was updated and now it needs a clear output.
At the momment I'm getting this:
http://ar.ivao.aero/weather/metar.php
But the program needs it like this:
SABE 161600Z 02006KT 9999 FEW030 24/18 Q1009 =
SAZA 161600Z 18011KT CAVOK 24/08 Q1010 =
SAZB 161700Z 27012KT CAVOK 21/09 Q1011 =
I don't thought maybe using another script like a file_get_content() could be useful but again its going to show the infromation I don't want to.
I also tried replacing print_r() by var_dump() but its the same
Any ideas?
There is anyway to get this informatin in a simple txt file?
Regards,

You need to filter out some data. Try to find out what's common in the info you need to output. For instance, all the required info from your raw print_r data seems to beging with METAR. So
echo '<pre>';
foreach($metars as $metar) {
if(substr($metar->nodeValue, 0, 5) === "METAR") {
echo str_replace("METAR ", "", $metar->nodeValue) . PHP_EOL;
}
}
That removes any lines like Aeropuerto FORMOSA from the output.

Related

PHP scrape data from website

I am new to programming. So I choose to build a webpage by using Wordpress. But I am trying to gather weather data from other sites, I could not find a fitting plugin for scraping the data, and decided to give it a try and put something together myself.
But with my limited understanding of programming is giving me issues. With a little inspirations from the web I have put this together:
$html = file_get_contents('http://www.frederikshavnhavn.dk/scripts/weatherwindow.php?langid=2'); //get the html returned from the following url
$poke_doc = new DOMDocument();
libxml_use_internal_errors(false); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$poke_doc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$poke_xpath = new DOMXPath($poke_doc);
//get all the spans's with an id
$poke_type = $poke_xpath->query("//span[#class='weathstattype']");
$poke_text = $poke_xpath->query("//span[#class='weathstattext']");
foreach($poke_text as $text){
foreach($poke_type as $type){
echo $type->nodeValue;
echo $text->nodeValue . "</br>";
continue 2;
}
break;
}
}
Being that this is all new to me, and I am really trying to get this to work for me, hoping for a better understanding of the code behind the works.
What I am trying to achieve is a formatted list with the data.
1. value $type $text
2. value $type $text
Right now it is giving me a lot of trouble.
when I use the continue 2 it does not change the value $type, but when I just use continue statement it changes $type but not $text. How can I make it change both values each time?
Thanks for your help.
try adding this method:
function get_inner_html( $node ) {
$innerHTML= '';
$children = $node->childNodes;
foreach ($children as $child) {
$innerHTML .= $child->ownerDocument->saveXML( $child );
}
return $innerHTML;
}
then replace the foreach with this:
foreach($poke_text as $text){
//echo $type ->nodeValue . "</n>";
echo get_inner_html($text ).'<br>';
}
foreach($poke_type as $type){
//echo $text ->nodeValue;
echo get_inner_html($type ).'<br>';
}
produces this:
197° (Syd)
5.7 °C Stigende
4.8 m/s Stigende
5.4 m/s Stigende
-6 cm Faldende 1004 hPa Vindretning Lufttemperatur Middel vindhastighed Max vindhastighed Vandstand Lufttryk
Buddy in your code your foreach loops (in last) you use $type as $text and $text as $type.. I run the code and just change the variables as they should be its working fine..
$html = file_get_contents('http://www.frederikshavnhavn.dk/scripts/weatherwindow.php?langid=2'); //get the html returned from the following url
$poke_doc = new DOMDocument();
libxml_use_internal_errors(false); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$poke_doc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$poke_xpath = new DOMXPath($poke_doc);
//get all the spans's with an id
$poke_type = $poke_xpath->query("//span[#class='weathstattype']");
$poke_text = $poke_xpath->query("//span[#class='weathstattext']");
foreach($poke_text as $text){
echo $text->nodeValue;
}
foreach($poke_type as $type){
echo $type->nodeValue;
}
}
And this the out that I got from your code (by changing the variables in loop)
196° (Syd) 5.6 °C 4.1 m/s 5 m/s -6 cm 1004 hPa Vindretning Lufttemperatur Middel vindhastighed Max vindhastighed Vandstand Lufttryk
Now You have your data I think you can manage how to sort them out...

HTML Table parsing using DOM

I Have a HTML Table
My Parsing Code is
$src = new DOMDocument('1.0', 'utf-8');
$src->formatOutput = true;
$src->preserveWhiteSpace = false;
#$src->loadHTML($result);
$xpath = new DOMXPath($src);
$data=$xpath->query('//td[ contains (#class, "bodytext1") ]');
foreach($data as $datas)
{
echo $datas->nodeValue."<br />";
}
$values=$xpath->query('//tr[ contains (#bgcolor, "f3fafe") ]');
foreach($values as $value)
{
echo $value->nodeValue."<br />";
}
$values1=$xpath->query('//tr[ contains (#bgcolor, "def0fa") ]');
foreach($values1 as $value1)
{
echo $value1->nodeValue."<br />";
}
to be printed, and I want them to be repeated along with other lines as shown above in output i need.
and I want this whole thing in a array so that i can insert it in the database
Can anyone please guide me or give me any hint so that I can do this
This should get you started.
$src = new DOMDocument('1.0', 'utf-8');
$src->formatOutput = true;
$src->preserveWhiteSpace = false;
$src->loadHTML($result);
$xpath = new DOMXPath($src);
// get header data
$data=$xpath->query('//table[1]//td');
$htno = trim(explode(":",$data->item(0)->nodeValue)[1]);
$name = trim(explode(":",$data->item(1)->nodeValue)[1]);
$fatherName=trim(explode(":",$data->item(2)->nodeValue)[1]);
// rows from 2nd table
$values1=$xpath->query('//table[2]//tr');
$header = true; // flag to track whether we've read the header row.
foreach($values1 as $value1)
{
if (!$header) {
$rowdata = str_replace("\r\n"," ",$value1->nodeValue);
echo $htno," ",$name," ",$fatherName," ",$rowdata,"\n";
}
$header = false;
}
Note:
The $header flag is a quick fix. A better Xpath query might eliminate the need for it.
the str_replace near the bottom is ugly but expedient. You might want to play with the xpath query to see if you can improve it.
Output is not formatted for HTML - lines are delimited by \n
I got a warning on one line where it contained &, so I changed it to AND. You might have to preprocess your tables to eliminate those somehow.
you could use third party's dll,such as "Html Agility Pack". a tool which is professional to convert html into xml.

change variable with GET method

I have a page test.php in which I have a list of names:
name1: 992345
name2: 332345
name3: 558645
name4: 434544
In another page test1.php?id=name2 and the result should be:
332345
I've tried this PHP code:
<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("/test.php");
$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*#".$_GET["id"]."");
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
?>
I need to be able to change the name with GET PHP method in test1.pdp?id=name4
The result should be different now.
434544
is there another way, becose mine won't work?
Here is another way to do it.
<?php
libxml_use_internal_errors(true);
/* file function reads your text file into an array. */
$doc = file("test.php");
$id = $_GET["id"];
/* Show your array. You can remove this part after you
* are sure your text file is read correct.*/
echo "Seeking id: $id<br>";
echo "Elements:<pre>";
print_r($doc);
echo "</pre>";
/* this part is searching for the get variable. */
if (!is_null($doc)) {
foreach ($doc as $line) {
if(strpos($line,$id) !== false){
$search = $id.": ";
$replace = '';
echo str_replace($search, $replace, $line);
}
}
} else {
echo "No elements.";
}
?>
There is a completely different way to do this, using PHP combined with JavaScript (not sure if that's what you're after and if it can work with your app, but I'm going to write it). You can change your test.php to read the GET parameter (it can be POST as well, you'll see), and according to that, output only the desired value, probably from the associative array you have hard-coded in there. The JavaScript approach will be different and it would involve making a single AJAX call instead of DOM traversing using PHP.
So, in short: AJAX call to test.php, which then output the desired value based on the GET or POST parameter.
jQuery AJAX here; native JS tutorial here.
Just let me know if this won't work for your app, and I'll delete my answer.

How to validate plain text (link text) in a hyperlink using php?

I am using simple html dom to fetch datas from other websites. while fetching data it fetches both hyperlinks with plain text and without plain text. I want to remove hyperlinks without plain text(link text) while fetching the data ..
i have tried below codes
if($title==""){ echo "No text";}
and
if(ctype_space($title)) { echo "No text";}
where $title is the plaintext fetched from the website
but both method didnt worked..can any one help
Advance thanks for your help
Until you give us more information on what value is my best guess would be to try something like this
if(empty($title))
{
echo "No Text";
}
Does it really need to be "plain text validation"?
Reading your question it seems you just want to remove links with empty values.
If the latter is true, you can do something like this:
$html = <<<EOL
Text
More Text
EOL;
$dom = new DOMDocument;
$dom->loadHTML($html);
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
if (strlen(trim($link->nodeValue)) == 0) {
$link->parentNode->removeChild($link);
}
}
var_dump($dom->saveHTML());
$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($html);
$links_array = $xPath->query("//a"); // select all a tags
$totalLinks = $links_array->length; // how many links there are.
for($i = 0; $i < $totalLinks; $i++) // process each link one by one
{
$title = $links_array->item($i)->nodeValue; // get LInkText
if($title == '') // if no link text
{
$url = $links_array->item($i)->getAttribute('href');
// do here what you want
}
}
You need to use preg_match, with a regular expression, to extract the link text. For example
if (preg_match("/<a.*?>(.*?)</",$title,$matches))
{
echo $matches[1];
}

PHP returning page error on simplexml print_r

The problem is only happening with one file when I try to do a DocumentDOM/SimpleXML method, so it seems like the issue is with that file. No clue what it could be.
If I do the following:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
print_r($xml);
in Chrome, I get a "Page Unavailable" error. In Firefox, I get nothing.
If I do the same thing but to a "test2.html", I get a print out as expected.
If I try the same thing but doing it this way:
$file = "test1.html";
$data = file_get_contents($file)
$dom = DOMDocument::loadHTML($data);
$xml = simplexml_import_dom($dom);
print_r($xml);
I get the same issue.
If I comment out the print_r line, Chrome goes from the "Page Unavailable" to blank.
I changed the permissions to 777, in case that was an issue, no fix.
I tried simply echoing out the contents of the html, no problem at all.
Any clues as to why a) Chrome would do that, and b) why I'm not getting any usable results?
Update:
If I put in:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
if(!$dom) {
echo "No Load!";
}
else {
$xml = simplexml_import_dom($dom);
print_r($xml);
}
I get the same issue. If I put in:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
if(!$dom) {
echo "No Load!";
}
else {
echo "Load!";
}
I get the "Load!" output, meaning that the dom method shouldn't be the problem (?)
I'll try the same exact test with the simplexml.
Update2:
If I do this:
I get the same issue. If I put in:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
if(!$xml) {
echo "No Load!";
}
else {
echo "Load!";
}
I get "Load!" but if I do:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
if(!$xml) {
echo "No Load!";
}
else {
echo "Load!";
print_r($xml);
}
I get the error. I did finally notice that I had an option to view the error in Chrome:
Error 324 (net::ERR_EMPTY_RESPONSE): Unknown error.
The troublesome html file is 288Kb. Could that be the issue? If so, how would I adjust for that?
Last Update:
Very Odd. I can use methods and functions on the object (as simplexml or domdocument), so I can do things like xpath to delete or parse the html, etc. In some cases (small results) it can echo out results, but for big stuff (show all spans), it fails in the same way.
So, since the end result, I think will fit in these parameters, I SHOULD be okay (I guess).
But any real solution is very welcome.
Turn on error reporting: error_reporting(E_ALL); in the first line of your PHP code.
Check the memory limit of your PHP configuration: memory_limit in the respective php.ini
What's the difference between test1.html and test2.html? Perhaps test1.html is not well-formed.
DocumentDOM and/or SimpleXML may bail out if the document is malformed. Try something like:
$dom = DOMDocument::loadHTMLFile($file);
if (!$dom) {
echo 'Loading file failed';
exit;
}
$xml = simplexml_import_dom($dom);
if (!$xml) {
...
}
If creating the $dom worked, conversion to $xml should work as well, but make sure anyway.
Edit: As Gehrig said, make sure error reporting is on, that should make it obvious where the process fails.

Categories