I have the following code, which scrapes the text from multiple pages and displays it.
My question is: how can I take each of those variables and place them into an Excel spreadsheet located on the server, with each link on a separate row?
The code looks like this:
<?php
include_once 'simple_html_dom.php';
$urls = array(
'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/01/1150001435/1&judet=50',
'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/05/1140001657/1&judet=50',
'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/05/1140001657/2&judet=50',
'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/01/1150001435/1&judet=50',
'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/05/1140001657/1&judet=50',
'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/05/1140001657/2&judet=50',
'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/01/1150001435/1&judet=50',
'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/05/1140001657/1&judet=50',
'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/05/1140001657/2&judet=50',
);
function scraping($url) {
// DOM
$html = file_get_html($url);
// articol
if ($html && is_object($html) && isset($html->nodes)) {
foreach ($html->find('/html/body/table') as $article) {
//titlu
$item['titlu'] = trim($article->find('/tbody/tr[1]/td/div', 0)->plaintext);
// tabel
$item['tr2'] = trim($article->find('/tbody/tr[2]/td[2]', 0)->plaintext);
$item['tr3'] = trim($article->find('/tbody/tr[3]/td[2]', 0)->plaintext);
$item['tr4'] = trim($article->find('/tbody/tr[4]/td[2]', 0)->plaintext);
$item['tr5'] = trim($article->find('/tbody/tr[5]/td[2]', 0)->plaintext);
$item['tr6'] = trim($article->find('/tbody/tr[6]/td[2]', 0)->plaintext);
$item['tr7'] = trim($article->find('/tbody/tr[7]/td[2]', 0)->plaintext);
$item['tr8'] = trim($article->find('/tbody/tr[8]/td[2]', 0)->plaintext);
$item['tr9'] = trim($article->find('/tbody/tr[9]/td[2]', 0)->plaintext);
$item['tr10'] = trim($article->find('/tbody/tr[10]/td[2]', 0)->plaintext);
$item['tr11'] = trim($article->find('/tbody/tr[11]/td[2]', 0)->plaintext);
$item['tr12'] = trim($article->find('/tbody/tr[12]/td/div', 0)->plaintext);
$ret[] = $item;
}
// memorie
$html->clear();
unset($html);
return $ret;
}
// Nothing could be loaded: return an empty list so the caller's foreach is safe.
return array();
}
echo '<pre>';
foreach ($urls as $url) {
$ret = scraping($url);
foreach ($ret as $v) {
echo $v['titlu'] . '<br>';
echo $v['tr2'] . '<br>';
echo $v['tr3'] . '<br>';
echo $v['tr4'] . '<br>';
echo $v['tr5'] . '<br>';
echo $v['tr6'] . '<br>';
echo $v['tr7'] . '<br>';
echo $v['tr8'] . '<br>';
echo $v['tr9'] . '<br>';
echo $v['tr10'] . '<br>';
echo $v['tr11'] . '<br>';
echo $v['tr12'] . '<br>';
echo '<br>';
echo '<br>';
}
}
?>
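One possible direction, sketched under assumptions: instead of a native Excel file, write the rows to a CSV file on the server, which Excel can open. The helper name writeRowsToCsv, the sample rows, and the output path are hypothetical and not part of the original code; only PHP's built-in fputcsv() is used.

```php
<?php
// Hypothetical helper: writes one spreadsheet row per scraped item.
// $rows is a list of associative arrays such as the $item arrays built by scraping().
function writeRowsToCsv(array $rows, $path) {
    $fh = fopen($path, 'w');
    if ($fh === false) {
        return false;
    }
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape character are passed explicitly.
        fputcsv($fh, array_values($row), ',', '"', '\\');
    }
    fclose($fh);
    return true;
}

// Hypothetical sample rows standing in for the scraped data.
$rows = array(
    array('titlu' => 'First job', 'tr2' => 'Cluj', 'tr3' => '2 posts'),
    array('titlu' => 'Second job', 'tr2' => 'Sibiu, centre', 'tr3' => '1 post'),
);
$csvPath = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'jobs_example.csv';
writeRowsToCsv($rows, $csvPath);
```

With the scraper above, the rows could be collected by merging the result of scraping($url) for every URL into one array and passing that array to the helper once, so that each link ends up on its own row.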
When I set up cron jobs to load a PHP file, the cron job doesn't work, and it seems some PHP functions have a problem when run from cron.
I know my cron command is correct, because I tested it with simple PHP code that writes the date to a text file, and that works. I tried every form of the command (wget, curl, cd, php, /user/local/bin/php and others), but I don't know why my PHP code doesn't work. The same code works very well when I load the PHP file in my browser.
My PHP file:
<?php
header('Content-Type: text/html; charset=utf-8');
include ('simple_html_dom.php');
$mycache_url = 'http://example3.com';
$mybziran_url = 'http://example2.com';
function addhttp($url)
{
if (!preg_match("~^(?:f|ht)tps?://~i", $url)) {
global $mybziran_url;
$url = ltrim($url, '/');
$url = $mybziran_url . '/' . $url;
}
return $url;
}
function urlencodeproblem($badurl)
{
$badurl = urlencode($badurl);
$badchar = array('%3A', '%2F');
$truechar = array(':', '/');
$badurl = str_replace($badchar, $truechar, $badurl);
return $badurl;
}
$url_html = @file_get_html($mycache_url);
$bziran_url = array();
$bziran_title = array();
foreach (@$url_html->find('a') as $elements) {
$bziran_url[] = urlencodeproblem($elements->href);
$bziran_title[] = $elements->innertext;
}
$myi = count($bziran_url);
for ($i = 0; $i < $myi; ++$i) {
$post_title = $bziran_title[$i];
$post_url = $bziran_url[$i];
$html = @file_get_html($post_url);
foreach ($html->find('div.price') as $myhtml_price_adelete) {
echo '######' . $myhtml_price_adelete->innertext . '######';
}
$bad_title_my = '';
foreach ($html->find('h1 a') as $myhtml_price_adelete) {
$bad_title_my .= $myhtml_price_adelete->innertext;
}
if (empty($bad_title_my)) {
echo $post_url;
echo 'prob';
} else {
$kalame = urlencode($post_title);
$A2_html = 'ok';
foreach ($html->find('a') as $myhtml_a_code) {
$e_ahref = addhttp($myhtml_a_code->href);
$myhtml_a_code->href = $e_ahref;
$myhtml_a_code->target = '_blank';
}
$html->save();
foreach ($html->find('img') as $myhtml_img_code) {
if (strpos($myhtml_img_code->src, 'base64') === false) {
$e_imgsrc = addhttp($myhtml_img_code->src);
$myhtml_img_code->src = $e_imgsrc;
}
}
$html->save();
$mymeta_keyword = array();
foreach ($html->find('meta[name=keywords]') as $myhtml_keyword) {
$mymeta_keyword[] = $myhtml_keyword->content;
}
foreach ($html->find('p') as $mytagdelete) {
if (strpos($mytagdelete->innertext, 'tag :') !== false) {
$mytagdelete->outertext = '';
}
}
$html->save();
foreach ($html->find('h1 a') as $myadelete) {
$myadelete->outertext = $myadelete->innertext;
}
$html->save();
$a3_href = '';
$a2_href = $html->find("img[alt=buy]");
foreach ($a2_href as $a2_href) {
$a2_href->outertext =
'<br><p align="center"><img alt="pay-download" src="http://exam.com/tmp_files/01-pay-download.png"></p>';
}
$html->save();
echo '<br>buy : ' . $a3_href . '<br>';
$myhtmlcode3 = '';
foreach ($html->find('div.prod') as $myhtmlcode) {
$myhtmlcode3 .= $myhtmlcode->outertext;
}
$html->save();
echo '<br>*** title ***' . $post_title . '<br>';
echo $post_url . '<br>';
$i_t = mt_rand(1, 34);
$mysaier_mahsolat = '<br>
<a target="_blank" href="http://ayta.ir/index.php?page=' . $i_t . '"> click </a>
<br>';
echo $mysaier_mahsolat;
echo $A2_html . $myhtmlcode3 . '<br>';
echo 'keyword :' . $mymeta_keyword[0];
}
}
?>
<?php
$crontest = date("Y-m-d - h:i:s") . "\n" . file_get_contents(dirname(__file__) .
DIRECTORY_SEPARATOR . "cron.txt");
echo $crontest;
file_put_contents("cron.txt", $crontest);
?>
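A minimal diagnostic sketch, not a confirmed fix: a cron job usually starts in a different working directory than a web request, and the @ operator in the script hides any error from file_get_html(). Building paths from the script's own directory and writing messages to a log file can show where a cron run stops. The helper names pathNextToScript and logLine and the log file name are hypothetical.

```php
<?php
// Hypothetical helper: build an absolute path next to this script, so the
// result does not depend on the working directory cron starts in.
function pathNextToScript($name, $baseDir = __DIR__) {
    return rtrim($baseDir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
}

// Hypothetical helper: append one timestamped line to a log file.
// Returns true when the line was written.
function logLine($message, $logFile) {
    $line = date('Y-m-d H:i:s') . ' ' . $message . "\n";
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX) !== false;
}

// Example: record that the job started, in a file beside the script.
$logFile = pathNextToScript('cron_debug.log', sys_get_temp_dir());
logLine('cron job started', $logFile);
```

Calling logLine() before and after each file_get_html() call, with the @ removed while debugging, would show whether the fetch itself fails under cron.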
Currently I am scraping this website with the code shown below, but it sometimes returns pages with "Mixtape" in the title. How can I make it skip those and only crawl the pages that display normally? (demo)
$html = file_get_html('http://beatshype.com/mp3download/');
foreach($html->find('.entry-title a') as $element)
{
print '<br><br>';
echo $url = ''.$element->href;
$html2 = file_get_html($url);
print '<br>';
$image = $html2->find('meta[property=og:image]',0);
print $image = $image->content;
print '<br>';
$title = $html2->find('.single-title',0);
print $title = $title->plaintext;
print '<br>';
$str = explode ("/", $url);
$date = $html2->find('.single-content a',2);
print $date = $date->href;
}
Screenshot (image not available): the top result is good, the bottom result is bad.
Very simple, check if the title contains 'mixtape' and go to the next item in the loop:
if(stripos($title->plaintext, 'mixtape') !== false) {
continue;
}
Put that code just before you assign $title to $title->plaintext, or just use $title as the haystack argument.
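The check described above can be tried in isolation. This is a small sketch with made-up titles; the helper name isMixtapeTitle and the sample data are assumptions for illustration only.

```php
<?php
// Hypothetical helper: true when the title mentions "mixtape" in any letter case.
function isMixtapeTitle($title) {
    return stripos($title, 'mixtape') !== false;
}

// Made-up titles standing in for the scraped .single-title values.
$titles = array(
    'Artist - Song (MP3 Download)',
    'DJ Name - Summer MIXTAPE',
    'Another Mixtape Vol. 2',
);

$kept = array();
foreach ($titles as $title) {
    if (isMixtapeTitle($title)) {
        continue; // skip this entry and move on to the next one
    }
    $kept[] = $title;
}
```

Because stripos() ignores case, "MIXTAPE" and "Mixtape" are both skipped, and only the first title remains in $kept.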
Some people need it spelled out..
$html = file_get_html('http://beatshype.com/mp3download/');
foreach($html->find('.entry-title a') as $element)
{
// Read the link first so $url is set before the page is fetched.
$url = ''.$element->href;
$html2 = file_get_html($url);
$title = $html2->find('.single-title',0);
// Skip any page whose title contains "mixtape" (case-insensitive).
if(stripos($title, 'mixtape') !== false) continue;
$title = $title->plaintext;
print '<br><br>';
echo $url;
print '<br>';
$image = $html2->find('meta[property=og:image]',0);
print $image = $image->content;
print $title.'<br>';
$str = explode ("/", $url);
$date = $html2->find('.single-content a',2);
print $date = $date->href;
}
First
print $image = $image->content;
looks superfluous.
It both sets $image = $image->content and prints it.
But instead of grabbing and printing each line one after another, grab the title, then decide if you want to fetch the other lines and print the record.
$html = file_get_html('http://beatshype.com/mp3download/');
foreach($html->find('.entry-title a') as $element)
{
$url = ''.$element->href;
$html2 = file_get_html($url);
$title = $html2->find('.single-title',0);
if (stripos($title->plaintext,"mixtape")===FALSE) {
$image = $html2->find('meta[property=og:image]',0);
$date = $html2->find('.single-content a',2);
print '<br><br>';
echo $url;
print '<br>';
print $image->content;
print '<br>';
print $title->plaintext;
print '<br>';
print $date->href;
}
}
I want to print both services and their prices:
<?php
//in file A
$_SESSION['cart']['prices'] = array('1000');
$_SESSION['cart']['services'] = array('game');
//In File B
$_SESSION['cart']['prices'] = array('2000');
$_SESSION['cart']['services'] = array('game2');
//in file C
foreach ($_SESSION['cart']['services'] as $key => $service) {
echo $service . ' = ' . $_SESSION['cart']['prices'][$key] . '<br />';
}
?>
It is better to use this:
$_SESSION['cart']['prices'][] = array('1000');
$_SESSION['cart']['services'][] = array('game');
//In File B
$_SESSION['cart']['prices'][] = array('2000');
$_SESSION['cart']['services'][] = array('game2');
With the data appended this way, the foreach loop runs twice, but it prints "Array" because each $service is itself an array (array('game') and array('game2')). The same holds for $_SESSION['cart']['prices'][$key], which is array('1000') or array('2000').
foreach ($_SESSION['cart']['services'] as $key => $service) {
echo $service . ' = ' . $_SESSION['cart']['prices'][$key] . '<br />';
}
Try this:
$array1 = array('1000','2000');
$array2 = array('game1','game2');
foreach($array1 as $index=>$key)
{
$_SESSION['cart']['prices'][] = $key;
$_SESSION['cart']['services'][] = $array2[$index];
}
foreach ($_SESSION['cart']['services'] as $key => $service) {
echo $service . ' = ' . $_SESSION['cart']['prices'][$key] . '<br />';
}
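The same idea can be sketched as a small helper that appends plain values rather than nested arrays, so the two lists stay aligned by index. A plain array stands in for $_SESSION['cart'] here, and the function names addToCart and cartLines are hypothetical.

```php
<?php
// Hypothetical helper: append one service and its price as plain values,
// so both lists stay aligned by index.
function addToCart(array &$cart, $service, $price) {
    $cart['services'][] = $service;
    $cart['prices'][]   = $price;
}

// Pairs every service with its price and returns the lines to print.
function cartLines(array $cart) {
    $lines = array();
    foreach ($cart['services'] as $key => $service) {
        $lines[] = $service . ' = ' . $cart['prices'][$key];
    }
    return $lines;
}

// A plain array stands in for $_SESSION['cart'] in this sketch.
$cart = array('services' => array(), 'prices' => array());
addToCart($cart, 'game', '1000');   // what file A would do
addToCart($cart, 'game2', '2000');  // what file B would do
echo implode('<br />', cartLines($cart));
```

In the real pages, $_SESSION['cart'] would be passed in place of $cart after session_start() has been called.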
Does anyone know why I can't use a (long) piece of code within a foreach loop?
The code in the foreach loop is only executed once.
The code in topictweets.php works fine on its own, but I want to repeat it for each forum.
The foreach loop works fine without the include. I also tried putting the code from topictweets.php directly in the foreach loop, but that didn't work either.
The included code gets the topics of a forum from the database, finds related tweets, and saves those in the database.
Is there some other way to do this?
foreach ($forumlist as $x => $fID) {
echo 'id:'.$fID.'<br>';
include 'topictweets.php';
/////////
////////
}
Online version: http://oudhollandsedrop.nl/webendata/FeedForum/fetchtweets.php
The code in topictweets.php:
<?php
//?/ VVVV ---- SELECT TOPICS FOR CURRENT FORUM ----- VVVV ////
echo $fID;
$sql = "SELECT Topics_TopicID
FROM Topics_crosstable
WHERE Forums_ForumID = '$fID'";
$result = mysql_query($sql);
if (!$result) {
//echo 'The topiclist could not be displayed, please try again later.';
} else {
if (mysql_num_rows($result) == 0) {
// echo 'This topic doesn't exist.';
} else {
while ($row = mysql_fetch_assoc($result)) {
//display post data
// echo $row['Topics_TopicID'];
// echo': ';
$topic = "SELECT Name
FROM Topics
WHERE TopicID = " . mysql_real_escape_string($row['Topics_TopicID']);
$topicname = mysql_query($topic);
if (!$topicname) {
// echo 'The topic could not be displayed, please try again later.';
} else {
if (mysql_num_rows($topicname) == 0) {
// echo 'This topic doesn't exist.';
} else {
while ($row = mysql_fetch_assoc($topicname)) {
//display post data
// echo $row['Name'];
// echo'<br>';
$topiclist[] = $row['Name'];
}
}
}
}
}
}
foreach ($topiclist as $key => $value) {
$terms .= "" . $value . ",";
}
//echo'<p>';
//echo rtrim($terms, ",");
//echo'<p>';
//echo'<p>';
//echo $terms;
//$terms="vintage";
//Twitter account information
$username = "Username";
$password = "Password";
while (true) {
//$terms="vintage";
//echo "search terms: " . substr_replace($terms, "", -1) . "\n";
$url = "https://stream.twitter.com/1/statuses/filter.json";
$cred = sprintf('Authorization: Basic %s', base64_encode("$username:$password"));
$param = "track=" . urlencode(substr_replace($terms, "", -1));
$opts = array(
'http' => array(
'method' => 'POST',
'header' => $cred,
'content' => $param,
'Content-type' => 'application/x-www-form-urlencoded'),
'ssl' => array('verify_peer' => false)
);
$ctx = stream_context_create($opts);
$handle = fopen($url, 'r', false, $ctx);
//var_dump($handle);
$content = "";
$flag = true;
while ($flag) {
$buffer = fread($handle, 100);
//$buffer = stream_get_line($handle, 1024, "\n");
$a = explode("\n", $buffer, 2);
$content = $content . $a[0];
#var_dump($a);
if (count($a) > 1) {
#echo $content;
#echo "\n";
$r = json_decode($content, true);
#var_dump($r);
// echo '<p>';
// echo "text: " . $r["text"];
// echo '<br>';
// echo "\nrceated_at: " . $r["created_at"];
// echo '<br>';
// echo "\nuser screen name: " . $r["user"]["screen_name"];
// echo '<br>';
// echo "\nuser id: " . $r["user"]["id"];
// echo '<br>';
// echo "\nid : " . $r["id"];
// echo '<br>';
// echo "\nin_reply_to_status_id: " . $r["in_reply_to_status_id"];
// echo '<p>';
// echo "\n\n";
$created_at = $r["created_at"];
$created_at = strtotime($created_at);
$mysqldate = date('Y-m-d H:i:s', $created_at);
//
// echo'<p>';
foreach ($topiclist as $key => $value) {
// echo'getshere!';
//$whichterm = $r["text"];
$whichterm = '"' . $r["text"] . '"';
//echo $whichterm;
if (stripos($whichterm, $value) !== false) {
// echo 'true:' . $value . '';
//find topicid
$whattopic = "SELECT TopicID
FROM Topics
WHERE Name = '$value'";
//var_dump($whattopic);
$tID = mysql_query($whattopic);
//var_dump($tID);
if (!$tID) {
// echo 'topic id not found.';
} else {
if (mysql_num_rows($tID) == 0) {
// echo 'This topic doesn't exist.';
} else {
while ($rec = mysql_fetch_assoc($tID)) {
$inserttweets = "INSERT INTO
Tweets(Topics_TopicID, AddDate, Tweetcontent)
VALUES('" . mysql_real_escape_string($rec['TopicID']) . "',
'" . mysql_real_escape_string($mysqldate) . "',
'" . mysql_real_escape_string($r["text"]) . "')";
//WHERE TopicID = " . mysql_real_escape_string($row['Topics_TopicID'])
}
}
$addtweet = mysql_query($inserttweets);
if (!$addtweet) {
//something went wrong, display the error
//echo 'Something went wrong while adding tweet.';
//echo mysql_error(); //debugging purposes, uncomment when needed
} else {
echo 'Successfully added tweet';
}
}
}
}
die();
$content = $a[1];
}
}
fclose($handle);
}
?>
"Pasting" a bunch of code inside a loop isn't a great practice. In fact, what you're looking for is a function or the use of a defined class. So, if you can, define a function in your topictweets.php that will contain your code and use it in your loop:
include 'topictweets.php';
foreach ($forumlist as $x => $fID) {
echo 'id:'.$fID.'<br>';
processYourForums($fID);
/////////
////////
}
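One thing worth checking when moving the code into a function: the posted topictweets.php calls die() after handling the first tweet, inside a while (true) loop. die() ends the whole script, so the outer foreach can never reach the second forum. The file also keeps appending to $topiclist and $terms, which would carry over between forums if it were included repeatedly. A minimal sketch of the function approach follows, with the database and Twitter calls replaced by a hypothetical lookup array; the names processForum and $topicsByForum are assumptions.

```php
<?php
// Hypothetical stand-in for the body of topictweets.php. The database and
// Twitter calls are replaced by a lookup in $topicsByForum.
function processForum($fID, array $topicsByForum) {
    if (!isset($topicsByForum[$fID])) {
        return array();   // return, not die(): the caller's loop keeps going
    }
    $topiclist = array(); // fresh for every call, nothing leaks between forums
    foreach ($topicsByForum[$fID] as $name) {
        $topiclist[] = $name;
    }
    return $topiclist;
}

// Made-up data: forum 2 has no topics.
$topicsByForum = array(1 => array('vintage', 'retro'), 3 => array('denim'));
$forumlist = array(1, 2, 3);

$termsByForum = array();
foreach ($forumlist as $fID) {
    // implode() builds the comma-separated search terms without a trailing comma.
    $termsByForum[$fID] = implode(',', processForum($fID, $topicsByForum));
}
```

Because the function returns instead of ending the script, every forum in $forumlist is processed in turn.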
Try include_once().
However, why not have a loop within topictweets.php?
You can do the query, etc. in this page, but then loop through the results in the include.
This should work fine:
include 'topictweets.php';
foreach ($forumlist as $x => $fID) {
echo 'id:'.$fID.'<br>';
}
You only need to include once.
This only covers the first record in the array, $form[items][0][description]. How can I iterate over it so that I can echo the succeeding ones, i.e.
$form[items][1][description];
$form[items][2][description];
$form[items][3][description];
and so on and so forth?
$array = $form['items'][0]['description'];
function get_line($array, $line) {
preg_match('/' . preg_quote($line) . ': ([^\n]+)/', $array['#value'], $match);
return $match[1];
}
$anchortext = get_line($array, 'Anchor Text');
$url = get_line($array, 'URL');
echo '' . $anchortext . '';
?>
This should do the trick
foreach ($form['items'] as $item) {
echo $item['description'] . "<br>";
}
I could help you more if I saw the body of your get_line function, but here's the gist of it
foreach ($form['items'] as $item) {
$anchor_text = get_line($item['description'], 'Anchor Text');
$url = get_line($item['description'], 'URL');
echo "{$anchor_text}";
}
You can use a for loop to iterate over this array.
for($i=0; $i< count($form['items']); $i++)
{
$anchortext = get_line($form['items'][$i]['description'], 'Anchor Text');
$url = get_line($form['items'][$i]['description'], 'URL');
echo '' . $anchortext . '';
}
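Both loops can be tried against sample data. The sketch below assumes each description is an array with a '#value' string holding "Label: value" lines, as the posted get_line suggests. The function name get_line_value and the sample $form contents are hypothetical.

```php
<?php
// Hypothetical variant of get_line: returns the text after "$line: " in
// $text, or null when the label is not present.
function get_line_value($text, $line) {
    if (preg_match('/' . preg_quote($line, '/') . ': ([^\n]+)/', $text, $match)) {
        return trim($match[1]);
    }
    return null;
}

// Made-up data shaped like $form['items'][n]['description']['#value'].
$form = array('items' => array(
    array('description' => array('#value' => "Anchor Text: Home\nURL: http://example.com/")),
    array('description' => array('#value' => "Anchor Text: About\nURL: http://example.com/about")),
));

$links = array();
foreach ($form['items'] as $item) {
    $value = $item['description']['#value'];
    $links[] = array(
        'text' => get_line_value($value, 'Anchor Text'),
        'url'  => get_line_value($value, 'URL'),
    );
}
```

Returning null for a missing label avoids reading $match[1] when the pattern did not match.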