I want to load a CSV file, extract the URLs from it, fetch the title tag for each URL, and write the URLs with their corresponding titles to a new CSV. But when I write the data to the CSV, all the URLs are listed, yet only the title of the last URL appears. I have tried different ways to solve this problem but have not been able to.
Here is my code:
<?php
ini_set('max_execution_time', '300'); //300 seconds = 5 minutes
ini_set('max_execution_time', '0');
include('simple_html_dom.php');
// if (isset($_POST['resurl'])) {
// $url = $_POST['resurl'];
if (($csv_file = fopen("old.csv", "r", 'a')) !== FALSE) {
$arraydata = array();
while (($read_data = fgetcsv($csv_file, 1000, ",")) !== FALSE) {
$column_count = count($read_data);
for ($c = 0; $c < $column_count; $c++) {
array_push($arraydata, $read_data[$c]);
}
}
fclose($csv_file);
}
$title = [];
foreach ($arraydata as $ad) {
$ard = [];
$ard = $ad;
$html = file_get_html($ard);
if ($html) {
$title = $html->find('title', 0)->plaintext;
// echo '<pre>';
// print_r($title);
}
}
$ncsv = fopen("updated.csv", "a");
$head = "Url,Title";
fwrite($ncsv, "\n" . $head);
foreach ($arraydata as $value) {
// $ar[]=$value;
$csvdata = "$value,$title";
fwrite($ncsv, "\n" . $csvdata);
}
fclose($ncsv);
I've changed the code so that it writes the CSV file as it reads the HTML pages. This saves having another loop and an extra array of titles.
I've also changed it to use fputcsv to write the data out, as it sorts out things like escaping values.
// Open file, using w to clear the old file down
$ncsv = fopen('updated.csv', 'w');
// Write the header row once; fputcsv also terminates it with a newline
fputcsv($ncsv, ['Url', 'Title']);
foreach ($arraydata as $ad) {
$html = file_get_html($ad);
// Fetch title, or set to blank if html is not loaded
if ($html) {
$title = $html->find('title', 0)->plaintext;
} else {
$title = '';
}
// Write record out
fputcsv($ncsv, [$ad, $title]);
}
fclose($ncsv);
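To illustrate the point about escaping, here is a minimal sketch of what fputcsv does with values that contain commas or quotes. The helper name rowsToCsv and the sample URLs and titles are made up for this example; it writes to an in-memory stream instead of updated.csv so it can be run on its own.

```php
<?php
// Hypothetical helper (not part of the answer above): convert rows to CSV text
// by letting fputcsv handle quoting and escaping on an in-memory stream.
function rowsToCsv(array $rows): string
{
    $fh = fopen('php://memory', 'w+');
    foreach ($rows as $row) {
        // Separator, enclosure and an empty escape character are passed explicitly
        fputcsv($fh, $row, ',', '"', '');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

// Made-up sample data: the last title contains a comma and double quotes
$csv = rowsToCsv([
    ['Url', 'Title'],
    ['https://example.com/a', 'Plain title'],
    ['https://example.com/b', 'News, "sport" and weather'],
]);
echo $csv;
```

With a hand-built string such as "$value,$title", the comma inside the last title would split it into two columns. fputcsv quotes the field instead, so reading the line back yields the original two values.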
I was able to solve it finally.
Here is the updated code:
<?php
ini_set('max_execution_time', '300'); //300 seconds = 5 minutes
ini_set('max_execution_time', '0');
include('simple_html_dom.php');
// if (isset($_POST['resurl'])) {
// $url = $_POST['resurl'];
if (($csv_file = fopen("ntsurl.csv", "r", 'a')) !== FALSE) {
$arraydata = array();
while (($read_data = fgetcsv($csv_file, 1000, ",")) !== FALSE) {
$column_count = count($read_data);
for ($c = 0; $c < $column_count; $c++) {
array_push($arraydata, $read_data[$c]);
}
}
fclose($csv_file);
}
// print_r($arraydata);
$title=[];
$ncsv=fopen("ntsnew.csv","a");
$head="Website Url,title";
fwrite($ncsv,"\n".$head);
foreach($arraydata as $ad)
{
$ard = [];
$ard = $ad;
$html = file_get_html($ard);
if ($html) {
$title = $html->find('title', 0)->plaintext;
echo '<pre>';
print_r($title);
$csvdata="$ard,$title ";
fwrite($ncsv,"\n".$csvdata);
}
}
// fclose($ncsv);
I've got a CSV file which contains product data and prices from two distributors.
There are 67 keys (columns) in this file.
Now I want to find all EANs that appear twice in this file and get the cheapest price for each.
After that, I want to delete the line with the higher price.
The CSV has a key for the merchant.
I made a test CSV for an easier overview:
artno;name;ean;price;merchant
1;ipad;1654213154;499.00;merchant1
809;ipad;1654213154;439.00;merchant2
23;iphone;16777713154;899.00;merchant2
90;iphone;16777713154;799.00;merchant1
After the script runs through, the csv should look like (writing to new file):
artno;name;ean;price;merchant
809;ipad;1654213154;439.00;merchant2
90;iphone;16777713154;799.00;merchant1
I played around with fgetcsv. Looping through the CSV is not a problem, but how can I search for the EAN in key 2?
$filename = './test.csv';
$file = fopen($filename, 'r');
$fileline = 1;
while (($data = fgetcsv($file, 0, ";")) !== FALSE) {
if($fileline == "1"){ $fileline++; continue; }
$search = $data[2];
$lines = file('./test.csv');
$line_number = false;
$count = 0;
while (list($key, $line) = each($lines) and !$line_number) {
$line_number = (strpos($line, $search) !== FALSE) ? $key : $line_number;
$count++;
}
if($count > 2){
echo "<pre>",print_r(str_getcsv($lines[$line_number], ";")),"</pre>";
}
}
I think this is what you are looking for:
<?php
$filename = './test.csv';
$file = fopen($filename, 'r');
$lines = file('./test.csv');
$headerArr = str_getcsv($lines[0], ";");
$finalrawData = [];
$cheapeastPriceByProduct = [];
$dataCounter = 0;
while (($data = fgetcsv($file, 0, ";")) !== FALSE) {
if($dataCounter > 0) {
$raw = str_getcsv($lines[$dataCounter], ";");
$tempArr = [];
foreach( $raw as $key => $val) {
$tempArr[$headerArr[$key]] = $val;
}
$finalrawData[] = $tempArr;
}
$dataCounter++;
}
// Keep only the cheapest row per EAN (the question asks for duplicates by EAN, not by name)
foreach($finalrawData as $idx => $dataRow ) {
if(!isset($cheapeastPriceByProduct[$dataRow['ean']])) {
$cheapeastPriceByProduct[$dataRow['ean']] = $dataRow;
}
else {
// Compare as floats so the cents are not truncated
if(((float)$dataRow['price'])< ((float)$cheapeastPriceByProduct[$dataRow['ean']]['price'])) {
$cheapeastPriceByProduct[$dataRow['ean']] = $dataRow;
}
}
}
echo "<pre>";
print_r($finalrawData);
print_r($cheapeastPriceByProduct);
I just added the $finalrawData array to store the parsed data and associated every row with its header keys, so you can then compare and filter the data based on your criteria.
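The question also asks for the result to be written to a new file. A minimal sketch of that last step follows, assuming the rows are grouped by the ean column. The helper name cheapestByEan is made up for this example, and it writes to php://output so it runs without touching the disk; a real script would open a file name of its choice instead.

```php
<?php
// Hypothetical helper: keep the cheapest row for each EAN.
// Each row is an associative array with at least 'ean' and 'price' keys.
function cheapestByEan(array $rows): array
{
    $cheapest = [];
    foreach ($rows as $row) {
        $ean = $row['ean'];
        // Compare prices as floats so "439.00" and "499.00" compare numerically
        if (!isset($cheapest[$ean]) || (float)$row['price'] < (float)$cheapest[$ean]['price']) {
            $cheapest[$ean] = $row;
        }
    }
    return array_values($cheapest);
}

// Sample rows taken from the test CSV in the question
$header = ['artno', 'name', 'ean', 'price', 'merchant'];
$lines = [
    '1;ipad;1654213154;499.00;merchant1',
    '809;ipad;1654213154;439.00;merchant2',
    '23;iphone;16777713154;899.00;merchant2',
    '90;iphone;16777713154;799.00;merchant1',
];
$rows = [];
foreach ($lines as $line) {
    $rows[] = array_combine($header, str_getcsv($line, ';', '"', ''));
}

$result = cheapestByEan($rows);

// Write the header and the surviving rows (php://output is an assumption for the demo)
$out = fopen('php://output', 'w');
fputcsv($out, $header, ';', '"', '');
foreach ($result as $row) {
    fputcsv($out, array_values($row), ';', '"', '');
}
fclose($out);
```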
I have a space separated csv file. I am trying to read it using the following method:
$file = fopen('/var/www/html/my.csv', 'r');
while (($line = fgetcsv($file,0,"\t")) !== FALSE) {
$header_count = 0;
// $final_data = array();
if ( $count == 0 )
{
foreach ($line as $data) {
$header[$header_count] = $data;
$header_count++;
}
}
else
{
// print_r($line);
foreach ($line as $data) {
$final_data[$count-1][$header[$header_count]] = $data;
$header_count++;
}
}
$count++;
}
The file looks like:
id created ad
410699345585 11:12:29+05:30 ag:6061734588280
...
But, the reading output gives:
["��id"]=>
as id index. What am I doing wrong?
I'm writing a script to download files from an FTP server.
In the form I need to show files and folders.
With ftp_nlist they all come back together, but I want to know which is which.
I can't find an easy way to do this:
$contents = ftp_nlist($connection, $rep);
$dossiers =array();
$fichiers = array();
foreach($contents as $content){
//if folder
if (is_folder($content)) $dossiers[] = $content;
//if file
if(is_filex($content)) $fichiers[] = $content;
}
Of course is_file and is_dir don't work with remote files.
I've found something using ftp_rawlist and the size of each result,
like this:
if ($result['size'] == 0) { /* is dir */ }
But what about an empty file?
So what is the way to know what is a folder and what is a file?
Thanks!
I've had the same problem and this was my solution:
$conn = ftp_connect('my_ftp_host');
ftp_login($conn, 'my_user', 'my_password');
$path = '/';
// Get lists
$nlist = ftp_nlist($conn, $path);
$rawlist = ftp_rawlist($conn, $path);
$ftp_dirs = array();
for ($i = 0; $i < count($nlist); $i++)
{
if($rawlist[$i][0] == 'd')
{
$ftp_dirs[] = $nlist[$i];
}
}
I know the above code could be optimised to do just one FTP request instead of two, but for my purposes this did the job.
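As a rough sketch of the single-request variant mentioned above, the names can be taken from the ftp_rawlist output itself, so ftp_nlist is not needed. This assumes the server returns Unix "ls -l" style lines; the helper name splitRawList and the sample listing are made up for the example, and symbolic links are simply treated as files here.

```php
<?php
// Hypothetical helper: split Unix-style ftp_rawlist() lines into directories and files.
function splitRawList(array $rawlist): array
{
    $dirs = [];
    $files = [];
    foreach ($rawlist as $line) {
        // Columns: perms, links, owner, group, size, month, day, time/year, name
        $parts = preg_split('/\s+/', $line, 9, PREG_SPLIT_NO_EMPTY);
        if (count($parts) < 9) {
            continue; // skips short lines such as "total 8"
        }
        $name = $parts[8];
        if ($name === '.' || $name === '..') {
            continue;
        }
        // The first character of the permissions column is 'd' for a directory
        if ($parts[0][0] === 'd') {
            $dirs[] = $name;
        } else {
            $files[] = $name;
        }
    }
    return ['dirs' => $dirs, 'files' => $files];
}

// Made-up listing; in a real script this would come from ftp_rawlist($conn, $path)
$sample = [
    'total 8',
    'drwxr-xr-x   2 user group     4096 Jan 01 12:00 images',
    '-rw-r--r--   1 user group        0 Jan 01 12:00 empty.txt',
    '-rw-r--r--   1 user group     1234 Jan 01 12:00 my report.pdf',
];
$result = splitRawList($sample);
print_r($result);
```

Because the decision is based on the permissions column and not on the size, an empty file such as empty.txt is still classified as a file.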
For anyone looking for a cleaner solution, I've found a script that parses the output of ftp_rawlist:
Function
function parse_ftp_rawlist($List, $Win = FALSE)
{
$Output = array();
$i = 0;
if ($Win) {
foreach ($List as $Current) {
// ereg() was removed in PHP 7, so preg_match() is used instead
if (preg_match('/([0-9]{2})-([0-9]{2})-([0-9]{2}) +([0-9]{2}):([0-9]{2})(AM|PM) +([0-9]+|) +(.+)/', $Current, $Split)) {
if ($Split[3] < 70) {
$Split[3] += 2000;
}
else {
$Split[3] += 1900;
}
$Output[$i]['isdir'] = ($Split[7] == '');
$Output[$i]['size'] = $Split[7];
$Output[$i]['month'] = $Split[1];
$Output[$i]['day'] = $Split[2];
$Output[$i]['time/year'] = $Split[3];
$Output[$i]['name'] = $Split[8];
$i++;
}
}
return !empty($Output) ? $Output : false;
}
else {
foreach ($List as $Current) {
$Split = preg_split('[ ]', $Current, 9, PREG_SPLIT_NO_EMPTY);
if ($Split[0] != 'total') {
$Output[$i]['isdir'] = ($Split[0][0] === 'd');
$Output[$i]['perms'] = $Split[0];
$Output[$i]['number'] = $Split[1];
$Output[$i]['owner'] = $Split[2];
$Output[$i]['group'] = $Split[3];
$Output[$i]['size'] = $Split[4];
$Output[$i]['month'] = $Split[5];
$Output[$i]['day'] = $Split[6];
$Output[$i]['time/year'] = $Split[7];
$Output[$i]['name'] = $Split[8];
$i++;
}
}
return !empty($Output) ? $Output : FALSE;
}
}
Usage
// connect to ftp server
$res_ftp_stream = ftp_connect('my_server_ip');
// login with username/password
$login_result = ftp_login($res_ftp_stream, 'my_user_name', 'my_password');
// get the file list for curent directory
$buff = ftp_rawlist($res_ftp_stream, '/');
// parse ftp_rawlist output
$result = parse_ftp_rawlist($buff, false);
// dump result
var_dump($result);
// close ftp connection
ftp_close($res_ftp_stream);
First of all I load PHPExcel.php
Secondly, I am using this code:
$location = '/path/file.csv';
$inputFileType = 'CSV';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objPHPExcel = $objReader->load($location);
$worksheet = $objPHPExcel->getActiveSheet();
$list = array();
foreach ($worksheet->getRowIterator() as $row)
{
$rowIndex = $row->getRowIndex();
$cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
array_push($list, $cellValue);
}
$count = count($list);
for ($rowIndex = $count; $rowIndex != 1; $rowIndex--)
{
$cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
for ($i = $rowIndex - 2; $i != 0; $i--)
{
if ($list[$i] == $cellValue)
{
$worksheet->removeRow($rowIndex);
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
$objWriter->save($location);
break;
}
}
}
So, I am trying to remove the rows when there are duplicate values in the first column. The code does not work. When I first run it in putty, I have to wait for ages. I interrupt the process and then I run it again. Then it runs, but in my csv file I have wrong results (duplicates are 300 but I am getting -600 rows).
In order to read a CSV file you don't have to use PHPExcel. Instead you can use native PHP code like this:
<?php
// Array which will hold all analyzed lines
$uniqueEntries = array();
$dublicatedEntries = array();
$delimiter = ',';
$file = 'test.csv';
//Open the file
if (($handle = fopen($file, "r")) !== false) {
// read each line into an array
while (($data = fgetcsv($handle, 8192, $delimiter)) !== false) {
// build a "line" from the parsed data
$line = join($delimiter, $data);
//If the line content has been discovered before - save to duplicated and skip the rest..
if (isset($uniqueEntries[$line])){
$dublicatedEntries[] = $line;
continue;
}
// save the line
$uniqueEntries[$line] = true;
}
fclose($handle);
}
// build the new content-data
$contents = '';
foreach ($uniqueEntries as $line => $bool) $contents .= $line . "\r\n";
// save it to a new file
file_put_contents("test_unique.csv", $contents);
?>
This code is untested but should work.
This will give you a .csv file with all unique entries.
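The code above treats two rows as duplicates only when the whole line matches, while the question asks about duplicate values in the first column. A minimal sketch of that variant follows; the helper name uniqueByFirstColumn and the sample rows are made up for the example, and the rows are assumed to have been read with fgetcsv already.

```php
<?php
// Hypothetical helper: keep only the first row seen for each value in column 0.
function uniqueByFirstColumn(array $rows): array
{
    $seen = [];
    $unique = [];
    foreach ($rows as $row) {
        $key = (string)($row[0] ?? '');
        if (isset($seen[$key])) {
            continue; // a row with this first-column value was already kept
        }
        $seen[$key] = true;
        $unique[] = $row;
    }
    return $unique;
}

// Made-up rows standing in for the result of fgetcsv()
$rows = [
    ['a', '1'],
    ['b', '2'],
    ['a', '3'],
    ['c', '4'],
    ['b', '5'],
];
$unique = uniqueByFirstColumn($rows);
print_r($unique);
```

Using the value as an array key makes each lookup a single isset call, which avoids the nested loops over the worksheet from the question.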
I made a script that reads data from a .xls file and converts it into a .csv. Then a second part reads the .csv into an array, and a foreach loop at the end should echo out the final variable, but it echoes nothing, just a blank page. The file is definitely written correctly, but I don't know whether the script reads the CSV, because if I put an echo after the read, it also returns blank.
Here is my code:
<?php
ini_set('memory_limit', '300M');
$username = 'test';
function convert($in) {
require_once 'Excel/reader.php';
$excel = new Spreadsheet_Excel_Reader();
$excel->setOutputEncoding('CP1251');
$excel->read($in);
$x=1;
$sep = ",";
ob_start();
while($x<=$excel->sheets[0]['numRows']) {
$y=1;
$row="";
while($y<=$excel->sheets[0]['numCols']) {
$cell = isset($excel->sheets[0]['cells'][$x][$y]) ? $excel->sheets[0]['cells'][$x][$y] : '';
$row.=($row=="")?"\"".$cell."\"":"".$sep."\"".$cell."\"";
$y++;
}
echo $row."\n";
$x++;
}
return ob_get_contents();
ob_end_clean();
}
$csv = convert('usage.xls');
$file = $username . '.csv';
$fh = fopen($file, 'w') or die("Can't open the file");
$stringData = $csv;
fwrite($fh, $stringData);
fclose($fh);
$maxlinelength = 1000;
$fh = fopen($file);
$firstline = fgetcsv($fh, $maxlinelength);
$cols = count($firstline);
$row = 0;
$inventory = array();
while (($nextline = fgetcsv($fh, $maxlinelength)) !== FALSE )
{
for ( $i = 0; $i < $cols; ++$i )
{
$inventory[$firstline[$i]][$row] = $nextline[$i];
}
++$row;
}
fclose($fh);
$arr = $inventory['Category'];
$texts = 0;
$num2 = 0;
foreach($inventory['Category'] as $key => $value) {
$val = $value;
if (is_object($value)) { echo 'true'; }
if ($value == 'Messages ') {
$texts++;
}
}
echo 'You have used ' . $texts . ' text messages';
?>
Once you return, you cannot do anything else in the function:
return ob_get_contents();
ob_end_clean();//THIS NEVER HAPPENS
Therefore the output buffer was never ended, and you won't get the output you expect.
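One way to avoid the unreachable call is ob_get_clean(), which returns the buffer contents and ends the buffer in a single step. A minimal sketch, with a made-up function name renderRows and made-up sample rows:

```php
<?php
// Hypothetical stand-in for convert(): echo rows into a buffer and return the text.
function renderRows(array $rows): string
{
    ob_start();
    foreach ($rows as $row) {
        echo implode(',', $row) . "\n";
    }
    // ob_get_clean() returns the buffered output and closes the buffer,
    // so nothing has to run after the return statement
    return ob_get_clean();
}

$csv = renderRows([
    ['Category', 'Count'],
    ['Messages', '3'],
]);
echo $csv;
```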
I see a lot of repetitive useless operations there. Why not simply build an array with the data you're pulling out of the Excel file? You can then write out that array with fputcsv(), instead of building the CSV string yourself.
You then write the csv out to a file, then read the file back in and process it back into an array. Which begs the question... why? You've already got the raw individual bits of data at the moment you read from the excel file, so why all the fancy-ish giftwrapping only to tear it all apart again?
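A rough sketch of that suggestion: keep the cells in an array, count straight from the array, and use fputcsv only if a CSV file is still wanted. The sample rows are made up and stand in for the cells read from the spreadsheet, and the in-memory stream stands in for the real output file.

```php
<?php
// Made-up rows standing in for the cells read from the Excel file
$rows = [
    ['Date', 'Category'],
    ['2012-01-01', 'Messages'],
    ['2012-01-02', 'Voice'],
    ['2012-01-03', 'Messages '],
];

// Count directly from the array; no file round trip is needed
$header = $rows[0];
$categoryIndex = array_search('Category', $header, true);
$texts = 0;
foreach (array_slice($rows, 1) as $row) {
    // trim() tolerates trailing spaces such as 'Messages '
    if (trim($row[$categoryIndex]) === 'Messages') {
        $texts++;
    }
}

// Still want the CSV? Let fputcsv build it (php://memory is an assumption for the demo)
$fh = fopen('php://memory', 'w+');
foreach ($rows as $row) {
    fputcsv($fh, $row, ',', '"', '');
}
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

echo 'You have used ' . $texts . ' text messages';
```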