Fetch Specific Data From a Website and Turn It into PDF

I need to fetch specific data from ERPNEXT.COM/User-Guide and make it look like this:
https://drive.google.com/file/d/0B-uyX-vtnUFINnlhRWJ6cWNtMDg/view?usp=sharing
Simply put, I want to remove the header and footer so that only the main heading and the article, with its images, remain. I want to do this for the whole website, fetching the data with PHP or anything else, and then convert it to PDF. I was using Acrobat Pro to convert these web pages to PDF and build a user-guide PDF for users. I also used HTTrack to download the complete website, but that fetches all the data.
Please suggest a complete solution.
Regards,
Vishal Verma
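A minimal sketch of the stripping step described above, using PHP's built-in DOM extension. It assumes, hypothetically, that the pages mark their chrome with header, footer and nav elements; the real ERPNext markup may use other tags or class names, in which case the XPath query needs adjusting. The cleaned HTML still has to be converted to PDF with a separate tool, such as the Acrobat workflow mentioned in the question.

```php
<?php
// Sketch: remove page chrome and keep only the main content.
// Assumption (hypothetical): header/footer/nav use those HTML5 tags.
function extractMainContent(string $html): string
{
    $doc = new DOMDocument();
    // Real-world HTML (and HTML5 tags) trigger libxml warnings; collect them silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Collect first, then remove, so we don't modify the tree while iterating the query result.
    $toRemove = [];
    foreach ($xpath->query('//header | //footer | //nav | //script') as $node) {
        $toRemove[] = $node;
    }
    foreach ($toRemove as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Return the inner HTML of <body>; headings, text and <img> tags are kept.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Relative image URLs in the result would still need to be made absolute before the HTML is handed to a PDF converter.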

Read the URL with this function:
function fetchURL($URL) {
    $Max = 200000; // give up on responses larger than about 200 KB
    $handle = @fopen($URL, "rb"); // @ suppresses the warning; failure is checked below
    if ($handle === false) { return false; }
    $len = 0;
    $buffer = "";
    while (!feof($handle)) {
        $queue = fgets($handle, 4096);
        if ($queue === false) { break; }
        $buffer .= $queue;
        $len += strlen($queue);
        if ($len > $Max) {
            fclose($handle);
            return 0; // response too large
        }
    }
    fclose($handle);
    return $buffer;
}
If you only want PDFs, check the first characters of the result: if substr(fetchURL([URL]), 0, 4) === '%PDF', the URL is a PDF. In my case, I write the result to disk as file.pdf. I hope this helps.
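The %PDF test from the answer can be wrapped in a small helper. This is a sketch: isPdfData() and savePdfIfValid() are illustrative names, not part of the original answer, and they only inspect the file signature, not whether the rest of the PDF is well formed. Note that fetchURL() returns false on failure and 0 for responses over its 200000-byte limit, so both must be rejected.

```php
<?php
// Returns true when the data starts with the PDF signature "%PDF".
function isPdfData($data): bool
{
    return is_string($data) && strncmp($data, '%PDF', 4) === 0;
}

// Writes $data to $path only if it looks like a PDF; returns true on success.
function savePdfIfValid($data, string $path): bool
{
    if (!isPdfData($data)) {
        return false; // e.g. false, 0 or an HTML page from fetchURL()
    }
    return file_put_contents($path, $data) !== false;
}
```

Usage would look like savePdfIfValid(fetchURL($url), 'file.pdf').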

Related

Simple PHP page session visit counter not working

I am having this strange issue and can't figure it out.
On some of my websites this script works perfectly: same code, same server settings.
In PHP, there is a simple page-view hit counter that stores its value locally in a txt file.
I then echo the value in the footer copyright area of my websites to give the client a quick statistic; it's pretty cool how fast it grows.
Anyway, I have a client, corner grill ny . com (I added spaces for SEO purposes).
On that website it has been working great for years.
Now on another website, and a bunch more, for example savianos . com,
it breaks and the text value is blank.
This is the counter.php code:
<?php
session_start();
$counter_name = "counter/hits.txt";
//Check if a text file exists. If not create one and initialize it to zero.
if (!file_exists($counter_name)) {
$f = fopen($counter_name, "w");
fwrite($f,"0");
fclose($f);
}
// Read the current value of our counter file
$f = fopen($counter_name,"r");
$counterVal = fread($f, filesize($counter_name));
fclose($f);
// Has visitor been counted in this session?
// If not, increase counter value by one
if(!isset($_SESSION['hasVisited'])){
$_SESSION['hasVisited']="yes";
$counterVal++;
$f = fopen($counter_name, "w");
fwrite($f, $counterVal);
fclose($f);
}
?>
Now, if I put a value in the txt file, like 1040, and go to the website, it starts to work; then after a week or so I check it and it's blank again.
Any ideas?
I think this may happen because the website gets a TON of views at dinner time on Friday night, and the simple script can't handle it: while it is trying to write the incremented number, it breaks, the file goes blank, and it never starts back up again.
The structure is this.
/counter/ folder has
counter.php and a hits.txt file
On every page of the website, the very first thing is
<?php include ('counter/counter.php'); ?>
and in the footer of the website we have
<?php echo $counterVal; ?>

Your code looks fine, but let's look at the situation. You have a file that can be accessed concurrently by many users, because multiple users can visit pages at the same time. While one request is modifying the file, you have to lock it so that other requests can't touch it. Please have a look at:
Visits counter without database with PHP

It is most likely because two concurrent requests tried to open the file at once and one of them failed. You have to use flock() when multiple instances of the script could operate on the file at the same time. Counters are among the heaviest things you can build on file reads and writes. I wrote this wrapper to make file locking easy to use.
If you want to see one of my counters in operation, try http://ozlu.org; that dynamic counter image is self-built. fileReadAll() reads the entire file in one shot. The file writer has only two modes, write or append. You can pass fileWriteAll() an array or a string and it will write it to the file. The function does not add any \n to format your text, so you have to add that yourself. The default mode for fileWriteAll() is "w" if you do not pass the third argument.
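function fileWriteAll($file, $content, $mode = "w"){
    // Only "w" (overwrite) and "a" (append) are supported; anything else means "w".
    $mode = ($mode === "w" || $mode === "a") ? $mode : "w";
    // Open with "c" instead of "w" so the file is not truncated before we hold the lock.
    $FILE = fopen($file, $mode === "w" ? "c" : "a");
    flock($FILE, LOCK_EX); // blocks until the exclusive lock is granted
    if ($mode === "w") {
        ftruncate($FILE, 0); // safe to truncate now that we hold the lock
    }
    if (is_array($content)) {
        foreach ($content as $chunk) {
            fwrite($FILE, $chunk);
        }
    } else {
        fwrite($FILE, $content);
    }
    fflush($FILE); // make sure data is written before releasing the lock
    flock($FILE, LOCK_UN);
    fclose($FILE);
}
function fileReadAll($file){
    $FILE = fopen($file, 'r');
    flock($FILE, LOCK_SH); // blocks until a shared lock is granted
    // stream_get_contents() avoids fread() with a cached or zero filesize()
    $content = stream_get_contents($FILE);
    flock($FILE, LOCK_UN);
    fclose($FILE);
    return $content;
}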
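The wrappers above lock each read and each write separately, so a read in one request and a write in another can still interleave and lose a count. Here is a minimal sketch (not the author's code; incrementCounter() is a hypothetical name) that holds a single exclusive lock for the whole read-increment-write, and opens the file with mode "c+" so it is created if missing but never truncated before the lock is held. It assumes a local filesystem: flock() is advisory and does not work on NFS and some other networked filesystems. Casting the contents to int also avoids the stuck blank counter from the question: on PHP 7, once the file is empty, fread() with a length of 0 returns false, incrementing false has no effect, and fwrite() then writes an empty string again, which would explain why the counter never recovers.

```php
<?php
// Sketch: increment a counter stored in a text file, safely under concurrency.
// The whole read-modify-write happens while holding one exclusive lock.
function incrementCounter(string $path): int
{
    // "c+" opens for read/write and creates the file if it is missing,
    // but (unlike "w") does not truncate it.
    $fh = fopen($path, 'c+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        flock($fh, LOCK_EX);                     // blocks until we own the file
        $count = (int) stream_get_contents($fh); // an empty file counts as 0
        $count++;
        rewind($fh);
        ftruncate($fh, 0);
        fwrite($fh, (string) $count);
        fflush($fh);                             // flush before releasing the lock
        flock($fh, LOCK_UN);
    } finally {
        fclose($fh);
    }
    return $count;
}
```

In counter.php, the body of the if(!isset($_SESSION['hasVisited'])) block would then become $counterVal = incrementCounter($counter_name); requests that only display the value can read it under a shared lock (LOCK_SH).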

Your code, modified:
session_start();
$counterName = './views.txt';
if (!file_exists($counterName)) {
$file = fopen($counterName, 'w');
fwrite($file, '0');
fclose($file);
}
$file = fopen($counterName, 'r');
$value = fread($file, filesize($counterName));
fclose($file);
if (! isset($_SESSION['visited'])) {
$_SESSION['visited'] = 'yes';
$value++;
$file = fopen($counterName, 'w');
fwrite($file, $value);
fclose($file);
}
session_unset(); // note: clears $_SESSION['visited'], so the next page load is counted again
echo $value;

Best way to check if URL is a video file in PHP?

I'm trying to find a way to be (almost) sure that a URL points to a real video file.
Of course, I've already used get_headers() to check whether the URL exists and to inspect the Content-Type header:
function get_http_response_code($theURL)
{
$headers = get_headers($theURL);
return substr($headers[0], 9, 3);
}
function isURLExists($url)
{
if(intval(get_http_response_code($url)) < 400)
{
return true;
}
return false;
}
function isFileVideo($url)
{
$headers = get_headers( $url );
$video_exist = implode(',',$headers);
if (strpos($video_exist, 'video') !== false)
{
return true;
}
else
{
return false;
}
}
Maybe I've answered my own question, but perhaps there is a more robust solution (mainly for video types).
I don't know if it's possible, but could I download just the file's metadata first and decide based on that?
Thanks a lot!

Of course you can't be completely sure, but the best practice is to check the first bytes of the file and identify the MIME type from them.
An example can be found in this Q&A: https://stackoverflow.com/a/8225754/2797243
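Building on that idea, here is a hedged sketch that reads only the first few bytes (so a large video is not downloaded in full) and compares them with a handful of well-known container signatures. The function names are illustrative, the signature list is deliberately incomplete, and a match is strong evidence rather than proof (for example, the "ftyp" box also appears in audio-only M4A files). Reading a remote URL this way requires allow_url_fopen.

```php
<?php
// Sketch: guess a video container from the first bytes of a file.
// Returns a short label, or null if no known signature matches.
function sniffVideoContainer(string $head): ?string
{
    if (strlen($head) >= 8 && substr($head, 4, 4) === 'ftyp') {
        return 'mp4/mov';   // ISO base media file format ("ftyp" box at offset 4)
    }
    if (strncmp($head, "\x1A\x45\xDF\xA3", 4) === 0) {
        return 'webm/mkv';  // EBML header used by Matroska and WebM
    }
    if (strncmp($head, 'RIFF', 4) === 0 && substr($head, 8, 4) === 'AVI ') {
        return 'avi';
    }
    if (strncmp($head, 'FLV', 3) === 0) {
        return 'flv';
    }
    if (strncmp($head, "\x00\x00\x01\xBA", 4) === 0) {
        return 'mpeg-ps';   // MPEG program stream pack header
    }
    return null;
}

// Reads at most $bytes bytes from the start of a URL or local path.
function readHead(string $url, int $bytes = 64): string
{
    $fh = fopen($url, 'rb');
    if ($fh === false) {
        return '';
    }
    // stream_get_contents() keeps reading until $bytes or EOF, unlike a single fread().
    $head = stream_get_contents($fh, $bytes);
    fclose($fh);
    return $head === false ? '' : $head;
}
```

A check would then be sniffVideoContainer(readHead($url)) !== null, optionally combined with the Content-Type test from the question.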

You can try this code:
<?php
function getUrlMimeType($url) {
$buffer = file_get_contents($url);
$finfo = new finfo(FILEINFO_MIME_TYPE);
return $finfo->buffer($buffer);
}
?>
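You need to enable the fileinfo extension in your php.ini (on Windows, php_fileinfo.dll). Note that file_get_contents() downloads the whole file before checking it.
If you want to download only a portion of the file, use:
$filename = $url;
$portion = 8192; // read up to 8192 bytes
$handle = fopen($filename, "rb");
// stream_get_contents() keeps reading until $portion bytes or EOF; a single fread() on a network stream may return less
$contents = stream_get_contents($handle, $portion);
fclose($handle);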
If you want to read a portion from inside the file, use:
$filename = $url;
$from = 10000; // skip the first 10000 bytes
$length = 9999; // then read up to 9999 bytes
$handle = fopen($filename, "rb");
$skip = stream_get_contents($handle, $from); // read and discard: HTTP streams do not support fseek()
$contents = stream_get_contents($handle, $length);
fclose($handle);
Then you can check the file's MIME type.
Thanks

Reading and counting words from a PDF document

I have been working on a text-extraction project for various file types, but I'm having the most pain with PDF and PowerPoint. Here is the code for PDF.
Does anyone here know how to read text from existing PDF documents using any tool or library (TCPDF, xpdf or FPDI)? I haven't found a working solution for reading text from PDF or PPT. Please, no Zend solutions.
function pdf2txt($filename){
    $data = getFileData($filename);
    $j = 0;
    $a_chunks = array();
    $result_data = "";
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    foreach ((array) $a_obj as $obj){
        $a_filter = getDataArray($obj,"<<",">>");
        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];
            $a_chunks[$j]["data"] = "";
            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],strlen("stream\r\n"),strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }
    // decode the chunks
    foreach($a_chunks as $chunk){
        // look at each chunk and decide how to decode it, based on its filter
        if ($chunk["data"]!=""){
            // strpos() (not substr()) tells us whether FlateDecode was used
            if (strpos($chunk["filter"],"FlateDecode")!==false){
                $data = @gzuncompress($chunk["data"]);
                if (trim((string) $data)!=""){
                    $result_data .= ps2txt($data);
                }
            }
        }
    }
    return $result_data;
}
// Function : ps2txt()
// Arguments : $ps_data - postscript data you want to convert to plain text
// Description : Does a very basic parse of postscript data to
// : return the plain text
// Author : Jonathan Beckett, 2005-05-02
function ps2txt($ps_data){
$result = "";
$a_data = getDataArray($ps_data,"[","]");
if (is_array($a_data)){
foreach ($a_data as $ps_text){
$a_text = getDataArray($ps_text,"(",")");
if (is_array($a_text)){
foreach ($a_text as $text){
$result .= substr($text,1,strlen($text)-2);
}
}
}
} else {
// the data may just be in raw format (outside of [] tags)
$a_text = getDataArray($ps_data,"(",")");
if (is_array($a_text)){
foreach ($a_text as $text){
$result .= substr($text,1,strlen($text)-2);
}
}
}
return $result;
}
// Function : getFileData()
// Arguments : $filename - filename you want to load
// Description : Reads data from a file into a variable
// and passes that data back
// Author : Jonathan Beckett, 2005-05-02
function getFileData($filename){
$handle = fopen($filename,"rb");
$data = fread($handle, filesize($filename));
fclose($handle);
return $data;
}
// Function : getDataArray()
// Arguments : $data - data you want to chop up
// $start_word - delimiting characters at start of each chunk
// $end_word - delimiting characters at end of each chunk
// Description : Loop through an array of data and put all chunks
// between start_word and end_word in an array
// Author : Jonathan Beckett, 2005-05-02
function getDataArray($data,$start_word,$end_word){
$start = 0;
$end = 0;
$a_result = null; // stays null when nothing matches; callers test is_array()
while ($start!==false && $end!==false){
$start = strpos($data,$start_word,$end);
if ($start!==false){
$end = strpos($data,$end_word,$start);
if ($end!==false){
// data is between start and end
$a_result[] = substr($data,$start,$end-$start+strlen($end_word));
}
}
}
return $a_result;
}
This one, for PowerPoint, I found somewhere on here, but it isn't working either:
function parsePPT($filename) {
// This approach uses detection of the string "chr(0f).Hex_value.chr(0x00).chr(0x00).chr(0x00)" to find text strings, which are then terminated by another NUL chr(0x00). [1] Get text between delimiters [2]
$fileHandle = fopen($filename, "r");
$line = @fread($fileHandle, filesize($filename));
$lines = explode(chr(0x0f),$line);
$outtext = '';
foreach($lines as $thisline) {
if (strpos($thisline, chr(0x00).chr(0x00).chr(0x00)) == 1) {
$text_line = substr($thisline, 4);
$end_pos = strpos($text_line, chr(0x00));
$text_line = substr($text_line, 0, $end_pos);
$text_line = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t#\/\_\(\)]/","",$text_line);
if(substr($text_line,0,20)!="Click to edit Master")
if (strlen($text_line) > 1) {
$outtext.= substr($text_line, 0, $end_pos)."\n<br>";
}
}
}
return $outtext;
}

Why are you trying to reinvent the wheel? You could use a tool such as xpdf (its pdftotext utility) to extract the text from the PDF, and then process the resulting plain-text file. The same approach works for virtually any file format that contains text: first convert it to plain text, then process that.
Indexing PDF Documents with Zend_Search_Lucene could be an interesting read if you opt for that solution.
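A minimal sketch of the convert-then-process approach, assuming the pdftotext command-line tool (shipped with xpdf and with poppler-utils) is installed and on the PATH; pdfToText() and countWords() are illustrative names. Passing "-" as the output file makes pdftotext write to standard output, which shell_exec() captures.

```php
<?php
// Sketch: extract text with the external pdftotext tool, then count words.
// Assumes pdftotext is installed and on the PATH.
function pdfToText(string $pdfPath): ?string
{
    // "-" as the output file makes pdftotext write the text to stdout.
    $cmd = 'pdftotext ' . escapeshellarg($pdfPath) . ' -';
    $text = shell_exec($cmd);
    return is_string($text) ? $text : null;
}

// Counts words in plain text. str_word_count() is locale-dependent and
// ignores digits; apostrophes and hyphens inside a word are kept.
function countWords(string $text): int
{
    return str_word_count($text);
}
```

For example, countWords(pdfToText('report.pdf') ?? '') would give the word count of a hypothetical report.pdf.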

file_get_contents: create the file if it does not exist

Is there an alternative to file_get_contents() that creates the file if it does not exist? I'm basically looking for a one-line command. I use it to count download stats for a program, with this PHP code on the pre-download page:
Download #: <?php $hits = file_get_contents("downloads.txt"); echo $hits; ?>
and then, on the download page, I have this:
<?php
function countdownload($filename) {
if (file_exists($filename)) {
$count = file_get_contents($filename);
$handle = fopen($filename, "w") or die("can't open file");
$count = $count + 1;
} else {
$handle = fopen($filename, "w") or die("can't open file");
$count = 0;
}
fwrite($handle, $count);
fclose($handle);
}
$DownloadName = 'SRO.exe';
$Version = '1';
$NameVersion = $DownloadName . $Version;
$Cookie = isset($_COOKIE[str_replace('.', '_', $NameVersion)]);
if (!$Cookie) {
countdownload("unqiue_downloads.txt");
countdownload("unique_total_downloads.txt");
} else {
countdownload("downloads.txt");
countdownload("total_download.txt");
}
echo '<META HTTP-EQUIV=Refresh CONTENT="0; URL='.$DownloadName.'" />';
?>
Naturally, though, the user visits the pre-download page first, so the file hasn't been created yet. I don't want to add any functions to the pre-download page; I want it to stay plain and simple, without a lot of adding or changing.
Edit:
Something like this should work, but it's not working for me:
$count = (file_exists($filename))? file_get_contents($filename) : 0; echo $count;

Download #: <?php
$hits = '';
$filename = "downloads.txt";
if (file_exists($filename)) {
$hits = file_get_contents($filename);
} else {
file_put_contents($filename, '');
}
echo $hits;
?>
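You can also use fopen() with 'c+' mode. Unlike 'w+', which truncates the file to zero length and would wipe the stored count, 'c+' creates the file if it is missing and leaves existing contents intact:
Download #: <?php
$hits = 0;
$filename = "downloads.txt";
$h = fopen($filename, 'c+'); // create if missing, never truncate
$hits = intval(stream_get_contents($h)); // a new, empty file gives 0
fclose($h);
echo $hits;
?>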

Type juggling like this can lead to crazy, unforeseen problems later. To turn a numeric string into an integer, you can add the integer 0 to it (intval() or an (int) cast does the same thing more explicitly).
For example:
$f = file_get_contents('file.php');
$f = $f + 0;
echo is_int($f); // prints 1 (true) when the file contains an integer
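The explicit conversion can be wrapped so that an empty or corrupted file does not break the counter. A hedged sketch; parseCount() is a hypothetical helper name. It treats empty or non-numeric contents (such as a file left blank by an interrupted write) as a default value instead of doing arithmetic on them; in PHP 8, arithmetic on a non-numeric string such as "" throws a TypeError.

```php
<?php
// Sketch: turn the raw contents of a counter file into an int.
// Empty or non-numeric contents fall back to $default.
function parseCount($raw, int $default = 0): int
{
    if (!is_string($raw)) {
        return $default; // e.g. file_get_contents() returned false
    }
    $raw = trim($raw);
    // Accept only plain digit strings such as "1040".
    return preg_match('/^\d+$/', $raw) === 1 ? (int) $raw : $default;
}
```

On the pre-download page this could read echo parseCount(@file_get_contents("downloads.txt")); which prints 0 while the file does not exist yet.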
However, I second the suggestion to use a database instead of a text file for this. There are a few ways to go about it. One way is to insert a unique value into a table called 'download_count' every time someone downloads the file; the query simply inserts $randomValue into download_count (make sure the index is unique). Then just count the rows in this table when you need the count: the number of rows is the download count, and you get a real integer instead of a string pretending to be an integer. Alternatively, add a download-count integer field to your 'download file' table; each file should be in the database with an ID anyway. When someone downloads the file, pull that number from the database in your download function, put it into a variable, increment it, update the table, and show it to the client however you want. Use PHP with jQuery Ajax to update it asynchronously to make it cool.
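A hedged sketch of the second option (a download-count column), using PDO with an in-memory SQLite database so it runs on its own; the files table, its columns and the helper names are hypothetical, and it needs the pdo_sqlite extension. One deliberate deviation from the description above: rather than reading the number into PHP, incrementing it and writing it back, the sketch lets the database increment the column in a single UPDATE, so two simultaneous downloads cannot overwrite each other's count.

```php
<?php
// Sketch: keep the download count in a database column and increment it atomically.
// Table and column names are hypothetical.
function setUpDownloads(PDO $db): void
{
    $db->exec('CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, download_count INTEGER NOT NULL DEFAULT 0)');
}

function recordDownload(PDO $db, int $fileId): void
{
    // The database performs the increment itself in one statement.
    $stmt = $db->prepare('UPDATE files SET download_count = download_count + 1 WHERE id = ?');
    $stmt->execute([$fileId]);
}

function downloadCount(PDO $db, int $fileId): int
{
    $stmt = $db->prepare('SELECT download_count FROM files WHERE id = ?');
    $stmt->execute([$fileId]);
    return (int) $stmt->fetchColumn(); // 0 if the row does not exist
}
```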
If you insist on using a text file, I would still use PHP with jQuery's .load() and a file.php. That way, you can use your text file to store any kind of data and load just the part you need using context selectors. file.php accepts the $_GET request, loads the right portion of the file and reads the number stored there. It then increments the number, updates the file, and sends the data back to the client to be displayed however you want. For example, you can have a div in your text file with the id 'download_count', plus divs with other ids for any other data you want to store in the file. When you load file.php, you just send div#download_count along with the file name, and it will load only the value stored in that div. This is a killer way to use PHP and jQuery for cool, easy Ajax/data-driven apps. Not to turn this into a jQuery thread, but this is as simple as it gets.

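Here is a more concise equivalent of your countdownload function:
function countdownload($filename) {
    if (file_exists($filename)) {
        // existing file: increment the stored count
        file_put_contents($filename, (int) file_get_contents($filename) + 1);
    } else {
        // first call: create the file with a count of 0, as the original does
        file_put_contents($filename, 0);
    }
}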

How to watch a file for writes in PHP?

I want to mimic the tail command in PHP, but how can I watch for data appended to the file?

I don't believe there's some magical way to do it. You just have to poll the file size continuously and output any new data. This is actually quite easy; the only real thing to watch out for is that file sizes and other stat data are cached by PHP. The solution is to call clearstatcache() before each filesize() check.
Here's a quick sample that doesn't include any error handling:
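function follow($file)
{
    $size = 0;
    while (true) {
        clearstatcache(); // otherwise filesize() keeps returning a cached value
        $currentSize = filesize($file);
        if ($currentSize < $size) {
            $size = 0; // file was truncated: start from the beginning
        }
        if ($size == $currentSize) {
            usleep(100000); // wait 0.1 s before polling again
            continue;
        }
        $fh = fopen($file, "r");
        fseek($fh, $size);
        while (($d = fgets($fh)) !== false) {
            echo $d;
        }
        $size = ftell($fh); // resume exactly where reading stopped
        fclose($fh);
        flush();
    }
}
follow("file.txt");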

$handle = popen("tail -f /var/log/your_file.log 2>&1", 'r');
while (!feof($handle)) {
    $buffer = fgets($handle);
    echo $buffer; // fgets() keeps the line's own newline, so don't add another
    flush();
}
pclose($handle);

Check out php-tail on Google Code. It's a two-file implementation in PHP and JavaScript, and it had very little overhead in my testing.
It even supports filtering with a grep keyword (useful for ffmpeg, which prints the frame rate etc. every second).

$handler = fopen('somefile.txt', 'r');
// move to the end of the file
fseek($handler, 0, SEEK_END);
// move back to the beginning of the file
fseek($handler, 0);
You will probably also want to consider using stream_get_line().

Instead of polling filesize(), you can regularly check the file's modification time with filemtime(). Its result is cached as well, so call clearstatcache() before each check.
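A small sketch of that check; fileChanged() is a hypothetical helper that remembers the last modification time it saw. One caveat: filemtime() has one-second resolution, so two writes within the same second look like a single change, which is fine for triggering a re-read but not for counting writes.

```php
<?php
// Sketch: report whether a file's modification time changed since the last check.
// $lastMtime is passed by reference and updated; start it at null.
function fileChanged(string $file, ?int &$lastMtime): bool
{
    clearstatcache(true, $file); // filemtime() results are cached too
    $mtime = filemtime($file);
    if ($mtime === false) {
        return false;            // file missing or unreadable
    }
    $changed = ($lastMtime === null || $mtime !== $lastMtime);
    $lastMtime = $mtime;
    return $changed;
}
```

In a polling loop, only when fileChanged($file, $last) returns true would the new data be read and printed.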

Below is what I adapted from the answers above. Call it periodically with an Ajax request and append the output to your 'holder' (a textarea). Hope this helps, and thank you to everyone who contributes to Stack Overflow and other such forums!
/* Used by the programming module to output debug.txt */
session_start();
/* Include your own security checks (valid user, etc.) here; $valid_user is a placeholder */
if (!$valid_user) {
    echo "Invalid system mode for this page.";
    exit; // stop instead of falling through and printing the log
}
$_SESSION['tailSize'] = filesize("./debugLog.txt");
if (!isset($_SESSION['tailPrevSize']) || $_SESSION['tailPrevSize'] > $_SESSION['tailSize'])
{
    $_SESSION['tailPrevSize'] = $_SESSION['tailSize'];
}
$tailDiff = (int) ($_SESSION['tailSize'] - $_SESSION['tailPrevSize']);
$_SESSION['tailPrevSize'] = $_SESSION['tailSize'];
$handle = popen("tail -c " . $tailDiff . " ./debugLog.txt 2>&1", 'r');
while (!feof($handle)) {
    $buffer = fgets($handle);
    echo $buffer;
    flush();
}
pclose($handle);
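The same idea can be done in plain PHP without spawning a tail process: remember a byte offset (for example in the session), seek to it, and read whatever follows. A sketch, with readAppended() as a hypothetical helper; like the session check above, it starts over if the file shrinks.

```php
<?php
// Sketch: return whatever was appended to $file since byte offset $offset.
// Returns [newData, newOffset].
function readAppended(string $file, int $offset): array
{
    clearstatcache(true, $file);
    $size = filesize($file);
    if ($size === false) {
        return ['', $offset];
    }
    if ($size < $offset) {
        $offset = 0;              // file was truncated or rotated: start over
    }
    if ($size === $offset) {
        return ['', $offset];     // nothing new
    }
    $fh = fopen($file, 'rb');
    fseek($fh, $offset);
    $data = stream_get_contents($fh);
    $newOffset = ftell($fh);
    fclose($fh);
    return [$data, $newOffset];
}
```

In the Ajax endpoint it could be called as [$chunk, $_SESSION['tailOffset']] = readAppended('./debugLog.txt', $_SESSION['tailOffset'] ?? 0); echo $chunk; Starting at offset 0 returns the whole file on the first call; initialise the offset with filesize() to skip existing content, as the version above does.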
