Convert PDF documents (.pdf) to Excel documents (.xlsx) using PHP

I have a set of PDF documents that I need to convert to Excel. When I tried to convert them directly, the conversion failed. I then tried converting to HTML instead: all the data is converted, but the whole document is turned into absolutely positioned <p> tags. A sample of the HTML output is shown below. Please suggest some ideas for converting PDF to Excel in PHP.
Sample HTML output:
<p class="ft00" style="margin: 0; padding: 0; font-size: 11px; font-family: Times; color: #000000; position: absolute; top: 863px; left: 683px; white-space: nowrap;"><b>10</b></p>
<p class="ft01" style="margin: 0; padding: 0; font-size: 9px; font-family: Times; color: #000000; -moz-transform: matrix( 0, -1, 0.9999875, 0, 0, 0); -webkit-transform: matrix( 0, -1, 0.9999875, 0, 0, 0); -o-transform: matrix( 0, -1, 0.9999875, 0, 0, 0); -ms-transform: matrix( 0, -1, 0.9999875, 0, 0, 0); -moz-transform-origin: left 75%; -webkit-transform-origin: left 75%; -o-transform-origin: left 75%; -ms-transform-origin: left 75%; position: absolute; top: 207px; left: 63px; white-space: nowrap;"><b>THE</b></p>
Library: PdfToHtml
$pdf = new \TonchikTm\PdfToHtml\Pdf($pdf_file, [
    'pdftohtml_path' => $bindir,
    'pdfinfo_path' => $binexe,
    'outputDir' => getcwd() . '/excelfile/' . uniqid(), // output dir
]);
$newpage = '';
foreach ($pdf->getHtml()->getAllPages() as $page) {
    $newpage .= $page . '<br/>';
}

I've never had a use case to convert PDF to Excel, but what other libraries did you try?
PDFTables looks promising:
https://pdftables.com/pdf-to-excel-api#php
https://github.com/pdftables/php-pdftables-api
If it has to be done with the PDF-to-HTML library you're using, I don't know of an easy or clean way to do it.
Based on the question, does each page return a string containing all the paragraph elements?
If it returns an array of elements, you should be able to loop over each paragraph and use strip_tags() to get the individual values.
If it's a string, look into using Simple HTML DOM to parse the HTML from the string so that you can work with the elements more easily.
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');
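As a rough sketch of the parsing idea above, the positioned paragraphs can be grouped into rows by their top value and ordered by their left value, which yields a table-like array that a spreadsheet writer could consume. This uses PHP's built-in DOMDocument rather than Simple HTML DOM. The function name positionedParagraphsToRows is made up for illustration, and the code assumes that cells on the same row share exactly the same top value, which real PDFs may not guarantee.

```php
<?php
// Hypothetical helper: turn pdftohtml-style positioned <p> tags into rows of cells.
function positionedParagraphsToRows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // the generated markup is not always valid HTML
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $style = $p->getAttribute('style');
        // The lookbehind skips properties such as "padding-top" or "margin-left".
        if (!preg_match('/(?<![-\w])top:\s*(\d+)px/', $style, $top)
            || !preg_match('/(?<![-\w])left:\s*(\d+)px/', $style, $left)) {
            continue;
        }
        // Assumption: paragraphs with the same "top" value belong to the same row.
        $rows[(int) $top[1]][(int) $left[1]] = trim($p->textContent);
    }

    ksort($rows); // top to bottom
    foreach ($rows as &$cells) {
        ksort($cells); // left to right
        $cells = array_values($cells);
    }
    unset($cells);

    return array_values($rows);
}

$sample = '<p style="position: absolute; top: 20px; left: 90px;"><b>10</b></p>'
    . '<p style="position: absolute; top: 20px; left: 10px;">Total</p>'
    . '<p style="position: absolute; top: 5px; left: 10px;">Item</p>';

print_r(positionedParagraphsToRows($sample));
```

Each inner array is one row, so the result could be written out with fputcsv() or handed to a spreadsheet library. If the top values drift by a pixel or two between cells, rounding them to a tolerance before grouping would probably be needed.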

Related

Google Unicode Fonts not rendered properly in mpdf for PHP

I'm trying to use the Baloo 2 Google Font in mPDF to render Hindi text.
I'm using the latest version.
The PDF is rendered, but the complex scripts are not rendered properly.
I'm trying to render the following:
चातुर्यकला त्रिवेदी, but it is rendered as चातुर् यकला त् रविदी
When inspected in the browser, the text is correct in the page source, but it is not displayed properly in the PDF.
The Code in my file is
$html = ' आवेदक का नाम : चातुर्यकला त्रिवेदी';
$filename = 'demofile' . time() . '.pdf';
$mpdf = new \Mpdf\Mpdf([
    'mode' => 'utf-8',
    'default_font' => 'baloo',
]);
$mpdf->debug = true;
$mpdf->WriteHTML("p, td {font-family: baloo; color: transparent; white-space: pre; cursor: text; transform-origin: 0% 0%;}", 1);
$mpdf->WriteHTML($html);
$mpdf->Output($filename, 'D');
I've also copied the fonts into the ttfonts directory and changed the fontVariables.php file as follows:
'fontdata' => [
"baloo" => [
'R' => "BalooRegular.ttf",
'useOTL' => 0xFF,
'useKashida' => 75,
]...
];
With this configuration, the following error is thrown:
GPOS Lookup Type 5, Format 3 not supported (ttfontsuni.php)
Please let me know what I'm doing wrong.

How to call CSS in a PHP "fetch" function (scriptcase / pdf form)

First, about my environment:
I'm working in Scriptcase v9.
I'm trying to create a PDF report with the TCPDF library.
I'm trying to use some CSS in a PHP fetch function, but without success:
Variant A: I stored one big CSS block at the beginning of the file and referenced its IDs from the generated rows.
$_style = "
<style>
html, body {
margin:0;
padding:0
}
table {}
th {
border-bottom: 0.3px solid #D2D2D2;
height: auto;
padding-top: 30px;
padding-bottom: 20px;
text-align: left;
color: #2B2B2B;
}
#summary_tr {padding:0px 0px;}
#summary_tc1 {
width: 5%;
height: auto;
}
#summary_tc2 {
width: 5%;
height: auto;
border-left: 0.5px solid #D2D2D2;
}
#summary_tc3 {
width: 90%;
height: auto;
}
</style>";
function fetch_mresumeright($resright_id)
{
$server = "blabla";
$user = "blabla";
$pass = "blabla";
$db = "blabla";
$output = '';
$con = mysqli_connect($server,$user,$pass,$db) or die("Some error occurred during connection " . mysqli_error($con));
// Write query
$strSQL = "SELECT
summary
FROM
_mresume
WHERE id = ".$resright_id;
// Execute the query.
$query = mysqli_query($con, $strSQL);
if ($query->num_rows > 0)
{
while($row = mysqli_fetch_array($query))
{
$output .= '
<tr id = "summary_tr">
<td id = "summary_td_1"></td>
<td id = "summary_td_2"></td>
<td id = "summary_td_3">'.$row["summary"].'</td>
</tr>';
}
return $output;
}
//Close the connection
mysqli_close($con);
}
$_mresumeright = fetch_mresumeright($var_id);
// create new PDF document
$pdf = new TCPDF(PDF_PAGE_ORIENTATION, PDF_UNIT, PDF_PAGE_FORMAT, true, 'UTF-8', false);
// set document information
$pdf->SetCreator(PDF_CREATOR);
// set default header data
$pdf->SetHeaderData(PDF_HEADER_LOGO, PDF_HEADER_LOGO_WIDTH, PDF_HEADER_TITLE.' 007', PDF_HEADER_STRING);
// set header and footer fonts
$pdf->setHeaderFont(Array(PDF_FONT_NAME_MAIN, '', PDF_FONT_SIZE_MAIN));
$pdf->setFooterFont(Array(PDF_FONT_NAME_DATA, '', PDF_FONT_SIZE_DATA));
// set default monospaced font
$pdf->SetDefaultMonospacedFont(PDF_FONT_MONOSPACED);
// set margins
$pdf->SetMargins(PDF_MARGIN_LEFT, PDF_MARGIN_TOP, PDF_MARGIN_RIGHT);
$pdf->SetHeaderMargin(PDF_MARGIN_HEADER);
$pdf->SetFooterMargin(PDF_MARGIN_FOOTER);
// set auto page breaks
$pdf->SetAutoPageBreak(TRUE, PDF_MARGIN_BOTTOM);
// set image scale factor
$pdf->setImageScale(PDF_IMAGE_SCALE_RATIO);
// set some language-dependent strings (optional)
if (@file_exists(dirname(__FILE__).'/lang/eng.php')) {
    require_once(dirname(__FILE__).'/lang/eng.php');
    $pdf->setLanguageArray($l);
}
$right_column = <<<EOF
//=======right=========
//===css
$_style
//===html
<table>
$_mresumeright
</table>
EOF;
// write the second column - for A3
$pdf->writeHTMLCell(206, '', '82', '', $right_column, 0, 1, 0, true, '', true);
// reset pointer to the last page
$pdf->lastPage();
Variant B: I declared the styles for each ID as a variable and used those variables in my fetch function.
$summary_tr = "
padding:0px 0px;";
$summary_td_1 = "
width: 5%;
height: auto;";
$summary_td_2 = "
width: 5%;
border-left: 0.5px solid #D2D2D2;
height: auto;";
$summary_td_3 = "
width: 90%;
height: auto;";
$_style = "
<style>
html, body {
margin:0;
padding:0
}
table {}
th {
border-bottom: 0.3px solid #D2D2D2;
height: auto;
padding-top: 30px;
padding-bottom: 20px;
text-align: left;
color: #2B2B2B;
}
</style>";
function fetch_mresumeright($resright_id)
{
$server = "blabla";
$user = "blabla";
$pass = "blabla";
$db = "blabla";
$output = '';
$con = mysqli_connect($server,$user,$pass,$db) or die("Some error occurred during connection " . mysqli_error($con));
// Write query
$strSQL = "SELECT
summary
FROM
_mresume
WHERE id = ".$resright_id;
// Execute the query.
$query = mysqli_query($con, $strSQL);
if ($query->num_rows > 0)
{
while($row = mysqli_fetch_array($query))
{
$output .= '
<tr style = "'.$summary_tr.'">
<td style = "'.$summary_td_1.'"></td>
<td style = "'.$summary_td_2.'"></td>
<td style = "'.$summary_td_3.'">'.$row["summary"].'</td>
</tr>';
}
return $output;
}
//Close the connection
mysqli_close($con);
}
$_mresumeright = fetch_mresumeright($var_id);
// create new PDF document
$pdf = new TCPDF(PDF_PAGE_ORIENTATION, PDF_UNIT, PDF_PAGE_FORMAT, true, 'UTF-8', false);
// set document information
$pdf->SetCreator(PDF_CREATOR);
// set default header data
$pdf->SetHeaderData(PDF_HEADER_LOGO, PDF_HEADER_LOGO_WIDTH, PDF_HEADER_TITLE.' 007', PDF_HEADER_STRING);
// set header and footer fonts
$pdf->setHeaderFont(Array(PDF_FONT_NAME_MAIN, '', PDF_FONT_SIZE_MAIN));
$pdf->setFooterFont(Array(PDF_FONT_NAME_DATA, '', PDF_FONT_SIZE_DATA));
// set default monospaced font
$pdf->SetDefaultMonospacedFont(PDF_FONT_MONOSPACED);
// set margins
$pdf->SetMargins(PDF_MARGIN_LEFT, PDF_MARGIN_TOP, PDF_MARGIN_RIGHT);
$pdf->SetHeaderMargin(PDF_MARGIN_HEADER);
$pdf->SetFooterMargin(PDF_MARGIN_FOOTER);
// set auto page breaks
$pdf->SetAutoPageBreak(TRUE, PDF_MARGIN_BOTTOM);
// set image scale factor
$pdf->setImageScale(PDF_IMAGE_SCALE_RATIO);
// set some language-dependent strings (optional)
if (@file_exists(dirname(__FILE__).'/lang/eng.php')) {
    require_once(dirname(__FILE__).'/lang/eng.php');
    $pdf->setLanguageArray($l);
}
$right_column = <<<EOF
//=======right=========
//===css
$_style
//===html
<table>
$_mresumeright
</table>
EOF;
// write the second column - for A3
$pdf->writeHTMLCell(206, '', '82', '', $right_column, 0, 1, 0, true, '', true);
// reset pointer to the last page
$pdf->lastPage();
Neither variant has worked for me.
Any idea where I went wrong?
Unfortunately, my programming experience is quite limited.
Thank you for your time and help,
Dan
Below is a simple example based on your code. It should help you understand how to use HTML/CSS with TCPDF and how to insert content into the HTML. One issue I noticed is that your example never requires the TCPDF library, although that may have been done elsewhere. Additionally, you may not have called $pdf->AddPage();. The example should run inside the TCPDF examples directory.
<?php
$styles = <<<EOF
<style>
.section {
    color: navy;
}
.row {
    background-color: green;
}
.cell-one {
    background-color: blue;
    color: white;
}
</style>
EOF;
function fetchContent() {
    $row = '<tr class="row">
    <td class="cell-one">Cell One</td>
    <td>Cell Two</td>
    </tr>';
    return str_repeat($row,20);
}
require_once('../tcpdf.php');
$pdf = new TCPDF(PDF_PAGE_ORIENTATION, PDF_UNIT, PDF_PAGE_FORMAT, true, 'UTF-8', false);
$pdf->AddPage();
$tableRows = fetchContent();
$html = <<<EOF
$styles
<div class="section">
<h1>Lorem Ipsum</h1>
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
</div>
<table>
$tableRows
</table>
EOF;
$pdf->writeHTML($html, true, false, true, false, '');
$pdf->Output('example', 'I');
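Two details in the question's code are worth noting alongside this example. In Variant A the stylesheet declares #summary_tc1, #summary_tc2 and #summary_tc3, while the generated cells use id="summary_td_1" and so on, so those selectors never match. In Variant B the variables $summary_tr and $summary_td_1 to $summary_td_3 are defined outside fetch_mresumeright(), and PHP functions do not see outer variables unless they are passed in or declared global. The sketch below shows one way to keep selectors and markup in step, using classes because the rows repeat. The helper name buildSummaryRows and the class names are hypothetical, and it does not verify which CSS properties TCPDF actually honours.

```php
<?php
// Hypothetical helper: build table rows whose class names match the <style> block.
function buildSummaryRows(array $summaries): string
{
    $rows = '';
    foreach ($summaries as $summary) {
        // Escape the text so a stray "<" or "&" cannot break the markup.
        $text = htmlspecialchars($summary, ENT_QUOTES, 'UTF-8');
        $rows .= '<tr class="summary-row">'
            . '<td class="summary-gap"></td>'
            . '<td class="summary-rule"></td>'
            . '<td class="summary-text">' . $text . '</td>'
            . "</tr>\n";
    }
    return $rows;
}

// The selectors here use exactly the class names emitted above.
$styles = <<<EOF
<style>
.summary-gap  { width: 5%; }
.summary-rule { width: 5%; border-left: 0.5px solid #D2D2D2; }
.summary-text { width: 90%; }
</style>
EOF;

$html = $styles . "\n<table>\n" . buildSummaryRows(['First & second', 'Third']) . "</table>";
echo $html;
```

The resulting $html string could then be passed to $pdf->writeHTML() in the same way as in the example above.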

phpwkhtmltopdf is generating multiple pages instead of one

I want the HTML to be rendered on a single PDF page, but it is spread over multiple pages.
I tried adding CSS modifications, but that didn't work either. There are plenty of options to choose from, but I can't find one that suits.
Available options:
https://github.com/mikehaertl/phpwkhtmltopdf
http://wkhtmltopdf.org/usage/wkhtmltopdf.txt
php
$options = array(
    'no-outline', // Make Chrome not complain
    'margin-top' => 0,
    'margin-right' => 0,
    'margin-bottom' => 0,
    'margin-left' => 0,
    // Default page options
    'disable-smart-shrinking',
    'user-style-sheet' => PATH . '/assets/pdf.css',
);
pdf.css
table:first {
    position: relative;
    top: 1cm;
    left: 1cm;
    width: 2cm;
    height: 1.6cm;
}
table {
    font-size: 10px;
    font-family: Arial;
}

HTML2PDF page size

I'm using PHP and the HTML2PDF library to generate PDF files. What I'm trying to do is generate a PDF whose page size (width/height) matches the size of the HTML content. How can I achieve this?
My html content is:
<page format="432x240" orientation="L" backcolor="#FFFFFF" style="font: arial;">
<div class="image">
<span class="firstname">$fname</span>
<span class="lastname">$lname</span>
</div>
The css for image class is:
position: relative;width: 100%; /* for IE 6 */ background-image: url(../img/test.png);height: 240px; width: 432px;top: 50%;
And my PHP code is:
$html2pdf = new HTML2PDF('P', 'A4', 'en', true, 'UTF-8', 0);
$html2pdf->pdf->SetDisplayMode('fullpage');
$contentTpl = $this->renderPartial('template_01', array('fname' => $firstname, 'lname' => $lastname), true);
$html2pdf->writeHTML(utf8_encode($contentTpl));
Here is the solution for this problem:
$html2pdf = new HTML2PDF('P', array($width_in_mm,$height_in_mm), 'en', true, 'UTF-8', array(0, 0, 0, 0));
The width and height must be in millimetres. If you're using inches, convert them to mm.
Formula:
$width_in_mm = $width_in_inches * 25.4;
$height_in_mm = $height_in_inches * 25.4;
Don't round the result. Use the exact conversion even if it has a decimal part.
I hope this solves your problem.
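As a small worked example of the conversion above: if the 432 by 240 size from the question is meant in pixels, it first has to be turned into inches. The sketch below assumes 96 pixels per inch, which is an assumption on my part and not something stated in the question. The function names are made up for illustration.

```php
<?php
// Convert inches to millimetres using the exact factor from the answer.
function inchesToMm(float $inches): float
{
    return $inches * 25.4;
}

// Assumption: pixel sizes are interpreted at 96 pixels per inch.
function pixelsToMm(float $pixels, float $pixelsPerInch = 96.0): float
{
    return inchesToMm($pixels / $pixelsPerInch);
}

$widthInMm  = pixelsToMm(432);
$heightInMm = pixelsToMm(240);

printf("%.1f x %.1f mm\n", $widthInMm, $heightInMm);
```

The two values would then go into the array($width_in_mm, $height_in_mm) argument shown in the answer, unrounded.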

PHP cURL to get dynamic content

I am trying to use cURL and PHP to scrape proxies from a web page. However, when I use cURL, all I get in $content is the CSS. The page uses WordPress, so it loads its content dynamically, but I haven't found anything that helps me download the dynamic content. When I use wget on Linux, the page downloads fine.
<?php
//$source1 = file_get_contents('http://www.new-fresh-proxies.blogspot.com/');
$source1 = get_data("http://www.new-fresh-proxies.blogspot.com/");
$array = array();
$source1 = preg_grep('/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}\b/', $array);
// download webpage
function get_data($url) {
    $options = array(
        CURLOPT_RETURNTRANSFER => 1, // return web page
        CURLOPT_HEADER => true, // include response headers in the output
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_USERAGENT => "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13", // who am i
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
        CURLOPT_TIMEOUT => 120, // timeout on response
        CURLOPT_MAXREDIRS => 50, // stop after 50 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno'] = $err;
    $header['errmsg'] = $errmsg;
    $header['content'] = $content;
    return $header;
}
My output:
(
string:203221) HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Expires: Wed, 06 Feb 2013 22:09:23 GMT
Date: Wed, 06 Feb 2013 22:09:23 GMT
Cache-Control: private, max-age=0
Last-Modified: Wed, 06 Feb 2013 20:39:30 GMT
ETag: "c6675d47-80ec-48ee-9c0f-613c9419f172"
Content-Encoding: gzip
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Content-Length: 47132
Server: GSE
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html dir='ltr' xmlns='http://www.w3.org/1999/xhtml' xmlns:b='http://www.google.com/2005/gml/b' xmlns:data='http://www.google.com/2005/gml/data' xmlns:expr='http://www.google.com/2005/gml/expr'>
<head>
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
<script type="text/javascript">(function() { var a=window,b="jstiming",d="tick";var e=function(c){this.t={};this.tick=function(c,p,h){h=void 0!=h?h:(new Date).getTime();this.t[c]=[h,p]};this[d]("start",null,c)},f=new e;a.jstiming={Timer:e,load:f};if(a.performance&&a.performance.timing){var g=a.performance.timing,j=a[b].load,k=g.navigationStart,l=g.responseStart;0<k&&l>=k&&(j[d]("_wtsrt",void 0,k),j[d]("wtsrt_","_wtsrt",l),j[d]("tbsd_","wtsrt_"))}
try{var m=null;a.chrome&&a.chrome.csi&&(m=Math.floor(a.chrome.csi().pageT),j&&0<k&&(j[d]("_tbnd",void 0,a.chrome.csi().startE),j[d]("tbnd_","_tbnd",k)));null==m&&a.gtbExternal&&(m=a.gtbExternal.pageT());null==m&&a.external&&(m=a.external.pageT,j&&0<k&&(j[d]("_tbnd",void 0,a.external.startE),j[d]("tbnd_","_tbnd",k)));m&&(a[b].pt=m)}catch(n){};a.tickAboveFold=function(c){var i=0;if(c.offsetParent){do i+=c.offsetTop;while(c=c.offsetParent)}c=i;750>=c&&a[b].load[d]("aft")};var q=!1;function r(){q||(q=!0,a[b].load[d]("firstScrollTime"))}a.addEventListener?a.addEventListener("scroll",r,!1):a.attachEvent("onscroll",r);
})();</script>
<meta content='true' name='MSSmartTagsPreventParsing'/>
<meta content='blogger' name='generator'/>
<link href='http://www.new-fresh-proxies.blogspot.com/favicon.ico' rel='icon' type='image/x-icon'/>
<link href='http://new-fresh-proxies.blogspot.com/' rel='canonical'/>
<link rel="alternate" type="application/atom+xml" title="New Fresh Proxies - Atom" href="http://new-fresh-proxies.blogspot.com/feeds/posts/default" />
<link rel="alternate" type="application/rss+xml" title="New Fresh Proxies - RSS" href="http://new-fresh-proxies.blogspot.com/feeds/posts/default?alt=rss" />
<link rel="service.post" type="application/atom+xml" title="New Fresh Proxies - Atom" href="http://www.blogger.com/feeds/2001908494944967755/posts/default" />
<link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://www.blogger.com/rsd.g?blogID=2001908494944967755" />
<link rel="openid.server" href="http://www.blogger.com/openid-server.g" />
<link rel="openid.delegate" href="http://new-fresh-proxies.blogspot.com/" />
<!--[if IE]> <script> (function() { var html5 = ("abbr,article,aside,audio,canvas,datalist,details," + "figure,footer,header,hgroup,mark,menu,meter,nav,output," + "progress,section,time,video").split(','); for (var i = 0; i < html5.length; i++) { document.createElement(html5[i]); } try { document.execCommand('BackgroundImageCache', false, true); } catch(e) {} })(); </script> <![endif]-->
<title>New Fresh Proxies</title>
<link type='text/css' rel='stylesheet' href='//www.blogger.com/static/v1/widgets/3950009988-widget_css_bundle.css' />
<link type="text/css" rel="stylesheet" href="//www.blogger.com/dyn-css/authorization.css?targetBlogID=2001908494944967755&zx=c6675d47-80ec-48ee-9c0f-613c9419f172"/>
<style id='page-skin-1' type='text/css'><!--
/*
-----------------------------------------------
Theme Name: Harmonika
Theme URL: http://newwpthemes.com/wordpress-theme/harmonika/
Description: Harmonika is a free WordPress theme with options page and supports the post thumbnails. Suitable for any niche.
Author: NewWpThemes.com
Author URI: http://newwpthemes.com/
Version: 1.0
Tags: Ads Ready, Two Columns, Right Sidebar, Fixed Width, Blogging, Options Page
Template Name: Harmonika
Template URI: http://themecraft.net/2010/06/harmonika-blogger-template
Version: 1.0
Author: Theme Craft
Author URI: http://www.themecraft.net
Harmonika Blogger Template comes under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Malaysia License.
This means it is free to use on your blog, and you must keep the footer link intact, also as a respect to the designer and converter.
We sincerely need your respect to continue our free Blogger template production, thank you.
ThemeCraft.net
----------------------------------------------- */
/* Use this with templates/template-twocol.html */
#navbar-iframe{
height:0px;
visibility:hidden;
display:none;
}
/* -----------------------------------------------------------------------
Blueprint CSS Framework 0.8
http://blueprintcss.org
* Copyright (c) 2007-Present. See LICENSE for more info.
* See README for instructions on how to use Blueprint.
* For credits and origins, see AUTHORS.
* This is a compressed file. See the sources in the 'src' directory.
----------------------------------------------------------------------- */
/* reset.css */
html, body, div, span, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre, a, abbr, acronym, address, code, del, dfn, em, img, q, dl, dt, dd, ol, ul, li, fieldset, form, label, legend, table, caption, tbody, tfoot, thead, tr, th, td {margin:0;padding:0;border:0;font-weight:inherit;font-style:inherit;font-size:100%;font-family:inherit;}
body {line-height:1.5;}
caption, th, td {text-align:left;font-weight:normal;}
blockquote:before, blockquote:after, q:before, q:after {content:"";}
blockquote, q {quotes:"" "";}
img {max-width: 100%; /* not working in IE6*/}
a img {max-width: 100%; /* not working in IE6*/ border:none;}
/* typography.css */
body {font-size:75%;color:#222;background:#000;font-family:"Helvetica Neue", Arial, Helvetica, sans-serif;}
h1, h2, h3, h4, h5, h6 {font-weight:normal;}
h1 {font-size:3em;line-height:1;margin-bottom:0.5em;}
h2 {font-size:2em;margin-bottom:0.75em;}
h3 {font-size:1.5em;line-height:1;margin-bottom:1em;}
h4 {font-size:1.2em;line-height:1.25;margin-bottom:1.25em;}
h5 {font-size:1em;font-weight:bold;margin-bottom:1.5em;}
h6 {font-size:1em;font-weight:bold;}
h1 img, h2 img, h3 img, h4 img, h5 img, h6 img {margin:0;}
p {margin:0 0 1.5em;}
p img.left {float:left;margin:1.5em 1.5em 1.5em 0;padding:0;}
p img.right {float:right;margin:1.5em 0 1.5em 1.5em;}
a:focus, a:hover {color:#000;}
a {color:#009;text-decoration:underline;}
blockquote {margin:1.5em;color:#666;font-style:italic;}
strong {font-weight:bold;}
em, dfn {font-style:italic;}
dfn {font-weight:bold;}
sup, sub {line-height:0;}
abbr, acronym {border-bottom:1px dotted #666;}
address {margin:0 0 1.5em;font-style:italic;}
del {color:#666;}
pre {margin:1.5em 0;white-space:pre;}
pre, code, tt {font:1em 'andale mono', 'lucida console', monospace;line-height:1.5;}
li ul, li ol {margin:0 1.5em;}
ul, ol {margin:0 1.5em 1.5em 1.5em;}
ul {list-style-type:disc;}
ol {list-style-type:decimal;}
dl {margin:0 0 1.5em 0;}
dl dt {font-weight:bold;}
dd {margin-left:1.5em;}
table {margin-bottom:1.4em;width:100%;}
th {font-weight:bold;}
thead th {background:#c3d9ff;}
tr.even td {background:#e5ecf9;}
tfoot {font-style:italic;}
caption {background:#eee;}
.small {font-size:.8em;margin-bottom:1.875em;line-height:1.875em;}
.large {font-size:1.2em;line-height:2.5em;margin-bottom:1.25em;}
.hide {display:none;}
.quiet {color:#666;}
.loud {color:#000;}
.highlight {background:#ff0;}
.added {background:#060;color:#fff;}
.removed {background:#900;color:#fff;}
.first {margin-left:0;padding-left:0;}
.last {margin-right:0;padding-right:0;}
.top {margin-top:0;padding-top:0;}
.bottom {margin-bottom:0;padding-bottom:0;}
/* forms.css */
label {font-weight:bold;}
fieldset {padding:1.4em;margin:0 0 1.5em 0;border:1px solid #ccc;}
legend {font-weight:bold;font-size:1.2em;}
input.text, input.title, textarea, select {margin:0.5em 0;border:1px solid #bbb;}
input.text:focus, input.title:focus, textarea:focus, select:focus {border:1px solid #666;}
input.text, input.title {width:300px;padding:5px;}
input.title {font-size:1.5em;}
textarea {width:390px;height:250px;padding:5px;}
.error, .notice, .success {padding:.8em;margin-bottom:1em;border:2px solid #ddd;}
.error {background:#FBE3E4;color:#8a1f11;border-color:#FBC2C4;}
.notice {background:#FFF6BF;color:#514721;border-color:#FFD324;}
.success {background:#E6EFC2;color:#264409;border-color:#C6D880;}
.error a {color:#8a1f11;}
.notice a {color:#514721;}
.success a {color:#264409;}
/* grid.css */
.container {width:950px;margin:0 auto;}
.showgrid {background:url(src/grid.png);}
.column, div.span-1, div.span-2, div.span-3, div.span-4, div.span-5, div.span-6, div.span-7, div.span-8, div.span-9, div.span-10, div.span-11, div.span-12, div.span-13, div.span-14, div.span-15, div.span-16, div.span-17, div.span-18, div.span-19, div.span-20, div.span-21, div.span-22, div.span-23, div.span-24 {float:left;margin-right:10px;}
.last, div.last {margin-right:0;}
.span-1 {width:30px;}
.span-2 {width:70px;}
.span-3 {width:110px;}
.span-4 {width:150px;}
.span-5 {width:190px;}
.span-6 {width:230px;}
.span-7 {width:270px;}
.span-8 {width:310px;}
.span-9 {width:350px;}
.span-10 {width:390px;}
.span-11 {width:430px;}
.span-12 {width:470px;}
.span-13 {width:510px;}
.span-14 {width:550px;}
.span-15 {width:590px;}
.span-16 {width:630px;}
.span-17 {width:670px;}
.span-18 {width:710px;}
.span-19 {width:750px;}
.span-20 {width:790px;}
.span-21 {width:830px;}
.span-22 {width:870px;}
.span-23 {width:910px;}
.span-24, div.span-24 {width:950px;margin:0;}
input.span-1, textarea.span-1, input.span-2, textarea.span-2, input.span-3, textarea.span-3, input.span-4, textarea.span-4, input.span-5, textarea.span-5, input.span-6, textarea.span-6, input.span-7, textarea.span-7, input.span-8, textarea.span-8, input.span-9, textarea.span-9, input.span-10, textarea.span-10, input.span-11, te...
cURL won't be able to get it directly, since it doesn't execute JavaScript. But if the content comes from an AJAX request, you can make a request to that endpoint directly.
Use the browser dev tools or Firebug to see what is happening.
A couple of things:
Where is your 'output' coming from? I see no displays in your code ...
I also think your preg_grep statement is incorrect. You're searching a blank array and saving the result to the variable you just pulled your data into. Try:
$array = preg_grep('/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}\b/', $source1);
When I run the code and dump $source1['content'] directly after the get_data call, I get a crap-ton of IP addresses ...
It seems to me like either a timeout or a problem with your Regexp.
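One caveat on the preg_grep suggestion: get_data() in the question returns an array with the page under the 'content' key, and preg_grep() keeps or drops whole array elements, so a match hands back the entire page and not the individual addresses. The sketch below contrasts that with preg_match_all() on a short sample string standing in for the downloaded page. The helper name extractProxies is hypothetical.

```php
<?php
// Hypothetical helper: extract every "ip:port" pair from a block of text.
function extractProxies(string $text): array
{
    preg_match_all('/\b\d{1,3}(?:\.\d{1,3}){3}:\d{1,5}\b/', $text, $matches);
    return $matches[0];
}

// Short sample standing in for the downloaded page.
$page = "<li>1.204.168.15:6673</li>\n<li>1.234.45.130:80</li>\n<p>no proxy here</p>";

// preg_grep() filters array elements, so the whole page string comes back unchanged.
$filtered = preg_grep('/\d{1,3}(?:\.\d{1,3}){3}:\d{1,5}/', ['content' => $page]);

// preg_match_all() returns only the matched substrings.
$proxies = extractProxies($page);
print_r($proxies);
```

With the question's code, the equivalent call would be extractProxies($source1['content']), assuming the download itself succeeded.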
Why not stick to file_get_contents like you tried in the first place?
$content = file_get_contents('http://www.new-fresh-proxies.blogspot.com.au');
preg_match_all('/(\d+\.\d+\.\d+\.\d+(:\d+)?)/', $content, $matches);
print_r($matches[1]);
This will print out a list of IPs:
Array
(
[0] => 1.204.168.15:6673
[1] => 1.234.45.130:80
[2] => 1.34.163.101:8080
[3] => 1.34.29.89:8080
[4] => 1.34.8.221:3128
....
Hope that helps.
