Get a value from string using regular expression in PHP - php

I'm trying to get the page count from a local PDF file.
I dumped the PDF object to a string and tried to extract the page count from it.
I tried using a regular expression, but I can't get the pattern right.
Please help.
Below is the string:
object(setasign\Fpdi\PdfParser\Type\PdfDictionary)#2728 (1) { ["value"]=>
array(3) { ["Size"]=> object(setasign\Fpdi\PdfParser\Type\PdfNumeric)#2726
(1) { ["value"]=> int(3028) } ["Root"]=>
object(setasign\Fpdi\PdfParser\Type\PdfIndirectObjectReference)#2725 (2) {
["generationNumber"]=> int(0) ["value"]=> int(3027) } ["Info"]=>
object(setasign\Fpdi\PdfParser\Type\PdfIndirectObjectReference)#2731 (2) {
["generationNumber"]=> int(0) ["value"]=> int(3026) } } } } } }
["objects":protected]=> array(0) { } }
["pageCount":protected]=> int(96)
["pages":protected]=> array(0) { } } } ["currentReaderId":protected]=>
string(71)
"C:\xampp\files\journals\2\articles\13\submission\mergedpdf\allFiles.pdf"
["importedPages":protected]=> array(0) { } ["objectMap":protected]=>
array(0) { } ["objectsToCopy":protected]=> array(1) { ["C:\xampp\files\journals\2\articles\13\submission\mergedpdf\allFiles.pdf"]=>
array(0) { } } }
I need to get the pageCount value (int(96) in the dump above) using a regular expression.
Here is my regular expression code:
ob_start();
var_dump($pdf);//this was an object so i converted it to string for pattern matching.
$result = ob_get_clean();//shows the result in string.
$subject = "pageCount";
$pattern = '/^pageCount/';//pattern to match to get page count
preg_match($pattern, substr($subject,20), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

You could use the method provided by the library, for example:
$filename = 'some-file.pdf';
require_once('library/SetaPDF/Autoload.php');
// or if you use composer require_once('vendor/autoload.php');
$document = SetaPDF_Core_Document::loadByFilename($filename);
$pages = $document->getCatalog()->getPages();
$pageCount = $pages->count();
echo $pageCount;

I would rather work with the object itself than use a regex to pull values out of its dump.
But if this is really what you need, here is your regex: \["pageCount":protected\]=> int\(\d*\)
You can test it here:
https://regex101.com/r/RyGMwb/2
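As a minimal sketch of how that pattern could be applied in PHP: the version below adds a capture group so the number itself can be read from $matches[1], and uses \s* because var_dump() normally prints a line break after "=>". The $dump string is a shortened sample of the output in the question, not the full dump.

```php
<?php
// Shortened sample of the var_dump() output from the question (assumed input).
$dump = '["objects":protected]=> array(0) { } } '
      . '["pageCount":protected]=> int(96) '
      . '["pages":protected]=> array(0) { }';

// Capture the digits inside int(...); \s* tolerates spaces or newlines after "=>".
$pattern = '/\["pageCount":protected\]=>\s*int\((\d+)\)/';

$pageCount = null;
if (preg_match($pattern, $dump, $matches)) {
    // $matches[0] is the whole match, $matches[1] the captured number.
    $pageCount = (int) $matches[1];
}

echo $pageCount, PHP_EOL;
```

In the question's code, $dump would be the $result string returned by ob_get_clean().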

Related

preg_match_all() creates multiple empty arrays and one with match. How to retrieve the right one?

Using PHP 8.0.
I want to search a line for regex matches and replace all those matches with a string. Here is the code I wrote to find matches:
// $lines is an array of html document lines
function include_files(array $lines): array {
$pattern = '/{FILE="[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+"}/';
foreach ($lines as $line){
preg_match_all($pattern, $line, $matches);
var_dump($matches);
foreach ($matches[0] as $match) {
$file_name = get_block_file($match[0]);
$file_content = file_get_contents($file_name);
$line = str_replace($match[0], $file_content, $line);
}
}
return $lines;
}
The problem is that var_dump($matches) displays the following:
array(1) { [0]=> array(0) { } }
array(1) { [0]=> array(0) { } }
array(1) { [0]=> array(0) { } }
array(1) { [0]=> array(0) { } }
array(1) { [0]=> array(0) { } }
array(1) { [0]=> array(0) { } }
array(1) { [0]=> array(0) { } }
array(1) { [0]=> array(1) { [0]=> string(6) "{FILE="1.txt"" } }
array(1) { [0]=> array(0) { } }
array(1) { [0]=> array(0) { } }
One of these arrays contains what I need, but I can neither access it nor understand where all the other arrays come from. How can I fix this behavior?
preg_replace and preg_replace_callback can take an array as the subject, so you don't need the loop or multiple replaces. You didn't supply any sample data for $lines so assuming your pattern works as needed:
$pattern = '/{FILE="[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+"}/';
$lines = preg_replace_callback($pattern,
function($m) {
return file_get_contents(get_block_file($m[0]));
}, $lines);
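A self-contained sketch of the same idea, for illustration only: since get_block_file() and the real files are not shown in the question, the $blocks map below is a hypothetical stand-in for them, and a capture group is added around the file name.

```php
<?php
// Hypothetical stand-in for get_block_file() + file_get_contents().
$blocks = ['1.txt' => 'Hello from 1.txt'];

// Assumed sample input: one line without a placeholder, one with.
$lines = [
    '<p>intro</p>',
    '<div>{FILE="1.txt"}</div>',
];

// Same pattern as above, with a capture group around the file name.
$pattern = '/{FILE="([A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+)"}/';

// Passing an array as the subject processes every line and returns an array.
$lines = preg_replace_callback(
    $pattern,
    function (array $m) use ($blocks) {
        // $m[0] is the whole placeholder, $m[1] the captured file name.
        return $blocks[$m[1]] ?? $m[0]; // leave unknown placeholders untouched
    },
    $lines
);

print_r($lines);
```

Lines without a placeholder pass through unchanged, so no per-line loop or emptiness check is needed.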
I think your problem is related to the $matches variable, because it is passed by reference (&$matches) and is filled again on each of the "n" iterations.
preg_match_all ( string $pattern , string $subject , array &$matches = null , int $flags = 0 , int $offset = 0 ) : int|false|null
A possible solution would be to initialise the variable to null before calling preg_match_all.
Example:
$pattern = '/{FILE="[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+"}/';
foreach ($lines as $line) {
    $matches = null; // reset before each call
    preg_match_all($pattern, $line, $matches);
    // ... use $matches[0] here ...
}
I have not been able to test it, but I think it could be a quick fix.
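For reference, a small sketch of the loop-based variant under assumed sample input. preg_match_all() overwrites $matches on every call and returns the number of full matches, so a line without a placeholder simply yields an empty $matches[0]; checking the return value lets the loop skip those lines. Each entry of $matches[0] is already the full matched string.

```php
<?php
// Assumed sample input: one line without a placeholder, one with.
$lines = [
    '<p>no placeholder here</p>',
    '<div>{FILE="1.txt"}</div>',
];
$pattern = '/{FILE="[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+"}/';

$found = [];
foreach ($lines as $i => $line) {
    // Returns the number of full matches (0 if none, false on error).
    $count = preg_match_all($pattern, $line, $matches);
    if ($count === false || $count === 0) {
        continue; // nothing to replace on this line
    }
    foreach ($matches[0] as $match) {
        // $match is the whole matched string, e.g. {FILE="1.txt"}
        $found[$i][] = $match;
    }
}

print_r($found);
```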

Cannot get html attribute using PHP Simple Html DOM

I am trying to get the "sold" info from an eBay listing: https://www.ebay.co.uk/itm/Box-With-Tail-Pipe-Rear-Back-Silencer-Fits-Citroen-C2-C3-I-C3-Pluriel-GCN499/254292997729?hash=item3b350b3661:g:clEAAOSwnhldLB4J.
I want to get the "1 sold" text in the upper right corner of the page. I am using the class "vi-txt-underline" to find it, but it is not working. Does anyone know how this can be done, using another attribute or a different approach? Here is the code:
$sold = $html->find(".vi-text-underline", 0);
if ($sold != null) {
    $item['sold'] = $sold->find("a", 0)->plaintext;
} else {
    $item['sold'] = '';
}
["tag"]=>
string(4) "text"
["attr"]=>
array(0) {
}
["children"]=>
array(0) {
}
["nodes"]=>
array(0) {
}
["parent"]=>
*RECURSION*
["_"]=>
array(1) {
[4]=>
string(6) "1 sold"
The above is part of the var_dump output for the $sold variable.
I am using an array $item[] because I also collect other info earlier in the code.
Get the page contents:
$url = "https://www.ebay.co.uk/itm/Box-With-Tail-Pipe-Rear-Back-Silencer-Fits-Citroen-C2-C3-I-C3-Pluriel-GCN499/254292997729?hash=item3b350b3661:g:clEAAOSwnhldLB4J";
$content = file_get_contents($url);
Then search the contents for the text you want:
echo strpos($content,'1 sold');
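strpos() only reports where a known string sits. If the number itself is what you need, a regular expression over the raw HTML is one option. The sketch below runs on an assumed fragment of markup, since the real eBay page may look different and can change at any time.

```php
<?php
// Assumed fragment of the listing markup; the real page may differ.
$content = '<div><a href="#">1 sold</a></div>';

$soldCount = null;
// Digits (optionally with thousands separators) followed by the word "sold".
if (preg_match('/(\d[\d,]*)\s+sold/i', $content, $m)) {
    $soldCount = (int) str_replace(',', '', $m[1]);
}

var_dump($soldCount);
```

In the answer's code, $content would come from file_get_contents($url) instead of the hard-coded string.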

How to add data to OBJECT from Wordpress get_results in PHP

Seems really easy, but I can't seem to figure it out...
I have a simple line that gets mysql results through wordpress like this:
$sql_results = $wpdb->get_results($sql_phrase);
Then I encode it as JSON and echo it: json_encode($sql_results);
However, I want to add other data before I encode it as JSON, and I'm not sure how.
$sql_results basically gives me a list of post IDs, titles and categories.
It looks like this in var_dump (this is just the first row):
array(1)
{
[0]=> object(stdClass)#2737 (7)
{
["ID"]=> string(4) "2700"
["post_title"]=> string(18) "The compact helmet"
["category"]=> string(5) "Other"
}
}
Now, to start with something easy, I'd like every row in the result to get an extra key-value pair. I tried the following but got a "500 Internal error":
foreach($sql_search as $key => $value)
{
$value['pic_img'] = "test";
$sql_search[$key]=$value;
}
$result=$sql_search;
The rows in your var_dump are stdClass objects, so set a property with -> instead of using array syntax:
$sql_results = array(
    (object) array("ID" => "2700", "post_title" => "The compact helmet", "category" => "Other"),
);
foreach ($sql_results as $key => $value)
{
    $value->solution = 'good';
    $sql_results[$key] = $value;
}
$result = $sql_results;
var_dump($result);
For comparison, array syntax works when each row is an associative array:
$test = array(
    array("ID" => "35", "name" => "Peter", "age" => "43"),
    array("ID" => "34", "name" => "James", "age" => "19"),
    array("ID" => "31", "name" => "Joe", "age" => "40"),
);
foreach ($test as $key => $value)
{
    $value['solution'] = 'good';
    $test[$key] = $value;
}
$result = $test;
var_dump($result);
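Putting it together with the JSON step from the question, a minimal sketch: the $sql_results array below is an assumed stand-in for what $wpdb->get_results() returned in the var_dump, and pic_img is the key name used in the question.

```php
<?php
// Stand-in for $wpdb->get_results($sql_phrase): an array of stdClass rows.
$sql_results = [
    (object) ['ID' => '2700', 'post_title' => 'The compact helmet', 'category' => 'Other'],
];

foreach ($sql_results as $row) {
    // $row is an object handle, so this changes the row stored in the array.
    $row->pic_img = 'test';
}

// Encode only after all extra data has been added.
$json = json_encode($sql_results);
echo $json, PHP_EOL;
```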

PHP Regular Expression to extract JSON data

I have the following string:
window['test'] = false;
window['options'] = true;
window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"};
How would I go about extracting the JSON data in window['data']? The example data I provided is just a small sample of what really exists. There could be more data before and/or after window['data'].
I've tried this but had no luck:
preg_match( '#window["test"] = (.*?);\s*$#m', $html, $matches );
There are several issues that I can see.
Your string uses single quotes: window['test'] not window["test"], which you have in your regular expression. This means you should use double quotes to enclose your regular expression (or escape the quotes).
Your regular expression has unescaped brackets, which are used to create a character class. You should use \[ instead of just [.
You say you are looking for data but your regular expression looks for test.
You have a $ at the end of the regular expression, which means you won't match if there is anything other than whitespace between the semicolon and the end of the line.
Also, your data seems incomplete: there are some missing brackets at the end, but I think that is just a copy-paste error.
So I would try:
php > preg_match("#window\['data'\]\s*=\s*(.*?);#", $html, $matches);
php > print_r($matches);
Array
(
[0] => window['data'] = {"id":2345,"stuff":[{"id":704,"name":"test"};
[1] => {"id":2345,"stuff":[{"id":704,"name":"test"}
)
Of course then you must use json_decode() to convert the JSON string ($matches[1]) into an object or associative array that you can use.
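A hedged sketch of that last step, including the failure case: json_decode() returns null when the captured text is not valid JSON, which is what happens with the truncated sample from the question. The helper name extractWindowData() is made up for this example, and the pattern is anchored to the end of the line so the lazy group cannot stop at a semicolon earlier in the line.

```php
<?php
// Hypothetical helper: returns the decoded window['data'] value, or null.
function extractWindowData(string $html): ?array
{
    // Lazy capture up to the semicolon that ends the line (m flag: $ = end of line).
    $pattern = '#window\[\'data\'\]\s*=\s*(.*?);\s*$#m';
    if (!preg_match($pattern, $html, $matches)) {
        return null; // no window['data'] assignment found
    }
    $decoded = json_decode($matches[1], true);
    // Invalid JSON or a non-array value is treated as "no data".
    return is_array($decoded) ? $decoded : null;
}

// Sample with the missing brackets added so the JSON is valid.
$valid = <<<'HTML'
window['test'] = false;
window['options'] = true;
window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"}]};
HTML;

// The truncated sample exactly as posted in the question.
$broken = 'window[\'data\'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"};';

var_dump(extractWindowData($valid));
var_dump(extractWindowData($broken));
```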
You can use this regex:
window\['data'\]\s*=\s*(.*?);
The match information is:
MATCH 1
1. [67-111] `{"id":2345,"stuff":[{"id":704,"name":"test"}`
As regex101 suggests, you could use code like this:
$re = "/window\\['data'\\]\\s*=\\s*(.*);/";
$str = "window['test'] = false; window['options'] = true; window['data'] = {\"id\":2345,\"stuff\":[{\"id\":704,\"name\":\"test\"};";
preg_match_all($re, $str, $matches);
You can parse the window data with the regular expression:
/^window\[['"](\w+)['"]\]\s*=\s*(.+);\s*$/m
Then you can retrieve the pieces by their original index in the window data structures, and parse the JSON at your leisure.
$data = <<<_E_
window['test'] = false;
window['options'] = true;
window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"}]};
_E_;
$regex = <<<_E_
/^window\[['"](\w+)['"]\]\s*=\s*(.+);\s*$/m
_E_;
if( preg_match_all($regex,$data,$matches) > 0 ) {
var_dump($matches);
$index = array_search('data',$matches[1]);
if( $index !== false ) { // array_search() returns false when 'data' is not found
var_dump(json_decode($matches[2][$index]));
} else { echo 'no data section'; }
} else { echo 'no matches'; }
Output:
// $matches
array(3) {
[0]=>
array(3) {
[0]=> string(24) "window['test'] = false; "
[1]=> string(26) "window['options'] = true; "
[2]=> string(69) "window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"}]};"
}
[1]=>
array(3) {
[0]=> string(4) "test"
[1]=> string(7) "options"
[2]=> string(4) "data"
}
[2]=>
array(3) {
[0]=> string(5) "false"
[1]=> string(4) "true"
[2]=> string(51) "{ "id" : 2345, "stuff": [{"id":704,"name":"test"}]}"
}
}
// decoded JSON
object(stdClass)#1 (2) {
["id"]=> int(2345)
["stuff"]=>
array(1) {
[0]=>
object(stdClass)#2 (2) {
["id"]=> int(704)
["name"]=> string(4) "test"
}
}
}
Note: I fixed the JSON in your example to be valid so it would actually parse.

Extract specific PHP strings in array based on keywords

I have registry data as text, like this:
/Classes/CLSID/AppID,SZ,{0010890e-8789-413c-adbc-48f5b511b3af},
/Classes/CLSID/InProcServer32,KEY,,2011-10-14 00:00:33
/Classes/CLSID/InProcServer32/,EXPAND_SZ,%SystemRoot%\x5Csystem32\x5CSHELL32.dll,
/Classes/CLSID/InProcServer32/ThreadingModel,SZ,Apartment,
/Classes/CLSID/,KEY,,2011-10-14 00:00:36
/Classes/CLSID/,SZ,,
/Classes/CLSID/InprocServer32,KEY,,2011-10-14 00:00:36
/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Cmstime.dll,
Then I split the text on "\n" with explode() into $registry, which gives the array below:
var_dump($registry);
[1]=> string(121) "/Classes/CLSID/AppID,SZ,{0010890e-8789-413c-adbc-48f5b511b3af},"
[2]=> string(139) "/Classes/CLSID/InProcServer32,KEY,,2011-10-14 00:00:33"
[3]=> string(89) "/Classes/CLSID/InProcServer32/,EXPAND_SZ,%SystemRoot%\x5Csystem32\x5CSHELL32.dll,"
[4]=> string(103) "/Classes/CLSID/InProcServer32/ThreadingModel,SZ,Apartment,"
[5]=> string(103) "/Classes/CLSID/,KEY,,2011-10-14 00:00:36"
[6]=> string(121) "/Classes/CLSID/,SZ,,"
[7]=> string(139) "/Classes/CLSID/InprocServer32,KEY,,2011-10-14 00:00:36"
[8]=> string(89) "/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Cmstime.dll,"
I also have keywords in an array:
var_dump($keywords);
[1]=> string(12) "Math.dll"
[2]=> string(12) "System.dll"
[3]=> string(12) "inetc.dll"
[4]=> string(12) "time.dll"
I want to show the lines in $registry that contain one of the strings in $keywords, so I wrote the function below:
function separate($line) {
global $keywords;
foreach ($keywords as $data_filter) {
if (strpos($line, $data_filter) !== false) {
return true;
}
}
return false;
}
$separate = array_filter($registry, 'separate');
Since $keywords contains "time.dll", the code produces the result below:
var_dump($separate);
[1]=> string(89) "/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Cmstime.dll,"
In my case this result is wrong, because mstime.dll != time.dll and the match is misleading.
The output should be empty.
Let's say I replace the "\x5C" with a space: is there any function that can do the job? Thank you in advance.
You can use preg_match.
To keep the array_filter approach you already have:
function separate($line) {
global $keywords;
foreach ($keywords as $data_filter) {
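// preg_quote() escapes regex metacharacters such as '.' in the keyword
$data_filter = preg_quote($data_filter, '/');
// '\\\\x5C' is the regex \\x5C, i.e. the literal text "\x5C" that precedes the file name in the data
if (preg_match('/\\\\x5C' . $data_filter . '/', $line)) {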
return true;
}
}
return false;
}
This would return false for
/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Cmstime.dll,
but true for
/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Ctime.dll,
If you're not familiar with Regular Expressions, they are awesome and powerful. You can customize mine as needed to suit your situation.
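As a variation, here is a self-contained sketch that builds one pattern from all keywords and filters with preg_grep() instead of a global variable. It assumes the data contains the literal four characters \x5C before each file name, as in the sample lines, and that a file name is followed by a comma or the end of the line; the second registry line is the made-up "time.dll" example from above.

```php
<?php
// Sample lines; single quotes keep \x5C as literal text.
$registry = [
    '/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Cmstime.dll,',
    '/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Ctime.dll,',
];
$keywords = ['Math.dll', 'System.dll', 'inetc.dll', 'time.dll'];

// Escape each keyword so '.' is matched literally.
$quoted = array_map(function ($keyword) {
    return preg_quote($keyword, '/');
}, $keywords);

// Literal "\x5C", then one of the keywords, then a comma or end of line.
$pattern = '/\\\\x5C(?:' . implode('|', $quoted) . ')(?=,|$)/';

// preg_grep() keeps the entries that match; array_values() reindexes them.
$matched = array_values(preg_grep($pattern, $registry));

print_r($matched);
```

Only the line ending in \x5Ctime.dll is kept, because mstime.dll is not directly preceded by \x5C followed by the keyword.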
