I have the following string:
window['test'] = false;
window['options'] = true;
window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"};
How would I go about extracting the JSON data in window['data']? The example data I provided is just a small sample of what really exists. There could be more data before and/or after window['data'].
I've tried this but had no luck:
preg_match( '#window["test"] = (.*?);\s*$#m', $html, $matches );
There are several issues that I can see.
Your string uses single quotes: window['test'] not window["test"], which you have in your regular expression. This means you should use double quotes to enclose your regular expression (or escape the quotes).
Your regular expression has unescaped brackets, which are used to create a character class. You should use \[ and \] instead of just [ and ].
You say you are looking for data but your regular expression looks for test.
You have a $ at the end of the regular expression, which (with the m modifier) means you won't match if anything other than whitespace follows the bit you matched on that line.
Also your data seems incomplete, there are some missing brackets at the end, but I think that is just a copy-paste error.
So I would try:
php > preg_match("#window\['data'\]\s*=\s*(.*?);#", $html, $matches);
php > print_r($matches);
Array
(
[0] => window['data'] = {"id":2345,"stuff":[{"id":704,"name":"test"};
[1] => {"id":2345,"stuff":[{"id":704,"name":"test"}
)
Of course then you must use json_decode() to convert the JSON string ($matches[1]) into an object or associative array that you can use.
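Putting the two steps together, a minimal sketch might look like the following. It assumes the JSON is valid (the sample is completed with the missing `]}`) and that the assignment is the last statement on its line; `extractWindowData` is a made-up helper name, not part of any library.

```php
<?php
// Hypothetical helper: returns the decoded value of window['data'], or null.
function extractWindowData(string $html): ?array
{
    // Greedy (.*) runs to the last ";" on the line, so a ";" inside the JSON
    // does not cut the match short. "." does not cross newlines by default.
    if (!preg_match("#window\\['data'\\]\\s*=\\s*(.*);#", $html, $matches)) {
        return null;
    }
    $decoded = json_decode($matches[1], true); // true => associative arrays
    return is_array($decoded) ? $decoded : null;
}

$html = <<<'HTML'
window['test'] = false;
window['options'] = true;
window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"}]};
HTML;

$data = extractWindowData($html);
echo $data['stuff'][0]['name'], "\n"; // prints "test"
```

If several statements share one line (minified output, for example), the greedy match would overshoot and the lazy `(.*?)` form shown above is the safer starting point.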
You can use this regex:
window\['data'\]\s*=\s*(.*?);
The match information is:
MATCH 1
1. [67-111] `{"id":2345,"stuff":[{"id":704,"name":"test"}`
As regex101 suggests, you could use code like this:
$re = "/window\\['data'\\]\\s*=\\s*(.*);/";
$str = "window['test'] = false; window['options'] = true; window['data'] = {\"id\":2345,\"stuff\":[{\"id\":704,\"name\":\"test\"};";
preg_match_all($re, $str, $matches);
You can parse the window data with the regular expression:
/^window\[['"](\w+)['"]\]\s*=\s*(.+);\s*$/m
Then you can retrieve the pieces by their original index in the window data structures, and parse the JSON at your leisure.
$data = <<<_E_
window['test'] = false;
window['options'] = true;
window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"}]};
_E_;
$regex = <<<_E_
/^window\[['"](\w+)['"]\]\s*=\s*(.+);\s*$/m
_E_;
if( preg_match_all($regex,$data,$matches) > 0 ) {
var_dump($matches);
$index = array_search('data',$matches[1]);
if( $index !== false ) { // array_search() returns false when nothing is found; 0 is a valid index
var_dump(json_decode($matches[2][$index]));
} else { echo 'no data section'; }
} else { echo 'no matches'; }
Output:
// $matches
array(3) {
[0]=>
array(3) {
[0]=> string(24) "window['test'] = false; "
[1]=> string(26) "window['options'] = true; "
[2]=> string(69) "window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"}]};"
}
[1]=>
array(3) {
[0]=> string(4) "test"
[1]=> string(7) "options"
[2]=> string(4) "data"
}
[2]=>
array(3) {
[0]=> string(5) "false"
[1]=> string(4) "true"
[2]=> string(51) "{ "id" : 2345, "stuff": [{"id":704,"name":"test"}]}"
}
}
// decoded JSON
object(stdClass)#1 (2) {
["id"]=> int(2345)
["stuff"]=>
array(1) {
[0]=>
object(stdClass)#2 (2) {
["id"]=> int(704)
["name"]=> string(4) "test"
}
}
}
Note: I fixed the JSON in your example to be valid so it would actually parse.
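As a variation on the same idea, here is a sketch that collects every assignment into one array keyed by name, so no array_search() is needed. The function name `parseWindowAssignments` is an assumption for illustration, and it assumes one assignment per line with a JSON-compatible value on the right-hand side.

```php
<?php
// Hypothetical helper: maps each window['name'] to its decoded value.
function parseWindowAssignments(string $source): array
{
    $result = [];
    $pattern = '/^window\[[\'"](\w+)[\'"]\]\s*=\s*(.+);\s*$/m';
    if (preg_match_all($pattern, $source, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[1] is the key, $m[2] the raw value text.
            $result[$m[1]] = json_decode($m[2], true);
        }
    }
    return $result;
}

$source = "window['test'] = false;\n"
        . "window['options'] = true;\n"
        . "window['data'] = { \"id\" : 2345, \"stuff\": [{\"id\":704,\"name\":\"test\"}]};";

$parsed = parseWindowAssignments($source);
var_dump($parsed['data']['id']); // int(2345)
```

Note that json_decode() returns null both for invalid JSON and for a literal null, so a stricter version would also check json_last_error().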
I have an array of links that I am trying to match using preg_match_all with a regex, but it gives me the same result every time:
foreach ($result[0] as $temp) {
preg_match_all($regex1, $temp["content"], $matches);
$storeUrl[]= $matches;
}
foreach ($storeUrl as $tem) {
preg_match_all($regex2, $tem[],$matches);
$storeUrllist [$count1++]= $matches;
}
$matches works fine in the first foreach, but in the second foreach it always returns the same output, even when nothing matches.
I'm just guessing that you might want to design an expression that'd be similar to:
\b(?:https?:\/\/)?(?:www\.)?\w+\.(?:org|org2|com)\b
or:
\b(?:https?:\/\/)(?:www\.)?\w+\.(?:org|org2|com)\b
or:
\b(?:https?:\/\/)(?:www\.)\w+\.(?:org|org2|com)\b
not sure though.
Test
$re = '/\b(?:https?:\/\/)?(?:www\.)?\w+\.(?:org|org2|com)\b/m';
$str = 'some content http://alice.com or https://bob.org or https://www.foo.com or or baz.org2 ';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(4) {
[0]=>
array(1) {
[0]=>
string(16) "http://alice.com"
}
[1]=>
array(1) {
[0]=>
string(15) "https://bob.org"
}
[2]=>
array(1) {
[0]=>
string(19) "https://www.foo.com"
}
[3]=>
array(1) {
[0]=>
string(8) "baz.org2"
}
}
The expression is explained in the top-right panel of regex101.com if you wish to explore, simplify, or modify it; there you can also watch how it matches against some sample inputs.
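Independent of the pattern itself, one likely cause in the loops from the question is that `$tem[]` is not valid for reading, and each `$tem` is an array of match groups rather than a string. The sketch below shows one way to chain two passes; the input rows and both patterns are placeholders, since the original `$regex1` and `$regex2` were not shown.

```php
<?php
// Placeholder input; in the question this comes from $result[0].
$rows = [
    ['content' => 'see http://alice.com and https://www.foo.com'],
    ['content' => 'nothing here'],
    ['content' => 'visit baz.org2 or https://bob.org'],
];

// Assumed patterns: $regex1 finds URL-like tokens, $regex2 keeps only .com hosts.
$regex1 = '/\b(?:https?:\/\/)?(?:www\.)?\w+\.(?:org|org2|com)\b/';
$regex2 = '/\.com$/';

$storeUrl = [];
foreach ($rows as $row) {
    // $matches is overwritten on every call, so copy what is needed right away.
    if (preg_match_all($regex1, $row['content'], $matches)) {
        // $matches[0] holds the full matches as plain strings.
        $storeUrl = array_merge($storeUrl, $matches[0]);
    }
}

$storeUrlList = [];
foreach ($storeUrl as $url) {
    // The subject passed to preg_match() must be a string.
    if (preg_match($regex2, $url)) {
        $storeUrlList[] = $url;
    }
}

print_r($storeUrlList);
```

Flattening the first pass into a list of strings keeps the second loop simple and avoids carrying stale `$matches` contents from one iteration to the next.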
I'm just trying to get the page count from a local PDF file.
I converted the PDF object to a string and tried to get the page count from it.
I tried using a regular expression, but I'm not able to get it right.
Please help.
Below is the string text:
object(setasign\Fpdi\PdfParser\Type\PdfDictionary)#2728 (1) { ["value"]=>
array(3) { ["Size"]=> object(setasign\Fpdi\PdfParser\Type\PdfNumeric)#2726
(1) { ["value"]=> int(3028) } ["Root"]=>
object(setasign\Fpdi\PdfParser\Type\PdfIndirectObjectReference)#2725 (2) {
["generationNumber"]=> int(0) ["value"]=> int(3027) } ["Info"]=>
object(setasign\Fpdi\PdfParser\Type\PdfIndirectObjectReference)#2731 (2) {
["generationNumber"]=> int(0) ["value"]=> int(3026) } } } } } }
["objects":protected]=> array(0) { } }
["pageCount":protected]=> int(96)
["pages":protected]=> array(0) { } } } ["currentReaderId":protected]=>
string(71)
"C:\xampp\files\journals\2\articles\13\submission\mergedpdf\allFiles.pdf"
["importedPages":protected]=> array(0) { } ["objectMap":protected]=>
array(0) { } ["objectsToCopy":protected]=> array(1) { ["C:\xampp\files\journals\2\articles\13\submission\mergedpdf\allFiles.pdf"]=>
array(0) { } } }
I need to get the pageCount value shown in the text above using a regular expression.
My regular expression code is below:
ob_start();
var_dump($pdf);//this was an object so i converted it to string for pattern matching.
$result = ob_get_clean();//shows the result in string.
$subject = "pageCount";
$pattern = '/^pageCount/';//pattern to match to get page count
preg_match($pattern, substr($subject,20), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
You could use the method provided by the library, like:
$filename = 'some-file.pdf';
require_once('library/SetaPDF/Autoload.php');
// or if you use composer require_once('vendor/autoload.php');
$document = SetaPDF_Core_Document::loadByFilename($filename);
$pages = $document->getCatalog()->getPages();
$pageCount = $pages->count();
echo $pageCount;
I would rather loop through the given object than use a regex to get values out of it.
But if this is really what you need, here is your regex: \["pageCount":protected\]=> int\(\d*\)
You can test it here:
https://regex101.com/r/RyGMwb/2
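To actually read the number rather than just locate it, the digits can be wrapped in a capture group. This is a small sketch under the assumption that the var_dump() text looks like the excerpt in the question; `pageCountFromDump` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: pulls the integer after ["pageCount":protected]=>
// out of var_dump() text, or returns null when it is absent.
function pageCountFromDump(string $dump): ?int
{
    // \s* tolerates the space or newline var_dump() prints after "=>".
    $pattern = '/\["pageCount":protected\]=>\s*int\((\d+)\)/';
    if (preg_match($pattern, $dump, $m)) {
        return (int) $m[1];
    }
    return null;
}

$dump = '["objects":protected]=> array(0) { } } '
      . '["pageCount":protected]=> int(96) '
      . '["pages":protected]=> array(0) { }';

var_dump(pageCountFromDump($dump)); // int(96)
```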
This is my string: $string="VARHELLO=helloVARWELCOME=123qwa";
I want to get 'hello' and '123qwa' from string.
My pseudo code is.
if /^VARHELLO/ exist
get hello(or whatever comes after VARHELLO and before VARWELCOME)
if /^VARWELCOME/ exist
get 123qwa(or whatever comes after VARWELCOME)
Note: values from 'VARHELLO' and 'VARWELCOME' are dynamic, so 'VARHELLO' could be 'H3Ll0' or VARWELCOME could be 'W3l60m3'.
Example:
$string="VARHELLO=H3Ll0VARWELCOME=W3l60m3";
Here is some code that will parse this string out for you into a more usable array.
<?php
$string="VARHELLO=helloVARWELCOME=123qwa";
$parsed = [];
$parts = explode('VAR', $string);
foreach($parts AS $part){
if(strlen($part)){
$subParts = explode('=', $part, 2); // limit 2 keeps any '=' inside the value intact
$parsed[$subParts[0]] = $subParts[1];
}
}
var_dump($parsed);
Output:
array(2) {
["HELLO"]=>
string(5) "hello"
["WELCOME"]=>
string(6) "123qwa"
}
Or, an alternative using parse_str (http://php.net/manual/en/function.parse-str.php)
<?php
$string="VARHELLO=helloVARWELCOME=123qwa";
$string = str_replace('VAR', '&', $string);
var_dump($string);
parse_str($string, $vars); // the second argument is required as of PHP 8.0
var_dump($vars['HELLO']);
var_dump($vars['WELCOME']);
Output:
string(27) "&HELLO=hello&WELCOME=123qwa"
string(5) "hello"
string(6) "123qwa"
Jessica's answer is perfect, but if you want to get it using preg_match:
$string="VARHELLO=helloVARWELCOME=123qwa";
preg_match('/VARHELLO=(.*?)VARWELCOME=(.*)/is', $string, $m);
var_dump($m);
Your results will be in $m[1] and $m[2]:
array(3) {
[0]=>
string(31) "VARHELLO=helloVARWELCOME=123qwa"
[1]=>
string(5) "hello"
[2]=>
string(6) "123qwa"
}
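If the variable names are not known in advance, a more generic pattern is possible. The sketch below assumes names consist of uppercase letters after the VAR prefix and that a value never contains something that itself looks like `VARNAME=`; `parseVarString` is an illustrative name only.

```php
<?php
// Hypothetical helper: splits "VARNAME=value" runs into a name => value array.
function parseVarString(string $input): array
{
    // Name: uppercase letters after "VAR". Value: the shortest run of text
    // up to the next "VARNAME=" or the end of the string.
    $pattern = '/VAR([A-Z]+)=(.*?)(?=VAR[A-Z]+=|$)/s';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $parsed = [];
    foreach ($matches as $m) {
        $parsed[$m[1]] = $m[2];
    }
    return $parsed;
}

print_r(parseVarString('VARHELLO=H3Ll0VARWELCOME=W3l60m3'));
```

The lookahead `(?=...)` checks for the next variable without consuming it, so the following match can start exactly there.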
I have the following simple code, which checks whether the password contains at least two lowercase letters.
preg_match("/^(?=.*[a-z].*[a-z])+$/")
But this gave me the following error message:
Compilation failed : nothing to repeat at offset 19.
I can't figure out where I went wrong.
Later Edit
The following code which checks if i have at least two special characters works well:
preg_match("/^(?=.*[!##$%^&*].*[!##$%^&*])[a-zA-Z_!##%^&*]+$/")
Try this
<?php
preg_match("/^(.*[a-z].*[a-z].*)$/", "2313123g123123u123", $result);
var_dump($result);
preg_match("/^(.*[a-z].*[a-z].*)$/", "65665656s656565", $result);
var_dump($result);
?>
result
array(2) {
[0]=>
string(18) "2313123g123123u123"
[1]=>
string(18) "2313123g123123u123"
}
array(0) {
}
(?= ) defines a lookahead assertion, and you cannot repeat an assertion. Did you mean to use a non-capturing group, (?: )?
$data = array('ab', '123a345b', '123');
foreach ($data as $subject) {
$found = preg_match("/^(?:.*[a-z].*[a-z])+$/", $subject, $match);
var_dump($found, $match);
}
Output:
int(1)
array(1) {
[0]=>
string(2) "ab"
}
int(1)
array(1) {
[0]=>
string(8) "123a345b"
}
int(0)
array(0) {
}
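One caveat with the non-capturing version: because the group must end on a lowercase letter right before $, a subject such as 'ab1' is rejected even though it contains two lowercase letters. A sketch that keeps the lookahead but moves the quantifier onto an ordinary `.` avoids this; `hasTwoLowercase` is a hypothetical name.

```php
<?php
// Hypothetical helper: true when the subject holds at least two lowercase letters.
function hasTwoLowercase(string $password): bool
{
    // The lookahead tests the condition without consuming anything;
    // .+ then accepts whatever the rest of the (non-empty) subject is.
    return preg_match('/^(?=.*[a-z].*[a-z]).+$/', $password) === 1;
}

foreach (['ab', '123a345b', 'ab1', '123', 'a1'] as $subject) {
    var_dump(hasTwoLowercase($subject));
}
```

The first three subjects give true and the last two give false. To restrict the allowed characters as well, `.+` can be swapped for a character class, as in the special-character example from the question.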
I have registry data as text, as below:
/Classes/CLSID/AppID,SZ,{0010890e-8789-413c-adbc-48f5b511b3af},
/Classes/CLSID/InProcServer32,KEY,,2011-10-14 00:00:33
/Classes/CLSID/InProcServer32/,EXPAND_SZ,%SystemRoot%\x5Csystem32\x5CSHELL32.dll,
/Classes/CLSID/InProcServer32/ThreadingModel,SZ,Apartment,
/Classes/CLSID/,KEY,,2011-10-14 00:00:36
/Classes/CLSID/,SZ,,
/Classes/CLSID/InprocServer32,KEY,,2011-10-14 00:00:36
/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Cmstime.dll,
Then I split the text on "\n" with explode() into $registry, which gives the array below:
var_dump($registry);
[1]=> string(121) "/Classes/CLSID/AppID,SZ,{0010890e-8789-413c-adbc-48f5b511b3af},"
[2]=> string(139) "/Classes/CLSID/InProcServer32,KEY,,2011-10-14 00:00:33"
[3]=> string(89) "/Classes/CLSID/InProcServer32/,EXPAND_SZ,%SystemRoot%\x5Csystem32\x5CSHELL32.dll,"
[4]=> string(103) "/Classes/CLSID/InProcServer32/ThreadingModel,SZ,Apartment,"
[5]=> string(103) "/Classes/CLSID/,KEY,,2011-10-14 00:00:36"
[6]=> string(121) "/Classes/CLSID/,SZ,,"
[7]=> string(139) "/Classes/CLSID/InprocServer32,KEY,,2011-10-14 00:00:36"
[8]=> string(89) "/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Cmstime.dll,"
I also have keywords in an array:
var_dump($keywords);
[1]=> string(12) "Math.dll"
[2]=> string(12) "System.dll"
[3]=> string(12) "inetc.dll"
[4]=> string(12) "time.dll"
I want to show the lines in $registry that contain a string from $keywords, so I created the function below:
function separate($line) {
global $keywords;
foreach ($keywords as $data_filter) {
if (strpos($line, $data_filter) !== false) {
return true;
}
}
return false;
}
$separate = array_filter($registry, 'separate');
Since $keywords contains "time.dll", the code produces the result below:
var_dump($separate);
[1]=> string(89) "/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Cmstime.dll,"
In my case this result is wrong, because mstime.dll != time.dll, so the line should not be reported.
The output should be empty.
Let's say I replace "\x5C" with a space; is there any function that can do the job? Thank you in advance.
There's preg_match.
To fit the array_filter approach you are already using:
function separate($line) {
global $keywords;
foreach ($keywords as $data_filter) {
// preg_quote() escapes '.', '\' and the '/' delimiter so they match literally;
// '\x5C' in single quotes is the literal four-character text from the dump
$pattern = '/' . preg_quote('\x5C' . $data_filter, '/') . '/';
if (preg_match($pattern, $line)) {
return true;
}
}
return false;
}
This would return false for
/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Cmstime.dll,
but true for
/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Ctime.dll,
If you're not familiar with Regular Expressions, they are awesome and powerful. You can customize mine as needed to suit your situation.
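As a self-contained sketch of this kind of filter, the version below passes the keywords in as a parameter instead of using a global. It assumes the dump contains the literal four characters `\x5C` as a path separator, as in the sample lines; `matchesKeyword` is an illustrative name.

```php
<?php
// Hypothetical variant of separate() that takes the keywords as a parameter.
function matchesKeyword(string $line, array $keywords): bool
{
    foreach ($keywords as $keyword) {
        // In single quotes '\x5C' stays as the literal text backslash-x-5-C;
        // preg_quote() then escapes it (and the '.') for use in the pattern.
        $pattern = '/' . preg_quote('\x5C' . $keyword, '/') . '/';
        if (preg_match($pattern, $line)) {
            return true;
        }
    }
    return false;
}

$keywords = ['Math.dll', 'System.dll', 'inetc.dll', 'time.dll'];
$registry = [
    '/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Cmstime.dll,',
    '/Classes/CLSID/InprocServer32/,C:\x5CWINDOWS\x5Csystem32\x5Ctime.dll,',
];

$hits = array_filter($registry, function ($line) use ($keywords) {
    return matchesKeyword($line, $keywords);
});
print_r($hits);
```

Only the second line is kept, because the keyword has to follow the separator directly. array_filter() preserves keys, so the surviving entry keeps index 1.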