I'm trying to parse the content of a .txt file into JSON. This is the file content:
[19-02-2016 16:48:45.505547] [info] System done.
0: array(
'ID' => 'Example 2'
)
This is my code for parsing the file:
$fh = fopen($file, "r");
$content = array();
$content["trace"] = array();
while ($line = fgets($fh))
{
$raw = preg_split("/[\[\]]/", $line);
$entry = array();
$entry["date"] = trim($raw[1]);
$entry["type"] = trim($raw[3]);
$entry["message"] = trim($raw[4]);
$content["trace"][] = $entry;
}
fclose($fh);
return $content;
and this is what $content contains, encoded as JSON:
{
"trace": [{
"date": "19-02-2016 16:48:45.505547",
"type": "info",
"message": "System done."
}, {
"date": "",
"type": "",
"message": ""
}, {
"date": "",
"type": "",
"message": ""
}, {
"date": "",
"type": "",
"message": ""
}]
}
UPDATE: I'm expecting this:
{
"trace": [{
"date": "19-02-2016 16:48:45.505547",
"type": "info",
"message": "System done.",
"ID": "Example 2"
}]
}
As you can see, each line of the array is treated as a new log line, so the while loop creates extra empty entries with no content. I just want to create a new key after "message" and put the array content in it. How can I achieve this?
UPDATE WITH MORE CONTENT IN FILE
[19-02-2016 16:57:17.104504] [info] system done.
0: array(
'ID' => 'john foo'
)
[19-02-2016 16:57:17.110482] [info] transaction done.
0: array(
'ID' => 'john foo'
)
Expected result:
{
"trace": [20]
0: {
"date": "19-02-2016 16:57:17.104504",
"type": "info",
"message": "system done.",
"ID": "john foo"
}
1: {
"date": "19-02-2016 16:57:17.110482",
"type": "info",
"message": "transaction done.",
"ID": "john foo"
}
...
Try this:
Code
<?php
$file = 'test.log';
$content = array();
$content["trace"] = array();
$input = file_get_contents($file); // read the whole log file into one string
preg_match_all('/\[(.*)\][\s]*?\[(.*?)\][\s]*?(.*)[\s][^\']*\'ID\'[ ]*=>[ ]*\'(.*)\'/', $input, $regs, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($regs[0]); $i++) {
$content['trace'][] = array(
'date' => $regs[1][$i],
'type' => trim($regs[2][$i]),
'message' => trim($regs[3][$i]),
'ID' => trim($regs[4][$i]),
);
}
// return $content;
echo '<pre>'; print_r($content); echo '</pre>'; // For testing only
$content = json_encode($content); // For testing only
echo '<pre>' . $content . '</pre>'; // For testing only
Result
PHP array:
Array
(
[trace] => Array
(
[0] => Array
(
[date] => 19-02-2016 16:57:17.104504
[type] => info
[message] => system done.
[ID] => john foo
)
[1] => Array
(
[date] => 19-02-2016 16:57:17.110482
[type] => info
[message] => transaction done.
[ID] => john foo
)
)
)
JSON object (string):
{
"trace":[
{
"date":"19-02-2016 16:57:17.104504",
"type":"info",
"message":"system done.",
"ID":"john foo"
},
{
"date":"19-02-2016 16:57:17.110482",
"type":"info",
"message":"transaction done.",
"ID":"john foo"
}
]
}
Notes re. the RegEx:
The file is read as a whole into a string variable ($input).
The preg_match_all(RegEx) also scans the entire input.
The code iterates over all its hits, where the groups contain these parts…
1: date
2: type
3: message
4: ID
The RegEx in detail:
\[ Match the character “[” literally
( Match the regular expression below and capture its match into backreference number 1
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
[\s] Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
*? Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\[ Match the character “[” literally
( Match the regular expression below and capture its match into backreference number 2
. Match any single character that is not a line break character
*? Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
\] Match the character “]” literally
[\s] Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
*? Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
( Match the regular expression below and capture its match into backreference number 3
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
[\s] Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
[^'] Match any character that is NOT a “'”
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
'ID' Match the characters “'ID'” literally
[ ] Match the character “ ”
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
=> Match the characters “=>” literally
[ ] Match the character “ ”
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
' Match the character “'” literally
( Match the regular expression below and capture its match into backreference number 4
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
' Match the character “'” literally
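If you would rather keep the question's original line-by-line approach, here is a minimal sketch (an alternative, not the method used in the answer above) that starts a new entry on each "[date] [type] message" line and attaches the 'ID' => '...' value to the most recent entry. The function name parseLogLines is hypothetical, and the sketch assumes every ID line follows the header line it belongs to, as in the sample file.

```php
<?php
// Hypothetical helper: builds the "trace" structure from the log's lines.
// Assumes each 'ID' => '...' line belongs to the header line before it.
function parseLogLines(array $lines): array
{
    $content = ['trace' => []];
    foreach ($lines as $line) {
        $line = trim($line); // drop the trailing newline (and any \r)
        if (preg_match('/^\[([^\]]*)\]\s*\[([^\]]*)\]\s*(.*)$/', $line, $m)) {
            // Header line "[date] [type] message": start a new entry
            $content['trace'][] = ['date' => $m[1], 'type' => $m[2], 'message' => $m[3]];
        } elseif ($content['trace'] && preg_match("/'ID'\s*=>\s*'([^']*)'/", $line, $m)) {
            // ID line: add the value to the last entry instead of creating a new one
            $content['trace'][count($content['trace']) - 1]['ID'] = $m[1];
        }
        // Any other line ("0: array(", ")") is skipped
    }
    return $content;
}
```

Called as json_encode(parseLogLines(file('test.log'))), it should give the same trace array as the regex answer for the sample file, while making it easy to attach further keys later.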
I'm trying to figure out how to match the following values with preg_match using:
^[\S].*[\S]{3,10}$
Unfortunately, the minimum only takes effect from a length of 4, and the maximum of 10 is ignored entirely: the pattern still matches a length of 11.
Disallow leading and trailing spaces
Allow any character inside
Allow space within characters
Enforce min of 3 and max of 10 (not working)
Test set that can be used with https://www.phpliveregex.com:
[
[
"Test",
true
],
[
"Test Test",
true
],
[
"Test-Test",
true
],
[
"Test'Test",
true
],
[
"Test,Test",
true
],
[
null,
false
],
[
" ",
false
],
[
" Test ",
false
],
[
"12",
false
],
[
"12345678901",
false
]
]
Thanks in advance for your help.
You may use
^(?=.{4,10}$)\S.*\S$
See regex demo
Details
^ - start of string
(?=.{4,10}$) - a lookahead requiring four to ten chars other than line break chars up to the end of the string (use {3,10} instead if the minimum should be 3, as stated in the question)
\S - a non-whitespace char
.* - 0 or more chars other than line break chars as many as possible
\S - a non-whitespace char
$ - end of string.
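To check the pattern against the test set from the question, a quick sketch like this could be used. The helper name isValidName is made up for illustration, and null is rejected up front because preg_match expects a string.

```php
<?php
// Hypothetical helper: true when $s is 4-10 chars with no leading/trailing whitespace
function isValidName(?string $s): bool
{
    return $s !== null && preg_match('/^(?=.{4,10}$)\S.*\S$/', $s) === 1;
}

// Test set from the question: [input, expected]
$tests = [
    ["Test", true], ["Test Test", true], ["Test-Test", true],
    ["Test'Test", true], ["Test,Test", true], [null, false],
    [" ", false], [" Test ", false], ["12", false], ["12345678901", false],
];
foreach ($tests as [$input, $expected]) {
    printf("%-15s %s\n", var_export($input, true), isValidName($input) === $expected ? 'ok' : 'FAIL');
}
```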
I am trying to extract a specific JavaScript object from a page containing the usual HTML markup.
I have tried to use regex, but I can't get it to parse the HTML correctly when the HTML contains a line break.
An example can be seen here: https://regex101.com/r/b8zN8u/2
The HTML I am trying to extract looks like this:
<script>
DATA.tracking.user = {
age: "19",
name: "John doe"
}
</script>
I'm using the following regex: DATA.tracking.user=(.*?)}
<?php
$re = '/DATA.tracking.user = (.*?)\}/m';
$str = '<script>
DATA.tracking.user = { age: "19", name: "John doe" }
</script>';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
If I parse DATA.tracking.user = { age: "19", name: "John doe" } without any line breaks, it works fine, but if I try to parse:
DATA.tracking.user = {
age: "19",
name: "John doe"
}
it fails as soon as line breaks are involved.
Any help would be greatly appreciated.
Thanks.
You will need to specify whitespace (\s) in your pattern in order to match JavaScript code that contains line breaks.
For example, if you use the following code:
<?php
$re = '/DATA.tracking.user = \{\s*.*\s*.*\s*\}/';
$str = '<script>
DATA.tracking.user = {
age: "19",
name: "John doe"
}
</script>';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches[0]);
?>
You will get the following output:
Array
(
[0] => DATA.tracking.user = {
age: "19",
name: "John doe"
}
)
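The \s*.*\s*.* pattern above is tied to exactly two property lines. As an alternative sketch (not part of the original answer), a negated character class such as [^}]* matches line breaks without any modifier, so it copes with any number of lines; it assumes the object body contains no nested } characters.

```php
<?php
$str = '<script>
DATA.tracking.user = {
  age: "20",
  name: "Jane Doe",
  int: 49
}
</script>';

// [^}]* matches any character except "}", including line breaks,
// so the number of property lines does not matter
$re = '/DATA\.tracking\.user\s*=\s*\{([^}]*)\}/';
if (preg_match($re, $str, $matches)) {
    print_r($matches[1]); // the object body between the braces
}
```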
The simple solution to your problem is to use the s pattern modifier to command the . (any character) to also match newline characters -- which it does not by default.
And you should:
escape your literal dots.
write the \{ outside of your capture group.
omit the m pattern modifier because you aren't using anchors.
...BUT...
If this was my task and I was going to be processing the data from the extracted string, I would probably start breaking up the components at extraction-time with the power of \G.
Code: (Demo) (Pattern Demo)
$htmls[] = <<<HTML
DATA.tracking.user = { age: "19", name: "John doe", int: 55 } // This works
HTML;
$htmls[] = <<<HTML
DATA.tracking.user = {
age: "20",
name: "Jane Doe",
int: 49
} // This does not work
HTML;
foreach ($htmls as $html) {
var_export(preg_match_all('~(?:\G(?!^),|DATA\.tracking\.user = \{)\s+([^:]+): (\d+|"[^"]*")~', $html, $out, PREG_SET_ORDER) ? $out : []);
echo "\n --- \n";
}
Output:
array (
0 =>
array (
0 => 'DATA.tracking.user = { age: "19"',
1 => 'age',
2 => '"19"',
),
1 =>
array (
0 => ', name: "John doe"',
1 => 'name',
2 => '"John doe"',
),
2 =>
array (
0 => ', int: 55',
1 => 'int',
2 => '55',
),
)
---
array (
0 =>
array (
0 => 'DATA.tracking.user = {
age: "20"',
1 => 'age',
2 => '"20"',
),
1 =>
array (
0 => ',
name: "Jane Doe"',
1 => 'name',
2 => '"Jane Doe"',
),
2 =>
array (
0 => ',
int: 49',
1 => 'int',
2 => '49',
),
)
---
Now you can simply iterate the matches and work with [1] (the keys) and [2] (the values). This is a basic solution that can be further tailored to suit your project data. Admittedly, it doesn't account for values that contain an escaped double quote. Adding this feature would be no trouble. Accounting for more complex value types may be more of a challenge.
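As a sketch of that iteration step, the matches can be folded into an associative array. The function name extractUser is hypothetical, and the value handling (stripping the double quotes, casting bare digits to int) is an assumption about what you want; like the pattern itself, it does not handle escaped quotes.

```php
<?php
// Hypothetical helper built on the \G pattern above: returns ['key' => value, ...]
function extractUser(string $html): array
{
    $pattern = '~(?:\G(?!^),|DATA\.tracking\.user = \{)\s+([^:]+): (\d+|"[^"]*")~';
    $result = [];
    if (preg_match_all($pattern, $html, $out, PREG_SET_ORDER)) {
        foreach ($out as $m) {
            // Quoted values lose their quotes; bare digits become integers
            $result[$m[1]] = $m[2][0] === '"' ? substr($m[2], 1, -1) : (int) $m[2];
        }
    }
    return $result;
}

var_export(extractUser("DATA.tracking.user = {\n  age: \"20\",\n  name: \"Jane Doe\",\n  int: 49\n}"));
```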
You need to add the 's' modifier to the end of your regex - otherwise, "." does not include newlines. See this:
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
So basically change your regex to be:
'/DATA.tracking.user = (.*?)\}/ms'
Also, you should escape your other dots (otherwise you will also match "DATAYtrackingZuser"). So...
'/DATA\.tracking\.user = (.*?)\}/ms'
I'd also add in the open curly bracket and not enforce the single space around the equal sign, so:
'/DATA\.tracking\.user\s*=\s*\{(.*?)\}/ms'
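Putting those suggestions together, a minimal sketch might look like this. It uses only the s modifier; the m modifier is left out here because it only changes how ^ and $ behave, and this pattern has no anchors.

```php
<?php
$str = '<script>
DATA.tracking.user = {
age: "19",
name: "John doe"
}
</script>';

// s (PCRE_DOTALL) lets "." match newlines; the literal dots are escaped
$re = '/DATA\.tracking\.user\s*=\s*\{(.*?)\}/s';
if (preg_match($re, $str, $matches)) {
    echo trim($matches[1]); // the object body without the braces
}
```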
Since you seem to be scraping/reading the page anyway (so you have a local copy), you can simply replace all the newline characters in the HTML page with spaces; then it should work without even changing your regex.
Refer to this for the ASCII values:
https://www.techonthenet.com/ascii/chart.php
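A sketch of that approach, keeping the question's original pattern unchanged. It assumes the page uses \n, \r\n or \r line endings; \r\n is listed first so it becomes a single space.

```php
<?php
$html = "<script>\nDATA.tracking.user = {\n  age: \"19\",\n  name: \"John doe\"\n}\n</script>";

// Replace every line break with a space so "." never has to cross a newline
$flat = str_replace(["\r\n", "\r", "\n"], ' ', $html);

// The question's pattern, unchanged
$re = '/DATA.tracking.user = (.*?)\}/m';
preg_match_all($re, $flat, $matches, PREG_SET_ORDER, 0);
print_r($matches);
```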
$string = '/start info#example.com';
$pattern = '/{command} {name}#{domain}';
I want to get the parameters as an array in PHP, like the example below:
['command' => 'start', 'name' => 'info', 'domain' => 'example.com']
and
$string = '/start info#example.com';
$pattern = '/{command} {email}';
['command' => 'start', 'email' => 'info#example.com']
and
$string = '/start info#example.com';
$pattern = '{command} {email}';
['command' => '/start', 'email' => 'info#example.com']
If it's a single-line string, you can use preg_match and a regular expression such as this:
preg_match('/^\/(?P<command>\w+)\s(?P<name>[^#]+)\#(?P<domain>.+?)$/', '/start info#example.com', $match );
But depending on the variation in your data, you may have to adjust the regex a bit. This outputs:
command [1-6] start
name [7-11] info
domain [12-23] example.com
but it will also have the numeric index in the array.
https://regex101.com/r/jN8gP7/1
Just to break this down a bit, in English:
The leading ^ is the start of the line, then a literal slash \/, then a named capture of \w+ (one or more of a-z A-Z 0-9 _), then a whitespace character \s, then a named capture of anything but the # sign ([^#]+), then the # sign itself, then a named capture of anything (.+?) up to the end $.
This will capture anything in this format:
/(abc123_) space (anything but #)#(anything)
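The regex above covers the first template only. Since the question asks for arbitrary {placeholder} templates, here is a sketch that builds the regex from the template itself and drops the numeric indexes afterwards. The function name parseTemplate is hypothetical, and it assumes placeholder names are valid PCRE group names (letters, digits and underscore, not starting with a digit) and that each placeholder matches at least one character, as few as possible.

```php
<?php
// Hypothetical helper: turns '/{command} {name}#{domain}' into an anchored regex
// with named groups, then returns only the named captures.
function parseTemplate(string $pattern, string $string): ?array
{
    // Even indexes are literal text, odd indexes are placeholder names
    $parts = preg_split('/\{(\w+)\}/', $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regex = '';
    foreach ($parts as $i => $part) {
        $regex .= ($i % 2 === 0)
            ? preg_quote($part, '~')       // escape literal text such as "#"
            : '(?P<' . $part . '>.+?)';     // placeholder becomes a named group
    }
    if (!preg_match('~^' . $regex . '$~', $string, $m)) {
        return null;
    }
    // Keep only the named groups, dropping the numeric indexes
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(parseTemplate('/{command} {name}#{domain}', '/start info#example.com'));
print_r(parseTemplate('/{command} {email}', '/start info#example.com'));
print_r(parseTemplate('{command} {email}', '/start info#example.com'));
```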
In PHP, when a user saves some text, I need to split the string like this:
"genesis1:3-16" ==> "genesis", "1", "3", "16"
"revelation2:3-5" ==> "revelation", "2", "3", "5"
The conditions: there will be no whitespace anywhere in the string, and I need to split on the ":" and "-" symbols and where the letters end. The numbers go up to '999' at most (3 digits).
$sample = "genesis1:3-16";
//magic happens....
$book = ""; // genesis
$chapter = ""; // 1
$start_verse = ""; // 3
$end_verse = ""; //16
I have limited knowledge of regular expressions and can't figure it out using only strpos and substr...
Thank you in advance
I think this regex would accomplish what you are after:
([a-z]+)(\d{1,3}):(\d{1,3})-(\d{1,3})
Demo (with explanation of what each part does): https://regex101.com/r/uP4gW6/1
PHP Usage:
preg_match('~([a-z]+)(\d{1,3}):(\d{1,3})-(\d{1,3})~', 'genesis1:3-16', $data);
print_r($data);
Output:
Array
(
[0] => genesis1:3-16
[1] => genesis
[2] => 1
[3] => 3
[4] => 16
)
With preg_match, index 0 holds the full match; the subsequent indexes hold each captured group.
If you have a fixed set of names the book could be, you could replace [a-z]+ with that list separated by |, for example revelation|genesis|othername.
$parts = array();
preg_match('/^(.*?)\s*(\d+):(\d+)-(\d+)$/', $sample, $parts);
$book = $parts[1];
$chapter = $parts[2];
$startVerse = $parts[3];
$endVerse = $parts[4];
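The snippet above reads $parts without checking whether preg_match found a match. A sketch that checks the result first, anchors the pattern and keeps the 3-digit limit from the question might look like this; the function name parseReference and the /i flag (to accept "Genesis") are illustrative choices, not part of either answer.

```php
<?php
// Hypothetical helper: returns [book, chapter, start verse, end verse] or null
function parseReference(string $ref): ?array
{
    // ^...$ rejects extra text; \d{1,3} enforces the 999 limit; /i accepts "Genesis"
    if (!preg_match('/^([a-z]+)(\d{1,3}):(\d{1,3})-(\d{1,3})$/i', $ref, $m)) {
        return null;
    }
    return array_slice($m, 1); // drop the full match at index 0
}

$result = parseReference('genesis1:3-16');
if ($result !== null) {
    [$book, $chapter, $start_verse, $end_verse] = $result;
    echo "$book / $chapter / $start_verse / $end_verse\n";
}
```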
You could use this simple pattern:
([a-zA-Z]+|\d{1,3})
Demo
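Because that pattern finds each run of letters or of 1-3 digits separately, it is meant for preg_match_all. A short sketch of how it could fill the question's variables; note that it does not validate the ":" and "-" separators, and a 4-digit number would be split into two pieces.

```php
<?php
$sample = "genesis1:3-16";

// Every run of letters, or of 1-3 digits, becomes one element of $matches[0]
preg_match_all('/([a-zA-Z]+|\d{1,3})/', $sample, $matches);

// Assumes exactly four pieces were found
[$book, $chapter, $start_verse, $end_verse] = $matches[0];
echo "$book, $chapter, $start_verse, $end_verse\n";
```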
I have the following regular expression in JavaScript and I would like to have the exact same functionality (or similar) in PHP:
// -=> REGEXP - match "x bed" , "x or y bed":
var subject = query;
var myregexp1 = /(\d+) bed|(\d+) or (\d+) bed/img;
var match = myregexp1.exec(subject);
while (match != null){
if (match[1]) { console.log("X => " + match[1]); }
else { console.log("X => " + match[2] + " AND Y => " + match[3]); }
match = myregexp1.exec(subject);
}
This code searches a string for a pattern matching "x beds" or "x or y beds".
When a match is located, variable x and variable y are required for further processing.
QUESTION:
How do you construct this code snippet in php?
Any assistance appreciated guys...
You can use the regex unchanged: PCRE supports every feature this pattern uses. The only exception is the /g flag, which doesn't exist in PHP. Instead you have preg_match_all, which returns all the results in an array:
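preg_match_all('/(\d+) bed|(\d+) or (\d+) bed/im', $subject, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    // $match[1] is '' when the "x or y bed" alternative matched
    echo $match[1] !== '' ? "X => $match[1]\n" : "X => $match[2] AND Y => $match[3]\n";
}
PREG_SET_ORDER is the other trick here, and will keep the $match array similar to how you'd get it in JavaScript.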
I've found RosettaCode to be useful when answering these kinds of questions.
It shows how to do the same thing in various languages. Regex is just one example; they also have file I/O, sorting, all kinds of basic stuff.
You can use preg_match_all($pattern, $subject, &$matches, $flags, $offset) to run a regular expression over a string and store all the matches in an array.
After running the regex, all the matches can be found in the array you passed as the third argument. You can then iterate through these matches using foreach.
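Without setting $flags (i.e. with the default PREG_PATTERN_ORDER), for a subject containing "5 beds" and "8 or 9 beds" your array will have a structure like this:
$array[0] => array( // All full matches (the pattern stops at "bed")
0 => "5 bed",
1 => "8 or 9 bed"
);
$array[1] => array( // Capture group 1 of each match ("" where it did not participate)
0 => "5",
1 => ""
);
$array[2] => array( // Capture group 2 of each match
0 => "",
1 => "8"
);
$array[3] => array( // Capture group 3 of each match
0 => "",
1 => "9"
);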
This behaviour isn't the same as in JavaScript, and I personally don't like it that much. To make it more "JavaScript-like", set $flags to PREG_SET_ORDER. Each element of your array now corresponds to one JavaScript exec() result:
$array[0] => array(
0 => "5 bed", // the full match
1 => "5", // capture group 1
);
$array[1] => array(
0 => "8 or 9 bed",
1 => "", // group 1 did not participate (undefined in JavaScript)
2 => "8",
3 => "9"
);
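To get null instead of "" for groups that did not participate (closer to JavaScript's undefined), PHP 7.2+ also offers the PREG_UNMATCHED_AS_NULL flag, which can be combined with PREG_SET_ORDER. A sketch that collects x and y for each match; the function name extractBeds and the x/y keys are illustrative, not from the question.

```php
<?php
// Hypothetical helper: returns [['x' => int, 'y' => int|null], ...] for each match
function extractBeds(string $subject): array
{
    preg_match_all('/(\d+) bed|(\d+) or (\d+) bed/i', $subject, $matches,
        PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);
    $results = [];
    foreach ($matches as $match) {
        if (isset($match[1])) {
            // "x bed" alternative: only group 1 is set
            $results[] = ['x' => (int) $match[1], 'y' => null];
        } else {
            // "x or y bed" alternative: group 1 is null, groups 2 and 3 are set
            $results[] = ['x' => (int) $match[2], 'y' => (int) $match[3]];
        }
    }
    return $results;
}

print_r(extractBeds('5 beds; 8 or 9 beds'));
```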