PHP: How to extract JSON strings out of a string dump

I have a huge string dump that contains a mix of regular text and JSON. I want to separate/remove the JSON objects from the string dump and get the text only.
Here is an example:
This is some text {'JSON':'Object'} Here's some more text {'JSON':'Object'} Yet more text {'JSON':'Object'} Again, some text.
My goal is to get a text dump that looks like this (basically the JSON is removed):
This is some text Here's some more text Yet more text Again, some text.
I need to do this all in PHP. The text dump is always random, and so is the JSON data structure (most of it is deeply nested). The dump may or may not start with JSON, and it may or may not contain more than one JSON object.
I have tried using json_decode on the string, but the result ends up as NULL.
EDIT: Amal's answer is really close to what I want (see the 2nd comment below):
$str = preg_replace('#\{.*?\}#s', '', $str);
However, it doesn't get rid of nested objects at all; e.g. data contained in brackets: [] or [{}]
Sorry, I'm not an expert in regex.
I realized that some of you may need a more concrete example of the string dump I'm dealing with; therefore I've created a gist (please note that this is not static data; the data in the dump will always be different; my example above just simplifies the string I'm working with): https://gist.github.com/anonymous/6855800

I wanted you to post the code you used in your json_decode() attempt, but oh well...
You can use a recursive regex for nested braces in PHP:
$res = preg_replace('~\{(?:[^{}]|(?R))*\}~', '', $text);
regex101 demo (The part highlighted in blue will be removed).
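A minimal sketch that wraps the recursive pattern in a function and tidies the whitespace left behind. It assumes braces never occur inside JSON string values or in the surrounding text (a `{` inside a value would throw the match off), and a top-level `[...]` array that is not wrapped in braces is left in place. The possessive `[^{}]++` is a small change from the pattern above to limit backtracking on unbalanced input; `strip_json_objects` is a hypothetical name.

```php
<?php
// Remove every balanced {...} block, including nested ones, from $text.
// Assumption: no braces inside JSON string values or in the plain text.
function strip_json_objects(string $text): string
{
    // (?R) recurses into the whole pattern, so inner {...} pairs are consumed
    // as units; [^{}]++ is possessive to avoid runaway backtracking.
    $withoutJson = preg_replace('~\{(?:[^{}]++|(?R))*\}~', '', $text);
    // Collapse the double spaces left where objects were removed.
    // Note: this also turns newlines into single spaces.
    return trim(preg_replace('~\s+~', ' ', $withoutJson));
}

$dump = "This is some text {'JSON':'Object'} Here's some more text {'JSON':'Object'} Yet more text {'JSON':'Object'} Again, some text.";
echo strip_json_objects($dump), PHP_EOL;
```

On the example from the question this yields exactly the desired text dump.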

Use a stack and iterate over the string from the beginning:
for ($i = 0; $i < strlen($str); $i++) {
}
Whenever you find $str[$i] == '{' while the stack is empty, push it onto the stack and set the start variable to $i:
$start = $i;
From then on, whenever a { or [ occurs in the string, push it onto the stack.
If a } occurs and the top of the stack is not {, or a ] occurs and the top of the stack is not [, then this is not valid JSON.
Otherwise, pop the top of the stack, and keep doing so until the stack is empty.
At that point you have the end position:
$end = $i;
The substring from $start to $end is one JSON string.
Push this string into another array that collects all the JSON strings,
and keep processing until you reach the end of the input.
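A minimal PHP sketch of that stack-based scan, assuming brackets never appear inside JSON string values (the scan does not track quotes, so a `}` inside a string would end an object early). `extract_json_strings` is a hypothetical name; each returned substring can then be passed to json_decode().

```php
<?php
// Stack-based scan: returns every complete top-level {...} span in $str.
// Assumption: brackets inside JSON string values are not handled.
function extract_json_strings(string $str): array
{
    $found = [];   // complete JSON candidates
    $stack = [];   // currently open '{' and '[' characters
    $start = null; // offset of the outermost '{'
    $pairs = ['}' => '{', ']' => '['];

    for ($i = 0, $len = strlen($str); $i < $len; $i++) {
        $ch = $str[$i];
        if ($ch === '{' && $stack === []) {
            // A new candidate object starts here.
            $start = $i;
            $stack[] = $ch;
        } elseif ($stack !== [] && ($ch === '{' || $ch === '[')) {
            $stack[] = $ch;
        } elseif ($stack !== [] && isset($pairs[$ch])) {
            if (end($stack) !== $pairs[$ch]) {
                // Mismatched closer: not valid JSON, drop this candidate.
                $stack = [];
                $start = null;
                continue;
            }
            array_pop($stack);
            if ($stack === []) {
                // Stack is empty again: $start..$i is one complete object.
                $found[] = substr($str, $start, $i - $start + 1);
                $start = null;
            }
        }
    }
    return $found;
}

print_r(extract_json_strings('Text {"a":[1,{"b":2}]} more text {"c":3} end'));
```

On a mismatched closer the sketch simply discards the current candidate and keeps scanning, which is one reasonable reading of "this is not valid JSON".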

Here is a working code snippet based on animesh seth's answer.
if (strpos($msg, '{') !== false) {
    // Collect each top-level {...} object separately; $in is the brace depth.
    $objects = [];
    $json = '';
    $in = 0;
    foreach (str_split($msg) as $char) {
        if ($char == '{') {
            $in++;
        }
        if ($in) {
            $json .= $char;
        }
        if ($char == '}' && $in > 0) {
            $in--;
            if ($in === 0) {
                // Depth is back to zero: one complete object has been read.
                $objects[] = json_decode($json);
                $json = '';
            }
        }
    }
    // do something with the decoded objects in $objects.
}

Related

PHP - substr() bug or invalid argument

I'm working on a PHP project with PHP version 7.0.13.
I've been dealing with JSON lately. I have a JSON file that needs to be decoded in PHP, but before decoding it I need to strip some extraneous text that surrounds the JSON inside the file. I'm using substr() to cut out the JSON.
When I write the code like this:
$jsonraw = "\"{ JSON should be here, later }\"";
$cutstart = strpos($jsonraw, "{");
$cutend = strrpos($jsonraw, "\"");
$jsonclean = substr($jsonraw, $cutstart, $cutend);
echo $jsonclean;
The output is:
{ JSON should be here, later }
But when the string is:
$jsonraw = "\"some abstract string to remove { JSON should be here, later }\"";
the output becomes:
{ JSON should be here, later }"
As you can see, there is a quote symbol (") at the end of the string. I tried decrementing $cutend, as in $jsonclean = substr($jsonraw, $cutstart, --$cutend);, and also using $cutend-1, but neither fixed it.
Any help is appreciated.
Sorry for my bad English
You can use preg_match to get the json from that string:
$string = "some abstract string to remove { JSON should be here, later }";
preg_match('/\{.*\}/', $string, $match);
var_dump($match[0]);
the result would be:
string(30) "{ JSON should be here, later }"
The third parameter of substr() is a length, not an end position, so the length you need is the end position minus the start position:
$jsonclean = substr($jsonraw, $cutstart, $cutend-$cutstart);
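A small sketch applying that fix to both strings from the question. It keeps the question's choice of cutting at the last double quote; `cut_json` is a hypothetical helper name, and returning null when no usable span exists is an assumption about how the caller wants failures reported.

```php
<?php
// Cut out the text between the first '{' and the last '"'.
// substr()'s third argument is a length, so pass end - start.
function cut_json(string $raw): ?string
{
    $start = strpos($raw, '{');
    $end = strrpos($raw, '"');
    if ($start === false || $end === false || $end <= $start) {
        return null; // no usable span (assumption: caller handles null)
    }
    return substr($raw, $start, $end - $start);
}

echo cut_json("\"{ JSON should be here, later }\""), PHP_EOL;
echo cut_json("\"some abstract string to remove { JSON should be here, later }\""), PHP_EOL;
```

Both calls print `{ JSON should be here, later }`, because the length no longer depends on how much text precedes the `{`.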

How to convert object class into string

How can I convert the following object into a string:
$ssh->exec('tail -1 /var/log/playlog.csv');
so that I can pass the string as the first parameter to strripos():
if($idx = strripos($ssh,','))//Get the last index of ',' substring
{
$ErrorCode = substr($ssh,$idx + 1,(strlen($ssh) - $idx) - 1); //using the found index, get the error code using substring
echo " " .$Playlist.ReturnError($ErrorCode); //The ReturnError function just replaces the error code with a custom error
}
Currently, when I run my script, I get the following error message:
strpos() expects parameter 1 to be string
I've seen similar questions including this one Object of class stdClass could not be converted to string , however I still can't seem to come up with a solution.
There are two problems with this line of code:
if($idx = strripos($ssh,','))
$ssh is an instance of some class; you use it above as $ssh->exec(...). You should store the value that exec() returns (probably a string) and call strripos() on that, not on $ssh.
strripos() returns FALSE if it cannot find the substring, or the position (which can be 0) when it finds it. In a boolean context, 0 is the same as FALSE, so this code cannot tell apart the case where the comma (,) is the first character of the string from the case where it is not found at all.
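A quick way to see the second problem, using a hypothetical log line whose only comma is the very first character:

```php
<?php
// Hypothetical line whose only comma sits at offset 0.
$line = ',E42';

$idx = strrpos($line, ','); // int 0

// Truthiness test: 0 is falsy, so this wrongly reports "no comma".
echo $idx ? 'found' : 'not found', PHP_EOL; // prints "not found"

// Strict comparison distinguishes offset 0 from FALSE.
echo $idx !== false ? 'found' : 'not found', PHP_EOL; // prints "found"
```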
Assuming $ssh->exec() returns the output of the remote command as string, the correct way to write this code is:
$output = $ssh->exec('tail -1 /var/log/playlog.csv');
$idx = strrpos($output, ','); //Get the last index of ',' substring
if ($idx !== FALSE) {
// The value after the last comma is the error code
$ErrorCode = substr($output, $idx + 1);
echo ' ', $Playlist, ReturnError($ErrorCode);
} else {
// Do something else when it doesn't contain a comma
}
There is no need to use strripos(). It performs a case-insensitive comparison, but you are searching for a character that is not a letter, so case sensitivity doesn't matter here.
You can use strrpos() instead, it produces the same result and it's a little bit faster than strripos().
An alternative way
An alternative way to get the same outcome is to use explode() to split $output in pieces (separated by comma) and get the last piece (using end() or array_pop()) as the error code:
$output = $ssh->exec('tail -1 /var/log/playlog.csv');
$pieces = explode(',', $output);
if (count($pieces) > 1) {
$ErrorCode = (int)end($pieces);
echo ' ', $Playlist, ReturnError($ErrorCode);
} else {
// Do something else when it doesn't contain a comma
}
This is not necessarily a better way to do it. It is, however, more readable and more idiomatic PHP (the code that uses strrpos() and substr() more closely resembles C code).
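A self-contained sketch that runs both approaches side by side, with the output of $ssh->exec() replaced by a hard-coded line. The line's shape and both function names are hypothetical.

```php
<?php
// Both approaches from the answer, applied to a hard-coded line instead of
// the output of $ssh->exec(). Sample line and function names are hypothetical.
function error_code_strrpos(string $output): ?string
{
    $idx = strrpos($output, ','); // offset of the last comma, or false
    return $idx === false ? null : substr($output, $idx + 1);
}

function error_code_explode(string $output): ?int
{
    $pieces = explode(',', $output);
    // end() needs a variable, so the exploded array is stored first.
    return count($pieces) > 1 ? (int) end($pieces) : null;
}

$line = "song.mp3,2013-10-06,404\n"; // assumed shape of one playlog.csv line
var_dump(error_code_strrpos($line));
var_dump(error_code_explode($line));
```

One practical difference: if the remote output ends with a newline, as tail's output normally does, the substr() version keeps it ("404\n"), while the (int) cast in the explode() version drops it. Calling rtrim() on the output first makes the two agree.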

Trim a string until specific character appears

I want to trim a string and delete everything before a specific character, because I am using an API that gives me some unwanted data in its callback which I want to delete.
The Callback looks like this:
{"someVar":true,"anotherVar":false,"items":[ {"id":123456, [...] }
I only want the part after the [, so how can I split a string like this?
Thank you!
It is JSON, so you could just decode it:
$data = json_decode($string);
If you really want to trim up to a certain character then you can just find the character's position and then cut off everything before it:
if (($i = strpos($string, '[')) !== false) {
$string = substr($string, $i + 1);
}
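A minimal sketch of the json_decode() route, reading the items array directly instead of cutting the string. It assumes the real callback is complete, valid JSON; the payload below is a hypothetical completed version of the truncated sample in the question.

```php
<?php
// Hypothetical completed version of the truncated callback from the question.
$string = '{"someVar":true,"anotherVar":false,"items":[{"id":123456},{"id":654321}]}';

$data = json_decode($string, true); // true: decode objects as associative arrays
if (json_last_error() !== JSON_ERROR_NONE) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// The part "after the [" is simply the decoded items array.
$items = $data['items'];
foreach ($items as $item) {
    echo $item['id'], PHP_EOL;
}
```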
You can use various functions. For example:
$someVar = explode('[',$string,2);
$wantedData = $someVar[1];
Or if you want only data between [ and ] then use:
$pattern = '~\[([^\]]*)\]~';
if (preg_match($pattern, $inputString, $matches)) {
$wantedData = $matches[1];
}
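A runnable sketch of both string-extraction approaches from this answer, on a hypothetical completed version of the callback. Note the `*` after the character class, which lets the capture span more than one character; the capture still stops at the first `]`, so nested arrays are not handled.

```php
<?php
// Hypothetical completed version of the callback from the question.
$string = '{"someVar":true,"anotherVar":false,"items":[{"id":123456}]}';

// explode() with a limit of 2 splits only at the first '['.
$parts = explode('[', $string, 2);
$afterBracket = $parts[1] ?? ''; // empty when there is no '['

// [^\]]* captures everything up to the first ']'.
$between = preg_match('~\[([^\]]*)\]~', $string, $matches) ? $matches[1] : null;

echo $afterBracket, PHP_EOL; // {"id":123456}]}
echo $between, PHP_EOL;      // {"id":123456}
```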
Edit:
That's what you would use to extract one string from another. But as @Dagon noticed, it's JSON, so you can use json_decode() to parse it instead. I'll leave the above anyway, because it addresses the more general question of extracting one string from another.

very large php string magically turns into array

I am getting an "Array to string conversion" error in PHP.
I am using the "variable" (which should be a string) as the third parameter to str_replace(). In summary (a very simplified version of what's going on):
$str = "very long string";
str_replace("tag", $some_other_array, $str);
$str is throwing the error, and I have been trying to fix it all day. Things I have tried:
if(is_array($str)) die("its somehow an array");
serialize($str); //inserted this before str_replace call.
I have spent all day on it, and no, it's not something stupid like variables the wrong way around; it is something bizarre. I have even dumped it to a file, and it's a string.
My hypothesis:
The string is too long and php can't deal with it, turns into an array.
The $str value in this case is nested and called recursively, the general flow could be explained like this:
--code
//pass by reference
function the_function ($something, &$OFFENDING_VAR, $something_else) {
while(preg_match($something, $OFFENDING_VAR)) {
$OFFENDING_VAR = str_replace($x, $y, $OFFENDING_VAR); // this is the error
}
}
So it may be something strange due to str_replace, but that would mean that at some point str_replace would have to return an array.
Please help me work this out; it's very confusing and I have wasted a day on it.
---- ORIGINAL FUNCTION CODE -----
//This function gets called with multiple different "Target Variables" Target is the subject
//line, from and body of the email filled with << tags >> so the str_replace function knows
//where to replace them
function perform_replacements($replacements, &$target, $clean = TRUE,
$start_tag = '<<', $end_tag = '>>', $max_substitutions = 5) {
# Construct separate tag and replacement value arrays for use in the substitution loop.
$tags = array();
$replacement_values = array();
foreach ($replacements as $tag_text => $replacement_value) {
$tags[] = $start_tag . $tag_text . $end_tag;
$replacement_values[] = $replacement_value;
}
# TODO: this badly needs refactoring
# TODO: auto upgrade <<foo>> to <<foo_html>> if foo_html exists and acting on html template
# Construct a regular expression for use in scanning for tags.
$tag_match = '/' . preg_quote($start_tag) . '\w+' . preg_quote($end_tag) . '/';
# Perform the substitution until all valid tags are replaced, or the maximum substitutions
# limit is reached.
$substitution_count = 0;
while (preg_match ($tag_match, $target) && ($substitution_count++ < $max_substitutions)) {
$target = serialize($target);
$temp = str_replace($tags,
$replacement_values,
$target); //This is the line that is failing.
unset($target);
$target = $temp;
}
if ($clean) {
# Clean up any unused search values.
$target = preg_replace($tag_match, '', $target);
}
}
How do you know $str is the problem and not $some_other_array?
From the manual:
If search and replace are arrays, then str_replace() takes a value
from each array and uses them to search and replace on subject. If
replace has fewer values than search, then an empty string is used for
the rest of replacement values. If search is an array and replace is a
string, then this replacement string is used for every value of
search. The converse would not make sense, though.
The second parameter can only be an array if the first one is as well.
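A minimal sketch of the valid form, building parallel search and replace arrays the same way perform_replacements() does. The `<<`/`>>` delimiters mirror that function's defaults; the tag names and values are hypothetical.

```php
<?php
// Hypothetical tag values; keys become <<key>> tags, as in perform_replacements().
$replacements = ['name' => 'Ann', 'city' => 'Oslo'];

$tags = [];
$values = [];
foreach ($replacements as $tag => $value) {
    $tags[] = '<<' . $tag . '>>';
    $values[] = $value;
}

// Search and replace are both arrays, so they are paired index by index.
$body = 'Hi <<name>>, greetings from <<city>>.';
echo str_replace($tags, $values, $body), PHP_EOL; // Hi Ann, greetings from Oslo.
```

Passing a string search with an array replacement is the case the manual says "would not make sense". As far as I know, PHP 7 and earlier convert the array to the string "Array" with an "Array to string conversion" notice, which matches the error in the question, while PHP 8 rejects the call with a TypeError.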

Parse multiple predictably formatted substrings of user data existing in a single string

I have a really long string in a certain pattern such as:
userAccountName: abc userCompany: xyz userEmail: a#xyz.com userAddress1: userAddress2: userAddress3: userTown: ...
and so on. This pattern repeats.
I need to find a way to process this string so that I have the values of userAccountName:, userCompany:, etc. (i.e. preferably in an associative array or some such convenient format).
Is there an easy way to do this or will I have to write my own logic to split this string up into different parts?
Simple regular expressions like this userAccountName:\s*(\w+)\s+ can be used to capture matches and then use the captured matches to create a data structure.
If you can arrange for the data to be formatted as it is in a URL query string (i.e., var=data&var2=data2), then you could use parse_str(), which does almost exactly what you want, I think. Some mangling of your input data would get it there in a straightforward manner.
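A minimal sketch of that mangling for a single record, assuming keys are word characters followed by ": ", values never contain a "word:" sequence, and values contain no `&`, `=`, `+` or `%` (parse_str() would treat those specially). When the pattern repeats, later records would overwrite earlier keys, so split the records first.

```php
<?php
// One record from the question (the e-mail value is kept as given).
$raw = 'userAccountName: abc userCompany: xyz userEmail: a#xyz.com userAddress1: userAddress2: ';

// Turn every "key: " boundary (plus surrounding whitespace) into "&key=".
$query = preg_replace('/\s*(\w+):\s*/', '&$1=', $raw);
$query = ltrim($query, '&'); // drop the separator before the first key

parse_str($query, $fields);
print_r($fields);
```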
You might have to use regex or your own logic.
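Are you guaranteed that a key followed by ": " never appears inside the values themselves? If so, you could split the string into an array of alternating keys and values. A plain explode(': ', ...) is not quite enough, because each value is separated from the next key by a space rather than by ": ", so this version uses preg_split() and keeps the captured key names. You'd then walk through this array and format it the way you want. Here's a rough (probably inefficient) example I threw together quickly:
<?php
// Split on "key:" while keeping the key: ['', key1, value1, key2, value2, ...]
$keysAndValuesArray = preg_split('/\s*(\w+):\s*/', $dataString, -1, PREG_SPLIT_DELIM_CAPTURE);
array_shift($keysAndValuesArray); // drop the (empty) text before the first key
$firstKeyName = 'userAccountName';
$associativeDataArray = array();
$currentIndex = -1;
$numItems = count($keysAndValuesArray);
for ($i = 0; $i + 1 < $numItems; $i += 2) {
    // Every occurrence of the first key starts a new record.
    if ($keysAndValuesArray[$i] == $firstKeyName) {
        $associativeDataArray[] = array();
        ++$currentIndex;
    }
    $associativeDataArray[$currentIndex][$keysAndValuesArray[$i]] = $keysAndValuesArray[$i + 1];
}
var_dump($associativeDataArray);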
If you can write a regular expression for the format (in my example I assume there are no colons in the values), you can parse it with preg_split() or preg_match_all() like this:
<?php
$raw_data = "userAccountName: abc userCompany: xyz";
$raw_data .= " userEmail: a#xyz.com userAddress1: userAddress2: ";
$data = array();
// /([^:]*\s+)?/ part works because the regexp is "greedy"
if (preg_match_all('/([a-z0-9_]+):\s+([^:]*\s+)?/i', $raw_data,
$items, PREG_SET_ORDER)) {
foreach ($items as $item) {
$data[$item[1]] = isset($item[2]) ? trim($item[2]) : ''; // group 2 is absent for empty values
}
print_r($data);
}
?>
If that's not the case, please describe the grammar of your string in a bit more detail.
PCRE is built into PHP and can handle this with a regular expression using named groups, like:
if ($c = preg_match_all("/userAccountName: (?<userAccountName>\w+) userCompany: (?<userCompany>\w+) userEmail: /", $txt, $matches))
{
$userAccountName = $matches['userAccountName'];
$userCompany = $matches['userCompany'];
// and so on...
}
The hardest part is getting the right regular expression for your needs.
You can have a look at http://txt2re.com for some help.
I think the solution closest to what I was looking for, I found at http://www.justin-cook.com/wp/2006/03/31/php-parse-a-string-between-two-strings/. I hope this proves useful to someone else. Thanks everyone for all the suggested solutions.
If I were you, I'd try to convert the string into JSON format with a regular expression.
Then, simply decode it with json_decode().
