php regular expression for dictionary string - php

I could use some help writing a regular expression for this dictionary string (I don't use them all that often).
This is an example of the string dictionary:
O:8:"stdClass":5:{s:4:"sent";i:0;s:6:"graded";i:0;s:5:"score";i:0;s:6:"answer";s:14:"<p>Johnson</p>";s:8:"response";s:0:"";}
I want to extract Johnson from the string dictionary.
Any help would be appreciated, thanks.

This is a PHP serialized object. Don't use a regular expression. unserialize() the data and read the answer property from the returned object.
$obj = unserialize($data);
echo $obj->answer;

$str = 'O:8:"stdClass":5:{s:4:"sent";i:0;s:6:"graded";i:0;s:5:"score";i:0;s:6:"answer";s:14:"<p>Johnson</p>";s:8:"response";s:0:"";}';
$obj = unserialize($str);
echo $obj->answer;
This would be the correct answer, no regex needed. You may need some additional HTML parsing if you'd want the <p> tags removed. If the format will always remain the same (and only then!) simply remove the <p> and </p> tags.
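A minimal sketch of the two steps together, assuming the answer only ever contains simple markup like the <p> wrapper shown in the question. The allowed_classes option (PHP 7+) is an addition not mentioned above; it restricts which classes unserialize() may instantiate, which is worth considering if the string comes from an untrusted source.

```php
<?php
$str = 'O:8:"stdClass":5:{s:4:"sent";i:0;s:6:"graded";i:0;s:5:"score";i:0;s:6:"answer";s:14:"<p>Johnson</p>";s:8:"response";s:0:"";}';

// Only allow stdClass to be instantiated (assumption: no other classes are expected).
$obj = unserialize($str, ['allowed_classes' => ['stdClass']]);

// strip_tags() removes the surrounding <p>...</p> without any regex.
$answer = strip_tags($obj->answer);

echo $answer, PHP_EOL;
```

For markup more complex than a single wrapper tag, an HTML parser such as DOMDocument is likely the safer choice.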

It looks like you should be using unserialize() instead, and then you can use preg_match to extract the text between the <p> tags.
$obj = (unserialize('O:8:"stdClass":5:{s:4:"sent";i:0;s:6:"graded";i:0;s:5:"score";i:0;s:6:"answer";s:14:"<p>Johnson</p>";s:8:"response";s:0:"";}'));
preg_match('~<p>([^<]*)</p>~', $obj->answer, $ans);
print_r($ans[1]); //prints Johnson

Related

Strip multiple tags with strip_tags()

I am trying to strip html tags from my msg string.
I have the following string that contains user input :
$msg="Hello world ! <b>Welcome to venga club</b> .<br><li>We are here to entertain you....</li>";
I know it's simple to strip those tags with regex and preg_replace, but I want to do this using strip_tags() if possible.
I tried the following code
echo strip_tags("<a><b><li><br>",$msg);
but the result I get is blank. Is there something wrong with the function?
Any help is much appreciated.
Thanks
As @u_mulder advised, it is sometimes worth spending a little extra time reading the manual :)
function strip_tags ($str, $allowable_tags = null)
accepts first argument as the input string and second argument as allowable tags. Opposite of how it is written in your case.
http://php.net/manual/en/function.strip-tags.php
so you should put it for example like this:
$msg='Hello world ! <b>Welcome to venga club</b> .<br><li>We are here to entertain you....</li>';
echo strip_tags($msg, '<a>');
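To make the argument order concrete, here is a small sketch using the string from the question. The variable names $plain and $keepBold are only illustrative.

```php
<?php
$msg = 'Hello world ! <b>Welcome to venga club</b> .<br><li>We are here to entertain you....</li>';

// First argument: the input string. With no second argument, every tag is removed.
$plain = strip_tags($msg);

// Second argument: the tags to KEEP, not the tags to strip.
$keepBold = strip_tags($msg, '<b>');

echo $plain, PHP_EOL;
echo $keepBold, PHP_EOL;
```

The original call passed the tag list as the input and $msg as the allow-list, so the "input" consisted only of tags and nothing was left after stripping.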

php regular expression breaks

I have the following string in an html.
BookSelector.load([{"index":25,"label":"Science","booktype":"pdf","payload":"<script type=\"text\/javascript\" charset=\"utf-8\" src=\"\/\/www.192.168.10.85\/libs\/js\/books.min.js\" publisher_id=\"890\"><\/script>"}]);
I want to find the src and the publisher_id in the string.
For this I'm trying the following code:
$regex = '#\BookSelector.load\(.*?src=\"(.*?)\"}]\)#s';
preg_match($regex, $html, $matches);
$match = $matches[1];
but it's always returning null.
What would my regex be to select the src only?
What would my regex be if I need to capture the whole string inside BookSelector.load();?
Why your regex isn't working?
First, I'll answer why your regex isn't working:
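You're using \B in your regex. \B asserts a position that is not a word boundary (the opposite of \b), so the B you meant literally is consumed by the escape and the pattern effectively looks for "ookSelector". That is not what you want, even though it can still match by accident between the B and the o.
Your original text contains escaped quotes (\"), but your regex doesn't account for the backslashes. In the pattern, \" matches a plain " only, so src=\" in the text is never matched and the whole regex fails.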
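A quick way to check both points is to run the pieces separately. This is only a sketch on a shortened, hypothetical version of the input (example.test stands in for the real host); the patterns are the ones from the question.

```php
<?php
// Shortened stand-in for the real input; in single quotes the backslashes stay literal.
$html = 'BookSelector.load([{"payload":"<script src=\"\/\/example.test\/a.js\" publisher_id=\"890\"><\/script>"}]);';

// \B is "not a word boundary", so this pattern is really \B followed by "ookSelector".
preg_match('#\BookSelector#', $html, $m, PREG_OFFSET_CAPTURE);
$offset = $m[0][1]; // where the accidental match starts

// The full pattern from the question: src=\" in the regex only matches src="
// but the text contains src=\" (with a literal backslash).
$matched = preg_match('#\BookSelector.load\(.*?src=\"(.*?)\"}]\)#s', $html, $matches);

var_dump($offset, $matched);
```

On this input the first pattern matches starting at offset 1 (after the B), while the full pattern returns 0, which points to the escaped quotes as the part that actually prevents a match.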
The correct approach to solve this problem
Split this task into several parts, and solve it one by one, using the best tool available.
The data you need is encapsulated within a JSON structure. So the first step is obviously to extract the JSON content. For this purpose, you can use a regex.
Once you have the JSON content, you need to decode it to get the data in it. PHP has a built-in function for that purpose: json_decode(). Use it with the input string and set the second parameter as true, and you'll have a nice associative array.
Once you have the associative array, you can easily get the payload string, which contains the <script> tag contents.
If you're absolutely sure that the order of attributes will always be the same, you can use a regex to extract the required information. If not, it's better to use an HTML parser such as PHP's DOMDocument to do this.
The whole code for this looks like:
// Extract the JSON string from the whole block of text
if (preg_match('/BookSelector\.load\((.*?)\);/s', $text, $matches)) {
    // Get the JSON string and decode it using json_decode()
    $json = $matches[1];
    $content = json_decode($json, true)[0]['payload'];
    // Use DOMDocument to load the string, and get the required values
    $dom = new DOMDocument;
    $dom->loadHTML($content);
    $script_tag = $dom->getElementsByTagName('script')->item(0);
    $script_src = $script_tag->getAttribute('src');
    $publisher_id = $script_tag->getAttribute('publisher_id');
    var_dump($script_src, $publisher_id);
}
Output:
string(40) "//www.192.168.10.85/libs/js/books.min.js"
string(3) "890"

Split a string with <span> using preg_split()

I have a string that is in this format.
<span class="amount">$25</span>–<span class="amount">$100</span>
What I need to do is split that into two strings. The string will remain in the same format but the prices will change. I tried using str_split() but because the price changes I wouldn't be able to always know how many characters to split the string at.
What I am trying to get is something like this.
String 1
<span class="amount">$25</span>–
String 2
<span class="amount">$100</span>
It seems the best option I have found is to use preg_split() but I don't know anything about regex so I'm not sure how to format the expression. There may also be a better way to handle this and I just don't know of it.
Could someone please help me format the regex, or let me know of a better way to split that string.
Edit
Thanks to @rm-vanda for helping me figure out that I don't need to use preg_split for this. I was able to split the string using explode(). The issue I was having was that the '–' is not a plain hyphen but a differently encoded dash, so explode() on '-' was not finding it.
It might be better to translate this problem into DOM:
$html = <<<HTML
<span class="amount">$25</span>–<span class="amount">$100</span>
HTML;
$doc = new DOMDocument;
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('span') as $span) {
// do stuff with $span
// e.g. this is how you would get the outer html
echo $doc->saveXML($span);
}
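If it always has the "–" then this would be the simplest way. Note that the separator in the example is an en dash (–), not a plain hyphen (-), so explode() must be given that exact character:
// $spans holds the original string
$span = explode("–", $spans);
echo $span[0];
echo $span[1];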
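Since the question wants the dash to stay attached to the first string, here is a small sketch of that variant. It assumes the separator is always an en dash (U+2013) and uses the "\u{2013}" escape (PHP 7+) so the result does not depend on how the source file is encoded; the variable names are illustrative.

```php
<?php
$dash = "\u{2013}"; // en dash, as in the example string
$html = '<span class="amount">$25</span>' . $dash . '<span class="amount">$100</span>';

// Split once on the dash, then put the dash back on the first part.
$parts  = explode($dash, $html, 2);
$first  = $parts[0] . $dash;
$second = $parts[1];

echo $first, PHP_EOL;
echo $second, PHP_EOL;
```

If the separator could vary (hyphen, en dash, em dash), the DOM approach above is probably more robust than matching a specific character.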

Regular expression to extract json response in php

I'm new to php and am trying to write a regular expression using preg_match to extract the href value that I get from my http get.
The response looks like this:
{"_links":{"http://a.b.co/documents":{"href":"/docs"}}}
I want to extract only the href value and pass it to my next api... i.e. /docs.
Can anyone please tell me how to extract this?
I've been using http://www.solmetra.com/scripts/regex/index.php to test my regex, and have had no luck for the past day :(
Any help would be appreciated.
Thanks,
DR
No need for a regex.
Use json_decode() and then access the href property.
For example:
$data = json_decode('{"_links":{"http://a.b.co/documents":{"href":"/docs"}}}', true);
echo $data['_links']['http://a.b.co/documents']['href'];
Note: I'd encourage you to clean up your JSON if possible. Particularly the keys.
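If the link key is not known in advance, one option is to loop over _links instead of hard-coding it. This is only a sketch, assuming every response has the same shape as the example; $hrefs is an illustrative name.

```php
<?php
$response = '{"_links":{"http://a.b.co/documents":{"href":"/docs"}}}';

$data = json_decode($response, true);
if (!is_array($data) || !isset($data['_links'])) {
    // json_last_error_msg() describes why decoding failed, if it did.
    throw new RuntimeException('Unexpected response: ' . json_last_error_msg());
}

// Collect every href, keyed by its link relation.
$hrefs = [];
foreach ($data['_links'] as $rel => $link) {
    if (isset($link['href'])) {
        $hrefs[$rel] = $link['href'];
    }
}

print_r($hrefs);
```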
Don't use regex, use json_decode(). JSON is an excellent example of a context-free grammar that you shouldn't even try to parse with regex.
Here's PHP.NET's reference on using json_decode() for just this sort of thing.
Just like HTML parsing, I would recommend not using a REGEX but rather a json parser then reading the value. Check out json_encode and json_decode functions in php.
That said if you just need the href value then here is a regex to do just that on the example you gave
preg_match('/"href":"([^"]+)"/', $string, $matches);
echo $matches[1]; // this is the href
Regex is only the right tool when you know exactly what you want and exactly the format it will be in. Often json and HTML from other parties can't be exactly predicted. There are also examples of certain legal HTML and json which can't properly be parsed with regex so in general use a specialized parser for them.

handle parts of string in preg_replace_callback differently

I have a string in which I replace all occurrences of [CODE]...[/CODE]. With preg_replace_callback I can call a function which handles the content of those tags. But how can I manipulate the parts of the string that surround those occurrences?
Example:
$str = "Hello, I am a string with [CODE]some code[/CODE] in it";
Now, with preg_replace_callback I manipulate the content of [CODE], in this case some code. But I'd like for all other text in this string, so Hello, I am a string with and in it to do something different. How could I do this the best way?
Thank you for your help!
Flo
It'd be simpler if I could see the regex, but the gist is that I think you want capture groups.
You should be able to access those regions separately by placing them into parenthesis-wrapped groups. Each section will be available to your callback. So (crudely) something like /(.*)(\[CODE\].*\[\/CODE\])(.*)/ should pass an array of matches to your callback. Note that the / inside [/CODE] has to be escaped because / is also the pattern delimiter.
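An alternative sketch, in case there can be several [CODE] blocks: split the string with preg_split() and PREG_SPLIT_DELIM_CAPTURE so that code and non-code parts come back as separate pieces, then treat each kind differently. The two transformations used here (wrapping code in <code> and upper-casing the rest) are placeholders for whatever handling is actually needed.

```php
<?php
$str = "Hello, I am a string with [CODE]some code[/CODE] in it";

// Split on [CODE]...[/CODE] but keep the matched blocks in the result.
$parts = preg_split(
    '~(\[CODE\].*?\[/CODE\])~s',
    $str,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

$out = '';
foreach ($parts as $part) {
    if (preg_match('~^\[CODE\](.*)\[/CODE\]$~s', $part, $m)) {
        // Placeholder handling for code blocks.
        $out .= '<code>' . htmlspecialchars($m[1]) . '</code>';
    } else {
        // Placeholder handling for the surrounding text.
        $out .= strtoupper($part);
    }
}

echo $out, PHP_EOL;
```

Using ~ as the delimiter avoids having to escape the / in [/CODE].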
