How best to use Regular Expressions to convert a Hierarchical Text File into XML? - php

Good morning -
I'm interested in seeing an efficient way of parsing the values of a hierarchical text file (i.e., one that has a Title => Multiple Headings => Multiple Subheadings => Multiple Keys => Multiple Values) into a simple XML document. For the sake of simplicity, the answer would be written using:
Regex (preferably in PHP)
or, PHP code (e.g., if looping were more efficient)
Here's an example of an Inventory file I'm working with. Note that Header = FOODS, Sub-Header = Type (A, B...), Keys = PRODUCT (or CODE, etc.) and Values may span one or more lines.
**FOODS - TYPE A**
___________________________________
**PRODUCT**
1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;
2) La Fe String Cheese
**CODE**
Sell by date going back to February 1, 2009
**MANUFACTURER**
Quesos Mi Pueblito, LLC, Passaic, NJ.
**VOLUME OF UNITS**
11,000 boxes
**DISTRIBUTION**
NJ, NY, DE, MD, CT, VA
___________________________________
**PRODUCT**
1) Peanut Brittle No Sugar Added;
2) Peanut Brittle Small Grind;
3) Homestyle Peanut Brittle Nuggets/Coconut Oil Coating
**CODE**
1) Lots 7109 - 8350 inclusive;
2) Lots 8198 - 8330 inclusive;
3) Lots 7075 - 9012 inclusive;
4) Lots 7100 - 8057 inclusive;
5) Lots 7152 - 8364 inclusive
**MANUFACTURER**
Star Kay White, Inc., Congers, NY.
**VOLUME OF UNITS**
5,749 units
**DISTRIBUTION**
NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN
**FOODS - TYPE B**
___________________________________
**PRODUCT**
Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice;
**CODE**
990-10/2 10/5
**MANUFACTURER**
San Mar Manufacturing Corp., Catano, PR.
**VOLUME OF UNITS**
384
**DISTRIBUTION**
PR
And here's the desired output (please excuse any XML syntactical errors):
<foods>
<food type = "A" >
<product>Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese</product>
<product>La Fe String Cheese</product>
<code>Sell by date going back to February 1, 2009</code>
<manufacturer>Quesos Mi Pueblito, LLC, Passaic, NJ.</manufacturer>
<volume>11,000 boxes</volume>
<distribution>NJ, NY, DE, MD, CT, VA</distribution>
</food>
<food type = "A" >
<product>Peanut Brittle No Sugar Added</product>
<product>Peanut Brittle Small Grind</product>
<product>Homestyle Peanut Brittle Nuggets/Coconut Oil Coating</product>
<code>Lots 7109 - 8350 inclusive</code>
<code>Lots 8198 - 8330 inclusive</code>
<code>Lots 7075 - 9012 inclusive</code>
<code>Lots 7100 - 8057 inclusive</code>
<code>Lots 7152 - 8364 inclusive</code>
<manufacturer>Star Kay White, Inc., Congers, NY.</manufacturer>
<volume>5,749 units</volume>
<distribution>NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN</distribution>
</food>
<food type = "B" >
<product>Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice</product>
<code>990-10/2 10/5</code>
<manufacturer>San Mar Manufacturing Corp., Catano, PR</manufacturer>
<volume>384</volume>
<distribution>PR</distribution>
</food>
</foods>
<!-- and so forth -->
So far, my approach (which might be quite inefficient with a huge text file) would be one of the following:
Loops and multiple Select/Case statements, where the file is loaded into a string buffer, and while looping through each line, see if it matches one of the header/subheader/key lines, append the appropriate xml tag to a xml string variable, and then add the child nodes to the xml based on IF statements regarding which key name is most recent (which seems time-consuming and error-prone, esp. if the text changes even slightly) -- OR
Use REGEX (Regular Expressions) to find and replace key fields with appropriate xml tags, clean it up with an xml library, and export the xml file. Problem is, I barely use regular expressions, so I'd need some example-based help.
Any help or advice would be appreciated.
Thanks.

An example you can use as a starting point. At least I hope it gives you an idea...
<?php
define('TYPE_HEADER', 1);
define('TYPE_KEY', 2);
define('TYPE_DELIMETER', 3);
define('TYPE_VALUE', 4);
$datafile = 'data.txt';
$fp = fopen($datafile, 'rb') or die('!fopen');
// stores (the first) {header} in 'name' and the root simplexmlelement in 'element'
$container = array('name'=>null, 'element'=>null);
// stores the name for each item element, the value for the type attribute for subsequent item elements and the simplexmlelement of the current item element
$item = array('name'=>null, 'type'=>null, 'current_element'=>null);
// the last **key** encountered, used to create new child elements in the current item element when a value is encountered
$key = null;
while ( false!==($t=getstruct($fp)) ) {
switch( $t[0] ) {
case TYPE_HEADER:
if ( is_null($container['element']) ) {
// this is the first time we hit **header - subheader**
$container['name'] = $t[1][0];
// ugly hack, < . name . />
$container['element'] = new SimpleXMLElement('<'.$container['name'].'/>');
// each subsequent new item gets the new subheader as type attribute
$item['type'] = $t[1][1];
// dummy implementation: "deducting" the item names from header/container[name]
$item['name'] = substr($t[1][0], 0, -1);
}
else {
// hitting **header - subheader** the (second, third, nth) time
/*
header must be the same as the first time (stored in container['name']).
Otherwise you need another container element since
xml documents can only have one root element
*/
if ( $container['name'] !== $t[1][0] ) {
echo $container['name'], "!==", $t[1][0], "\n";
die('format error');
}
else {
// subheader may have changed, store it for future item elements
$item['type'] = $t[1][1];
}
}
break;
case TYPE_DELIMETER:
assert( !is_null($container['element']) );
assert( !is_null($item['name']) );
assert( !is_null($item['type']) );
/* that's maybe not a wise choice.
You might want to check the complete item before appending it to the document.
But the example is a hack anyway ...so create a new item element and append it to the container right away
*/
$item['current_element'] = $container['element']->addChild($item['name']);
// set the type-attribute according to the last **header - subheader** encountered
$item['current_element']['type'] = $item['type'];
break;
case TYPE_KEY:
$key = $t[1][0];
break;
case TYPE_VALUE:
assert( !is_null($item['current_element']) );
assert( !is_null($key) );
// this is a value belonging to the "last" key encountered
// create a new "key" element with the value as content
// and add it to the current item element
$tmp = $item['current_element']->addChild($key, $t[1][0]);
break;
default:
die('unknown token');
}
}
if ( !is_null($container['element']) ) {
$doc = dom_import_simplexml($container['element']);
$doc = $doc->ownerDocument;
$doc->formatOutput = true;
echo $doc->saveXML();
}
die;
/*
Take a look at gettoken() at http://www.tuxradar.com/practicalphp/21/5/6
It breaks the stream into much simpler pieces.
In the next step the parser would "combine" or structure the simple tokens into more complex things.
This function does both....
@return array(id, array(parameter))
*/
function getstruct($fp) {
if ( feof($fp) ) {
return false;
}
// shortcut: all we care about "happens" on one line
// so let php read one line in a single step and then do the pattern matching
$line = trim(fgets($fp));
// this matches **key** and **header - subheader**
if ( preg_match('#^\*\*([^-]+)(?:-(.*))?\*\*$#', $line, $m) ) {
// only for **header - subheader** $m[2] is set.
if ( isset($m[2]) ) {
return array(TYPE_HEADER, array(trim($m[1]), trim($m[2])));
}
else {
return array(TYPE_KEY, array($m[1]));
}
}
// this matches _____________ and means "new item"
else if ( preg_match('#^_+$#', $line, $m) ) {
return array(TYPE_DELIMETER, array());
}
// any other non-empty line is a single value
else if ( preg_match('#\S#', $line) ) {
// you might want to filter the 1),2),3) part out here
// could also be two different token types
return array(TYPE_VALUE, array($line));
}
else {
// skip empty lines, would be nicer with tail-recursion...
return getstruct($fp);
}
}
prints
<?xml version="1.0"?>
<FOODS>
<FOOD type="TYPE A">
<PRODUCT>1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;</PRODUCT>
<PRODUCT>2) La Fe String Cheese</PRODUCT>
<CODE>Sell by date going back to February 1, 2009</CODE>
<MANUFACTURER>Quesos Mi Pueblito, LLC, Passaic, NJ.</MANUFACTURER>
<VOLUME OF UNITS>11,000 boxes</VOLUME OF UNITS>
<DISTRIBUTION>NJ, NY, DE, MD, CT, VA</DISTRIBUTION>
</FOOD>
<FOOD type="TYPE A">
<PRODUCT>1) Peanut Brittle No Sugar Added;</PRODUCT>
<PRODUCT>2) Peanut Brittle Small Grind;</PRODUCT>
<PRODUCT>3) Homestyle Peanut Brittle Nuggets/Coconut Oil Coating</PRODUCT>
<CODE>1) Lots 7109 - 8350 inclusive;</CODE>
<CODE>2) Lots 8198 - 8330 inclusive;</CODE>
<CODE>3) Lots 7075 - 9012 inclusive;</CODE>
<CODE>4) Lots 7100 - 8057 inclusive;</CODE>
<CODE>5) Lots 7152 - 8364 inclusive</CODE>
<MANUFACTURER>Star Kay White, Inc., Congers, NY.</MANUFACTURER>
<VOLUME OF UNITS>5,749 units</VOLUME OF UNITS>
<DISTRIBUTION>NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN</DISTRIBUTION>
</FOOD>
<FOOD type="TYPE B">
<PRODUCT>Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice;</PRODUCT>
<CODE>990-10/2 10/5</CODE>
<MANUFACTURER>San Mar Manufacturing Corp., Catano, PR.</MANUFACTURER>
<VOLUME OF UNITS>384</VOLUME OF UNITS>
<DISTRIBUTION>PR</DISTRIBUTION>
</FOOD>
</FOODS>
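Two details of this output differ from what the question asked for: the values still carry their "1) " counters and trailing semicolons, and VOLUME OF UNITS contains spaces, which are not allowed in an XML element name. A minimal sketch of two helpers that could be applied to $key and to the value before calling addChild() follows. Both function names are hypothetical and not part of the code above, and the underscore and lowercase conventions are assumptions rather than requirements.

```php
<?php
// Hypothetical helper: turn a **KEY** such as "VOLUME OF UNITS" into a safe element name.
function xml_name_from_key(string $key): string
{
    // Collapse every run of characters other than letters, digits and underscores
    $name = preg_replace('/[^A-Za-z0-9_]+/', '_', trim($key));
    // An XML name must not be empty or start with a digit
    if ($name === '' || preg_match('/^[0-9]/', $name)) {
        $name = '_' . $name;
    }
    return strtolower($name);
}

// Hypothetical helper: drop a leading "1) " style counter and a trailing semicolon.
function clean_value(string $value): string
{
    $value = preg_replace('/^\d+\)\s*/', '', trim($value));
    return rtrim($value, "; \t");
}

echo xml_name_from_key('VOLUME OF UNITS'), "\n";
echo clean_value('1) Peanut Brittle No Sugar Added;'), "\n";
```

Values without a counter, such as "990-10/2 10/5", pass through clean_value() unchanged because the pattern only matches digits followed by a closing parenthesis at the start of the line.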
Unfortunately the status of the php module for ANTLR currently is "Runtime is in alpha status." but it might be worth a try anyway...

See: http://www.tuxradar.com/practicalphp/21/5/6
This tells you how to parse a text file into tokens using PHP. Once parsed you can place it into anything you want.
You need to search for specific tokens in the file based on your criteria:
for example:
PRODUCT
This gives you the XML Tag
Then 1) can have special meaning
1) Peanut Brittle...
This tells you what to put in the XML tag.
I do not know if this is the most efficient way to accomplish your task, but it is the way a compiler would parse a file, and it has the potential to be very accurate.

Instead of Regex or PHP use the XSLT 2.0 unparsed-text() function to read the file (see http://www.biglist.com/lists/xsl-list/archives/200508/msg00085.html)

Another Hint for an XSLT 1.0 Solution is here: http://bytes.com/topic/net/answers/808619-read-plain-file-xslt-1-0-a

Related

How to decode JSON event datasets?

How would I decode this JSON data to get the Location link of the event? NOTE: When I say Location I don't mean the field "location" in the json data, I am referring to the field which is in "customFields", then has a "value" which is a link to Google Maps, it also has the "type" = 9.
Problem: I am currently stuck with a page that looks like the image below. The "Notice: Undefined offset: # in...." error repeats for 200 lines because the JSON file contains the data of 200 events; the JSON included here contains only the first event.
Desired Result: For the link to google maps page to be echoed on every line. I think the solution is very simple, just changing my Source code (Included) so that it can read the JSON file.
JSON dataset:
[{"eventID":152913573,"template":"Brisbane City Council","title":"Clock Tower Tour","description":"The Clock Tour Tower is a ‘must-do’ for anyone and everyone in Brisbane!<br /> <br /> For many years, City Hall’s Clock Tower made the building the tallest in Brisbane, offering visitors a magnificent 360 degree view of the city around them. Whilst the view has changed significantly over the last 90 years, the time-honoured tradition of “taking a trip up the tower” happily continues at Museum of Brisbane.<br /> <br /> The Clock Tower Tour includes a ride in one of Brisbane’s oldest working cage lifts, a look behind Australia’s largest analogue clock faces and time to explore the observation platform that shares a unique perspective of the city. See if you can catch a glimpse of the bells!<br /> <br /> <strong>Location</strong>: Tour begins from Museum of Brisbane reception on Level 3 of City Hall.","location":"Museum of Brisbane, Brisbane City","webLink":"","startDateTime":"2021-06-13T00:00:00","endDateTime":"2021-06-14T00:00:00","dateTimeFormatted":"Sunday, June 13, 2021","allDay":true,"startTimeZoneOffset":"+1000","endTimeZoneOffset":"+1000","canceled":false,"openSignUp":false,"reservationFull":false,"pastDeadline":false,"requiresPayment":false,"refundsAllowed":false,"waitingListAvailable":false,"signUpUrl":"https://eventactions.com/eareg.aspx?ea=Rsvp&invite=0tva7etjn38te1bve2yj59425pupt7wvscmr1z6depcj9ctnrh7r","repeatingRegistration":0,"repeats":"Every Sunday, Tuesday, Wednesday, Thursday, Friday and Saturday through June 30, 2021","seriesID":152913560,"eventImage":{"url":"https://www.trumba.com/i/DgDhxtvzZEBEz%2AjAEUDofPUE.jpeg","size":{"width":1290,"height":775}},"detailImage":{"url":"https://www.trumba.com/i/DgDhxtvzZEBEz%2AjAEUDofPUE.jpeg","size":{"width":1290,"height":775}},"customFields":[{"fieldID":22503,"label":"Venue","value":"Museum of Brisbane, Brisbane City","type":17},{"fieldID":22505,"label":"Venue address","value":"Museum of Brisbane, Brisbane City 
Hall, 64 Adelaide Street, Brisbane City","type":9},{"fieldID":21859,"label":"Event type","value":"Family events, Free","type":17},{"fieldID":22177,"label":"Cost","value":"Free","type":0},{"fieldID":23562,"label":"Age","value":"Suitable for all ages","type":0},{"fieldID":22732,"label":"Bookings","value":"Book via the Museum of Brisbane website.","type":1},{"fieldID":51540,"label":"Bookings required","value":"Yes","type":3}],"permaLinkUrl":"https://www.brisbane.qld.gov.au/trumba?trumbaEmbed=view%3devent%26eventid%3d152913573","eventActionUrl":"https://eventactions.com/eventactions/brisbane-city-council#/actions/cvuzsak1g2d45mndcjwkp24nfw","categoryCalendar":"Brisbane's calendar|Museum of Brisbane","registrationTransferTargetCount":0,"regAllowChanges":true}]
Code so far:
<?php
$output = file_get_contents("Events.json");
$decode = json_decode($output, true);
for($i = 0; $i < count($decode); $i++) {
if($decode[$i]['customFields'][$i]['type'] == 9){
echo $decode[$i]['customFields'][$i]['label'][$i]['value'];
}
echo "<br>";
}
?>
You're using the $i loop counter twice in the same expression, but the second time you use it it's pointing at non-existent elements. The snippet below 1) treats JSON objects as objects (I find it less confusing when matching code to data), and 2) uses foreach to iterate over the arrays.
I've also extracted the latitude and longitude for you into $latlong.
Try this:
$json = file_get_contents("Events.json"); // same file as in the question
$decode = json_decode($json);
foreach ($decode as $event) {
foreach ($event->customFields as $field) {
if ($field->type == 9) {
echo $field->value."\n";
if (preg_match('/href="(.*?)"/', $field->value, $matches)){
preg_match('/q=([\-\.0-9]*),([\-\.0-9]*)/',$matches[1], $latlong);
array_shift($latlong);
var_dump($latlong);
}
break;
}
}
}
Output
Museum of Brisbane, Brisbane City Hall, 64 Adelaide Street, Brisbane City
array(2) {
[0]=>
string(11) "-27.4693454"
[1]=>
string(11) "153.0216909"
}
Demo: https://3v4l.org/AkRvI
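If you would rather keep the associative arrays produced by the json_decode($output, true) call in the question, the same nested loop can be sketched as follows. custom_field_values() is a hypothetical helper name, and the inline JSON is a cut-down stand-in for Events.json, not the real data.

```php
<?php
// Hypothetical helper: collect the "value" of every custom field that has the given type.
function custom_field_values(array $events, int $type): array
{
    $values = [];
    foreach ($events as $event) {
        // Each event has its own customFields list, so it needs its own inner loop
        foreach ($event['customFields'] ?? [] as $field) {
            if (($field['type'] ?? null) === $type) {
                $values[] = $field['value'];
            }
        }
    }
    return $values;
}

// Shortened sample with the same shape as the dataset in the question
$json = '[{"eventID":1,"customFields":['
      . '{"label":"Venue","value":"Museum of Brisbane","type":17},'
      . '{"label":"Venue address","value":"64 Adelaide Street","type":9}]}]';
$events = json_decode($json, true);
print_r(custom_field_values($events, 9));
```

The point is the same as in the answer: the outer index walks over events and the inner one over each event's custom fields, so the two must not share a counter.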

read .txt file content into php multi dimensional array

I have a .txt file with content like this
Abuja, the Federal Capital Territory has -- -- -- -- -- area Council
A. 4
B. 6
C. 7
D. 2
ANSWER: B
The Federal Capital Territory is associated with-- -- -- -- -- -- -- vegetation belt
A. Sahel savanna
B. Rainforest
C. Guinea savanna
D. Sudan savanna
ANSWER: C
The most significant factor responsible for the ever increasing population of FCT is
A. High birth rate
B. Immigration
C. Death rate
D. CENSUS
ANSWER: B
I would love to read the file content into a multidimensional array so I can get each question, its options, and the correct answer for each question.
I have tried this:
$array=explode("\n", file_get_contents('file.txt'));
print_r($array);
but it doesn't give me what I want.
Alive's answer gives a result that you can probably work with, but I think an associative array is the way to go.
I look at each line to see if it starts with a question number; if so, I add a new item to the array, keyed by the question number and holding the question text.
If the first character is a letter and the second is a dot, it's an answer option, so I add the letter as key and the text as value.
If it's neither of the above, it's the ANSWER line, so I add the key ANSWER with the correct answer as value.
I use explode to split the lines. The third argument limits how many parts the string is split into.
With 2 it splits at the first space only, so the question number is the first item and the question text is the second item in the array.
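As a quick illustration of that limit argument, here is a minimal sketch; the sample line is shortened from the data and the variable names are only for illustration.

```php
<?php
$line = "2. The Federal Capital Territory is associated with a vegetation belt";

// With a limit of 2, only the first space is used as a split point
$parts = explode(" ", $line, 2);

echo $parts[0], "\n"; // the question number, including its dot
echo $parts[1], "\n"; // everything after the first space, spaces intact
```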
https://3v4l.org/ZqppN
// $str = file_get_contents("text.txt");
$str = "1. Abuja, the Federal Capital Territory has -- -- -- -- -- area Council
A. 4
B. 6
C. 7
D. 2
ANSWER: B
2. The Federal Capital Territory is associated with-- -- -- -- -- -- -- vegetation belt
A. Sahel savanna
B. Rainforest
C. Guinea savanna
D. Sudan savanna
ANSWER: C
3. The most significant factor responsible for the ever increasing population of FCT is
A. High birth rate
B. Immigration
C. Death rate
D. CENSUS
ANSWER: B";
$arr = explode("\n", $str);
$res = [];
foreach ($arr as $line) {
if ($line != "") {
if (is_numeric($line[0])) {
preg_match("/^\d+/", $line, $num);
$res[$num[0]] = ["QUESTION" => explode(" ", $line, 2)[1]];
$q = $num[0];
} else if (ctype_alpha($line[0]) && $line[1] == ".") {
$res[$q][$line[0]] = explode(" ", $line, 2)[1];
} else {
$res[$q]["ANSWER"] = trim(explode(":", $line, 2)[1]);
}
}
}
var_dump($res);
Try This..
$array=explode("\n", file_get_contents('file.txt'));
$array = array_filter(array_map('trim',$array));
$chunk_array = array_chunk($array, 6);
foreach($chunk_array as $key => $value){
$final_array[$key]['question'] = $value[0];
$final_array[$key]['options'] = array_slice($value, 1, -1);
$final_array[$key]['answer'] = end($value);
}
echo '<pre>';
print_r($final_array);
echo '</pre>';
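Both answers lean on an assumption about the file: the first needs numbered questions and the second needs exactly six non-empty lines per question. A minimal sketch that only relies on the "A." option prefix and the "ANSWER:" line, as they appear in the question's sample, could look like this. parse_quiz() and the array keys are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical helper: group lines into questions using the ANSWER line as the terminator.
function parse_quiz(string $text): array
{
    $questions = [];
    $current = null;
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank lines
        }
        if (preg_match('/^ANSWER:\s*(\S+)/i', $line, $m)) {
            // The ANSWER line closes the question that is currently open
            if ($current !== null) {
                $current['answer'] = $m[1];
                $questions[] = $current;
                $current = null;
            }
        } elseif ($current !== null && preg_match('/^([A-Z])\.\s+(.*)$/', $line, $m)) {
            // "B. Rainforest" style option line
            $current['options'][$m[1]] = $m[2];
        } else {
            // Anything else starts a new question
            $current = ['question' => $line, 'options' => [], 'answer' => null];
        }
    }
    return $questions;
}

$sample = "Abuja, the Federal Capital Territory has -- area Council\nA. 4\nB. 6\nC. 7\nD. 2\nANSWER: B";
print_r(parse_quiz($sample));
```

Because the option count is not fixed, a question with three or five options is handled the same way as one with four.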

web scraping : how would you detect new items in a list?

I'm working on some PHP code that would grab a music playlist from a remote radio page - which means it is continuously updated.
I would like to store the tracks history in my database.
My problem is that I need to detect when new entries have been added to the remote tracklist, knowing that:
I don't know how often the remote page will be updated.
I don't know how many tracks are displayed on the remote page. Sometimes it will be a single track, sometimes a few dozen.
The same track can show up several times.
For example, I will get this data when grabbing the page for the first time :
Dead Combo — Esse Olhar Que Era Só Teu
Myron & E — If I Gave You My Love
Hooverphonic — Badaboum
Alain Chamfort — Bambou - Pilooski / Jayvich Reprise
William Onyeabor — Atomic Bomb
Curtis Mayfield — Move on up - Extended version
Mos Def — Ms. Fat Booty
Nicki Minaj — Feeling Myself
Disclosure — You & Me (Flume remix)
Otis Redding — My Girl - Remastered Mono
Then on the second time I'll get :
Charles Aznavour — Emmenez moi
Mos Def — Ms. Fat Booty
Rag'n'Bone Man — Human
Bernard Lavilliers — Idées noires
Julien Clerc — Ma préférence
The Rolling Stones — Just Your Fool
Dead Combo — Esse Olhar Que Era Só Teu
Myron & E — If I Gave You My Love
Hooverphonic — Badaboum
Alain Chamfort — Bambou - Pilooski / Jayvich Reprise
As you can see, the second time, entries 7->10 seem to be the same as entries 1->4 of the first list (so entries 1->6 are the new ones); track #2 was already played in the first list but seems to have been replayed since.
The new entries here would be :
Charles Aznavour — Emmenez moi
Mos Def — Ms. Fat Booty
Rag'n'Bone Man — Human
Bernard Lavilliers — Idées noires
Julien Clerc — Ma préférence
The Rolling Stones — Just Your Fool
I store tracks entries in a table, and tracks history in another one.
Structure of the tracks table
| ID | artist | title | album |
--------------------------------------------------
| 12 | Mos Def | Ms. Fat Booty | |
Structure of the tracks history table
| ID | track ID | time |
------------------------------------------
| 24 | 12 | 2016-07-03 13:40:26 |
Have you got any ideas on how I could handle this ?
Thanks !
I think you're trying to find the items at the end of the second list that match those at beginning of the first?
If you can store both lists in an array (the old list in $previous and the new list in $current), this function should help:
function find_old_tracks($previous, $current)
{
for ($i = 0; $i < count($current); $i++)
{
if ($previous[$i] == $current[$i]) continue;
return find_old_tracks($previous, array_slice($current, $i + 1));
}
return array_slice($previous, 0, $i);
}
It scans through $current for contiguous matches to $previous, recursing on the remainder every time it finds a mismatch. When I run this:
$previous = array(
'Dead Combo — Esse Olhar Que Era Só Teu',
'Myron & E — If I Gave You My Love',
'Hooverphonic — Badaboum',
'Alain Chamfort — Bambou - Pilooski / Jayvich Reprise',
'William Onyeabor — Atomic Bomb',
'Curtis Mayfield — Move on up - Extended version',
'Mos Def — Ms. Fat Booty',
'Nicki Minaj — Feeling Myself',
'Disclosure — You & Me (Flume remix)',
'Otis Redding — My Girl - Remastered Mono'
);
$current = array(
'Charles Aznavour — Emmenez moi',
'Mos Def — Ms. Fat Booty',
'Rag Bone Man — Human',
'Bernard Lavilliers — Idées noires',
'Julien Clerc — Ma préférence',
'The Rolling Stones — Just Your Fool',
'Dead Combo — Esse Olhar Que Era Só Teu',
'Myron & E — If I Gave You My Love',
'Hooverphonic — Badaboum',
'Alain Chamfort — Bambou - Pilooski / Jayvich Reprise'
);
$old_tracks = find_old_tracks($previous, $current);
$new_tracks = array_slice($current, 0, count($current) - count($old_tracks));
print "NEW TRACKS: " . implode('; ', $new_tracks);
print "<br /><br />OLD TRACKS: " . implode('; ', $old_tracks);
my output is:
NEW TRACKS: Charles Aznavour — Emmenez moi; Mos Def — Ms. Fat Booty;
Rag Bone Man — Human; Bernard Lavilliers — Idées noires; Julien Clerc
— Ma préférence; The Rolling Stones — Just Your Fool
OLD TRACKS: Dead Combo — Esse Olhar Que Era Só Teu; Myron & E — If I
Gave You My Love; Hooverphonic — Badaboum; Alain Chamfort — Bambou -
Pilooski / Jayvich Reprise
You can do what you like with that info on the database end.
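The recursive function can miss the overlap in some edge cases, for example when $current is longer than $previous or when a partial match is followed by a mismatch. A minimal alternative sketch, assuming as above that the newest tracks are listed first, looks for the longest tail of $current that equals the head of $previous. split_new_tracks() is a hypothetical name, and the single-letter track names are placeholders.

```php
<?php
// Hypothetical helper: returns [new tracks, overlapping old tracks].
function split_new_tracks(array $previous, array $current): array
{
    $max = min(count($previous), count($current));
    // Try the longest possible overlap first and shrink until one fits
    for ($k = $max; $k > 0; $k--) {
        $tail = array_slice($current, -$k);     // last $k entries of the new list
        $head = array_slice($previous, 0, $k);  // first $k entries of the old list
        if ($tail === $head) {
            return [array_slice($current, 0, count($current) - $k), $head];
        }
    }
    // No overlap found: treat everything as new
    return [$current, []];
}

$previous = ['A', 'B', 'C', 'D'];
$current  = ['X', 'C', 'A', 'B'];
list($new, $old) = split_new_tracks($previous, $current);
print "NEW TRACKS: " . implode('; ', $new) . "\n";
print "OLD TRACKS: " . implode('; ', $old) . "\n";
```

Since a track can be replayed, any overlap-based approach can still be fooled by coincidental repeats (a short overlap of one or two tracks is weak evidence), so it may be worth requiring a minimum overlap length before trusting the result.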

JSON string shows blank: why is it not getting decoded?

I have a JSON string, but when I pass it to json_decode() the result is blank.
$str = '[{"actcode":"Auck4","actname":"Sky Tower","date":"","time":"","timeduration":"","adult":"0","adultprice":"28","child":"0","childprice":"0","description":"Discover the best of Auckland in half a day. Soak up spectacular sights on this scenic tour, from heritage-listed buildings on Queen Street to the stunning Viaduct Harbour and panoramic vistas from the Sky Tower observation deck.
Start your tour with a hotel pick-up and travel through Auckland?s dynamic Central Business District. Travel across the iconic Auckland Harbour Bridge and admire stunning city views. Then, return to the city centre and visit the vibrant precinct of Wynyard Quarter. Here, wander among the sculptures and enjoy the happenings on the water of Viaduct Harbour.
Continue to Queen Street, also known as the ?Golden Mile? of Aucklands business and shopping district. Marvel at historic buildings like the Ferry Terminal building before visiting the Auckland Museum. Here, explore fascinating exhibits paying tribute to New Zealands natural, Maori and European histories. Afterwards, travel along Aucklands most expensive residential streets with fantastic views of the Waitemata Harbour and its islands.
Your tour ends at Sky Tower, the tallest free-standing structure in the Southern Hemisphere. Take in breathtaking 360-degree views of the city and its surroundings. In the afternoon, continue your own exploration of Auckland."}]';
I tried the code below:
$array = json_decode($str,true);
echo print_r($array);
this one too
$str1 = trim($str);
$array = json_decode($str1,true);
echo print_r($array);
but the result is still blank.
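Try this one. json_decode() returns null here because the description contains raw line breaks, which are not allowed inside a JSON string. Replace them before decoding:
// drop carriage returns and turn each raw line feed into the two-character JSON escape \n
$strdone = str_replace(array("\r", "\n"), array("", "\\n"), strip_tags($str));
$jsonarray = json_decode($strdone, true);
echo "<pre>"; print_r($jsonarray);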
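When json_decode() comes back empty, it usually helps to ask PHP why before trying fixes. A minimal sketch using json_last_error_msg() follows; decode_json_or_fail() is a hypothetical wrapper name, and the short string stands in for the long one in the question.

```php
<?php
// Hypothetical wrapper: decode a JSON array/object or throw with the parser's own explanation.
function decode_json_or_fail(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        // json_last_error_msg() describes the most recent json_decode() failure
        throw new RuntimeException('json_decode failed: ' . json_last_error_msg());
    }
    return $data;
}

// Double quotes make PHP put a real line break inside the JSON string value
$bad = "[{\"description\":\"line one\nline two\"}]";

try {
    decode_json_or_fail($bad);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

The wrapper expects the top-level JSON value to be an array or object, as in the question; a bare scalar would also be rejected.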

php: count instances of words in a given string then return top 5 which match in another array

php: sort and count instances of words in a given string
From the question linked above, I know how to count the instances of words in a given string and sort them by frequency. Now I want to go one step further: match the resulting words against another array ($keywords) and keep only the top 5 words. I do not know how to do that, so I am asking a new question. Thanks.
$txt = <<<EOT
The 2013 Monaco Grand Prix (formally known as the Grand Prix de Monaco 2013) was a Formula One motor race that took place on 26 May 2013 at the Circuit de Monaco, a street circuit that runs through the principality of Monaco. The race was won by Nico Rosberg for Mercedes AMG Petronas, repeating the feat of his father Keke Rosberg in the 1983 race. The race was the sixth round of the 2013 season, and marked the seventy-second time the Monaco Grand Prix has been held. Rosberg had started the race from pole.
Background
Mercedes protest
Just before the race, Red Bull and Ferrari filed an official protest against Mercedes, having learned on the night before the race of a three-day tyre test undertaken by Pirelli at the venue of the last grand prix using Mercedes' car driven by both Hamilton and Rosberg. They claimed this violated the rule against in-season testing and gave Mercedes a competitive advantage in both the Monaco race and the next race, which would both be using the tyre that was tested (with Pirelli having been criticised following some tyre failures earlier in the season, the tests had been conducted on an improved design planned to be introduced two races after Monaco). Mercedes stated the FIA had approved the test. Pirelli cited their contract with the FIA which allows limited testing, but Red Bull and Ferrari argued this must only be with a car at least two years old. It was the second test conducted by Pirelli in the season, the first having been between race 4 and 5, but using a 2011 Ferrari car.[4]
Tyres
Tyre supplier Pirelli brought its yellow-banded soft compound tyre as the harder "prime" tyre and the red-banded super-soft compound tyre as the softer "option" tyre, just as they did the previous two years. It was the second time in the season that the super-soft compound was used at a race weekend, as was the case with the soft tyre compound.
EOT;
$words = array_count_values(str_word_count($txt, 1));
arsort($words);
var_dump($words);
$keywords = array("Monaco","Prix","2013","season","Formula","race","motor","street","Ferrari","Mercedes","Hamilton","Rosberg","Tyre");
//var_dump($words) which should match in $keywords array, then get top 5 words.
You already have $words as an associative array, indexed by the word and with the count as the value, so we use array_flip() to make your $keywords array an associative array indexed by word as well. Then we can use array_intersect_key() to return only those entries from $words that have a matching index entry in our flipped $keywords array.
This gives a resulting $matchWords array, still keyed by the word, but containing only those entries from the original $words array that match $keywords; and still sorted by frequency.
We then simply use array_slice() to extract the first 5 entries from that array.
$matchWords = array_intersect_key(
$words,
array_flip($keywords)
);
$matchWords = array_slice($matchWords, 0, 5);
var_dump($matchWords);
gives
array(5) {
'race' =>
int(11)
'Monaco' =>
int(7)
'Mercedes' =>
int(5)
'Rosberg' =>
int(4)
'season' =>
int(4)
}
Caveat: You could have problems with case-sensitivity. "Race" !== "race", so the $words = array_count_values(str_word_count($txt, 1)); line will treat these as two different words.
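One way around that caveat is to lowercase both the text and the keywords before counting. The sketch below wraps the same array_flip() / array_intersect_key() / array_slice() steps in a function; top_keywords() is a hypothetical name and the sample sentence is made up for illustration.

```php
<?php
// Hypothetical helper: case-insensitive variant of the steps described above.
function top_keywords(string $text, array $keywords, int $limit = 5): array
{
    // Count lowercased words so "Race" and "race" share one entry
    $counts = array_count_values(str_word_count(strtolower($text), 1));
    arsort($counts);
    // Flip the lowercased keywords so they can be matched by key
    $wanted = array_flip(array_map('strtolower', $keywords));
    return array_slice(array_intersect_key($counts, $wanted), 0, $limit, true);
}

$text = "Race race Monaco tyre Tyre TYRE car";
var_dump(top_keywords($text, ["race", "Tyre", "Monaco", "Ferrari"]));
```

The returned keys are lowercase, so map them back to the original spelling if the display case matters. Note also that str_word_count() does not treat digits as word characters by default, so a keyword such as "2013" will not be counted with either version.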
