Parse HTML in PHP and return JSON

I am using PHP Simple HTML DOM Parser in my PHP script to parse information from a website into a JSON object. My JSON object should be formatted like this in the end:
An array with at most 5 objects (Monday to Friday), or fewer (Tuesday to Friday, etc.).
All of these objects should have two arrays, one called food1 and one called food2. Both of these arrays should contain multiple food names and their prices. I think in JSON it would look like this:
{
"day" : [
{
"food1" : [
{
"price" : "1.00",
"foodname" : "test"
},
{
"price" : "1.00",
"foodname" : "test"
}
],
"food2" : [
{
"price" : "2.00",
"foodname" : "test2"
},
{
"price" : "2.00",
"foodname" : "test2"
}
]
},
{
"food1" : [
{
"price" : "1.00",
"foodname" : "test"
},
{
"price" : "1.00",
"foodname" : "test"
}
],
"food2" : [
{
"price" : "2.00",
"foodname" : "test2"
},
{
"price" : "2.00",
"foodname" : "test2"
}
]
},
{
"food1" : [
{
"price" : "1.00",
"foodname" : "test"
},
{
"price" : "1.00",
"foodname" : "test"
}
],
"food2" : [
{
"price" : "2.00",
"foodname" : "test2"
},
{
"price" : "2.00",
"foodname" : "test2"
}
]
},
{
"food1" : [
{
"price" : "1.00",
"foodname" : "test"
},
{
"price" : "1.00",
"foodname" : "test"
}
],
"food2" : [
{
"price" : "2.00",
"foodname" : "test2"
},
{
"price" : "2.00",
"foodname" : "test2"
}
]
},
{
"food1" : [
{
"price" : "1.00",
"foodname" : "test"
},
{
"price" : "1.00",
"foodname" : "test"
}
],
"food2" : [
{
"price" : "2.00",
"foodname" : "test2"
},
{
"price" : "2.00",
"foodname" : "test2"
}
]
}
]
}
Anyway, I have previously only worked with Objective-C and am having problems solving this in PHP. I have also implemented a parser in Objective-C that works, but if the site changes its structure I would have to resubmit the whole app. That's why I want to build a web service where I can change the parser dynamically, outside of the app. All I have so far is this:
<?php
include('simple_html_dom.php');
$opts = array('http'=>array('header' => "User-Agent:MyAgent/1.0\r\n"));
$context = stream_context_create($opts);
$html = file_get_html('http://www.studentenwerk-karlsruhe.de/de/essen/?view=ok&STYLE=popup_plain&c=erzberger&p=1&kw=3',false,$context);
foreach($html->find('b') as $e)
echo $e;
?>
This gives me all the food names, but they aren't grouped by day, nor by menu (there are two different menus each day, called food1 and food2 in my example JSON object).
In my Objective-C parser I simply created a new day object whenever the food name was “SchniPoSa” and added all following food names to food1 until the food name “Salatbuffet” appeared; that item and all following ones went into the food2 array until the next “SchniPoSa”. But this isn't very robust, because the structure could change every day.
Also, I don't even know how to implement that in PHP. My little PHP script also doesn't parse any of the prices, which are in <span class="bgp price_1"> tags.
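The day/menu heuristic described above can be sketched in PHP as a single pass over a flat list of (name, price) pairs in page order. This is a minimal sketch, not a robust parser: it assumes the names and prices have already been extracted, the marker names "SchniPoSa" and "Salatbuffet" come from the question, and the other dishes and all prices are made-up sample data. groupByDay is a hypothetical helper name.

```php
<?php
// Sketch of the "SchniPoSa"/"Salatbuffet" heuristic from the question.
// Input: a flat list of [name, price] pairs in page order (assumed to be
// extracted already). Output: the {"day": [{food1, food2}, ...]} shape.
function groupByDay(array $items): array
{
    $days = [];
    $current = null; // index of the day currently being filled
    $menu = 'food1'; // menu that the next item is added to

    foreach ($items as [$name, $price]) {
        if ($name === 'SchniPoSa') {
            // Marker for a new day: start a fresh day, back on food1.
            $days[] = ['food1' => [], 'food2' => []];
            $current = count($days) - 1;
            $menu = 'food1';
        } elseif ($name === 'Salatbuffet') {
            // Marker for the second menu: this and later items go to food2.
            $menu = 'food2';
        }
        if ($current === null) {
            continue; // ignore anything before the first day marker
        }
        $days[$current][$menu][] = ['price' => $price, 'foodname' => $name];
    }

    return ['day' => $days];
}

// Made-up sample data; only the two marker names come from the question.
$items = [
    ['SchniPoSa', '2.50'], ['Pasta', '1.80'],
    ['Salatbuffet', '0.75'], ['Suppe', '0.60'],
    ['SchniPoSa', '2.50'], ['Salatbuffet', '0.75'],
];

echo json_encode(groupByDay($items), JSON_PRETTY_PRINT), "\n";
```

As the question notes, this relies on the two marker dishes appearing every day, so it breaks as soon as the menu changes.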
Here is the website from which I want to parse the information:
http://www.studentenwerk-karlsruhe.de/de/essen/?view=ok&STYLE=popup_plain&c=erzberger&p=1&kw=3
Can anyone help me parse this information into a valid JSON object as described above?

Just saw your message and realised I hadn't gotten back to you about this.
Maybe this will lead you in the right direction:
<?php
$opts = array('http'=>array('header' => "User-Agent:MyAgent/1.0\r\n"));
$context = stream_context_create($opts);
$html = file_get_contents('http://www.studentenwerk-karlsruhe.de/de/essen/?view=ok&STYLE=popup_plain&c=erzberger&p=1&kw=3',false,$context);
libxml_use_internal_errors(true);
$dom = new DomDocument;
$dom->loadHTML($html);
$xpath = new DomXPath($dom);
$nodes = $xpath->query("//table[@class='easy-tab-dot']");
//header("Content-type: text/plain");
foreach ($nodes as $i => $node) {
$arr = array();
$children = $node->childNodes;
foreach ($children as $child) {
$tmp_doc = new DOMDocument();
$tmp_doc->appendChild($tmp_doc->importNode($child,true));
#echo $tmp_doc->saveHTML();
print_r( $child );
}
echo "#######################################################################################";
}
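Building on the DOMXPath approach above, the name/price pairs can be read row by row, using each row as the context node for relative XPath queries. This is a hedged sketch: the HTML below is a hypothetical stand-in that only assumes names sit in <b> and prices in <span class="bgp price_1"> inside the easy-tab-dot table, as described in the question. The live page may be structured differently, in which case the expressions need adjusting.

```php
<?php
// Hypothetical markup standing in for the real page.
$html = '<table class="easy-tab-dot">'
    . '<tr><td><b>SchniPoSa</b></td><td><span class="bgp price_1">2,50</span></td></tr>'
    . '<tr><td><b>Salatbuffet</b></td><td><span class="bgp price_1">0,75</span></td></tr>'
    . '</table>';

libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$items = [];
// One <tr> per meal; query the name and price relative to that row.
foreach ($xpath->query("//table[@class='easy-tab-dot']//tr") as $row) {
    $name = $xpath->query('.//b', $row)->item(0);
    $price = $xpath->query(".//span[contains(@class, 'price_1')]", $row)->item(0);
    if ($name === null || $price === null) {
        continue; // skip rows that are not meal rows
    }
    $items[] = [trim($name->textContent), trim($price->textContent)];
}

echo json_encode($items), "\n";
```

The resulting flat list is the kind of input that can then be grouped into days and menus before calling json_encode.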


MongoCursorTimeoutException with sort on _id

I have a Mongo collection containing ~7 million events. To get the events that happened for an aggregate, I have the following PHP code:
$client = new MongoClient();
$db = $client->selectDB('db_name');
$collection = $db->selectCollection('events');
foreach($collection->find([
'headers.for' => '89d115f8-0b2f-470e-9495-2a07d9dfb942',
])->sort([
'headers.occurredOn' => 1,
'_id' => 1,
]) as $event) {
var_dump($event);
}
When I run the above PHP code, I get a MongoCursorTimeoutException after 30 seconds.
But when I run the same code without the sort on _id:
$client = new MongoClient();
$db = $client->selectDB('db_name');
$collection = $db->selectCollection('events');
foreach($collection->find([
'headers.for' => '89d115f8-0b2f-470e-9495-2a07d9dfb942',
])->sort([
'headers.occurredOn' => 1,
]) as $event) {
var_dump($event);
}
The error does not occur and I get instant results (a single record).
So why does a MongoCursorTimeoutException occur when a sort on _id is added?
The indexes for the collection look as follows:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "db.events"
},
{
"v" : 1,
"key" : {
"headers.occurredOn" : NumberLong(1),
"_id" : NumberLong(1)
},
"name" : "headers_occurredOn_1__id_1",
"ns" : "db.events"
},
{
"v" : 1,
"key" : {
"headers.for" : NumberLong(1)
},
"name" : "headers_for_1",
"ns" : "db.events"
}
]
Well, the problem was one of the indexes I had:
{
"v" : 1,
"key" : {
"headers.occurredOn" : NumberLong(1),
"_id" : NumberLong(1)
},
"name" : "headers_occurredOn_1__id_1",
"ns" : "db.events"
}
After I dropped this one and added the following:
{
"v" : 1,
"key" : {
"headers.occurredOn" : NumberLong(1)
},
"name" : "headers.occurredOn_1",
"ns" : "db.events"
}
Everything went smoothly.

MongoDB aggregate sub-array as group _id

I'm having some trouble using the MongoDB aggregation framework to count event types in my database. How do I calculate the sum of the value.count field for each unique 3rd index of the _id.val field?
The basic structure of my data looks like:
{ _id: { evt: "click", val: [ "default", "125", "311", "1" ] }, value: { count: 1 } }
{ _id: { evt: "click", val: [ "default", "154", "321", "2" ] }, value: { count: 2 } }
{ _id: { evt: "click", val: [ "default", "192", "263", "1" ] }, value: { count: 4 } }
The values in the val field denote ["type","x","y","time"], respectively.
I'm trying to extract the 3rd index, or time value of the _id.val key. The output I'm looking to achieve:
1: 5
2: 2
I've been trying to do it via this PHP:
$ops2 = array(
array(
'$match' => $q2
),
array(
'$group' => array(
'_id' => array(
'evt' => '$_id.evt',
'time' => '$_id.val.3'
),
'count' => array('$sum' => '$value.count' )
)
)
);
But it doesn't seem to accept the index 3 in the $group stage.
The data you are working with looks like it is already the output of a mapReduce operation, since it has the specific "_id" and "value" structure that mapReduce produces. As such, you may be better off going back to the logic of how that process is implemented and following the same approach to extract and total what you want, or at least changing its output form to this:
{
_id: {
evt: "click",
val: { "type": "default", "x": "125", "y": "311", "time": "1" }
},
value: { count: 1 }
},
{
_id: {
evt: "click",
val: { "type": "default", "x": "154", "y": "321", "time": "2" }
},
value: { count: 2 }
},
{
_id: {
evt: "click",
val: { "type": "default", "x": "192", "y": "263", "time": "1" }
},
value: { count: 4 }
}
The problem is that the aggregation framework "presently" lacks the ability to address the "indexed" position of an array (a real "non-associative" array, not a PHP array) and will always return null when you try to do so.
If you cannot go back to the original source or mapReduce operation, you can write a mapReduce operation on this data to get the expected results (shell representation, since it's going to be JavaScript anyway):
db.collection.mapReduce(
function() {
emit({ evt: this._id.evt, time: this._id.val[3] }, this.value.count)
},
function(key,values) {
return Array.sum(values)
},
{ out: { inline: 1 } }
)
Which returns typical mapReduce output like this:
{
"_id" : {
"evt" : "click",
"time" : "1"
},
"value" : 5
},
{
"_id" : {
"evt" : "click",
"time" : "2"
},
"value" : 2
}
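To check what the mapReduce above should return, its emit/reduce logic can be mirrored over plain PHP arrays holding the three sample documents from the question. This only illustrates the grouping and is not a substitute for running it in MongoDB; the "evt|time" string key is an arbitrary choice made for this sketch.

```php
<?php
// The three sample documents from the question, as PHP arrays.
$docs = [
    ['_id' => ['evt' => 'click', 'val' => ['default', '125', '311', '1']], 'value' => ['count' => 1]],
    ['_id' => ['evt' => 'click', 'val' => ['default', '154', '321', '2']], 'value' => ['count' => 2]],
    ['_id' => ['evt' => 'click', 'val' => ['default', '192', '263', '1']], 'value' => ['count' => 4]],
];

$totals = [];
foreach ($docs as $doc) {
    // "emit": key on evt plus the element at index 3 (the time value)
    $key = $doc['_id']['evt'] . '|' . $doc['_id']['val'][3];
    // "reduce": sum the counts that share a key
    $totals[$key] = ($totals[$key] ?? 0) + $doc['value']['count'];
}

print_r($totals);
```

This matches the totals the question asks for (time 1: 5, time 2: 2).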
If you were able to at least transform the current output collection into the form suggested first above, then you would use the aggregation framework like this instead (again in shell representation):
{ "$group": {
"_id": {
"evt": "$_id.evt",
"time": "$_id.val.time"
},
"count": { "$sum": "$value.count" }
}}
Which of course would yield from the altered data:
{ "_id" : { "evt" : "click", "time" : "2" }, "count" : 2 }
{ "_id" : { "evt" : "click", "time" : "1" }, "count" : 5 }
In future releases of MongoDB, there will be a $slice operator which allows the array handling, so with your current structure you could do this instead:
{ "$group": {
"_id": {
"evt": "$_id.evt",
"time": { "$slice": [ "$_id.val", 3,1 ] }
},
"count": { "$sum": "$value.count" }
}}
This allows picking the element at index 3 of the array, although it will of course still return an "array" as the element, like this:
{ "_id" : { "evt" : "click", "time" : [ "2" ] }, "count" : 2 }
{ "_id" : { "evt" : "click", "time" : [ "1" ] }, "count" : 5 }
So right now, if you can change your initial mapReduce output, do it: either to the form shown here, or by modifying the initial query to get the end result you want. Modifying to the recommended form will at least allow the .aggregate() command to work as shown in the second example here.
If not, then mapReduce is currently the only way to do this, as shown in the first example.
First, I think you may have misunderstood something about Mongo: each document in Mongo should have its own unique _id to distinguish it from the others. So I have added an _id to each object and renamed your original "_id" field to "data". Now the structure is:
/* 1 */
{
"_id" : "ubLrDptWvJE7LZqDF",
"data" : {
"evt" : "click",
"val" : [ "default", "125", "311", "1" ]
},
"value" : {
"count" : 1
}
}
/* 2 */
{
"_id" : "C2QCEhvCsp3xG6EKZ",
"data" : {
"evt" : "click",
"val" : [ "default", "154", "321", "2" ]
},
"value" : {
"count" : 2
}
}
/* 3 */
{
"_id" : "bT72z7gMKoyX5JfHL",
"data" : {
"evt" : "click",
"val" : [ "default", "192", "263", "1" ]
},
"value" : {
"count" : 4
}
}
I am not sure how to do this query in PHP, because I only know a little PHP, but I can give you an example of using aggregation in JavaScript; its code and output are as follows:
Here is a useful link: using mongo in PHP
I hope it helps you solve your problem :-)

Create a desired JSON format using data stored in PHP

So I have a bunch of arrays that contain all the data I need to pass to a third-party app. The problem is that they need it in a specific JSON format, and I have no idea how to produce it. The data format they require is like:
{
"appData" : {
"appKey" : "blah blah",
"synth" : {
"synth1" : {
"mono" : [
{
"monoId" : "529",
"templates" : [
{
"monoSequenceMap" : [
{
"map" : {
"X" : "3",
"Y" : "1"
},
"position" : {
"scale" : "1",
"x1" : "100",
"x2" : "150",
"y1" : "2000",
"y2" : "2500"
}
},
{
"map" : {
"X" : "2",
"Y" : "4"
},
"position" : {
"scale" : "1",
"x1" : "200",
"x2" : "550",
"y1" : "1000",
"y2" : "1500"
}
},
{
"map" : {
"X" : "3",
"Y" : "3"
},
"position" : {
"scale" : "1.5",
"x1" : "300",
"x2" : "750",
"y1" : "1750",
"y2" : "1800"
}
},
{
"map" : {
"X" : "4",
"Y" : "1"
},
"position" : {
"scale" : "1.5",
"x1" : "680",
"x2" : "790",
"y1" : "1950",
"y2" : "1850"
}
}
],
"templateId" : "01_A_19"
}
]
}
],
"synthId" : "XXXXXXXXXX"
}
}
}
}
I just want some pointers on how to convert the data I have into this JSON string. I think I need to use json_encode. Should I create a new class called 'appData' and then create each object/array inside it? Or should I just write a string in this format into a text file?
My problem is that I cannot wrap my head around all these objects nested inside objects. For example, in the JSON, synth is an object that contains synth1, synth2, etc., which are themselves objects that in turn have mono, an array of objects, and I am not sure how to tackle that.
Any pointers are greatly appreciated!
Are your arrays multidimensional? Like:
$array = array(
"data_table_1" => array(
"item1" => "Item 1",
"item2" => "Item 2"
),
"data_table_2" => array(
"item1" => "Item 1",
"item2" => "Item 2"
)
);
If so, all you have to do is use json_encode, which will do all the encoding for you:
$json = json_encode($array);
==== Edit ====
Arrays do not have to be multidimensional. Even an array with a single key => value pair will work. Just make sure your values have string keys wherever the JSON needs an object, so they are encoded as object properties rather than as a JSON array.
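A trimmed-down sketch of the format from the question, built entirely from nested PHP arrays: arrays with string keys are encoded as JSON objects and zero-indexed lists as JSON arrays, so no dedicated 'appData' class is needed. Only one monoSequenceMap entry is included for brevity; the values are copied from the question's example.

```php
<?php
$appData = [
    'appData' => [
        'appKey' => 'blah blah',
        'synth' => [                     // string keys => JSON object
            'synth1' => [
                'mono' => [              // zero-indexed list => JSON array
                    [
                        'monoId' => '529',
                        'templates' => [
                            [
                                'monoSequenceMap' => [
                                    [
                                        'map' => ['X' => '3', 'Y' => '1'],
                                        'position' => [
                                            'scale' => '1', 'x1' => '100', 'x2' => '150',
                                            'y1' => '2000', 'y2' => '2500',
                                        ],
                                    ],
                                ],
                                'templateId' => '01_A_19',
                            ],
                        ],
                    ],
                ],
                'synthId' => 'XXXXXXXXXX',
            ],
        ],
    ],
];

echo json_encode($appData, JSON_PRETTY_PRINT), "\n";
```

One caveat: an empty PHP array encodes as [], so a key that must be an empty JSON object needs new stdClass() (or the JSON_FORCE_OBJECT flag for the whole output).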

MongoDB update sub document issue

I have a Mongo structure like this:
"campaigns" : [{
"summary" : [
{
"postcode" : [ {
"id" : "71",
"name" : "Recall 1",
"qty" : 3,
"history" : [
{ "timestamp" : "2014-12-16 11:15:32" },
{ "timestamp" : "2014-12-16 11:15:53" }
]
} ]
},
{
"postcode" : [ {
"id" : "72",
"name" : "Recall 2",
"qty" : 1,
"history" : [ { "timestamp" : "2014-12-16 11:15:53" } ]
} ]
}
]
}]
I'm trying to (i) increment the qty of the postcode with id 72 and (ii) insert another timestamp into the history of that same postcode.
My code:
$collection->update(array("campaigns.task.id" => $b,"_id"=> new MongoId($objectid),"campaigns.0.summary.0.postcode.0.id" => $a), array('$inc' => array('campaigns.0.summary.0.postcode.0.qty' => 1)));
$collection->update(array("campaigns.task.id" => $b,"_id"=>new MongoId($objectid),"campaigns.0.summary.0.postcode.0.id" => $a),
array('$addToSet' => array('campaigns.0.summary.0.postcode.0.history' => array("timestamp"=>$now->format('Y-m-d H:i:s')))));
But the postcode with id 72 does not get updated. I'm confused by this nested subdocument; can anyone suggest a solution?
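The update paths in the code above hardcode summary.0, so they always target the first summary element (the postcode with id 71). The positional $ operator cannot help directly here, because it cannot be used to traverse more than one nested array. One possible workaround, sketched under the assumptions that the document has already been fetched (e.g. with findOne) and that campaign index 0 is the right one, is to locate the indices in PHP and build the dotted path from them. findPostcodePath is a hypothetical helper, not a driver function, and the sample data is a reduced copy of the structure in the question.

```php
<?php
// Hypothetical helper: return "summary.<s>.postcode.<p>" for a postcode id,
// or null if that id is not present in the given campaign.
function findPostcodePath(array $campaign, string $postcodeId): ?string
{
    foreach ($campaign['summary'] as $s => $summary) {
        foreach ($summary['postcode'] as $p => $postcode) {
            if ($postcode['id'] === $postcodeId) {
                return "summary.$s.postcode.$p";
            }
        }
    }
    return null;
}

// Reduced copy of the campaign structure from the question.
$campaign = [
    'summary' => [
        ['postcode' => [['id' => '71', 'name' => 'Recall 1', 'qty' => 3]]],
        ['postcode' => [['id' => '72', 'name' => 'Recall 2', 'qty' => 1]]],
    ],
];

// Campaign index 0 is assumed, as in the question's own code.
$path = 'campaigns.0.' . findPostcodePath($campaign, '72');
// The path could then be used in the updates, e.g.
// array('$inc' => array("$path.qty" => 1)) and
// array('$addToSet' => array("$path.history" => array('timestamp' => ...))).
echo $path, "\n";
```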

Can't find proper criteria to update document in mongoDB

I have the following model stored in MongoDB:
{
"_id" : "some_table_name",
"content" : [{
"id" : "1",
"locname" : "KKH"
}, {
"id" : "2",
"locname" : "Singapore National Eye Centre"
}]
}
I'm trying to find the criteria to update the 2nd element (id=2), i.e. to add a new string field:
"new_element" : "foo"
So the new document should be:
{
"_id" : "some_table_name",
"content" : [{
"id" : "1",
"locname" : "KKH"
}, {
"id" : "2",
"locname" : "Singapore National Eye Centre",
"new_element" : "foo"
}]
}
From PHP, when I try to find the 2nd node by id, I use:
$array = $collection_bios2->findOne(
array("_id" => "some_table_name", "content.id" => "2"),
array("_id" => 0, "content.$" => 1)
);
But when I try to update it, the new field is added to the top-level document instead of to the matching element of content:
$newdata = array('$set' => array("new_element" => "foo"));
$collection_bios2->update(
array("_id" => "some_table_name", "content.id" => "2"),
$newdata
);
I get:
{
"_id" : "some_table_name",
"content" : [{
"id" : "1",
"locname" : "KKH"
}, {
"id" : "2",
"locname" : "Singapore National Eye Centre"
}],
"new_element" : "foo"
}
What's wrong with my implementation?
Please, help,
Maxim
You need to use the positional operator here:
array('$set' => array('content.$.new_element' => 'foo'))
You can read more about it here: http://docs.mongodb.org/manual/reference/operator/positional/
