Querying for keywords in photos using DynamoDB and PHP - php

Requirement: I'm storing photos in S3 and want the user to be able to look up all photos that contain a keyword they type in on a webpage.
My solution: I have created 2 tables:
tblPhotos:
    Primary Key = HashKey: PhotoID
    Fields:
        PhotoID,
        S3Location
tblKeywords:
    Primary Key = HashKey (HashID) and RangeKey (Keyword)
    Fields:
        HashID (always set to 1),
        Keyword (one big string; the keyword followed by PhotoID),
        S3Location
I'm using the following php code to find all records that have my keyword in it:
$table_name = 'tblKeywords';
$keywordresponse = $dynamodb->query(array(
    'TableName' => $table_name,
    'HashKeyValue' => array(AmazonDynamoDB::TYPE_STRING => '1'),
    'RangeKeyCondition' => array(
        'ComparisonOperator' => AmazonDynamoDB::CONDITION_BEGINS_WITH,
        'AttributeValueList' => array(
            array(AmazonDynamoDB::TYPE_STRING => $_GET['sSearch'])
        ),
    )
));
if ($keywordresponse->isOK())
{
    // Initialise the result list so an empty result still encodes as an array
    $aaData = array();
    foreach ($keywordresponse->body->Items as $item)
    {
        $aaData[] = array(
            'Keyword' => (string) $item->Keyword->{AmazonDynamoDB::TYPE_STRING},
            'S3Location' => (string) $item->S3Location->{AmazonDynamoDB::TYPE_STRING}
        );
    }
    echo json_encode(array('keywordstatus' => 1, 'aaData' => $aaData));
}
else
{
    echo json_encode(array('keywordstatus' => 0));
}
As you can see, I'm passing sSearch as the keyword to filter results on. My question is: do you know a better way to achieve this filtering of records using a DynamoDB query? I'm trying to avoid a scan as it's very inefficient, although my solution above, whilst it works, doesn't seem particularly elegant!

If your primary goal is to be able to return all the photos for a given keyword, I think the following schema would make more sense:
HashKey = Keyword
RangeKey = PhotoID
Additional possible fields: S3Name (if it is not the same as PhotoID)
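
A minimal sketch of how a query against this schema might be built, assuming the same legacy SDK request format as in the question (a `HashKeyValue` entry, with `'S'` as the string type tag that `AmazonDynamoDB::TYPE_STRING` is assumed to stand for). The table name `tblKeywordPhotos` and the helper `buildKeywordQuery()` are hypothetical, and the request is only constructed here, not sent:

```php
<?php
// Hypothetical helper: builds the request array to pass to $dynamodb->query().
// Assumes the legacy request shape used in the question; 'S' is assumed to be
// the value of AmazonDynamoDB::TYPE_STRING.
function buildKeywordQuery(string $keyword, string $tableName = 'tblKeywordPhotos'): array
{
    return [
        'TableName'    => $tableName,
        // The keyword itself is the hash key, so no BEGINS_WITH condition
        // on a concatenated "keyword + PhotoID" string is needed.
        'HashKeyValue' => ['S' => trim($keyword)],
    ];
}

$request = buildKeywordQuery(' sunset ');
// $keywordresponse = $dynamodb->query($request); // sent as in the question
```

Each returned item would then carry the PhotoID in its range key, one item per photo tagged with that keyword.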


Efficient solution to generating an array in PHP, which extracts unique data from one array, based on data from another

Writing in PHP, I have 2 arrays, each created from SQL queries.
The first query runs through a table that has multiple pieces of data that correspond to various quiz attempts. The table has a column for the user's Email, the activity ID (which represents a quiz attempt) and another 2 columns for data relating to the attempt (for example 'percentage achieved' or 'quiz ID'):
UserEmail    ActID   ActKey       ActMeta
joB#gm.com   2354    Percentage   98
joB#gm.com   2354    Quiz ID      4
boM#hm.com   4567    Percentage   65
boM#hm.com   4567    Quiz ID      7
Once queried, this first array ($student_quiz_list) stores the selected data in the form of
[[UserEmail, ActID, ActKey, ActMeta], [UserEmail, ActID, ActKey, ActMeta], [UserEmail, ActID, ActKey, ActMeta]...]
where each pair of sub-arrays corresponds to a single quiz attempt.
The second table that is queried has two columns that relate to the quizzes themselves. The first column is the Quiz ID and the second is the Quiz name.
Quiz ID   Quiz Name
4         Hardware
7         Logic
Once queried, this second array ($quiz_list) stores the selected data in the form of
[[ID, Name], [ID, Name]...]
What I need to do is create a 3rd array (from the 2 above) which holds the user's email and percentage score
[[email, percentage], [email, percentage]...]
but with each sub-array corresponding to a unique actID (so basically the user's percentage in each quiz they attempted without duplicates) and (this is the challenging bit) only for quizzes with certain ID values, in this case, let's say quiz ID 4.
In PHP, what would be the most efficient solution to this? I continually create arrays with duplicates and cannot find a neat solution which provides the outcome desired.
Any help would be gratefully received.
Try this code as the example and let me know.
$student_quiz_list = array(
    array(
        'UserEmail' => 'joB#gm.com', 'ActID' => '2354', 'ActKey' => 'Percentage', 'ActMeta' => '90',
    ),
    array(
        'UserEmail' => 'joB#gm.com', 'ActID' => '2354', 'ActKey' => 'QuizID', 'ActMeta' => '4',
    ),
    array(
        'UserEmail' => 'boM#hm.com', 'ActID' => '4567', 'ActKey' => 'Percentage', 'ActMeta' => '98',
    ),
    array(
        'UserEmail' => 'boM#hm.com', 'ActID' => '4567', 'ActKey' => 'QuizID', 'ActMeta' => '7',
    ),
);
$final_array = array();
foreach ($student_quiz_list as $row) {
    if ($row['ActKey'] == 'Percentage') {
        $final_array[] = array(
            'UserEmail' => $row['UserEmail'],
            'ActMeta' => $row['ActMeta']
        );
    }
}
echo "<pre>"; print_r($final_array); echo "</pre>";
As commenter @Nico Haase suggested, you can do most of the logic in SQL. You didn't respond to my comment, so I assume a user can have multiple attempts per quiz ID:
SELECT
    UserEmail,
    ActMeta
FROM
    your_table # replace with your table name
WHERE
    ActKey = 'Percentage'
    AND ActID IN (
        # subselection with table alias
        SELECT
            t2.ActID
        FROM
            your_table t2 # replace with your table name
        WHERE
            t2.ActKey = 'Quiz ID'
            AND t2.ActMeta = 2 # insert your desired quiz ID here
            AND t2.ActID = ActID
    )
(Query tested with MySQL/MariaDB)
If you cannot change the SQL part, here is how you can process your data in PHP. Bear in mind that a large dataset could exceed your server's capabilities, so I would definitely recommend the solution above:
// Your sample data
$raw = [
    ['UserEmail' => 'joB#gm.com', 'ActID' => 2354, 'ActKey' => 'Percentage', 'ActMeta' => 98],
    ['UserEmail' => 'joB#gm.com', 'ActID' => 2354, 'ActKey' => 'Quiz ID', 'ActMeta' => 4],
    ['UserEmail' => 'joB#gm.com', 'ActID' => 4567, 'ActKey' => 'Percentage', 'ActMeta' => 65],
    ['UserEmail' => 'joB#gm.com', 'ActID' => 4567, 'ActKey' => 'Quiz ID', 'ActMeta' => 7],
];
// Extract the corresponding ActIDs for a QuizID
$quiz_id = 4;
$act_ids = array_column(
    array_filter(
        $raw,
        function($item) use ($quiz_id) {
            // Only 'Quiz ID' rows count; otherwise a percentage of 4 would also match
            return $item['ActKey'] === 'Quiz ID' && $item['ActMeta'] == $quiz_id;
        }
    ),
    'ActID'
);
// Get the entries with ActKey 'Percentage' and an ActID present in the previously extracted set
$percentage_entries = array_filter(
    $raw,
    function($item) use ($act_ids) {
        return $item['ActKey'] === 'Percentage' && in_array($item['ActID'], $act_ids);
    }
);
// Map over the previous set to get the array into the final form
$final = array_map(
    function($item) {
        return [$item['UserEmail'], $item['ActMeta']];
    },
    $percentage_entries
);
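
As an alternative to the filter/map chain above, the rows can be folded into one record per attempt in a single pass, which avoids the repeated `in_array()` lookups. This is only a sketch: the helper name `percentagesForQuiz()` is hypothetical, and it assumes each ActID has at most one 'Percentage' row and one 'Quiz ID' row, as in the question's sample table:

```php
<?php
// Sample rows in the shape described in the question.
$raw = [
    ['UserEmail' => 'joB#gm.com', 'ActID' => 2354, 'ActKey' => 'Percentage', 'ActMeta' => 98],
    ['UserEmail' => 'joB#gm.com', 'ActID' => 2354, 'ActKey' => 'Quiz ID',    'ActMeta' => 4],
    ['UserEmail' => 'boM#hm.com', 'ActID' => 4567, 'ActKey' => 'Percentage', 'ActMeta' => 65],
    ['UserEmail' => 'boM#hm.com', 'ActID' => 4567, 'ActKey' => 'Quiz ID',    'ActMeta' => 7],
];

// Hypothetical helper: returns [email, percentage] pairs for one quiz ID.
function percentagesForQuiz(array $rows, int $quizId): array
{
    // Fold the key/value rows into one record per attempt, keyed by ActID.
    $attempts = [];
    foreach ($rows as $row) {
        $id = $row['ActID'];
        $attempts[$id]['UserEmail'] = $row['UserEmail'];
        $attempts[$id][$row['ActKey']] = $row['ActMeta'];
    }

    // Keep only attempts for the requested quiz that also have a percentage.
    $result = [];
    foreach ($attempts as $attempt) {
        if (isset($attempt['Quiz ID'], $attempt['Percentage'])
            && (int) $attempt['Quiz ID'] === $quizId) {
            $result[] = [$attempt['UserEmail'], $attempt['Percentage']];
        }
    }
    return $result;
}

$final = percentagesForQuiz($raw, 4);
```

Because the intermediate array is keyed by ActID, each attempt can appear only once in the result.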

Dynamically add columns to query results via CakePHP 3 ORM queries

I'm trying to write a query using CakePHP 3.7 ORM where it needs to add a column to the result set. I know in MySQL this sort of thing is possible: MySQL: Dynamically add columns to query results
So far I've implemented 2 custom finders. The first is as follows:
// src/Model/Table/SubstancesTable.php
public function findDistinctSubstancesByOrganisation(Query $query, array $options)
{
    $o_id = $options['o_id'];
    $query = $this
        ->find()
        ->select('id')
        ->distinct('id')
        ->contain('TblOrganisationSubstances')
        ->where([
            'TblOrganisationSubstances.o_id' => $o_id,
            'TblOrganisationSubstances.app_id IS NOT' => null
        ])
        ->orderAsc('Substances.app_id')
        ->enableHydration(false);
    return $query;
}
The second custom finder:
// src/Model/Table/RevisionSubstancesTable.php
public function findProductNotifications(Query $query, array $options)
{
    $date_start = $options['date_start'];
    $date_end = $options['date_end'];
    $query = $this
        ->find()
        ->where([
            'RevisionSubstances.date >= ' => $date_start,
            'RevisionSubstances.date <= ' => $date_end
        ])
        ->contain('Substances')
        ->enableHydration(false);
    return $query;
}
I'm using the finders inside a Controller to test it out:
$Substances = TableRegistry::getTableLocator()->get('Substances');
$RevisionSubstances = TableRegistry::getTableLocator()->get('RevisionSubstances');
$dates = // method to get an array which has keys 'date_start' and 'date_end' used later.
$org_substances = $Substances->find('distinctSubstancesByOrganisation', ['o_id' => 123]);
if (!$org_substances->isEmpty()) {
    $data = $RevisionSubstances
        ->find('productNotifications', [
            'date_start' => $dates['date_start'],
            'date_end' => $dates['date_end']
        ])
        ->where([
            'RevisionSubstances.substance_id IN' => $org_substances
        ])
        ->orderDesc('RevisionSubstances.date');
    debug($data->toArray());
}
The logic behind this is that I'm using the first custom finder to produce a Query Object which contains unique (DISTINCT in SQL) id fields from the substances table, based on a particular company (denoted by the o_id field). These are then fed into the second custom finder by implementing where(['RevisionSubstances.substance_id IN' ....
This works and gives me all the correct data. An example of the output from the debug() statement is as follows:
(int) 0 => [
    'id' => (int) 281369,
    'substance_id' => (int) 1,
    'date' => object(Cake\I18n\FrozenDate) {
        'time' => '2019-09-02T00:00:00+00:00',
        'timezone' => 'UTC',
        'fixedNowTime' => false
    },
    'comment' => 'foo',
    'substance' => [
        'id' => (int) 1,
        'app_id' => 'ID000001',
        'name' => 'bar',
        'date' => object(Cake\I18n\FrozenDate) {
            'time' => '2019-07-19T00:00:00+00:00',
            'timezone' => 'UTC',
            'fixedNowTime' => false
        }
    ]
],
The problem I'm having is as follows: each of the results returned contains an app_id field (['substance']['app_id'] in the array above). What I need to do is perform a count (COUNT() in MySQL) on another table based on this, and then add that to the result set.
I'm unsure how to do this for a couple of reasons. Firstly, my understanding is that custom finders return Query Objects, but the query is not executed at this point. Because I haven't executed the query - until calling $data->toArray() - I'm unsure how I would refer to the app_id in a way where it could be referenced per row?
The equivalent SQL that would give me the required results is this:
SELECT COUNT(myalias.app_id) FROM (
    SELECT
        DISTINCT (tbl_item.i_id),
        tbl_item.i_name,
        tbl_item.i_code,
        tbl_organisation_substances.o_id,
        tbl_organisation_substances.o_sub_id,
        tbl_organisation_substances.app_id,
        tbl_organisation_substances.os_name
    FROM
        tbl_organisation_substances
    JOIN tbl_item_substances
        ON tbl_organisation_substances.o_sub_id = tbl_item_substances.o_sub_id
    JOIN tbl_item
        ON tbl_item.i_id = tbl_item_substances.i_id
    WHERE
        tbl_item.o_id = 1
    AND
        tbl_item.date_valid_to IS NULL
    AND
        tbl_organisation_substances.app_id IS NOT NULL
    ORDER BY
        tbl_organisation_substances.app_id ASC
) AS myalias
WHERE myalias.app_id = 'ID000001'
This does a COUNT() where the app_id is ID000001.
So in the array I've given previously I need to add something to the array to hold this, e.g.
'substance' => [
// ...
],
'count_app_ids' => 5
(Assuming there were 5 rows returned by the query above).
I have Table classes for all of the tables referred to in the above query.
So my question is, how do you write this using the ORM, and add the result back to the result set before the query is executed?
Is this even possible? The only other solution I can think of is to write the data (from the query I have that works) to a temporary table and then perform successive queries which UPDATE with the count figure based on the app_id. But I'm really not keen on that solution because there are potentially huge performance problems of doing this. Furthermore I'd like to be able to paginate my query so ideally need everything confined to 1 SQL statement, even if it's done across multiple finders.
I've tagged this with MySQL as well as CakePHP because I'm not even sure if this is achievable from a MySQL perspective although it does look on the linked SO post like it can be done? This has the added complexity of having to write the equivalent query using Cake's ORM.

Remove data escaping for a specific field in CakePHP

I am using a subquery for the id field.
$db = $this->AccountRequest->getDataSource();
$subQuery = $db->buildStatement(
    array(
        'fields' => array('MAX(id)'),
        'table' => $db->fullTableName($this->AccountRequest),
        'alias' => 'MaxRecord',
        'limit' => null,
        'offset' => null,
        'order' => null,
        'group' => array("user_id")
    ),
    $this->AccountRequest
);
$searching_parameters = array(
    #"AccountRequest.id IN " => "(SELECT MAX( id ) FROM `account_requests` GROUP BY user_id)"
    "AccountRequest.id IN " => "(".$subQuery.")"
);
$this->Paginator->settings = array(
    #'fields' => array('AccountRequest.*'),
    'conditions' => $searching_parameters,
    'limit' => $limit,
    'page' => $page_number,
    #'group' => array("AccountRequest.user_id"),
    'order' => array(
        'AccountRequest.id' => 'DESC'
    )
);
$data = $this->Paginator->paginate('AccountRequest');
This structure produces the following query:
SELECT
    `AccountRequest`.`id`,
    `AccountRequest`.`user_id`,
    `AccountRequest`.`email`,
    `AccountRequest`.`emailchange`,
    `AccountRequest`.`email_previously_changed`,
    `AccountRequest`.`first_name`,
    `AccountRequest`.`first_namechange`,
    `AccountRequest`.`f_name_previously_changed`,
    `AccountRequest`.`last_name`,
    `AccountRequest`.`last_namechange`,
    `AccountRequest`.`l_name_previously_changed`,
    `AccountRequest`.`reason`,
    `AccountRequest`.`status`,
    `AccountRequest`.`created`,
    `AccountRequest`.`modified`
FROM
    `syonserv_meetauto`.`account_requests` AS `AccountRequest`
WHERE
    `AccountRequest`.`id` IN '(SELECT MAX(id) FROM `syonserv_meetauto`.`account_requests` AS `MaxRecord` WHERE 1 = 1 GROUP BY user_id)'
ORDER BY
    `AccountRequest`.`id` DESC
LIMIT 25
The subquery gets wrapped in extra single quotes, which produces an error.
So, how can I remove these single quotes from the subquery?
Thanks
What are you trying to achieve with the sub query?
MAX(id) just pulls the id with the largest value, i.e. the most recent insert. The subquery is redundant when you can just ORDER BY id DESC.
Using MAX() will return only one record; if that is what you want to achieve, you can replicate it by adding LIMIT 1.
If the subquery is just an example and is meant to be from another table, I would just run the query that gets the most recent id before running the main query. Getting the last inserted id in a separate query is very quick and I can't see much of a performance loss. I think it will result in cleaner code that's easier to follow too.
Edit 1: From the comments it sounds like all you're trying to get is a particular user's latest account_requests.
You don't need the subquery at all. My query below will get the most recent account record for the user id you choose.
$this->Paginator->settings = array(
    'fields' => array('AccountRequest.*'),
    'conditions' => array(
        'AccountRequest.user_id' => $userID // you need to set the $userID
    ),
    'page' => $page_number,
    'order' => array(
        'AccountRequest.id DESC' // shows most recent first
    ),
    'limit' => 1 // set however many you want the maximum to be
);
The other thing you could mean is to get multiple entries from multiple users and display them ordered by user first and then from recent to old for each user. MySQL lets you order by more than one field; in that case try:
$this->Paginator->settings = array(
    'conditions' => array(
        'AccountRequest.user_id' => $userID // you need to set the $userID
    ),
    'page' => $page_number,
    'order' => array(
        'AccountRequest.user_id', // order by the users first
        'AccountRequest.id DESC' // then order their requests by recent to old
    )
);
If the example data you have added to the question is irrelevant and you are only concerned about how to do nested subqueries, that has already been answered here:
CakePHP nesting two select queries
However, based on the data in the question, I still think you can avoid using a nested query.

MySQL result multiple arrays

For example, I have 2 tables:
Product ( id, name )
Photo ( id, name, photo_id )
And I need to get result in array like this:
array(
    'id' => 1,
    'name' => 'product',
    'photos' => array(
        array('id' => 1, 'name' => 'photo1'),
        array('id' => 2, 'name' => 'photo2')
    )
)
Is it possible in PHP using clear SQL?
I know that it is possible to get 2 arrays and combine them, but I have many records and I don't want to waste time on querying.
You have to add a foreign key, "product_id", to your photo table.
Then create a method getPhotos() in your Product class which will collect all photos for your product.
Is it possible in PHP using clear SQL?
Not in a single SQL call. With a single call, this is the closest you can get:
array(
    'id' => 1,
    'name' => 'product',
    'photo_id' => 1,
    'photo_name' => 'photo1'
),
array(
    'id' => 1,
    'name' => 'product',
    'photo_id' => 2,
    'photo_name' => 'photo2'
)
Your only choice for the format you want is to run queries separately or to combine them into the data structure you want.
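
Combining the flat rows into the nested structure can be done in one loop on the PHP side. The following is a sketch under assumptions: the column aliases `photo_id` and `photo_name` match the flat result shown above, the helper name `nestPhotos()` is hypothetical, and the rows are hard-coded here instead of being fetched from a database:

```php
<?php
// Flat rows as a single JOIN would return them (aliases are illustrative).
$rows = [
    ['id' => 1, 'name' => 'product', 'photo_id' => 1, 'photo_name' => 'photo1'],
    ['id' => 1, 'name' => 'product', 'photo_id' => 2, 'photo_name' => 'photo2'],
];

// Hypothetical helper: nests the photo columns under their product.
function nestPhotos(array $rows): array
{
    $products = [];
    foreach ($rows as $row) {
        $pid = $row['id'];
        // Create the product entry the first time its id is seen.
        if (!isset($products[$pid])) {
            $products[$pid] = ['id' => $pid, 'name' => $row['name'], 'photos' => []];
        }
        // A LEFT JOIN yields NULL photo columns for products without photos.
        if ($row['photo_id'] !== null) {
            $products[$pid]['photos'][] = [
                'id'   => $row['photo_id'],
                'name' => $row['photo_name'],
            ];
        }
    }
    // Drop the id keys so the result is a plain list of products.
    return array_values($products);
}

$products = nestPhotos($rows);
```

This keeps the database work to a single query while still producing one array per product with its photos nested inside.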
As mentioned, this is not possible with SQL. SQL is based on the relational model, whose relations are in first normal form (1NF). That means the result relation is also flat (no nested relations within a relation).
However, there are good frameworks which generate intermediary models in your corresponding target language (e.g. Python, Java, ...) that circumvent the impression of a flat data model. Check for example Django.
https://docs.djangoproject.com/en/1.8/topics/db/models/

How do I combine two arrays in PHP based on a common key?

I'm trying to join two associative arrays together based on an entry_id key. Both arrays come from individual database resources, the first stores entry titles, the second stores entry authors, the key=>value pairs are as follows:
array (
    'entry_id' => 1,
    'title' => 'Test Entry'
)
array (
    'entry_id' => 1,
    'author_id' => 2
)
I'm trying to achieve an array structure like:
array (
    'entry_id' => 1,
    'author_id' => 2,
    'title' => 'Test Entry'
)
Currently, I've solved the problem by looping through each array and formatting the array the way I want, but I think this is a bit of a memory hog.
$entriesArray = array();
foreach ($entryNames as $names) {
    foreach ($entryAuthors as $authors) {
        if ($names['entry_id'] === $authors['entry_id']) {
            $entriesArray[] = array(
                'id' => $names['entry_id'],
                'title' => $names['title'],
                'author_id' => $authors['author_id']
            );
        }
    }
}
I'd like to know is there an easier, less memory intensive method of doing this?
Is it possible you can do a JOIN in the SQL used to retrieve the information from the database rather than fetching the data in multiple queries? It would be much faster and neater to do it at the database level.
Depending on your database structure you may want to use something similar to
SELECT exp_weblog_data.entry_id, title, author_id
FROM exp_weblog_data
INNER JOIN exp_weblog_titles
    ON exp_weblog_data.entry_id = exp_weblog_titles.entry_id
WHERE field_id_53 = "%s" AND exp_weblog_data.entry_id IN ("%s")
Wikipedia has a bit on each type of join
Otherwise the best option may be to restructure the first array so that it is a map of the entry_id to the title
So:
array(
    array(
        'entry_id' => 1,
        'title' => 'Test Entry 1',
    ),
    array(
        'entry_id' => 3,
        'title' => 'Test Entry 2',
    ),
)
Would become:
array(
    1 => 'Test Entry 1',
    3 => 'Test Entry 2',
)
Which would mean the code required to merge the arrays is simplified to this:
$entriesArray = array();
foreach ($entryAuthors as $authors) {
    $entriesArray[] = array(
        'id' => $authors['entry_id'],
        'title' => $entryNames[$authors['entry_id']],
        'author_id' => $authors['author_id']
    );
}
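
The restructuring step itself can be done with the built-in `array_column()` (PHP 5.5 and later), which builds the entry_id to title map in one call. A minimal sketch follows; the sample rows, including the second author_id, are illustrative and not from the question, and `$titlesById` is a name introduced here:

```php
<?php
// Illustrative sample data in the shape described in the question.
$entryNames = [
    ['entry_id' => 1, 'title' => 'Test Entry 1'],
    ['entry_id' => 3, 'title' => 'Test Entry 2'],
];
$entryAuthors = [
    ['entry_id' => 1, 'author_id' => 2],
    ['entry_id' => 3, 'author_id' => 5],
];

// Build the entry_id => title map: values from 'title', keys from 'entry_id'.
$titlesById = array_column($entryNames, 'title', 'entry_id');

$entriesArray = [];
foreach ($entryAuthors as $author) {
    $id = $author['entry_id'];
    // Skip authors whose entry has no title row.
    if (!isset($titlesById[$id])) {
        continue;
    }
    $entriesArray[] = [
        'id'        => $id,
        'title'     => $titlesById[$id],
        'author_id' => $author['author_id'],
    ];
}
```

The lookup by key replaces the inner loop, so the merge visits each row of each array once instead of comparing every pair.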
I've rearranged some of my code to allow for a single SQL query, which looks like:
$sql = sprintf('SELECT DISTINCT wd.field_id_5, wd.entry_id, mb.email, mb.screen_name
FROM `exp_weblog_data` wd
INNER JOIN `exp_weblog_titles` wt
ON wt.entry_id=wd.entry_id
INNER JOIN `exp_members` mb
ON mb.member_id=wt.author_id
WHERE mb.member_id IN ("%s")
AND wd.entry_id IN ("%s")',
join('","', array_unique($authors)),
join('","', array_unique($ids))
);
This solves my problem quite nicely, even though I'm making another SQL call. Thanks for trying.
In response to your comment on Yacoby's post, will this SQL not give the output you are after?
SELECT exp_weblog_data.entry_id, exp_weblog_data.field_id_5 AS title_ie, exp_weblog_titles.author_id
FROM exp_weblog_data LEFT JOIN exp_weblog_titles
    ON exp_weblog_data.entry_id = exp_weblog_titles.entry_id
WHERE exp_weblog_data.field_id_53 = "%s"
Every entry in exp_weblog_data where field_id_53 = "%s" will be joined with any matching authors in exp_weblog_titles. If an entry has more than one author, two or more rows will be returned.
see http://php.net/manual/en/function.array-merge.php
