Creating dataset for network graph using data from database - php

I get data from my database, loop through it, and compare employees of one company with employees of other companies to create an array of nodes and edges. Nodes are companies and edges are employees working for both companies.
Comparing so many variables slows the process a lot and is really inefficient, so I am looking for a better way of building said array/json object.
Here is what the database looks like: http://imgur.com/8iVJfwW
The final json object for d3 should look like:
{"nodes":[{"fullName":"Anglo American plc"},{"fullName":"Associated British Foods plc"},{"fullName":"ARM Holdings plc"},{"fullName":"Dixons Carphone plc"},{"fullName":"Diageo plc"},{"fullName":"Direct Line Insurance Group PLC"},{"fullName":"easyJet plc"},{"fullName":"GKN plc"},{"fullName":"Hammerson plc"},{"fullName":"International Consolidated Airlines Group, S.A."},{"fullName":"Imperial Brands PLC"},{"fullName":"intu properties plc"},{"fullName":"Intertek Group plc"},{"fullName":"ITV plc"},{"fullName":"Johnson Matthey Plc"},{"fullName":"Kingfisher plc"},{"fullName":"Lloyds Banking Group plc"},{"fullName":"Mediclinic International plc"},{"fullName":"Merlin Entertainments plc"},{"fullName":"National Grid plc"},{"fullName":"Next Plc"},{"fullName":"Provident Financial plc"},{"fullName":"Pearson plc"},{"fullName":"Reckitt Benckiser Group plc"},{"fullName":"Royal Dutch Shell plc"},{"fullName":"RELX PLC"},{"fullName":"Rio Tinto plc"},{"fullName":"RSA Insurance Group plc"},{"fullName":"SABMiller plc"},{"fullName":"J Sainsbury plc"},{"fullName":"Sky plc"},{"fullName":"Standard Life plc"},{"fullName":"SSE plc"},{"fullName":"Severn Trent Plc"},{"fullName":"Travis Perkins plc"},{"fullName":"Tesco PLC"},{"fullName":"Taylor Wimpey plc"},{"fullName":"United Utilities Group PLC"},{"fullName":"Worldpay Group plc"},{"fullName":"Whitbread PLC"}],"edges":[{"source":0,"target":12,"officers":["PARKER, Thomas, Sir"]},{"source":0,"target":19,"officers":["STEVENS, Anne"]},{"source":0,"target":47,"officers":["GROTE, Byron"]},{"source":1,"target":14,"officers":["BASON, John"]},{"source":2,"target":13,"officers":["PUSEY, Stephen"]},{"source":2,"target":51,"officers":["KENNEDY, Christopher"]},{"source":3,"target":7,"officers":["BARKER, Glyn"]},{"source":3,"target":13,"officers":["WHEWAY, Jonathan"]},{"source":4,"target":9,"officers":["REYNOLDS, Paula"]}]};
What I am doing is executing this query:
SELECT
CD.Company_ID,
CD.Company_Name,
OD.Officer_Name,
CO.Officer_Role
FROM
Company_Details CD
INNER JOIN Company_Officer CO
ON CD.Company_ID = CO.Company_ID
INNER JOIN Officer_Details OD
ON CO.Officer_ID = OD.Officer_ID
WHERE CD.Company_Index='FTSE 100' AND
CO.Resigned_On='' AND
CO.Officer_ID IN
( SELECT CO2.officer_id
FROM Company_Officer CO2
INNER JOIN Company_Details CD2
ON CO2.Company_ID = CD2.Company_ID
WHERE CO2.Resigned_On='' AND CD2.Company_Index ='FTSE 100'
GROUP BY CO2.officer_id
HAVING Count( DISTINCT CO2.company_id ) > 1
)
ORDER BY `CD`.`Company_ID` ASC;
Which gives me names of officers and companies, only officers that work for more than 1 company (to create edges) and only officers that have not resigned.
First I create $nodes by looping through the query and getting only unique companies.
while($row = mysqli_fetch_array($data)){
array_push($Officers_DB,array("name"=>$row['Officer_Name'], "company"=>$row['Company_Name']));
if(!valueExists($nodes, 'fullName', $row['Company_Name'])){ //Get rid of duplicates
array_push($nodes, array("fullName"=>$row['Company_Name']));
}
}
Then I create $edges by comparing each company with every other company in my $nodes array, first checking that I am not comparing a company with itself. For each pair I loop through all officers from the query and compare them with all other officers from the query: if the officer in the outer loop works for company $i, the officer in the inner loop works for company $j, and both have the same name, then the same person works for 2 different companies and I create an edge in $edges.
$edges = array();
for ($i = 0; $i < count($nodes); $i++) {
for ($j = $i + 1; $j < count($nodes); $j++) {
if ($nodes[$i]['fullName'] != $nodes[$j]['fullName']) {
foreach($Officers_DB as $Officer){
if($Officer['company']==$nodes[$i]['fullName']){
foreach($Officers_DB as $Officer2){
if($Officer2['company']==$nodes[$j]['fullName']){
if($Officer['name']==$Officer2['name']){
array_push($edges, array("source"=>$i, "target"=>$j, "officers"=>array($Officer['name'])));
}
}
}
}
}
}
}
}
foreach ($edges as $i => &$edge) {
for ($j = $i + 1; $j < count($edges); $j++) {
if ($edge['source'] == $edges[$j]['source'] && $edge['target'] == $edges[$j]['target']) {
foreach ($edges[$j]['officers'] as $officer) {
array_push($edge['officers'], $officer);
}
array_splice($edges, $j, 1);
}
}
}
This method works but is really slow and inefficient and I was wondering of other ways of achieving the same result.
Here is what the database looks like:
Company_details: http://i.imgur.com/bzDBIPI.png (companies are unique)
Officer_details: http://i.imgur.com/xce9DW5.png (officers are unique)
Company_Officer: http://i.imgur.com/SNYOx0i.png (the relational table between the other two, with one-to-many and many-to-one relations)
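One way to avoid the nested comparisons is to group companies by officer in a single pass and then emit one edge per pair of companies that share an officer. The following is only a minimal sketch under assumptions: the sample rows and the buildGraph() helper are hypothetical, the rows are assumed to have the Company_Name and Officer_Name columns returned by the query above, and officers are matched by name just as in the original loops.

```php
<?php
// Hypothetical sample rows in the shape returned by the query above.
$rows = [
    ['Company_Name' => 'Alpha plc', 'Officer_Name' => 'SMITH, Jane'],
    ['Company_Name' => 'Beta plc',  'Officer_Name' => 'SMITH, Jane'],
    ['Company_Name' => 'Alpha plc', 'Officer_Name' => 'JONES, Tom'],
    ['Company_Name' => 'Beta plc',  'Officer_Name' => 'JONES, Tom'],
    ['Company_Name' => 'Gamma plc', 'Officer_Name' => 'JONES, Tom'],
];

// Hypothetical helper: builds the nodes/edges structure in one pass over the rows.
function buildGraph(array $rows): array
{
    $nodeIndex = [];  // company name => position in $nodes
    $nodes = [];
    $byOfficer = [];  // officer name => set of node positions

    foreach ($rows as $row) {
        $company = $row['Company_Name'];
        if (!isset($nodeIndex[$company])) {
            $nodeIndex[$company] = count($nodes);
            $nodes[] = ['fullName' => $company];
        }
        $byOfficer[$row['Officer_Name']][$nodeIndex[$company]] = true;
    }

    $edges = [];      // "source-target" => edge, so shared officers merge naturally
    foreach ($byOfficer as $officer => $companySet) {
        $ids = array_keys($companySet);
        sort($ids);
        $n = count($ids);
        for ($a = 0; $a < $n; $a++) {
            for ($b = $a + 1; $b < $n; $b++) {
                $key = $ids[$a] . '-' . $ids[$b];
                if (!isset($edges[$key])) {
                    $edges[$key] = ['source' => $ids[$a], 'target' => $ids[$b], 'officers' => []];
                }
                $edges[$key]['officers'][] = $officer;
            }
        }
    }

    return ['nodes' => $nodes, 'edges' => array_values($edges)];
}

$graph = buildGraph($rows);
echo json_encode($graph);
```

The pair loop only runs over the companies of a single officer, which is usually a handful, so the work grows with the number of rows rather than with companies squared times officers squared.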

Related

Report generation based from mysql query

I have these 3 tables, namely form, form_responses, and metrics, with the following structure
form
->id
->phone
->calldatetime
form_responses
->id
->form_id
->metrics_id
->response
metrics
->id
->description
->question
And I want to make a report with a format something like this
|Metrics Description|Metrics Question|Phone1|Phone2|Phone3|Phone4
|___________________|________________|______|______|______|______
| Sample | Sample | Yes | Yes | Yes | Yes
Is it possible to produce this output with a MySQL query alone? Please note that the Phone1, Phone2, Phone3... columns scale horizontally. Originally I needed that output in an Excel file, and I have already tried this using Laravel PHP and http://www.maatwebsite.nl/laravel-excel/docs
$query = "SELECT id, phone FROM qcv.forms WHERE calldatetime >= '$from' AND calldatetime <= '$to' ORDER BY id ASC LIMIT 250 ;";
$phone = DB::connection('mysql')->select($query);
$metrics = Metric::all();
$metric_start = 10;
$start = "D";
$count = 10;
foreach ($phone as $key => $value2) // Populate Phone Numbers Horizontally
{
$sheet->cell($start.'9', $value2->phone);
// This will fill the responses for each number
foreach ($metrics as $key => $value)
{
$responses = FormResponses::where('form_id', '=', $value2->id)->where('metrics_id', '=', $value->id)->get();
$sheet->cell($start.$count, $responses[0]->response);
$count++;
}
$start++;
$count = 10;
}
foreach ($metrics as $key => $value) // Populate Metrics Vertically
{
$sheet->cell('C'.$metric_start, $value->question);
$sheet->cell('B'.$metric_start, $value->description);
$sheet->cell('A'.$metric_start, $value->metrics_name);
$metric_start++;
}
But this method seems really slow, especially in processing, so I'm wondering if I could produce the output with a MySQL command alone?
To get multiple sub-records per row in a one-to-many relationship using SQL, you would have to use a sub-query:
SELECT
m.description,
m.question,
(select phone from form f1 where f1.id = m.id and ...
/* some other unique criteria */) as Phone1,
(select phone from form f2 where f2.id = m.id and ...
/* some other unique criteria */) as Phone2,
(select phone from form f3 where f3.id = m.id and ...
/* some other unique criteria */) as Phone3,
(select phone from form f4 where f4.id = m.id and ...
/* some other unique criteria */) as Phone4
FROM metrics m
However...you may not have any columns to uniquely identify each form in this way...and your SQL engine may not allow a 3rd level of nesting of sub-queries (which is another way to individually select records from the same table).
So here's one other variation that would work. It should be slightly less code and fewer database connections, so it should perform better, even if you find it less intuitive. Here's the SQL portion:
SELECT
m.description,
m.question,
f.phone
FROM metrics m
INNER JOIN form f ON f.id = m.id
And then in PHP:
$lastid = '';
$phone_count = 0;
foreach ($record as $key => $value) {
$phone[$phone_count] = $value->phone;
$phone_count++;
if ($lastid != $value->id) {
// new record
$sheet->cell ( /* whatever */ );
$phone_count = 0;
}
$lastid = $value->id;
}
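Another option along the same lines is to fetch every response in one joined query and pivot the rows in PHP, which replaces the one-query-per-cell loop from the question. This is only a sketch under assumptions: the sample rows stand in for a query joining form_responses to form and metrics, the column names form_id, metrics_id, description, question, phone and response are taken from the table listing in the question, and pivotResponses() is a hypothetical helper.

```php
<?php
// Hypothetical rows, as if fetched from one query joining
// form_responses to form (on form_id) and metrics (on metrics_id).
$rows = [
    ['form_id' => 7, 'phone' => '555-0001', 'metrics_id' => 1, 'description' => 'Greeting', 'question' => 'Greeted caller?', 'response' => 'Yes'],
    ['form_id' => 8, 'phone' => '555-0002', 'metrics_id' => 1, 'description' => 'Greeting', 'question' => 'Greeted caller?', 'response' => 'No'],
    ['form_id' => 7, 'phone' => '555-0001', 'metrics_id' => 2, 'description' => 'Closing',  'question' => 'Thanked caller?', 'response' => 'Yes'],
];

// Hypothetical helper: turns flat rows into a header line plus one line per metric.
function pivotResponses(array $rows): array
{
    $columns = [];  // form id => phone (one report column per form)
    $grid = [];     // metric id => description, question and responses per form

    foreach ($rows as $row) {
        $columns[$row['form_id']] = $row['phone'];
        $id = $row['metrics_id'];
        if (!isset($grid[$id])) {
            $grid[$id] = [
                'description' => $row['description'],
                'question'    => $row['question'],
                'responses'   => [],
            ];
        }
        $grid[$id]['responses'][$row['form_id']] = $row['response'];
    }

    $table = [array_merge(['Metrics Description', 'Metrics Question'], array_values($columns))];
    foreach ($grid as $metric) {
        $line = [$metric['description'], $metric['question']];
        foreach (array_keys($columns) as $formId) {
            // Leave the cell empty when a form has no response for this metric.
            $line[] = $metric['responses'][$formId] ?? '';
        }
        $table[] = $line;
    }

    return $table;
}

$table = pivotResponses($rows);
```

Each inner array of $table is one spreadsheet row, so it can be written out row by row with whatever export library is in use.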

Is it possible to create a database call outside of a required loop?

I'm trying to display and sort an array by an average created using data from a database. I'm retrieving three variables from the database and creating an average from these values. This value is then placed inside a new array to be sorted along with the rest of the database data.
Am I right in thinking that having the SQL query inside the loop isn't a great idea? (Performance issue?)
Is there any alternative that's available? I've attached the code below:
^ database connection/query string to retrieve all data...
$result = $stmt_business_list->fetchAll(PDO::FETCH_ASSOC);
$items = array();
foreach($result as $row){
$single_business_id = $row['id'];
$name = $row['name'];
//Query to get ALL the service, value and quality ratings for certain business
$test_query = "SELECT * FROM rating WHERE business_id = $single_business_id";
$test_query_stmt = $dbh->prepare($test_query);
$test_query_stmt->execute();
$test_row = $test_query_stmt->fetchAll(PDO::FETCH_ASSOC);
$total_value = $total_quality = $total_service = 0;
foreach($test_row as $review)
{
$total_value += $review['value'];
$total_quality += $review['quality'];
$total_service += $review['service'];
}
$bayesian_value = (($set_site_average_review_count * $set_site_average_review_score) + $total_value) / ($set_site_average_review_count + $business_review_count);
$bayesian_quality = (($set_site_average_review_count * $set_site_average_review_score) + $total_quality) / ($set_site_average_review_count + $business_review_count);
$bayesian_service = (($set_site_average_review_count * $set_site_average_review_score) + $total_service) / ($set_site_average_review_count + $business_review_count);
$average_bayesian_rating = ($bayesian_value + $bayesian_quality + $bayesian_service) / 3;
$average_bayesian_rating = $average_bayesian_rating;
array_push($items, array(
"id"=>"$single_business_id",
"name"=>"$name",
"value"=>"$total_value",
"quality"=>"$total_quality",
"service"=>"$total_service",
"average"=>"$average_bayesian_rating"));
echo
'Name: '.$name.'<br>
Value: '.$total_value.'<br>
Quality: '.$total_quality.'<br>
Service: '.$total_service.'<br>
Average: '.$average_bayesian_rating.'<br><br>';
}
}
The page will be split up by a separate pagination script and will only display 6 objects at a time, but over time this may change so I do have an eye on performance as much as I can.
SQL aggregate queries are made for this kind of thing.
Use this query to summarize the results
SELECT b.name, b.id,
SUM(r.value) total_value,
SUM(r.quality) total_quality,
SUM(r.service) total_service,
COUNT(*) review_count,
a.avg_reviews_per_biz
FROM business b
JOIN rating r ON b.id = r.business_id
JOIN (
SELECT COUNT(*) / COUNT(DISTINCT business_id) avg_reviews_per_biz
FROM rating
) a ON 1=1
GROUP BY b.name, b.id, a.avg_reviews_per_biz
This will give you one row per business showing the summed ratings and the number of ratings. This result set will have the following columns
name: business name
id: business id
total_value: sum of value ratings for that business
total_quality: sum of quality ratings for that business
total_service: sum of service ratings for that business
review_count: number of reviews for business "id"
avg_reviews_per_biz: average number of reviews per business
The last column has the same value for all rows of your query.
You can then loop over these rows one business at a time, doing your statistical computations.
I can't tell from your question where you're getting variables like $set_site_average_review_count, so I can't help with those computations.
You'll find that SQL aggregate querying is very powerful indeed.
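Once the totals come back as one row per business, the remaining PHP could look something like the sketch below. It is only an illustration under assumptions: the sample rows mimic the columns of an aggregate query, $priorCount and $priorScore are hypothetical stand-ins for $set_site_average_review_count and $set_site_average_review_score from the question, and bayesianAverage() is a hypothetical helper that applies the same formula as the question.

```php
<?php
// Hypothetical aggregated rows, one per business.
$rows = [
    ['id' => 1, 'name' => 'Cafe A', 'total_value' => 8,  'total_quality' => 9,  'total_service' => 10, 'review_count' => 2],
    ['id' => 2, 'name' => 'Cafe B', 'total_value' => 40, 'total_quality' => 40, 'total_service' => 40, 'review_count' => 10],
];

// Assumed site-wide settings (placeholders for the question's variables).
$priorCount = 5;
$priorScore = 3.0;

// Same formula as in the question: (C * m + sum of ratings) / (C + n).
function bayesianAverage(float $total, int $count, int $priorCount, float $priorScore): float
{
    return ($priorCount * $priorScore + $total) / ($priorCount + $count);
}

$items = [];
foreach ($rows as $row) {
    $n = (int) $row['review_count'];
    $value   = bayesianAverage($row['total_value'], $n, $priorCount, $priorScore);
    $quality = bayesianAverage($row['total_quality'], $n, $priorCount, $priorScore);
    $service = bayesianAverage($row['total_service'], $n, $priorCount, $priorScore);
    $items[] = [
        'id'      => $row['id'],
        'name'    => $row['name'],
        'average' => ($value + $quality + $service) / 3,
    ];
}

// Sort by the combined average, highest first.
usort($items, function (array $a, array $b): int {
    return $b['average'] <=> $a['average'];
});
```

No query runs inside the loop, so the cost no longer grows with one database round trip per business.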

Pull the top three items from the database in php

I am trying to pull a list of genres from the database. I enter a list of genres into the database for each song, and then it (is supposed to) pull each song's genre into a list and order them by the top three most common occurrences.
The genres get put into a single text field in such a fashion:
(basic fashion, not an actual result):
blues rock, garage rock, hard rock
Here's my code:
$sql = "SELECT `song_name`, `song_genres` FROM `songs` WHERE `album_id` = '$songAlbumId'";
$query = mysqli_query($conn, $sql);
while ($row = mysqli_fetch_array($query)){
$song_name = $row['song_name'];
$song_genres = explode(", ", $row['song_genres']);
for ($i = 0; $i < count($song_genres); $i++){
$count=array_count_values($song_genres);//Counts the values in the array, returns associatve array
arsort($count);//Sort it from highest to lowest
$keys=array_keys($count);//Split the array so we can find the most occuring key
echo $keys[$i] . "<br>";
}
}
This ends up giving me:
Hard Rock
Garage Rock
Hard Rock
Garage Rock
Psychedelic Rock
Blues Rock
Garage Rock
Hard Rock
Also note there is nothing wrong with the album id or anything else. It is specifically to do with just the genres being ordered right.
Normalize the genres. Use a genres table instead of a comma-separated list, and an additional songs_genres table to link songs to genres. Then you can get the data from the database without further logic in PHP:
SELECT g.name, COUNT(DISTINCT(sg.song_id)) cnt
FROM genres g
INNER JOIN songs_genres sg ON sg.genre_id = g.id
GROUP BY g.name
ORDER BY cnt DESC
LIMIT 3
You need another loop so your genres echo for each songname
while ($row = mysqli_fetch_array($query)){
$song_name = $row['song_name'];
$song_genres = explode(", ", $row['song_genres']);
foreach($song_name as $songname){
for ($i = 0; $i < count($song_genres); $i++){
$count=array_count_values($song_genres);//Counts the values in the array, returns associatve array
arsort($count);//Sort it from highest to lowest
$keys=array_keys($count);//Split the array so we can find the most occuring key
echo $keys[$i] . "<br>";
}
}
}
I spent a couple days trying to figure it out. I ended up reworking the database and posting the songs genres into a database for the song, album, and band sections to easily pull from.
For anyone in the future who wants help with a problem like this, the solution is here: Merge multiple arrays into one array.
I appreciate the input from the other people though.
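For readers who keep the comma-separated column, the merge-then-count idea can be sketched as follows. This is a minimal illustration, not the asker's final code: the sample strings and the topGenres() helper are hypothetical, and genre names are lower-cased so that "Hard Rock" and "hard rock" count as the same genre.

```php
<?php
// Hypothetical song_genres values, one string per song on the album.
$songGenres = [
    'blues rock, garage rock, hard rock',
    'garage rock, hard rock',
    'hard rock, psychedelic rock',
];

// Hypothetical helper: merges all genre lists, then counts once for the whole album.
function topGenres(array $genreStrings, int $limit = 3): array
{
    $all = [];
    foreach ($genreStrings as $genreString) {
        foreach (explode(',', $genreString) as $genre) {
            $genre = strtolower(trim($genre));
            if ($genre !== '') {
                $all[] = $genre;
            }
        }
    }

    $counts = array_count_values($all);  // genre => number of songs listing it
    arsort($counts);                     // highest count first, keys preserved

    return array_slice($counts, 0, $limit, true);
}

$top = topGenres($songGenres);
foreach ($top as $genre => $count) {
    echo $genre . ' (' . $count . ")<br>";
}
```

The key difference from the code in the question is that counting happens once over the merged list, after the loop over songs, instead of once per song.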

Performance, sql heavy join vs multiple small request

I have the following Mysql database structure
[Table - Category1]
[Table Category1 -> Category2 ] (One to N relation)
[Table - Category2]
[Table Category2 -> Item ] (One to N relation)
[Table - Item]
and I want to get everything into an array in PHP with the following structure
$arr[$i]['name'] = 'name of something in category1';
$arr[$i]['data'][$j]['name'] = 'name of something in category2';
$arr[$i]['data'][$j]['data'][$k]['name'] = 'name of something in item';
So basically I don't know if I should use one "heavy" sql request with JOIN like the following one or use an iterative method
The join request
SELECT c1.name as c1name, c2.name as c2name, i.name as iname
FROM category1 c1
LEFT JOIN category1_to_category2 c1tc2 ON c1.id = c1tc2.id_category1
LEFT JOIN category2 c2 ON c1tc2.id_category2 = c2.id
LEFT JOIN category2_to_item c2ti ON c2.id = c2ti.id_category2
LEFT JOIN item i ON c2ti.id_item = i.id
The iterative method
$sql = 'SELECT id, name FROM category1';
$result = $mysqli->query($sql);
$arr = array();
$i = 0;
while ($arr[$i] = $result->fetch_assoc()) {
$join = $mysqli->query('SELECT c2.id, c2.name FROM category2 c2 LEFT JOIN category1_to_category2 c1tc2 ON c2.id = c1tc2.id_category2 WHERE c1tc2.id_category1 = '.$arr[$i]['id']);
$j = 0;
while ($arr[$i]['data'][$j] = $join->fetch_assoc())
/* same request as above but with items */
$i++;
}
The iterative solution will make around 10 * 20 requests, which seems like a lot to me. That's why I would choose the first solution (a single request with 4 JOINs).
However, with the single request solution, my array will look like that
$arr[0]['c1name'];
$arr[0]['c2name'];
$arr[0]['iname'];
And it will require some PHP processing to obtain the desired array, which I need in order to display the data in tabs in an HTML page. So my question is: is it better to have one big SQL request with some PHP array manipulation, or multiple small requests without the PHP array manipulation? I know that in most cases getting all the data from SQL at once is the better solution, but in this case I'm not sure. By the way, my only consideration is the loading time of my web page.
Thanks in advance for your help =).
It is typically better, and your example is no exception, to have the SQL server do as much of the data formatting and iteration as possible as SQL servers are typically more efficient at the task than common programming languages.
Add to this that you are cutting down on query load of the server and you have a very good reason for using complex joins.
The only downside is that complex SQL queries can be hard to format and debug. If you are not already using a 3rd party SQL tool, I would recommend getting one.
To go with the answer by Wobbles (that I agree with), I would suggest that you do a single query but you store the last key for each of c1name, c2name and iname. When these change you increment the relevant array subscript and initialise the lower level ones again to build up your array.
Something like this:-
<?php
$sql = "SELECT c1.name AS c1name, c2.name AS c2name, i.name AS iname
FROM category1 c1
LEFT JOIN category1_to_category2 c1tc2 ON c1.id = c1tc2.id_category1
LEFT JOIN category2 c2 ON c1tc2.id_category2 = c2.id
LEFT JOIN category2_to_item c2ti ON c2.id = c2ti.id_category2
LEFT JOIN item i ON c2ti.id_item = i.id";
$result = $mysqli->query($sql);
$arr = array();
$i = 0;
$j = 0;
$k = 0;
$c1name = '';
$c2name = '';
$iname = '';
while ($row = $result->fetch_assoc())
{
switch(true)
{
case $row['c1name'] != $c1name :
$i++;
$j = 0;
$k = 0;
$arr[$i]['name'] = $row['c1name'];
$arr[$i]['data'][$j]['name'] = $row['c2name'];
$arr[$i]['data'][$j]['data'][$k]['name'] = $row['iname'];
break;
case $row['c2name'] != $c2name :
$j++;
$k = 0;
$arr[$i]['data'][$j]['name'] = $row['c2name'];
$arr[$i]['data'][$j]['data'][$k]['name'] = $row['iname'];
break;
default :
$k++;
$arr[$i]['data'][$j]['data'][$k]['name'] = $row['iname'];
break;
}
$c1name = $row['c1name'];
$c2name = $row['c2name'];
$iname = $row['iname'];
}
As an aside there is some code at work that is used to generate a menu. Just 2 levels, and it was originally coded as one query for the first level and then one query for each of the records in the first level to get all the items below it. Not complex (there are only ~16 items in the first level, and on average under 10 items below each of those). I rewrote that to a single joined query. Typical time to generate that menu dropped from 0.25 seconds down to 0.004 seconds. It is easy for the time taken sending queries to the database to rapidly become excessive.
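A variation on the single-query approach is to key the intermediate array by the category ids, so the result does not depend on row order or on names being unique. This is a sketch under assumptions: it supposes the query also selects c1.id AS c1id and c2.id AS c2id (these aliases are not in the original query), the sample rows are made up, and nestRows() is a hypothetical helper.

```php
<?php
// Hypothetical rows from the joined query, with the assumed c1id/c2id aliases.
// NULLs are what the LEFT JOINs return when a level has no children.
$rows = [
    ['c1id' => 1, 'c1name' => 'Food',  'c2id' => 10,   'c2name' => 'Fruit', 'iname' => 'Apple'],
    ['c1id' => 1, 'c1name' => 'Food',  'c2id' => 10,   'c2name' => 'Fruit', 'iname' => 'Pear'],
    ['c1id' => 1, 'c1name' => 'Food',  'c2id' => 11,   'c2name' => 'Bread', 'iname' => null],
    ['c1id' => 2, 'c1name' => 'Tools', 'c2id' => null, 'c2name' => null,    'iname' => null],
];

// Hypothetical helper: builds the nested name/data structure from flat rows.
function nestRows(array $rows): array
{
    $tree = [];
    foreach ($rows as $row) {
        $c1 = $row['c1id'];
        if (!isset($tree[$c1])) {
            $tree[$c1] = ['name' => $row['c1name'], 'data' => []];
        }
        if ($row['c2id'] === null) {
            continue; // category1 entry without any category2 children
        }
        $c2 = $row['c2id'];
        if (!isset($tree[$c1]['data'][$c2])) {
            $tree[$c1]['data'][$c2] = ['name' => $row['c2name'], 'data' => []];
        }
        if ($row['iname'] !== null) {
            $tree[$c1]['data'][$c2]['data'][] = ['name' => $row['iname']];
        }
    }

    // Replace the id keys with 0-based indexes to match the desired $arr[$i]['data'][$j] shape.
    $result = [];
    foreach ($tree as $category) {
        $category['data'] = array_values($category['data']);
        $result[] = $category;
    }

    return $result;
}

$arr = nestRows($rows);
```

The extra PHP work is a single pass over the result set, which is normally small next to the cost of a couple of hundred separate queries.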

PHP MYSQL search same keyword with multiple tables

I'm trying to create a query which will search four different tables with one keyword to bring back all the items which are listed under that location.
I have four tables
- Country
- State
- County
- City
for e.g. UK -> England -> West Midlands -> Birmingham
When the user types in West Midlands, I want to see all the items, including items under Birmingham, Walsall, and Wolverhampton.
This is what I came up with
$location = $_POST['location'];
$city_sql = " SELECT * FROM city";
$city_result = $db->query( $city_sql );
$new_array=array();
$i=0;
while ($fetch_sql = $db->fetch_object($city_result) ){
if ( strcmp(soundex(strtolower($fetch_sql->name)), soundex(strtolower($location))) == 0 ) {
$new_array[$i]['name'] = $fetch_sql->name;
$new_array[$i]['code'] = $fetch_sql->name;
$i++;
}
}
$k=0;
for ( $j=0; $j < sizeof($new_array); $j++ ){
$i = similar_text(strtolower($new_array[$j]['name']), strtolower($db->escape_value($location)), $similarity_pst);
if( $i > $k && $i > 7 ){
$k = $i;
$city_db_name = $new_array[$j]['name'];
$city_code = $new_array[$j]['code'];
}
}
Please let me know if you have any idea.
You should use SQL features to get your data, not PHP's.
If I correctly understood, you want to get data from several tables and several columns.
Change your query like this:
SELECT
-- list of considered columns
col1,
col2,
col3
-- ...
FROM
City
JOIN
State ON State.state_id = City.state_id
JOIN
Country ON Country.country_id = State.country_id
WHERE
col1 LIKE '%keyword%'
OR col2 LIKE '%keyword%'
OR col3 LIKE '%keyword%'
-- ...
That way, you will get the columns you need from the rows that contain your keyword. For example, if the table City contains {'Paris', 'paramatta', 'Porto'}, then with the keyword Par the query SELECT name FROM city WHERE name LIKE '%Par%' will return { 'Paris', 'paramatta' } (assuming MySQL's default case-insensitive collation).
By the way, the link between countries, cities, etc. should be represented backward in your database: Country <- State <- City
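Since the keyword comes from $_POST, it is safer to bind it as a parameter than to paste it into the LIKE clauses. The sketch below only builds the WHERE fragment and the parameter list; buildLikeSearch() is a hypothetical helper, City.name and County.name are assumed column names, and the column list must come from your own code, never from user input, because column names cannot be bound.

```php
<?php
// Hypothetical helper: returns a WHERE fragment with ? placeholders
// plus the matching list of parameters to bind.
function buildLikeSearch(array $columns, string $keyword): array
{
    // Escape the LIKE wildcards so a literal % or _ in the keyword matches itself.
    $pattern = '%' . addcslashes($keyword, '%_\\') . '%';

    $conditions = [];
    $params = [];
    foreach ($columns as $column) {
        $conditions[] = $column . ' LIKE ?';
        $params[] = $pattern;
    }

    return [implode(' OR ', $conditions), $params];
}

// Assumed column names, for illustration only.
[$where, $params] = buildLikeSearch(['City.name', 'County.name'], 'west midlands');
```

The fragment can then be appended after WHERE in the joined query and executed as a prepared statement with $params bound in order.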
