Advice for Java HashMap alternative in PHP? - php

I have a database table like this, with more than 10 million rows:
ID colA colB Length
1 seq1 seq11 1
2 seq1 seq11 11
3 seq3 seq33 21
4 seq3 seq33 14
I want to loop through colA first, get the corresponding colB value, and check whether the same value occurs in any other rows.
For example, colB (seq11) occurs twice for colA (seq1), so I have to combine those rows and output the sum of their lengths, like this:
ID colA colB Length
1 seq1 seq11 12
2 seq3 seq33 35
I am more of a Java guy, but because my colleague has written everything in PHP and this will just be an addition, I need a PHP solution.
With Java I would have used a HashMap, so that I would store each colA value once and just increment its "Length" value.
I tried this query in order to group by occurrences:
SELECT COUNT(*) SeqName FROM SeqTable GROUP BY SeqName HAVING COUNT(*)>0;

This is something easily achieved within SQL rather than in programming logic:
SELECT colA, colB, SUM(Length) as `length_sum`
FROM SeqTable
GROUP BY colA, colB
Of course you would still need PHP to iterate through the result set and do whatever it is you want to do with the data.
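As a quick check of the GROUP BY approach, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for the MySQL table and loads only the four sample rows from the question; the table and column names are taken from the question.

```python
import sqlite3

# In-memory SQLite stands in for the real MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SeqTable (ID INTEGER, colA TEXT, colB TEXT, Length INTEGER)"
)
conn.executemany(
    "INSERT INTO SeqTable VALUES (?, ?, ?, ?)",
    [
        (1, "seq1", "seq11", 1),
        (2, "seq1", "seq11", 11),
        (3, "seq3", "seq33", 21),
        (4, "seq3", "seq33", 14),
    ],
)

# Let the database combine rows that share colA and colB.
rows = conn.execute(
    "SELECT colA, colB, SUM(Length) AS length_sum "
    "FROM SeqTable GROUP BY colA, colB ORDER BY colA, colB"
).fetchall()

for col_a, col_b, length_sum in rows:
    print(col_a, col_b, length_sum)
```

The application code then only iterates over the already aggregated result set, which is much smaller than the 10 million source rows.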

In PHP you can use an array like a hash map:
$array = Array();
$array['seq1'] = Array();
$array['seq1']['seq11'] = 0;
$array['seq1']['seq11']++;
Or you can use an SQL query like this one:
select min(id) as id, colA, colB, sum(Length) as Length from {tableName} group by colA, colB order by colA, colB;
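The hash map idea can be sketched as follows. This is an illustration in Python rather than PHP, using a dictionary keyed by the (colA, colB) pair; the hard-coded `rows` list is an assumption standing in for rows streamed from the database.

```python
from collections import defaultdict

# Sample rows from the question: (ID, colA, colB, Length).
rows = [
    (1, "seq1", "seq11", 1),
    (2, "seq1", "seq11", 11),
    (3, "seq3", "seq33", 21),
    (4, "seq3", "seq33", 14),
]

# Each (colA, colB) pair is stored once; its total is incremented per row.
totals = defaultdict(int)
for _row_id, col_a, col_b, length in rows:
    totals[(col_a, col_b)] += length

for (col_a, col_b), length_sum in sorted(totals.items()):
    print(col_a, col_b, length_sum)
```

With 10 million rows the map only holds one entry per distinct pair, but every row still has to travel from the database to the script, which is why the SQL aggregation is usually the cheaper option.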

Related

How to count each item in array with sql

I have a user table which contains a membergroupids column, and the table looks like this:
userid membergroupids
1 1,2
2 2,3
3 2,3,4
and I want to use SQL to output a result like this:
membergroupid count
1 1
2 3
3 2
4 1
I tried SELECT membergroupids FROM user and then used PHP to loop through the result and compute the counts. That works for a small user table, but mine is really big and the SELECT query itself takes more than a minute to finish. Is there a better way to do this?
There is a much better way to do it. Your tables need to be normalized:
Instead of
userid membergroupids
1 1,2
2 2,3
3 2,3,4
It needs to be
userid membergroupids
1 1
1 2
2 2
2 3
3 2
3 3
3 4
From here, it's a simple query to get the count for each group (assuming this table is called your_table):
select membergroupids as membergroupid, count(userid) as usercount
from your_table
group by membergroupids
order by membergroupids
The real problem, then, is getting your tables normalized. If you only have 9 membergroupids, then you could use a like '%1%' to find all userids with membergroupid #1. But if you have 10, then it won't be able to distinguish between 1 and 10. And sadly, you can't count on the commas to help you distinguish because the number might not be surrounded by commas.
unless...
Create a new field with the group ids encapsulated by commas
You could create a new field and populate it with membergroupids surrounded by commas by using concat (check your database's docs). Something along these lines:
update your_table set temp=concat(',', membergroupids, ',');
This could give you a table structure like so:
userid membergroupids temp
1 1,2 ,1,2,
2 2,3 ,2,3,
3 2,3,4 ,2,3,4,
Now, you have the ability to grab distinct member group ids in the new field, ie, where temp like '%,1,%' to find userids with membergroupid 1. (They will be encapsulated by commas) Now, you can manually build your new normalized table which I'll call user_member.
Insert membergroupid 1:
insert into user_member (userid,membergroupid) select userid,'1' from your_table where temp like '%,1,%';
You could make a php script that loops through all the membergroupids.
Keep in mind that like %...% is not very efficient, so don't even think about relying on this to do your count. It'll work, but it's not scalable. It would be much better to use this to build the normalized table.
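The normalization step can also be done in a script instead of with repeated LIKE queries. The following is a minimal sketch in Python, assuming the three sample rows from the question are already loaded into memory; it splits each comma-separated list into one (userid, membergroupid) pair per membership and then counts users per group.

```python
from collections import Counter

# Sample rows from the question: (userid, membergroupids).
users = [(1, "1,2"), (2, "2,3"), (3, "2,3,4")]

# Normalized form: one (userid, membergroupid) pair per membership.
# These pairs are what would be inserted into the user_member table.
user_member = [
    (userid, int(group_id))
    for userid, group_ids in users
    for group_id in group_ids.split(",")
]

# Count how many users belong to each group.
counts = Counter(group_id for _userid, group_id in user_member)
for group_id in sorted(counts):
    print(group_id, counts[group_id])
```

Splitting on the comma avoids the 1 versus 10 ambiguity entirely, because each id is compared as a whole value rather than as a substring.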
This is easy to do IF the table is normalized, with one membergroupid per row:
SELECT `membergroupids`, COUNT(`membergroupids`) as
CountOfMembergroupids FROM `TBL_TEST01` WHERE 1
GROUP BY `membergroupids`
ORDER BY `membergroupids`
Since you mentioned that you have to process a large amount of data, I'd strongly suggest that you revise your table structure as described above.

Efficiently get diff of large data set?

I need to be able to diff the results of two queries, showing the rows that are in the "old" set but not in the "new" one, and then showing the rows that are in the "new" set but not in the old one.
Right now, I'm pulling the results into arrays and then doing an array_diff(). But I'm hitting some resource and timing issues, as the sets are close to 1 million rows each.
The schema is the same in both result sets (barring the setId number and the table's autoincrement number), so I assume there's a good way to do it directly in MySQL, but I'm not finding how.
Example Table Schema:
rowId,setId,userId,name
Example Data:
1,1,user1,John
2,1,user2,Sally
3,1,user3,Tom
4,2,user1,John
5,2,user2,Thomas
6,2,user4,Frank
What I need to do is figure out the adds/deletes between setId 1 and setId 2.
So, the result of the diff should (for the example) show:
Rows that are in both setId1 and setId2
1,1,user1,John
Rows that are in setId 1 but not in setId2
2,1,user2,Sally
3,1,user3,Tom
Rows that are in setId 2 but not in setId1
5,2,user2,Thomas
6,2,user4,Frank
I think that's all the details, and I think I got the example correct. Any help would be appreciated. Solutions in MySQL or PHP are fine by me.
You can use exists or not exists to get the rows that are in both sets or in only one of them. The queries below assume the table is called set1 and use the setId and userId columns from the question.
Users in set 1 but not set 2 (just swap the set ids for the opposite):
select * from set1 s1
where s1.setId = 1
and not exists (
select 1 from set1 s2
where s2.setId = 2
and s2.userId = s1.userId
)
Users that are in both sets:
select * from set1 s2
where s2.setId = 2
and exists (
select 1 from set1 s1
where s1.setId = 1
and s2.userId = s1.userId
)
If you only want distinct users that are in both groups, then group by userId:
select min(rowId), userId from set1
where setId in (1,2)
group by userId
having count(distinct setId) = 2
or, for users in one group but not the other (here, only in set 1):
select min(rowId), userId from set1
where setId in (1,2)
group by userId
having count(case when setId <> 1 then 1 end) = 0
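To see the NOT EXISTS pattern produce the diff from the question, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for MySQL, a hypothetical table name `users`, and a hypothetical helper `only_in`. It compares on both userId and name, because the question treats "user2, Sally" and "user2, Thomas" as different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (rowId INTEGER, setId INTEGER, userId TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, 1, "user1", "John"),
        (2, 1, "user2", "Sally"),
        (3, 1, "user3", "Tom"),
        (4, 2, "user1", "John"),
        (5, 2, "user2", "Thomas"),
        (6, 2, "user4", "Frank"),
    ],
)


def only_in(conn, this_set, other_set):
    """Rows of this_set that have no matching (userId, name) in other_set."""
    return conn.execute(
        "SELECT a.rowId, a.setId, a.userId, a.name FROM users a "
        "WHERE a.setId = ? AND NOT EXISTS ("
        "  SELECT 1 FROM users b "
        "  WHERE b.setId = ? AND b.userId = a.userId AND b.name = a.name"
        ") ORDER BY a.rowId",
        (this_set, other_set),
    ).fetchall()


removed = only_in(conn, 1, 2)  # in set 1 but not in set 2
added = only_in(conn, 2, 1)    # in set 2 but not in set 1
print(removed)
print(added)
```

The subquery selects a constant rather than an aggregate: an aggregate such as COUNT(*) always returns one row, so NOT EXISTS over it would never be true.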
What we ended up doing, was adding a checksum column to the necessary tables being diffed. That way, instead of having to select multiple columns for comparison, the diff could be done against a single column (the checksum value).
The checksum value was a simple md5 hash of a serialized array that contained the columns to be diffed. So... it was like this in PHP:
$checksumString = serialize($arrayOfColumnValues);
$checksumValue = md5($checksumString);
That $checksumValue would then be inserted/updated into the tables, and then we can more easily do the joins/unions etc on a single column to find the differences. It ended up looking something like this:
SELECT i.id, i.checksumvalue
FROM SAMPLE_TABLE_I i
WHERE i.checksumvalue not in(select checksumvalue from SAMPLE_TABLE_II)
UNION ALL
SELECT ii.id, ii.checksumvalue
FROM SAMPLE_TABLE_II ii
WHERE ii.checksumvalue not in(select checksumvalue from SAMPLE_TABLE_I);
This runs fast enough for my purposes, at least for now :-)
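The checksum idea can be sketched like this in Python. It is an illustration under assumptions, not the code described above: JSON is used in place of PHP's serialize(), the compared columns are userId and name from the earlier example, and the function name `checksum` is made up for the example.

```python
import hashlib
import json


def checksum(values):
    """MD5 of a stable serialization of the columns being compared."""
    payload = json.dumps(list(values), separators=(",", ":"))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# (userId, name) for each set, from the example data in the question.
old_rows = [("user1", "John"), ("user2", "Sally"), ("user3", "Tom")]
new_rows = [("user1", "John"), ("user2", "Thomas"), ("user4", "Frank")]

# Index each set by checksum so the diff is a comparison on one value.
old = {checksum(row): row for row in old_rows}
new = {checksum(row): row for row in new_rows}

removed = sorted(old[c] for c in old.keys() - new.keys())
added = sorted(new[c] for c in new.keys() - old.keys())
print(removed)
print(added)
```

Whatever serialization is used, it has to be identical for both tables, otherwise equal rows produce different checksums.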

How to ignore duplicate values regardless of their position in MySQL?

I have this table with one column
A:
16654,16661
16661,16654
16670,16717
16717,16670
I want to have this (ignoring duplicate values regardless of their position):
16661,16654
16670,16717
Is there any math function that operates on two numbers and gives a unique result?
Actually, I have this table (name: class):
id second_code have_second_code
1 0 no
2 3 yes
3 2 yes
4 5 yes
5 4 yes
When "have_second_code" is "yes",
the second_code column has a value.
id is the primary key.
second_code refers to the id column, and there is a binary relation between them. Now I need this output: 2,3 and 4,5
SELECT rowone, rowtwo, rowonemillion FROM yourtable GROUP BY(nodupecolumn)
I suppose that the query that produces this one-column, multiple-values table uses GROUP_CONCAT(). In this case you need to do it like this:
SELECT DISTINCT GROUP_CONCAT(DISTINCT whatever_column ORDER BY whatever_column) FROM ...
Use the DISTINCT keyword twice: once inside GROUP_CONCAT(), so that duplicates are removed from the comma-separated values, and once outside GROUP_CONCAT(), so that duplicate rows are removed. The ORDER BY in GROUP_CONCAT() is important, otherwise the outer DISTINCT won't detect duplicates. Also note that the (outer) DISTINCT works on the whole row, not just one column.
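Why the ordering matters can be shown with a small Python sketch. It is only an illustration of the idea, not MySQL code: each comma-separated value is put into a canonical (sorted) order first, and only then are duplicates removed. The function name `canonical` is made up for the example.

```python
# The one-column sample data from the question.
values = ["16654,16661", "16661,16654", "16670,16717", "16717,16670"]


def canonical(value):
    """Sort the numbers inside one value so that order no longer matters."""
    parts = sorted(value.split(","), key=int)
    return ",".join(parts)


# Without canonical(), all four strings differ and nothing is removed.
unique = sorted({canonical(value) for value in values})
print(unique)
```

This corresponds to the role of ORDER BY inside GROUP_CONCAT(): once every pair is written in the same order, a plain DISTINCT is enough.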

Storing data from MySql in PHP (Back to basics)

Here is the log for the result of my SQL SELECT using PHP:
224=[Array containing 1 elements]
iProduct=[604]
where 224 is the line number and iProduct is the column heading.
How do I make a variable that contains just the returned value (not an array or anything), i.e. $var = 604 in this instance?
Having issues with the basics here lol
If using SQL, you shouldn't need to sort in your script. Instead, sorting should be done using the database; as it will sort faster than PHP would be able to do. Example:
SELECT colA, colB, colC FROM myTable WHERE colA = 'foo' ORDER BY colB DESC

how to use MySQL bitwise operations in php?

I'm trying to use MySQL bitwise operations in my query, and I have this example:
table1
id ptid
1 3
2 20
3 66
4 6
table2
id types
1 music
2 art
4 pictures
8 video
16 art2
32 actor
64 movies
128 ..
...
Now, the row with id = 3 in table1 has ptid 66, which means that it has 64 (movies) and 2 (art).
But
couldn't it also be 32 (actor) twice plus 2 (art)?
I hope you see where my confusion is. How do I control which result I get back? In this case I want 64 (movies) and 2 (art).
But sometimes I want three ids from table2 to belong to an id from table1.
Any ideas?
Thanks
Using bitwise OR
The following query returns all the items from table 2 in 66:
SELECT *
FROM table2
WHERE id | 66 = 66
But 32 + 32 = 64?
Though 32 + 32 = 64, it doesn't affect us.
Here's 64 in binary:
01000000
Here's 32 in binary:
00100000
Here's 2 in binary:
00000010
It's the position of the 1 that we use in this case, not the value. There won't be two of anything. Each flag is either on or off.
Here's 66 in binary. Notice that 64 and 2 are turned on, not 32:
01000010
Using bitwise AND instead of OR
Another way to write the query is with bitwise AND like this:
SELECT *
FROM table2
WHERE id & 66 <> 0
Since 0 = false to MySQL, it can be further abbreviated like this:
SELECT *
FROM table2
WHERE id & 66
select * from table2 where id & 66
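The same bit tests can be tried outside the database. Below is a minimal Python sketch, assuming the flag values from table2 in the question are held in a dictionary; the function name `decode` is made up for the example.

```python
# Flag values from table2 in the question.
types = {
    1: "music",
    2: "art",
    4: "pictures",
    8: "video",
    16: "art2",
    32: "actor",
    64: "movies",
}


def decode(ptid):
    """Return the names of all flags whose bit is set in ptid."""
    return [name for flag, name in sorted(types.items()) if flag & ptid]


print(decode(66))  # bits 2 and 64 are set, bit 32 is not

# For single-bit flags, the OR test and the AND test select the same flags.
for flag in types:
    assert ((flag | 66) == 66) == ((flag & 66) != 0)
```

Because each flag occupies its own bit, a value such as 66 has exactly one decomposition, so "32 twice" can never be stored.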
Although the question on how to perform bitwise operations in MySQL has been answered, the sub-question in the comments about why this may not be an optimal data model remains outstanding.
In the example given there are two tables; one with a bitmask and one with a break down of what each bit represents. The implication is that, at some point, the two tables must be joined together to return/display the meaning of the various bits.
This join would either be explicit, e.g.
SELECT *
FROM Table1
INNER JOIN TABLE2
ON table1.ptid & table2.id <> 0
Or implicit where you might select the data from table1 into your application and then make a second call to lookup the bitmask values e.g.
SELECT *
FROM table2
WHERE id & $id <> 0
Neither of these options is ideal because they are not "sargable", that is, the database cannot construct a Search ARGument. As a result, you cannot optimize the query with an index. The cost of the query goes beyond the inability to leverage an index, since for every row in the table the DB must compute and evaluate an expression. This becomes very memory, CPU and I/O intensive very quickly, and it cannot be optimized without fundamentally changing the table structure.
Beyond the complete inability to optimize the query, it can also be awkward to read the data and report on the data, and you potentially run into limits when adding more bits (64 flags in an 8-byte column might be fine now, but not necessarily always). Bitmasks also make systems difficult to understand, and I would argue that this design violates first normal form.
Although using bitmasks in a database is often a sign of bad design, there are times when it's fine to use them. Implementing a many-to-many relationship really isn't one of those times.
The typical approach to implementing this type of relationship looks something like this:
table1
Id Val1 Val2
---------------------------
1 ABC DEF
2 ABC DEF
3 ABC DEF
4 ABC DEF
5 ABC DEF
6 ABC DEF
table2
id types
-------------
1 music
2 art
3 pictures
4 video
5 art2
6 actor
7 movies
table1-table2-relationship
table1ID Table2ID
---------------------
1 1
1 2
2 3
2 5
3 2
3 7
...
And you would query the data thusly
SELECT table1.*, table2.types
FROM table1
INNER JOIN `table1-table2-relationship`
ON table1.id = `table1-table2-relationship`.table1id
INNER JOIN table2
ON `table1-table2-relationship`.table2id = table2.id
Depending on the access pattern of these tables, you would typically index both columns on the relationship table as a composite index (I usually treat them as a composite primary key.) This index would allow the database to quickly seek to the relevant rows in the relationship table and then seek to the relevant rows in table2.
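A minimal sketch of this relationship-table design, in Python with an in-memory SQLite database standing in for MySQL. The table name `table1_table2` (underscores instead of hyphens, to avoid quoting) and the sample values are assumptions based on the tables shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, val1 TEXT, val2 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, types TEXT);
    -- Composite primary key doubles as the index on both columns.
    CREATE TABLE table1_table2 (
        table1id INTEGER,
        table2id INTEGER,
        PRIMARY KEY (table1id, table2id)
    );
    """
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "ABC", "DEF"), (2, "ABC", "DEF"), (3, "ABC", "DEF")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [(1, "music"), (2, "art"), (3, "pictures"), (4, "video"),
     (5, "art2"), (6, "actor"), (7, "movies")],
)
conn.executemany(
    "INSERT INTO table1_table2 VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 3), (2, 5), (3, 2), (3, 7)],
)

# Plain equality joins: the database can use the indexes on both sides.
rows = conn.execute(
    "SELECT t1.id, t2.types FROM table1 t1 "
    "JOIN table1_table2 r ON r.table1id = t1.id "
    "JOIN table2 t2 ON t2.id = r.table2id "
    "WHERE t1.id = ? ORDER BY t2.id",
    (3,),
).fetchall()
print(rows)
```

Adding an eighth or a hundredth type is just another row in table2, with no column width to outgrow.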
After playing around with the answer from Marcus Adams, I thought I'd provide another example that helped me understand how to join two tables using bitwise operations.
Consider the following sample data, which defines a table of vowels, and a table of words with a single value representing the vowels present in that word.
# Create sample tables.
drop temporary table if exists Vowels;
create temporary table Vowels
(
Id int,
Letter varchar(1)
);
drop temporary table if exists Words;
create temporary table Words
(
Word varchar(20),
Vowels int
);
# Insert sample data.
insert into Vowels
select 1, 'a' union all
select 2, 'e' union all
select 4, 'i' union all
select 8, 'o' union all
select 16, 'u';
insert into Words
select 'foo', 8 union all
select 'hello', 10 union all
select 'language', 19 union all
select 'programming', 13 union all
select 'computer', 26;
We can now join the Vowels table to the Words table like so:
# List every word with its vowels.
select Word, Vowels, Letter, Id as 'Vowel Id'
from (
select *
from Words
) w
join Vowels v
where v.Id | w.Vowels = w.Vowels
order by Word, Letter;
And of course we can apply any conditions to the inner query.
# List the letters for just the words with a length < 6
select Letter
from (
select *
from Words
where length(Word) < 6
) w
join Vowels v
where v.Id | w.Vowels = w.Vowels
order by Word, Letter
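To check what the join returns, the sample tables can be loaded into an in-memory SQLite database from Python. This is a sketch under that assumption (SQLite instead of MySQL temporary tables), and it uses the bitwise AND form of the condition in an ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Vowels (Id INTEGER, Letter TEXT)")
conn.execute("CREATE TABLE Words (Word TEXT, Vowels INTEGER)")
conn.executemany(
    "INSERT INTO Vowels VALUES (?, ?)",
    [(1, "a"), (2, "e"), (4, "i"), (8, "o"), (16, "u")],
)
conn.executemany(
    "INSERT INTO Words VALUES (?, ?)",
    [("foo", 8), ("hello", 10), ("language", 19),
     ("programming", 13), ("computer", 26)],
)

# A vowel matches a word when its bit is set in the word's bitmask.
rows = conn.execute(
    "SELECT w.Word, v.Letter FROM Words w "
    "JOIN Vowels v ON (v.Id & w.Vowels) <> 0 "
    "ORDER BY w.Word, v.Letter"
).fetchall()

# Collect the letters per word for easier reading.
by_word = {}
for word, letter in rows:
    by_word.setdefault(word, []).append(letter)

for word, letters in by_word.items():
    print(word, letters)
```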
