Advertisement System Tips - PHP

I am creating an advertisement system which shows the highest bidder's ads more frequently.
Here is an example of the table structure I am using, but simplified...
+----+----------+-------------+-----------------+-----+
| id | name     | image       | destination     | bid |
+----+----------+-------------+-----------------+-----+
|  1 | abc, co  | htt.../blah | htt...djkd.com/ |   3 |
|  2 | facebook | htt.../blah | htt...djkd.com/ | 200 |
|  3 | google   | htt.../blah | htt...djkd.com/ |  78 |
+----+----------+-------------+-----------------+-----+
Right now I am selecting the values from the database, inserting them into an array, and picking one out at random, similar to the following:
$ads_array = [];
$ads = Ad::where("active", "=", 1)->orderBy("price", "DESC");
if ($ads->count() > 0) {
    $current = 0;
    foreach ($ads->get() as $ad) {
        for ($i = 0; $i <= ($ad->price == 0 ? 1 : $ad->price); $i++) {
            $ads_array[$current] = $ad->id;
            $current++;
        }
    }
    $random = rand(0, $current - 1);
    $ad = Ad::where("id", "=", $ads_array[$random])->first();
    ...
}
So, essentially, this inserts the advert's ID into an array $bid times. This is very inefficient (for obvious reasons), but it was the best way I could think of doing it.
Is there a better way of picking out a random ad from my database while still giving the higher bidders a higher probability of being shown?

Looks like this might do the trick (but all the credit goes to this guy in the comments):
SELECT ads.*
FROM ads
ORDER BY -log(1.0 - rand()) / ads.bid
LIMIT 1
A script to test this:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=test;', 'test', 'test');
$times = array();
// repeat a lot to have real values
for ($i = 0; $i < 10000; $i++) {
    $stmt = $pdo->query('SELECT ads.* FROM ads ORDER BY -log(1.0 - rand()) / bid LIMIT 1');
    $bid = $stmt->fetch()['bid'];
    if (isset($times[$bid])) {
        $times[$bid] += 1;
    } else {
        $times[$bid] = 1;
    }
}
// echoes the number of times one bid is represented
var_dump($times);
The figures that come out of that test are pretty good:
// key is the bid, value is the number of times this bid is represented
array (size=3)
200 => int 7106
78 => int 2772
3 => int 122
Further reference on the mathematical explanation:
Many important univariate distributions can be sampled by inversion using simple closed form expressions. Some of the most useful ones are listed here.
Example 4.1 (Exponential distribution). The standard exponential distribution has density f(x) = e^(−x) on x > 0. If X has this distribution, then E(X) = 1, and we write X ∼ Exp(1). The cumulative distribution function is F(x) = P(X ≤ x) = 1 − e^(−x), with F^(−1)(u) = −log(1 − u). Therefore taking X = −log(1 − U) for U ∼ U(0, 1) generates standard exponential random variables. Complementary inversion uses X = −log(U).
The exponential distribution with rate λ > 0 (and mean θ = 1/λ) has PDF λ exp(−λx) for 0 ≤ x < ∞. If X has this distribution, then we write X ∼ Exp(1)/λ or equivalently X ∼ θ Exp(1), depending on whether the problem is more naturally formulated in terms of the rate λ or the mean θ. We may generate X by taking X = −log(1 − U)/λ.
coming from http://statweb.stanford.edu/~owen/mc/Ch-nonunifrng.pdf
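For reference, here is a minimal PHP sketch of the same inverse-transform trick done without the database. The $ads array and the pickWeighted() helper are illustrative names, not from the original code: every ad draws an exponential value with rate equal to its bid, and the smallest draw wins, which mirrors what the ORDER BY -log(1.0 - rand()) / bid ... LIMIT 1 query does.
<?php
// Illustrative sketch: $ads maps ad id => bid (these names are assumptions).
$ads = [1 => 3, 2 => 200, 3 => 78];

function pickWeighted(array $ads)
{
    $bestId = null;
    $bestKey = INF;
    foreach ($ads as $id => $bid) {
        // Exponential draw with rate $bid: higher bids tend to produce smaller
        // values, and the minimum wins with probability bid / sum(bids).
        $u = mt_rand() / mt_getrandmax();
        $key = -log(1.0 - $u) / max($bid, 1); // guard against a zero bid
        if ($key < $bestKey) {
            $bestKey = $key;
            $bestId = $id;
        }
    }
    return $bestId;
}

echo pickWeighted($ads); // prints 2 roughly 200/281 of the time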

Related

Packing rectangles in a rectangle, generate grids coordinates

Looking to generate some random grids of rectangles inside a bigger rectangle.
This looks like a fairly easy problem, but it's not; I'm looking for advice here.
This is not a packing problem, since the inner rectangles can have any width and height.
But the number of rectangles isn't always the same.
I have some results already with different kinds of loops, but none are really efficient.
For example, with 15 rectangles, a possible way to represent them could be:
O 10 50 60
+----------+---------------------------------------------+----------+-------+
| | | | |
| | | | |
| | | | |
5 +----------+---------------------------------------------+----------+-------+
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
15+----------+---------------------------------------------+----------+-------+
| | | | |
| | | | |
| | | | |
+----------+---------------------------------------------+----------+-------+
| | | | |
| | | | |
| | ↓ | | |
+----------+---------------------------------------------+----------+-------+
The coordinates would then be something like an array of [x,y] points for the top-left corner (+) of each inner rectangle:
[[0,0],[10,0],[50,0],[60,0],[5,0],[5,10],[5,50], ...]
Or, even better, an array of [x,y,w,h] values (top-left x, top-left y, width, height):
[[0,0,10,5],[10,0,40,5],[50,0,10,5],[60,0,10,5],[5,0,10,10],[5,10,40,10],[5,50,20,10], ...]
But the goal is to make a function that generates coordinates for any number of inner rectangles.
For example, with 14 rectangles, a possible way to represent them could be:
+----------+----------------------------------+---------------------+-------+
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
+----------+----------------------------------+----------+----------+-------+
| | | | |
| | | | |
+----------+--------+------------------------------------+----------+-------+
| | | | |
| | | | |
+-------------------+-----------+------------------------+----------+-------+
| | |
| | |
| | |
| | |
+-------------------------------+-----------------------------------+-------+
Related links:
Packing rectangular image data into a square texture
What algorithm can be used for packing rectangles of different sizes into the smallest rectangle possible in a fairly optimal way?
How to arrange N rectangles to cover minimum area
Your problem has two aspects: you want to create a regular grid with n rectangles that may span several cells, and you want to distribute the coordinates of the cell borders randomly.
So I propose the following algorithm:
Determine the number of rows and columns in your grid so that each cell is more or less square.
Create a rectangular grid of 1×1 cells. The number of cells will be greater than n.
Repeatedly conflate two adjacent cells until you have n cells.
Now create random axes for the cell boundaries. If you have m columns, create an array of increasing values so that the first coordinate is 0 and the last coordinate is the width of the original rectangle. You can do this by creating a list of increasing random numbers and then normalizing by the overall width.
Finally, create the actual rectangles, using the information of the cells (position, cell span) and the random axes.
This algorithm uses two representations of the rectangles: first, it creates "cells", which have information about their row and column indices and spans. The actual output is rectangles with left, top, width and height information.
I'm not familiar with PHP, so here is an implementation in JavaScript. I think you can see how it works:
// Simple data holders used by tile(); the original snippet assumed these exist.
function Cell(col, row, colspan, rowspan) {
    this.col = col;
    this.row = row;
    this.colspan = colspan;
    this.rowspan = rowspan;
}

function Rect(x, y, width, height) {
    this.x = x;
    this.y = y;
    this.width = width;
    this.height = height;
}

function tile(big, n) {
    // big: outer rectangle
    // n: number of subrectangles to create

    // determine number of rows and cols
    let l = Math.sqrt(big.height * big.width / n);
    let ncol = (big.width / l + 1) | 0;
    let nrow = (big.height / l + 1) | 0;

    let cells = [];

    // create grid of nrow * ncol cells
    for (let j = 0; j < nrow; j++) {
        for (let i = 0; i < ncol; i++) {
            cells.push(new Cell(i, j, 1, 1));
        }
    }

    // conflate rectangles until target number is reached
    while (cells.length > n) {
        let k = (cells.length * Math.random()) | 0;
        let c = cells[k];
        if (c.col + c.colspan < ncol) {
            let cc = cells[k + 1];
            c.colspan += cc.colspan;
            cells.splice(k + 1, 1);
        }
    }

    // generate increasing lists of random numbers
    let xx = [0];
    let yy = [0];
    for (let i = 0; i < ncol; i++) {
        xx.push(xx[xx.length - 1] + 0.5 + Math.random());
    }
    for (let i = 0; i < nrow; i++) {
        yy.push(yy[yy.length - 1] + 0.5 + Math.random());
    }

    // fit numbers to outer rectangle
    for (let i = 0; i < ncol; i++) {
        xx[i + 1] = (big.width * xx[i + 1] / xx[ncol]) | 0;
    }
    for (let i = 0; i < nrow; i++) {
        yy[i + 1] = (big.height * yy[i + 1] / yy[nrow]) | 0;
    }

    // create actual rectangles
    let res = [];
    for (let cell of cells) {
        let x = xx[cell.col];
        let w = xx[cell.col + cell.colspan] - x;
        let y = yy[cell.row];
        let h = yy[cell.row + cell.rowspan] - y;
        res.push(new Rect(x, y, w, h));
    }
    return res;
}
Notes:
The code above conflates cells only horizontally. You can change this to conflate cells vertically or both, but as the algorithm is at the moment, you won't be able to create cells that are more than one cell wide and high without major modifications.
The x | 0 is a cheap way to turn floating-point numbers into integers. I've used it to snap the final coordinates to integer values, but you can also snap them to any grid size s with s * ((x / s) | 0) or s * intval(x / s) in PHP.
The code doesn't care much about aesthetics. It picks the cell sizes and the cells to conflate randomly, so that you might get cross joints, which don't look nice. You can influence the regularity of the result a bit, though:
When you determine the number of columns and rows, you must add one to the result, so that you get cells to conflate in every case. (Even if you have a square and pass a square number as n, you will get joined rectangles.) If you add more, you will get more joined rectangles and the result will look more irregular.
Math.random() returns a random number between 0 and 1. When creating the axes, I've added 0.5 to the result so that you don't get very narrow cells. If you add less, the coordinates will be more irregular; if you add more, they will be more evenly distributed.
Perhaps you can get a good effect from making the row coordinates even and the column coordinates irregular.
You've mentioned good-looking properties in a comment. Perhaps it is easier to create a good-looking structure if you place the rectangles with constraints as you build the grid, instead of first creating a grid and then removing joints.

Non-overlapping minutes per day

I have been cracking my head trying to resolve this problem.
I need to know how many minutes of the day are being worked by a staff member alone in the shop.
Here is the data for daynumber = 0 (monday):
For this day, the staff member with staffid = 32 is alone from 11:00 to 11:05 in the shop.
What I have so far just adds up all the starting times, but basically what I'm thinking is: if I have a way of knowing a staff member is alone, I can calculate the time between that index and the next.
for ($i = 0; $i < count($results); $i++) {
    if (isset($results[$i+1])) {
        if ($results[$i]->starttime < $results[$i+1]->starttime) {
            $start = strtotime($results[$i]->starttime);
            $end = strtotime($results[$i+1]->endtime);
            $minutes += idate('i', $end - $start);
        }
    }
}
Any thoughts?
UPDATE 1:
I got to this, but still no luck:
for ($i = 0; $i < count($results); $i++) {
    if (isset($results[$i+1])) {
        $StartDate1 = strtotime($results[$i]->starttime);
        $EndDate1 = strtotime($results[$i]->endtime);
        $StartDate2 = strtotime($results[$i+1]->starttime);
        $EndDate2 = strtotime($results[$i+1]->endtime);
        if (($StartDate1 <= $EndDate2) && ($EndDate1 >= $StartDate2)) {
            $StartDate1 = idate('i', $StartDate1);
            $EndDate1 = idate('i', $EndDate1);
            $StartDate2 = idate('i', $StartDate2);
            $EndDate2 = idate('i', $EndDate2);
            $a = abs($EndDate1 - $StartDate1);
            $b = abs($EndDate1 - $StartDate2);
            $c = abs($EndDate2 - $StartDate2);
            $d = abs($EndDate2 - $StartDate1);
            $minutes += min([$a, $b, $c, $d]);
        }
    }
}
What am I doing wrong?
Here's one idea, using a utility table - in this case a table of integers from 0-9.
Utility tables are frowned on by some, but I like them because they mean less typing.
You can always replace the table with a string of UNIONs.
This is for all days. I might modify it later to show how you could filter for a specific day.
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
SELECT SEC_TO_TIME((i4.i*1000+i3.i*100+i2.i*10+i1.i)*60) n
FROM ints i1
JOIN ints i2
JOIN ints i3
JOIN ints i4
JOIN
( SELECT daynumber
, MIN(starttime) starttime
, MAX(CASE WHEN endtime < starttime THEN SEC_TO_TIME(TIME_TO_SEC('24:00:00')+TIME_TO_SEC(endtime)) ELSE endtime END) endtime
FROM my_table
GROUP
BY daynumber
) x
ON SEC_TO_TIME((i4.i*1000+i3.i*100+i2.i*10+i1.i)*60) BETWEEN x.starttime AND x.endtime
JOIN my_table y
ON SEC_TO_TIME((i4.i*1000+i3.i*100+i2.i*10+i1.i)*60) BETWEEN y.starttime AND CASE WHEN y.endtime < y.starttime THEN SEC_TO_TIME(TIME_TO_SEC('24:00:00')+TIME_TO_SEC(y.endtime)) ELSE y.endtime END
GROUP
BY n HAVING COUNT(*) = 1;
The number of lone minutes is equal to the number of rows in this result.
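If you would rather do the counting in PHP, here is a minimal sketch of the same per-minute idea. The $shifts array, its field names and the loneMinutes() helper are illustrative, not the question's actual table: every minute of the day is checked against the shift intervals, and minutes covered by exactly one row count as lone minutes.
<?php
// Minimal PHP sketch; field names ('starttime', 'endtime') are assumptions.
function loneMinutes(array $shifts)
{
    // Convert each shift to [startMinute, endMinute) measured from midnight.
    $midnight = strtotime('00:00:00');
    $intervals = [];
    foreach ($shifts as $shift) {
        $intervals[] = [
            intdiv(strtotime($shift['starttime']) - $midnight, 60),
            intdiv(strtotime($shift['endtime']) - $midnight, 60),
        ];
    }

    $lone = 0;
    for ($minute = 0; $minute < 24 * 60; $minute++) {
        $covering = 0;
        foreach ($intervals as [$start, $end]) {
            if ($minute >= $start && $minute < $end) {
                $covering++;
            }
        }
        if ($covering === 1) {
            $lone++; // exactly one person in the shop during this minute
        }
    }
    return $lone;
}

// Example: one person works 09:00-17:00, a colleague 09:00-11:00 and 11:05-17:00,
// so the first person is alone from 11:00 to 11:05.
$shifts = [
    ['starttime' => '09:00:00', 'endtime' => '17:00:00'],
    ['starttime' => '09:00:00', 'endtime' => '11:00:00'],
    ['starttime' => '11:05:00', 'endtime' => '17:00:00'],
];
echo loneMinutes($shifts); // 5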

PHP- Query MySQLi results nearest a given number

I am trying to search for an invoice by the amount. So, I would like to search all invoices +/- 10% of the amount searched, and order by the result closest to the given number:
$search = 100.00;
$lower = $search * 0.9;  // 90
$higher = $search * 1.1; // 110
$results = $db->select("SELECT ID FROM `invoices` WHERE Amount >= $lower && Amount <= $higher");
So, I am not sure how to order these. Let's say this query gives me the following results:
108, 99, 100, 103, 92
I want to order the results, starting with the actual number searched (since it's an exact match), and working out from there, so:
100, 99, 103, 92, 108
You could do this as follows:
$search = 100.00;
$deviation = 0.10;
$results = $db->select("
    SELECT ID, Amount, ABS(1 - Amount/$search) deviation
    FROM invoices
    WHERE ABS(1 - Amount/$search) <= $deviation
    ORDER BY ABS(1 - Amount/$search)
");
Output is:
+----+--------+-----------+
| id | Amount | deviation |
+----+--------+-----------+
| 3 | 100 | 0 |
| 2 | 99 | 0.01 |
| 4 | 103 | 0.03 |
| 1 | 108 | 0.08 |
| 5 | 92 | 0.08 |
+----+--------+-----------+
This way you let SQL calculate the deviation, by dividing the actual amount by the "perfect" amount ($search). This will be 1 for a perfect match. By subtracting this from 1, the perfect match is represented by the value 0. Any deviation is non-zero. By taking the absolute value of that, you get the exact deviation as a fractional number (representing a percentage), like for example 0.02 (which is 2%).
By comparing this deviation to a given maximum deviation ($deviation), you get what you need. Of course, ordering is then easily done on this calculated deviation.
Try this:
$search = 100.00;
$lower = $search * 0.9;  // 90
$higher = $search * 1.1; // 110
$results = $db->select("SELECT ID FROM `invoices`
    WHERE Amount >= $lower && Amount <= $higher
    ORDER BY ABS(Amount - $search)
");
The ABS function returns the absolute value of its argument (it basically removes the minus sign from negative numbers). Therefore ABS(Amount - $search) returns the distance from the $search value.
Besides that, you should consider using prepared statements; otherwise your application could be vulnerable to SQL injection.
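As a hedged sketch of that advice, the same query with a PDO prepared statement could look like this (connection details and variable names are placeholders):
<?php
// Illustrative PDO version; host/db/credentials are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$search    = 100.00;
$deviation = 0.10;

$stmt = $pdo->prepare(
    'SELECT ID, Amount
     FROM invoices
     WHERE ABS(1 - Amount / :search1) <= :deviation
     ORDER BY ABS(Amount - :search2)'
);
$stmt->execute([
    ':search1'   => $search,
    ':search2'   => $search,
    ':deviation' => $deviation,
]);
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);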

Selecting a random row that hasn't been selected much before?

Let's put it at its simplest: a table with two fields, 'item_id' and 'times_seen'.
+---------+------------+
| item_id | times_seen |
+---------+------------+
|    1001 |         48 |
|    1002 |         25 |
|    1003 |          1 |
|    1004 |         12 |
|    1005 |         96 |
|    1006 |         35 |
+---------+------------+
I'm trying to find a way to randomly select a row, but give preference to items that haven't been selected much before.
(obviously, a second query would be sent to increment the 'times_seen' field after it has been selected)
Although my current "project" is a php/mysql one, I'd like language agnostic solutions if possible. I'd much rather have a math based solution that could be adapted elsewhere. I'm not opposed to a php solution though. I'd just like to be able to understand how the code works rather than just copy and paste it.
How about a SQL solution:
select * from item order by times_seen + Rand()*100 limit 1;
What you multiply RAND() by (it returns a value between 0 and 1) depends on how much randomness you want.
Edit: http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand
Fetch all the rows in the table
Determine the max value for times_seen
Assign each row a weight of max - times_seen
Pick from list based on weights
Step 4 is the tricky part, but you could do it all like this:
$max = 1;
$rows = array();
$result = mysql_query("SELECT * FROM table");
while ($row = mysql_fetch_array($result)) {
    $max = max($max, $row['times_seen']);
    $rows[] = $row;
}
$pick_list = array();
foreach ($rows as $row) {
    $count = $max - $row['times_seen'];
    for ($i = 0; $i < $count; $i++) $pick_list[] = $row['item_id'];
}
shuffle($pick_list);
$item_id = array_pop($pick_list);
To do it all in SQL:
SELECT *
FROM table
ORDER BY RAND() * ((SELECT MAX(times_seen) FROM table) - times_seen) DESC
LIMIT 1
This selects a single row, giving rows with a lower times_seen a better chance of being picked (the MAX has to come from a subquery so that the outer query is not collapsed into a single aggregated row).
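If you prefer the PHP route but want to avoid building a huge $pick_list, a cumulative-weight pick does the same job in O(n) memory. This is a sketch that reuses the $rows and $max from the snippet above; the pickLeastSeen() name and the +1 weight floor are my own additions so that even the most-seen row keeps a small chance.
<?php
// Sketch: weighted pick over $rows (each with 'item_id' and 'times_seen'),
// where $max is the largest times_seen seen so far.
function pickLeastSeen(array $rows, $max)
{
    $weights = [];
    $total = 0;
    foreach ($rows as $i => $row) {
        // +1 so the most-seen row is still selectable occasionally.
        $weights[$i] = $max - $row['times_seen'] + 1;
        $total += $weights[$i];
    }
    if ($total <= 0) {
        return null;
    }

    // Draw a point in [1, total] and walk the cumulative weights until we pass it.
    $target = mt_rand(1, $total);
    foreach ($rows as $i => $row) {
        $target -= $weights[$i];
        if ($target <= 0) {
            return $row['item_id'];
        }
    }
    return null; // not reached
}

$item_id = pickLeastSeen($rows, $max);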

Optimizing near-duplicate value search

I'm trying to find near duplicate values in a set of fields in order to allow an administrator to clean them up.
There are two criteria that I am matching on:
One string is wholly contained within the other, and is at least 1/4 of its length
The strings have an edit distance less than 5% of the total length of the two strings
The Pseudo-PHP code:
foreach($values as $value){
$matches = array();
foreach($values as $match){
if(
(
$value['length'] < $match['length']
&&
$value['length'] * 4 > $match['length']
&&
stripos($match['value'], $value['value']) !== false
)
||
(
$match['length'] < $value['length']
&&
$match['length'] * 4 > $value['length']
&&
stripos($value['value'], $match['value']) !== false
)
||
(
abs($value['length'] - $match['length']) * 20 < ($value['length'] + $match['length'])
&&
0 < ($match['changes'] = levenshtein($value['value'], $match['value']))
&&
$match['changes'] * 20 <= ($value['length'] + $match['length'])
)
){
$matches[] = &$match;
}
}
// output matches for current outer loop value
}
I've tried to reduce calls to the comparatively expensive stripos and levenshtein functions where possible, which has reduced the execution time quite a bit.
However, as an O(n^2) operation this just doesn't scale to the larger sets of values and it seems that a significant amount of the processing time is spent simply iterating through the arrays.
Some properties of a few sets of values being operated on
Total strings | With matches | Avg matches | Median matches | Max matches | Time (s) |
--------------+--------------+-------------+----------------+-------------+----------+
          844 |          413 |         1.8 |              1 |          58 |      140 |
          593 |          156 |         1.2 |              1 |           5 |       62 |
          272 |          168 |         3.2 |              2 |          26 |       10 |
          157 |           47 |         1.5 |              1 |           4 |      3.2 |
          106 |           48 |         1.8 |              1 |           8 |      1.3 |
           62 |           47 |         2.9 |              2 |          16 |      0.4 |
Are there any other things I can do to reduce the time to check criteria, and more importantly are there any ways for me to reduce the number of criteria checks required (for example, by pre-processing the input values), since there is such low selectivity?
Edit: Implemented solution
// $values is ordered from shortest to longest string length
$values_count = count($values); // saves a ton of time, especially on linux
$matches = array();
for ($vid = 0; $vid < $values_count; $vid++) {
    $value = $values[$vid];
    for ($mid = $vid + 1; $mid < $values_count; $mid++) { // only check against longer strings
        $match = $values[$mid];
        if (
            (
                $value['length'] * 4 > $match['length']
                && stripos($match['value'], $value['value']) !== false
            )
            ||
            (
                ($match['length'] - $value['length']) * 20 < ($value['length'] + $match['length'])
                && 0 < ($changes = levenshtein($value['value'], $match['value']))
                && $changes * 20 <= ($value['length'] + $match['length'])
            )
        ) {
            // store match in both directions
            $matches[$vid][$mid] = true;
            $matches[$mid][$vid] = true;
        }
    }
}
// Sort outer array of matches alphabetically with uksort()
foreach ($matches as $vid => $mids) {
    // sort inner array of matches by usage count with uksort()
    // output matches
}
You could first order the strings by length (O(N log N)) and then only check whether smaller strings are substrings of larger strings, plus only run levenshtein on string pairs whose length difference is not too large.
You already perform these checks, but right now you do them for all N x N pairs, while preselecting by length first will help you reduce the number of pairs to check. Avoid the N x N loop, even if it contains only tests that will fail.
For substring matching you could further improve by creating an index of all smaller items, and updating it accordingly as you parse larger items. The index can form a tree structure (a trie) branching on letters, where each word (string) forms a path from root to leaf. This way you can find whether any of the words in the index occurs in the string you want to match. For each character in your match string, try to advance every active pointer in the tree index, and create a new pointer at the root. If a pointer cannot be advanced to the following character, remove it. If any pointer reaches a leaf node, you've found a substring match.
Implementing this is, I think, not difficult, but not trivial either.
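To make the idea concrete, here is a hedged PHP sketch of such an index; the class and method names are invented for illustration. Shorter strings are added to the trie once, and each longer string is then scanned a single time while a set of pointers walks the trie.
<?php
// Hedged sketch of the trie described above; names are illustrative only.
class TrieNode
{
    public $children = []; // letter => TrieNode
    public $word = null;   // set when a complete indexed string ends here
}

class SubstringTrie
{
    private $root;

    public function __construct()
    {
        $this->root = new TrieNode();
    }

    // Index one of the shorter strings.
    public function add($word)
    {
        $node = $this->root;
        foreach (str_split(strtolower($word)) as $ch) {
            if (!isset($node->children[$ch])) {
                $node->children[$ch] = new TrieNode();
            }
            $node = $node->children[$ch];
        }
        $node->word = $word;
    }

    // Return every indexed string that occurs somewhere inside $text.
    public function substringsOf($text)
    {
        $text = strtolower($text);
        $found = [];
        $pointers = [];
        for ($i = 0, $len = strlen($text); $i < $len; $i++) {
            $ch = $text[$i];
            $pointers[] = $this->root; // a match may start at every offset
            $next = [];
            foreach ($pointers as $node) {
                if (isset($node->children[$ch])) {
                    $child = $node->children[$ch];
                    if ($child->word !== null) {
                        $found[$child->word] = true;
                    }
                    $next[] = $child; // this pointer advanced; keep it
                }
                // pointers that cannot advance are simply dropped
            }
            $pointers = $next;
        }
        return array_keys($found);
    }
}

// Usage: index the shorter strings once, then scan each longer string.
$trie = new SubstringTrie();
$trie->add('foo');
$trie->add('bar baz');
print_r($trie->substringsOf('A foobar baz example')); // foo, bar baz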
You can get an instant 2x improvement by tightening your inner loop so that each pair is only checked once. Aren't you getting duplicate matches in your results?
For a preprocessing step, I'd go through and calculate character frequencies (assuming your set of characters is small, like a-z0-9, which, given that you're using stripos, I think is likely). Then rather than comparing sequences (expensive), compare frequencies (cheap). This will give you false positives, which you can either live with or feed into the test you've currently got to weed them out.
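A hedged sketch of that prefilter follows; the frequencyDistance() name and the threshold derivation are mine, not from the answer. The summed difference in per-byte counts can never exceed twice the edit distance, so the check can safely pre-reject pairs for the 5% edit-distance criterion before levenshtein() is called. It does not replace the substring check.
<?php
// Illustrative prefilter: cheap per-byte frequency distance used as a lower
// bound before the expensive levenshtein() call.
function frequencyDistance($a, $b)
{
    $ca = count_chars($a, 1); // byte value => count, only non-zero entries
    $cb = count_chars($b, 1);
    $dist = 0;
    foreach ($ca + $cb as $byte => $unused) {
        $dist += abs(($ca[$byte] ?? 0) - ($cb[$byte] ?? 0));
    }
    return $dist; // always <= 2 * levenshtein($a, $b)
}

// Inside the inner loop: only pay for levenshtein() when the cheap bound
// leaves the 5% criterion ($changes * 20 <= combined length) still possible.
$combined = $value['length'] + $match['length'];
if (frequencyDistance($value['value'], $match['value']) * 10 <= $combined) {
    $changes = levenshtein($value['value'], $match['value']);
    // ... apply the existing edit-distance criterion ...
}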
