I need to update the tags column so each cell has content like this:
2-5-1-14-5
or
3-9-14-19-23
or similar (five integers in the range 1-25).
The id column is not consecutive from 1 to 117, but the min id is 1 and the max id is 117.
$arr = [];
$str = '';
$id = 1;
for ($x = 1; $x <= 25; $x++){
array_push($arr, $x);
}
while ($id < 117) {
shuffle($arr);
array_splice($arr, 5, 25);
foreach ($arr as $el){
$str .= $el . '-';
}
$str = rtrim($str,'-');
$db->query("update posts set tags = '" . $str . "' where id = " . $id);
$id += 1;
}
I'm not sure how to describe the final result, but it seems that the majority of cells are written multiple times.
Any help ?
To combine my comments into one piece of code:
$full = range(1, 25);
$id = 1;
while ($id <= 117) { // <= so that the max id (117) is updated too
    shuffle($full);
    $section = array_slice($full, 0, 5);
    $str = implode('-', $section);
    $db->query("update posts set tags = '" . $str . "' where id = " . $id);
    $id += 1;
}
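Resetting $str is no longer needed, because implode() builds a fresh string on every pass. array_slice() also leaves $full untouched, whereas array_splice($arr, 5, 25) in your version cut $arr down to five elements for good, so every later shuffle only reordered the same five numbers. The other bits of code could probably be improved.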
Two warnings:
Using PHP variables directly in queries is not a good idea. Please use parameter binding. This particular piece of code might not be vulnerable to SQL-injection but if you do the same elsewhere it might be.
Your database doesn't seem to be normalized. This might cause trouble for you in the long run when you expand your application.
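To illustrate the parameter-binding warning, here is a minimal sketch. It assumes $db is a PDO connection (the question does not say which driver is used), and randomTags() and updateAllTags() are hypothetical helper names, not something from the original code.

```php
<?php
// Hypothetical helper: pick $count distinct integers from 1..$max and join them with '-'.
function randomTags(int $count = 5, int $max = 25): string
{
    $pool = range(1, $max);
    shuffle($pool);
    return implode('-', array_slice($pool, 0, $count));
}

// Hypothetical helper, assuming $db is a PDO instance.
function updateAllTags(PDO $db, int $minId = 1, int $maxId = 117): void
{
    // Prepared once, executed once per id; values are bound instead of concatenated.
    $stmt = $db->prepare('UPDATE posts SET tags = :tags WHERE id = :id');
    for ($id = $minId; $id <= $maxId; $id++) {
        $stmt->execute([':tags' => randomTags(), ':id' => $id]);
    }
}
```

Ids that do not exist in the table simply match no row, so the gaps in the id column do no harm.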
I want to merge cells dynamically based on a count using PHPExcel.
For example:
if $count = 2;
I want to merge two cells as given below,
$objPHPExcel->getActiveSheet()->mergeCells('A1:B1');
similarly, if $count = 4;
$objPHPExcel->getActiveSheet()->mergeCells('C1:F1');
similarly, if $count = 5;
$objPHPExcel->getActiveSheet()->mergeCells('G1:K1');
I want to get this logic in a loop.
I tried the below logic, which doesn't work
$count = ew_Execute("SELECT COUNT(*) FROM ems_defects_codes WHERE DEF_CODE = '$def_code'");
$start_letter = A;
$rowno = 1;
for ($i = 0; $i < $count ; $i++) {
$objPHPExcel->getActiveSheet()->mergeCells($start_letter.$rowno.':'.$i.$rowno);
}
Any help will be much appreciated. Thanks!
You need to build the column range string from three inputs: start_letter, row_number and count. Once you have the range, you can pass it to PHPExcel's mergeCells function. Here is example code to get the column range:
function getColRange($start_letter, $row_number, $count) {
    $alphabets = range('A', 'Z');
    $start_idx = array_search(
        $start_letter,
        $alphabets
    );
    return sprintf(
        "%s%s:%s%s",
        $start_letter,
        $row_number,
        // a block of $count cells ends $count - 1 columns after the start
        $alphabets[$start_idx + $count - 1],
        $row_number
    );
}
print getColRange('A', 1, 2) . PHP_EOL;
print getColRange('C', 1, 4) . PHP_EOL;
print getColRange('G', 1, 5) . PHP_EOL;
Output
A1:B1
C1:F1
G1:K1
Further, you can use this new function in your code to do the actual merge. You can call it once or inside a loop.
$sheet = $objPHPExcel->getActiveSheet();
$sheet->mergeCells(
getColRange(
$start_letter,
$row_number,
$count
)
);
The problem is that $i inside your loop is always an integer; you need to convert that integer to the corresponding letter of the alphabet by indexing into an alphabetic array. Such an array can be created with a simple range('A', 'Z').
You also need to wrap the A in $start_letter in apostrophes (as 'A'). Alternatively, now that the range has been created, you can store the index into the alphabet instead: $start_letter = 0 (later becoming 'A' with $alphabet[$start_letter]).
Then you need to add the count to the starting index in order to get the ending cell for mergeCells(). Your starting cell becomes $alphabet[$start_letter] . $rowno, and your ending cell becomes $alphabet[$start_letter + $count - 1] . $rowno. The addition has to happen on the index, not on the letters themselves, and a single merge covers the whole block, so no loop over $i is needed.
This can be seen in the following:
$count = ew_Execute("SELECT COUNT(*) FROM ems_defects_codes WHERE DEF_CODE = '$def_code'");
$alphabet = range('A', 'Z');
$start_letter = 0; // index into $alphabet, 0 is 'A'
$rowno = 1;
$end_letter = $start_letter + $count - 1;
$objPHPExcel->getActiveSheet()->mergeCells($alphabet[$start_letter] . $rowno . ':' . $alphabet[$end_letter] . $rowno);
$start_letter = $end_letter + 1; // the next block starts right after this one
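The "logic in a loop" from the question can be sketched without PHPExcel at all, by computing the range strings first. This is only an illustration: columnLetters() and mergeRanges() are hypothetical helper names, and the counts are the ones from the question's example (2, 4, 5).

```php
<?php
// Convert a 0-based column index to spreadsheet letters (0 => A, 25 => Z, 26 => AA).
function columnLetters(int $index): string
{
    $letters = '';
    for ($n = $index + 1; $n > 0; $n = intdiv($n - 1, 26)) {
        $letters = chr(65 + ($n - 1) % 26) . $letters;
    }
    return $letters;
}

// Hypothetical helper: one merge range per count, laid out left to right on $row.
function mergeRanges(array $counts, int $row, int $startIndex = 0): array
{
    $ranges = [];
    foreach ($counts as $count) {
        $endIndex = $startIndex + $count - 1;
        $ranges[] = columnLetters($startIndex) . $row . ':' . columnLetters($endIndex) . $row;
        $startIndex = $endIndex + 1; // the next group starts right after this one
    }
    return $ranges;
}

print_r(mergeRanges([2, 4, 5], 1)); // A1:B1, C1:F1, G1:K1
```

Each returned string can then be passed to mergeCells() on the active sheet. Unlike range('A', 'Z'), this version keeps working past column Z.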
I've got a script that I needed to change, since the data to be inserted into the DB got too big to insert at once. So I created a loop that splits the array into blocks of 6000 rows and then inserts them.
I don't know exactly whether the data is too big for the server to process at once or too big to upload, but at the moment both steps are split into these 6000-row blocks.
Code:
for ($j = 0; $j <= ceil($alength / 6000); $j++){
$array = array_slice($arraysource, $j * 6000, 5999);
$sql = "INSERT INTO Ranking (rank, name, score, kd, wins, kills, deaths, shots, time, spree) VALUES ";
foreach($array as $a=>$value){
//transforming code for array
$ra = $array[$a][0];
$na = str_replace(",", ",", $array[$a][1]);
$na = str_replace("\", "\\\\", $na);
$na = str_replace("'", "\'", $na);
$sc = $array[$a][2];
$kd = $array[$a][3];
$wi = $array[$a][4];
$ki = $array[$a][5];
$de = $array[$a][6];
$sh = $array[$a][7];
$ti = $array[$a][8];
$sp = $array[$a][9];
$sql .= "('$ra',' $na ','$sc','$kd','$wi','$ki','$de','$sh','$ti','$sp'),";
}
$sql = substr($sql, 0, -1);
$conn->query($sql);
}
$conn->close();
Right now it only inserts the first 5999 rows and no more, as if it executed the loop only once. No error messages.
Don't know if this'll necessarily help, but what about using array_chunk, array_walk, and checking error codes (if any)? Something like this:
function create_query(&$value, $key) {
    // Replaces the row with its INSERT statement; destructive though.
    $value[1] = str_replace(",", ",", $value[1]);
    $value[1] = str_replace("\\", "\\\\", $value[1]);
    $value[1] = str_replace("'", "\'", $value[1]);
    $queryvalues = implode("','", $value);
    $value = "INSERT INTO Ranking (rank, name, score, kd, wins, kills, deaths, shots, time, spree) VALUES ('".$queryvalues."');";
}
$array = array_chunk($arraysource, 6000);
foreach ($array as $key => $chunk) {
    array_walk($chunk, 'create_query');
    // $chunk now holds one INSERT statement per row
    foreach ($chunk as $statement) {
        if (!$conn->query($statement)) {
            printf("Errorcode: %d\n", $conn->errno);
        }
    }
}
$conn->close();
Secondly, have you considered using mysqli::multi_query? It'll do more queries at once, but you'll have to check the max allowed packet size (max_allowed_packet).
Another tip would be to check out the response from the query, which your code doesn't include.
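As an alternative to escaping the name by hand, each chunk could go into one multi-row INSERT with placeholders. The following is a rough sketch under that assumption; buildInsert() is a hypothetical helper, and how the statement is prepared and executed depends on the driver you use.

```php
<?php
// Hypothetical helper: build one multi-row INSERT with "?" placeholders for a chunk of rows.
function buildInsert(string $table, array $columns, array $rows): array
{
    $rowPlaceholder = '(' . implode(',', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(',', array_fill(0, count($rows), $rowPlaceholder))
    );
    // Flatten the rows into one parameter list, in the same order as the placeholders.
    $params = array_merge(...array_map('array_values', $rows));
    return [$sql, $params];
}
```

With PDO, for instance, the pair would be used as $pdo->prepare($sql)->execute($params) once per chunk from array_chunk(). The driver then takes care of quotes and backslashes in the names.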
Thanks for the tips, but I figured it out. Didn't think about this ^^
It was the first line inside the loop, which I didn't include in my question:
array_unshift($array[$a], $a + 1);
This adds an additional value in front of each user, the "rank". But the numbers would repeat after one pass of the loop finishes, and users with the same rank can't be imported.
Now it works:
array_unshift($array[$a], $a + 1 + $j * 5999);
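One thing to watch with this fix: the rank offset has to match the number of rows each pass really consumes. In the question the slice takes 5999 rows while the start position advances by 6000, so one row per block is never inserted. A sketch that avoids the mismatch by letting array_chunk() do the splitting (rankedChunks() is a hypothetical helper name):

```php
<?php
// Hypothetical helper: prepend a global, 1-based rank to every row and return the rows in chunks.
function rankedChunks(array $rows, int $chunkSize): array
{
    $chunks = array_chunk($rows, $chunkSize);
    foreach ($chunks as $j => $chunk) {
        foreach ($chunk as $a => $row) {
            // Offset by the number of rows in all previous chunks.
            array_unshift($row, $j * $chunkSize + $a + 1);
            $chunks[$j][$a] = $row;
        }
    }
    return $chunks;
}
```

Each element of the result is one block ready to be turned into an INSERT, and the ranks run from 1 to the total number of rows without gaps or repeats.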
I am trying to create a random string which will be used as a short reference number. I have spent the last couple of days trying to get this to work, but it gets to around 32766 records and then continues with endless duplicates. I need at minimum 200,000 variations.
The code below is a very simple mockup to explain what happens. The codes should follow the format 1a-x1y2z (example), which should give a lot more than 32k results.
I have a feeling it may be related to memory, but I'm not sure. Any ideas?
<?php
function createReference() {
$num = rand(1, 9);
$alpha = substr(str_shuffle("abcdefghijklmnopqrstuvwxyz"), 0, 1);
$char = '0123456789abcdefghijklmnopqrstuvwxyz';
$charLength = strlen($char);
$rand = '';
for ($i = 0; $i < 6; $i++) {
$rand .= $char[rand(0, $charLength - 1)];
}
return $num . $alpha . "-" . $rand;
}
$codes = [];
for ($i = 1; $i <= 200000; $i++) {
$code = createReference();
while (in_array($code, $codes) == true) {
echo 'Duplicate: ' . $code . '<br />';
$code = createReference();
}
$codes[] = $code;
echo $i . ": " . $code . "<br />";
}
exit;
?>
UPDATE
So I am beginning to wonder whether this has something to do with our WAMP setup (Bitnami), as our local machine gets to exactly 1024 records before it starts duplicating. By removing 1 character from the string above (5 instead of 6 iterations in the for loop), it gets to exactly 32768 records.
I uploaded the script to our CentOS server and had no duplicates.
What in our environment could cause such behaviour?
The code looks overly complex to me. Let's assume for the moment that you really want to create n unique strings, each based on a single random value (rand(), mt_rand(), or anything else that yields an integer from a large range).
You can start by decoupling the generation of the random values from the encoding (nothing in the code makes a string dependent on any previous state, except for the uniqueness). Comparing integers is quite a bit faster than comparing arbitrary strings.
Called without arguments, mt_rand() returns a value between 0 and mt_getrandmax(), which is 2^31 - 1, so there are about 2^31 possible values. You want to pick 200k, let's make it 400k; that's roughly 1/5000 of the value range. Collisions will therefore be rare, so it's reasonable to generate all the values first, check for duplicates afterwards, and add more values if a collision occurred. Again, that is much faster than calling in_array() in each iteration of the loop.
Once you have enough values, you can encode them in whatever format you wish. I don't know whether the <digit><character>-<something> format is mandatory, but assuming it is not, base_convert() will do:
<?php
function uniqueRandomValues($n) {
    $values = array();
    while( count($values) < $n ) {
        for($i=count($values);$i<$n; $i++) {
            $values[] = mt_rand();
        }
        $values = array_unique($values);
    }
    return $values;
}
function createReferences($n) {
    return array_map(
        function($e) {
            return base_convert($e, 10, 36);
        },
        uniqueRandomValues($n)
    );
}
$start = microtime(true);
$references = createReferences(400000);
$end = microtime(true);
echo count($references), ' ', count(array_unique($references)), ' ', $end-$start, ' ', $references[0];
prints e.g. 400000 400000 3.3981630802155 f3plox on my i7-4770. (The $end-$start part is constantly between 3.2 and 3.4)
Using base_convert() there can be strings like li10, which can be quite annoying to decipher if you have to manually type the string.
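One way around hard-to-read strings such as li10 is to encode with a custom alphabet that leaves out look-alike characters. This is a minimal sketch, not part of the answer above: encodeReadable() and the 31-character alphabet (no 0, 1, i, l, o) are assumptions made for illustration.

```php
<?php
// Hypothetical helper: encode a non-negative integer using an alphabet without 0, 1, i, l and o.
function encodeReadable(int $value, string $alphabet = '23456789abcdefghjkmnpqrstuvwxyz'): string
{
    $base = strlen($alphabet);
    $out = '';
    do {
        // Take the lowest digit in the custom base and prepend its character.
        $out = $alphabet[$value % $base] . $out;
        $value = intdiv($value, $base);
    } while ($value > 0);
    return $out;
}
```

Because this is a plain positional encoding, distinct integers always give distinct strings, so the uniqueness check on the integers still carries over to the references.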
I have the following code - it produces a series of queries that are sent to a database:
$a = 'q';
$aa = 1;
$r = "$a$aa";
$q = 54;
while($aa <= $q){
$query .= "SELECT COUNT(". $r .") as Responses FROM tresults;";
$aa = $aa + 1;
$r = "$a$aa";
}
The issue I have is simple, within the database, the number is not sequential.
I have fields that go from q1 to q13 but then goes q14a, q14b, q14c, q14d and q14e and then from q15 to q54.
I've looked at continue but that's more for skipping iterations and hasn't helped me.
I'm struggling to adapt the above code to handle this non-sequential situation. Any ideas and suggestions welcomed.
I have fields that go from q1 to q13 but then goes q14a, q14b, q14c, q14d and q14e and then from q15 to q54.
for($i=1; $i<=54; ++$i) {
if($i != 14) {
echo 'q' . $i . "<br>";
}
else {
for($j='a'; $j<='e'; ++$j) {
echo 'q14' . $j . "<br>";
}
}
}
If you don’t need to execute the statements in order of numbering, then you could also just skip one in the first loop if the counter is 14, and then have a second loop (not nested into the first one), that does the q14s afterwards.
You could get the columns from the table and test to see if they start with q (or use a preg_match):
$result = query("DESCRIBE tresults");
while($row = fetch($result)) {
    if(strpos($row['Field'], 'q') === 0) {
        $query .= "SELECT COUNT(". $row['Field'] .") as Responses FROM tresults;";
    }
}
Or build the columns array and use it:
$columns = array('q1', 'q2', 'q54'); //etc...
foreach($columns as $r) {
$query .= "SELECT COUNT(". $r .") as Responses FROM tresults;";
}
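If typing out all the names is too tedious, the columns array can also be generated. A small sketch, assuming the column names are exactly q1 to q13, q14a to q14e and q15 to q54 as described in the question; questionColumns() is a hypothetical helper name.

```php
<?php
// Build q1..q13, q14a..q14e, q15..q54 without hard-coding every name.
function questionColumns(): array
{
    $prefix = function ($suffix) {
        return 'q' . $suffix;
    };
    return array_merge(
        array_map($prefix, range(1, 13)),
        array_map(function ($letter) {
            return 'q14' . $letter;
        }, range('a', 'e')),
        array_map($prefix, range(15, 54))
    );
}

$query = '';
foreach (questionColumns() as $r) {
    $query .= "SELECT COUNT(" . $r . ") as Responses FROM tresults;";
}
```

The statements come out in numbering order, with the five q14 variants between q13 and q15.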
I have the following PHP function to calculate the relation between two texts:
function check($terms_in_article1, $terms_in_article2) {
$length1 = count($terms_in_article1); // number of words
$length2 = count($terms_in_article2); // number of words
$all_terms = array_merge($terms_in_article1, $terms_in_article2);
$all_terms = array_unique($all_terms);
foreach ($all_terms as $all_termsa) {
$term_vector1[$all_termsa] = 0;
$term_vector2[$all_termsa] = 0;
}
foreach ($terms_in_article1 as $terms_in_article1a) {
$term_vector1[$terms_in_article1a]++;
}
foreach ($terms_in_article2 as $terms_in_article2a) {
$term_vector2[$terms_in_article2a]++;
}
$score = 0;
foreach ($all_terms as $all_termsa) {
$score += $term_vector1[$all_termsa]*$term_vector2[$all_termsa];
}
$score = $score/($length1*$length2);
$score *= 500; // for better readability
return $score;
}
The variable $terms_in_articleX must be an array containing all single words which appear in the text.
Assuming I have a database of 20,000 texts, this function would take a very long time to run through all the connections.
How can I accelerate this process? Should I add all texts into a huge matrix instead of always comparing only two texts? It would be great if you had some approaches with code, preferably in PHP.
I hope you can help me. Thanks in advance!
You can split the text when adding it. A simple example: preg_match_all('/\w+/', $text, $matches); Of course real splitting is not that simple, but it is possible; just adjust the pattern :)
Create a table with id (int, primary key, auto-increment) and value (varchar, unique), plus a link table like this: word_id (int), text_id (int), word_count (int). Then fill the tables with the new values after splitting the text.
Finally, you can do anything you want with this data, operating quickly on indexed integers (IDs) in the DB.
UPDATE:
Here are the tables and queries:
CREATE TABLE terms (
id int(11) NOT NULL auto_increment, value char(255) NOT NULL,
PRIMARY KEY (`id`), UNIQUE KEY `value` (`value`)
);
CREATE TABLE `terms_in_articles` (
term int(11) NOT NULL,
article int(11) NOT NULL,
cnt int(11) NOT NULL default '1',
UNIQUE KEY `term` (`term`,`article`)
);
/* Returns all unique terms in both articles (your $all_terms) */
SELECT t.id, t.value
FROM terms t, terms_in_articles a
WHERE a.term = t.id AND a.article IN (1, 2);
/* Returns your $term_vector1, $term_vector2 */
SELECT article, term, cnt
FROM terms_in_articles
WHERE article IN (1, 2) ORDER BY article;
/* Returns article and total count of term entries in it ($length1, $length2) */
SELECT article, SUM(cnt) AS total
FROM terms_in_articles
WHERE article IN (1, 2) GROUP BY article;
/* Returns your $score, which you still have to divide by ($length1 * $length2) from the previous query */
SELECT SUM(tmp.term_score) * 500 AS total_score FROM
(
SELECT (a1.cnt * a2.cnt) AS term_score
FROM terms_in_articles a1, terms_in_articles a2
WHERE a1.article = 1 AND a2.article = 2 AND a1.term = a2.term
GROUP BY a2.term, a1.term
) AS tmp;
Well, I hope this helps. The last two queries are enough to perform your task; the other queries are just in case. Of course, you can compute more stats, like "the most popular terms", etc.
Here's a slightly optimized version of your original function. It produces exactly the same results. (I ran it on two articles from Wikipedia with 10000+ terms, about 20 runs each.)
check():
test A score: 4.55712524522
test B score: 5.08138042619
--Time: 1.0707
check2():
test A score: 4.55712524522
test B score: 5.08138042619
--Time: 0.2624
Here's the code:
function check2($terms_in_article1, $terms_in_article2) {
$length1 = count($terms_in_article1); // number of words
$length2 = count($terms_in_article2); // number of words
$score_table = array();
foreach($terms_in_article1 as $term){
if(!isset($score_table[$term])) $score_table[$term] = 0;
$score_table[$term] += 1;
}
$score_table2 = array();
foreach($terms_in_article2 as $term){
if(isset($score_table[$term])){
if(!isset($score_table2[$term])) $score_table2[$term] = 0;
$score_table2[$term] += 1;
}
}
$score =0;
foreach($score_table2 as $key => $entry){
$score += $score_table[$key] * $entry;
}
$score = $score / ($length1*$length2);
$score *= 500;
return $score;
}
(Btw. The time needed to split all the words into arrays was not included.)
EDIT: Trying to be more explicit:
First, encode every term into an integer. You can use a dictionary associative array, like this:
$count = 0;
$dict = array();
$doc_as_int = array();
foreach ($doc as $term) {
    if (!isset($dict[$term])) {
        $dict[$term] = $count++; // first time we see this term
    }
    $val = $dict[$term];
    if (!isset($doc_as_int[$val])) {
        $doc_as_int[$val] = 0;
    }
    $doc_as_int[$val]++;
}
This way, you replace string calculations with integer calculations. For example, you can represent the word "cloud" as the number 5, and then use index 5 of the arrays to store counts of the word "cloud". Notice that we only use an associative array lookup here; there is no need for CRC etc.
Do store all texts as a matrix, preferably a sparse one.
Use feature selection (PDF).
Maybe use a native implementation in a faster language.
I suggest you first use K-means with about 20 clusters to get a rough draft of which documents are near one another, and then compare only pairs inside each cluster. For example, with 200 documents and uniformly sized clusters, this brings the number of comparisons down to 20*200 + 20*10*9, around 6000 comparisons instead of 19900.
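The integer encoding and the sparse representation from the points above could look roughly like this in PHP. It is a sketch only: encodeDocument() and sparseScore() are hypothetical names, and both documents are assumed to be non-empty. The score is the same one check() in the question computes.

```php
<?php
// Hypothetical helper: turn a list of words into [termId => count], growing a shared dictionary.
function encodeDocument(array $terms, array &$dict): array
{
    $vector = [];
    foreach ($terms as $term) {
        if (!isset($dict[$term])) {
            $dict[$term] = count($dict); // next free integer id
        }
        $id = $dict[$term];
        $vector[$id] = ($vector[$id] ?? 0) + 1;
    }
    return $vector;
}

// Same score as check() in the question, computed on two encoded (non-empty) documents.
function sparseScore(array $v1, array $v2): float
{
    $dot = 0;
    // Only terms present in both documents contribute to the sum.
    foreach (array_intersect_key($v1, $v2) as $id => $count) {
        $dot += $count * $v2[$id];
    }
    return 500 * $dot / (array_sum($v1) * array_sum($v2));
}
```

The point of the split is that every text is encoded once and the resulting vector is reused for all of its comparisons, instead of rebuilding the term vectors for every pair.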
If you can compare plain text instead of arrays, and if I understood your goal correctly, you can use PHP's levenshtein function (which is commonly used to provide Google-like 'Did you mean ...?' suggestions in PHP search engines).
It works the opposite way to your function: it returns the difference between two strings.
Example:
<?php
function check($a, $b) {
return levenshtein($a, $b);
}
$a = 'this is just a test';
$b = 'this is not test';
$c = 'this is just a test';
echo check($a, $b) . '<br />';
//return 5
echo check($a, $c) . '<br />';
//return 0, the strings are identical
?>
I don't know exactly whether this will improve the speed of execution, but it might: you get rid of several foreach loops and the array_merge call.
EDIT:
A simple speed test (a script written in 30 seconds, so it's not 100% accurate):
function check($terms_in_article1, $terms_in_article2) {
$length1 = count($terms_in_article1); // number of words
$length2 = count($terms_in_article2); // number of words
$all_terms = array_merge($terms_in_article1, $terms_in_article2);
$all_terms = array_unique($all_terms);
foreach ($all_terms as $all_termsa) {
$term_vector1[$all_termsa] = 0;
$term_vector2[$all_termsa] = 0;
}
foreach ($terms_in_article1 as $terms_in_article1a) {
$term_vector1[$terms_in_article1a]++;
}
foreach ($terms_in_article2 as $terms_in_article2a) {
$term_vector2[$terms_in_article2a]++;
}
$score = 0;
foreach ($all_terms as $all_termsa) {
$score += $term_vector1[$all_termsa]*$term_vector2[$all_termsa];
}
$score = $score/($length1*$length2);
$score *= 500; // for better readability
return $score;
}
$a = array('this', 'is', 'just', 'a', 'test');
$b = array('this', 'is', 'not', 'test');
$timenow = microtime();
list($m_i, $t_i) = explode(' ', $timenow);
for($i = 0; $i != 10000; $i++){
check($a, $b);
}
$last = microtime();
list($m_f, $t_f) = explode(' ', $last);
$fine = $m_f+$t_f;
$inizio = $m_i+$t_i;
$quindi = $fine - $inizio;
$quindi = substr($quindi, 0, 7);
echo 'end in ' . $quindi . ' seconds';
print: end in 0.36765 seconds
Second test:
<?php
function check($a, $b) {
return levenshtein($a, $b);
}
$a = 'this is just a test';
$b = 'this is not test';
$timenow = microtime();
list($m_i, $t_i) = explode(' ', $timenow);
for($i = 0; $i != 10000; $i++){
check($a, $b);
}
$last = microtime();
list($m_f, $t_f) = explode(' ', $last);
$fine = $m_f+$t_f;
$inizio = $m_i+$t_i;
$quindi = $fine - $inizio;
$quindi = substr($quindi, 0, 7);
echo 'end in ' . $quindi . ' seconds';
?>
print: end in 0.05023 seconds
So yes, it seems faster.
It would be nice to try this with many array items (and many words for levenshtein).
2nd EDIT:
With similar_text the speed seems to be about equal to the levenshtein method:
<?php
function check($a, $b) {
return similar_text($a, $b);
}
$a = 'this is just a test ';
$b = 'this is not test';
$timenow = microtime();
list($m_i, $t_i) = explode(' ', $timenow);
for($i = 0; $i != 10000; $i++){
check($a, $b);
}
$last = microtime();
list($m_f, $t_f) = explode(' ', $last);
$fine = $m_f+$t_f;
$inizio = $m_i+$t_i;
$quindi = $fine - $inizio;
$quindi = substr($quindi, 0, 7);
echo 'end in ' . $quindi . ' seconds';
?>
print: end in 0.05988 seconds
But it can take more than 255 characters:
Note also that the complexity of this algorithm is O(N**3) where N is the length of the longest string.
and it can even return the similarity as a percentage:
function check($a, $b) {
similar_text($a, $b, $p);
return $p;
}
Yet another edit
What about creating a database function to do the comparison directly in the SQL query, instead of retrieving all the data and looping over it?
If you're running MySQL, take a look at this one (a hand-made levenshtein function, still with the 255-character limit).
Otherwise, if you're on PostgreSQL, there is this other one (many functions that should be evaluated).
Another approach to take would be Latent Semantic Analysis, which leverages a large corpus of data to find similarities between documents.
The way it works is by taking the co-occurrence matrix of the text and comparing it to the corpus, essentially providing you with an abstract location of your document in a 'semantic space'. This will speed up your text comparison, as you can compare documents using Euclidean distance in the LSA semantic space. It's pretty fun semantic indexing. Thus, adding new articles will not take much longer.
I can't give a specific use case of this approach, having only learned it in school, but it appears that KnowledgeSearch is an open source implementation of the algorithm.
(Sorry, it's my first post, so I can't post links; just look it up.)