All-
I'm new to HBase, and I've finally been able to take the data I was storing in MySQL (about 50 million rows) and insert it into my HBase table.
I'm now trying to query this data based on the keys and am running into some problems.
Basically I have a key that is constructed like:
objectname-createdtime-customerid
Now I need to query based on the objectname and a range for the createdtime. Does anyone know how I can do this? (I'm using PHP/Thrift, but the answer doesn't need to be specific to that.)
I can query if I know the exact row key; I just need to know how to specify a range for the middle component.
Thanks in advance!
Use a scan where the start row has key objectname-<min created time>-customerid and the stop row has key objectname-<max created time>-customerid. Note that the stop row of a scan is exclusive.
http://wiki.apache.org/hadoop/Hbase/ThriftApi#Scanner_methods
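In PHP over Thrift, that could look roughly like the sketch below. The class and method names come from the client generated from Hbase.thrift and vary a little between versions; the table name and the $minCreated/$maxCreated variables are assumptions. Note that createdtime has to be stored as a fixed-width, zero-padded string for the byte ordering of the keys to match time ordering.

$scannerId = $client->scannerOpenWithStop(
    'mytable',                             // table name is an assumption
    $objectname . '-' . $minCreated,       // start row, inclusive
    $objectname . '-' . $maxCreated . '~', // stop row is exclusive; '~' sorts after
                                           // '-', so rows at the max time are kept
    array()                                // empty list = all columns
);
while (true) {
    $rows = $client->scannerGetList($scannerId, 100); // fetch in batches of 100
    if (count($rows) == 0) {
        break;
    }
    foreach ($rows as $rowResult) {
        // $rowResult->row is the full key: objectname-createdtime-customerid
        echo $rowResult->row, "\n";
    }
}
$client->scannerClose($scannerId);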
I have a database with a table of questions that are displayed randomly on the main site (about 80 for now). I'm reading all the IDs from the database, randomly selecting one, and then running another query to get the rest of the data for that question. I'm curious whether I should leave it like that, or whether it would be better to store all the IDs in a .json file and just update it every time I add a question. Which is better? Thanks for the help.
If you're only interested in a random record from the table, just do it like this:
SELECT * FROM your_table
ORDER BY RAND()
LIMIT 1;
All in one query and you don't have to retrieve a list of IDs first.
And it's almost always a bad idea to maintain two separate data sources.
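In PHP that becomes a single round trip. A minimal sketch, assuming PDO and a hypothetical questions table:

$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
$stmt = $pdo->query('SELECT * FROM questions ORDER BY RAND() LIMIT 1');
$question = $stmt->fetch(PDO::FETCH_ASSOC); // one query, no ID list to maintain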
I have a large table of about 14 million rows. Each row contains a block of text. I also have another table with about 6,000 rows, where each row has a word and six numerical values for that word. I need to take each block of text from the first table, find the number of times each word in the second table appears, then calculate the mean of the six values for each block of text and store it.
I have a Debian machine with an i7 and 8 GB of memory, which should be able to handle it. At the moment I am using PHP's substr_count() function, but PHP just doesn't feel like the right solution for this problem. Other than working around time-out and memory-limit problems, does anyone have a better way of doing this? Is it possible to use just SQL? If not, what would be the best way to execute my PHP without overloading the server?
Do each record from the 'big' table one at a time. Load that single block of text into your program (PHP or whatever), do the searching and calculation, then save the appropriate values wherever you need them.
Do each record as its own transaction, in isolation from the rest. If you are interrupted, use the saved values to determine where to start again.
Once you are done with the existing records, you only need to do this in the future when you enter or update a record, so it's much easier. You just need to take the big bite right now to get the existing data updated.
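A sketch of that one-record-per-transaction loop, assuming PDO and hypothetical table/column names (blocks, words, block_scores), with a count-weighted mean as one plausible reading of "the mean of the six values":

$pdo = new PDO('mysql:host=localhost;dbname=corpus', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Load the small word table (~6,000 rows) into memory once.
$words = $pdo->query('SELECT word, v1, v2, v3, v4, v5, v6 FROM words')
             ->fetchAll(PDO::FETCH_ASSOC);

// Resume from the last processed id, so an interrupted run can restart.
$lastId = (int) $pdo->query('SELECT COALESCE(MAX(block_id), 0) FROM block_scores')
                    ->fetchColumn();

$select = $pdo->prepare('SELECT id, body FROM blocks WHERE id > ? ORDER BY id LIMIT 1');
$insert = $pdo->prepare('INSERT INTO block_scores (block_id, m1, m2, m3, m4, m5, m6)
                         VALUES (?, ?, ?, ?, ?, ?, ?)');

while (true) {
    $select->execute(array($lastId));
    $block = $select->fetch(PDO::FETCH_ASSOC);
    if (!$block) {
        break; // all records processed
    }

    $total = 0;
    $sums  = array(0, 0, 0, 0, 0, 0);
    foreach ($words as $w) {
        $n = substr_count($block['body'], $w['word']);
        if ($n == 0) {
            continue;
        }
        $total += $n;
        for ($i = 0; $i < 6; $i++) {
            $sums[$i] += $n * $w['v' . ($i + 1)];
        }
    }

    $means = array();
    for ($i = 0; $i < 6; $i++) {
        $means[$i] = $total ? $sums[$i] / $total : 0;
    }

    // One record, one transaction: safe to interrupt and resume.
    $pdo->beginTransaction();
    $insert->execute(array_merge(array($block['id']), $means));
    $pdo->commit();

    $lastId = $block['id'];
}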
What are you trying to do exactly? If you are trying to create something like a search engine with a weighting function, you should maybe drop that and instead use the MySQL fulltext search functions and indexes that are already there. If you still need this specific solution, you can of course do it completely in SQL, either in one query or with a trigger that runs each time a row is inserted or updated. You won't be able to get this done properly in PHP without jumping through a lot of hoops.
To give you a specific answer, we would indeed need more information about the queries, the data structures and what you are trying to do.
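For the fulltext route, something along these lines might be a starting point (hypothetical table/column names; FULLTEXT indexes require MyISAM on older MySQL versions):

ALTER TABLE blocks ADD FULLTEXT INDEX ft_body (body);

SELECT id, MATCH(body) AGAINST('someword') AS score
FROM blocks
WHERE MATCH(body) AGAINST('someword');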
Redesign it.
If size on disk is not important, just join the tables into one.
Put the 6,000-row table into memory (a MEMORY table) and back it up every hour:
INSERT IGNORE into back.table SELECT * FROM my.table;
Create your own index in the big table, e.g. add a "name index" column to the big table holding the id of the matching row.
We'd need more info about the query to find a solution.
Can someone shed some light on how to get the number of rows (using PHP) in a table without actually having to read all the rows? I am using SQLite to log data periodically, and I need to know the table row count before I actually access specific data.
Apart from reading all rows and incrementing a counter, I cannot seem to work out how to do this quickly (it's a large database) for what seems a rather simple requirement. I have tried the following PHP code, but it only returns a boolean response rather than the actual number of rows:
$result = $db->query('SELECT count(*) FROM mdata');
Normally the SELECT statement will also return a data object (if there is any), won't it?
Just use
LIMIT 1
That should work! It limits the result to only look at one row.
If you have a record set, then you can get the number of records with mysql_num_rows($recordSet);
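As a closing note, the COUNT(*) query from the question is right; the number just has to be fetched from the result object rather than taken from the return value of query(). A minimal sketch, assuming PHP's SQLite3 extension (the database path is an assumption):

$db = new SQLite3('log.db');

// querySingle() returns the first column of the first row directly.
$count = $db->querySingle('SELECT COUNT(*) FROM mdata');

// Equivalent with query(): fetch one row from the result object.
$row = $db->query('SELECT COUNT(*) AS n FROM mdata')->fetchArray(SQLITE3_ASSOC);
$count = $row['n'];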
I have a query to list a set of numbers, for example 1000000 through 2000000. While looping over those results, I run another query to see if each number matches a row in a table in another database. This part runs fine, but it's a little slow.
I then need another query such that, if the first check returns false, it does another check on yet another table. The problem I'm having, though, is that the check in this table is not as simple as a match.
The table structure of the last table is like this:
firstnum
secondnum
This is intended for use in a range of numbers. So row 1 for example might be:
1000023, 1000046
This would mean it's for all numbers between and including those values.
There are thousands of these entries in the DB, and I'm trying to figure out the best way to determine whether the particular number I'm searching for falls inside one of these ranges. Since it's not a direct match, I'm not sure how to accomplish this. The table is also in PostgreSQL, while the main queries are in MySQL.
It's a bit hard to understand what you're trying to say, but I'm afraid the solution is ridiculously simple: ... WHERE firstnum <= X AND X <= secondnum, where X is the number you are looking for.
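In PHP against the PostgreSQL side, that check might look like this sketch (pg_* extension; the table name and connection details are assumptions, and $x is the number being tested):

$pg = pg_connect('host=localhost dbname=mydb user=me password=secret');

$res = pg_query_params(
    $pg,
    'SELECT 1 FROM num_ranges WHERE firstnum <= $1 AND $1 <= secondnum LIMIT 1',
    array($x)
);
$inRange = pg_num_rows($res) > 0; // true if $x falls inside any stored range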
I'm running an SQL query to get basic details from a number of tables, sorted by the last-updated date field. It's terribly tricky, and I'm wondering whether there is an alternative to using the UNION clause. I'm working in PHP/MySQL.
Actually, I have a few tables containing news, articles, photos, events, etc., and I need to collect all of them in one query to show a simple "what's newly added on the website" kind of thing.
Maybe do it in PHP rather than MySQL: if you want the latest n items, fetch the latest n of each of your news items, articles, photos and events, and sort them in PHP (you'll need the latest n of each, obviously, and you'll then trim the combined dataset back down to n in PHP). This is probably easier than combining them with UNION, given the tables are likely to have lots of fields which are different.
I'm not aware of an alternative to UNION that does what you want, and hopefully those fetches won't be too expensive. It would definitely be wise to profile this though.
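A sketch of that merge-and-sort step, assuming each $latest* array came from a per-table "ORDER BY updated_at DESC LIMIT n" query and that updated_at is a sortable 'YYYY-MM-DD HH:MM:SS' string:

function tagged(array $rows, $type) {
    // Tag each row with its source so the combined list stays identifiable.
    foreach ($rows as &$row) {
        $row['type'] = $type;
    }
    unset($row);
    return $rows;
}

$all = array_merge(
    tagged($latestNews, 'news'),
    tagged($latestArticles, 'article'),
    tagged($latestPhotos, 'photo'),
    tagged($latestEvents, 'event')
);

usort($all, function ($a, $b) {
    return strcmp($b['updated_at'], $a['updated_at']); // newest first
});

$whatsNew = array_slice($all, 0, $n); // trim the combined list back to n items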
If you use a JOIN in your query, you can select data from different tables that are related by foreign keys.
You can look at this from another angle: do you need absolutely up-to-date information (i.e., the moment someone enters new information, it should appear)?
If not, you can have a table holding the results of the query in the format you need (serving as a cache), and update that table every five minutes or so. Then your query problem becomes trivial, as the updates can run as several separate queries in the background.
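A sketch of that cache-table refresh, run from cron every few minutes (table and column names are assumptions):

-- rebuild the cache in one go
TRUNCATE whats_new;

INSERT INTO whats_new (item_type, title, updated_at)
      SELECT 'news',    title,   updated_at FROM news
UNION ALL SELECT 'article', title,   updated_at FROM articles
UNION ALL SELECT 'photo',   caption, updated_at FROM photos
UNION ALL SELECT 'event',   name,    updated_at FROM events
ORDER BY updated_at DESC
LIMIT 20;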