I'm currently storing user-generated surveys in JSON files and am now converting these to a SQL database. Regarding this portion of the JSON:
"surveys": [
{
"surveyId": 1,
"name": "Landing Page Survey",
"active": true,
"panes": [
{
"type": "question",
"name": "Question",
"head": "Is there anything preventing you from signing up for a free 14-day trial?",
"response": "textbox",
"options": [
{
"data": "Time",
"target": "Response 1",
"placeholder": "",
"list": "f7b3cdeed8"
},
{
"data": "Money",
"target": "Response 1",
"placeholder": "",
"list": "local"
},
{
"data": "I'm not interested",
"target": "Thanks",
"placeholder": "",
"list": "local"
}
],
"button": "Send"
},
{
"type": "response",
"name": "Response 1",
"head": "Thanks for your interest in our product. Enter your email address to have a team member follow-up with you.",
"response": "email",
"options": [
{
"data": "data",
"target": "Thanks",
"placeholder": "Your email",
"list": "f7b3cdeed8"
}
],
"button": "Submit"
},
{
"type": "thanks",
"name": "Thanks",
"head": "Thanks for your feedback!",
"response": "multichoice",
"options": [
{
"data": "data",
"target": "",
"placeholder": "put response here",
"list": "local"
}
],
"button": "Button text"
}
]
The JSON isn't that important; I'm just posting it so you can see the idea. surveys.panes are the questions of the survey. Each survey could have any number of questions in it, so I'm struggling with how to store those.
Originally I was thinking of having a surveys table with columns for the questions: question_1_type, question_1_text, etc. This would work only if each survey were limited to a fixed number of questions (10, for example, so I could create 10 sets of columns). This feels horribly wrong, though.
Or is it more correct to create a questions table and, for each question in a survey, create a row in it? Link it to the surveys table with an ID; then, when you want to output the survey JSON for an API or whatever, just do a bunch of joins across the surveys and questions tables.
Also, if a survey question is multiple choice, it could have any number of response options, so would you have to build a table for those as well? Then for each survey JSON build you'd have to join across the surveys, questions, and question_options tables. That seems like a lot of overhead, running all those joins.
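To make that concrete, here is roughly the normalized layout I am imagining (just a sketch; the table and column names are placeholders I made up):

CREATE TABLE surveys (
    survey_id INT AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(255) NOT NULL,
    active    BOOLEAN NOT NULL DEFAULT TRUE
);

-- One row per pane/question, linked to its survey.
CREATE TABLE questions (
    question_id INT AUTO_INCREMENT PRIMARY KEY,
    survey_id   INT NOT NULL,
    type        VARCHAR(50),   -- 'question', 'response', 'thanks'
    name        VARCHAR(255),
    head        TEXT,
    response    VARCHAR(50),   -- 'textbox', 'email', 'multichoice'
    button      VARCHAR(255),
    FOREIGN KEY (survey_id) REFERENCES surveys (survey_id)
);

-- One row per response option, linked to its question.
CREATE TABLE question_options (
    option_id   INT AUTO_INCREMENT PRIMARY KEY,
    question_id INT NOT NULL,
    data        VARCHAR(255),
    target      VARCHAR(255),
    placeholder VARCHAR(255),
    list        VARCHAR(255),
    FOREIGN KEY (question_id) REFERENCES questions (question_id)
);

-- Rebuilding one survey's JSON would then be a single two-join query:
SELECT s.name, q.type, q.head, o.data, o.target
FROM surveys s
JOIN questions q ON q.survey_id = s.survey_id
LEFT JOIN question_options o ON o.question_id = q.question_id
WHERE s.survey_id = 1
ORDER BY q.question_id, o.option_id;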
But as I understand it, it's incorrect to store anything but one value in a column (an array, for example), as it defeats the relational idea of a SQL database.
Very noob question; I haven't quite wrapped my head around correct database design. I appreciate any help!
In DynamoDB I have a table with the following structure.
The actions field contains all the info (this is the field I would like to search in), and orderId is the primary key:
{
  "actions": [
    {
      "actionDescription": "8f23029def1d6baa4",
      "actionTitle": "UNDEFINED_ACTION",
      "timestamp": 1533730680,
      "user": {
        "fullName": "XXXXX",
        "userName": "xxxxx@xxxx.xxx"
      }
    },
    {
      "actionDescription": "21857e61037bc29ec",
      "actionTitle": "UNDEFINED_ACTION",
      "timestamp": 1533731788,
      "user": {
        "fullName": "XXXXX",
        "userName": "xxxxx@xxxx.xxx"
      }
    },
    {
      "actionDescription": "cf10abd44e24cef56",
      "actionTitle": "UNDEFINED_ACTION",
      "timestamp": 1533731788,
      "user": {
        "fullName": "XXXXX",
        "userName": "xxxxx@xxxx.xxx"
      }
    },
    {
      "actionDescription": "7787fe7a5bf4d22de",
      "actionTitle": "UNDEFINED_ACTION",
      "timestamp": 1533731789,
      "user": {
        "fullName": "OOOOOO",
        "userName": "ooooo@oooo.ooo"
      }
    },
    {
      "actionDescription": "9528c439021f504bf",
      "actionTitle": "UNDEFINED_ACTION",
      "timestamp": 1533731789,
      "user": {
        "fullName": "XXXXX",
        "userName": "xxxxx@xxxx.xxx"
      }
    },
    {
      "actionDescription": "bfba100e0e54934b2",
      "actionTitle": "UNDEFINED_ACTION",
      "timestamp": 1533731789,
      "user": {
        "fullName": "XXXXX",
        "userName": "xxxxx@xxxx.xxx"
      }
    },
    {
      "actionDescription": "f789dc12f1dbe3be2",
      "actionTitle": "UNDEFINED_ACTION",
      "timestamp": 1533731789,
      "user": {
        "fullName": "OOOOOO",
        "userName": "ooooo@oooo.ooo"
      }
    },
    {
      "actionDescription": "4cd6b68dfea7cf8ee",
      "actionTitle": "UNDEFINED_ACTION",
      "timestamp": 1533731789,
      "user": {
        "fullName": "XXXXX",
        "userName": "xxxxx@xxxx.xxx"
      }
    },
    {
      "actionDescription": "1e3a0e95f8e5106d7",
      "actionTitle": "UNDEFINED_ACTION",
      "timestamp": 1533731790,
      "user": {
        "fullName": "OOOOOO",
        "userName": "ooooo@oooo.ooo"
      }
    }
  ],
  "orderId": "13aae31"
}
What I would like to do is build the scan parameters in PHP so that I can search by userName, or by any other field inside the actions array (timestamp, actionTitle, etc.).
Below is one of the many filter expressions I tried, but I was unable to get any results:
$params = [
    'TableName' => $this->tableName,
    'FilterExpression' => "userName = :searchTerm",
    'ExpressionAttributeValues' => [
        ':searchTerm' => 'ooooo@oooo.ooo',
    ],
    'ReturnConsumedCapacity' => 'TOTAL',
];
$results = $this->dynamoDbClient->scan($params);
Can you please guide me by telling me what I'm missing?
Also, please note: I don't want to get a specific orderId; I would like to get ALL orderIds containing the search term (in this case the userName).
Your best bet with this item schema is to filter the table items yourself. That is to say, scan the table with no filter expression and write your own code to filter the results. Scanning without the filter expression consumes the same amount of read capacity units, since filters are applied after the items are read.
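A minimal sketch of that approach (assuming the AWS SDK for PHP v3; the table name and region are placeholders):

require 'vendor/autoload.php';

use Aws\DynamoDb\DynamoDbClient;
use Aws\DynamoDb\Marshaler;

$client     = new DynamoDbClient(['region' => 'us-east-1', 'version' => 'latest']);
$marshaler  = new Marshaler();
$searchTerm = 'ooooo@oooo.ooo';
$matches    = [];

$params = ['TableName' => 'orders']; // scan with no FilterExpression at all
do {
    $result = $client->scan($params);
    foreach ($result['Items'] as $item) {
        $order = $marshaler->unmarshalItem($item);
        foreach ($order['actions'] as $action) {
            if ($action['user']['userName'] === $searchTerm) {
                $matches[] = $order['orderId'];
                break; // one hit is enough to keep this order
            }
        }
    }
    // Keep paging until the scan has covered the whole table.
    $lastKey = $result['LastEvaluatedKey'] ?? null;
    $params['ExclusiveStartKey'] = $lastKey;
} while ($lastKey !== null);

print_r($matches); // all orderIds containing the search term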
You can set the filter expression to something like this; however, this isn't scalable and only works if you have a fixed number of items in the actions list:
actions[0].user.userName = :searchTerm OR actions[1].user.userName = :searchTerm OR actions[2].user.userName = :searchTerm OR ....
If you need complex search abilities you are probably better off using a dedicated search database. AWS provides two services around this, AWS CloudSearch and AWS ElasticSearch. You can use DynamoDB streams to keep your search indexes up to date.
If you are set on scanning the DynamoDB table with a filter, you can refactor your structure to include additional attributes that hold all the searchable information in a set (or a concatenated string):
{
  "actions": [....],
  "actionsDescriptions": Set["8f23029def1d6baa4", "21857e61037bc29ec", "cf10abd44e24cef56", "7787fe7a5bf4d22de", "9528c439021f504bf", "bfba100e0e54934b2", "f789dc12f1dbe3be2", "4cd6b68dfea7cf8ee", "1e3a0e95f8e5106d7"],
  "actionTitles": Set["UNDEFINED_ACTION"],
  "timestamps": Set[1533730680, 1533731788, 1533731789, 1533731790],
  "user_fullNames": Set["XXXXX", "OOOOOO"],
  "user_userNames": Set["ooooo@oooo.ooo", "xxxxx@xxxx.xxx"],
  "orderId": "13aae31"
}
Notice you have to use a set (or concatenate all the values into a string), since the contains function only works on strings and sets.
Then you can use a filter expression like this
contains(user_userNames, :searchTerm)
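In PHP that could look something like this (a sketch that mirrors the question's code; note the low-level client wants the typed ['S' => ...] attribute format):

$params = [
    'TableName'                 => $this->tableName,
    'FilterExpression'          => 'contains(user_userNames, :searchTerm)',
    'ExpressionAttributeValues' => [
        ':searchTerm' => ['S' => 'ooooo@oooo.ooo'], // typed attribute value
    ],
    'ReturnConsumedCapacity'    => 'TOTAL',
];
$results = $this->dynamoDbClient->scan($params);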
The DynamoDB QueryFilter and ScanFilter options do not currently support the CONTAINS operator for maps. You'll need to build another lookup table indexed by userName to avoid scanning the entire table.
E.g. new table schema:
{
  "userName": "xxxxx@xxxx.xxx",
  "orderId": "13aae31"
}
Where the hash key is userName and orderId is the ID of an order in the other table.
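With that in place the lookup becomes a Query instead of a Scan (a sketch; the table name is made up, and orderId is assumed to be the range key so one userName can map to many orders):

$result = $this->dynamoDbClient->query([
    'TableName'                 => 'userName_orders', // hypothetical lookup table
    'KeyConditionExpression'    => 'userName = :u',
    'ExpressionAttributeValues' => [
        ':u' => ['S' => 'ooooo@oooo.ooo'],
    ],
]);

// Collect the matching order IDs from the returned items.
$orderIds = array_map(function ($item) {
    return $item['orderId']['S'];
}, $result['Items']);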
The closest you can get with the current schema is to use @cementblocks's suggestion to scan the whole table and filter application-side, or to query each element in the list individually.
If you are adding a search-like feature to your application, then scanning may not be the best approach; a DynamoDB scan can be expensive and slow, especially when you have many rows.
So if you intend to add a search feature, consider using AWS CloudSearch. It is a scalable search service, and you can quickly enable search over a DynamoDB table.
I've created two collections, "Users" and "Posts".
The Users document structure is as follows:
{
  "_id": {
    "$oid": "54dde0e32a2a999c0f00002a"
  },
  "first_name": "Vamsi",
  "last_name": "Krishna",
  "email": "vamshi@test.com",
  "password": "5f4dcc3b5aa765d61d8327deb882cf99",
  "date_of_birth": "1999-01-05",
  "gender": "male",
  "status": "Active",
  "date_created": "2015-02-13 12:32:50"
}
While the Posts document structure is:
{
  "_id": {
    "$oid": "54e1a2892a2a99d00500002b"
  },
  "post_description": "Test post 1",
  "posted_by": {
    "id": "54dde0e32a2a999c0f00002a",
    "first_name": "Vamsi",
    "last_name": "Krishna",
    "gender": "male"
  },
  "posted_on": "2015-02-16 08:55:53",
  "comments": [],
  "likes": {
    "count": 0,
    "liked_by": []
  }
}
My question is that when a user updates his information, it should be reflected everywhere (posted_by, commented_by, liked_by). How can I achieve that?
I'm using PHP.
Thanks!!
MongoDB does not have a notion similar to SQL's ON UPDATE CASCADE, so you have to do this in your application: whenever you update user information, update all the other documents that relate to this user in other collections.
As you might have guessed, this is super inefficient when there are a lot of such documents, which means that your schema is bad. Just keep a user ID in your post documents and use it to link to your Users collection.
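If you do keep the embedded copies for now, the fan-out looks roughly like this (a sketch using the mongodb/mongo-php-library; the database name is a placeholder, and only posted_by is shown):

require 'vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27017');
$db     = $client->mydb; // placeholder database name

$userId  = '54dde0e32a2a999c0f00002a';
$newData = ['first_name' => 'Vamsi', 'last_name' => 'K'];

// 1. Update the canonical document in Users.
$db->Users->updateOne(
    ['_id' => new MongoDB\BSON\ObjectId($userId)],
    ['$set' => $newData]
);

// 2. Fan out to every post that embeds this user.
$db->Posts->updateMany(
    ['posted_by.id' => $userId],
    ['$set' => [
        'posted_by.first_name' => $newData['first_name'],
        'posted_by.last_name'  => $newData['last_name'],
    ]]
);
// ...and repeat for comments and likes.liked_by once they hold embedded copies too.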
So this is my first time using JSON Schema and I have a fairly basic question about requirements.
My top level schema is as follows:
schema.json:
{
  "id": "http://localhost/srv/schemas/schema.json",
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "event": { "$ref": "events_schema.json#" },
    "building": { "$ref": "buildings_schema.json#" }
  },
  "required": [ "event" ],
  "additionalProperties": false
}
I have two other schema definition files (events_schema.json and buildings_schema.json) that have object field definitions in them. The one of particular interest is buildings_schema.json.
buildings_schema.json:
{
  "id": "http://localhost/srv/schemas/buildings_schema.json",
  "$schema": "http://json-schema.org/draft-04/schema#",
  "description": "buildings table validation definition",
  "type": "object",
  "properties": {
    "BuildingID": {
      "type": "integer",
      "minimum": 1
    },
    "BuildingDescription": {
      "type": "string",
      "maxLength": 255
    }
  },
  "required": [ "BuildingID" ],
  "additionalProperties": false
}
I am using this file to test my validation:
test.json:
{
  "event": {
    "EventID": 1,
    "EventDescription": "Some description",
    "EventTitle": "Test title",
    "EventStatus": 2,
    "EventPriority": 1,
    "Date": "2007-05-05 12:13:45"
  },
  "building": {
    "BuildingID": 1
  }
}
Which passes validation fine. But when I use the following:
test2.json:
{
  "event": {
    "EventID": 1,
    "EventDescription": "Some description",
    "EventTitle": "Test title",
    "EventStatus": 2,
    "EventPriority": 1,
    "Date": "2007-05-05 12:13:45"
  }
}
I get the error: [building] the property BuildingID is required
Inside my buildings_schema.json file I have the line "required": [ "BuildingID" ], which is what causes the error. It appears that schema.json is traversing down the property definitions and enforcing all the requirements. This is counterintuitive, and I would like it to ONLY enforce a requirement if its parent property is present.
I have a few ways around this that involve arrays and fundamentally changing the structure of the JSON, but that kind of defeats the purpose of my attempt at validating existing JSON. I have read over the documentation (/sigh) and have not found anything relating to this issue. Is there some simple requirement-inheritance setting I am missing?
I am using the Json-Schema for PHP implementation from here: https://github.com/justinrainbow/json-schema
After messing with different validators, it appears to be an issue with the validator: it assumes required inheritance through references. I fixed this by simply breaking the main schema apart into subschemas and only applying the required subschema when necessary.
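A sketch of that workaround with justinrainbow/json-schema: validate each subschema independently, and only when its property is actually present (file paths are illustrative):

require 'vendor/autoload.php';

use JsonSchema\Validator;

$data = json_decode(file_get_contents('test2.json'));

$subschemas = [
    'event'    => 'file://' . realpath('events_schema.json'),
    'building' => 'file://' . realpath('buildings_schema.json'),
];

$allErrors = [];
foreach ($subschemas as $property => $schemaUri) {
    if (!isset($data->$property)) {
        continue; // absent property: its subschema (and its "required") is skipped
    }
    $validator = new Validator();
    $validator->validate($data->$property, (object) ['$ref' => $schemaUri]);
    if (!$validator->isValid()) {
        $allErrors[$property] = $validator->getErrors();
    }
}

var_dump($allErrors); // empty array means every present property validated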
Recently, our team decided to develop mobile applications (iPhone and Android platforms) for our existing website, so users can read our content more easily through an app.
But our team has different views on the JSON structure the API should return. Below are the sample responses.
Schema type 1:
{
  "success": 1,
  "response": {
    "threads": [
      {
        "thread_id": 9999,
        "title": "Topic haha",
        "content": "blah blah blah",
        "category": {
          "category_id": 100,
          "category_name": "Chat Room",
          "category_permalink": "http://sample.com/category/100"
        },
        "user": {
          "user_id": 1,
          "name": "Hello World",
          "email": "helloworld@hello.com",
          "user_permalink": "http://sample.com/user/Hello_World"
        },
        "post_ts": "2012-12-01 18:16:00T0800"
      },
      {
        "thread_id": 9998,
        "title": "asdasdsad ",
        "content": "dsfdsfdsfds dsfdsf ds",
        "category": {
          "category_id": 101,
          "category_name": "Chat Room 2",
          "category_permalink": "http://sample.com/category/101"
        },
        "user": {
          "user_id": 2,
          "name": "Hello baby",
          "email": "hellobaby@hello.com",
          "user_permalink": "http://sample.com/user/2"
        },
        "post_ts": "2012-12-01 18:15:00T0800"
      }
    ]
  }
}
Schema type 2:
{
  "success": 1,
  "response": {
    "threads": [
      {
        "thread_id": 9999,
        "title": "Topic haha",
        "content": "blah blah blah",
        "category": 100,
        "user": 1,
        "post_ts": "2012-12-01 18:16:00T0800"
      },
      {
        "thread_id": 9998,
        "title": "asdasdsad ",
        "content": "dsfdsfdsfds dsfdsf ds",
        "category": 101,
        "user": 2,
        "post_ts": "2012-12-01 18:15:00T0800"
      }
    ],
    "category": [
      {
        "category_id": 100,
        "category_name": "Chat Room",
        "category_permalink": "http://sample.com/category/100"
      },
      {
        "category_id": 101,
        "category_name": "Chat Room 2",
        "category_permalink": "http://sample.com/category/101"
      }
    ],
    "user": [
      {
        "user_id": 1,
        "name": "Hello World",
        "email": "helloworld@hello.com",
        "user_permalink": "http://sample.com/user/Hello_World"
      },
      {
        "user_id": 2,
        "name": "Hello baby",
        "email": "hellobaby@hello.com",
        "user_permalink": "http://sample.com/user/Hello_baby"
      }
    ]
  }
}
Some developers claim that using schema type 2:
reduces the data size when the category & user entities are heavily duplicated (it really does shrink the plain-text response by at least 20~40%);
uses less memory when parsing it into a JSON object, since the payload is smaller;
lets category & user be stored in a hash map, easy to reuse (see the sketch after this list);
reduces the overhead of retrieving data.
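For example, the hash-map reuse they describe would look something like this on the client (a PHP sketch for illustration; a mobile app would do the same in its own language):

$response = json_decode($json, true)['response']; // $json holds the schema type 2 payload

// Index the side tables by ID once...
$categories = array_column($response['category'], null, 'category_id');
$users      = array_column($response['user'], null, 'user_id');

// ...then resolving a thread's references is an O(1) lookup.
foreach ($response['threads'] as $thread) {
    $category = $categories[$thread['category']];
    $user     = $users[$thread['user']];
    printf("%s by %s in %s\n",
        $thread['title'], $user['name'], $category['category_name']);
}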
I have no idea whether schema type 2 is really an improvement. I have read a lot of API documentation and have never seen this type of schema design; to me, it looks like a relational database. So I have a few questions, because I have no experience designing a web services API.
Does it go against API design principles (easy to read, easy to use)?
Does it really parse faster and use less memory on the iOS / Android platforms?
Can it reduce the overhead between client & server?
Thank you.
When I build such an application for Android, I parse the JSON just once and put it in a database; later I use a ContentProvider to access it. In your case you could use the 2nd schema, but without the user and category parts: use lazy loading instead. It will only be a good solution if categories and users repeat often, though.
How can we identify the date on which someone liked my page?
Is there any way we can identify the date on which someone liked my page?
No. You can't even get a list of people that like your page, so you can't get a date they liked it. The only information you can get is how many people like it.
You can view a chart of how many people liked your page over time at Facebook Insights.
Well, no. You can make a Graph API call to the statuses and feeds of a user with a valid access_token to get the id and name of the people who liked a post. A timestamp can be found for the comments, though:
{
  "id": "257821xxxxxxx",
  "from": {
    "name": "Maxxxxxx",
    "id": "100xxxxxx"
  },
  "message": "incredible ..",
  "updated_time": "2011-09-15T11:21:15+0000",
  "likes": {
    "data": [
      {
        "id": "6xxxxxx6",
        "name": "Axxxxxxxxxa"
      }
    ]
  },
  "comments": {
    "data": [
      {
        "id": "257xxxxxxxxxxxx904",
        "from": {
          "name": "Maxxxxxxxxxxal",
          "id": "1xxxxxxxxxxxxxx"
        },
        "message": "htxxxxxxxxxxxxxxxxxxxxxxxxxx",
        "can_remove": true,
        "created_time": "2011-09-15T11:22:06+0000"
      }
    ]
  }
}
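For reference, fetching a post in that shape is a plain Graph API GET; a PHP sketch (the post ID and token are placeholders, and the fields you actually get back depend on the API version and permissions):

$postId      = '257821xxxxxxx'; // placeholder post ID
$accessToken = 'YOUR_ACCESS_TOKEN'; // placeholder token

$url = 'https://graph.facebook.com/' . $postId
     . '?fields=likes,comments&access_token=' . urlencode($accessToken);

$post = json_decode(file_get_contents($url), true); // needs allow_url_fopen

// Likes carry only id + name; only the comments include a created_time.
foreach ($post['comments']['data'] ?? [] as $comment) {
    echo $comment['from']['name'], ' at ', $comment['created_time'], "\n";
}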