Bulk inserting documents into a Couchbase database with PHP - how to?

I am currently experimenting a little with Couchbase Server.
I read a MySQL database table, build a document from each data row, and then insert the document via cURL (method POST) with an id that I generate with
uniqid('table_name');
This works pretty well until the script has inserted around 7,050 documents; then a "No buffer space" exception is thrown.
So far I have not been able to fix this, so I decided to collect e.g. 50 rows of data, build a json_encode(d) string, and POST that via cURL instead.
This works as long as I don't set the id - but I can't figure out how to set the id of the inserted documents.
Currently I am trying to send my documents in a format like this:
{"docs": {
"_id": {
"geodata_de_54476f7e6adc57.14196038": {
"table": "geodata_de",
"country": "DE",
"postal_code": "01945",
"place_name": "Lindenau",
"state_name": "Brandenburg",
"state_code": "BB",
"province_name": "",
"province_code": "00",
"community_name": "Oberspreewald-Lausitz",
"community_code": "12066",
"lat": "51.4",
"lng": "13.7333",
"Xco": "3861.1",
"Yco": "943.614",
"Zco": "4979.07"
}
}, ...
}
}
but this just inserts ONE document containing the whole object above.
Maybe there is someone here who can point me in the right direction.

I would use the Couchbase PHP SDK to insert these documents instead of using cURL: http://docs.couchbase.com/developer/php-2.0/storing.html
Also, in Couchbase you do not have to set the ID in the document itself; it depends. I would instead take the ID you have in your example ("geodata_de_54476f7e6adc57.14196038") and use it as the key for the object in Couchbase. Then you do not necessarily need the _id. A key in Couchbase can be up to 250 bytes, and you can make it meaningful to your application so that lookups by key are extremely fast.
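For example, a minimal sketch with the 2.0 SDK - the cluster address, bucket name, and $rows variable are assumptions for illustration:
<?php
// Minimal sketch: connect to a local cluster and open a bucket named "default".
$cluster = new CouchbaseCluster('couchbase://127.0.0.1');
$bucket  = $cluster->openBucket('default');
// $rows stands for the data previously read from the MySQL table.
foreach ($rows as $row) {
    $key = uniqid('geodata_de_');   // the generated id becomes the Couchbase key
    $bucket->upsert($key, $row);    // the document body needs no _id field
}
Each document gets its own key this way, so there is no need to wrap everything in one big "docs" object.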
Another option, if you wrote your docs to the filesystem, is the cbdocloader utility, which is made specifically for bulk loading docs. If you are on Linux it is in /opt/couchbase/bin/tools/cbdocloader.

Related

DocuSign API Replace template document but keep fields

I want to use the existing fields from a server template over another document.
At first I tried attaching the document at the same level as inline/server.
If I have the signer defined it gives me a 400 error; if I leave it off (which I did by accident) it completely wipes out the fields and shows the attached document.
Second, I tried attaching the document to the inline template, but that results in the attached document not appearing; it just operates like normal.
Update
After additional debugging and research I now know that attaching it to the inline template was incorrect. After adding debug output to read the 400 response, I am getting this error:
"The DocumentId specified in the tab element does not refer to a document in this envelope. Tab refers to DocumentId 32475214 which is not present."
DocumentId is being set to 1, which is apparently wrong.
That led me to this question on SO, in which a comment mentions that the ID kicked back from the 400 should be used.
After I hard-coded this ID, the replacement operation is a success!
However, I now need to find a way to plug that value in programmatically.
Detail
I am using the DocuSign php sdk to help me build the data structure and access the api.
Use the listTemplateDocuments API to retrieve the documentId for the template.
The documentId retrieved in the above step should then be used in the CompositeTemplate of the CreateEnvelope request:
{
    "emailSubject": "Tabs should remain from the Server Template",
    "status": "sent",
    "compositeTemplates": [
        {
            "document": {
                "documentId": "<document Id>", // use the documentId retrieved via the listTemplateDocuments API
                "name": "Replaced Document",
                "fileExtension": "txt",
                "documentBase64": "RG9jIFRXTyBUV08gVFdP"
            },
            "serverTemplates": [
                {
                    "sequence": "1",
                    "templateId": "<Server Template Id Here>"
                }
            ]
        }
    ]
}
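To get that documentId programmatically, here is a sketch using plain cURL in PHP - the account base URL, auth header, and response field names are assumptions based on the REST listTemplateDocuments call:
<?php
// Sketch: GET {accountBaseUrl}/templates/{templateId}/documents
// $accountBaseUrl is the account URL returned at login; $authHeader and
// $templateId are placeholders for your credentials and template.
$ch = curl_init("$accountBaseUrl/templates/$templateId/documents");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    "X-DocuSign-Authentication: $authHeader",
    'Accept: application/json',
]);
$result = json_decode(curl_exec($ch), true);
curl_close($ch);
// The template document's id is what goes into the composite template request.
$documentId = $result['templateDocuments'][0]['documentId'];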

Unable to delete unique document from Solr

I am trying to delete a document from Solr using deleteByQuery.
Whenever I try to delete a document uniquely using only its id, it works fine.
However, when I try to delete based on more than one attribute, it deletes every document that matches even one of the attributes.
For example, if I have two documents, say:
{
    "id": "232",
    "Author": "DEFG",
    "Name": "Alphabet",
    "Description": "Franz jagt im komplett verwahrlosten Taxi quer durch Bayern",
    "_version_": 1513077984291979300
},
{
    "id": "231",
    "Author": "ABCD",
    "Name": "Alphabet",
    "Description": "Franz jagt im komplett verwahrlosten Taxi quer durch Bayern",
    "_version_": 1513077999721775000
}
and I want to delete the document where id is 231 and Author is "ABCD", I wrote this query to delete that particular document:
$id=231;
$author= "ABCD";
$client->deleteByQuery("id:$id, Author:$author");
$client->commit();
It deletes both documents, with id 231 and id 232, rather than deleting only one.
Can anyone please resolve this issue or give me a solution so that I can achieve this?
Thanks.
The delete query uses the same syntax as a search query, so you can easily test that query as a search and tune it until it works. In your case, I suspect just doing id:$id AND Author:$author should work.
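Applied to the code from the question (same $client), a minimal corrected sketch:
$id = 231;
$author = "ABCD";
// AND forces both conditions to match the same document;
// quoting the value guards against spaces or reserved characters.
$client->deleteByQuery("id:$id AND Author:\"$author\"");
$client->commit();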

Finding all pages containing images in Wikimedia Commons category via API

I'm currently trying to find all the pages where images/media from a particular category are being used on Wikimedia Commons.
Using the API, I can list all the images with no problem, but I'm struggling to make the query also return all the pages where those items are used.
Here is an example category with only two media images
https://commons.wikimedia.org/wiki/Category:Automobiles
Here is the API call I am using
https://commons.wikimedia.org/w/api.php?action=query&prop=images&format=json&generator=categorymembers&gcmtitle=Category%3AAutomobiles&gcmprop=title&gcmnamespace=6&gcmlimit=200&gcmsort=sortkey
The long-term aim is to find all the pages that images from our collections appear on, and then get all the tags from those pages about the images. We can then use this to enhance our archive of information about those images, and hopefully use linked data to find relevant images we may not know about from DBpedia.
I might have to do two queries - first get the images, then request info about each page - but I was hoping to do it all in one call.
Assuming that you don't need to recurse into subcategories, you can just use a prop=globalusage query with generator=categorymembers, e.g. like this:
https://commons.wikimedia.org/w/api.php?action=query&prop=globalusage&generator=categorymembers&gcmtitle=Category:Images_from_the_German_Federal_Archive&gcmtype=file&gcmlimit=200&continue=
The output, in JSON format, will look something like this:
// ...snip...
"6197351": {
    "pageid": 6197351,
    "ns": 6,
    "title": "File:-Bundesarchiv Bild 183-1987-1225-004, Schwerin, Thronsaal-demo.jpg",
    "globalusage": [
        {
            "title": "Wikipedia:Fotowerkstatt/Archiv/2009/M\u00e4rz",
            "wiki": "de.wikipedia.org",
            "url": "https://de.wikipedia.org/wiki/Wikipedia:Fotowerkstatt/Archiv/2009/M%C3%A4rz"
        }
    ]
},
"6428927": {
    "pageid": 6428927,
    "ns": 6,
    "title": "File:-Fernsehstudio-Journalistengespraech-crop.jpg",
    "globalusage": [
        {
            "title": "Kurt_von_Gleichen-Ru\u00dfwurm",
            "wiki": "de.wikipedia.org",
            "url": "https://de.wikipedia.org/wiki/Kurt_von_Gleichen-Ru%C3%9Fwurm"
        },
        {
            "title": "Wikipedia:Fotowerkstatt/Archiv/2009/April",
            "wiki": "de.wikipedia.org",
            "url": "https://de.wikipedia.org/wiki/Wikipedia:Fotowerkstatt/Archiv/2009/April"
        }
    ]
},
// ...snip...
Note that you will very likely have to deal with query continuations, since there may easily be more results than MediaWiki will return in a single request. See the linked page for more information on handling those (or just use an MW API client that handles them for you).
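For reference, a minimal sketch of following continuations in PHP, reusing the query parameters from the example above (file_get_contents is used purely for brevity):
<?php
// Minimal sketch: follow MediaWiki query continuations until exhausted.
$params = [
    'action'    => 'query',
    'prop'      => 'globalusage',
    'generator' => 'categorymembers',
    'gcmtitle'  => 'Category:Images_from_the_German_Federal_Archive',
    'gcmtype'   => 'file',
    'gcmlimit'  => '200',
    'format'    => 'json',
    'continue'  => '',
];
$pages = [];
do {
    $url  = 'https://commons.wikimedia.org/w/api.php?' . http_build_query($params);
    $data = json_decode(file_get_contents($url), true);
    foreach ($data['query']['pages'] ?? [] as $id => $page) {
        $pages[$id] = $page;   // collect this batch of results
    }
    // The API returns a "continue" object to feed back into the next request.
    $params = array_merge($params, $data['continue'] ?? []);
} while (isset($data['continue']));
A production version would also merge the globalusage arrays of pages that span two batches instead of overwriting them.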
I don't understand your use case ("our collections"?), so I don't know why you want to use the API directly, but if you want to recurse into categories you're going to do a lot of wheel reinvention.
Most people use the tools made by Magnus Manske, creator of MediaWiki; in this case that's GLAMorous. Example with 3 levels of recursion (finds 186k images, 114k usages): https://tools.wmflabs.org/glamtools/glamorous.php?doit=1&category=Automobiles&use_globalusage=1&depth=3
Results can also be downloaded in XML format, so it's machine-readable.

REST API resource dependencies

I have a data management panel of IP addresses, which belong to organizations and have users responsible for them.
Right now I have the routes /api/ip and /api/ip/{id} to get all IPs or a specific one. The format of one resource is:
{
    "ip": "200.0.0.0",
    "mask": 32,
    "broadcast": "200.0.0.1"
}
Now, when I select an IP, I want to show the IP information, the information of the organization it belongs to, and the users responsible for it, all on one page.
Is it a good idea to return the following data format when requesting /api/ip/{id}:
{
    "ip": "200.0.0.0",
    "mask": 32,
    "broadcast": "200.0.0.1",
    "organization": { /* organization data */ },
    "users": { /* users information */ }
}
This way I get all the information I need in one request, but is it still a RESTful API?
Or should I make two more API routes, /api/ip/{id}/organization and /api/ip/{id}/users, and get all the data I need in three separate requests?
If not, what would be the appropriate way of doing this?
I would do the last one, using HATEOAS, which allows you to link between resources. There is a really great bundle for that called the BazingaHateoasBundle. The result will then be something like:
/api/ip/127.0.0.1
{
    "ip": "200.0.0.0",
    "mask": 32,
    "broadcast": "200.0.0.1",
    "_links": {
        "organization": "/api/ip/127.0.0.1/organization",
        "users": "/api/ip/127.0.0.1/users"
    }
}
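If you would rather not pull in a bundle, the link section can also be composed by hand - a minimal sketch using plain arrays and json_encode (the route strings are illustrative):
<?php
// Sketch: build the representation by hand and attach HATEOAS-style links.
function ipResource(array $ip) {
    $id = $ip['ip'];
    return [
        'ip'        => $ip['ip'],
        'mask'      => $ip['mask'],
        'broadcast' => $ip['broadcast'],
        '_links'    => [
            'organization' => "/api/ip/$id/organization",
            'users'        => "/api/ip/$id/users",
        ],
    ];
}
header('Content-Type: application/json');
echo json_encode(ipResource(['ip' => '200.0.0.0', 'mask' => 32, 'broadcast' => '200.0.0.1']));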
It is perfectly okay to have nested resources. You can expand them the way you showed, or you can collapse them by adding links (with the proper link relation or RDF metadata). I suggest you use a standard, or at least documented, hypermedia type, e.g. JSON-LD + Hydra or HAL+JSON.

Single MySQL table into multidimensional JSON with PHP

I have a table in my database called timeline_entries. This table contains the following fields:
id, headline, text, startDate, type, media, caption, credit. The id field is used for referencing individual entries through a CMS.
I have worked out how to export the data as JSON and save it to a file, but I'm struggling to find a way to format it into the following structure:
{
    "timeline": {
        "headline": "value",
        "type": "default",
        "startDate": "value",
        "text": "value",
        "asset": {
            "media": "value",
            "credit": "value",
            "caption": "value"
        },
        "date": [
            {
                "startDate": "value",
                "type": "",
                "headline": "value",
                "text": "value",
                "asset": {
                    "media": "value",
                    "credit": "value",
                    "caption": "value"
                }
            },
            {
                "startDate": "value",
                "type": "",
                "headline": "value",
                "text": "value",
                "asset": {
                    "media": "value",
                    "credit": "value",
                    "caption": "value"
                }
            },
            {
                "startDate": "value",
                "type": "",
                "headline": "value",
                "text": "value",
                "asset": {
                    "media": "value",
                    "credit": "value",
                    "caption": "value"
                }
            }
        ]
    }
}
I've had to replace the actual data with "value", as some of the data is quite long.
As you can see, the first set of data needs to be formatted slightly differently from the rest: the remaining sets are placed within "date", and the media, caption and credit fields need to be structured as a subset of "asset".
There will be more rows of data than just four or so, so I can't hardcode anything.
Can anyone help me format it? If possible, I'd like to keep the database side as simple as possible, but it can be changed if I have to. Perhaps I'm going about this completely wrong? Any help would be much appreciated.
Thank you.
Formatting of JSON isn't at all important other than for aesthetics. Have you thought of just building the structure as nested PHP arrays, using the json_encode method, and having done with it?
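Building on that, here is a minimal sketch that produces the structure from the question - it assumes PDO and the column names listed above; the first row becomes the timeline header and the remaining rows go into "date":
<?php
// Minimal sketch, assuming PDO and the timeline_entries columns from the question.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$rows = $pdo->query('SELECT headline, text, startDate, type, media, caption, credit
                     FROM timeline_entries ORDER BY startDate')
            ->fetchAll(PDO::FETCH_ASSOC);

// Helper: reshape one row, nesting media/credit/caption under "asset".
function toEntry(array $row) {
    return [
        'startDate' => $row['startDate'],
        'type'      => $row['type'],
        'headline'  => $row['headline'],
        'text'      => $row['text'],
        'asset'     => [
            'media'   => $row['media'],
            'credit'  => $row['credit'],
            'caption' => $row['caption'],
        ],
    ];
}

$first = array_shift($rows);        // first row becomes the header entry
$timeline = toEntry($first);
$timeline['type'] = 'default';      // the header uses "default" per the target structure
$timeline['date'] = array_map('toEntry', $rows);

file_put_contents('timeline.json', json_encode(['timeline' => $timeline], JSON_PRETTY_PRINT));
The key point is to build the whole thing as nested PHP arrays first and only call json_encode once at the end.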
