Thursday, August 31, 2017
Setting up a CICD Pipeline for Containers on AWS
Q+A: Understanding Optimistic Concurrency in Databases
I have struggled in this area and could find no consolidated information in a single place, so I've tried to summarize everything here. I've written it in a Q&A format to make it easier to follow.
Getting Started
ACID
Questions and answers help make these concepts clear. In this example, we will assume that a lock-based RDBMS is being used. I will only talk about the "read committed" and "read uncommitted" isolation levels, because "read committed" is the most commonly preferred isolation level set in the database. When I mention threads, I mean them in the context of Java.
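The excerpt ends before the answers, so here is a minimal sketch of the core optimistic-concurrency idea. It assumes a version column (a common convention, not necessarily the one the post uses) and uses Python's built-in sqlite3 module as a stand-in for a real RDBMS; the `account` table and its columns are hypothetical. An update succeeds only if the row's version still matches the version that was read, so when two readers race, the second writer's update affects zero rows instead of silently overwriting the first.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical versioned row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO account (id, balance, version) VALUES (1, 100, 0)")
    conn.commit()
    return conn


def read_account(conn: sqlite3.Connection, account_id: int) -> tuple:
    """Return (balance, version) without taking any lock."""
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def update_if_unchanged(conn: sqlite3.Connection, account_id: int,
                        new_balance: int, expected_version: int) -> bool:
    """Write only if nobody bumped the version since we read it."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    # rowcount == 0 means another writer won; the caller should re-read and retry.
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = make_db()
    balance_a, version_a = read_account(conn, 1)  # "thread" A reads
    balance_b, version_b = read_account(conn, 1)  # "thread" B reads the same row
    print(update_if_unchanged(conn, 1, balance_a + 10, version_a))
    print(update_if_unchanged(conn, 1, balance_b - 20, version_b))
```

Here the first update succeeds and the second is rejected, because it still carries the stale version it read.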
How to Implement a SQL Formatting Standard Using SQL Prompt V8
With ten programmers working on the same project, how do you agree on a standard style for formatting SQL code, and then implement it consistently? One way is through draconian rules, meeting after meeting, and the occasional sacrifice of a programmer to put fear into those who remain.
If that doesn’t sound like fun, here’s my suggested alternative:
Unit Testing Multi-Tenant Database Providers
My saga on supporting multiple tenants in ASP.NET Core web applications has come to the point where tenants can use separate databases. It's time now to write some tests for the data context to make sure it behaves correctly in unexpected situations. This post covers unit tests for the data context and multi-tenancy.
My Previous Work on Multi-Tenancy
To get a better understanding of my previous work on multi-tenancy in ASP.NET Core and Entity Framework Core 2.0, please go through the following posts, as the current one builds on them:
SQL Server Profiler: How It Works, Best Practices, and Tutorials
If you work with SQL, you understand the importance of being able to monitor your queries, not to mention how critical it is to have an accurate analysis of how long queries take. Today, we'll take a look at SQL Server Profiler, a tool for doing just that: how it works, some of its shortcomings (most notably, the deprecation of its features), and alternatives.
What Is SQL Server Profiler?
It's a tool for tracing, recreating, and troubleshooting problems in MS SQL Server, Microsoft's relational database management system (RDBMS). The profiler lets developers and database administrators (DBAs) create and handle traces and replay and analyze trace results. In a nutshell, it's like a dashboard that shows the health of an instance of MS SQL Server.
Wednesday, August 30, 2017
Database Fundamentals #8: All About Data Types
SQL Server provides many different data types to give you flexibility and control over how you store your data. You define these data types as you define the columns in your tables. If you want to store the date and time that a purchase was made online, you’ll store the values in a column (or columns) with a date and time data type to ensure accuracy in your data.
Choosing a Data Type
You could choose a data type that stores just about anything you give it, such as one of the character types like char or varchar. But do this and you’re going to run into issues when you start running queries against that data. For example, your business may ask for the amount of time between purchases. Since you chose to store the date and time as character values, you won’t be able to use some of the functions provided for the date and time data types. Instead, you’ll have to either convert your data into date and time data types or write your own functions. If you use the date and time data types instead, you get all that functionality and more, such as date/time formatting for different countries, validation that the dates and times you’re entering are valid, and Coordinated Universal Time (UTC) values and offsets. The same is true of all the various data types. You don’t have to use a specific data type, but you sacrifice functionality if you don’t choose one appropriate to the business need.
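To make the trade-off concrete outside SQL Server, here is a small Python analogy (not T-SQL). Values kept as text must be parsed before any date arithmetic or validation can happen, which is exactly the conversion work described above. The purchase timestamps are made-up sample values, and `parse_purchase_time` is a hypothetical helper.

```python
from datetime import datetime


def parse_purchase_time(text: str) -> datetime:
    """Convert a stored string into a real timestamp (raises ValueError if invalid)."""
    return datetime.fromisoformat(text)


def is_valid_timestamp(text: str) -> bool:
    """A typed column rejects bad values on write; with strings you must check yourself."""
    try:
        parse_purchase_time(text)
        return True
    except ValueError:
        return False


def hours_between(earlier: datetime, later: datetime) -> float:
    """Date arithmetic comes for free once the values are typed."""
    return (later - earlier).total_seconds() / 3600


if __name__ == "__main__":
    first = parse_purchase_time("2017-08-30 14:05:00")
    second = parse_purchase_time("2017-08-31 09:35:00")
    print(hours_between(first, second))
    print(is_valid_timestamp("2017-02-30 10:00:00"))  # February 30th does not exist
```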
OrientDB Intro and HTTP REST API Tutorial
Data access is a foundational consideration of any software application.
Should your data be stored locally or in the cloud? Is it currently organized and logical, or is it a mess of database tables that require several requests to construct a meaningful interface?
Learning MySQL 5.7: Q+A
In this post, I’ll answer questions I received in one of my recent webinars, Learning MySQL 5.7!
First, thank you to all who attended the webinar. The link to the slides and the webinar recording can be found here.
5 Sharding Data Models and Which Is Right
When it comes to scaling your database, there are challenges — but the good news is that you have options. The easiest option, of course, is to scale up your hardware. And when you hit the ceiling on scaling up, you have a few more choices: sharding, deleting swaths of data that you think you might not need in the future, or trying to shrink the problem with microservices.
Deleting portions of your data is simple, if you can afford to do it. When it comes to sharding, there are a number of approaches, and the right one depends on several factors. Here, we'll survey five sharding approaches and dig into the factors that point you to each one.
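The excerpt does not list the five approaches, so as an illustration of the basic mechanics only, here is a sketch of hash-based sharding, one widely used technique (whether it matches one of the post's five is an assumption). A stable hash of the shard key picks the shard, so every application server routes the same key to the same place. The key names are hypothetical.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard number with a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to get the same answer everywhere.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(keys, num_shards: int) -> dict:
    """Group keys by the shard that would store them."""
    placement = defaultdict(list)
    for key in keys:
        placement[shard_for(key, num_shards)].append(key)
    return dict(placement)


if __name__ == "__main__":
    tenants = [f"tenant-{i}" for i in range(10)]
    for shard, members in sorted(route(tenants, 4).items()):
        print(shard, members)
```

One design consequence worth noting: with plain modulo, changing `num_shards` remaps most keys, which is why schemes such as consistent hashing or a lookup directory are often preferred when shards are added over time.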
Tuesday, August 29, 2017
Couchbase and Azure: Getting Started for Free
Azure is where Microsoft is spending a lot of its efforts lately. Microsoft is dedicated to making Azure a success. As someone who started working with Azure a little in the early days, I can say that it’s come a long way, and offers a remarkable set of services at good prices.
But not everyone is on board with Azure or even with cloud computing yet. If you haven’t yet dipped your toe into the Azure pool, but you are curious, this blog post is for you.
AWS Knowledge Center Videos: How do I access member accounts created using AWS Organizations?
Why You Should Never Put Objects Into the SYSTEM or SYSAUX Tablespace
I still see some people putting objects, i.e., tables, in the SYSTEM or SYSAUX tablespace. Sometimes, it’s done deliberately; but sometimes, it happens automatically by creating a table in the SYS schema. Well, let me tell you: this is a really bad idea. You should never, ever, ever put any kind of user object into those tablespaces. Even the Oracle Database documentation warns against doing so:
7.3.1 SYS and SYSTEM Users
The SYS and SYSTEM administrative user accounts are automatically created when you install Oracle Database. They are both created with the password that you supplied upon installation, and they are both automatically granted the DBA role.
Quicker Insight Into Apache Solr and Collection Health
Successful cluster administration can be very difficult without a real-time view of the state of the cluster. Solr itself does not provide aggregated views about its state or any historical usage data, which is necessary to understand how the service is used and how it is performing. Knowing the throughput and capacities not only helps detect errors and troubleshoot issues but is also useful for capacity planning.
Questions may arise, such as:
Blockchain Meets Database: Replace or Combine?
When adopting blockchain into your organization, it's not necessary to replace your existing databases and associated processes. Instead, you can integrate and improve.
Although blockchain technology offers promising advances to DevOps and the digitization of disintermediation and consensus-based dissemination of information, the proponents of pilot systems all too often suggest that existing processes and legacy databases should be discarded in favor of the new.
Monday, August 28, 2017
How can I set up HTTP to HTTPS redirections on ELB using Apache backend servers?
Work Order Management With Neo4j
I look terrible in a bikini (take my word for it), but I'd love me a Lamborghini. However, in order to afford nice things, we need to do as the song says and get to work... and we need to manage and prioritize that work somehow. Today, I'm going to show you how to build part of a work order management system with Neo4j.
I'm going to build an evented work order model. Let's say our order gets created, then based on what it is, pieces of work need to happen. This work is performed by some provider (whether internal or external) and that work can be broken down into tasks that have dependencies on events that have occurred. How would this look in the graph? Glad you asked:
CURL Comes to N1QL: Querying External JSON Data
N1QL has many functions that allow you to perform specific operations. One such function, added in the new Couchbase 5.0 Developer Preview (DP), is CURL.
CURL allows you to use N1QL to interact with external JSON endpoints, namely REST APIs that return results and data in JSON format. Interaction primarily consists of data transfer to and from a server using the HTTP and HTTPS protocols. In short, the CURL function in N1QL gives you a subset of standard curl functionality (https://curl.haxx.se/docs/manpage.html) within a query language.
Pronto Move Shard: A Flash Game Created With a Database
In July, Adobe announced that it plans to end-of-life Flash around 2020. Given the progress of HTML5 and Flash’s long history of critical security vulnerabilities, this is, technologically speaking, certainly the right decision. However, it also made me a bit sad.
Flash was the first technology that brought interactivity to the web. We tend to forget how static the web was in the early 2000s. Flash brought life to the web — and there were plenty of stupid trash games and animations that I really enjoyed at the time. As a homage to the age of trashy Flash games, I created a game that resembles the games of this era:
How to Implement a GraphQL API in Rails
GraphQL came out of Facebook a number of years ago as a way to solve a few different issues that typical RESTful APIs are prone to. One of those was the issue of under- or over-fetching data.
Under-fetching is when the client has to make multiple round trips to the server just to satisfy the data needs they have. For example, the first request is to get a book, and a follow-up request is to get the reviews for that book. Two round trips are costly, especially when dealing with mobile devices on suspect networks.
Sunday, August 27, 2017
Building a Boolean Logic Rules Engine in Neo4j
A boolean logic rules engine was the first project I did for Neo4j before I joined the company some five years ago. I was working for a start-up at the time but took a week off to play consultant. I had never built a rules engine before, but as far as I know, ignorance has never stopped anyone from trying. Neo4j shipped me to the client site and put me in a room with a projector and a whiteboard where I live coded with an audience of developers staring at me, analyzing every keystroke and cringing at every typo and failed unit test. I forgot what sleep was, but I managed to figure it out... and I lost all sense of fear after that experience.
The data model chained together fact nodes with crisscrossing relationships, each chain containing the same path ID property that we followed until reaching an end node which triggered a rule. There were a few complications along the way and more complexity near the end for ordering and partial matches. The traversal ended up being some 40 lines of the craziest Gremlin code I ever wrote, but it worked. After the proof of concept, the project was rewritten using the Neo4j Java API because at the time only a handful of people could look at a 40-line Gremlin script and not shudder in horror. I think we're up to two handfuls now.
jOOQ 3.10 Supports JPA AttributeConverter
One of the cooler hidden features in jOOQ is JPADatabase, which allows for reverse engineering a pre-existing set of JPA-annotated entities to generate jOOQ code.
For instance, you could write these entities here:
Saturday, August 26, 2017
Throttling Database Using Rate Limits for SQL or REST
When you are planning to expose your database to new users or tenants, one of the important areas to consider is resource governance. When in production, there's always a high probability that you will see complex live queries for data visualization or MapReduce jobs impacting your analytical database, which can impact other users. Then, you start to scale, as with any web application, by running a load balancer in front of your servers to distribute requests efficiently. But often, in a production environment, you come across a bad user that affects your quality of service (QoS). To give you an idea of how a bad user can affect your service, here are a few abusive scenarios:
- A naïve developer who keeps hogging all the resources due to an inefficiently written client request.
- A low-priority user who keeps hogging the resources, causing service outages for high-priority users.
- A malicious user who keeps attacking your API endpoints to cause a DDoS for all other users.

It is not pragmatic to scale your system to accommodate genuine requests whenever QoS drops because of such abusive behavior. Rate limiting is one technique for dealing with this. Essentially, a rate limit defines the number of requests, or the amount of data, that a client can request within an interval of time. It is an effective way to mitigate the abusive scenarios described above, and you'll find rate limits on almost all the SQL and REST APIs you would want to interact with.
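The excerpt doesn't say which rate-limiting algorithm the post uses. A token bucket is one widely used option, and the sketch below assumes it, keeping one bucket per client so a single noisy or malicious user exhausts only their own allowance. The `cost` parameter lets a larger request consume more than one token, which roughly approximates limiting the amount of data as well as the number of requests. The clock is injectable so the behavior can be tested without sleeping; all class names are hypothetical.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """One bucket per client, so one abusive client cannot drain everyone's quota."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str, cost: float = 1.0) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self._buckets[client_id] = bucket
        return bucket.allow(cost)
```

This state lives in a single process; behind a load balancer, the buckets would need to live in a shared store for the limit to hold across servers.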
Migrating Data From an Encrypted Amazon MySQL RDS Instance to an Encrypted Amazon Aurora Instance
In this blog post, we'll discuss migrating data from encrypted Amazon MySQL RDS to encrypted Amazon Aurora.
One of my customers wanted to migrate from an encrypted MySQL RDS instance to an encrypted Aurora instance. They have a pretty large database; therefore, using mysqldump or a similar tool was not suitable for them. They also wanted to set up replication between the old MySQL RDS and new Aurora instances.
Friday, August 25, 2017
Aggregate Grouping With N1QL or With MapReduce
"Aggregate grouping" is what I’m putting in the title of this blog post, but I don’t know if that’s the best name. Have you ever used MySQL’s GROUP_CONCAT function or the FOR XML PATH('') workaround in SQL Server? That’s basically what I’m writing about today. With Couchbase Server, the easiest way to do it is with N1QL’s ARRAY_AGG function, but you can also do it with an old school MapReduce View.
I’m writing this post because one of our solution engineers was working on this problem for a customer (who will go unnamed). Neither of us could find a blog post like this with the answer, so after we worked together to come up with a solution, I decided I would blog about it for my future self (which is pretty much the main reason I blog anything, really; the other reason is to find out if anyone else knows a better way).
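In N1QL the post reaches for `ARRAY_AGG` with `GROUP BY`; as a language-neutral illustration of what that grouping produces, here is a Python sketch split into a map step (emit a key and a value per document, as a MapReduce view would) and a collect step. The `customer` and `product` field names are hypothetical sample data, not the post's schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_doc(doc: dict) -> Iterator[Tuple[str, str]]:
    """Map step: emit (group key, value) pairs, like emit() in a view's map function."""
    yield doc["customer"], doc["product"]


def array_agg(docs: Iterable[dict]) -> Dict[str, List[str]]:
    """Collect every value for a key into one array (ARRAY_AGG-style)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            groups[key].append(value)
    return dict(groups)


def group_concat(docs: Iterable[dict], sep: str = ",") -> Dict[str, str]:
    """Same grouping, flattened into a delimited string (GROUP_CONCAT-style)."""
    return {key: sep.join(values) for key, values in array_agg(docs).items()}
```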
The MySQL High Availability Landscape in 2017
In the previous post of this series, we looked at the MySQL high availability (HA) solutions that have been around for a long time. I called these solutions "the elders." Some of these solutions (like replication) are heavily used today and have been improved from release to release of MySQL.
This post focuses on the MySQL high availability solutions that have appeared over the last five years and gained a fair amount of traction in the community. I chose to include only two solutions in this group: Galera and RDS Aurora. I'll use the term "Galera" generically; it covers Galera Cluster, MariaDB Cluster, and Percona XtraDB Cluster. I debated for some time whether or not to include Aurora. I don't like the fact that it uses closed-source code. Given the tight integration with the AWS environment, what is the commercial risk of opening the source code? That question eludes me, but I am not on the business side of technology.
Functional RESTful Backends for DataTables
Functional programming presents itself as a great building block for composing RESTful backends that handle errors and behaviors elegantly.
On my previous project, we wanted to give our development teams a way to get started with functional programming and compose reusable backends for UIs that use the DataTables plugin for jQuery. DataTables, in general, are an easy way to aggregate multiple functionalities into a single UI: using one component, we can have search and pagination together. Once we have all these functionalities together, we need a good programming paradigm for composing controllers and services as well as separating concerns.
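The post's own backend code isn't in this excerpt, so here is a minimal Python sketch of the composition idea: search, ordering, and paging as small functions chained together. The request and response field names (`draw`, `start`, `length`, `search[value]`, `recordsTotal`, `recordsFiltered`, `data`) follow DataTables' server-side processing protocol; everything else, including passing the sort column as a plain `sort_column` argument instead of decoding DataTables' `order` parameters, is a hypothetical simplification.

```python
from functools import reduce
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def compose(*steps: Step) -> Step:
    """Chain small row transformations left to right."""
    return lambda rows: reduce(lambda acc, step: step(acc), steps, rows)


def search(term: str) -> Step:
    """Global search: keep rows where any column contains the term (case-insensitive)."""
    needle = term.lower()

    def step(rows: List[Row]) -> List[Row]:
        if not needle:
            return rows
        return [r for r in rows if any(needle in str(v).lower() for v in r.values())]
    return step


def order_by(column: str, descending: bool = False) -> Step:
    """Sort by one column, comparing values as strings for simplicity."""
    return lambda rows: sorted(rows, key=lambda r: str(r[column]), reverse=descending)


def paginate(start: int, length: int) -> Step:
    # DataTables sends length = -1 when the user chooses to show all rows.
    return lambda rows: rows if length < 0 else rows[start:start + length]


def handle(params: Dict[str, str], rows: List[Row], sort_column: str) -> dict:
    """Build a server-side processing response from DataTables request parameters."""
    filtered = compose(search(params.get("search[value]", "")),
                       order_by(sort_column))(rows)
    page = paginate(int(params.get("start", 0)), int(params.get("length", 10)))(filtered)
    return {
        "draw": int(params.get("draw", 0)),
        "recordsTotal": len(rows),
        "recordsFiltered": len(filtered),
        "data": page,
    }
```

Paging is applied after the composed search and sort steps because `recordsFiltered` must count every match, not just the rows on the current page.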
How to Create an Oracle Database Docker Image
Oracle has released Docker build files for the Oracle Database on GitHub. With those build files, you can go ahead and build your own Docker image for the Oracle Database. If you don’t know what Docker is, you should go and check it out. It’s a cool technology based on the Linux containers technology that allows you to containerize your application — whatever that application may be. Naturally, it didn’t take long for people to start looking at containerizing databases, as well, which makes a lot of sense — especially for, but not only, development and test environments. Here is a detailed blog post on how to containerize your Oracle Database by using those build files that Oracle has provided.
You will need:
Thursday, August 24, 2017
Reusing Open Connections When Testing Your Database
When testing how well your database queries are optimized, opening up too many connections to the database might create overhead and cause performance degradation. To be able to isolate database query testing, Apache JMeter™ provides flexibility, allowing you to choose if you want to run many queries using one connection or to establish many connections but to run queries less extensively.
In this blog post, we will show you how to run MySQL database queries both with one connection and with multiple connections. This is done through JMeter's JDBC elements and Thread Groups. As soon as you get the idea of how it works, you will be able to apply a more accurate load to your database, simulate all possible test scenarios, and make your database application layer rock solid!
Finding All Palindromes Contained in Strings With SQL
SQL is a really cool language. I can write really complex business logic with this logic programming language. I was again thrilled about SQL recently, at a customer site:
But whenever I tweet something like the above, the inevitable happens: I get nerd sniped. Oleg Šelajev from ZeroTurnaround challenged me to prove why SQL is so awesome:
Stubbing Key-Value Stores
Every project that has a database has a dilemma when it comes to how to test database-dependent code. There are several options (not mutually exclusive):
- Use mocks. Only use unit tests and mock the data-access layer, assuming the DAO-to-database communication works.
- Use an embedded database that each test starts and shuts down. This can also be viewed as unit testing.
- Use a real database deployed somewhere (either locally or in a test environment). The hard part is making sure it's always in a clean state.
- Use end-to-end/functional/BDD/UI tests after deploying the application on a test server (which has a proper database).

None of the above is without problems. Unit tests with mocked DAOs can't really test more complex interactions that rely on database state. Embedded databases are not always available (for example, if you are using a non-relational database or rely on RDBMS-specific functionality, HSQLDB won't do), or they can be slow to start, which means your tests may take too long to run. A real database installation complicates setup, and keeping it clean is not always easy. The coverage of end-to-end tests can't be easily measured, they don't necessarily cover all the edge cases, and they are harder to maintain than unit and integration tests.
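The excerpt stops before the post's own solution. One approach in the spirit of the title is a stub: an in-memory class that implements the same small interface as the real key-value client, so tests exercise real read/write behavior while every test starts from a clean, empty store. The sketch below assumes such an interface; `KeyValueStore`, `SessionRepository`, and the `session:` key prefix are all hypothetical.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The narrow interface the application code depends on (hypothetical)."""
    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...
    def delete(self, key: str) -> None: ...


class InMemoryKeyValueStore:
    """Stub: real behavior, dict storage, always starts empty."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class SessionRepository:
    """Example code under test; key prefix and encoding are made up."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store

    def save(self, session_id: str, user: str) -> None:
        self._store.put("session:" + session_id, user.encode("utf-8"))

    def user_for(self, session_id: str) -> Optional[str]:
        raw = self._store.get("session:" + session_id)
        return raw.decode("utf-8") if raw is not None else None

    def logout(self, session_id: str) -> None:
        self._store.delete("session:" + session_id)
```

Because each test can construct a fresh `InMemoryKeyValueStore`, the clean-state problem of a shared real database disappears, at the cost of not testing the real client's network and serialization behavior.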
Runtime Metrics in Execution Plans
Capturing query execution metrics is much easier now that you can see the runtime metrics in execution plans when you’re using SQL Server 2016 SP1 or better in combination with SQL Server Management Studio 2017. When you capture an actual plan using any method, you get the query execution time on the server as well as wait statistics and I/O for the query. This fundamentally changes how we can go about query tuning.
Runtime Metrics
To see these runtime metrics in action, let’s start with a query:
Wednesday, August 23, 2017
From Excel Hell to Cloud Database Heaven
Most well-known database technologies have some or all of the following features:
- Data quality and consistency:
  - A data schema with a detailed description of all data resources and properties.
  - Automatic data validation according to the data schema.
  - Row/document locking to prevent data collisions.
- Access control:
  - Define access roles to allow/prevent read, write, or delete on resources.
  - Allow users to have private data views of shared resources.
- Data relations
- Query language
- API:
  - A REST API for platform-agnostic data access and integration.
  - A platform-specific SDK.

The rest of this blog post is a step-by-step tutorial on how you can migrate from spreadsheets to a fast and consistent NoSQL cloud database using RestDB.io.
Bringing DevOps Practices to Database Administrators [Audio]
I spoke with Robert Reeves, CTO and co-founder of Datical, a company that aims to remove the pain that database administrators (DBAs) experience on a regular basis when attempting to make changes to database-heavy applications.
They do this by bringing easy-to-use tools inspired by practices from the DevOps world to the traditionally messy world of enterprise databases with change simulation, rollbacks, rules engines, packaging databases as code, and strong monitoring.
Brad Anderson's Lunch Break / s6 e8 / Mary Cecola, CIO, Antares Capital LP (Part 2)
What’s Next After Dynamo and Cassandra?
Avinash Lakshman, CEO of Hedvig and developer of Dynamo and Cassandra, shares his thoughts on the current and future state of databases.
How are you and Hedvig involved in databases?
How to Use SQL Syntax to Access MongoDB
tangyuan-mongo is the Mongo service component in the tangyuan framework. The tangyuan-mongo component encapsulates a series of Mongo operations into Tangyuan's services and provides a unified way to access them. It also provides access to Mongo in SQL syntax.
The source code can be found here, and the official website is here.
Tuesday, August 22, 2017
SQL Solution to Elasticsearch
Developers today have a lot of storage options for building strategic new apps, including JSON databases like Elasticsearch. These new technologies allow development teams to more easily and efficiently iterate on features.
Using agile methodologies, teams work in sprints that last just a few weeks, getting new features to market fast. Compared to relational databases, Elasticsearch is far less demanding in terms of modeling and structuring data, and this is a big advantage in terms of development speed.
Diving into Couchbase Index Replicas
With Couchbase Server 4.x, customers used to create equivalent indexes to satisfy the twin requirements of keeping the indexes highly available and load balancing N1QL queries. This meant that the exact same index definition was used to create indexes with different names. What’s in a name, you might ask… a rose is a rose is a rose :)
create index index1 on bucket(field1);
Financial Services and Neo4j: 360-Degree View of Customer Experience
Customer expectations are rising at a time when customer service is a significant differentiator within the financial services industry.
Customers expect companies to deliver personalized service (i.e., an end-to-end customer experience) that reflects an understanding of who they are, their communication preferences, the products and services they’ve purchased in the past, and what they might be interested in in the future.
Meet the New DBA, Different From the Old
There’s a rapid shift taking place in today's technology organizations. The role of the DBA is being redefined and increasingly replaced by other roles and specialties. This is happening even as data explodes — in fact, it's happening precisely because data is exploding. It's a trend that is accelerating, and it sometimes takes people by surprise.
In general, the DBA role is shifting from the lower layers of the technology stack up into the higher layers, where its concerns overlap more and more with technical operations and even development. Let’s consider why this is happening and what it means for the future of data management and data operations.
Monday, August 21, 2017
How do I increase bandwidth for active traffic on AWS Direct Connect using a link aggregation group?
An Introduction to TensorFlow
In this post, we are going to see some TensorFlow examples, define tensors, perform math operations using tensors, and see other machine learning examples.
What Is TensorFlow?
TensorFlow is a library developed by Google for solving complicated mathematical problems that would otherwise take a lot of time.
Async and Await: An Explanation
For a while now, the async and await keywords in C# have confused me.
As with most things, the best way to learn something is to use it in a real-world example. I am currently adding an email alert feature to a website. This is an ideal example of something that would benefit from asynchronous programming. There is no need for the webpage to wait while thousands of emails are sent; let's just send a call to get started and allow the browser to carry on as normal.
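The post works in C#, but the same async/await shape exists in Python's asyncio, so here is a rough analogue of the email scenario. `send_email` is a hypothetical placeholder, with `asyncio.sleep` standing in for the network round trip. The request handler schedules the sends and returns immediately; the caller keeps a reference to the task and awaits it later so failures are not lost. A real web application would typically hand long-running background work to a queue managed by the host, which this sketch does not cover.

```python
import asyncio
from typing import List, Tuple


async def send_email(address: str) -> str:
    """Hypothetical sender; asyncio.sleep stands in for the SMTP/network round trip."""
    await asyncio.sleep(0.01)
    return address


async def send_alerts(addresses: List[str]) -> List[str]:
    # Start every send at once and wait for all of them; results keep input order.
    return list(await asyncio.gather(*(send_email(a) for a in addresses)))


async def handle_request(addresses: List[str]) -> Tuple[str, "asyncio.Task[List[str]]"]:
    # Schedule the work and return immediately instead of waiting for every email.
    task = asyncio.create_task(send_alerts(addresses))
    return "accepted", task


async def main() -> None:
    status, task = await handle_request([f"user{i}@example.com" for i in range(3)])
    print(status)        # the caller gets an answer straight away
    print(await task)    # keep a reference and await it so errors are not lost


if __name__ == "__main__":
    asyncio.run(main())
```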
Sunday, August 20, 2017
How the ClustrixDB Query Evaluation Model Works
Recently, we’ve started to dig into the internals of ClustrixDB, specifically how ClustrixDB accomplishes horizontal scaling of both writes and reads without sharding. Next, we dug into the details of the multi-patented ClustrixDB Rebalancer. This time, we will discuss the ClustrixDB Query Evaluation Model.
ClustrixDB is a MySQL-compatible distributed RDBMS that provides linear scale out of both writes and reads, while maintaining relational semantics, including ACID transactionality and referential integrity. Typically, MySQL workloads are only able to scale out both writes and reads if sharding is used. Sharding is the strategy of partitioning your MySQL application workload across multiple separate MySQL database servers, allowing queries and data CRUD operations to fan-out. This means multiple separate MySQL physical servers must be deployed, the workload data needs to be partitioned across them, and the application needs to be rewritten to manage any ACID transactionality needed between those servers. ClustrixDB is able to provide a similar linear scale out of sharding, but the data distribution is automatically handled via the multi-patented ClustrixDB Rebalancer behind the scenes. The application doesn’t require rewrites and sees only a single logical RDBMS while all cross-node ACID transactionality is handled automatically.
Saturday, August 19, 2017
Why Do We Need Blockchain?
Many times in our business conversations, I have come across this question:
“Why do we need to implement this business functionality using Blockchain? Why can’t we implement this using a database and a web application?”

Anyone who's eager to jump onto the blockchain bandwagon has probably asked the same question. In this post, we take a look at an example implemented using traditional systems leveraging a database plus an application and how the same use case implemented with blockchain changes the equation.
Friday, August 18, 2017
Three DBAs Walk Into a NoSQL Bar... [Comic]
The Top Resources for Understanding Graph Theory and Algorithms
Recently, we announced the availability of some super efficient graph algorithms for Neo4j. In case you missed the announcement, we now have an easy-to-use library of graph algorithms that are tuned to make full use of compute resources.
As part of assisting with this ongoing project, I needed to come up to speed as well as compile a list of graph algorithm and graph theory resources. Although this seemed like a short task, my list grew and continues to grow.
Enabling GTIDs for Server Replication in MariaDB Server 10.2
I originally wrote this post in 2014 after the release of MariaDB Server 10.0. Most of what was in that original post still applies, but I've made some tweaks and updates since replication and high availability (HA) remain among the most popular MariaDB/MySQL features.
Replication first appeared on the MySQL scene more than a decade ago, and as replication implementations became more complex over time, some limitations of MySQL’s original replication mechanisms started to surface. To address those limitations, MySQL v5.6 introduced the concept of global transaction identifiers (GTIDs), which enable some advanced replication features. MySQL DBAs were happy with this but complained that in order to implement GTIDs, you needed to stop all the servers in the replication group and restart them with the feature enabled. There are workarounds; for instance, Booking.com documented a procedure to enable GTIDs with little or no downtime, but it involves more complexity than most organizations are willing to allow. (Check out this blog post for more on how Booking.com handles replication and high availability.)
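The excerpt stops before the how-to. As background, MariaDB writes a GTID as domain_id-server_id-sequence_number (for example `0-1-100`), which differs from MySQL's source_uuid:transaction_id form, and position variables such as `gtid_slave_pos` can hold one GTID per replication domain, separated by commas. The sketch below parses that format and checks whether a replica has caught up with a primary. The helper names are hypothetical, and comparing sequence numbers only within a domain is a simplification, not a full replication-state check.

```python
from typing import Dict, NamedTuple


class MariaDBGtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> MariaDBGtid:
    """Parse 'domain-server-sequence', e.g. '0-1-100'."""
    domain, server, seq = text.strip().split("-")
    return MariaDBGtid(int(domain), int(server), int(seq))


def parse_position(text: str) -> Dict[int, MariaDBGtid]:
    """Parse a comma-separated position (one GTID per replication domain)."""
    return {g.domain_id: g for g in (parse_gtid(p) for p in text.split(",") if p.strip())}


def is_caught_up(replica_pos: str, primary_pos: str) -> bool:
    """True if, in every domain the primary has written to, the replica has applied
    at least the same sequence number. Sequence numbers are compared per domain only."""
    replica = parse_position(replica_pos)
    for domain, latest in parse_position(primary_pos).items():
        applied = replica.get(domain)
        if applied is None or applied.seq_no < latest.seq_no:
            return False
    return True
```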
Thursday, August 17, 2017
This Week in Neo4j: Fake News, Threat Hunting, and Triplets
Welcome to this week in Neo4j where we round up what’s been happening in the world of graph databases in the last 7 days.
Featured Community Member: Eve Freeman
This week’s featured community member is Eve Freeman, Applications Development Analyst IV at Fannie Mae.
Technology and Friends Episode 496: Oren Eini on RavenDB [Video]
Last week, at That Conference (which was great), I had the chance to do an interview with David Giard.
You can go to the interview directly, or watch it here:
Sensitive Data Masking With MariaDB MaxScale
Protecting personal and sensitive data and complying with security and privacy regulations is a high priority for organizations. This includes personally identifiable information (PII), protected health information (PHI), payment card information (subject to PCI-DSS regulation), and intellectual property (subject to ITAR and EAR regulations). In many cases, if not most, it needs to be redacted or masked when accessed (internally and/or externally).
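To illustrate the basic idea (masking values in query results before they reach the client, rather than changing the stored data), here is a small Python sketch. It is not MaxScale's configuration or rule format; the column names, the masking rules, and the "show the last four characters" policy are hypothetical examples.

```python
from typing import Callable, Dict, Optional

Rule = Callable[[str], str]


def mask_full(value: str, fill: str = "X") -> str:
    """Hide every character but keep the length."""
    return fill * len(value)


def mask_keep_last(value: str, visible: int = 4, fill: str = "X") -> str:
    """Partial masking: show only the last `visible` characters."""
    if len(value) <= visible:
        return mask_full(value, fill)  # too short to reveal anything safely
    return fill * (len(value) - visible) + value[-visible:]


def mask_row(row: Dict[str, Optional[str]], rules: Dict[str, Rule]) -> Dict[str, Optional[str]]:
    """Apply a masking rule to each configured column; the input row is not modified."""
    return {
        column: rules[column](value) if column in rules and value is not None else value
        for column, value in row.items()
    }
```

Partial masking like `mask_keep_last` keeps a value recognizable to support staff while hiding most of it, which is the usability trade-off the text describes.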
Data redaction obfuscates all or part of the data, reducing unnecessary exposure of sensitive data while at the same time maintaining its usability. Various terms such as data masking, data obfuscation, and data anonymization are used to describe this functionality in databases. Data redaction allows an organization to:
Wednesday, August 16, 2017
Extending the Power of MariaDB ColumnStore With User-Defined Functions
MariaDB ColumnStore 1.0 supports User-Defined Functions (UDF) for query extensibility. This allows you to create custom filters and transformations to suit any need. This post outlines adding support for distributed JSON query filtering.
An important MariaDB ColumnStore concept to grasp is that there are distributed and non-distributed functions. Distributed functions are executed on the PM nodes, supporting scale-out of query execution. Non-distributed functions are MariaDB Server functions that are executed within the UM node. As a result, MariaDB ColumnStore requires two distinct implementations of any function.
"How can I import my virtual machine into an Amazon Machine Image by using the AWS CLI?"
Bad Parameter Sniffing Decision Flow Chart [Infographic]
Lots of people are confused about how to deal with bad parameter sniffing when it occurs. To help with this, I’m going to try to make a decision flow chart that walks you through the process. This is a rough (quite rough) first draft.
I would love to hear any input. For this draft, I won’t address the things I think I’ve left out. I want to see what you think of the decision flow and what you think might need to be included.
To DBaaS or Not to DBaaS?
According to a new forecast from the International Data Corporation (IDC), total spending on IT infrastructure products (server, enterprise storage, and Ethernet switches) for deployment in cloud environments will increase 15.3% year-over-year in 2017 to $41.7 billion.
Gartner Inc., a leading research and advisory company, predicts that the public cloud services market will grow 18% in 2017, from $209.2 billion in 2016 to $246.8 billion. Within the cloud market, Infrastructure as a Service (IaaS) is predicted to have the highest growth rate, 36.8% in 2017, reaching a total of $34.6 billion. Cloud application services (Software as a Service, or SaaS) are predicted to grow 20.1% to reach $46.3 billion.
Building a Full-Text Search Test Framework
This article will give you a quick glimpse into a test framework built to validate Couchbase’s new full-text search feature. The idea described here can be extended to test any text search engine in general.
Couchbase Full-Text Search
Searching unstructured, schema-less JSON documents in Couchbase is now easy thanks to its full-text search capability. Couchbase users can now search for phrases, words, and date or numeric ranges inside JSON documents. These searches are essentially "queries" on full-text indexes. Couchbase full-text search is RESTful and distributed, and it is driven by Bleve, an indexing and search library written in Go. For more about full-text search, refer to the recommended reading section.
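To make the idea of "queries on full-text indexes" concrete, here is a minimal Python sketch of a toy inverted index (token to document IDs) over flat JSON-like documents, together with the kind of check a search test framework might run: comparing index results against a brute-force scan used as the reference answer. This is an illustrative assumption, not Couchbase's or Bleve's implementation and not necessarily the framework the article describes; the sample documents and function names are hypothetical, and phrase and date/numeric range queries are not covered.

```python
import re
from collections import defaultdict

# Hypothetical sample documents (no real Couchbase bucket is involved).
DOCS = {
    "doc1": {"title": "Couchbase full text search", "body": "Search JSON documents"},
    "doc2": {"title": "Bleve indexing library", "body": "Written in Go"},
    "doc3": {"title": "JSON search tips", "body": "Phrases and words"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def doc_text(doc):
    """Concatenate the string fields of a flat document (nested fields are ignored)."""
    return " ".join(v for v in doc.values() if isinstance(v, str))


def build_index(docs):
    """Toy inverted index: map each token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in tokenize(doc_text(doc)):
            index[token].add(doc_id)
    return index


def index_search(index, word):
    """Answer a single-word query from the inverted index."""
    return set(index.get(word.lower(), set()))


def brute_force_search(docs, word):
    """Reference answer for testing: scan every document's tokens directly."""
    return {
        doc_id
        for doc_id, doc in docs.items()
        if word.lower() in tokenize(doc_text(doc))
    }


if __name__ == "__main__":
    index = build_index(DOCS)
    for query in ["search", "json", "go", "missing"]:
        expected = brute_force_search(DOCS, query)
        actual = index_search(index, query)
        assert actual == expected, (query, actual, expected)
        print(query, sorted(actual))
```

The brute-force scan is slow but obviously correct, which is what makes it useful as a test oracle for a faster index.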
Tuesday, August 15, 2017
Neo4j and Cypher: Rounding Floating Point Numbers/BigDecimals
I was doing some data cleaning a few days ago and wanted to multiply a value by one million. My Cypher code to do so looked like this:
with "8.37" as rawNumeric
RETURN toFloat(rawNumeric) * 1000000 AS numeric

╒═════════════════╕
│"numeric"        │
╞═════════════════╡
│8369999.999999999│
└─────────────────┘

Unfortunately, that suffers from the classic rounding error when working with floating point numbers. I couldn't figure out a way to solve it using pure Cypher, but there tends to be an APOC function to solve every problem... and this was no exception.
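The excerpt doesn't name the APOC function used, so the following is only a Python analogue of the same problem, not the Neo4j fix. Multiplying the float parsed from "8.37" by one million reproduces the 8369999.999999999 result, while parsing the string as a `decimal.Decimal` (similar in spirit to Java's BigDecimal) or rounding the float afterward gives 8370000.

```python
from decimal import Decimal

raw_numeric = "8.37"

# Binary floating point cannot represent 8.37 exactly, so the product drifts.
as_float = float(raw_numeric) * 1000000
print(as_float)  # 8369999.999999999

# Option 1: parse the string as an exact decimal value (BigDecimal-style).
as_decimal = Decimal(raw_numeric) * 1000000
print(as_decimal)  # 8370000.00

# Option 2: keep the float but round the result to the precision you need.
rounded = round(as_float, 2)
print(rounded)  # 8370000.0
```

Option 1 avoids the error entirely; option 2 only hides it at the chosen precision, which is usually fine for display but not for exact arithmetic.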
MongoDB: Evaluate Query Performance Using Indexes
This post shows commands you can use to manage MongoDB indexes on a particular collection, as well as tips on how to evaluate query performance with and without indexes.
Why Create Indexes?
Indexes can significantly improve read query performance for MongoDB collections. Without indexes, when searching for documents that match filter criteria, MongoDB performs a collection scan: it scans every document in the collection and returns those that match the filter. This is not an efficient way to search. If one or more fields are frequently used to filter documents, it is recommended to create indexes on those fields. When a suitable index exists, MongoDB can use it to limit the number of documents it scans, and scanning fewer documents means faster query execution.
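A toy model of the difference, sketched in Python under stated assumptions: the collection is a hypothetical in-memory list, and the "index" is a plain dictionary from field value to document positions rather than the B-tree MongoDB actually uses. The sketch only counts how many documents each approach examines. In MongoDB itself, you would compare the COLLSCAN and IXSCAN plan stages and the totalDocsExamined figure reported by explain("executionStats").

```python
from collections import defaultdict

# Hypothetical collection: 1,000 user documents spread evenly over four cities.
users = [
    {"_id": i, "city": city}
    for i, city in enumerate(["Paris", "Oslo", "Lima", "Rome"] * 250)
]


def collection_scan(docs, field, value):
    """Examine every document, as a collection scan does."""
    matches = [doc for doc in docs if doc.get(field) == value]
    return matches, len(docs)  # every document was examined


def build_index(docs, field):
    """Toy single-field index: field value -> positions of matching documents."""
    index = defaultdict(list)
    for pos, doc in enumerate(docs):
        index[doc.get(field)].append(pos)
    return index


def index_lookup(docs, index, value):
    """Fetch only the documents the index points to."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)


if __name__ == "__main__":
    scan_hits, scan_examined = collection_scan(users, "city", "Oslo")
    city_index = build_index(users, "city")
    idx_hits, idx_examined = index_lookup(users, city_index, "Oslo")
    assert scan_hits == idx_hits  # same answer either way
    print(f"collection scan examined {scan_examined} documents")  # 1000
    print(f"index lookup examined {idx_examined} documents")  # 250
```

Both paths return the same documents; the index only changes how much work is done to find them, which is the trade-off the paragraph above describes.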
Maintaining Transaction Boundary Integrity in a Distributed Cluster
We pretty much treat RavenDB’s transactional nature as a baseline — same as the safe assumption that any employee we hire will have a pulse. (Sorry, we discriminate against Zombies and Vampires because they create a hostile work environment. See here for details.)
OK, now back to transactions, and why I'm bringing up such a basic requirement. Consider the case where you need to pay someone. That operation is composed of two distinct operations: first, the bank debits your account, and then it credits the other account. You generally want these to happen as a transactional unit: either both of them happen or neither of them does. In practice, that isn't how banks work at all, but it is the simplest way to explain transactions, so we'll go with that.
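To illustrate the debit-and-credit unit described above, here is a minimal sketch using Python's built-in sqlite3 module rather than RavenDB (a deliberate swap, chosen because it runs anywhere without a server). The accounts table, account names, and the `transfer()` helper are hypothetical. If either update fails, the whole transaction rolls back, so no account is debited without a matching credit.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one unit: both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            debit = conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if debit.rowcount != 1:
                raise ValueError(f"unknown account: {src}")
            credit = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if credit.rowcount != 1:
                raise ValueError(f"unknown account: {dst}")
        return True
    except (sqlite3.IntegrityError, ValueError):
        # IntegrityError covers an overdraft (CHECK constraint violation).
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

For example, `transfer(conn, "alice", "carol", 10)` debits alice, fails to find carol, and rolls the debit back, leaving both balances unchanged.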
How to Order Streamed DataFrames
A few days ago, I had to perform an aggregation on a streaming DataFrame. The moment I applied groupBy for the aggregation, the data got shuffled, which raised a new question: how do you maintain order?
Yes, I can use orderBy on a streaming DataFrame in Spark Structured Streaming, but only in complete output mode. There is no way to order streaming data in append mode or update mode.
Monday, August 14, 2017
Finding Triples With Neo4j
A user had an interesting Neo4j question on Stack Overflow the other day:
I have two types of nodes in my graph. One type is Testplan and the other is Tag. Testplans are tagged to Tags. I want most common pairs of Tags that share the same Testplans with a Tag having a specific name. I have been able to achieve the most common Tags sharing the same Testplan with one Tag, but getting confused when trying to do it for pairs of Tags.
Their Cypher query looked like this:
Syncing Databases Properly to Work With eCommerce Business Applications
When it comes to planning database backends for eCommerce sites and applications, one may come across many technical terms like:
Simple MySQL
PostgreSQL-powered
Cloud
Redundant database
Multi-zone
NoSQL backend

These are all standard terms that describe the databases of different eCommerce systems, but what do they mean, and how do they work? What is the purpose of a database? Is it possible to run an eCommerce store without a database?
2 Approaches to Scalable Database Design
Any application used for data analysis depends heavily on its ability to run queries quickly. However, when working with larger or more complex datasets and a growing number of concurrent users, performance depends largely on the underlying analytical database, whether it is built into the application as part of a single-stack tool or implemented as a separate data warehouse layer.
What Makes a Scalable Database?
Database scalability is a concept in database design that emphasizes a database's capability to handle growth in the amount of data and the number of users. In modern applications, two types of workloads have emerged: analytical and transactional. Planning for workload growth must take into account operating system, database design, and hardware design decisions.
Cypher: Write Fast and Furious
Editor’s Note: This presentation was given by Christophe Willemsen at GraphConnect San Francisco in October 2016.
Presentation Summary
In this presentation, Christophe Willemsen covers a variety of do-and-don't tips to help your Cypher queries run faster than ever in Neo4j.
Sunday, August 13, 2017
Saturday, August 12, 2017
Azure Functions With Couchbase Server
Azure Functions are Microsoft’s answer to Amazon’s Lambdas or Google’s Cloud Functions (AKA “serverless” architecture). They give you a way to deploy small pieces of code and let Azure handle the underlying server. I’ve never used them before, so I thought I would give them a try beyond “Hello, World” by getting them to work with Couchbase Server.
Azure Functions offers more options than simple HTTP events (for example, blob triggers, GitHub webhooks, and Azure Storage queue triggers). But for this blog post, I'm going to focus on HTTP events only. I'll create simple GET and SET endpoints that interact with Couchbase Server.
Friday, August 11, 2017
Faster PostgreSQL Counting
Everybody counts — but not always quickly. This article takes a close look into how PostgreSQL optimizes counting. If you know the tricks, there are ways to count rows orders of magnitude faster than you do already.
The problem is actually underspecified: there are several variations of counting, each with its own methods. First, think about whether you need an exact count or whether an estimate suffices. Next, are you counting duplicates or just distinct values? Finally, do you want a lump count of an entire table, or do you want to count only the rows matching extra criteria?
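A small sketch of the three exact-count variations listed above, using Python's sqlite3 module with a hypothetical orders table so it runs without a PostgreSQL server. The SQL shown is standard and works the same way in PostgreSQL, but the faster PostgreSQL-specific techniques the article is about, such as reading the planner's row estimate (reltuples) from pg_class, are not demonstrated here.

```python
import sqlite3

# Hypothetical table of orders; some customers appear more than once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ann", 10.0), ("bob", 25.0), ("ann", 40.0), ("cid", 5.0), ("bob", 60.0)],
)


def scalar(sql, params=()):
    """Run a query that returns a single value."""
    return conn.execute(sql, params).fetchone()[0]


# 1. Lump count of the entire table, duplicates included.
total_rows = scalar("SELECT COUNT(*) FROM orders")

# 2. Count of distinct values in a column.
distinct_customers = scalar("SELECT COUNT(DISTINCT customer) FROM orders")

# 3. Count of only the rows matching extra criteria.
big_orders = scalar("SELECT COUNT(*) FROM orders WHERE total > ?", (30,))

print(total_rows, distinct_customers, big_orders)  # 5 3 2
```

Each variation is a different question, which is why the article treats them separately: a distinct or filtered count can't simply be derived from a whole-table count or estimate.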
Scalable MySQL Cluster With ProxySQL and Orchestrator
MySQL is one of the most popular open-source relational databases, used by many projects around the world, including very large-scale ones like Facebook, Twitter, and YouTube. Such projects need a truly reliable, highly available data storage system to ensure an appropriate level of service quality. The primary way to get the most out of your data storage is to set up database clustering, so that it can process a large number of requests simultaneously and keep working under increased load. However, configuring such a solution from scratch can be a rather complicated task.
To simplify this, the Jelastic team has prepared a one-click installation package: a Scalable MySQL Cluster with out-of-the-box master-slave replication, even request distribution, and node auto-discovery. It instantly deploys a pair of interconnected MySQL containers, which handle asynchronous data replication and are automatically reconfigured when the cluster is scaled (i.e., when the number of nodes changes). In addition, the solution includes a ProxySQL load balancer in front of the database nodes and an embedded Orchestrator for convenient cluster management via a GUI.
Execute an Oracle Stored Procedure With Nested Table as a Parameter
The objective of this tutorial is to demonstrate the steps required to execute an Oracle stored procedure with a nested table as one of the parameters from a Mule flow.
To demonstrate the application, I will be using a simple use case of inserting employee records by calling an Oracle stored procedure with a nested table as one of the parameters. Each employee record has two columns: the employee’s department and the nested table of a data structure with employee name and employee number as attributes.
Fun With SQL: Functions in Postgres