Thursday, August 31, 2017

Setting up a CICD Pipeline for Containers on AWS

Data Videos
#Data -Setting up a CICD Pipeline for Containers on AWS

Video Course: See IoT in Action with Microsoft

Data Videos
#Data -Video Course: See IoT in Action with Microsoft

Q+A: Understanding Optimistic Concurrency in Databases

DZone Database Zone
Q+A: Understanding Optimistic Concurrency in Databases

I have struggled in this area, but have found no consolidated information in a single place. So I've tried to summarize everything here! I've done it in a Q&A format for best understanding.

Getting Started

ACID helps make concepts clear through questions and answers. In this example, we will assume that a lock-based RDBMS is being used. I will only talk about the "read committed" and "read uncommitted" isolation levels because "read committed" is the most commonly preferred isolation level set in the database. I mention threads in the context of Java.
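As a rough illustration of the topic in the title, here is a minimal sketch of one common way to implement optimistic concurrency: a version column that every update checks and increments. The excerpt above does not show the article's own approach, so the `accounts` table, its columns, and the function names are assumptions made up for this example, and SQLite (via Python's built-in sqlite3 module) stands in for whatever RDBMS is actually in use.

```python
import sqlite3

def create_accounts(conn):
    """Create a hypothetical accounts table with a version column."""
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 100, 1)")
    conn.commit()

def update_balance(conn, account_id, new_balance, expected_version):
    """Apply the update only if nobody changed the row since we read it.

    Returns True when the write succeeded, False when the version we read
    is stale and the caller should re-read and retry.
    """
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

Two writers that both read version 1 will not silently overwrite each other: the first update succeeds and bumps the version, and the second finds no matching row and reports a conflict instead.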

Languages and timelines in Power BI and previews

Data Videos
#Data -Languages and timelines in Power BI and previews

How to Implement a SQL Formatting Standard Using SQL Prompt V8

DZone Database Zone
How to Implement a SQL Formatting Standard Using SQL Prompt V8

With ten programmers working on the same project, how do you agree on a standard style for formatting SQL code, and then implement it consistently? One way is through Draconian rules, meeting after meeting, and the occasional sacrifice of a programmer to put the fear in those who remain.

If that doesn’t sound like fun, here’s my suggested alternative:

Unit Testing Multi-Tenant Database Providers

DZone Database Zone
Unit Testing Multi-Tenant Database Providers

My saga on supporting multiple tenants in ASP.NET Core web applications has come to the point where tenants can use separate databases. It's time now to write some tests for data context to make sure it behaves correctly in unexpected situations. This post covers unit tests for data context and multi-tenancy.

My Previous Work on Multi-Tenancy

To get a better understanding of my previous work on multi-tenancy in ASP.NET Core and Entity Framework Core 2.0, please go through the following posts, as the current one builds on these:

SQL Server Profiler: How It Works, Best Practices, and Tutorials

DZone Database Zone
SQL Server Profiler: How It Works, Best Practices, and Tutorials

If you work with SQL, you understand the importance of being able to monitor your queries, not to mention how critical it is to have an accurate analysis of how long queries take. Today, we'll take a look at SQL Server Profiler, a tool for doing just that, how it works, some shortcomings (most notably, deprecation of its features), and alternatives.

What Is SQL Server Profiler?

It's a tool for tracing, recreating, and troubleshooting problems in MS SQL Server, Microsoft's relational database management system (RDBMS). The profiler lets developers and database administrators (DBAs) create and handle traces and replay and analyze trace results. In a nutshell, it's like a dashboard that shows the health of an instance of MS SQL Server.

Wednesday, August 30, 2017

Achieving Business Results with AWS Professional Services

Data Videos
#Data -Achieving Business Results with AWS Professional Services

Database Fundamentals #8: All About Data Types

DZone Database Zone
Database Fundamentals #8: All About Data Types

SQL Server provides all different kinds of data types in order to give you more flexibility and control over how you store your data. You define these data types as you define the columns in your tables. If you wanted to store information about the date and time that a purchase was made online, you’re going to store the values in a column (or columns) that define dates and times in order to ensure accuracy in your data.

Choosing a Data Type

You could make the data type into one that stores just about anything you give it, such as one of the character types like char or varchar. But do this and you’re going to run into issues when you start running queries against that data. For example, your business may ask for the amount of time between purchases. Since you chose to store the date and time as character values, you won’t be able to use some of the functions provided by the date and time data types. Instead, you’ll have to either convert your data into date and time data types or you’ll have to write your own functions. But, if you used the datetime data types, you get all that functionality and more, such as date/time formatting for different countries and validation that the date and time you’re entering are valid dates and times, as well as Coordinated Universal Time (UTC) values and offsets. The same thing is true of all the various data types. You don’t have to use a specific data type, but you’re sacrificing functionality if you don’t try to specify a data type appropriate to the business need.
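To make the trade-off concrete, here is a small sketch in Python (standing in for SQL Server, which the article is actually about). The purchase timestamps and function names are invented for illustration. It shows that values kept as text must be converted before any date arithmetic works, and that the typed representation is what rejects impossible dates.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical purchase times held as plain text, the way a char/varchar column would hold them.
purchases_as_text = ["2017-08-28 09:15:00", "2017-08-30 17:45:00"]

def hours_between(first: str, second: str) -> float:
    """Convert two text timestamps to datetime values and return the gap in hours."""
    start = datetime.strptime(first, TIMESTAMP_FORMAT)
    end = datetime.strptime(second, TIMESTAMP_FORMAT)
    return (end - start).total_seconds() / 3600

def is_valid_timestamp(value: str) -> bool:
    """A text column accepts any string; only parsing reveals an impossible date."""
    try:
        datetime.strptime(value, TIMESTAMP_FORMAT)
        return True
    except ValueError:
        return False
```

With a proper date and time type the conversion step disappears, and a value such as "2017-02-30 10:00:00" would be refused when it is written rather than discovered later at query time.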

Fast Database Cloning in Amazon Aurora

Data Videos
#Data -Fast Database Cloning in Amazon Aurora

OrientDB Intro and HTTP REST API Tutorial

DZone Database Zone
OrientDB Intro and HTTP REST API Tutorial

Data access is a foundational consideration of any software application.

Should your data be stored locally or in the cloud? Is it currently organized and logical, or is it a mess of database tables that require several requests to construct a meaningful interface?

Learning MySQL 5.7: Q+A

DZone Database Zone
Learning MySQL 5.7: Q+A

In this post, I’ll answer questions I received in one of my recent webinars, Learning MySQL 5.7!

First, thank you all who attended the webinar. The link to the slides and the webinar recording can be found here.

5 Sharding Data Models and Which Is Right

DZone Database Zone
5 Sharding Data Models and Which Is Right

When it comes to scaling your database, there are challenges — but the good news is that you have options. The easiest option, of course, is to scale up your hardware. And when you hit the ceiling on scaling up, you have a few more choices: sharding, deleting swaths of data that you think you might not need in the future, or trying to shrink the problem with microservices.

Deleting portions of your data is simple — if you can afford to do it. Regarding sharding, there are a number of approaches, and which one is the right one depends on a number of factors. Here, we'll review a survey of five sharding approaches and dig into what factors guide you to each approach.

Tuesday, August 29, 2017

Video Course: See IoT in Action with Microsoft

Data Videos
#Data -Video Course: See IoT in Action with Microsoft

Couchbase and Azure: Getting Started for Free

DZone Database Zone
Couchbase and Azure: Getting Started for Free

Azure is where Microsoft is spending a lot of its efforts lately. Microsoft is dedicated to making Azure a success. As someone who started working with Azure a little in the early days, I can say that it’s come a long way, and offers a remarkable set of services at good prices.

But not everyone is on board with Azure or even with cloud computing yet. If you haven’t yet dipped your toe into the Azure pool, but you are curious, this blog post is for you.

AWS Knowledge Center Videos: How do I access member accounts created using AWS Organizations?

Data Videos
#Data -AWS Knowledge Center Videos: How do I access member accounts created using AWS Organizations?

Why You Should Never Put Objects Into the SYSTEM or SYSAUX Tablespace

DZone Database Zone
Why You Should Never Put Objects Into the SYSTEM or SYSAUX Tablespace

I still see some people putting objects, i.e. tables, in the SYSTEM or SYSAUX tablespace. Sometimes, it’s done deliberately; but sometimes, it happens automatically by creating a table in the SYS schema. Well, let me tell you: this is a really bad idea. You should never, ever, ever put any kind of user object into those tablespaces. Even the Oracle Database Documentation warns you against doing so:

7.3.1 SYS and SYSTEM Users

The SYS and SYSTEM administrative user accounts are automatically created when you install Oracle Database. They are both created with the password that you supplied upon installation, and they are both automatically granted the DBA role.

Brad Anderson's Lunch Break / s6 e12 / Mark Bowker, Analyst, ESG (Part 2)

Data Videos
#Data -Brad Anderson's Lunch Break / s6 e12 / Mark Bowker, Analyst, ESG (Part 2)

Quicker Insight Into Apache Solr and Collection Health

DZone Database Zone
Quicker Insight Into Apache Solr and Collection Health

Successful cluster administration can be very difficult without a real-time view of the state of the cluster. Solr itself does not provide aggregated views about its state or any historical usage data, which is necessary to understand how the service is used and how it is performing. Knowing the throughput and capacities not only helps detect errors and troubleshoot issues but is also useful for capacity planning.

Questions may arise, such as:

Blockchain Meets Database: Replace or Combine?

DZone Database Zone
Blockchain Meets Database: Replace or Combine?

When adopting blockchain into your organization, it's not necessary to replace your existing databases and associated processes. Instead, you can integrate and improve.

Although blockchain technology offers promising advances to DevOps and the digitization of disintermediation and consensus-based dissemination of information, the proponents of pilot systems all too often suggest that existing processes and legacy databases should be discarded in favor of the new.

Monday, August 28, 2017

How can I set up HTTP to HTTPS redirections on ELB using Apache backend servers?

Data Videos
#Data -How can I set up HTTP to HTTPS redirections on ELB using Apache backend servers?

Work Order Management With Neo4j

DZone Database Zone
Work Order Management With Neo4j

I look terrible in a bikini (take my word for it), but I'd love me a Lamborghini. However, in order to afford nice things, we need to do as the song says and get to work... and we need to manage and prioritize that work somehow. Today, I'm going to show you how to build part of a work order management system with Neo4j.

I'm going to build an evented work order model. Let's say our order gets created, then based on what it is, pieces of work need to happen. This work is performed by some provider (whether internal or external) and that work can be broken down into tasks that have dependencies on events that have occurred. How would this look in the graph? Glad you asked:

CURL Comes to N1QL: Querying External JSON Data

DZone Database Zone
CURL Comes to N1QL: Querying External JSON Data

N1QL has many functions that allow you to perform a specific operation. One such function that has been added into the new Couchbase 5.0 DP is CURL.

CURL allows you to use N1QL to interact with external JSON endpoints; namely, REST APIs that return results and data in JSON format. Interaction primarily consists of data transfer to and from a server using the HTTP and HTTPS protocols. In short, the CURL function in N1QL provides you, the user, a subset of standard curl functionality (https://curl.haxx.se/docs/manpage.html) within a query language.

Pronto Move Shard: A Flash Game Created With a Database

DZone Database Zone
Pronto Move Shard: A Flash Game Created With a Database

In July, Adobe announced that they plan the end-of-life for Flash to be around 2020. As HTML5 progressed — and due to a long history of critical security vulnerabilities — this is, technologically speaking, certainly the right decision. However, I also became a bit sad.

Flash was the first technology that brought interactivity to the web. We tend to forget how static the web was in the early 2000s. Flash brought life to the web — and there were plenty of stupid trash games and animations that I really enjoyed at the time. As a homage to the age of trashy Flash games, I created a game that resembles the games of this era:

How to Implement a GraphQL API in Rails

DZone Database Zone
How to Implement a GraphQL API in Rails

GraphQL came out of Facebook a number of years ago as a way to solve a few different issues that typical RESTful APIs are prone to. One of those was the issue of under- or over-fetching data.

Under-fetching is when the client has to make multiple round trips to the server just to satisfy the data needs they have. For example, the first request is to get a book, and a follow-up request is to get the reviews for that book. Two round trips are costly, especially when dealing with mobile devices on suspect networks.

Sunday, August 27, 2017

Building a Boolean Logic Rules Engine in Neo4j

DZone Database Zone
Building a Boolean Logic Rules Engine in Neo4j

A boolean logic rules engine was the first project I did for Neo4j before I joined the company some five years ago. I was working for some start-up at the time but took a week off to play consultant. I had never built a rules engine before, but as far as I know, ignorance has never stopped anyone from trying. Neo4j shipped me to the client site and put me in a room with a projector and a white board where I live coded with an audience of developers staring at me, analyzing every keystroke and cringing at every typo and failed unit test. I forgot what sleep was, but I managed to figure it out... and I lost all sense of fear after that experience.

The data model chained together fact nodes with crisscrossing relationships, each chain containing the same path ID property that we followed until reaching an end node which triggered a rule. There were a few complications along the way and more complexity near the end for ordering and partial matches. The traversal ended up being some 40 lines of the craziest Gremlin code I ever wrote, but it worked. After the proof of concept, the project was rewritten using the Neo4j Java API because at the time only a handful of people could look at a 40-line Gremlin script and not shudder in horror. I think we're up to two handfuls now.

jOOQ 3.10 Supports JPA AttributeConverter

DZone Database Zone
jOOQ 3.10 Supports JPA AttributeConverter

One of the cooler hidden features in jOOQ is JPADatabase, which allows for reverse engineering a pre-existing set of JPA-annotated entities to generate jOOQ code.

For instance, you could write these entities here:

Saturday, August 26, 2017

Cadre at AWS Startup Booth - NY Summit

Data Videos
#Data -Cadre at AWS Startup Booth - NY Summit

Kit at AWS Startup Booth - NY Summit

Data Videos
#Data -Kit at AWS Startup Booth - NY Summit

Throttling Database Using Rate Limits for SQL or REST

DZone Database Zone
Throttling Database Using Rate Limits for SQL or REST

When you are planning to expose your database to new users or tenants, one of the important areas to consider is resource governance. When in production, there's always a high probability that you will see complex live queries for data visualization or MapReduce jobs impacting your analytical database, which can impact other users. Then, you start to scale, as with any web application, by running a load balancer in front of your servers to distribute requests efficiently. But often, in a production environment, you come across a bad user that affects your quality of service (QoS). To give you an idea on how a bad user can affect your service, here are a couple of abusive scenarios:

A naïve developer who keeps hogging all the resources due to an inefficiently written client request.
A low-priority user who keeps hogging the resources, causing service outages for high-priority users.
A malicious user who keeps attacking your API endpoints to cause DDoS for all other users.

It is not pragmatic to scale your system to accommodate genuine requests whenever there is a drop in QoS due to such abusive behavior. To deal with this, rate limiting is one technique that can be employed. Essentially, rate limiting defines the number of requests or the amount of data that you can request within an interval of time. This is an effective technique that can mitigate the abusive scenarios discussed above, and you can find rate limits for almost all the SQL and REST APIs that you would want to interact with.
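One way to picture rate limiting is a token bucket, sketched below. This is an illustrative choice rather than the method the article goes on to describe, and the class name, parameters, and injectable clock are all assumptions made for the example. Each client gets a bucket that refills at a fixed rate; a request is allowed only if enough tokens are left.

```python
import time

class TokenBucket:
    """Allow at most `capacity` requests in a burst, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.tokens = float(capacity)
        self.last_refill = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the request fits, otherwise return False."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Top up for the time that has passed, never exceeding the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Keeping one bucket per user or tenant means a single abusive client runs out of its own tokens without affecting anyone else. The `cost` argument could also represent an amount of data instead of a request count.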

Migrating Data From an Encrypted Amazon MySQL RDS Instance to an Encrypted Amazon Aurora Instance

DZone Database Zone
Migrating Data From an Encrypted Amazon MySQL RDS Instance to an Encrypted Amazon Aurora Instance

In this blog post, we'll discuss migrating data from encrypted Amazon MySQL RDS to encrypted Amazon Aurora.

One of my customers wanted to migrate from an encrypted MySQL RDS instance to an encrypted Aurora instance. They have a pretty large database; therefore, using mysqldump or a similar tool was not suitable for them. They also wanted to set up replication between the old MySQL RDS and new Aurora instances.

Friday, August 25, 2017

Parsec at AWS Startup Booth - NY Summit

Data Videos
#Data -Parsec at AWS Startup Booth - NY Summit

Enigma at AWS Startup Booth - NY Summit

Data Videos
#Data -Enigma at AWS Startup Booth - NY Summit

AWS for Digital Marketing - Real Time Bidding (RTB)

Data Videos
#Data -AWS for Digital Marketing - Real Time Bidding (RTB)

Aggregate Grouping With N1QL or With MapReduce

DZone Database Zone
Aggregate Grouping With N1QL or With MapReduce

"Aggregate grouping" is what I’m putting in the title of this blog post, but I don’t know if that’s the best name. Have you ever used MySQL’s GROUP_CONCAT function or the  FOR XML PATH('') workaround in SQL Server? That’s basically what I’m writing about today. With Couchbase Server, the easiest way to do it is with N1QL’s ARRAY_AGG function, but you can also do it with an old school MapReduce View.

I’m writing this post because one of our solution engineers was working on this problem for a customer (who will go unnamed). Neither of us could find a blog post like this with the answer, so after we worked together to come up with a solution, I decided I would blog about it for my future self (which is pretty much the main reason I blog anything, really; the other reason is to find out if anyone else knows a better way).
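For readers who have not met this kind of query, here is a minimal sketch of the idea using SQLite's group_concat function through Python's built-in sqlite3 module. SQLite is only a stand-in here (the post itself is about Couchbase's ARRAY_AGG and MapReduce views), and the `orders` table and its rows are invented for illustration.

```python
import sqlite3

def items_per_customer(conn):
    """Collapse many rows per customer into one row holding a joined list of items."""
    return conn.execute(
        "SELECT customer, group_concat(item, ',') "
        "FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()

def build_sample_db():
    """Create an in-memory database with a few hypothetical order rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, item TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", "book"), ("alice", "pen"), ("bob", "lamp")],
    )
    conn.commit()
    return conn
```

Calling `items_per_customer(build_sample_db())` returns one row per customer with that customer's items joined into a single string. SQLite does not promise the order of items inside each joined string, so code that depends on the order should sort after splitting.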

The MySQL High Availability Landscape in 2017

DZone Database Zone
The MySQL High Availability Landscape in 2017

In the previous post of this series, we looked at the MySQL high availability (HA) solutions that have been around for a long time. I called these solutions "the elders." Some of these solutions (like replication) are heavily used today and have been improved from release to release of MySQL.

This post focuses on the MySQL high availability solutions that have appeared over the last five years and gained a fair amount of traction in the community. I chose to include only two solutions in this group: Galera and RDS Aurora. I'll use the term "Galera" generically; it covers Galera Cluster, MariaDB Cluster, and Percona XtraDB Cluster. I debated for some time whether or not to include Aurora. I don't like the fact that they use closed-source code. Given the tight integration with the AWS environment, what is the commercial risk of opening the source code? That question eludes me, but I am not on the business side of technology.

Brad Anderson's Lunch Break / s6 e10 / Christine Singh, VP, Moody’s (Part 2)

Data Videos
#Data -Brad Anderson's Lunch Break / s6 e10 / Christine Singh, VP, Moody’s (Part 2)

Functional RESTful Backends for DataTables

DZone Database Zone
Functional RESTful Backends for DataTables

Functional programming presents itself as a great building block for composing RESTful backends that handle errors and behaviors elegantly.

With my previous project, we wanted to give our development teams the ability to start in functional programming and compose reusable backends for UIs that use the DataTables plugin for jQuery. DataTables, in general, are an easy way to aggregate multiple functionalities into a single UI. Using one component, we can have search and pagination together. Once we have all these functionalities associated, we need a good programming paradigm for composing controllers and services as well as separating concerns.

How to Create an Oracle Database Docker Image

DZone Database Zone
How to Create an Oracle Database Docker Image

Oracle has released Docker build files for the Oracle Database on GitHub. With those build files, you can go ahead and build your own Docker image for the Oracle Database. If you don’t know what Docker is, you should go and check it out. It’s a cool technology based on the Linux containers technology that allows you to containerize your application — whatever that application may be. Naturally, it didn’t take long for people to start looking at containerizing databases, as well, which makes a lot of sense — especially for, but not only, development and test environments. Here is a detailed blog post on how to containerize your Oracle Database by using those build files that Oracle has provided.

You will need:

Thursday, August 24, 2017

Unlock Insights and Reduce Costs by Modernizing Your Data Warehouse on AWS

Data Videos
#Data -Unlock Insights and Reduce Costs by Modernizing Your Data Warehouse on AWS

Reusing Open Connections When Testing Your Database

DZone Database Zone
Reusing Open Connections When Testing Your Database

When testing how well your database queries are optimized, opening up too many connections to the database might create overhead and cause performance degradation. To be able to isolate database query testing, Apache JMeter™ provides flexibility, allowing you to choose if you want to run many queries using one connection or to establish many connections but to run queries less extensively.

In this blog post, we will show you how to run MySQL database queries both with one connection and with multiple connections. This is done through JMeter's JDBC elements and Thread Groups. As soon as you get the idea of how it works, you will be able to apply a more accurate load to your database, simulate all possible test scenarios, and make your database application layer rock solid!

Eclipse, culture, Power BI updates and a contest!

Data Videos
#Data -Eclipse, culture, Power BI updates and a contest!

Finding All Palindromes Contained in Strings With SQL

DZone Database Zone
Finding All Palindromes Contained in Strings With SQL

SQL is a really cool language. I can write really complex business logic with this logic programming language. I was again thrilled about SQL recently, at a customer site:

But whenever I tweet something like the above, the inevitable happens. I was nerd sniped. Oleg Šelajev from ZeroTurnaround challenged me to prove why SQL is so awesome:

Stubbing Key-Value Stores

DZone Database Zone
Stubbing Key-Value Stores

Every project that has a database has a dilemma when it comes to how to test database-dependent code. There are several options (not mutually exclusive):

Use mocks. Only use unit tests and mock the data-access layer, assuming the DAO-to-database communication works.
Use an embedded database that each test starts and shuts down. This can also be viewed as unit testing.
Use a real database deployed somewhere (either locally or on a test environment). The hard part is making sure it's always in a clean state.
Use end-to-end/functional tests/BDD/UI tests after deploying the application on a test server (which has a proper database).

None of the above is without problems. Unit tests with mocked DAOs can't really test more complex interactions that rely on a database state. Embedded databases are not always available (for example, if you are using a non-relational database or if you rely on RDBMS-specific functionality, HSQLDB won't do), or they can be slow to start. This means your tests may take too long to run. A real database installation complicates setup, and keeping it clean is not always easy. The coverage of end-to-end tests can't be easily measured and they don't necessarily cover all the edge cases, as they are harder to maintain than unit and integration tests.
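As a rough picture of what stubbing a key-value store can look like, here is a minimal Python sketch. The excerpt does not show the article's own implementation, so the store interface (`put`, `get`, `delete`) and the `VisitCounter` class under test are hypothetical names chosen for this example.

```python
class InMemoryKeyValueStore:
    """A stub that keeps everything in a dict instead of talking to a real store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


class VisitCounter:
    """Example code under test; it only relies on the store's put/get methods."""

    def __init__(self, store):
        self.store = store

    def record_visit(self, page):
        count = self.store.get(page, 0) + 1
        self.store.put(page, count)
        return count
```

Because the stub holds real state, a test can exercise a sequence of reads and writes, which a plain mock of the data-access layer cannot do, and every test starts clean simply by creating a new stub instance.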

Brad Anderson's Lunch Break / s6 e9 / Christine Singh, VP, Moody’s

Data Videos
#Data -Brad Anderson's Lunch Break / s6 e9 / Christine Singh, VP, Moody’s

Runtime Metrics in Execution Plans

DZone Database Zone
Runtime Metrics in Execution Plans

Capturing query execution metrics is much easier now that you can see the runtime metrics in execution plans when you’re using SQL Server 2016 SP1 or better in combination with SQL Server Management Studio 2017. When you capture an actual plan using any method, you get the query execution time on the server as well as wait statistics and I/O for the query. This fundamentally changes how we can go about query tuning.

Runtime Metrics

To see these runtime metrics in action, let’s start with a query:

Wednesday, August 23, 2017

From Excel Hell to Cloud Database Heaven

DZone Database Zone
From Excel Hell to Cloud Database Heaven

Most well-known database technologies have some or all of the following features:

Data quality and consistency:
  A data schema with a detailed description of all data resources and properties.
  Automatic data validation according to the data schema.
  Row/document locking to prevent data collision.
Access control:
  Define access roles to allow/prevent read, write, or delete on resources.
  Allow users to have private data views of shared resources.
Data relations
Query language
API:
  A REST API for platform-agnostic data access and integration.
  A platform-specific SDK.

The rest of this blog post is a step-by-step tutorial on how you can migrate from spreadsheets to a fast and consistent NoSQL cloud database using RestDB.io.

AWS Summit Tel Aviv 2017: Keynote with Adrian Cockcroft

Data Videos
#Data -AWS Summit Tel Aviv 2017: Keynote with Adrian Cockcroft

Bringing DevOps Practices to Database Administrators [Audio]

DZone Database Zone
Bringing DevOps Practices to Database Administrators [Audio]

I spoke with Robert Reeves, CTO and co-founder of Datical, a company that aims to remove the pain that database administrators (DBAs) experience on a regular basis attempting to make changes to database heavy applications.

They do this by bringing easy-to-use tools inspired by practices from the DevOps world to the traditionally messy world of enterprise databases with change simulation, rollbacks, rules engines, packaging databases as code, and strong monitoring.

Brad Anderson's Lunch Break / s6 e8 / Mary Cecola, CIO, Antares Capital LP (Part 2)

Data Videos
#Data -Brad Anderson's Lunch Break / s6 e8 / Mary Cecola, CIO, Antares Capital LP (Part 2)

What’s Next After Dynamo and Cassandra?

DZone Database Zone
What’s Next After Dynamo and Cassandra?

Avinash Lakshman, CEO of Hedvig and developer of Dynamo and Cassandra, shares his thoughts on the current and future state of databases.

How are you and Hedvig involved in databases?

How to Use SQL Syntax to Access MongoDB

DZone Database Zone
How to Use SQL Syntax to Access MongoDB

tangyuan-mongo is the Mongo service component in the tangyuan framework. The tangyuan-mongo component encapsulates a series of Mongo operations into Tangyuan's services and provides a unified way to access it. It also provides access to Mongo in SQL syntax.

The source code can be found here, and the official website is here.

Tuesday, August 22, 2017

What Is It Like to Work at AWS? We Asked the Experts.

Data Videos
#Data -What Is It Like to Work at AWS? We Asked the Experts.

Brad Anderson's Lunch Break / s6 e7 / Mary Cecola, CIO, Antares Capital LP

Data Videos
#Data -Brad Anderson's Lunch Break / s6 e7 / Mary Cecola, CIO, Antares Capital LP

SQL Solution to Elasticsearch

DZone Database Zone
SQL Solution to Elasticsearch

Developers today have a lot of storage options for building strategic new apps, including JSON databases like Elasticsearch. These new technologies allow development teams to more easily and efficiently iterate on features.

Using agile methodologies, teams work in sprints that last just a few weeks, getting new features to market fast. Compared to relational databases, Elasticsearch is far less demanding in terms of modeling and structuring data, and this is a big advantage in terms of development speed.

ATLO Software Delivers Secure Training Programs with Sophos UTM on AWS

Data Videos
#Data -ATLO Software Delivers Secure Training Programs with Sophos UTM on AWS

AWS Knowledge Center Videos: How do I encrypt RDS Snapshots?

Data Videos
#Data -AWS Knowledge Center Videos: How do I encrypt RDS Snapshots?

Diving into Couchbase Index Replicas

DZone Database Zone
Diving into Couchbase Index Replicas

With Couchbase Server 4.x, customers used to create Equivalent Indexes to satisfy the twin requirements of keeping the indexes highly available and to load balance the N1QL queries. What this meant was that the exact same index definition was used to create indexes with different names. What’s in a name you might ask… a rose is a rose is a rose :)

 create index index1 on bucket(field1);

Financial Services and Neo4j: 360-Degree View of Customer Experience

DZone Database Zone
Financial Services and Neo4j: 360-Degree View of Customer Experience

Customer expectations are rising at a time when customer service is a significant differentiator within the financial services industry.

Customers expect companies to deliver personalized service – i.e., an end-to-end customer experience — that reflects an understanding of who they are, their communication preferences, the products and services they’ve purchased in the past, and what they might be interested in in the future.

Meet the New DBA, Different From the Old

DZone Database Zone
Meet the New DBA, Different From the Old

There’s a rapid shift taking place in today's technology organizations. The role of the DBA is being redefined and increasingly replaced by other roles and specialties. This is happening even as data explodes — in fact, it's happening precisely because data is exploding. It's a trend that is accelerating, and it sometimes takes people by surprise.

In general, the DBA role is shifting from the lower layers of the technology stack up into the higher layers, where its concerns overlap more and more with technical operations and even development. Let’s consider why this is happening and what it means for the future of data management and data operations.

Monday, August 21, 2017

Live from the NY Summit | Interview with Adrian Cockcroft

Data Videos
#Data -Live from the NY Summit | Interview with Adrian Cockcroft

Brad Anderson's Lunch Break / s6 e6 / Jacob Morgan, Author, Futurist (Part 2)

Data Videos
#Data -Brad Anderson's Lunch Break / s6 e6 / Jacob Morgan, Author, Futurist (Part 2)

AWS Summit Tel Aviv 2017: Fundamentals of Networking and Security on AWS

Data Videos
#Data -AWS Summit Tel Aviv 2017: Fundamentals of Networking and Security on AWS

How do I increase bandwidth for active traffic on AWS Direct Connect using a link aggregation group?

Data Videos
#Data -How do I increase bandwidth for active traffic on AWS Direct Connect using a link aggregation group?

An Introduction to TensorFlow

DZone Database Zone
An Introduction to TensorFlow

In this post, we are going to see some TensorFlow examples, define tensors, perform math operations using tensors, and see other machine learning examples.

What Is TensorFlow?

TensorFlow is a library developed by Google for solving complicated mathematical problems that would otherwise take a lot of time.

Async and Await: An Explanation

DZone Database Zone
Async and Await: An Explanation

For a while now, the Async and Await commands in C# have confused me.

Like most things, the best way to learn about something is to use it in a real-world example. I am currently adding an email alert feature to a website. This is an ideal example of something that would benefit from asynchronous programming. There is no need for the webpage to wait to send thousands of emails; let's just send a call to get started and allow the browser to carry on as normal.
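The same idea can be sketched with Python's asyncio, used here as a stand-in for C#'s async and await. The function names and email addresses are made up for illustration, and `asyncio.sleep` takes the place of the slow call to a mail server. The request handler schedules the sending work and returns its response without waiting for it.

```python
import asyncio

async def send_email(address, sent):
    """Pretend to send one email; the sleep stands in for slow network I/O."""
    await asyncio.sleep(0.01)
    sent.append(address)

async def send_all(addresses, sent):
    """Send every email concurrently and finish when all of them are done."""
    await asyncio.gather(*(send_email(a, sent) for a in addresses))

async def handle_request(addresses, sent):
    """Kick off the sending in the background and respond immediately."""
    task = asyncio.create_task(send_all(addresses, sent))
    return "page rendered", task

async def main():
    sent = []
    response, task = await handle_request(["a@example.com", "b@example.com"], sent)
    sent_when_response_ready = len(sent)  # nothing has been sent yet at this point
    await task  # a real server would keep running; here we wait so the demo can finish
    return response, sent_when_response_ready, sorted(sent)
```

Running `asyncio.run(main())` shows the response being ready before any email has gone out, which is the behavior the paragraph above is after.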

Sunday, August 20, 2017

How the ClustrixDB Query Evaluation Model Works

DZone Database Zone
How the ClustrixDB Query Evaluation Model Works

Recently, we’ve started to dig into the internals of ClustrixDB, specifically how ClustrixDB accomplishes horizontal scaling of both writes and reads without sharding. Next, we dug into the details of the multi-patented ClustrixDB Rebalancer. This time, we will discuss the ClustrixDB Query Evaluation Model.

ClustrixDB is a MySQL-compatible distributed RDBMS that provides linear scale-out of both writes and reads, while maintaining relational semantics, including ACID transactionality and referential integrity. Typically, MySQL workloads are only able to scale out both writes and reads if sharding is used. Sharding is the strategy of partitioning your MySQL application workload across multiple separate MySQL database servers, allowing queries and data CRUD operations to fan out. This means multiple separate MySQL physical servers must be deployed, the workload data needs to be partitioned across them, and the application needs to be rewritten to manage any ACID transactionality needed between those servers. ClustrixDB provides linear scale-out similar to that of sharding, but the data distribution is handled automatically behind the scenes by the multi-patented ClustrixDB Rebalancer. The application doesn’t require rewrites and sees only a single logical RDBMS, while all cross-node ACID transactionality is handled automatically.

Saturday, August 19, 2017

Why Do We Need Blockchain?

DZone Database Zone
Why Do We Need Blockchain?

Many times in our business conversations, I have come across this question:

“Why do we need to implement this business functionality using Blockchain? Why can’t we implement this using a database and a web application?”

Anyone who's eager to jump onto the blockchain bandwagon has probably asked the same question. In this post, we take a look at an example implemented using traditional systems leveraging a database plus an application and how the same use case implemented with blockchain changes the equation.

Friday, August 18, 2017

AWS for Digital Marketing - Overview

Data Videos
#Data -AWS for Digital Marketing - Overview

Three DBAs Walk Into a NoSQL Bar... [Comic]

DZone Database Zone
Three DBAs Walk Into a NoSQL Bar... [Comic]

The Top Resources for Understanding Graph Theory and Algorithms

DZone Database Zone
The Top Resources for Understanding Graph Theory and Algorithms

Recently, we announced the availability of some super efficient graph algorithms for Neo4j. In case you missed the announcement, we now have an easy-to-use library of graph algorithms that are tuned to make full use of compute resources.

As part of assisting with this ongoing project, I needed to come up to speed as well as compile a list of graph algorithm and graph theory resources. Although this seemed like a short task, my list grew and continues to grow.

Enabling GTIDs for Server Replication in MariaDB Server 10.2

DZone Database Zone
Enabling GTIDs for Server Replication in MariaDB Server 10.2

I originally wrote this post in 2014 after the release of MariaDB Server 10.0. Most of what was in that original post still applies, but I've made some tweaks and updates since replication and high availability (HA) remain among the most popular MariaDB/MySQL features.

Replication first appeared on the MySQL scene more than a decade ago, and as replication implementations became more complex over time, some limitations of MySQL’s original replication mechanisms started to surface. To address those limitations, MySQL v5.6 introduced the concept of global transaction identifiers (GTIDs), which enable some advanced replication features. MySQL DBAs were happy with this but complained that in order to implement GTIDs, you needed to stop all the servers in the replication group and restart them with the feature enabled. There are workarounds; for instance, Booking.com documented a procedure to enable GTIDs with little or no downtime, but it involves more complexity than most organizations are willing to allow. (Check out this blog post for more on how Booking.com handles replication and high availability.)

Thursday, August 17, 2017

Introducing Amazon Macie

Data Videos
#Data -Introducing Amazon Macie

Travelex: A Secure FCA Regulated B2C Payment Platform on ECS

Data Videos
#Data -Travelex: A Secure FCA Regulated B2C Payment Platform on ECS

AWS Summit Series 2017 – New York: Claus Moldt, CIO of FICO

Data Videos
#Data -AWS Summit Series 2017 – New York: Claus Moldt, CIO of FICO

This Week in Neo4j: Fake News, Threat Hunting, and Triplets

DZone Database Zone
This Week in Neo4j: Fake News, Threat Hunting, and Triplets

Welcome to this week in Neo4j where we round up what’s been happening in the world of graph databases in the last 7 days.

Featured Community Member: Eve Freeman

This week’s featured community member is Eve Freeman, Applications Development Analyst IV at Fannie Mae.

Brad Anderson's Lunch Break / s6 e6 / Jacob Morgan, Author, Futurist (Part 2)

Data Videos
#Data -Brad Anderson's Lunch Break / s6 e6 / Jacob Morgan, Author, Futurist (Part 2)

Technology and Friends Episode 496: Oren Eini on RavenDB [Video]

DZone Database Zone
Technology and Friends Episode 496: Oren Eini on RavenDB [Video]

Last week, at That Conference (which was great), I had the chance to do an interview with David Giard.

You can go to the interview directly, or watch it here:

Sensitive Data Masking With MariaDB MaxScale

DZone Database Zone
Sensitive Data Masking With MariaDB MaxScale

Protecting personal and sensitive data and complying with security and privacy regulations are high priorities for organizations. Such data includes personally identifiable information (PII), protected health information (PHI), payment card information (subject to PCI-DSS regulation), and intellectual property (subject to ITAR and EAR regulations). In many cases, if not most, this data needs to be redacted or masked when accessed (internally and/or externally).

Data redaction obfuscates all or part of the data, reducing unnecessary exposure of sensitive data while at the same time maintaining its usability. Various terms such as data masking, data obfuscation, and data anonymization are used to describe this functionality in databases. Data redaction allows an organization to:

Wednesday, August 16, 2017

AWS Summit Series 2017 – New York: Serkan Kutan, CTO of Zocdoc

Data Videos
#Data -AWS Summit Series 2017 – New York: Serkan Kutan, CTO of Zocdoc

VPC Endpoints for Amazon DynamoDB

Data Videos
#Data -VPC Endpoints for Amazon DynamoDB

Extending the Power of MariaDB ColumnStore With User-Defined Functions

DZone Database Zone
Extending the Power of MariaDB ColumnStore With User-Defined Functions

MariaDB ColumnStore 1.0 supports User-Defined Functions (UDF) for query extensibility. This allows you to create custom filters and transformations to suit any need. This blog outlines adding support for distributed JSON query filtering. 

An important MariaDB ColumnStore concept to grasp is that there are distributed and non-distributed functions. Distributed functions are executed at the PM nodes, which supports scale-out of query execution. Non-distributed functions are MariaDB Server functions that are executed within the UM node. As a result, MariaDB ColumnStore requires two distinct implementations of any function.

"How can I import my virtual machine into an Amazon Machine Image by using the AWS CLI?"

Data Videos
#Data -"How can I import my virtual machine into an Amazon Machine Image by using the AWS CLI?"

Bad Parameter Sniffing Decision Flow Chart [Infographic]

DZone Database Zone
Bad Parameter Sniffing Decision Flow Chart [Infographic]

Lots of people are confused about how to deal with bad parameter sniffing when it occurs. In an effort to help with this, I’m going to try to make a decision flow chart to walk you through the process. This is a rough, quite rough, first draft.

I would love to hear any input. For this draft, I won’t address the things I think I’ve left out. I want to see what you think of the decision flow and what you think might need to be included. 

To DBaaS or Not to DBaaS?

DZone Database Zone
To DBaaS or Not to DBaaS?

According to a new forecast from the International Data Corporation (IDC), total spending on IT infrastructure products (server, enterprise storage, and Ethernet switches) for deployment in cloud environments will increase 15.3% year-over-year in 2017 to $41.7 billion. 

Gartner Inc., a leading research and advisory company, predicts that the public cloud services market will grow 18% in 2017 to $246.8 billion, up from $209.2 billion in 2016. In the cloud world, Infrastructure as a Service (IaaS) is predicted to have the highest growth rate, at 36.8% in 2017, reaching a total of $34.6 billion. Cloud application services (Software as a Service, or SaaS) are predicted to grow 20.1% to reach $46.3 billion.

Building a Full-Text Search Test Framework

DZone Database Zone
Building a Full-Text Search Test Framework

This article will give you a quick glimpse into a test framework built to validate Couchbase’s new full-text search feature. The idea described here can be extended to test any text search engine in general.

Couchbase Full-Text Search

Searching unstructured schema-less JSON documents in Couchbase is now easy thanks to the full-text capability it offers. What this means is that Couchbase users can now search for phrases, words, and date/numeric-ranges inside JSON documents. These searches are essentially “queries” on full-text indexes. Couchbase full-text search is RESTful and distributed and is driven by Bleve, an indexing and search library written in Go. For more about full-text search, refer to the recommended reading section.

Tuesday, August 15, 2017

AWS Summit Series 2017 - New York: Keynote

Data Videos
#Data -AWS Summit Series 2017 - New York: Keynote

AWS CloudTrail Event History is Now Available to All Customers

Data Videos
#Data -AWS CloudTrail Event History is Now Available to All Customers

Neo4j and Cypher: Rounding Floating Point Numbers/BigDecimals

DZone Database Zone
Neo4j and Cypher: Rounding Floating Point Numbers/BigDecimals

I was doing some data cleaning a few days ago and wanted to multiply a value by one million. My Cypher code to do so looked like this:

with "8.37" as rawNumeric
RETURN toFloat(rawNumeric) * 1000000 AS numeric

╒═════════════════╕
│"numeric"        │
╞═════════════════╡
│8369999.999999999│
└─────────────────┘

Unfortunately, that suffers from the classic rounding error when working with floating point numbers. I couldn’t figure out a way to solve it using pure Cypher, but there tends to be an APOC function to solve every problem... and this was no exception.
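The rounding error is not specific to Cypher; it comes from binary floating point itself. As a rough illustration (this is plain Python, not the APOC function the post goes on to use), the same multiplication can be compared using a float, an exact decimal type, and simple rounding:

```python
from decimal import Decimal

raw_numeric = "8.37"

# Binary floating point cannot represent 8.37 exactly, so the product drifts.
as_float = float(raw_numeric) * 1000000

# Decimal keeps the base-10 digits exactly, so the product is exact.
as_decimal = Decimal(raw_numeric) * 1000000

# Rounding the float to the nearest integer also recovers the intended value.
rounded = round(as_float)

print(as_float, as_decimal, rounded)
```

Whether exact decimal arithmetic or rounding is the better fit depends on whether the fractional digits still matter after the multiplication.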

Brad Anderson's Lunch Break / s6 e5 / Jacob Morgan, Author, Futurist

Data Videos
#Data -Brad Anderson's Lunch Break / s6 e5 / Jacob Morgan, Author, Futurist

gumi Asia Case Study for Game Development on AWS

Data Videos
#Data -gumi Asia Case Study for Game Development on AWS

MongoDB: Evaluate Query Performance Using Indexes

DZone Database Zone
MongoDB: Evaluate Query Performance Using Indexes

This blog shows commands that you can use to manage MongoDB indexes on a particular collection, as well as tips on how to evaluate query performance with or without indexes. 

Why Create Indexes?

Indexes can significantly improve read query performance for MongoDB collections. In the absence of indexes, when searching for documents based on filter criteria, MongoDB performs a collection scan: it scans every document and returns those matching the filter criteria. This is not an efficient way to search a collection. If one or more fields are frequently used for filtering documents, it is recommended to create indexes on those fields. When indexes are present, MongoDB limits the number of documents it scans, and scanning fewer documents means faster query execution.
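To make the difference concrete, here is a simplified sketch in plain Python. It does not use MongoDB or its driver, and a dictionary stands in for the index (real MongoDB indexes are B-tree structures), so treat it only as an illustration of why fewer documents get examined. The `city` field and all function names are hypothetical.

```python
from collections import defaultdict

# A toy "collection": 1000 documents, 10 of which have city == "Pune".
documents = [
    {"_id": i, "city": "Pune" if i % 100 == 0 else "Other"}
    for i in range(1000)
]


def collection_scan(docs, field, value):
    """Examine every document and keep the ones that match."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Map each field value to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        index[doc.get(field)].append(position)
    return index


def index_lookup(docs, index, value):
    """Fetch only the documents the index points at."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)


scan_matches, scan_examined = collection_scan(documents, "city", "Pune")
city_index = build_index(documents, "city")
index_matches, index_examined = index_lookup(documents, city_index, "Pune")

print(scan_examined, index_examined)
```

Both approaches return the same documents; the scan examines the whole collection, while the lookup touches only the matching entries.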

Maintaining Transaction Boundary Integrity in a Distributed Cluster

DZone Database Zone
Maintaining Transaction Boundary Integrity in a Distributed Cluster

We pretty much treat RavenDB’s transactional nature as a baseline — same as the safe assumption that any employee we hire will have a pulse. (Sorry, we discriminate against Zombies and Vampires because they create a hostile work environment. See here for details.)

OK, now back to transactions, and why I’m bringing up a basic requirement like that. Consider a case when you need to pay someone. That operation is composed of two distinct operations. First, the bank debits your account and then the bank credits the other account. You generally want these to happen as a transactional unit — either both of them happened or neither of them did. In practice, that isn’t how banks work at all, but that is the simplest way to explain transactions, so we’ll go with that.
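The "both or neither" behavior can be sketched with any transactional store. The example below uses Python's built-in sqlite3 module rather than RavenDB, purely as an assumption for illustration; the `accounts` table, the names, and the amounts are all hypothetical. The credit is applied first here so that a failed debit leaves something to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Credit and debit as one unit: both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok_small = transfer(conn, "alice", "bob", 30)   # fits within alice's balance
ok_large = transfer(conn, "alice", "bob", 500)  # would overdraw alice
final = balances(conn)
print(ok_small, ok_large, final)
```

After the failed transfer, bob's balance does not include the 500 that was briefly credited, which is the property the paragraph above describes.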

How to Order Streamed DataFrames

DZone Database Zone
How to Order Streamed DataFrames

A few days ago, I had to perform aggregation on a streaming DataFrame. And the moment I applied groupBy for aggregation, the data got shuffled. Now, a new situation arises regarding how to maintain order.

Yes, I can use orderBy with a streaming DataFrame using Spark Structured Streaming, but only in complete mode. There is no way to order streaming data in append mode or in update mode.

Monday, August 14, 2017

Introducing AWS Glue

Data Videos
#Data -Introducing AWS Glue

Introducing the Coco Framework

Data Videos
#Data -Introducing the Coco Framework

AWS Knowledge Center Videos: How do I encrypt my data in Amazon EFS?

Data Videos
#Data -AWS Knowledge Center Videos: How do I encrypt my data in Amazon EFS?

Finding Triples With Neo4j

DZone Database Zone
Finding Triples With Neo4j

A user had an interesting Neo4j question on Stack Overflow the other day:

I have two types of nodes in my graph. One type is Testplan and the other is Tag. Testplans are tagged to Tags. I want most common pairs of Tags that share the same Testplans with a Tag having a specific name. I have been able to achieve the most common Tags sharing the same Testplan with one Tag, but getting confused when trying to do it for pairs of Tags.

Their Cypher query looked like this:

Syncing Databases Properly to Work With eCommerce Business Applications

DZone Database Zone
Syncing Databases Properly to Work With eCommerce Business Applications

When it comes to planning database backends for eCommerce sites and applications, one may come across many technical terms like:

Simple MySQL PostgreSQL-powered Cloud Redundant database Multi-zone NoSQL backend

These are all standard terms that describe the databases of different eCommerce systems, but what do they mean and how do they work? What is the purpose of a database? Is it possible to run an eCommerce store without a database?

2 Approaches to Scalable Database Design

DZone Database Zone
2 Approaches to Scalable Database Design

Any application used for data analysis depends heavily on its ability to return query results quickly. However, with larger or more complex datasets and a growing number of concurrent users, performance depends largely on the underlying analytical database, whether it is built into the application as part of a single-stack tool or implemented via a separate data warehouse layer.

What Makes a Scalable Database?

Database scalability is a concept in database design that emphasizes the capability of a database to handle growth in the amount of data and users. In the modern applications sphere, two types of workloads have emerged: analytical and transactional workloads. Planning for workload growth must take into account operating system, database design, and hardware design decisions.

Cypher: Write Fast and Furious

DZone Database Zone
Cypher: Write Fast and Furious

Editor’s Note: This presentation was given by Christophe Willemsen at GraphConnect San Francisco in October 2016.

Presentation Summary

In this presentation, Christophe Willemsen covers a variety of do-and-don’t tips to help your Cypher queries run faster than ever in Neo4j.

Sunday, August 13, 2017

GameDay Essentials | Episode 3: Changes

Data Videos
#Data -GameDay Essentials | Episode 3: Changes

Introducing the Coco Framework

Data Videos
#Data -Introducing the Coco Framework

Syncing Databases Properly to Work With eCommerce Business Applications

DZone Database Zone
Syncing Databases Properly to Work With eCommerce Business Applications

When it comes to planning database backends for eCommerce sites and applications, one may come across many technical terms like:

Simple MySQL PostgreSQL-powered Cloud Redundant database Multi-zone NoSQL backend

These are all standard terms that describe the databases of different eCommerce systems, but what do they mean and how do they work? What is the purpose of a database? Is it possible to run an eCommerce store without a database?

Saturday, August 12, 2017

Azure Functions With Couchbase Server

DZone Database Zone
Azure Functions With Couchbase Server

Azure Functions are Microsoft’s answer to Amazon’s Lambdas or Google’s Cloud Functions (AKA “serverless” architecture). They give you a way to deploy small pieces of code and let Azure handle the underlying server. I’ve never used them before, so I thought I would give them a try beyond “Hello, World” by getting them to work with Couchbase Server.

There are more options in Azure Functions beyond simple HTTP events (for example, blob triggers, GitHub webhooks, Azure Storage queue triggers, etc.). But, for this blog post, I’m going to focus on just HTTP events. I’ll create simple GET and SET endpoints that interact with Couchbase Server.

Friday, August 11, 2017

Faster PostgreSQL Counting

DZone Database Zone
Faster PostgreSQL Counting

Everybody counts — but not always quickly. This article takes a close look into how PostgreSQL optimizes counting. If you know the tricks, there are ways to count rows orders of magnitude faster than you do already.

The problem is actually under-described — there are several variations of counting, each with their own methods. First, think about whether you need an exact count or if an estimate suffices. Next, are you counting duplicates or just distinct values? Finally, do you want a lump count of an entire table or will you want to count only those rows matching extra criteria?

GameDay Essentials | Episode 3: Changes

Data Videos
#Data -GameDay Essentials | Episode 3: Changes

Scalable MySQL Cluster With ProxySQL and Orchestrator

DZone Database Zone
Scalable MySQL Cluster With ProxySQL and Orchestrator

MySQL is one of the most popular open-source relational databases, used by many projects around the world, including very large-scale ones like Facebook, Twitter, and YouTube. Such projects need a reliable and highly available data storage system to ensure an appropriate level of service quality. The primary way to get the most out of your data storage is to set up database clustering so that it can process a large number of requests simultaneously and keep working under increased load. However, configuring such a solution from scratch can be a rather complicated task.

Thus, the Jelastic team has prepared a one-click installation package: a Scalable MySQL Cluster with out-of-the-box master-slave replication, even request distribution, and node auto-discovery. It instantly deploys a pair of interconnected MySQL containers, which handle asynchronous data replication and are automatically reconfigured upon cluster scaling (i.e., when the number of nodes changes). In addition, the solution comes with a ProxySQL load balancer in front of the database nodes and an embedded Orchestrator for convenient management via a GUI.

Introducing the Coco Framework

Data Videos
#Data -Introducing the Coco Framework

Execute an Oracle Stored Procedure With Nested Table as a Parameter

DZone Database Zone
Execute an Oracle Stored Procedure With Nested Table as a Parameter

The objective of this tutorial is to demonstrate the steps required to execute an Oracle stored procedure with a nested table as one of the parameters from a Mule flow.

To demonstrate the application, I will be using a simple use case of inserting employee records by calling an Oracle stored procedure with a nested table as one of the parameters. Each employee record has two columns: the employee’s department and the nested table of a data structure with employee name and employee number as attributes.

Fun With SQL: Functions in Postgres

DZone Database Zone
Fun With SQL: Functions in Postgres

In our previous Fun with SQL post on the Citus Data blog, we covered w...