Monday, July 31, 2017
How Splunk and AWS Enabled End-to-End Visibility for PagerDuty and Bolstered Their Security Posture
July in Database: Dynamic SQL, Data Warehousing, and a New Music Database
Welcome to the second edition of This Month in Database! Last time, we looked at migration, deployment, and fraud. This time, we've got a whole new slew of fun stuff to talk about. We'll cover everything you need to know about databases from the past month, including the top database-related articles on DZone that you should check out, the top news that happened in the world of database, database jobs that you might be interested in, and more. Let's get started!
You Know We're All About Database (No Treble)
The Most Effective Way to Write Effective SQL: Change Your Thinking Style by Emrah Mete. Your approach to solving the problem that you are working on at the database level should be a holistic one instead of a procedural one. See why.
MongoDB Backup and Recovery
Several customer conversations at MongoDB World, which was held in Chicago this year, reaffirmed my conviction that there is a burning need for an enterprise-class backup and recovery solution for MongoDB databases. Although MongoDB is one of the top five most popular databases according to DB-Engines, its backup and recovery capabilities are inadequate. The ecosystem around the database is not mature either, and hence there is a dearth of viable solutions.
This lack of enterprise-ready data protection has led several customers to try developing their own scripted solutions from scratch. Most of the time, such efforts consume significant time and resources without producing a resilient and reliable solution. In fact, I chatted with a large financial institution and a healthcare organization, both of which were struggling to handle the backup and archival of their large MongoDB environments. On the other hand, a technology organization was using the backup-as-a-service (Cloud Manager) provided by MongoDB but was concerned about data security, astronomical costs, and long recovery times.
3 Must-Haves of an In-Memory Database for Modern Applications
More companies are realizing that leveraging real-time data is the key to success in modern applications, which means that more people are searching for the right in-memory database technology. However, with many options, comparing in-memory databases for your use case can be challenging.
Regardless of use case and which database you choose, there are three essential requirements that any quality in-memory database must meet: being cloud-ready, Internet of Things (IoT)-ready, and ACID-compliant.
Sunday, July 30, 2017
Neo4j Sandbox Now Supports Neo4j 3.2, Sharing, Google Spreadsheets, and More!
When we announced the new Neo4j Sandbox back in March, we enabled developers to learn Neo4j while exploring data and interactive guides for four use cases: recommendation engines, TrumpWorld, US Congress (Legis-Graph) and Twitter analysis.
We’ve seen lots of developers super excited about how easily the Sandbox lets them learn Neo4j and graph databases.
Saturday, July 29, 2017
Should You Ditch Core Data for YapDatabase?
The choice of database technology for our apps has been something of a question for us for... the last while. And by “last while,” we mean since iCloud Core Data was deprecated, which miffed us no end since we rather like our apps’ initial releases to just work across devices for iOS users without jumping through any great hoops or introducing any dependencies on non-Apple services.
So let’s see what’s new in Core Data this year in the iOS 11 release notes! Errr… nothing?
Using NGINX With GeoIP MaxMind Database to Fetch Geolocation Data
Geolocation data plays a significant role in businesses. This data is used to promote or market brands, products, and services to certain demographics. It also helps in enhancing the user profile.
In this blog, we'll discuss finding the geographical location of a user using their IP address by just configuring NGINX with GeoIP MaxMind databases and without doing any coding!
Friday, July 28, 2017
Working With Hierarchies in a NoSQL Database
Organizing information in hierarchies is something programmers have to deal with from time to time. Examples are:
Threaded discussions/comments.
Addresses on a map.
Folders and documents.
Organizational structures.
Storage/shelf locations in a warehouse.
Pages on a web site.
Link referrals.
Using a NoSQL document database (or SQL, for that matter), it is quite easy to create a structure to organize this type of information. For each record/document/node, you simply need a reference to the parent (except for the top node).
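The parent-reference idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the document ids, the `parent` field name, and the helper functions are hypothetical, and a plain dictionary stands in for the document database.

```python
from typing import List, Optional

# Hypothetical documents: each one stores only a reference to its parent.
# The top node has no parent.
documents = {
    "root": {"name": "Company", "parent": None},
    "eng": {"name": "Engineering", "parent": "root"},
    "db": {"name": "Database Team", "parent": "eng"},
    "sales": {"name": "Sales", "parent": "root"},
}


def path_to_root(doc_id: str) -> List[str]:
    """Follow parent references upward; return names from the top node down."""
    names = []
    current: Optional[str] = doc_id
    while current is not None:
        doc = documents[current]
        names.append(doc["name"])
        current = doc["parent"]
    return list(reversed(names))


def children_of(doc_id: str) -> List[str]:
    """Return the ids of all documents whose parent is doc_id."""
    return sorted(key for key, doc in documents.items() if doc["parent"] == doc_id)


print(" / ".join(path_to_root("db")))
print(children_of("root"))
```

Walking upward costs one lookup per level, while finding children requires a query on the parent field, which is why that field would normally be indexed in a real database.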
How to Convert a Table Column Into a C# Model Class
In this blog, I will demonstrate how to convert a table column into a C# model class using stored procedures. This is a very useful tip for any C# programmer.
SQL Code
Create tables and columns as you need, like given below:
Hibernate Tips: How to Log SQL Statements and Their Parameters [Video]
Hibernate Tips is a series of posts in which I describe a quick and easy solution for common Hibernate questions. If you have a question for a future Hibernate Tip, please leave a comment below.
Question
How do you configure Hibernate so that it writes the executed SQL statements and used bind parameters to the log file?
Extend Your SQL With User-Defined Functions in OpenEdge 11.7
With user-defined functions, OpenEdge database developers can extend their programming to validate business logic, use less bandwidth, and more.
OpenEdge SQL 11.7 allows you as a database developer to create your own routines. These routines, called user-defined functions (UDF), can accept parameters, perform custom actions, and return results. These are required especially when it comes to business logic, complex algorithmic calculations, and custom actions. They also help reduce network traffic.
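To illustrate the general idea of a user-defined function that accepts parameters and returns a result inside a SQL query, here is a small sketch. It does not use OpenEdge: it relies on SQLite through Python's standard `sqlite3` module as a stand-in, and the `apply_discount` function, the `orders` table, and the sample prices are all assumptions made for the example.

```python
import sqlite3


def apply_discount(price, percent):
    """Hypothetical business rule: reduce a price by a percentage."""
    return round(price * (1 - percent / 100.0), 2)


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL statements can call it by name.
# Arguments: SQL function name, number of parameters, Python callable.
conn.create_function("apply_discount", 2, apply_discount)

conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO orders (price) VALUES (?)", [(100.0,), (40.0,)])

# The rule now runs inside the query, so only the results are returned.
rows = conn.execute(
    "SELECT id, apply_discount(price, 10) FROM orders ORDER BY id"
).fetchall()
print(rows)
```

Keeping the rule in one function means every query that needs it applies the same logic, which is the kind of reuse the paragraph above describes.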
Thursday, July 27, 2017
Blending Databases: A Database Jam Session
In this blog series, we’ll be experimenting with the most interesting blends of data and tools. Whether it’s mixing traditional sources with modern data lakes, open-source DevOps on the cloud with protected internal legacy tools, SQL with NoSQL, web-wisdom-of-the-crowd with in-house handwritten notes, or IoT sensor data with idle chatting, we’re curious to find out: Will they blend? Want to find out what happens when IBM Watson meets Google News, Hadoop Hive meets Excel, R meets Python, or MS Word meets MongoDB?
The Challenge
Today, we will push the limits by attempting to blend data from not just two or three, but six databases!
Database Fundamentals #6: Create a Table With the SSMS GUI
The whole idea behind databases is to store information — data — in order to look at it later. The place where that data is going to be kept is referred to as a table. A table consists of columns and each column can be defined to store a variety of specific kinds of data. You can make small tables or large ones. The columns on the tables can be made so that they have to have data in them or they can be empty. The choices are yours to make, but you need to know how to set tables up appropriately.
In addition to tables, we’re going to start learning about columns. Columns can be very generic in nature, supporting all sorts of different kinds of data, or they can be extremely specific, storing only one type of data. I’ll introduce a number of different types of data so you can begin understanding the wealth of choices that are available to you.
Database Performance Testing With Apache JMeter
Database performance testing is used to identify performance issues before deploying database applications for end users. Database load testing is used to test the database applications for performance, reliability, and scalability using varying user load. Load testing involves simulating real-life user load for the target database applications and is used to determine the behavior of the database applications when multiple users hit the applications simultaneously.
Prerequisites
Install Java Development Kit.
Install Apache JMeter.
Use Case
Let's perform database load testing to measure the performance of a database using Apache JMeter by configuring the MySQL JDBC driver.
How to Interact With a Database Using the async Module in Node.js
The first pattern we looked at in this series was the Node.js callback pattern. As I mentioned there, that pattern alone will only get you so far. Eventually, you’ll want to construct asynchronous workflows that process elements in a collection serially or run several tasks in parallel. You could write your own library, but why reinvent the wheel when you could just use async, one of the most popular Node.js libraries ever? In this post, we’ll take a look at async to see how it can help you write asynchronous code in Node.js.
async Module Overview
async is not included with Node.js, so it must be installed via NPM (Yarn and Bower work, too) using a command like npm install async --save. A native means to reason about async processing, Promise, eventually made its way into JavaScript and Node.js. We’ll cover promises in the next part of this series. For now, let’s focus on async.
Wednesday, July 26, 2017
Summer 2017 Release of the APOC Procedures Library
It’s summertime, but that doesn’t mean we’re less active building cool stuff for you to use with Neo4j.
If you haven’t heard of APOC yet (dubbed “Awesome Procedures On Cypher”), it’s a Swiss Army knife of useful utilities that make your life with Neo4j much easier. Besides the documentation, there are a number of past articles that introduce relevant parts of the APOC library.
Top Data Sources and Data Connectivity Landscape [Infographic]
Progress recently published the results from our fourth annual Data Connectivity Outlook survey. These results validate the explosive growth we’ve seen in SaaS data sources and the common challenges faced when trying to connect to data in a hybrid environment. We covered some of these results in an earlier blog post, and you can check out the infographic below for the key findings (click to view it in full size).
Check out the full report to get all the data and recommendations.
Introducing Mongoose to Your Node.js and Restify API
This post is a sequel to Getting Started With MongoDB, Node.js, and Restify. We’ll now guide you through the steps needed to modify your API by introducing Mongoose. If you have not yet created the base application, please head back and read the original tutorial.
In this post, we’ll do a deep dive into how to integrate Mongoose, a popular ODM (Object-Document Mapper) for MongoDB, into a simple Restify API. Mongoose is similar to an ORM (Object-Relational Mapper) you would use with a relational database. Both ODMs and ORMs can make your life easier with built-in structure and methods. The structure of an ODM or ORM will contain business logic that helps you organize data. The built-in methods of an ODM or ORM automate common tasks that help you communicate with the native drivers, which helps you work more quickly and efficiently.
How to Use the SQL Helper Class to Create Web APIs
The SQL Helper class is used in the Data Access Layer, which interacts with a database with the help of connection strings provided. It contains several methods, as shown below. And it improves the performance for the Business Layer and Data Access Layer.
ExecuteNonQuery
ExecuteDataset
ExecuteDataTable
ExecuteReader
ExecuteScalar
ASP.NET Web API
The ASP.NET Web API is a framework for building Web APIs on top of the .NET Framework, which makes it easy to build HTTP services for a range of clients, including mobile devices, browsers, and desktop applications.
Tuesday, July 25, 2017
How do I use AWS VM Import/Export to create an EC2 instance based on my on-premises server?
How AWS Manages Security at Massive Scale: Fireside Chat with CJ Moses, Deputy CISO AWS (119677)
Using an Impala JDBC Driver to Query Apache Kudu
Apache Kudu is a columnar storage manager for the Apache Hadoop platform that provides fast analytical and real-time capabilities, efficient utilization of CPU and I/O resources, the ability to do updates in place, and a simple, evolvable data model. You can learn more about Apache Kudu features in detail from the documentation.
One of the features of Apache Kudu is its tight integration with Apache Impala, which allows you to insert, update, delete, or query Kudu data, along with several other operations. In this tutorial, we will walk you through how to use the Progress DataDirect Impala JDBC driver to query Kudu tablets using Impala SQL syntax.
Using Cockroach With Scala: An Introduction
Today, we are going to discuss how to use Scala with CockroachDB, a distributed SQL database built on top of a transactional and consistent key-value store. Before starting the journey, you may want to check out this introduction to CockroachDB first.
Before getting started with the code, set up CockroachDB on your local environment by following these steps.
Financial Services and Neo4j: Identity and Access Management
Within the IT organization of any financial services enterprise, identities and entitlements must be managed carefully to minimize risk.
Over time, the centralized entitlements structure — whether represented in Active Directory or an LDAP directory — grows in such a way that people end up having more permissions than they need. As a result, your financial services firm is put at risk.
I Built a Query Parser Over a Weekend (Part 1)
Some tasks are fun — they are self-contained, easy to conceptualize (though not always easy to build), and challenging. A few weeks ago, I spent my weekend writing a parser, and it was a lot of fun.
I’ve been writing parsers for a very long time. My book about them came out in 2010 and I've been playing with Boo since 2005. The ANTLR book was an interesting read and it taught me a lot about how to approach text parsing.
Monday, July 24, 2017
Cassandra Design Best Practices
Cassandra is a great NoSQL product. It provides near real-time performance for designed queries and enables high availability with linear scale growth as it uses the eventually consistent paradigm.
In this post, we will focus on some best practices for this great product.
You QA Most of Your System; What About Your Database?
For virtually all development teams, testing code is a given: It's one of the most important parts of software development. Whether your organization includes a separate team devoted to QA, or your developers are testing their own code, QA is the primary way your team ensures that your application's logic is working correctly, and it's the best way for you to identify issues as early as possible.
As a result, QA is critical for engineering velocity, and it helps shape your users' overall experience when engaging with your product. Nobody likes finding a broken app or website. But what about quality assurance for a database? Do most teams apply the same QA practices to improve their data tier? Do many teams even know how to perform database QA?
Geolocation APIs in MongoDB
MongoDB is the NoSQL database known around the world for its clever document-based structure, ease of use, and flexibility. When some of the biggest companies in the world, like Forbes and Bosch, use MongoDB for their systems, you know that you are in good hands.
Unlike most other NoSQL databases, MongoDB comes with built-in geospatial indexing and search functionality, which makes it perfect for developers needing simple location based querying and map work.
Sunday, July 23, 2017
What should I do if my Virtual Interface BGP status is "DOWN" in the AWS Console?
How to Interact With a Database Using Callbacks in Node.js
Be sure to check out Part 1 first if you haven't already!
Callback functions have been around since the early days of JavaScript, but there have never been any standards for using them. How should callbacks be passed into async APIs? How should errors that occur during async processing be handled? A lack of standards led to variations in API implementations.
Saturday, July 22, 2017
How to Create a Database Seeder in Spring Boot
Spring Boot is an awesome Java web framework that is comparable to the Laravel web framework (in PHP). They both aim at making web application development fast and less rigorous for a developer.
I am a Java lover who courts Laravel due to professional requirements. The Laravel framework has a feature that lets you create database seeders, i.e., default data to be inserted into the database during application installation.
Friday, July 21, 2017
Where Do I Put ProxySQL?
In this blog post, we’ll look at how to deploy ProxySQL.
ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. It was created for DBAs by René Cannaò, as a means of solving complex replication topology issues. When bringing up ProxySQL with my clients, I always get questions about where it fits into the architecture. This post should clarify that.
How to Use the Satellite Collections in ArangoDB
With the new Version 3.2, we have introduced a new feature called Satellite Collections. This post explains what this is all about and how it can help you, and gives a concrete use case for which it is essential.
Join operations are very useful but can be troublesome in a distributed database. This is because quite often, a join operation has to bring together different pieces of your data that reside on different machines. This leads to cluster internal communication and can easily ruin query performance. As in many contexts nowadays, data locality is very important to avoid such headaches. There is no silver bullet because there will be many cases in which one cannot do much to improve data locality.
Reviewing Resin (Part 7)
Looking back at this series, I have the strong feeling that I’m being unfair to Resin. I’m judging it using the same criteria I would use to judge our own production, highly optimized code. The projects have very different goals, maturities, and environments. That said, I think that a lot of the comments I have on the project are at the implementation level. That is, they can be fixed (except maybe the analyzer/tokenizer pipeline) by simply optimizing one method at a time. Even the architectural change with analyzing the text isn’t very big. What is important is that the code is quite clear, is easy to follow, and has a well-defined structure. That means that it is actually possible to make these changes as the project matures.
And now that this is out of the way, let me cover some of the things that I would have done differently in the codebase. A lot of them are I/O-related. The usage of all those different files and the way this is done is decidedly not optimal — in particular, opening and closing of the files constantly, reading and seeking all over the place, etc. The actual design seems to be based around LSM, even if this isn't stated explicitly. And that has pretty good semantics already for writes, but reads currently are probably leaning very heavily on the file system cache, and that won’t work as the data grows beyond a certain scope.
A Look at the History of RDBMS
If you had to pick a unifying technology to bring all developers together, then you could do worse than selecting the relational database. Of course, no topic can truly unify
And, why not? We could boil software down to two core components: data and behavior. So, just as we all learn programming languages to express behavior, we also learn some means of recording and persisting our precious data.
Thursday, July 20, 2017
Caching Salesforce Data in Redis With StreamSets Data Collector
Redis is an open-source, in-memory, NoSQL database implementing a networked key-value store with optional persistence to disk. Perhaps the most popular key-value database, Redis is widely used for caching web pages, sessions and other objects that require blazingly fast access — lookups are typically in the millisecond range.
At RedisConf 2017, I presented a session called Cache All The Things! Data Integration via Jedis (slides), looking at how the open-source Jedis library provides a small, sane, easy-to-use Java interface to Redis, and how a StreamSets Data Collector (SDC) pipeline can read data from a platform such as Salesforce, write it to Redis via Jedis, and keep Redis up-to-date by subscribing to notifications of changes in Salesforce, writing new and updated data to Redis. In this blog entry, I'll describe how I built the SDC pipeline I showed during my session.
Reviewing Resin (Part 5)
How to Interact With a Database Using Various async Patterns in Node.js
It seems simple enough: Get a connection to the database, use it to do some work, then close it when you're done. But due to the asynchronous nature of Node.js, coding this sequence isn't as straightforward as it seems. There are lots of options for writing asynchronous code with Node.js, and each one requires the sequence to be coded differently. In this series, I'll provide some examples that demonstrate how to get, use, and close a connection using various async patterns.
In this parent post, I'll provide a little context on how async programming varies from traditional programming. The details of how a particular async pattern is used will be covered in its own post (see the links at the bottom).
Redshift Is 2X Faster Than BigQuery... Which Is 48X Faster Than Redshift
Every once in a while, a vendor war ignites over the question of performance. Due to the fickle and subjective nature of benchmarks, it’s quite possible for one vendor to publish results showing their product is by far the fastest, while their competitor will use the same benchmark to prove exactly the opposite. Is speed in the eye of the beholder?
Redshift vs. BigQuery: Concerns Shift to Performance
Amazon Redshift is a popular cloud-based data warehouse, and Google’s BigQuery is quickly catching up as an alternative. Both products are acclaimed for their ability to process big data at lightning speed.
Wednesday, July 19, 2017
Understanding and Managing Disk Space on Your MongoDB Server
Disk storage is a critical resource for any scalable database system. The performance of disk-based databases is dependent on how data is managed on the disk. Your MongoDB server supports various pluggable storage engines that handle the storage management. MongoDB storage engines initially store all documents sequentially. As the database grows, and multiple write operations run, this contiguous space gets fragmented into smaller blocks with chunks of free space in between. The usual solution is to increase the disk size in such situations; however, there are alternatives that can help you regain the free space without scaling the disk size. You need to be aware of MongoDB storage statistics and how you can compact or repair the database to handle fragmentation.
How Large Is Your Database, Really?
You should always keep an eye on the amount of free disk space on your production server. It would also be prudent to know your database size when you are paying for it on a cloud platform. MongoDB has a command db.stats() that can provide insights into the storage statistics of a MongoDB instance.
Database Fundamentals #5: Database Properties
Don’t let the ease of creating databases lull you into a false sense of security. They actually can be very complicated. You can modify and individualize their behavior within your server so that different databases behave in radically different ways. The best way to see all the different manipulations you can make is by opening the New Database window by right-clicking on the Databases folder within the Object Explorer window, assuming you’re already connected to the server.
Don’t bother typing anything into the first page. Click on the Options tab on the left side of the window. You’ll see a screen that should look very similar to this:
Migrating From MongoDB to DynamoDB
Persisting data is at the heart of the majority of web services today. The choice of a database system is one of the most important decisions you will make when selecting elements of your stack. Database technology, once selected, is one of the hardest to replace once the system is in production.
This post is about ditching MongoDB and moving to DynamoDB as part of our ongoing evolution of the Auth0 Extend product. I will cover the why and how, and share solutions to some of the challenges of this transition.
Reviewing Resin (Part 4)
Be sure to check out Part 1, Part 2, and Part 3 first!
In the previous part, I looked at UpsertTransaction in Resin and speculated about how the queries work. In this one, I’m going to try to figure out how queries work. Our starting point is this:
Tuesday, July 18, 2017
And So the NoSQL Bloodletting Begins…
I'm going to discuss the most likely survivors from the NoSQL movement.
It all started so well. A myriad of products to answer data management needs over any structure or query plan you could possibly want. A rich ecosystem of databases to choose from has sprung up from the NoSQL community since 2005.
Data Flow Pipeline Using StreamSets
StreamSets Data Collector — an open-source, lightweight, powerful engine — is used to stream data in real time. It is a continuous big data ingest and enterprise-grade infrastructure used to route and process data in your data streams. It accelerates time to analysis by bringing unique transparency and processing to data in motion.
In this blog, let's discuss generating a data flow pipeline using StreamSets.
Customizing My Postgres Shell
As a developer, your CLI is your home. You spend a lifetime of person-years in your shell, and even small optimizations can pay major dividends to your efficiency. If you work with Postgres (and likely the PSQL editor), you should consider investing some love and care into PSQL. A little-known fact is that PSQL has a number of options you can configure, and these configuration options can all live within an RC file called .psqlrc in your home directory. Here is my .psqlrc file, which I’ve customized to my liking. Let’s walk through some of the commands within my .psqlrc file:
First, you see that we set QUIET 1. This makes it less noisy when we start up. We also unset QUIET at the end so it’s back to a standard psql shell later.
Writing Mocks With WireMock and CSV Extension
If you are currently working on a project where several modules will have to communicate but they do not exist yet, then you may have to mock the communications.
And if you are using REST APIs, several tools might do the trick to provide a mocking server.
Monday, July 17, 2017
How Retail Insights, LLC Used Alert Logic to Meet Compliance Mandates and Enhance Security on AWS
AWS Knowledge Center Videos: Do I need to set a static IP address on an EC2 instance?
Generating Millions of Rows in SQL Server [Code Snippets]
We often need to generate and insert many rows into a SQL Server table, for example, for testing purposes or performance tuning. It can be useful to imitate production volume in the testing environment or to check how our queries behave when challenged with millions of rows.
Below please find an example of code used for generating primary key columns, random ints, and random nvarchars in the SQL Server environment.
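The article's own T-SQL is not reproduced in this excerpt. As a separate, hedged sketch of the same idea (a sequential key plus random integers and random strings), the following Python uses the standard `sqlite3` module as a stand-in for SQL Server; the table name, column names, row count, and seed are assumptions chosen for illustration.

```python
import random
import sqlite3
import string


def generate_rows(count, seed=42):
    """Yield (key, random integer, random text) tuples.

    A seeded generator keeps the test data reproducible between runs.
    """
    rng = random.Random(seed)
    for key in range(1, count + 1):
        random_int = rng.randint(0, 1_000_000)
        random_text = "".join(rng.choices(string.ascii_letters, k=20))
        yield key, random_int, random_text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_data ("
    "id INTEGER PRIMARY KEY, random_int INTEGER, random_text TEXT)"
)

# One transaction with a streamed batch insert avoids per-row commits
# and never holds all rows in memory at once.
with conn:
    conn.executemany(
        "INSERT INTO test_data VALUES (?, ?, ?)", generate_rows(100_000)
    )

row_count = conn.execute("SELECT COUNT(*) FROM test_data").fetchone()[0]
print(row_count)
```

Raising the count to several million only changes the argument to `generate_rows`; the rows are still produced lazily.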
Creating a Sandbox for Learning Node.js and Oracle Database
With Oracle Database 12.2 and Node.js 8 now available, this is a great time to create a local sandbox for learning. Thanks to some prebuilt VMs provided by Oracle, you can have such an environment up and running in less than 20 minutes (excluding download times) without spending a dime!
In this post, I’ll walk you through the creation of such a sandbox. Here’s an overview of what we’ll be working through:
The Ugly of Event Sourcing: Projection Schema Changes
Event sourcing is a beautiful solution for high-performance or complex business systems, but you need to be aware that it also introduces challenges most people don't tell you about. Last year, I blogged about the things I would do differently next time. But after attending another introductory presentation about event sourcing recently, I realized it is time to talk about some real experiences. So in this multi-part post, I will share the good, the bad, and the ugly to prepare you for the road ahead. After having dedicated the last posts on the pains of wrongly designed aggregates, it is time to talk about the ugliness of dealing with projection schema changes.
As I explained in the beginning of this series, projections in event sourcing are a very powerful concept that provides ample opportunities to optimize the performance of your system. However, as far as I'm concerned, they also offer you the most painful challenges. Projections are great if their structure or the way they interpret event streams don't change. But as soon as any of these change, you'll be faced with the problem of increasing rebuild times. The bigger your database becomes, the longer rebuilding will take. And considering the nature of databases, this problem tends to grow non-linearly. Over the years, we've experimented and implemented various solutions to keep this process to a minimum.
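Why a projection schema change forces a rebuild can be seen in a small sketch. This is an illustration only, not the author's system: the event types, account names, and both projection functions are hypothetical, and a Python list stands in for the event store.

```python
from collections import defaultdict

# Hypothetical event stream; in a real system this lives in the event store.
EVENTS = [
    {"type": "MoneyDeposited", "account": "A", "amount": 100},
    {"type": "MoneyWithdrawn", "account": "A", "amount": 30},
    {"type": "MoneyDeposited", "account": "B", "amount": 50},
    {"type": "MoneyDeposited", "account": "A", "amount": 5},
]


def project_balances(stream):
    """Original projection: one balance per account."""
    balances = defaultdict(int)
    for event in stream:
        if event["type"] == "MoneyDeposited":
            balances[event["account"]] += event["amount"]
        elif event["type"] == "MoneyWithdrawn":
            balances[event["account"]] -= event["amount"]
    return dict(balances)


def project_balances_v2(stream):
    """Changed schema: balance plus a transaction count per account.

    The count cannot be derived from the old projection's rows, so the
    whole stream has to be replayed from the first event.
    """
    view = defaultdict(lambda: {"balance": 0, "transactions": 0})
    for event in stream:
        if event["type"] == "MoneyDeposited":
            sign = 1
        elif event["type"] == "MoneyWithdrawn":
            sign = -1
        else:
            continue
        row = view[event["account"]]
        row["balance"] += sign * event["amount"]
        row["transactions"] += 1
    return dict(view)


print(project_balances(EVENTS))
print(project_balances_v2(EVENTS))
```

Both functions visit every event once, so the rebuild work in this sketch grows at least with the length of the stream, before any database overhead is counted.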
Sunday, July 16, 2017
Introduction to Target Tracking Scaling Policies for Auto Scaling - Dynamic Scaling on AWS
Using a Cuckoo Filter for Unique Relationships
We often see a pattern in Neo4j applications where a user wants to create one and only one relationship between two nodes. For example, a user follows another user on a social network. We don’t want to accidentally create a second follows relationship because that may create errors such as duplicate entries on their feed, errors unfollowing or blocking them, or even skew recommendation algorithms. Also, it is just plain wasteful, and while an occasional duplicate relationship won’t be a big deal, millions of them could be.
So how do we deal with this?
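One way to picture the approach named in the title is the sketch below. It is not the article's implementation: the `CuckooFilter` class, the `follow` helper, the key format, and the use of a Python set in place of the graph are all assumptions. The filter answers "definitely not seen" or "possibly seen", so the expensive check against the real store is only needed in the second case.

```python
import hashlib
import random


class CuckooFilter:
    """Minimal cuckoo filter: stores short fingerprints in one of two buckets."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self._rng = random.Random(seed)

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        return self._hash(b"fp:" + item.encode()) & 0xFFFF  # 16-bit fingerprint

    def _index(self, item):
        return self._hash(item.encode()) & self.mask

    def _alt_index(self, index, fingerprint):
        # XOR with the fingerprint's hash; applying it twice returns `index`.
        return (index ^ self._hash(fingerprint.to_bytes(2, "big"))) & self.mask

    def might_contain(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def add(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = self._rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self._rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter is too full; the last evicted fingerprint is lost


def relationship_key(follower, followed):
    return f"{follower}-FOLLOWS->{followed}"


def follow(cuckoo, relationships, follower, followed):
    """Create the relationship only if it does not already exist."""
    key = relationship_key(follower, followed)
    # Only a "possibly seen" answer requires the expensive exact check.
    if cuckoo.might_contain(key) and (follower, followed) in relationships:
        return False
    relationships.add((follower, followed))
    cuckoo.add(key)
    return True
```

A short usage example: calling `follow(f, rels, "alice", "bob")` twice with `f = CuckooFilter()` and `rels = set()` creates the relationship once and rejects the second attempt. False positives from the filter only cost an extra exact check; they never cause a relationship to be skipped.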
Saturday, July 15, 2017
Reviewing Resin (Part 3)
Be sure to check out Part 1 and Part 2 first! In the last post, I started looking at UpsertTransaction, but got sidetracked into the utils functions. Let's focus back on this. The key parts of UpsertTransaction are:
Let's see what they are. DocumentStream is the source of the documents that will be written in this transaction. Its job is to get the documents to be indexed, to give them a unique ID if they don’t already have one, and to hash them.
Friday, July 14, 2017
How I Incorrectly Fetched JDBC ResultSets. Again.
You know JDBC, right? It’s that really easy, concise API that we love to use to work with virtually any database, relational or not. It has essentially three types that you need to care about:
Connection
Statement (and its subtypes)
ResultSet
All the other types are some sort of utilities.
How to Store Money in SQL Server
Today, I would like to present another intriguing and challenging topic in SQL. When constructing databases or data warehouses, it may be required to store financial figures like amounts in currency or FX rates in SQL Server.
Let’s take a look at the tools at our disposal. Microsoft provides us with Exact Numerics and Approximate Numerics.
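The difference between exact and approximate numerics matters for money, and it can be demonstrated outside SQL Server. The sketch below uses Python's `float` as a stand-in for an approximate type and `decimal.Decimal` as a stand-in for an exact one; the amounts, the FX rate, and the rounding mode are assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Approximate arithmetic: binary floating point cannot represent 0.10 exactly,
# so three dimes do not add up to exactly thirty cents.
approx_total = sum([0.10] * 3)
approx_matches = approx_total == 0.30

# Exact arithmetic: decimal values are stored digit by digit.
exact_total = sum([Decimal("0.10")] * 3, Decimal("0"))
exact_matches = exact_total == Decimal("0.30")

# Hypothetical FX conversion: keep full precision while multiplying,
# then round once, explicitly, to the currency's two decimal places.
amount = Decimal("100.00")
fx_rate = Decimal("1.17465")
converted = (amount * fx_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print(approx_matches, exact_matches, converted)
```

The design point is that rounding happens once, at a place and with a rule you choose, rather than silently in every intermediate calculation.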
Lessons From FedEx to the Data Center Industry
In 1965, Yale undergrad Fred Smith’s term paper advanced the then unheard of idea of an overnight, aviation delivery service. But the paper was called unrealistic by his professor and the young student received a C grade. Undaunted, Smith held the idea close and upon returning from two tours in Vietnam with the Marine Corps, he secured an astonishing $91 million in funding and launched Federal Express.
Much of Smith’s story is the stuff of entrepreneurial legend — from thumbing his nose at a skeptical professor to saving the company from bankruptcy with blackjack winnings in Vegas. How did the tenacious entrepreneur build one of the most reputable brands in the world? Through consistently out-innovating his competitors.
Reducing Memory Usage by Apache and MySQL in a 512MB VPS
With the default Apache and MySQL configurations, you end up with far too many idle Apache threads, and MySQL is the largest memory user by far. The defaults for these two max out the 512MB available on my VPS:
/etc/apache2$ ps -eo pmem,pcpu,rss,vsize,args | sort -k 1 -r
%MEM %CPU   RSS    VSZ COMMAND
 9.5  0.0 49904 720256 /usr/sbin/mysqld
 6.7  0.0 35620 286860 /usr/sbin/apache2 -k start
 6.5  0.0 34452 283524 /usr/sbin/apache2 -k start
 6.2  0.0 32692 283012 /usr/sbin/apache2 -k start
 5.9  0.0 31276 283116 /usr/sbin/apache2 -k start
 5.8  0.0 30896 282652 /usr/sbin/apache2 -k start
 5.0  0.0 26724 282588 /usr/sbin/apache2 -k start
 4.8  0.0 25204 279552 /usr/sbin/apache2 -k start
 4.8  0.0 25200 279552 /usr/sbin/apache2 -k start
 4.7  0.0 25156 279508 /usr/sbin/apache2 -k start
 4.4  0.0 23216 279540 /usr/sbin/apache2 -k start
 2.8  0.0 15136 278400 /usr/sbin/apache2 -k start
 0.7  0.0  3968  90908 sshd: myuser [priv]
 0.4  0.0  2564  61312 /usr/sbin/sshd -D
 0.4  0.0  2264  33188 init
 0.3  0.0  2060  18128 -bash
...
/etc/apache2$ free
             total       used       free     shared    buffers     cached
Mem:        524288     476496      47792      68220          0     260764
-/+ buffers/cache:     215732     308556
Swap:            0          0          0
Following the tips in this article, I reduced the number of Apache threads, and now we’re at:
Thursday, July 13, 2017
Differences in PREPARE Statement Error Handling With Binary and Text Protocol
In this blog, we’ll look at the differences in how a PREPARE statement handles errors in binary and text protocols.
Since Percona XtraDB Cluster is a multi-master solution, when an application executes conflicting workloads, one of the workloads gets rolled back with a DEADLOCK error. While the same holds true even if you fire the workload through a PREPARE statement, there are differences between using the MySQL connector API (with binary protocol) and the MySQL client (with text protocol). Let’s look at these differences with the help of an example.
Reviewing Resin (Part 2)
In the first part of this series, I looked into how Resin is tokenizing and analyzing text. I’m still reading the code from the tests (this is because the Tests folder sorted higher than the Resin folder, basically) and I've now moved to the second file, CollectorTests.
That one has a really interesting start:
NULL in SQL
NULL in SQL is a very interesting creature. First of all, it is important to understand that NULL is not a value, so the expression "null value" is incorrect. NULL is a mark for a missing value but it is not a value itself.
Let me illustrate this with some examples created in SQL Server.
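As a small self-contained illustration of the same point, the sketch below runs a few comparisons against an in-memory SQLite database from Python. SQLite is used here only because it needs no server, which is an assumption of this sketch and not the SQL Server setup the post refers to. The table name `t` and column `x` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing NULL with "=" yields NULL (unknown), not true or false.
# Only IS NULL gives a definite answer, and arithmetic with NULL stays NULL.
row = conn.execute("SELECT NULL = NULL, NULL IS NULL, 1 + NULL").fetchone()
print(row)  # (None, 1, None)

# Hypothetical table with one missing entry.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (None,), (3,)])

# "x = NULL" never matches a row; "x IS NULL" finds the missing entry.
eq_count = conn.execute("SELECT COUNT(*) FROM t WHERE x = NULL").fetchone()[0]
is_count = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]
print(eq_count, is_count)  # 0 1
```

Because NULL marks a missing value, an equality test against it cannot be evaluated to true, which is why the first count is zero even though a row with a missing `x` exists.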
The Popularity of Cloud-Based DBMSs Increased Tenfold in 4 Years
Cloud-based database management systems are still relatively exotic data storage solutions, adding up to only 1.6% of the popularity of the entire DBMS market.
This is, however, a tenfold increase in the last four years.
Wednesday, July 12, 2017
Introduction to Dynamic SQL
The idea of using dynamic SQL is to execute SQL that will potentially generate and execute another SQL statement. While querying data, you might want to dynamically set columns you would like to query. On the other hand, you might want to parametrize tables on which you want to operate.
The first idea one might come up with is to use variables and set them as required column names or table names. However, such an approach is not supported by T-SQL.
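The same restriction shows up in most client APIs: placeholders can bind values, but not column or table names. A common way around it is to build the statement text dynamically and check every identifier against a known list before executing it. The sketch below shows that idea in Python with SQLite, not T-SQL. The `SCHEMA` dictionary, the `products` table, and the `build_select` helper are hypothetical names used only for illustration.

```python
import sqlite3

# Hypothetical allow-list: table name -> columns that may be selected.
SCHEMA = {"products": {"id", "name", "price"}}


def build_select(table: str, columns: list) -> str:
    """Build a SELECT whose identifiers are validated before being interpolated."""
    if table not in SCHEMA:
        raise ValueError(f"unknown table: {table!r}")
    unknown = [c for c in columns if c not in SCHEMA[table]]
    if not columns or unknown:
        raise ValueError(f"unknown or missing columns: {unknown!r}")
    column_list = ", ".join(f'"{c}"' for c in columns)
    # Identifiers are interpolated; the value stays a bound parameter.
    return f'SELECT {column_list} FROM "{table}" WHERE price >= ?'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)", [(1, "pen", 2.5), (2, "bag", 30.0)]
)

sql = build_select("products", ["name", "price"])
rows = conn.execute(sql, (5,)).fetchall()
print(sql)   # SELECT "name", "price" FROM "products" WHERE price >= ?
print(rows)  # [('bag', 30.0)]
```

Validating identifiers against an allow-list matters because the generated statement is plain text, so anything interpolated into it is executed as SQL.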
Actorbase, or the Persistence Chaos
Everybody out there is talking about big data, NoSQL databases, reactive programming, and so on. There are a lot of buzzwords that are constantly used in this era and those are only some of them.
The idea I will describe to you in a moment is something that I've been thinking about for a couple of years. My busy life leaves me very little time for side projects outside of work, so I decided to let some other people try to turn my idea into a real thing.
Reviewing Resin (Part 1)
Resin is a “cross-platform document database and search engine with a query language, API, and CLI.” It is written in C#, and while I admit that reading C# code isn’t as challenging as diving into a new language, a project that takes a completely new approach to a subject that is near and dear to my heart is always welcome. It is also small, coming in at about 6,500 lines of code, which makes for quick reading.
I’m reviewing commit ddbffff88995226fa52236f6dd6af4a48c833f7a.
Is Your Database Wasting the Ephemeral Drive?
If you are running in a VM or a container, you get the following types of storage:
Network-attached durable storage. Even if your VM or container moves from one physical host to another, your drive is guaranteed to follow without losing committed data. This is typically what all databases use for data on disk. A downside is that it's network-attached and thus, a regular disk write is a network plus disk write.
Tuesday, July 11, 2017
Key Considerations for a Cloud Data Warehouse
Data growth and diversity have put new pressures on traditional data warehouses, resulting in a slew of new technology evaluations. The data warehouse landscape offers a variety of options, including popular cloud solutions that offer pay-as-you-go pricing in a package that is easy to use and easy to scale. Here are some considerations to help you select the best cloud data warehouse.
First, Identify Your Use Case
A cloud data warehouse supports numerous use cases for a variety of business needs. Here are some common use cases along with the notable capabilities required for each.
I Deleted My MySQL Database
I just deleted my primary MySQL database. Of course, I backed up everything, but it is the first time since 2011 I’ve cleaned up my entire database backend to the point where I could delete the entire instance (with confidence). I was motivated to do this mostly because I couldn’t downsize the AWS RDS instance to a smaller instance due to a variety of constraints. The situation gave me the opportunity to clean house and to rethink my next moves.
Instead of setting up a new MySQL instance, I went with the new MySQL-compatible Amazon Aurora. I set up a smaller, more affordable instance and easily imported the database backups from my previous setup. I now have a cleaner, more modern Amazon Aurora setup that, as Amazon claims, “provides up to five times better performance than MySQL with the security, availability, and reliability of a commercial database at one-tenth the cost.” Time will tell…
MongoDB Indexing Types: How, When and Where Should They Be Used?
In this blog post, we will talk about MongoDB indexing and the different types of indexes that are available in MongoDB.
MongoDB is a NoSQL database that is document-oriented. NoSQL databases share many features with relational databases, and one of them is indexes. The question is, how are such documents indexed in the database?
Financial Services and Neo4j: Network and IT Infrastructure Monitoring
Discovering, capturing, and making sense of complex interdependencies is central to managing IT infrastructure more effectively, and it is also a critical part of running the businesses IT serves.
Whether it’s optimizing a network or an application infrastructure, managing change, or providing more effective security-related access, more often than not, these problems involve a complex set of physical and human interdependencies that can be quite challenging to manage.
Monday, July 10, 2017
Sixgill Increases System Performance by Hundreds of Percent Using ScaleArc and AWS Marketplace
MySQL Sharding DevOps Challenges
Previously, we’ve discussed application and design challenges for MySQL sharding, and some of the corresponding business challenges that can result and affect your business flexibility. But what about MySQL sharding DevOps challenges?
For reference, here’s a quick précis about MySQL sharding: MySQL sharding is the strategy of partitioning your MySQL application workload across multiple different MySQL database servers, allowing queries and data CRUD operations to fan out. This works around MySQL’s single write-master architecture, providing the ability to scale out both writes and reads, albeit with tradeoffs. This is a big DevOps project.
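As a rough illustration of what "partitioning the workload across multiple servers" means at the application level, here is a minimal hash-based routing sketch in Python. It is only one of several possible sharding schemes and is not taken from the article. The host names in `SHARDS` and the function `shard_for` are hypothetical.

```python
import hashlib

# Hypothetical shard hosts; a real deployment would load these from configuration.
SHARDS = [
    "mysql-shard-0.example.internal",
    "mysql-shard-1.example.internal",
    "mysql-shard-2.example.internal",
]


def shard_for(key: str, shards: list = SHARDS) -> str:
    """Map a sharding key (for example, a user ID) to one shard host."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce it modulo the shard count.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Every read and write for the same key is routed to the same server.
print(shard_for("user-42") == shard_for("user-42"))  # True
```

The sketch also hints at one of the tradeoffs: with plain modulo routing, changing the number of shards remaps most keys, so adding a server means moving data between shards.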
How to Use a Backup to Start a Secondary Instance for MongoDB
In this blog post, I’ll look at how you can use a backup to start a secondary instance for MongoDB.
Although the documentation says it is not possible to use a backup to start a secondary, sometimes this is the only possible way to start a new instance. In this blog post, we will explain how to bypass this limitation and use a backup to start a secondary instance.
Getting Started With Apache Ignite (Part 5)
This is the fifth article in this blog series. I will focus this time on the support for a distributed SQL database in Apache® Ignite™.
Distributed SQL Database
Today, SQL is still a very popular language for data definition, data manipulation, and querying in database management systems. Although often associated with relational database systems, it is now used far more widely, with many non-relational database systems also supporting SQL to varying degrees. Furthermore, there is a huge market for a wide range of SQL-based tools that can provide visualization, reports, and business intelligence. These use standards such as ODBC and JDBC to connect to data sources.