Big data

How To Spin Up a Hadoop Cluster with cloud servers — step-by-step Linux tutorial on Progressive Robot

How To Spin Up a Hadoop Cluster with cloud servers

This tutorial will cover setting up a Hadoop cluster on cloud servers. The Hadoop software library is an Apache framework that lets you process large data sets in a distributed way across server clusters by leveraging basic programming models…

What is Big Data?

Big data is a blanket term for the non-traditional strategies and technologies needed to organize, process, and gather insights from large datasets. Many users and organizations are turning to big data for certain types of workloads, using it to supplement their existing analysis and business tools. Tools in this space offer different options for ingesting data into a system, storing it, analyzing it, and working with it through visualizations.

An Introduction to Hadoop

Apache Hadoop is one of the earliest and most influential open-source tools for storing and processing the massive amount of readily available digital data that has accumulated with the rise of the World Wide Web. It evolved from a project called Nutch, which attempted to find a…

Apache Spark Example: Word Count Program in Java

Apache Spark is an open-source data processing framework that can perform analytic operations on big data in a distributed environment. It began as an academic project, started by Matei Zaharia at UC Berkeley’s AMPLab in 2009. Apache Spark was created on top of a cluster management tool […]
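To make the word count idea concrete, here is a minimal single-machine sketch in plain Java. It assumes only the JDK and has no Spark dependency, so it is not the tutorial's Spark program. The class `LocalWordCount` and the method `countWords` are hypothetical names chosen for illustration. The sketch shows the two steps a word count performs. First, each line is split into words and each word is paired with a count of 1. Second, pairs with the same word are merged by summing. A framework like Spark spreads these same steps across a cluster.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical local (non-distributed) word count, for illustration only.
public class LocalWordCount {

    public static Map<String, Long> countWords(String... lines) {
        return Arrays.stream(lines)
                // Split step: break each lowercased line into words on non-word characters.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // Drop empty tokens (produced when a line starts with a non-word character).
                .filter(word -> !word.isEmpty())
                // Count step: pair each word with 1 and sum the values for identical words.
                // TreeMap keeps the result sorted alphabetically by word.
                .collect(Collectors.toMap(Function.identity(), word -> 1L, Long::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords("the quick brown fox", "The lazy dog and the fox");
        counts.forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

Running `main` prints each distinct word with its count in alphabetical order. For example, the sample input yields `the: 3` and `fox: 2`. A `TreeMap` keeps the printed order deterministic, which a `HashMap` would not guarantee.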

How To Install and Use ClickHouse on Debian 9

ClickHouse is an open-source, column-oriented analytics database created by [Yandex](https://yandex.com) for OLAP and big data use cases. In this tutorial, you’ll install the ClickHouse database server and client on your machine. You’ll use the DBMS for typical tasks and optionally enable remote access so that you can connect to the database from another machine.

How To Install and Use ClickHouse on Ubuntu 20.04

ClickHouse is an open-source, column-oriented analytics database created by Yandex for OLAP and big data use cases. In this tutorial, you’ll install the ClickHouse database server and client on your machine. You’ll use the DBMS for typical tasks and optionally enable remote access so that you can connect to the database from another machine. Then you’ll test ClickHouse by modeling and querying example website-visit data.