Apache Zeppelin Installation Guide on CentOS 7

Apache Zeppelin is a web-based, open-source notebook for interactive data ingestion, exploration, analytics, and visualization. Through its interpreter system it supports more than 20 languages and backends, including Apache Spark, SQL, R, and Elasticsearch. A notebook combines code, results, and visualizations in a single data-centric document, so you can see the outcome of an analysis as soon as you run it.

Requirements

  • A CentOS 7 server
  • User access with sudo privileges
  • A domain name configured to point to the server

In this guide, we use zeppelin.example.com as a placeholder domain pointing to the instance. Be sure to replace this with your actual domain wherever applicable.

Before beginning, make sure your system is updated. Refer to the guide How to Update CentOS 7. Once updated, proceed to install Java.

Installing Java

Since Apache Zeppelin is Java-based, it requires the JDK to function. Begin by downloading the Oracle SE JDK RPM package.

wget --no-cookies --no-check-certificate --header "Cookie:oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u151-b12/e758a0de34e24606bca991d704f6dcbf/jdk-8u151-linux-x64.rpm"

Now, install the RPM package you downloaded.

sudo yum -y localinstall jdk-8u151-linux-x64.rpm

After the installation succeeds, verify it by checking the Java version.

java -version

Expected output:

[user@centron ~]$ java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
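
When scripting this check, keep in mind that java -version writes to standard error rather than standard output. The following is a minimal sketch of a helper that extracts the version string from that output; the function name java_version_from_output is hypothetical and not part of the original guide.

```shell
# Hypothetical helper: read `java -version` output on stdin and print
# the quoted version string (for example 1.8.0_151).
java_version_from_output() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (java -version prints to stderr, so redirect it into the pipe):
#   java -version 2>&1 | java_version_from_output
```

The helper only parses text, so it can be tried on saved output without Java being installed.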

Setting JAVA_HOME and JRE_HOME

Before continuing, configure the environment variables JAVA_HOME and JRE_HOME. First, determine the full path to the Java executable.

readlink -f $(which java)

You should get output similar to the following:

[user@centron ~]$ readlink -f $(which java)
/usr/java/jdk1.8.0_151/jre/bin/java

Next, export the environment variables based on the Java path.

echo "export JAVA_HOME=/usr/java/jdk1.8.0_151" >> ~/.bash_profile
echo "export JRE_HOME=/usr/java/jdk1.8.0_151/jre" >> ~/.bash_profile
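
Both values follow from the readlink output: JAVA_HOME is that path with the trailing /jre/bin/java removed, and JRE_HOME is JAVA_HOME plus /jre. A minimal sketch of that derivation, assuming the Oracle JDK 8 directory layout shown above; the function name java_home_from_path is hypothetical.

```shell
# Hypothetical helper: derive JAVA_HOME from the resolved path of the
# java binary. Assumes a JDK 8 style layout (<home>/jre/bin/java) and
# also handles a plain <home>/bin/java path.
java_home_from_path() {
  java_bin=$1
  home=${java_bin%/bin/java}   # strip the trailing /bin/java
  home=${home%/jre}            # strip a trailing /jre if present
  printf '%s\n' "$home"
}

# Usage:
#   JAVA_HOME=$(java_home_from_path "$(readlink -f "$(which java)")")
#   JRE_HOME=$JAVA_HOME/jre
```

Deriving the paths this way avoids hard-coding the JDK version number, which changes with every Java update.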

Apply the changes by sourcing the bash profile.

source ~/.bash_profile

Finally, verify that the JAVA_HOME variable is set correctly.

echo $JAVA_HOME

[user@centron ~]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_151

Installing Apache Zeppelin

Apache Zeppelin comes bundled with all necessary dependencies in its binary package, making Java the only external requirement. Begin by downloading the Zeppelin binary to your server. The most recent version can be located on the official Zeppelin download page.

wget http://www-us.apache.org/dist/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz

Unpack the downloaded archive using the following command:

sudo tar xf zeppelin-*-bin-all.tgz -C /opt

The contents will be extracted into the /opt/zeppelin-0.7.3-bin-all directory. To simplify usage, rename the folder as shown below:

sudo mv /opt/zeppelin-*-bin-all /opt/zeppelin
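
The wildcard in the commands above works as intended only when exactly one directory matches it. If several Zeppelin versions have been extracted into /opt, the rename no longer does what is intended. The following sketch resolves the wildcard first and fails unless there is exactly one match; the function name find_zeppelin_dir is hypothetical.

```shell
# Hypothetical helper: print the single directory matching
# <parent>/zeppelin-*-bin-all, or fail if there are zero or several.
find_zeppelin_dir() {
  parent=$1
  # Expand the wildcard into the function's positional parameters.
  set -- "$parent"/zeppelin-*-bin-all
  if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
    echo "expected exactly one zeppelin-*-bin-all directory in $parent" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Usage:
#   dir=$(find_zeppelin_dir /opt) && sudo mv "$dir" /opt/zeppelin
```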

Apache Zeppelin is now installed. You could launch the application immediately, but by default it binds only to localhost. The next sections configure it as a Systemd service and enable authentication; an Nginx reverse proxy for zeppelin.example.com can be added afterwards.

Setting up Zeppelin as a Systemd Service

Next, configure a Systemd unit file for Zeppelin so that it starts automatically at boot and restarts if the process fails.

First, create a non-root user dedicated to running Zeppelin:

sudo adduser -d /opt/zeppelin -s /sbin/nologin zeppelin

Assign the ownership of the Zeppelin files to this user:

sudo chown -R zeppelin:zeppelin /opt/zeppelin

Create the Systemd service unit file for Zeppelin:

sudo nano /etc/systemd/system/zeppelin.service

Insert the following content into the file:

[Unit]
Description=Zeppelin service
After=syslog.target network.target

[Service]
Type=forking
ExecStart=/opt/zeppelin/bin/zeppelin-daemon.sh start
ExecStop=/opt/zeppelin/bin/zeppelin-daemon.sh stop
ExecReload=/opt/zeppelin/bin/zeppelin-daemon.sh reload
User=zeppelin
Group=zeppelin
Restart=always

[Install]
WantedBy=multi-user.target
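
If you prefer not to edit the file interactively, the same unit can be written from a script. This is only a sketch: the function name write_zeppelin_unit is hypothetical, and the content is the unit shown above, unchanged.

```shell
# Hypothetical helper: write the Zeppelin unit shown above to the given path.
write_zeppelin_unit() {
  cat > "$1" <<'EOF'
[Unit]
Description=Zeppelin service
After=syslog.target network.target

[Service]
Type=forking
ExecStart=/opt/zeppelin/bin/zeppelin-daemon.sh start
ExecStop=/opt/zeppelin/bin/zeppelin-daemon.sh stop
ExecReload=/opt/zeppelin/bin/zeppelin-daemon.sh reload
User=zeppelin
Group=zeppelin
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

# Usage: write to a temporary file, then install it with root privileges.
#   unit=$(mktemp) && write_zeppelin_unit "$unit"
#   sudo install -m 644 "$unit" /etc/systemd/system/zeppelin.service
#   sudo systemctl daemon-reload
```

Running systemctl daemon-reload makes Systemd pick up the unit file again after any later modification.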

Start the Zeppelin service using:

sudo systemctl start zeppelin

Enable the service to auto-start during system boot:

sudo systemctl enable zeppelin

To confirm whether Zeppelin is active, run the following status check:

sudo systemctl status zeppelin

Disable Anonymous Access in Apache Zeppelin

To prevent default anonymous access, begin by copying the template configuration file to its active location.

cd /opt/zeppelin
sudo cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml

Next, open the configuration file for editing:

sudo nano conf/zeppelin-site.xml

Locate the following property section within the file:

<property>
  <name>zeppelin.anonymous.allowed</name>
  <value>true</value>
  <description>Anonymous user allowed by default</description>
</property>

To deactivate anonymous access, change the value from true to false.
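
The same change can be scripted instead of made in an editor. The sketch below is an illustration rather than part of the original guide: the function name disable_anonymous_access is hypothetical, and it assumes the value element sits on the line directly after the name element, as in the template.

```shell
# Hypothetical helper: set zeppelin.anonymous.allowed to false in the given
# zeppelin-site.xml. Assumes <value> is on the line right after <name>.
disable_anonymous_access() {
  site_xml=$1
  tmp=$(mktemp) || return 1
  # On the line with the property name, read the next line (n) and
  # replace true with false on that line only.
  sed '/<name>zeppelin\.anonymous\.allowed<\/name>/{
n
s|<value>true</value>|<value>false</value>|
}' "$site_xml" > "$tmp" && cat "$tmp" > "$site_xml"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (from a shell that may write to the file):
#   disable_anonymous_access /opt/zeppelin/conf/zeppelin-site.xml
```

Writing the result back with cat, rather than moving the temporary file into place, keeps the owner and permissions of the original file.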

Activate Shiro Authentication

After disabling anonymous access, an authentication mechanism is required to permit access to authorized users. Apache Zeppelin employs Apache Shiro for this purpose. Start by copying the Shiro configuration template:

sudo cp conf/shiro.ini.template conf/shiro.ini

Then open the Shiro configuration file for editing:

sudo nano conf/shiro.ini

Inside the file, locate the following block under the [users] section:

[users]

admin = password1, admin
user1 = password2, role1, role2
user2 = password3, role3
user3 = password4, role2

This section defines user credentials, including usernames, passwords, and assigned roles. For this setup, we will only use admin and user1. Modify their passwords to more secure alternatives, and disable the other users by commenting them out. You are also free to customize usernames and roles. For a detailed understanding, refer to the Shiro authorization guide.

After making the necessary changes, the updated section should look like this:

[users]

admin = StrongPassword, admin
user1 = UserPassword, role1, role2
# user2 = password3, role3
# user3 = password4, role2
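
Commenting out the unused accounts can also be scripted. The following is a minimal sketch under stated assumptions: the function name disable_shiro_users is hypothetical, each entry is assumed to start at the beginning of a line in the form "name = ...", and user names are assumed to contain no regular-expression metacharacters.

```shell
# Hypothetical helper: comment out the given user names in a shiro.ini file.
# Usage: disable_shiro_users <file> <user>...
disable_shiro_users() {
  ini=$1
  shift
  for user in "$@"; do
    tmp=$(mktemp) || return 1
    # Prefix "name =" at the start of a line with "# ".
    sed "s/^\(${user}[[:space:]]*=\)/# \1/" "$ini" > "$tmp" && cat "$tmp" > "$ini"
    status=$?
    rm -f "$tmp"
    [ "$status" -eq 0 ] || return "$status"
  done
}

# Usage (from a shell that may write to the file):
#   disable_shiro_users /opt/zeppelin/conf/shiro.ini user2 user3
```

Passwords still need to be changed by hand, since the replacement values are yours to choose.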

Finally, restart the Zeppelin service to apply the new authentication settings:

sudo systemctl restart zeppelin

After restarting, authentication will be active, and users will be prompted to log in using the credentials configured in the shiro.ini file.

Conclusion

With the above steps completed, Apache Zeppelin is now installed, secured, and configured to run as a service with proper authentication. You’ve disabled anonymous access, implemented Shiro-based user management, and ensured that Zeppelin starts automatically on system boot. From here, you can proceed with additional configurations, such as integrating Nginx as a reverse proxy or connecting Zeppelin with data processing backends like Apache Spark or Elasticsearch. This setup forms a solid foundation for building interactive, collaborative data analytics workflows on your CentOS 7 server.

Source: vultr.com
