Apache Zeppelin Installation Guide on CentOS 7
Apache Zeppelin is a web-based, open-source notebook designed for interactive data work such as ingestion, exploration, analytics, and visualization. Through its interpreters it supports more than 20 languages and data-processing backends, including Apache Spark, SQL, R, and Elasticsearch. You can use it to build data-driven documents that combine code, results, and charts, and to see the output of your analysis immediately.
Requirements
- A CentOS 7 server
- User access with sudo privileges
- A domain name configured to point to the server
In this guide, we use zeppelin.example.com as a placeholder domain pointing to the instance. Be sure to replace this with your actual domain wherever applicable.
Before beginning, make sure your system is up to date, either by following the guide How to Update CentOS 7 or by running sudo yum -y update. Once the system is updated, proceed to install Java.
Installing Java
Apache Zeppelin runs on Java, so it requires a JDK. Begin by downloading the Oracle Java SE Development Kit (JDK) 8 RPM package.
wget --no-cookies --no-check-certificate --header "Cookie:oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u151-b12/e758a0de34e24606bca991d704f6dcbf/jdk-8u151-linux-x64.rpm"
Now, install the RPM package you downloaded.
sudo yum -y localinstall jdk-8u151-linux-x64.rpm
After successful installation, verify Java by checking its version.
java -version
Expected output:
[user@centron ~]$ java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
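If you script the installation, this version check can be automated. The following is a minimal sketch of a hypothetical helper, parse_java_version, that reads java -version output on standard input and prints the quoted version string. Because java -version writes to standard error, the usage line redirects it with 2>&1.

```shell
# Hypothetical helper: print the version string from `java -version` output.
# Reads the output on stdin and prints the text between the first pair of
# double quotes on the first line that contains "version".
parse_java_version() {
  awk -F'"' '/version/ { print $2; exit }'
}

# Usage (java -version prints to stderr, so redirect it to stdout):
#   java -version 2>&1 | parse_java_version
```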
Setting JAVA_HOME and JRE_HOME
Before continuing, configure the environment variables JAVA_HOME and JRE_HOME. First, determine the full path to the Java executable.
readlink -f $(which java)
You should see output similar to the following:
[user@centron ~]$ readlink -f $(which java)
/usr/java/jdk1.8.0_151/jre/bin/java
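Next, add the environment variables to ~/.bash_profile. JAVA_HOME is the JDK directory, which is the resolved path with /jre/bin/java removed, and JRE_HOME is the jre subdirectory inside it.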
echo "export JAVA_HOME=/usr/java/jdk1.8.0_151" >> ~/.bash_profile
echo "export JRE_HOME=/usr/java/jdk1.8.0_151/jre" >> ~/.bash_profile
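Rather than typing the JDK path by hand, the two export lines can be derived from the readlink output. This is a minimal sketch with a hypothetical function name, java_home_exports, assuming the resolved binary sits at <JDK>/jre/bin/java as in the JDK 8 layout shown above. Use it instead of the two echo commands, not in addition to them, or the variables will be appended twice.

```shell
# Hypothetical helper: print export lines for JAVA_HOME and JRE_HOME, given the
# fully resolved path of the java binary (e.g. /usr/java/jdk1.8.0_151/jre/bin/java).
# Assumes the JDK 8 layout <JDK>/jre/bin/java.
java_home_exports() {
  local java_bin="$1"
  local jre_home="${java_bin%/bin/java}"   # strip the trailing /bin/java
  local java_home="${jre_home%/jre}"       # strip the trailing /jre
  printf 'export JAVA_HOME=%s\nexport JRE_HOME=%s\n' "$java_home" "$jre_home"
}

# Usage:
#   java_home_exports "$(readlink -f "$(which java)")" >> ~/.bash_profile
```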
Apply the changes by sourcing the bash profile.
source ~/.bash_profile
Finally, verify that the JAVA_HOME variable is set correctly.
[user@centron ~]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_151
Installing Apache Zeppelin
The Apache Zeppelin binary package bundles all of its dependencies, so Java is the only external requirement. Begin by downloading the Zeppelin binary to your server. This guide uses version 0.7.3; check the official Zeppelin download page for the latest release and adjust the version number in the commands accordingly.
wget http://www-us.apache.org/dist/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
Unpack the downloaded archive using the following command:
sudo tar xf zeppelin-*-bin-all.tgz -C /opt
The contents will be extracted into the /opt/zeppelin-0.7.3-bin-all directory. To simplify the path, rename the directory:
sudo mv /opt/zeppelin-*-bin-all /opt/zeppelin
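To make the download and extraction easier to repeat for another release, the version number can be kept in one place. In this sketch the function name is hypothetical, and the URL points at archive.apache.org, the Apache release archive, on the assumption that it keeps the same zeppelin-<version>/zeppelin-<version>-bin-all.tgz layout as the mirror used above (older mirrors such as www-us.apache.org may no longer serve the file).

```shell
# Hypothetical helper: print the download URL of a Zeppelin binary release.
# Assumes the Apache archive layout dist/zeppelin/zeppelin-<ver>/zeppelin-<ver>-bin-all.tgz.
zeppelin_url() {
  local version="$1"
  printf 'https://archive.apache.org/dist/zeppelin/zeppelin-%s/zeppelin-%s-bin-all.tgz\n' \
    "$version" "$version"
}

# Usage (download, unpack to /opt and rename, as in the steps above):
#   ZEPPELIN_VERSION=0.7.3
#   wget "$(zeppelin_url "$ZEPPELIN_VERSION")"
#   sudo tar xf "zeppelin-${ZEPPELIN_VERSION}-bin-all.tgz" -C /opt
#   sudo mv "/opt/zeppelin-${ZEPPELIN_VERSION}-bin-all" /opt/zeppelin
```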
Apache Zeppelin is now installed. You could launch it immediately, but by default it binds only to localhost, so for remote access it is usually placed behind a reverse proxy such as Nginx. First, we'll configure it as a Systemd service and then enable authentication.
Setting up Zeppelin as a Systemd Service
Next, create a Systemd unit file for Zeppelin so that it starts automatically at boot and is restarted if the process fails.
First, create a non-root user dedicated to running Zeppelin:
sudo adduser -d /opt/zeppelin -s /sbin/nologin zeppelin
Give this user ownership of the Zeppelin files:
sudo chown -R zeppelin:zeppelin /opt/zeppelin
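To confirm that the chown covered everything, you can count the entries that are not owned by the zeppelin user. This is a small sketch with a hypothetical function name; a result of 0 means ownership is consistent.

```shell
# Hypothetical helper: count files and directories under DIR not owned by USER.
# USER may be a name or a numeric UID (find -user accepts both).
count_not_owned() {
  local dir="$1" user="$2"
  find "$dir" ! -user "$user" | wc -l
}

# Usage (expected to print 0 after the chown above):
#   count_not_owned /opt/zeppelin zeppelin
```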
Create the Systemd service unit file for Zeppelin:
sudo nano /etc/systemd/system/zeppelin.service
Insert the following content into the file:
[Unit]
Description=Zeppelin service
After=syslog.target network.target
[Service]
Type=forking
ExecStart=/opt/zeppelin/bin/zeppelin-daemon.sh start
ExecStop=/opt/zeppelin/bin/zeppelin-daemon.sh stop
ExecReload=/opt/zeppelin/bin/zeppelin-daemon.sh reload
User=zeppelin
Group=zeppelin
Restart=always
[Install]
WantedBy=multi-user.target
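One caveat: the variables added to ~/.bash_profile earlier apply to your login shell, not to services started by Systemd. When JAVA_HOME is unset, Zeppelin's launch scripts fall back to the java found on the PATH, which works here because, as the readlink step showed, java is already on the PATH. If you prefer to pin the JDK explicitly, a Systemd drop-in can set it. In this sketch, the helper name and the drop-in file name java.conf are our own choices, and the JDK path is the one used earlier in this guide.

```shell
# Hypothetical helper: print a Systemd drop-in that sets JAVA_HOME for a service.
java_dropin() {
  printf '[Service]\nEnvironment=JAVA_HOME=%s\n' "$1"
}

# Usage (drop-ins live in <unit>.d/ and need a daemon-reload to take effect):
#   sudo mkdir -p /etc/systemd/system/zeppelin.service.d
#   java_dropin /usr/java/jdk1.8.0_151 | sudo tee /etc/systemd/system/zeppelin.service.d/java.conf
#   sudo systemctl daemon-reload
```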
Reload Systemd so that it reads the new unit file, then start the Zeppelin service:
sudo systemctl daemon-reload
sudo systemctl start zeppelin
Enable the service to auto-start during system boot:
sudo systemctl enable zeppelin
To confirm that Zeppelin is running, check the service status:
sudo systemctl status zeppelin
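The service can report as active a few seconds before Zeppelin actually accepts connections. For scripted setups, a small wait loop helps. This sketch uses Bash's /dev/tcp redirection (a Bash feature, so run it with bash) and assumes Zeppelin's default port, 8080; the function name and default retry count are hypothetical.

```shell
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
# Tries once per second, up to TRIES times (default 30).
# Returns 0 on success, 1 on timeout. Relies on Bash's /dev/tcp redirection.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i=1
  while [ "$i" -le "$tries" ]; do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    [ "$i" -lt "$tries" ] && sleep 1
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   wait_for_port localhost 8080 60 && echo "Zeppelin is accepting connections"
```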
Disable Anonymous Access in Apache Zeppelin
Zeppelin allows anonymous access by default. To disable it, first copy the configuration template to the file Zeppelin actually reads:
cd /opt/zeppelin
sudo cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
Next, open the configuration file for editing:
sudo nano conf/zeppelin-site.xml
Locate the following property section within the file:
<property>
<name>zeppelin.anonymous.allowed</name>
<value>true</value>
To disable anonymous access, change the value from true to false, then save the file.
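The same change can be made non-interactively with GNU sed, which is handy in provisioning scripts. This sketch uses a hypothetical function name and assumes, as in the template excerpt above, that the value element sits on the line immediately after the zeppelin.anonymous.allowed name element. Because conf/zeppelin-site.xml was created with sudo, run it from a root shell (for example after sudo -i).

```shell
# Hypothetical helper: set zeppelin.anonymous.allowed to false in FILE.
# Assumes the <value> element is on the line right after the <name> element.
# Uses GNU sed's in-place editing (-i); other properties are left untouched.
disable_anonymous_access() {
  sed -i '/<name>zeppelin\.anonymous\.allowed<\/name>/{n;s|<value>true</value>|<value>false</value>|}' "$1"
}

# Usage (as root):
#   disable_anonymous_access /opt/zeppelin/conf/zeppelin-site.xml
```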
Activate Shiro Authentication
With anonymous access disabled, Zeppelin needs an authentication mechanism so that authorized users can still log in. Apache Zeppelin uses Apache Shiro for this purpose. Start by copying the Shiro configuration template:
sudo cp conf/shiro.ini.template conf/shiro.ini
Then open the Shiro configuration file for editing:
sudo nano conf/shiro.ini
Inside the file, locate the following block under the [users] section:
[users]
admin = password1, admin
user1 = password2, role1, role2
user2 = password3, role3
user3 = password4, role2
This section defines user credentials: each line holds a username, its password, and the roles assigned to it. Note that these passwords are stored in plain text. For this setup, we will only use admin and user1. Replace their passwords with strong ones, and disable the other users by commenting them out. You are also free to customize usernames and roles; for details, refer to the Shiro authorization guide.
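One way to produce a strong password is to draw random characters from /dev/urandom. This sketch uses a hypothetical function name and a 24-character default length, and it restricts the output to letters and digits so the result cannot clash with the equals sign and commas that separate fields in the [users] lines.

```shell
# Hypothetical helper: print a random alphanumeric password (default length 24).
# Letters and digits only, so it is safe inside "user = password, role" lines.
gen_password() {
  local length="${1:-24}"
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
  echo
}

# Usage:
#   gen_password        # 24 characters
#   gen_password 32     # 32 characters
```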
After making the necessary changes, the updated section should look like this:
[users]
admin = StrongPassword, admin
user1 = UserPassword, role1, role2
# user2 = password3, role3
# user3 = password4, role2
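To double-check the edit, the following sketch (hypothetical function name) prints the user names that are still active, that is, uncommented, in the [users] section. It assumes the simple name = password, roles format shown above and treats lines starting with # or ; as comments.

```shell
# Hypothetical helper: list uncommented user names in the [users] section of a shiro.ini file.
list_active_users() {
  awk '
    /^\[/ { in_users = ($0 ~ /^\[users\]/); next }   # track the current section
    in_users && $0 !~ /^[ \t]*([#;]|$)/ {            # skip comments and blank lines
      split($0, kv, "=")
      gsub(/[ \t]/, "", kv[1])                       # trim spaces around the user name
      print kv[1]
    }' "$1"
}

# Usage (expected to print admin and user1 after the edit above):
#   list_active_users /opt/zeppelin/conf/shiro.ini
```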
Finally, restart the Zeppelin service to apply the new authentication settings:
sudo systemctl restart zeppelin
After the restart, authentication is active, and users are prompted to log in with the credentials configured in the shiro.ini file.
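You can also confirm authentication from the command line. Zeppelin's REST API accepts a form-encoded POST to /api/login with userName and password fields; this sketch (hypothetical function name) treats only an HTTP 200 response as success. It assumes curl is installed and that Zeppelin listens on the default port 8080. Passing a password as a command-line argument can expose it to other local users through the process list, so use this for quick checks only.

```shell
# Hypothetical helper: return 0 if Zeppelin accepts the given credentials.
# Usage: zeppelin_login_ok <base-url> <user> <password>
zeppelin_login_ok() {
  local base_url="$1" user="$2" password="$3" code
  # --data-urlencode makes curl send a URL-encoded POST body.
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    --data-urlencode "userName=$user" \
    --data-urlencode "password=$password" \
    "$base_url/api/login")
  [ "$code" = "200" ]
}

# Usage:
#   zeppelin_login_ok http://localhost:8080 admin 'StrongPassword' && echo "login OK"
```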
Conclusion
With the above steps completed, Apache Zeppelin is now installed, secured, and configured to run as a service with proper authentication. You’ve disabled anonymous access, implemented Shiro-based user management, and ensured that Zeppelin starts automatically on system boot. From here, you can proceed with additional configurations, such as integrating Nginx as a reverse proxy or connecting Zeppelin with data processing backends like Apache Spark or Elasticsearch. This setup forms a solid foundation for building interactive, collaborative data analytics workflows on your CentOS 7 server.