Friday, November 13, 2015

How to optimize Apache concurrency for running WordPress

To a technical savvy person, hosting your own website on a VPS server can bring a tremendous sense of accomplishment and a wealth of learning opportunities. If WordPress is what you fancy, with a minimal monthly financial commitment, you can host a WordPress website on the LAMP platform (Linux, Apache, MySQL, PHP). For example, the entry-level, $5 per month plan offered by DigitalOcean, of which I am an affiliate, will give you a 512MB RAM, single-core VPS.

With such a small RAM capacity, you will need to optimize how your Apache webserver is configured to run PHP applications such as WordPress and Drupal. The goal is to maximize the number of concurrent web connections.

This tutorial details the Apache/PHP setup procedure on Debian 8.2, aka Jessie. The procedure assumes Apache is yet to be installed. However, if Apache2 is already installed, you will find practical information below on how to reconfigure Apache2 to run a different multi-processing module.

Background knowledge

According to a recent Netcraft webserver survey, Apache powers 46.91% of the top million busiest websites on the Internet. Busy websites mean many concurrent web connections.

Concurrent connection requests to Apache are handled by its Multi-Processing Modules. MPMs can be loosely classified as threaded or non-threaded. Older Apache releases default to a MPM named Prefork. This MPM is non-threaded. Each connection request is handled by a dedicated, self-contained Apache process.

Newer Apache releases default to a threaded MPM, either Worker or Event. The Worker MPM uses one worker thread per connection. One issue with this approach is that a thread is tied up if the connection is kept alive despite it being inactive.

The Event MPM, a variant of Worker, addresses the aforesaid keep-alive issue. A main thread is used as the traffic controller that listens for requests and passes requests to worker threads on demand. In this scenario, an inactive but kept-alive connection does not tie up a worker thread.

Note that MPMs are mutually exclusive: only 1 MPM can be active at any given time.

Traditionally, the Apache core functionality serves static web content (e.g., HTML text and images). To serve dynamic content, such as PHP pages from WordPress, Apache requires special modules that execute PHP code.

For the Prefork MPM, each spawned Apache process embeds its own copy of the PHP handler (mod_php). Concurrency in this model is limited by the number of processes that Apache can spawn given the available memory.

For both Worker and Event MPMs, PHP requests are passed to an external FastCGI process, PHP5-FPM. PHP-FPM stands for PHP-FastCGI Process Manager. Essentially, the webserver and the PHP handler are split to separate processes. Apache communicates with PHP-FPM through an Apache module, either mod_fastcgi or mod_fcgid. Optimizing concurrency in this model means configuring both the MPM and the PHP handler (PHP-FPM) to have pools of processes and threads to handle requests.

The rest of this tutorial covers the cases of installing the Event MPM from scratch as well as migrating to Event from the Prefork MPM.

Installing Apache2

This tutorial starts with the installation of Apache2. If Apache is already installed, you should find out which MPM is currently running using the command apache2ctl -V, and proceed to the next section.

$ sudo apt-get update && sudo apt-get upgrade
$ sudo apt-get install apache2

Next, note the Apache version you just installed and the MPM that is loaded.

$ sudo apache2ctl -V
Server version: Apache/2.4.10 (Debian)
...
Architecture: 64-bit
Server MPM: event
threaded: yes (fixed thread count)
forked: yes (variable process count)
...

The above output tells us that we are running Apache release 2.4. Beginning with 2.4, Apache runs the Event MPM by default. If you are running an older version of Apache, the default MPM is either Prefork or Worker.

Configuring Apache2

  1. Back up the Apache configuration file, /etc/apache2/apache2.conf.

    $ sudo cp /etc/apache2/apache2.conf{,.orig}

  2. Edit the configuration file.

    Below is a subset of configuration parameters belonging to the Apache core module. You should adjust their values in order to optimize concurrency. The corresponding values are what I use for an entry-level VPS.

    Timeout 100
    KeepAlive On
    MaxKeepAliveRequests 1000
    KeepAliveTimeout 5

    For an in-depth explanation of the above parameters, please refer to Apache on-line documentation.

  3. Enable mod_rewrite.

    While the mod_rewrite module is not strictly relevant to optimizing concurrency, I've included it here as a reminder to install the module. It is an important module for running WordPress.

    $ sudo a2enmod rewrite

Installing Event MPM

If you are already running the Event MPM, skip to the next section, 'Configuring Event MPM'. Otherwise, follow the procedure below.

  1. Install the Event MPM.

    $ sudo apt-get install apache2-mpm-event

  2. Disable existing MPM.

    Recall that only 1 of Prefork, Worker or Event MPM can be running at any given time. Therefore, if you were previously running Prefork or Worker, you must first disable it, and then enable Event.

    To disable the Prefork MPM, run this command:

    $ sudo a2dismod mpm_prefork

    To disable the Worker MPM, run this:

    $ sudo a2dismod mpm_worker

  3. Enable the Event MPM.

    $ sudo a2enmod mpm_event

    Note that the above enable and disable commands are quite 'forgiving'. If you attempt to enable an MPM that is already enabled, or disable an MPM that is already disabled, it will simply return a harmless informational message.

Configuring Event MPM

To configure the Event MPM, modify its configuration file, /etc/apache2/mods-available/mpm_event.conf. Before making any changes, back it up using the following command:

$ sudo cp /etc/apache2/mods-available/mpm_event.conf{,.orig}

Edit the file to specify the following parameters:


<IfModule mpm_event_module>   
  StartServers 2  
  MinSpareThreads 25  
  MaxSpareThreads 75  
  ThreadLimit 25  
  ThreadsPerChild 25  
  MaxRequestWorkers 250  
  MaxConnectionsPerChild 10000  
  ServerLimit 12  
</IfModule>  

The above configuration is what I recommend for an entry-level VPS (512MB RAM, single-core). You need to adjust the parameters to satisfy your own system requirements. For a detailed explanation of the above parameters, click here. Note that the Event MPM shares the same parameters as the Worker MPM.

Installing PHP5 handler

To execute PHP code, Apache requires a PHP handler. PHP5-FPM is the PHP handler to use with the Event MPM.

For a new PHP installation, install php5-fpm followed by the meta-package php5.

$ sudo apt-get install php5-fpm php5

In addition to the above packages, I also installed other PHP5 packages which WordPress requires. While they are not strictly relevant to optimizing concurrency, I've included them here for completeness.

$ sudo apt-get install php5-mysql php5-gd php5-curl

Configuring virtual host

Suppose your WordPress website has the domain name example.com. To set up a virtual host with that domain name, follow the steps below:

  1. Create the Apache configuration file for example.com.

    Instead of creating the file from scratch, use the default site as a template.

    $ sudo cp /etc/apache2/sites-available/000-default.conf /etc/apache2/sites-available/example.com.conf

  2. Edit the configuration file.

    Customize the following site-specific parameters:

    ServerName example.com
    ServerAlias www.example.com
    ServerAdmin info@example.com
    DocumentRoot /var/www/example.com
    
  3. Create DocumentRoot directory.

    $ sudo mkdir /var/www/example.com
    $ sudo chown -R <webuser>:<webuser> /var/www/example.com

    Notes:

    • WordPress should be installed in the directory /var/www/example.com. For instructions on how to install WordPress, refer to my earlier post.

    • The DocumentRoot directory should be owned by a non-root user.

  4. Enable the new site.

    $ sudo a2ensite example.com.conf

  5. Disable the default site.

    $ sudo a2dissite 000-default.conf

Configuring PHP handler

Follow the procedure below to configure PHP5-FPM.

  1. Create a custom PHP configuration file for example.com by copying the template from the default site.

    $ sudo cp /etc/php5/fpm/pool.d/www.conf /etc/php5/fpm/pool.d/example.com.conf

  2. Edit the configuration file.

    Customize the following parameters.

    [example.com]
    user = <webuser>
    group = <webuser>
    listen = /var/run/php5-fpm_example.com.sock
    pm = dynamic
    pm.max_children = 5
    pm.start_servers = 2
    pm.min_spare_servers = 1
    pm.max_spare_servers = 3
    pm.max_requests = 2000

    Notes:

    • The user and group parameters specify respectively the Unix user and group names under which the FPM processes will run. You should specify a non-root user for both.

    • The listen parameter specifies the source address that the FPM will listen to for receiving PHP requests. In this case, it will listen to the Unix socket /var/run/php5-fpm_example.com.sock.

    • The rest of the parameters are for an entry-level VPS system. You should adjust their values to satisfy your system requirements.

    • Click here for more details about the above parameters.

  3. Restart PHP5-FPM.

    $ sudo systemctl restart php5-fpm

Installing FastCGI

Apache requires a FastCGI module to interface with the external PHP5-FPM processes. You can use 1 of 2 FastCGI modules: mod_fastcgi and mod_fcgid. Click here for a discussion of their differences. This tutorial uses mod_fastcgi.

Before you install mod_fastcgi, you must:

  1. Enable non-free.

    Debian pre-packages the mod_fastcgi module in the non-free archive area of its repositories. Make sure that non-free is included in the /etc/apt/sources.list file.

  2. Disable mod_php.

    If Apache2 was previously installed with the Prefork MPM, most likely, it is configured to execute PHP using the mod_php module. In this case, you must disable the mod-php module before you install mod_fastcgi. Otherwise, the install will fail with the error message, 'Apache is running a threaded MPM, but your PHP Module is not compiled to be threadsafe. You need to recompile PHP.'

    To disable mod_php, run this command:

    $ sudo a2dismod php5

To install mod_fastcgi, execute the following command:

$ sudo apt-get install libapache2-mod-fastcgi

Configuring FastCGI

  1. Back up configuration file.

    Before you edit the configuration file /etc/apache2/mods-available/fastcgi.conf, back it up using the following command.

    $ sudo cp /etc/apache2/mods-available/fastcgi.conf{,.orig}

  2. Edit the file.

    Insert the following lines:

    
    <IfModule mod_fastcgi.c> 
      AddHandler php5-fcgi .php 
      Action php5-fcgi /php5-fcgi 
      Alias /php5-fcgi /usr/lib/cgi-bin/php5-fcgi_example.com 
      FastCgiExternalServer /usr/lib/cgi-bin/php5-fcgi_example.com -socket /var/run/php5-fpm_example.com.sock -pass-header Authorization 
      <Directory /usr/lib/cgi-bin> 
          Require all granted 
      </Directory> 
    </IfModule>
    

    Notes:

    • example.com should be replaced with your own domain name.

    • To access the website, you need to grant the proper permission explicitly using the Require all granted statement. Without it, access to the website will be denied with the error message 'You don't have permission to access /php5-fcgi/index.php on this server.'

  3. Enable additional modules.

    $ sudo a2enmod actions fastcgi alias

  4. Restart Apache.

    The final step is to restart Apache to make all the above changes go live.

    $ sudo systemctl restart apache2

Threads in action

Concurrency for WordPress occurs at both the webserver (Apache2) and the PHP handler (PHP-FPM) levels. You can use the ps -efL command to monitor the processes and threads at either level.

To monitor Apache processes and threads, execute the following ps command.

$ ps -efL |grep apach[e]
www-data 31681 24441 31681 0 27 03:25 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 31681 24441 31684 0 27 03:25 ? 00:00:00 /usr/sbin/apache2 -k start
...

The second and the fourth columns are the process ID (PID) and the thread ID respectively. Note that the above output reports 2 different threads (31681 and 31684) of the same process (31681).

Execute the following command to monitor PHP.

$ ps -efL |grep ph[p]
root 24398 1 24398 0 1 Nov10 ? 00:00:17 php-fpm: master process (/etc/php5/fpm/php-fpm.conf)
peter 31519 24398 31519 0 1 03:14 ? 00:00:17 php-fpm: pool example.com
peter 31520 24398 31520 0 1 03:14 ? 00:00:16 php-fpm: pool example.com
peter 31827 24398 31827 0 1 04:15 ? 00:00:15 php-fpm: pool example.com

Conclusion

When traffic to your website increases over time, your webserver must scale up to handle the increase in traffic. This tutorial explains how to configure Apache2 and PHP to optimize the number of concurrent connections. After you try it out, if you still find that your website cannot keep up with the traffic, you should consider upgrading your VPS plan to have more RAM.

If you are interested in WordPress, please refer to my earlier posts.

Friday, October 30, 2015

WPScan: Stop feeling vulnerable about WordPress

Introduction

WordPress is the most popular content management system (CMS) on the planet today. You can customize the look and feel of a WordPress website using third-party themes. If you want a functionality not offered by the WordPress core, you will most likely find a third-party plugin that satisfies your requirement. With the plethora of themes and plugins comes a major challenge in assuring their quality. Intruders can potentially exploit the vulnerabilities in poorly designed themes and plugins to gain unauthorized access to a WordPress website.

WPScan is a WordPress vulnerability scanner that is free for non-commerical use. It scans your WordPress website and reveals any known vulnerabilities in the installed plugins and themes.

The rest of this post explains how to install and run WPScan.

Installation

WPScan comes pre-installed on only a handful of lesser-known Linux distributions. If you run Debian, Ubuntu, Centos or Fedora, you must download the WPScan source and build it yourself. Because WPScan is written in Ruby, to build WPScan, you need to install the Ruby development environment.

Your first decision is to select a machine on which to build WPScan. This is the machine you use to launch WPScan later. Note that you can (and should) run WPScan on a different machine than the WordPress host. The examples in this post are based on a Debian 8.2 machine, aka Jessie.

Your next decision is how you will install the Ruby development environment, including the supporting development libraries. The WPScan website outlines 2 ways to install the necessary development environment on a Debian server: the Debian package management system and the Ruby Version Manager(RVM).

RVM is the WPScan-recommended method. It allows you to install multiple, self-contained Ruby environments on the same system. RVM puts a dedicated Ruby environment under your Home directory (e.g., /home/peter/.rvm). You can find the RVM procedure on the WPScan home page. I've followed the steps, and it works as advertised.

I opted instead for the Debian package manager method because it is a shorter procedure and I did not need the versatility (and the complexity) that RVM offers.

Below are the steps to install WPScan using the Debian package manager. The procedure is largely based on what is on the WPScan home page. I've added a couple of missing packages that are actually required.

  1. Update your Debian repository.

    $ sudo apt-get update

  2. Install the supporting development packages.

    $ sudo apt-get install git make gcc ruby ruby-dev libcurl4-openssl-dev zlib1g-dev

  3. Download the WPScan source.

    $ cd; git clone https://github.com/wpscanteam/wpscan.git

  4. Install WPScan.

    $ cd wpscan
    $ sudo gem install bundler
    $ bundle install --without test --path vendor/bundle

Operation

The WPScan development team maintains a WordPress vulnerability database. Before you scan your WordPress website, you should first update the local copy of the vulnerability database as follows:

$ cd $HOME/wpscan
$ ruby wpscan.rb --update

To scan a WordPress website for vulnerabilities, use the following general WPScan command sequence.

$ cd $HOME/wpscan
$ ruby wpscan.rb --url http://yourWP.com --enumerate <options>

Notes:

  • The --url parameter specifies which WordPress website to scan.

  • The --enumerate parameter specifies which WordPress components to scan for vulnerabilities. I use the following options:
    * vp: Scan only vulnerable plugins.
    * vt: Scan only vulnerable themes.
    * u: Scan users.
    * tt: Scan timthumbs.
    You can combine multiple options with separating commas.
    For example, --enumerate vp,vt,u

  • For further on-line help, run this command:

    $ ruby wpscan.rb --help

Results Analysis

I invoked WPScan to scan the WordPress website deployed on my desktop (localhost). The report is displayed below. Note that the output has been abbreviated in order to improve its overall readability.

WPScan managed to find:

  1. 3 vulnerabilities in the WordPress core.

    Those vulnerabilities were fixed in the most recently released version of WordPress.

  2. 1 vulnerability in the WP Super Cache plugin.

    Again, the vulnerability was fixed in the most recently released version of the plugin.

  3. No known vulnerability in the installed themes.


$ ruby wpscan.rb --url localhost --enumerate vp,vt  
---
[+] WordPress version 4.3 identified from meta generator    
[!] 3 vulnerabilities identified from the version number    

[!] Title: WordPress <= 4.3-Authenticated Shortcode Tags Cross-Site Scripting (XSS)    
    Reference: https://wpvulndb.com/vulnerabilities/8186    
[i] Fixed in: 4.3.1    

[!] Title: WordPress <= 4.3-User List Table Cross-Site Scripting (XSS)    
    Reference: https://wpvulndb.com/vulnerabilities/8187    
[i] Fixed in: 4.3.1    

[!] Title: WordPress <= 4.3-Publish Post and Mark as Sticky Permission Issue    
    Reference: https://wpvulndb.com/vulnerabilities/8188    
[i] Fixed in: 4.3.1      
---
[+] Enumerating installed plugins (only ones with known vulnerabilities)...   
[+] We found 1 plugins:  

[+] Name: wp-super-cache - v1.4.4  
[!] The version is out of date, the latest version is 1.4.6  

[!] Title: WP Super Cache <= 1.4.4 - Cross-Site Scripting (XSS)  
    Reference: https://wpvulndb.com/vulnerabilities/8197  
[i] Fixed in: 1.4.5  

[+] Enumerating installed themes (only ones with known vulnerabilities)...  
[+] No themes found  

Conclusion

WPScan is an important tool in your defense against possible attacks on your WordPress websites. It is recommended that you schedule WPScan to run regularly to detect known WordPress vulnerabilities. Yet, running WPScan is only half of your job. You remain vulnerable until you patch the vulnerabilities.

In general, the WordPress community fixes most known vulnerabilities and distributes the fixes quickly after the vulnerabilities are first reported. It is important that you keep your WordPress core and the third-party themes and plugins up-to-date. If your WordPress platform is up-to-date, WPScan will most likely return a clean report, and you can stop feeling vulnerable about your WordPress website.

Below are my other posts on WordPress:

Wednesday, July 29, 2015

MySQLTuner series part 3 - minimizing MySQL memory footprint

This is part 3 of the series on using MySQLTuner to optimize MySQL database performance and stability. Part 1 explains how to install and run MySQLTuner. Part 2 addresses the area of database defragmentation. This post illustrates how to manage MySQL memory footprint.

MySQLTuner output

MySQLTuner was used to analyze a WordPress database deployed on the LAMP platform (Linux, Apache, MySQL, PHP). The host was a VPS server with only 512 MB of memory.

$ perl mysqltuner.pl

If you scroll down to the Recommendations section of the above report, it is hard to miss the eye-catching message: 'MySQL's maximum memory usage is dangerously high. Add RAM before increasing MySQL buffer variables.'

Indeed, adding more RAM is often the cheapest and simplest solution to out-of-memory problems. By spending an extra $5 per month, I can upgrade my VPS server to have 1 GB of RAM. But, before you go spend your hard-earned money on RAM, let's explore some other ways to reduce MySQL's memory footprint.

Maximum number of database connections

Lines that begin with two exclamation marks ('!!') are warnings. Note the following lines in the above Performance Metrics section:

-------- Performance Metrics --------------------------------------
...
[--] Up for: 36d 23h 28m 39s (758K q [0.237 qps], 53K conn, TX: 6B, RX: 98M)
[--] Total buffers: 192.0M global + 2.7M per thread (151 max threads)
[!!] Maximum possible memory usage: 597.8M (120% of installed RAM)
...
[OK] Highest usage of available connections: 8% (13/151)

According to the above warning, MySQL could potentially use up to 597.8 MB of RAM. Where did the number come from?

The number was derived from the preceding line. MySQL required 192MB globally and 2.7 MB per connection to the database. By default, the maximum number of connections was 150+1. (The 151st connection would be restricted to database administration only.) Hence, the maximum memory usage was 192 + 150 * 2.7, equaling 597.

Should you allow for 150 connections? Keep in mind that each connection, even in the idle state, will take up some memory. MySQLTuner can help you answer the question with confidence.

MySQLTuner reports the highest number of concurrent connections since the last MySQL restart (13 in the above example). The database should be up for a minimum of 24 hours before you run MySQLTuner. In fact, the longer the time elapses since the last restart, the more trustworthy is the statistic.

You can find out from the MySQLTuner report how long MySQL has been up. Go back to the first line under the Performance Metrics section. In the above example, MySQL had been up for 36 days since the last restart.

Although MySQL was configured for accepting 150 connections, the highest number of concurrent connections made in the past 36 days was only 13 (8% of the maximum). In light of that knowledge, we could lower the maximum number of connections allowed, therefore, reducing the total memory footprint for MySQL.

Before we go ahead to reconfigure MySQL, we will consider the wait-timeout threshold which affects how long idle connections stay alive before timing out.

Wait timeout

One of the General recommendations in the above example was:

'Your applications are not closing MySQL connections properly.'

In other words, database connections were opened but not properly closed after queries or updates were already completed. These idle connections would hang around until a predefined timeout threshold was reached. The default timeout threshold was 8 hours. So, if a query completed in 2 seconds, but because the connection was not closed properly, the connection would live for another 28,798 seconds before timing out. In the meantime, the idle connections continued to consume resources including counting toward the maximum number of open connections.

The culprit was easily identified in the above case: the database was used exclusively for WordPress, an application written in PHP. However, solving the problem can be out of your reach, unless you are a PHP developer.

The good news is that you can reduce the timeout interval by adjusting a MySQL configuration parameter. By making idle connections time out faster, there will be less concurrent connections. For WordPress/PHP applications, I set the wait timeout to be 60 seconds.

It is also worth mentioning that because there are less idle connections due to quicker timeout, you can further reduce the maximum number of connections.

Re-configuring MySQL

To change the maximum number of connections or the wait timeout threshold, edit the MySQL configuration file as follows.

$ sudo vi /etc/mysql/my.cnf

The configuration variables of interest are max_connections, and wait_timeout. Enter a value for each variable using the following syntax:

max_connections = 50
wait_timeout = 60

For the above configuration changes to take effect, a restart of the MySQL daemon is needed.

For non-systemd systems, run the following command:

$ sudo service mysql restart

For systemd-enabled systems, run:

$ sudo systemctl restart mysql

Alternatively, you can dynamically change the configuration variables, thereby avoiding the database restart. To do that, issue the following commands.

$ mysql -u root -p -e "SET GLOBAL max_connections=50;"
$ mysql -u root -p -e "SET GLOBAL wait_timeout=60;"

Note that modifying the MySQL configuration file is still required if you want the changes to persist after future system restarts.

What's next?

MySQLTuner is not something you run once and forget about it. Your web traffic pattern changes over time. You should schedule to run it regularly and examine the output. Please refer back to Part 1 of this series for instructions on how to schedule a run.

The more knowledgeable you are about database optimization, the more effective you become on using the information provided by MySQLTuner. I recommend the following videos if you want to learn more about MySQL optimization:

You may also be interested in the following MySQL-related articles from this blog:

For Apache optimization, please refer to this blog post.

Tuesday, July 7, 2015

MySQLTuner series part 2 - defragmenting a MySQL database

Part 1 of this series spells out how to install and run MySQLTuner, a script which recommends MySQL configuration changes. The goal is to optimize database performance and stability. This post describes how to interpret and use MySQLTuner output, specifically in the area of database defragmentation.

Proceed with caution

A word of caution is warranted before I plunge into the details of implementing MySQLTuner's suggestions. MySQLTuner does not excuse you from learning the basic database optimization principles and following industry best practices. Following a MySQLTuner recommendation without researching and understanding its ramifications is a gamble that may end up worsening your database performance and reliability.

Optimizing MySQL configuration is not a trivial matter, and must be done in a controlled manner. You should change only one MySQL configuration variable at a time. After every change, monitor the system to verify that the expected outcome is achieved without any negative side effect.

General comments

MySQLTuner is a PERL script which you can invoke like this:

$ perl mysqltuner.pl

The following is the MySQLTuner output for a low-memory VPS server running on the LAMP platform (Linux, Apache, MySQL, PHP). The VPS is dedicated for running a WordPress blog.

One is often tempted to bypass the first several sections of the report on database metrics, and head straight to the Recommendations section. But, the metrics provide the crucial context for the recommendations that follow, and should be read carefully.

Storage engine statistics

The Storage engine statistics section of the report summarizes the total number and size of InnoDB and MyISAM tables in your database.

In the above example, 18 InnoDB and 4 MyISAM tables were detected. But the report does not identify the tables. If you want to list all InnoDB tables, execute the command below.

$ echo 'select concat(table_schema,".",table_name) from information_schema.tables where engine="InnoDB";'|mysql -u root -p

To list all MyISAM tables, replace InnoDB with MyISAM in the above command.

The key actionable statistic in this section is the total number of fragmented tables (20 in the example). Fragmentation occurs during normal database operations when records are inserted and deleted, leaving behind 'gaps' in the database.

MySQLTuner does not report the size of the 'gaps' or unused space in the fragmented tables. You can find out by running the following MySQL statement.

mysql> select ENGINE, TABLE_NAME,               \  
          DATA_LENGTH, INDEX_LENGTH, DATA_FREE, \
          DATA_FREE / (INDEX_LENGTH + DATA_LENGTH) as frag_ratio  \
       from information_schema.tables                             \
       where DATA_FREE > 0 order by frag_ratio desc;   
+-------+-----------+-----------+------------+---------+----------+  
| ENGINE| TABLE_NAME|DATA_LENGTH|INDEX_LENGTH|DATA_FREE|frag_ratio|  
+-------+-----------+-----------+------------+---------+----------+  
...  
| InnoDB| wp_options|   1179648 |      16384 | 11534336|  9.6438  |  
...  
+-------+-----------+-----------+------------+---------+----------+  

The DATA_LENGTH and INDEX_LENGTH variables contain respectively the size of the data and the index for a table. DATA_FREE is the size of the unused space in a table. The fragmentation ratio is the amount of unused space to the sum of the used data and index space.

If your tables are large, you can round up the output length variables to megabytes (MB) by using the following SQL statement:

mysql> select ENGINE, TABLE_NAME,                          \
          round(DATA_LENGTH  /1024 / 1024) as data_length, \
          round(INDEX_LENGTH /1024 /1024) as index_length, \
          round(DATA_FREE / 1024 /1024) as data_free,      \
          data_free / (index_length + data_length) as frag_ratio \
       from information_schema.tables                      \
       where DATA_FREE > 0 order by frag_ratio desc;  

Database Defragmentation

If you scroll down to the Recommendations section of the report, you will see that the first general recommendation is 'Run OPTIMIZE TABLE to defragment tables for better performance'. You may execute the OPTIMIZE TABLE SQL statement for each of the 22 tables. Alternatively, you can run the mysqlcheck command as follows:

$ mysqlcheck -Aos --auto-repair -u root -p

Notes:

  1. Optimizing a table will lock it up. In other words, no update to the table is allowed while the operation is being performed. For a large production table, the substantial downtime is something that the database administrator should consider before deciding to optimize a table.

  2. Optimizing a table does not necessarily reclaim its free space. This is especially true for InnoDB tables. Prior to MySQL version 5.6, all InnoDB tables are by default stored in a single file. This behavior is controlled by the MySQL configuration variable innodb_file_per_table. Optimizing InnoDB tables stored together in a single file may inadvertently produce the undesirable effect of increasing the file size.

  3. InnoDB tables fragment differently than the legacy MyISAM tables. mysqlcheck optimizes an InnoDB table by recreating it. For each InnoDB table that it optimizes, mysqlcheck generates the following informational message: 'Note : Table does not support optimize, doing recreate + analyze instead'. You can safely ignore those messages.

  4. The mysqld server process must be running for mysqlcheck to execute.

  5. -A (--all-databases)

    With -A specified, all tables of all databases are optimized.

    If you want to defragment only a specific table of a specific database, customize the following command.

    $ mysqlcheck -os <database> <table> -u root -p

  6. -o (--optimize)

    This option specifies that the optimize operation is to be performed.

  7. -s (--silent)

    -s enables silent mode: only error messages are displayed.

  8. --auto-repair

    If MySQLTuner finds a target table which is corrupted, it will try to repair it.

What's next?

Part 3 of this series continues the discussion on MySQLTuner output, specifically about the management of database memory footprint.

Friday, June 19, 2015

Improving MySQL stability & performance using MySQLTuner - part 1

MySQL is the database engine behind many web applications on the Internet today. While it is relatively straightforward to install, configuring MySQL to best support your particular application requires expertise and the right tools. This post introduces MySQLTuner, a command-line program which offers suggestions to optimize MySQl performance and stability.

MySQLTuner is a read-only script: it won't actually write to the MySQL configuration file. Based on your database's past usage, it recommends new values to assign to specific MySQL configuration variables. It is your responsibility to understand each recommended change and its possible ramifications, select the changes you want to make, and to make them in a controlled manner.

Installing MySQLTuner

Before you install MySQLTuner, make sure that it supports your MySQL version. You can find the up-to-date compatibility information on its website.

To identify the MySQL version on your database server, run this command:


$ mysqladmin -u root -p version
...
Server version 5.5.43-0+deb7u1
...

MySQLTuner is a PERL script that you can install from the standard Debian and Ubuntu repositories. You can install it using the following command:

$ sudo apt-get install mysqltuner

The prepackaged MySQLTuner may not be the latest release available. If you want the latest, or you run a Linux distro other than Debian/Ubuntu, you can install the up-to-date version by downloading it directly. Simply download the PERL script to a directory of your choice using the command:


$ wget http://mysqltuner.pl/ -O mysqltuner.pl

Running MySQLTuner

Your database should be up longer than 1 day before you run MySQLTuner. This is because MySQLTuner bases its recommendations on past database usage. The more data it has to analyze, the more accurate is its recommendations. If MySQLTuner is run on a database that has been restarted in the last day, you will get a warning message: 'MySQL started within last 24 hours - recommendations may be inaccurate'.

To run the script, enter the following:


$ perl mysqltuner.pl

Analyzing output

MySQLTuner reports statistics about the database, and makes tuning recommendations. The top section of the report gives you useful database metrics, many of them actionable. The bottom section provides tuning suggestions for the MySQL configuration file.

You should thoroughly research a suggested configuration change before deciding to implement it. To change a configuration variable, edit the file /etc/mysql/my.cnf.

After you make a MySQL configuration change, restart the MySQL service.


$ sudo service mysql restart

Scheduling runs

Database tuning is not a 'once and done' type of task. Conditions change over time. A good practice is to schedule regular MySQLTuner runs using crontabs.

The following schedules MySQLTuner to run every Thursday at 4:30 am.


$ (crontab -l; echo "30 04 * * 4 perl <path-to-script>/mysqltuner.pl --nocolor 2>&1") | crontab -

Please refer to my earlier post for details about crontabs.

By default, MySQLTuner prompts the user for the database login credentials. For a cronjob to run MySQLTuner, you may provide the database account and password in the user-specific MySQL configuration file.


$ cat >> /home/peter/.my.cnf <<< '
[client]
user=<db_user>
password=<db_pass>'

Finally, protect the db login credentials by tightening up access permissions for the .my.cnf file.


$ chmod 600 /home/peter/.my.cnf

What's next?

The remainder of this series will guide you through taking actions based on MySQLTuner recommendations.

Tuesday, June 9, 2015

Double asterisk brings double expedience to pathname expansion


If you are a Linux command-line user, most likely, you are familiar with the use of the single asterisk ('*') in pathname expansion (aka globbing). How the asterisk behaves is standardized across all shells (bash, zsh, tcsh, etc). For example, the ls * command lists the files and the immediate sub-directories of the current directory.


$ ls *

The single asterisk, however, is not recursive: it does not traverse beyond the target directory. You may use the find command to generate a recursive listing of pathnames. A simpler solution is the use of the double asterisk ('**').

Unlike the single asterisk, the double asterisk is not standardized. Different shells introduced the feature at different times with slightly different behavior. This post focuses on the use of '**' for the bash shell.

The double asterisk feature for bash first appears with bash4. To find out which bash version you are running, execute the following command:


$ bash --version
GNU bash, version 4.2.37(1)-release (x86_64-pc-linux-gnu)
...

Before you use '**', you must first enable the globstar shell option:


$ shopt -s globstar

With globstar enabled, you may use '**' for pathname expansion.


$ ls **/abc.txt

In the above example, the ls command returns any occurrence of the file abc.txt in the current directory and sub-directories.

Notes:

  1. By default, the double asterisk does not expand to include a hidden file. For example, the following command will not find .htaccess because it is a hidden file.


    $ ls **/.htaccess

    To allow hidden files in '**' output, enable the dotglob shell option:


    $ shopt -s dotglob

  2. When you do a pathname expansion using '*' or '**', you run the risk that a returned filename is the same as a command-line flag, e.g., -r. To mitigate that risk, precede '**' with '--' as below. The double dash marks the spot where command-line flags end, and positional parameters begin.


    $ ls -- **

  3. Under bash, '**' expands to follow symbolic links. This behavior, however, is shell-specific. For zsh, expanding the double asterisk does not follow a symbolic link.

The double dash is a useful tool to add to your everyday command-line usage.

Tuesday, May 26, 2015

Useful sed tricks to customize configuration files



A typical software installation goes like this. You install the software using apt-get install or yum install.
Then, you manually edit the software's configuration file in order to satisfy your requirements. If you have to repeat the install on multiple machines, this quickly becomes tedious.


Instead of manually editing the file, I run a text manipulation command such as sed or awk to make the required changes. Then, I script the procedure by inserting the commands in a bash script file.


The scripting of configuration changes serves multiple purposes:


  • It is a permanent record of the configuration changes.

  • It is readily repeatable on the same or a different machine.


Below, I illustrate 2 sed tricks to make configuration changes to the Apache webserver. The target configuration file is /etc/apache2/apache2.conf.


Before you make any change, please first backup the original configuration file.



$ sudo cp /etc/apache2/apache2.conf /etc/apache2/apache2.conf.orig


Replacing first occurrence of a string


The default apache2.conf file contains the following line:


Timeout 300


Below is the sed command to change the first occurrence of Timeout in the file to 100.



$ sudo sed -i "0,/^Timeout\s/ s/^Timeout\s\+[0-9]\+/Timeout 100/" /etc/apache2/apache2.conf


The -i parameter tells sed to edit the file in place - that is, directly in apache2.conf.


0,/^Timeout\s/ specifies the range of lines over which the sed command is to be executed. In this example, the starting line is the first line (line 0). The finishing line is the line returned by a search for the word Timeout which appears at the beginning of a line (^) and followed by a whitespace (\s).


The line range parameter limits the change to only the first occurrence of Timeout in the file. If you leave out the line range, each occurrence of Timeout in the file will be modified. In many scenarios, leaving it out is OK because the parameter occurs only once in the configuration file.


For some configuration files, a parameter can occur multiples times, in different sections. Next, I illustrate how to limit the change to within a particular section in the configuration file.


Replacing a string within a target section


The MaxClients parameter occurs in 3 sections within the apache2.conf file:


  • mpm_prefork_module

  • mpm_worker_module

  • mpm_event_module


I want to change the MaxClients parameter within the mpm_prefork_module only.


The default mpm_prefork_module is like this:



<IfModule mpm_prefork_module>
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0
</IfModule>


Note that a section is delimited by the opening <IfModule> and closing </IfModule> statements.


The following sed command modifies the value of MaxClients to 18 within the mpm_prefork_module section.

$ sudo sed -i "/<IfModule mpm_prefork_module>/,\@</IfModule>@ s/MaxClients\s\+[0-9]\+/MaxClients 18/" /etc/apache2/apache2.conf


The line range is defined by the /<IfModule ... >/,\@</IfModule>@ clause in the above statement. The opening line in the line range is specified by a search for the <IfModule ... > pattern. The closing line is specified by the search pattern \@</IfModule>@.


An explanation of the closing line pattern is warranted. The slash (/) character is part of the search pattern for the closing line (</IfModule>). However, the slash is also the default delimiter for sed. Therefore, we must use a different delimiter (@) for the closing-line search pattern. Note that the first @ is escaped (\@).


The s/MaxClients.../MaxClients 18/ clause changes the value of MaxClients to 18.


Conclusion


The above are examples of how you can use sed to script common scenarios of changing configuration files. You can achieve the same result using other tools such as awk or perl. Please use the comment system to let us know your own examples.


If you are interested to learn more about sed, please read my earlier posts on the tool:


Tuesday, May 12, 2015

How to add and delete a user from a Linux group

Being in the right Linux group expedites many system administration tasks, and will save you time. For example, the Apache web server logs have the following file ownership and permissions.

# ls -al /var/log/apache2/*.log
-rw-r----- 1 root adm 882984 May 11 12:14 /var/log/apache2/access.log
-rw-r----- 1 root adm 418 May 11 01:55 /var/log/apache2/error.log
...

To read the Apache logs, you need root permissions. However, there is a shortcut that does not require you to run sudo. Note that adm - the admin group for Debian-based systems - is the group owner of the log files. So, if you become a member of adm, you don't need to sudo to read the log files.

To add peter to the adm group, execute any of the following commands:

  • $ sudo usermod -aG adm peter

  • $ sudo gpasswd -a peter adm

To verify that peter is now a member of the adm group, execute any of the following commands:

  • $ id -nG peter
    peter adm www-data

    You may be tempted, as I was, to not specify peter in the above command. Don't skip the parameter. Without the user parameter, you won't see the effect of the change in group membership - unless you log out and log back in. If you are running X, it means you have to log out of X, not just opening a new command shell window within the same X session.

  • $ groups peter
    peter : peter adm www-data

    Again, specify peter in the command. Otherwise, you must log out and then log back in before executing the command.

  • $ grep adm /etc/group
    adm:x:4:peter

If you have made a mistake, and now want to remove peter from the adm group, run any of the following commands:

  • $ sudo gpasswd -d peter adm
    Removing user peter from group adm

  • $ sudo deluser peter adm
    Removing user `peter' from group `adm' ...
    Done.

Besides the adm group, you should consider adding yourself to the www-data group. The Apache web server runs under the www-data user account on Debian systems. As a member of the www-data group, you can more easily modify web server files and directories.

An earlier post on group membership appears here.

Wednesday, April 29, 2015

How to schedule background jobs using crontab

The cron daemon is a great user tool to automate tasks that don't require human intervention. Users pre-specify jobs to run in the background at particular times, for example, every Monday, Wednesday and Friday at 2am.

To use cron, each user creates his own crontab ('cron table') file. The command to examine one's crontab file is crontab -l.

$ crontab -l
MAILTO=commandolinux@gmail.com
0 2 * * 1,3,5 /home/peter/backups/backupWP.sh 2>&1

The MAILTO line specifies the email address to which cron sends the output of command execution. Please refer to my earlier post on how to set up an SMTP server to forward your emails.

The second crontab line specifies that the backupWP.sh script should be executed at 2am every Monday, Wednesday and Friday. The syntax may look complicated. Fortunately, you can use the on-line Crontab Generator to craft the crontab statements. If you want to learn the syntax, click here instead.

Create crontab

Your crontab file is initially empty. To create the file from scratch, run the crontab command and type in the crontab statements.

$ crontab

Alternatively, put the statements into a temporary file, say /tmp/cron, and run this command:

$ cat /tmp/cron | crontab -

Edit crontab

If you want to modify crontab contents after they are created, run this command:

$ crontab -e

The command opens the crontab file in your default text editor. It is the most versatile way to modify crontab. You can use it to create, modify, and delete crontab statements. Don't forget to save the file after you finish editing.

The downside for this edit command is the time and overhead of starting the text editor. You can append a new statement directly by using the command in the next section.

Add to crontab

When I was new to crontab, I made the mistake of trying to append a statement by running crontab without any argument. That actually replaced everything in the crontab file with the new input.

The correct command to append a new statement is:

$ (crontab -l; echo "30 04 * * 4 /home/peter/backups/report.sh 2>&1") | crontab -

The trick is to run 2 commands in a subshell grouped by the round brackets. The first command, crontab -l, fetches the existing crontab statements. The echo command echoes the new statement to be appended. The collective output from both commands are piped to crontab standard input.

Empty crontab

To erase all crontab contents, execute the following command:

$ crontab -r

Conclusion

You may use crontab to schedule regular maintenance and backup tasks. Once it is set up, the crontab file tends to be static. But, if you ever need to add another task, or change the scheduled times, the commands introduced in this post will come in handy.

Tuesday, April 28, 2015

Free edX course on Introduction to Java Programming

If you want to learn Java programming, here is your perfect opportunity to take an on-line course.

EdX, the popular MOOC (Massive Open On-line Course) provider, is hosting a course on Introduction to Programming with Java.

This is part 1 of a series of Java courses, and will take 5 weeks to complete on-line.

I have never taken a formal programming course on-line. So, I can't advise on the effectiveness of such a course. But, I've taken non-programming-related edX courses before, and the experience was positive.

Do sign up today. The course starts on April 28.

And let us know in a comment of how you find the course.

Thursday, April 23, 2015

Configuring Monit: a free system monitoring and recovery tool

Why Monit?

One morning, I went on-line to check my WordPress website. Lo and behold, I saw this error: 'Error establishing a database connection.' My website had been down for 4 hours, luckily in the middle of the night.

I used a free website monitoring service called StatusCake. Sure enough, it did send me an email alerting me about this problem. But, sending an email at 2am was not helpful in solving the problem. What I really needed was a tool that not only detected when the database process went down, but would also restart the process without human intervention. Monit is such a tool.

For the rest of this post, I assume you want Monit to monitor a LAMP server (Linux, Apache2, MySQL, PHP).

Install Monit.

To install Monit on Debian or Ubuntu, execute this command:

$ sudo apt-get install monit

As part of the installation, a monit service is created:

$ sudo chkconfig --list | grep -i monit  
monit       0:off  1:off  2:on   3:on   4:on   5:on   6:off

Configure Monit

The main Monit configuration file is /etc/monit/monitrc. To edit it, you need sudo privileges.

$ sudo vi /etc/monit/monitrc

After you make a change to the file, follow these steps to bring it into effect:

  1. Validate configuration file syntax.

    $ sudo monit -t
    

    If no error is returned, proceed to next step.

  2. Restart Monit.

    $ sudo service monit restart
    

Global settings

The key global settings to customize are:

  • Test interval

    By default, Monit checks your system at 2-minute intervals. To customize the interval, change the value (from 120) in the following statement. The unit of measure is seconds.

    set daemon 120
    
  • Log file location

    You can specify whether Monit logs to syslog or a log file of your choice.

    # set logfile syslog facility log_daemon  
    set logfile /var/log/monit.log
    
  • Mail server

    Specify a mail server for Monit to send email alerts. I set up exim4 as an SMTP server on the localhost. For instructions, refer to my previous post.

    set mailserver localhost
    
  • Email format

    Hopefully, you won't receive many alert emails, but when you do, you want the maximum information about the potential problem. The default email format contains all the information known to Monit, but you may customize the format in which the information is delivered. To customize, use the set mail-format statement.

    set mail-format {  
        from:     monit@$HOST  
        subject:  monit alert --  $EVENT $SERVICE  
        message:  $EVENT Service $SERVICE  
                  at $DATE  
                  on $HOST 
                  $ACTION  
                  $DESCRIPTION  
                  Your faithful employee,  
                  Monit  
    }
    

    For a description of the set mail-format statement, click here.

  • Global alerts

    If any actionable event occurs, Monit sends an email alert to a predefined address list. Each email address is defined using the set alert statement.

    set alert root@localhost not on { instance, action }
    

    In the above example, root@localhost is the email recipient. Please refer to my earlier post about redirecting local emails to a remote email account.

    Note that an event filter is defined (not on { instance, action }). Root@local will receive an email alert on every event unless it is of the instance or action type. An instance event is triggered by the starting or stopping of the Monit process. An action event is triggered by certain explicit user commands, e.g., to unmonitor or monitor a service. Click here for the complete list of event types that you can use for filtering.

    By default, Monit sends an email alert when a service fails and another when it recovers. It does not repeat failure alerts after the initial detection. You can change this default behavior by specifying the reminder option in the set alert statement. The following example sends a reminder email on every fifth test cycle if the target service remains failed:

    set alert root@localhost with reminder on 5 cycles
    
  • Enabling reporting and service management

    You can dynamically manage Monit service monitors, and request status reports. These capabilities are delivered by an embedded web server. By default, this web server is disabled. To enable it, include the set httpd statement.

    set httpd port 2812 and
        use address localhost  
        allow localhost
    

    Note: I've only allowed local access to the embedded web server. The Useful Commands section below explains the commands to request reporting and management services.

Resource monitor settings

The following are the key resources to monitor on a LAMP server.

  • System performance

    You can configure Monit to send an alert when system resources are running below certain minimum performance threshold. The system resources that can be monitored are load averages, memory, swap and CPU usages.

    check system example.com   
      if loadavg (1min) > 4       then alert 
      if loadavg (5min) > 2       then alert 
      if memory usage   > 75%     then alert 
      if swap usage     > 25%     then alert 
      if cpu usage (user)   > 70% then alert 
      if cpu usage (system) > 30% then alert 
      if cpu usage (wait)   > 20% then alert
    
  • Filesystem usage

    You can create a monitor which is triggered when the percentage of disk space used is greater than an upper threshold.

    check filesystem rootfs with path /
      if space usage > 90% then alert
    

    You may have more than 1 filesystem created on your server. Run the df command to identify the filesystem name (rootfs) and the path it was mounted on (/).

  • MySQL

    Instead of putting the MySQL-specific statements in the main configuration file, I elect to put them in /etc/monit/conf.d/mysql.conf. This is a personal preference. I like a more compact main configuration file. All files inside the /etc/monit/conf.d/ directory are automatically included in Monit configuration.

    The following statements should be inserted into the mysql.conf file.

    check process mysql with pidfile /var/run/mysqld/mysqld.pid  
          start program = "/etc/init.d/mysql start"  
          stop program = "/etc/init.d/mysql stop"  
          if failed unixsocket /var/run/mysqld/mysqld.sock then restart  
          if 5 restarts within 5 cycles then timeout
    

    If the MySQL process dies, Monit needs to know how to restart it. The command to start the MySQL process is specified by the start program clause. The command to stop MySQL is specified by the stop command clause.

    A timeout event is triggered if MySQL is restarted 5 times in a span of 5 consecutive test cycles. In the event of a timeout, an alert email is sent, and the MySQL process will no longer be monitored. To resume monitoring, execute this command:

    $ sudo monit monitor mysql
    
  • Apache

    I put the following Apache-specific statements in the file /etc/monit/conf.d/apache.conf.

    check process apache2 with pidfile /var/run/apache2.pid
          start program = "/etc/init.d/apache2 start"
          stop program = "/etc/init.d/apache2 stop"
          if failed host example.com port 80 protocol http request "/monit/token" then restart
          if 3 restarts within 5 cycles then timeout
          if children > 250 then restart
          if loadavg(5min) greater than 10 for 8 cycles then stop
    

    At every test cycle, Monit attempts to retrieve http://example.com/monit/token. This URL points to a dummy file created on the webserver specifically for this test. You need to create the file by executing the following commands:

    $ mkdir /var/www/monit
    $ touch /var/www/monit/token 
    $ chown -R www-data:www-data /var/www/monit
    

    Besides testing web access, the above configuration also monitors resource usages. The Apache process is restarted if it spawns more than 250 child processes. Apache is also restarted if the server's load average is greater than 10 for 8 cycles.

Useful commands

To print a status summary of all services being monitored, execute the command below:

    $ sudo monit summary  
    The Monit daemon 5.4 uptime: 3h 48m 

    System 'example.com'                Running
    Filesystem 'rootfs'                 Accessible
    Process 'mysql'                     Running
    Process 'apache2'                   Running

To print detailed status information of all services being monitored, execute the following:

    $ sudo monit status
    The Monit daemon 5.4 uptime: 3h 52m 

    System 'example.com'
      status                            Running
      monitoring status                 Monitored
      load average                      [0.00] [0.01] [0.05]
      cpu                               0.0%us 0.0%sy 0.0%wa
      memory usage                      377092 kB [74.0%]
      swap usage                        53132 kB [10.3%]
      data collected                    Wed, 22 Apr 2015 13:21:47
    ...        
    Process 'apache2'
      status                            Running
      monitoring status                 Monitored
      pid                               12909
      parent pid                        1
      uptime                            6d 15h 18m 
      children                          10
      memory kilobytes                  2228
      memory kilobytes total            335420
      memory percent                    0.4%
      memory percent total              65.9%
      cpu percent                       0.0%
      cpu percent total                 0.0%
      port response time                0.001s to example.com:80/my-monit-dir/token [HTTP via TCP]
      data collected                    Wed, 22 Apr 2015 13:21:47

To unmonitor a particular service (e.g., apache2):

    $ sudo monit unmonitor apache2

To unmonitor all services:

    $ sudo monit unmonitor all

To monitor a service:

    $ sudo monit monitor apache2

To monitor all services:

    $ sudo monit monitor all

Conclusion

I'd recommend that you run Monit on your server in addition to signing up for a remote website monitoring service such as StatusCake. While the 2 services do overlap, they also complement each other. Monit runs locally on your server, and can restart processes when a problem is detected. However, a networking problem may go undetected by Monit. That is where a remote monitoring service shines. In the event of a network failure, the remote monitor fails to connect to your server, and will therefore report a problem that may otherwise go unnoticed.

Tuesday, April 14, 2015

Command-line network speed testing

Is your web app slow? Is network bandwidth the problem? To diagnose the problem, begin by measuring the network bandwidth. Many users run the popular, web-based speedtest.net to capture speed performance data. This is a good solution if the X Window System is installed on the webserver. However, I have a Linux VPS server without an X graphical environment. Command line is the only viable way to perform a speed test on that server.

Power Linux users may want to use the iperf program to measure network bandwidth. To use iperf effectively, you need some basic knowledge of TCP/IP. In addition, you need to setup iperf to run on 2 machines: the 'client' and the 'server'. Yet, if you like the simplicity of using speedtest.net, you will be happy to know the following command-line tool to access speedtest.net servers.

speedtest-cli is a command-line Python program for testing Internet bandwidth using speedtest.net.

To download, and configure speedtest-cli, run the following commands:

$ wget -O speedtest-cli https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest_cli.py
$ chmod +x speedtest-cli

To capture the upload and download speeds of a local machine, you can simply run speedtest-cli without any parameter. The program automatically selects the 'best' speedtest.net server to test bidirectional transmission from the local machine.

$ ./speedtest-cli
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Telus Communications (108.180.199.xxx)...
Selecting best server based on latency...
Hosted by TELUS (Vancouver, BC) [3.73 km]: 34.739 ms
Testing download speed........................................
Download: 3.04 Mbit/s
Testing upload speed..................................................
Upload: 0.74 Mbit/s

In the above example, the program selected a test server located only 3 kilometers away from the local machine. That is not where most of my web visitors are from, namely the east coast of United States. The speed tests are more useful to me if the test server is located say in New York city.

You can designate a specific speedtest.net server in your speed testing. First, list the supported test servers.

$ ./speedtest-cli --list
...
982) Interserver, inc (Secaucus, NJ, United States) [3895.32 km]
2947) Atlantic Metro (New York City, NY, United States) [3903.25 km]
663) Optimum Online (New York City, NY, United States) [3903.25 km]
...

Then, select one from the list to specify as the test server, say 2947 (Atlantic Metro in New York City). To track network speed performance more consistently over time, you can designate the same test server in your subsequent tests.

$ ./speedtest-cli --server 2947
Retrieving speedtest.net configuration... Retrieving speedtest.net server list...
Testing from Telus Communications (108.180.199.xxx)...
Hosted by Atlantic Metro (New York City, NY) [3903.25 km]: 2629.346 ms
Testing download speed........................................
Download: 2.79 Mbit/s
Testing upload speed..................................................
Upload: 0.84 Mbit/s

For more information about speedtest-cli parameters, execute the command below.

$ ./speedtest.cli -h

Monday, April 6, 2015

How to n-up pages in a PDF or PPT file via the command-line

Suppose you downloaded a PowerPoint or a PDF file from slideshare. You liked it so much that you wanted to print it out. But, alas, it was 50 pages long.

This tutorial introduces the command-line tools to n-up a PPT or PDF file, i.e., batch multiple pages of the input file onto a single page on the output file. The output file is of the PDF format.

To 2-up a file, you place 2 original pages on a single output page. Similarly, to 4-up a file, 4 original pages on a single output page. By n-upping a file, you drastically reduce the number of pages for printing.

Convert to PDF

If the original file is a PowerPoint file (PPT, PPTX, PPS, PPSX), you need to first convert it to PDF. The tool I use is unoconv.

To install unoconv on Debian,

$ sudo apt-get install unoconv

To convert input.ppt to input.pdf,

$ unoconv -f pdf input.ppt

N-up PDF

Now that you have a PDF file, use the pdfnup program to n-up the file.

To install pdfnup,

$ sudo apt-get install pdfjam

Behind the scene, pdfnup uses the TeX typesetting system to do the n-up conversion. So, you need to first install some LaTeX-related packages.

$ sudo apt-get install texlive-latex-base texlive-latex-recommended

Now, you are ready to execute the following command to n-up input.pdf.

$ pdfnup --nup 2x3 --paper letter --frame true --no-landscape input.pdf

  • --nup 2x3: 2x3 means 2 columns and 3 rows. This houses a total of 6 input pages on each output page.

  • --paper letter: The default paper size is A4. For North Americans, specify --paper letter for the US letter size.

  • --frame: By default, the subpages on the output page are not framed, i.e., there are no borders around each subpage. To specify that a frame should be drawn around each subpage, specify --frame true.

  • --no-landscape: The default page orientation is landscape. If you want the portrait orientation, specify --no-landscape.

  • The output PDF filename for the above example is input-nup.pdf. The output filename is constructed by appending the default suffix -nup to the input filename.

The above method is not the only way to n-up a PDF file. Below is an alternative method that involves first converting the PDF file to PostScript format, then doing the n-up, and finally converting it back to PDF.

$ pdf2ps input.pdf input.ps
$ psnup -2 input.ps output.ps
$ ps2pdf output.ps output.pdf

You can choose either method to do the n-up conversion. I generally avoid the PostScript method because it involves an extra conversion step. Regardless of which method you choose, the environment will thank you for using less paper.

Friday, March 20, 2015

Automate Debian system updates using Crontab

Updating a Debian system is as easy as executing the following command as root:

# apt-get update && apt-get upgrade

If you have a sudo account, run the command like this:

$ sudo apt-get update && sudo apt-get upgrade

Instead of running the command interactively, you can automate the manual update process by running a cron job. Below, I assume you login as root.

Run the following command to create or edit your cron jobs. Note that the default text editor is opened automatically for you to enter the cron jobs.

# crontab -e

As an example, I will schedule the update to happen daily at 2am. I entered the following line as my first (failed) attempt.

00 02 * * * apt-get update 2>&1 && apt-get -y upgrade 2>&1

A typical upgrade usually prompts you to confirm a transaction before it is executed. Because the cron upgrade is non-interactive, I specify the -y parameter to tell apt-get to assume yes for all prompts.

At 2am, the above command executed, and failed with the following errors:

debconf: unable to initialize frontend: Dialog   
debconf: (TERM is not set, so the dialog frontend is not usable.)    
debconf: falling back to frontend: Readline   
debconf: unable to initialize frontend: Readline   
debconf: (This frontend requires a controlling tty.)   
debconf: falling back to frontend: Teletype  
dpkg-preconfigure:unable to re-open stdin: Fetched 49.5 MB in 17s (2,840 kB/s)  
dpkg: warning: 'ldconfig' not found in PATH or not executable  
dpkg: warning: 'start-stop-daemon' not found in PATH or not executable  
dpkg: error: 2 expected programs not found in PATH or not executable  
Note: root's PATH should usually contain /usr/local/sbin, /usr/sbin and /sbin  
E: Sub-process /usr/bin/dpkg returned an error code (2)

There were 2 problems. First, a front-end was expected, but there was none. Second, the PATH for locating commands was not set up correctly.

To correct the problems, re-run crontab -e, and insert the following lines to properly set up the run-time environment.

DEBIAN_FRONTEND=noninteractive  
PATH=/usr/sbin:/usr/bin:/sbin:/bin  
00 02 * * * apt-get update 2>&1 && apt-get -y upgrade 2>&1

Automating the system update process saves you time, and keep your system more up-to-date as a protection against potential cyber attacks. If you are interested in Debian system administration, please see What to do after spinning up a Debian VPS.

Wednesday, March 4, 2015

How to merge or split pdf files using convert

convert is a member of the ImageMagick software suite for image manipulation. Two of my earlier posts dealt with using convert to slice and resize an image. It is a lesser-known fact that convert also works with pdf files. I'd previously explained how to merge and split up pdf files using tools such as pdftk and gs. In this post, I'll illustrate how to do the same using the convert program.

First, you need to install convert which is packaged in the ImageMagick suite.

$ sudo apt-get install imagemagick

Merging 2 pdf files (file1 and file2) into a new file (output) is as simple as executing:

$ convert file1.pdf file2.pdf output.pdf

You can merge a subset of pages instead of the entire input files. To accomplish that, use the angle brackets to specify the target subset of pages. For example, to merge page 1 of file1 with pages 1, 2 and 4 of file2, run the following command:

$ convert file1.pdf[0] file2.pdf[0-1,3] output.pdf

Note that page numbers are zero-based. Therefore, [0] is page 1, and [0-1] are the pages ranging from page 1 to page 2.

Finally, the following example splits up input into 2 files: first2output and next2output. The former output file contains pages 1 and 2 from the original file; the latter, pages 3 and 4.

$ convert input.pdf[0-1] first2output.pdf
$ convert input.pdf[2-3] next2output.pdf

As you can see, convert is a versatile utility program to manipulate both image and pdf files.

Friday, February 27, 2015

How to update a MySQL table with data from itself

Suppose I want to update a MySQL table with data that refers to the same table. To make it easier to follow along, I will use the following actual WordPress administration scenario.

My task is to copy the attributes of a WordPress theme named decode to its child theme named decode-child. Theme attributes are stored in the MySQL table wp_options.

The table has 2 relevant fields:

  1. option_name

    This field identifies the theme. The parent theme has the value theme_mods_decode; the child theme, theme_mods_decode-child.

  2. option_value

    This is the field we want to copy from the parent theme to its child.

In essence, I want to copy the option_value of the record with option_name equals theme_mods_decode to the record with option_name equals theme_mods_decode-child.

We will use a subquery to accomplish the task. A subquery is a SELECT SQL statement within another SQL statement, say UPDATE.

My first subquery attempt resulted in an error:

update wp_options set option_value = 
  (select option_value from wp_options where option_name = 'theme_mods_decode') 
where option_value = 'theme_mods_decode-child'; 
ERROR 1093 (HY000): You can't specify target table 'wp_options' for update in FROM clause

MySQL does not allow the FROM clause of a subquery to refer to the same table targeted for update.

To get around the above error, replace the table reference wp_options in the original subquery with a nested subquery (select * from wp_options):

update wp_options set option_value =   
  (select option_value from   
    (select * from wp_options) as x  
   where option_name = 'theme_mods_decode')   
where option_name = 'theme_mods_decode-child';   
Query OK, 1 row affected (0.05 sec)   
Rows matched: 1  Changed: 1  Warnings: 0   

The above solution may look like a hack to some of us. There may even be other solutions, perhaps using complex joins and unions. But I like the above approach because it is simple, both conceptually and syntactically.

Friday, February 20, 2015

What to do next after spinning up a Debian VPS

Recently, I opened an unmanaged VPS hosting account with Digital Ocean (of which I am an affiliate), and created a barebone Debian virtual server. Because the hosting was unmanaged, I had to apply all the system changes to the machine myself. Below were the steps I took to set up the Debian virtual machine after spinning it up for the first time.

I assume that you want to set up the LAMP stack - Linux, Apache, MYSQL, PHP - on your machine.

  1. SSH into the virtual machine.
    Before you can login, you need to know the public IP address of the new machine. Digital Ocean sends it to you in an email.

    $ ssh root@your.IP.address

  2. Change root password.
    When you first login to the Digital Ocean server, you are automatically prompted to enter a new password. If your VPS is with someone else, change the password with the following command.

    # passwd

  3. Get to know your system environment.

    • Which Linux release?

      # lsb_release -a

    • How much memory, and swap space?

      # free -h

                  total used free shared buffers cached   
      Mem:         497M  64M 432M     0B    7.3M    33M     
      -/+ buffers/cache: 23M 473M     
      Swap:          0B   0B   0B

      Note: No swap space was pre-allocated.

    • How much disk space?

      # df -h

    • Is DNS configured?

      # cat /etc/resolv.conf
      nameserver 8.8.4.4
      nameserver 8.8.8.8
      nameserver 209.244.0.3

    • Which repositories?

      # cat /etc/apt/sources.list
      deb http://mirrors.digitalocean.com/debian wheezy main
      deb http://security.debian.org/ wheezy/updates main

  4. Update Debian.

    # apt-get update
    # apt-get upgrade

  5. Configure correct timezone.

    Run the date command to verify the current time. If the time looks wrong, your machine may be pre-configured to the wrong timezone.

    # date Thu Dec 4 00:07:37 UTC 2014

    The current time reported was in the UTC timezone. But, I live in Vancouver, Canada which is in the Pacific timezone. To change the timezone, write the proper time zone string to the file /etc/timezone, and run dpkg-reconfigure.

    # echo America/Vancouver > /etc/timezone
    # dpkg-reconfigure -f noninteractive tzdata
    Current default time zone: 'America/Vancouver'
    Local time is now: Wed Dec 3 16:08:58 PST 2014.
    Universal Time is now: Thu Dec 4 00:08:58 UTC 2014.

    Now, if you run date, you should see the correct time in the local timezone.

    # date
    Wed Dec 3 16:09:01 PST 2014

    Refer to my earlier post for more details.

  6. Customize default text editor.

    If you are not used to a text editor, it can drive you crazy. So, I change the default text editor from nano to vim. You will thank me when you run visudo in the next step.

    Note: my first choice for text editor is emacs, but emacs is not installed by default. Sigh.

    # update-alternatives --config editor

    Select /usr/bin/vim.tiny.

  7. Add new sudo user.

    Soon, I'll configure SSH to prevent direct root login. Before that can happen, a new user is needed.

    First, create a new user as follows:

    # adduser peter

    Then, grant the new user sudo privileges.

    # visudo

    Insert the following line in the User privilege specification section.

    peter ALL=(ALL:ALL) ALL

  8. Tighten up SSH security.

    • Disallow root login.

    # sed -i -e "s/^PermitRootLogin yes/PermitRootLogin no/" /etc/ssh/sshd_config

    • Change SSH port number from port 22 to port 2121.

    # sed -i -e "s/^Port 22/Port 2121/" /etc/ssh/sshd_config

    • Restart the SSH daemon.

    # service ssh restart

  9. Build up firewall.

    Run iptables to see how your firewall is currently configured:

    # iptables -L   
    Chain INPUT (policy ACCEPT)
    target  prot opt source   destination
    
    
    Chain FORWARD (policy ACCEPT) 
    target  prot opt source   destination
    
    
    Chain OUTPUT (policy ACCEPT)   
    target  prot opt source   destination

    Note: The firewall is set up by default to accept everything.

    To build a minimal firewall that allows incoming web, SSH and ping traffic, issue the following commands in the specified order:

    # iptables -I INPUT 1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # iptables -I INPUT 2 -i lo -j ACCEPT
    # iptables -I INPUT 3 -m conntrack --ctstate NEW -p tcp --syn --dport 80 --j ACCEPT
    # iptables -I INPUT 4 -m conntrack --ctstate NEW -p tcp --syn --dport 2121 --j ACCEPT
    # iptables -I INPUT 5 -p icmp --icmp-type echo-request -m limit --limit 2/second -j ACCEPT
    # iptables -P INPUT DROP
    # iptables -P FORWARD DROP

    Inspect the new firewall using the following command:

    # iptables -L -v

    Refer to my earlier post for more details on how to set up the firewall.

  10. Save firewall configuration.

    The new firewall configuration does not persist in the event of a system reboot. To save the firewall permanently, you must first install the iptables-persistent package.

    # apt-get install iptables-persistent

    The install program will ask if you want to save the current `iptables` rules. Answer yes.

    If you make more firewall changes after installing the above package, you need to explicitly save the rules using the command below:

    # iptables-save > /etc/iptables/rules.v4

  11. Deploy fail2ban.

    fail2ban scans system log files for signs of a cyber attack. Upon the detection of a possible attack, it bans the originating suspect IP address for a specified amount of time.

    • Install fail2ban.

      # apt-get install fail2ban

    • Configure fail2ban.

      # cat > /etc/fail2ban/jail.local
      [DEFAULT]
      ignoreip = 127.0.0.1/8
      bantime = 3600
      maxretry = 3
      [ssh]
      maxretry = 3
      port = 2121

      bantime specifies the ban duration in seconds. maxretry specifies the threshold for triggering the ban. port specifies the SSH port number.

    • Restart fail2ban.

      # service fail2ban restart

  12. Create swap file.

    My VPS hosting plan with Digital Ocean gives me 512 MB of RAM and 20 GB in SSD disk space. While RAM is limited in that configuration, the virtual server has plenty of unused disk space. The extra disk space can be utilized to boost the available virtual memory. Follow the steps below to create the swap file /var/swap.img.

    # fallocate -l 500M /var/swap.img
    # chmod 600 /var/swap.img
    # mkswap /var/swap.img
    # swapon /var/swap.img
    # echo "/var/swap.img none swap sw 0 0" >> /etc/fstab

    Refer to my earlier post for more details.

  13. Set up SMTP server to redirect root emails.

    You run the risk of ignoring important system alerts if emails to root are not read regularly. Follow the steps below to forward all root emails to an email address that you actually monitor.

    • Configure exim4 to redirect all outbound emails to the Google Gmail SMTP server.

      You need a Gmail account in order to use Gmail SMTP. Refer to my earlier post for more details.

    # hostname --fqdn > /etc/mailname
    The /etc/mailname file should contain the fully-qualified domain name to use for outgoing mail messages. The Sender email address of a message uses this value as its domain name.

    # sed -i -e "s/dc_eximconfig_configtype=.*/dc_eximconfig_configtype='smarthost'/" /etc/exim4/update-exim4.conf.conf
    smarthost is the configuration type you want.

    # sed -i -e "s/dc_smarthost=.*/dc_smarthost='smtp.gmail.com::587'/" /etc/exim4/update-exim4.conf.conf
    If you have a non-Gmail SMTP server, replace smtp.gmail.com with the proper domain name.

    # sed -i -e "s/dc_hide_mailname=.*/dc_hide_mailname='false'/" /etc/exim4/update-exim4.conf.conf

    # sed -i -e "s/dc_other_hostnames=.*/dc_other_hostnames=''/" /etc/exim4/update-exim4.conf.conf

    # echo '*.google.com:yourAccount@gmail.com:yourPassword' >> /etc/exim4/passwd.client
    Customize the above with your own Gmail account and password. If you are not using Gmail, replace google.com with the proper domain name.

    # update-exim4.conf

    # service exim4 restart

    • Redirect root emails to a valid email address.

      Refer to an earlier post for more details.

    # echo -e 'root: peter\npeter: yourEmailAddress@somedomain.com' >> /etc/aliases
    The above command inserts 2 lines into /etc/aliases. The first line (root: peter) specifies that all emails to root are forwarded to peter, a new user we added earlier. The second line (peter: yourEmailAddress@somedomain.com) specifies that all emails to peter are redirected to the external email address.

    # newaliases
    Run newaliases to rebuild the email aliases database.

  14. Note which essential packages are still uninstalled.

    My yet-to-install package list comprises:

    • chkconfig
    • MYSQL
    • Apache
    • PHP5

    You can use the following command to find out if a package, say PHP5, is installed:

    # dpkg -l php5*
    dpkg-query: no packages found matching php5*

  15. Install missing packages.

    • chkconfig

      # apt-get install chkconfig

    • MYSQL

      # apt-get install mysql-server mysql-client

      After installing MYSQL, run the mysql_secure_installation program to tighten up security.

      # mysql_secure_installation

    • Apache

      # apt-get install apache2

    • PHP5

      # apt-get install php5 php5-mysql php5-gd php5-curl

    For further details on how to install the above packages, refer to my earlier post.

After you put in all the hard work to get this far, your machine is finally in a usable state. It is wise (albeit optional) to save the current machine state. Digital Ocean allows you to create a system snapshot of your server. You can restore the server to that particular state at any time in the future.

When you login to the virtual server via SSH, don't forget to use the new user and port number.

$ ssh -p 2121 peter@your.ip.address