Thursday, September 8, 2011

Bash String Tokenizing

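Bash has enough functionality to build a string tokenizer, by means of four parameter expansion modifiers:
  • # removes the shortest match of a pattern from the start of the string
  • ## removes the longest match of a pattern from the start of the string
  • % removes the shortest match of a pattern from the end of the string
  • %% removes the longest match of a pattern from the end of the string
We can then use them to get the first or last elements, from the start or the end of the string: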

#!/bin/bash
TEST="i/am/a/very/big/dir"
echo ${TEST#*/}
echo ${TEST##*/}
echo ${TEST%/*}
echo ${TEST%%/*}

# am/a/very/big/dir
# dir
# i/am/a/very/big
# i

echo ${TEST#*/*/}
echo ${TEST%/*/*}

# a/very/big/dir
# i/am/a/very

I have omitted the leading slash in this string, which was a directory path, so that ${TEST%%/*} would print "i" instead of nothing. So you can use ${TEST%%/*} to get the first element, and save the result of ${TEST#*/} back into the variable on every iteration of a loop that tokenizes the string into an array.

The last two expressions show that if you're dealing with hardcoded delimiters, you can extend the pattern so that it matches precisely what you want.
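
The loop described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name tokenize and the array name TOKENS are made up for this example, and a string with a leading delimiter will produce an empty first element.

```shell
#!/bin/bash
# Split a delimited string into an array using only the %% and # modifiers.
# "tokenize" and "TOKENS" are hypothetical names chosen for this sketch.
tokenize() {
    local rest="$1" delim="$2"
    TOKENS=()
    # While a delimiter remains, peel off the first element.
    while [[ "$rest" == *"$delim"* ]]; do
        TOKENS+=("${rest%%"$delim"*}")   # first element (longest match removed from the end)
        rest="${rest#*"$delim"}"         # drop it (shortest match removed from the start)
    done
    TOKENS+=("$rest")                    # the last element has no delimiter left
}

tokenize "i/am/a/very/big/dir" "/"
printf '%s\n' "${TOKENS[@]}"
```

Each element is printed on its own line, and the elements stay available in the array afterwards for further processing.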

Thursday, July 7, 2011

Converting Mercurial Repository to Git

First, clone the original Mercurial repo with hg:

hg clone ssh://hg@bitbucket.org/owner/repo

Now we'll need Git-Hg (massive kudos to offbytwo for making it possible):

git clone git://github.com/offbytwo/git-hg.git
cd git-hg/
git submodule update --init fast-export
cd ../repo
../git-hg/bin/git-hg clone ssh://hg@bitbucket.org/owner/repo
touch .gitignore
git add .gitignore
git pull git+ssh://git@github.com/mdvcs/md.git

Here I had some problems with the git repository already having a README file, so I just removed it and proceeded with the push.

git rm README
git push git+ssh://git@github.com/mdvcs/md.git

And you're done. Enjoy.


Wednesday, March 2, 2011

Disabling mod_rewrite For Certain Paths

Say you have redirected a domain from .net to .com and have a rule such as:

# Rewrite to the new domain and add "www"
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^/(.*)$ http://www.domain.com/$1 [R=301,L]

These are catch-all rules: they check whether the host is www.domain.com and, if it is not, redirect the request to it. (See the reasons why you should use either "www" or the base domain, and never both, on this page.)

In this scenario, if you want to disable this rewriting for a folder, for instance, you can add the following rule at the top, which will match and do no further rewriting ([L]) when it does. Here we will match something like http://www.domain.net/oldfolder/somethingelse

RewriteRule ^/oldfolder(.*)? - [L]

If you need more information on www or non-www URL rewriting, please see the www-redirection tutorial page.

URL Rewriting To WWW or non-WWW Domains Names

I'm a very strong proponent of having website redirection turned on for most websites. I have even written a post about this, with examples of failing to do it and its consequences. Go here if you want to take a look. I assume that if you've arrived here, you already know why you need to do it.

To do the rewriting I'll be focusing on Apache's mod_rewrite on a Linux installation, which is what is typically available to people using a LAMP install for their content management system of choice, like WordPress or phpBB. It's all quite simple, so let's get down to it.

Two ways to do it

This is the rewrite to use if you need to put it in a .htaccess file on your server:

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.chosendomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.chosendomain.com/$1 [R=301,L]

There's another option when you want to keep the rules in a safer location, which is including them in Apache's httpd.conf file. One detail must be taken care of in this case: if you don't include the slash at the start of the rewrite rule pattern, you'll find that redirected URLs get a double slash after the hostname.

RewriteEngine On
RewriteCond %{HTTP_HOST} ^chosendomain\.com$ [NC]
RewriteRule ^/(.*)$ http://www.chosendomain.com/$1 [R=301,L]

These two blocks of code have slight differences. Notice the slash at the start of the RewriteRule pattern in the second block, which takes the place of the RewriteBase / directive. The RewriteCond differs as well: the first block matches any host other than www.chosendomain.com, while the second matches only the bare chosendomain.com. Both redirect to the www host, but they are not interchangeable: which one you need depends on where you place the mod_rewrite rules.

Note: Backslashes are important in the RewriteCond pattern as a safety precaution, so that it matches a literal "." and not some other random character, even though it is highly unlikely you have a domain "www2domain.com", where 2 is the random character, pointing to your server's IP.

If, after you've done this, you want to exclude some folders (or paths) from the URL redirection you've just inserted, see the instructions on this page.
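
Once the rules are in place, it helps to confirm where a request actually gets redirected. The following is a minimal sketch for doing that from the shell, assuming curl is installed. The helper name extract_location is made up for this example, and the headers piped into it here are a hand-written sample, not output captured from a real server.

```shell
#!/bin/bash
# Print the value of the Location header from HTTP response headers on stdin.
# "extract_location" is a hypothetical helper name chosen for this sketch.
extract_location() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Hand-written sample response, used only to demonstrate the helper.
printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: http://www.chosendomain.com/somepage\r\n\r\n' |
    extract_location

# Against a live server (chosendomain.com is the placeholder from the text):
#   curl -sI http://chosendomain.com/somepage | extract_location
```

If the printed URL contains a double slash after the hostname, the slash at the start of the rule pattern is most likely missing for the context the rules were placed in.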

A Collection of Epic Fails Due to Lack of non-WWW Redirects

Sometimes people make mistakes in web redirection concerning whether or not to use www. as a subdomain of the main site hosted on a given domain. Most errors fall into two types:
  • Serving the same content on two base URLs, due to hosting the same content on www.chosendomain.com and chosendomain.com.
  • The web server is not running on the base domain, only on the www subdomain, which results in a "page not available" type of error that confuses people who forget to type the www. part of the URL. This is a loss of potential customers, hence the "Epic Fail" title for this post.
Both errors are easy to fix, and you have alternative ways to fix the first one. If you're using <link rel="canonical"></link> to specify which page should be indexed, Google will index it only with the hostname you provided and all the PageRank will go towards the URL you choose for your website.
To do a simple redirect, check the tutorial at this link: URL redirection www or non-www.

Errors of neither serving nor redirecting the base domain

http://svdvyver.com/
http://worten.pt

SSL errors due to lack of non-www redirection

This is a particularly interesting error, since I hadn't come across it before. I found it strange that this website was reported as having an invalid certificate, and further research revealed that the certificate authority had issued it only for www.uzo.pt and not uzo.pt, which is expected. A simple redirect fixes it.

https://uzo.pt/pt/adiro-ja/manter-numero/optimus/pagina.uzo

Sending E-Mail From The Command Line In Linux

This guide outlines the requirements to send e-mail from the bash command line on Gentoo Linux, but the instructions are mostly the same for Red Hat Enterprise Linux, OpenSUSE, Debian, Ubuntu or Slackware. Just replace emerge with your distro's package manager of choice, such as yum, zypper or apt-get.

First, install the "mail-client/mailx" package by using emerge, as root:

emerge mail-client/mailx

Then send a test message:

echo "Hello World!" | mail -s "Subject" destination@mailserver.com

This assumes you already have "postfix" installed and configured. If it is not installed yet, install and start it with:

emerge postfix
/etc/init.d/postfix start

The test e-mail should arrive in your destination mailbox of choice. Script away, using the command to send e-mails in batch or to warn the sysadmin of problems on the box.
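
As a starting point for that kind of scripting, here is a minimal sketch of a reusable alert function built on the same mail command. The names send_alert, ADMIN and MAILER are made up for this example, and the address is the placeholder used above. Nothing is sent until you call the function.

```shell
#!/bin/bash
# Wrap the mail command so scripts can warn the sysadmin in one line.
# "send_alert", "ADMIN" and "MAILER" are hypothetical names chosen for this sketch.
ADMIN="${ADMIN:-destination@mailserver.com}"   # placeholder address from the text
MAILER="${MAILER:-mail}"                       # command used to send the message

send_alert() {
    local subject="$1" body="$2"
    # The body goes in on stdin, exactly as in the one-liner above.
    echo "$body" | "$MAILER" -s "$subject" "$ADMIN"
}

# Example use inside a larger script (some_backup_command is a stand-in):
#   some_backup_command || send_alert "Backup failed" "Backup failed on $(hostname)"
```

Keeping the mail command in a variable makes it easy to swap in a different mailer, or a stub while testing the surrounding script.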

TORQUE Resource Manager Tutorial

The TORQUE resource manager is a complex piece of software, which deserves some quick tips on how to use the system without having downtime or lost jobs for your users. This page gathers some of the most important information you can know about using Torque and will be further expanded to accommodate all the important topics regarding this management component of computer clusters.

TORQUE is built on the principles of the Portable Batch System, also known as PBS, of which there are two versions: OpenPBS (open source but no longer maintained) and PBS Pro, a paid-for PBS product. TORQUE is also open source and derives from OpenPBS, with many years of development now separating the two. TORQUE is currently one of the better ways to manage cluster job queues, especially given that Sun Grid Engine is no longer free.

TORQUE PBS Server

The pbs_server process controls which nodes are assigned idle jobs from the queue once a machine (node) finishes a calculation and becomes empty. Sometimes it runs into problems, so it is useful to know how to restart the pbs_server process, whose init script is usually installed at /etc/init.d/pbs_server.

Restart PBS Server by issuing, as root (or with sudo):

/etc/init.d/pbs_server restart

The process can mostly be restarted without issue, and there are two options for how it stops, defined in TORQUE's config file:
  • delay
  • quick
If PBS_SERVER_STOP is set to "quick", the running jobs are left to run without interaction until the server is back up. If it is set to "delay", the jobs will be checkpointed or rerun, or pbs_server will wait for the jobs to finish before restarting the service.
If you're trying to sort out blocked resources, it is recommended to use "quick".