My humble opinion on software

Wednesday, June 06, 2012

Simple HTTP monitor on CentOS / RH

This is just a short memo on how to write an HTTP monitoring script. Be aware that there are tons of other ways to do it; this one is simple, but might not be the best for you.

1. In your home folder, create the httpmonitor.sh script file and replace the e-mail address with the one you want your notifications to go to:

#!/bin/bash
# Simple HTTP monitor script
# Monitors a URL and sends an email when the status of it changes
NOTIFY_EMAIL=MY_EMAIL_ADDRESS@MYDOMAIN.com
if [ -z "$1" ]; then
  echo "Usage: $0 url_to_monitor"
  echo "Example: $0 http://www.google.com/"
  exit 1
fi

URL=$1
# The status file name encodes the URL and the last seen status line,
# e.g. .http---myurl-com.200-OK
STATUS_FILE_PREFIX=.$(echo "$URL" | sed 's/[\/\.:]/-/g')
RESULT_TEXT=$(lwp-request -m HEAD -t 30 "$URL" | head -n 1)
STATUS_FILE_SUFFIX=$(echo "$RESULT_TEXT" | sed 's/ /-/g')

# No file for the current status means the status has changed since the last run
if [ ! -f "$STATUS_FILE_PREFIX.$STATUS_FILE_SUFFIX" ]; then
  rm "$STATUS_FILE_PREFIX".* > /dev/null 2>&1
  touch "$STATUS_FILE_PREFIX.$STATUS_FILE_SUFFIX"
  echo "$URL returned $RESULT_TEXT" | mail -s "[$0]: $URL returned $RESULT_TEXT" "$NOTIFY_EMAIL"
fi

exit 0

2. chmod a+x httpmonitor.sh so that cron job can execute it

3. run ./httpmonitor.sh <theurl> manually to check script executes with no errors

4. crontab -e (most probably it will open vi, so press 'i' to insert a line; e.g. to check every 10 minutes:)
*/10 * * * * /home/MYUSERNAME/httpmonitor.sh http://myurl.com

hit ESC and type ':wq!'

5. check with crontab -l that the line looks like you want it

That's it. The script will e-mail you whenever the HTTP status of the URL changes (say, from 200 to 503, or when the request times out).

Attributions:
The script is taken from Confessions of the Guru blog with a fix for rm quotes and using full lwp-request instead of just HEAD.
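
The change detection in httpmonitor.sh boils down to keeping one marker file per URL, whose name encodes the last seen status line. Below is a minimal Java sketch of just that idea. The class name StatusMarker and its methods are made up for illustration, and the status line is passed in as a string instead of being fetched over HTTP.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the original script.
final class StatusMarker {

    /** Same substitution as the first sed call, plus the leading dot. */
    static String prefixFor(String url) {
        return "." + url.replaceAll("[/.:]", "-");
    }

    /** Same substitution as the second sed call: spaces become dashes. */
    static String suffixFor(String statusLine) {
        return statusLine.replace(' ', '-');
    }

    /**
     * Returns true when the status differs from the one recorded by the previous
     * call, and records the new status by replacing the marker file in dir.
     */
    static boolean statusChanged(Path dir, String url, String statusLine) {
        String prefix = prefixFor(url);
        Path marker = dir.resolve(prefix + "." + suffixFor(statusLine));
        if (Files.exists(marker)) {
            return false; // same status as last time, nothing to report
        }
        try {
            // Remove the markers of any previous status, then record the current one.
            try (DirectoryStream<Path> old = Files.newDirectoryStream(dir, prefix + ".*")) {
                for (Path p : old) {
                    Files.delete(p);
                }
            }
            Files.createFile(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("httpmonitor");
        String url = "http://myurl.com";
        System.out.println(statusChanged(dir, url, "200 OK"));                  // first run
        System.out.println(statusChanged(dir, url, "200 OK"));                  // unchanged
        System.out.println(statusChanged(dir, url, "503 Service Unavailable")); // changed
    }
}
```

A notification would be sent only when statusChanged returns true, which is the same condition under which the shell script calls mail.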

Monday, July 18, 2011

JDK6 (aka Java SE 6) web services and Tomcat

Web services have come a long way since they were first introduced in 2000. Java SE 6, the latest edition of standard Java, provides decent WS remoting support for simple APIs.

To make a simple web service out of your existing API, not much is needed. Just annotate your class with @WebService, public methods with @WebMethod, and method params with @WebParam:

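import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import javax.jws.WebMethod;
import javax.jws.WebParam;
import javax.jws.WebService;

@WebService
public class SimpleWS {
    @WebMethod
    public String storeData(@WebParam(name="data") String data) {
        try {
            // Append the received data as a new line to test.txt
            BufferedWriter out = new BufferedWriter(new FileWriter("test.txt", true));
            out.write(data + "\r\n");
            out.close();
        }
        catch (IOException e) {
            return "ERROR: " + e;
        }
        return "OK";
    }
}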

This is it. If you want to make it accessible via web, all you need to do is to call

Endpoint.publish("http://localhost:8080/simplews", new SimpleWS());

Java SE 6 has a built-in lightweight HTTP server that will accept web service requests and call your methods. All you need to do is point your client code generator (e.g. in Visual Studio) to http://localhost:8080/simplews?wsdl

If you need a Java client, the best way to generate its code is to call the command line wsimport tool from Java's bin folder:

wsimport -p client -keep http://localhost:8080/simplews?wsdl

Of course, you can use a different, more appropriate host name and port to make WSDL usable from remote locations.

You can turn this web service into a Windows service by using, e.g., the Apache Commons Daemon package - I had a very good experience with it recently. I also find daemons on Linux much simpler, but that is outside of this post's scope.

One thing to bear in mind: once you have deployed this service into production, changes to the web service code and descriptors will change the WSDL, which will require changing all the clients. Think very carefully before the first deployment, and plan upgrades with all parties involved. For serious projects it is still preferable to start with the WSDL and use it as a common denominator.

For troubleshooting and compatibility reasons, it is often important to know which specs are supported by the framework built into Java SE 6. Here we go:

WSDL 1.1 (the root element will be <definitions>, and the port binding will include the full URL where you published it)

JAXWS 2.1 javax.xml.ws

JAXB 2.1 javax.xml.bind

SAAJ 1.3 javax.xml.soap

JSR 181 2.0 javax.jws

For more information, see J2SE & Web Services.

Often, you need to make your existing web application web-services enabled. In this case, it makes sense to re-use the existing Tomcat HTTP server facility, so you should not call Endpoint.publish(). Instead, download JAX-WS RI 2.1.7 and place the provided libraries into your WEB-INF/lib folder. I actually use the following subset:

activation.jar
FastInfoset.jar
jaxb-impl.jar
jaxb-xjc.jar
jaxws-rt.jar
jaxws-tools.jar
jsr173_api.jar
jsr250-api.jar
mimepull.jar
resolver.jar
saaj-api.jar
saaj-impl.jar
stax-ex.jar
streambuffer.jar
woodstox.jar

You will also need to provision servlet mappings in your web.xml:

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
version="2.5">
<listener>
<listener-class>
com.sun.xml.ws.transport.http.servlet.WSServletContextListener
</listener-class>
</listener>

<servlet>
<servlet-name>SimpleService</servlet-name>
<servlet-class>
com.sun.xml.ws.transport.http.servlet.WSServlet
</servlet-class>
<load-on-startup>1</load-on-startup>
</servlet>

<servlet-mapping>
<servlet-name>SimpleService</servlet-name>
<url-pattern>/simplews</url-pattern>
</servlet-mapping>
</web-app>

and put sun-jaxws.xml into the WEB-INF folder with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<endpoints xmlns="http://java.sun.com/xml/ns/jax-ws/ri/runtime" version="2.0">
<endpoint name="SimpleService" implementation="com.mypackage.SimpleWS" url-pattern="/simplews" />
</endpoints>

As a result, your generated WSDL should be accessible via http://localhost:8080/[war file name]/simplews?wsdl

Thursday, March 22, 2007

Java OpenSource Components

Components I am currently using:

Lucene - indexer & search engine. There are a couple of commercial indexers that work a bit faster, but Lucene does the trick very well. I would give it 8 out of 10.

HSQLDB - (aka Hypersonic) Java RDBMS, really easy to embed. If you want to provide zero-admin configuration, check it out. There are some newer projects, even ones provided by the Apache Foundation, but I am too lazy to try them. Don't expect it to work very well with a huge number of tables (over 500) or in a volatile environment (where indexes and columns are created on the fly while a number of transactions are running). For usual CRUD, even under heavy load, it works nicely.

Rhino - very useful, if you need to provide high level scripting for your application (e.g. you want field engineers to customize your solution). It will give you JavaScript access to all your server calls - just don't think of this JavaScript in terms of browsers and their DOM model.

Suigeneris DIFF - text comparison tool (diff, in short). Just works.

iBATIS - if Hibernate is overkill, not to mention EJBs, iBATIS is your way to implement an efficient DAO: transactions, lazy loading, and separation of SQL code from your Java sources.

ActiveMQ - durable messaging. Durable in the sense that if your server goes down and comes back up, the message queues will keep their data and order.

Tomcat - no comments. I guess it is harder to get around it than to try it out.

In evaluation

Apache Solr - indexer/searcher for structured data (i.e. XML).
Lius - indexer/searcher for most standard data formats: Office, PDF, XML etc. Just released as 1.0
Both of them are Lucene-based.
JackRabbit - JCR API ref implementation (content repository).

So far not in use:
Spring - good concept, very useful, but overkill for many smaller projects (by small I mean number and level of interdependency of separate components)
AXIS2 - lots of good ideas, but difficult to embed and a bit of a monster (its own substructure of class loaders, complicated packaging). AXIS 1.4 works quite well for me for simple embedded web services. JAX-WS 2.0 didn't work out at all, again because it is hard to embed into your own web applications. However, if you have a full J2EE server, it is the way to go, of course.

Monday, March 05, 2007

OK, Tomcat 6 is out and stable, and it contains a new type of servlet called Comet (more info on Tomcat’s version is here). The new Jetty, another open source servlet container that is usually considered to have the fastest engine when benchmarked (although the speed comes at the price of portability), has a similar mechanism. To understand what is going on, I have been digging through the net for some time now.

Java has had java.nio, the New I/O with non-blocking streams, since 1.4. It is possible to create non-blocking sockets with this API too. The tutorial I used to understand the architecture is here: http://www.onjava.com/pub/a/onjava/2002/09/04/nio.html. It is the best tutorial I have found, but it was still a bit heavy on my brain :)
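
To make the non-blocking socket idea a bit more concrete, here is a small sketch, not taken from the tutorial above. A single thread uses a Selector to accept a connection and echo one message back, without ever blocking on an individual socket. The class NioEchoSketch and its method names are invented for this example, and it assumes Java 7 or later and a working loopback interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical demo class; the names are not from any tutorial or servlet container.
final class NioEchoSketch {

    /** Runs the selector loop on the calling thread until one message has been echoed. */
    static void echoOnce(Selector selector) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(256);
        boolean echoed = false;
        while (!echoed) {
            selector.select(); // waits until some registered channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel peer = ((ServerSocketChannel) key.channel()).accept();
                    if (peer != null) {
                        peer.configureBlocking(false);
                        peer.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {
                    SocketChannel peer = (SocketChannel) key.channel();
                    buffer.clear();
                    if (peer.read(buffer) > 0) {
                        buffer.flip();
                        while (buffer.hasRemaining()) {
                            peer.write(buffer);
                        }
                        echoed = true;
                    }
                    peer.close(); // closing the channel also cancels its key
                }
            }
        }
    }

    /** Sends a short message through a local non-blocking echo server and returns the reply. */
    static String roundTrip(String message) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: any free port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                echoOnce(selector);
                ByteBuffer reply = ByteBuffer.allocate(256);
                while (reply.hasRemaining() && client.read(reply) != -1) {
                    // keep reading until the server side closes the connection
                }
                reply.flip();
                return StandardCharsets.UTF_8.decode(reply).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ping"));
    }
}
```

The point of the design is that the server side never dedicates a thread to a connection: channels are only touched when the selector reports them as ready.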

To speed up the servlet engine by using NIO sockets, Jetty developers came up with continuations: http://docs.codehaus.org/display/JETTY/Continuations - a way to suspend the current HTTP request and free the server thread for other clients. In ‘normal’ circumstances, i.e. in ‘old’ web development where the user clicks on a URL and gets the page rendered by the servlet, there is no need to suspend the request: why make the user wait? Not so with AJAX or HTTP 1.1 KeepAlive. Here, client-side browser scripts want to have the connection open at all times to pull the newest data as soon as it is available: a chat room engine waits on a socket stream to receive new messages; an input completion engine (e.g. http://labs.google.com/suggest) considers it too costly to open a connection every time the user types a letter to get suggestions, etc. In general, there is a shift in web server development strategy from the request/response paradigm to an endless asynchronous messaging queue.

Now, this whole NIO doesn’t really fit into Servlet API standard: http://blogs.webtide.com/gregw/2006/07/25/1153845234453.html

After reading all of this, I feel like I am back at square one and have no clear idea of how to write a ‘clean’ web application. All the new stuff creates a huge mess of interwoven technologies, which are quite difficult to sort out for anyone looking at the ‘new style’ servlet code. And JSP 2.1 is not helping by introducing the new Unified Expression Language. Are we heading towards faster web app servers and slower, disoriented programmers with constant headaches? For now, I have decided to stick to the less “cutting edge”, but easier to read and maintain, old Servlet API standard.

Friday, November 24, 2006

On relations (part 2) - IDs

The first part of this article identified the need for a system-wide object identifier in a semantic form like (<classid>.<objectid>).

There are many different ways to implement it in practice. Obviously, the overall length of the ID is important for performance and for saving memory.
However, the actual database data type the ID is mapped to is of less importance. Performance tests done by the author have shown that VARCHAR, CHAR, BINARY or NUMERIC types of comparable length perform similarly well in a typical server environment. The tests included bulk object creation, fetching by ID and joining multiple tables by ID with different execution plans and for different scenarios. This often comes as a surprise to people who believe that auto-incremented IDs create measurable performance gains. A detailed review of this claim is not in the scope of this article and will probably be a separate post.

Now, the length of the ID is trickier. The "object" part of the ID has the following logical constraint:
- the ID never repeats within the same class in the whole, often distributed, object domain

You can achieve that either by having a single synchronized, centralized ID authority with active ID sequences, or by generating world-unique IDs similar to Microsoft's GUID. It is up to you which way to take; however, you need to think about this: the object domain is not necessarily a single database. And in a distributed environment, a part of it that absolutely has to be functioning at a certain time might not have connectivity to the ID authority. On the other hand, world-unique IDs usually take much more space (at the beginning, at least) and are never guaranteed to be unique, even though the probability of a collision is ridiculously small (Whales and Petunias will probably fall from the sky much sooner).

To summarize, GUIDs are easy to generate and fit well into a distributed architecture. What is not good about GUIDs is that their content is fairly useless. Ordering by GUID doesn't give any clue about the historical order in which objects were created. Nor does a GUID contain any semantic information, for example the class. In a complex environment, retrieving the class of an object is one of the most frequently performed operations. Many systems use a discriminator field to classify data. Doing an SQL SELECT every time one has to know the class is not efficient, considering that the class of an object never changes during its lifetime. What makes sense is to embed the class information into the ID. All of the above calls for an ID string in the following form:

[classID].[creation_time].[random_component]

classID: naturally, there are fewer classes than objects in a live system. Also, their creation is relatively rare. To minimize the ID size, it might be acceptable to have the classID as a short integer or string registered with some central authority. It can even be a static, pre-defined value.

The creation time is useful not only to order entities, but also to minimize the size of the random component. There are only so many objects that can be created every 50 ms (and on modern systems the timer resolution is actually 100 nanoseconds). You would need to do some custom math to figure out a comfortable length for the random component, but something between 32 and 64 bits will satisfy even very large and very distributed installations. The key here is to determine the probability of a collision within the timer resolution for a given maximum number of ID-generating nodes, with possible clock shift accounted for. To read more on GUID creation, see the IETF draft.
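
As a rough illustration of the "custom math" for the random component, the sketch below uses the standard birthday-bound approximation: with k IDs generated within the same timer tick and b random bits per ID, the chance of at least one collision is about 1 - exp(-k(k-1) / 2^(b+1)). The class name CollisionEstimate and the figure of 1000 IDs per tick are assumptions chosen for the example, not measurements.

```java
// Hypothetical helper for estimating how many random bits an ID needs.
final class CollisionEstimate {

    /**
     * Birthday-bound approximation of the chance that at least two of idsPerTick
     * random values of randomBits bits each are equal.
     */
    static double collisionProbability(long idsPerTick, int randomBits) {
        double pairs = (double) idsPerTick * (idsPerTick - 1) / 2.0;
        double space = Math.pow(2.0, randomBits);
        return -Math.expm1(-pairs / space); // same as 1 - exp(-x), but accurate for tiny x
    }

    public static void main(String[] args) {
        long idsPerTick = 1000; // assumed worst case across all nodes, for illustration only
        for (int bits : new int[] {32, 48, 64}) {
            System.out.printf("%d bits: %.3e%n", bits, collisionProbability(idsPerTick, bits));
        }
    }
}
```

This only covers IDs that share both the class and the timestamp; clock shift between nodes effectively widens the tick and should be folded into idsPerTick.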

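The author successfully used a 4-char class code for the first part, the number of 100 ns intervals since Jan 1 2000 for the second part, and a 32-bit XOR of GUID parts for the random suffix. Note that the width of the time part depends on the resolution: 44 bits are enough for over 100 years of milliseconds, but at 100 ns resolution they cover only about 20 days, and roughly 55 bits are needed for 100 years. VARCHAR is better for debugging while BINARY IDs will be smaller - the choice is yours.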

Of course, you can devise your own ID format, taking the facts above into consideration.
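
A minimal sketch of a generator for the [classID].[creation_time].[random_component] format could look like the following. It is one possible reading of the scheme, not the author's actual code. The class ObjectIds, the hex encoding, the 14-digit (56-bit) width of the time part and the way the UUID is folded into 32 bits are all assumptions made for the example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

// Hypothetical ID helper; the format details are assumptions, see the lead-in.
final class ObjectIds {

    // Epoch from the text: Jan 1 2000 (UTC is assumed here).
    static final Instant EPOCH = Instant.parse("2000-01-01T00:00:00Z");

    /** Number of 100 ns intervals between the epoch and the given instant. */
    static long ticksSinceEpoch(Instant now) {
        Duration d = Duration.between(EPOCH, now);
        return d.getSeconds() * 10_000_000L + d.getNano() / 100;
    }

    /** Folds a random UUID into 32 bits by XOR-ing its four 32-bit parts. */
    static int random32() {
        UUID u = UUID.randomUUID();
        long x = u.getMostSignificantBits() ^ u.getLeastSignificantBits();
        return (int) (x ^ (x >>> 32));
    }

    /** Builds an ID such as USER.00000000989680.1a2b3c4d for a 4-char class code. */
    static String newId(String classCode, Instant now) {
        if (classCode.length() != 4 || classCode.indexOf('.') >= 0) {
            throw new IllegalArgumentException("class code must be 4 chars without dots");
        }
        // Fixed-width hex keeps IDs of one class sortable by creation time.
        return String.format("%s.%014x.%08x", classCode, ticksSinceEpoch(now), random32());
    }

    /** Reads the class straight from the ID, with no database lookup. */
    static String classOf(String id) {
        return id.substring(0, id.indexOf('.'));
    }

    public static void main(String[] args) {
        String id = newId("USER", Instant.now());
        System.out.println(id + " belongs to class " + classOf(id));
    }
}
```

Keep in mind that Instant.now() may tick more coarsely than 100 ns on a given JVM and OS, which matters when sizing the random part.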

Monday, October 30, 2006

On relations (part 1)

Introduction
Codd's relational algebra was the beginning of a significant breakthrough in computer science. The genius part was to restrict operands to finite relations only and to define the term relational completeness. A loose example would be the definition that 'NOT something' does not mean all the infinite possibilities in the universe, but the finite set of tuples from the current relation that have no 'something'; e.g. only those rows of the table that are 'not something' count.

Back then, in 1970, it was clearly thinking ahead of its time, with the golden age of relational database systems coming some 20 years later. However, the beginning of the 21st century was marked by extensive development and use of object-relational mappings and numerous attempts to provide relational algebra for object-oriented domains. Essentially, people are looking to keep the integrity and flexibility of relational query languages, while allowing advanced OO features like polymorphism, encapsulation and inheritance. And there is, of course, the object-relational impedance mismatch.

Hence, a number of OQLs emerged.

An OQL is some kind of Object Query Language, with queries executed against class domains, not relations (tables). It allows you to query and get access to all objects of a certain class with the provided (declared) properties. OQLs are declarative languages for an imperative world. However breathtaking the idea is, implementation and usage hit many obstacles, some of which have not been solved to this day. The following provides an overview of problems and solutions for relational algebra usage in object-oriented domains.

Object Identification
Relational algebra is based on the notion of a candidate key, which is almost the same as a primary key in the database. Essentially, any candidate key has to be unique in its domain, and no part of this key can be unique by itself. The second condition is only relevant for compound keys: a compound key is not a candidate key if one of its parts is already unique. So, there is no need to include more values in the key than necessary.
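
The two conditions can be spelled out as a small check over a set of rows. This is only an illustration: the class CandidateKeys, the sample data and the column names are made up, and the check looks at one snapshot of the data, whereas a real candidate key is a constraint that must hold for every valid state of the relation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper that tests the candidate key conditions on sample rows.
final class CandidateKeys {

    /** True if no two rows share the same values in the given columns. */
    static boolean isUnique(List<Map<String, String>> rows, Set<String> columns) {
        Set<List<String>> seen = new HashSet<>();
        for (Map<String, String> row : rows) {
            List<String> projection = new ArrayList<>();
            for (String column : columns) {
                projection.add(row.get(column));
            }
            if (!seen.add(projection)) {
                return false;
            }
        }
        return true;
    }

    /** Unique, and no longer unique as soon as any single column is dropped. */
    static boolean isCandidateKey(List<Map<String, String>> rows, Set<String> columns) {
        if (!isUnique(rows, columns)) {
            return false;
        }
        for (String column : columns) {
            Set<String> smaller = new TreeSet<>(columns);
            smaller.remove(column);
            // If a smaller key is unique, so is every key that contains it, which is
            // why dropping one column at a time is enough to detect redundancy.
            if (!smaller.isEmpty() && isUnique(rows, smaller)) {
                return false;
            }
        }
        return true;
    }

    static Set<String> cols(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    static Map<String, String> person(String passport, String first, String last) {
        Map<String, String> row = new HashMap<>();
        row.put("passport", passport);
        row.put("first", first);
        row.put("last", last);
        return row;
    }

    /** Made-up sample data. */
    static List<Map<String, String>> sampleRows() {
        return Arrays.asList(
                person("P1", "Ann", "Lee"),
                person("P2", "Ann", "Kim"),
                person("P3", "Bob", "Lee"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = sampleRows();
        System.out.println(isCandidateKey(rows, cols("passport")));
        System.out.println(isCandidateKey(rows, cols("first", "last")));
        System.out.println(isCandidateKey(rows, cols("passport", "first")));
    }
}
```

In the sample, (passport, first) is unique but fails the second condition, because passport is already unique on its own.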

Traditionally, in OO languages an object is identified by its address in memory, which essentially prohibits its use in a distributed, storage-agnostic environment. This is a huge limitation that renders relational algebra irrelevant to OO. The only solution is to introduce proper environment-agnostic candidate keys for objects in a class. This is usually done by adding an ID property to the most basic interface (or a pair of setId()/getId() methods). But what is a good ID/candidate key/PK?

Many people start by reusing data fields, such as the first and last name of a person, to compose a primary key. This is wrong from two points of view: it is not reliable (people get married and change names; even an SSN is changed sometimes), and it creates immense complexity in handling it. Essentially, a system has to provide as many linking mechanisms between objects as there are candidate key combinations. So, a good solution is to have data-agnostic, uniform object identification. It also fits nicely with the OO concept of an address in memory, which never depends on the object's members. Many systems fall back to database-provided sequential IDs. The first decision is whether to create a sequence per class or to have one global sequence for all classes in the domain. The second option is slower, and there is a danger of running out of IDs faster, but it provides additional data safety and integrity, because there is a much smaller chance of a mistake while resolving a reference. But there is more to it. Having a global object enumeration system allows the introduction of a single base class, similar to Object in Java, object in C# or TObject in Object Pascal. Without it, you would not have a chance to introduce a foreign key to (or any relationship with) an unknown object in the system (like void* in C++, or an Object reference in Java). So, most enterprise data repositories have some concept of a unified ID sequence. But only a few of them go further.

What is the single most important method in the base class of any OO language? Not toString(), obviously, and not equals(), which is a problem in itself when not overridden. It is getClass() or getType(): the central method that lets you learn things about the object at hand. If you analyze how often you use instanceof and type conversions in your code, it will become clear that the ability to resolve the class of an object quickly will certainly have an impact on overall system performance.

One solution is to have an R(object,class) relationship table. This table has a tendency to grow huge, which is usually not a problem with modern RDBMSs. What becomes a problem is the frequency of access and the associated locks. Since your system needs to consult this table rather often, it quickly becomes a bottleneck. A smart way of solving this is to include the class information in the ID. "But it will be a compound key, and that is bad!", says the attentive reader. Not necessarily compound. What we need is a domain-unique key that contains a class reference. We would also like it to be short to save memory space. The following solutions have been tested by the author:

a string (varchar-based) ID in the form (<classid>.<objectid>). Note that you can fall back to an ID sequence per class, since the ID as a whole will still be unique across the domain.

an M-bit integer with the first N fixed bits dedicated to the class and the rest to the object ID; obviously M>N and N<(M-N).
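
The integer variant can be sketched with plain bit operations. The example below assumes M = 64 and N = 16, which leaves 48 bits for the object part. These widths and the class name PackedId are picked for illustration only and are not necessarily the sizes the author tested.

```java
// Hypothetical packing of class and object IDs into one 64-bit integer.
final class PackedId {

    static final int CLASS_BITS = 16;                // N, assumed for the example
    static final int OBJECT_BITS = 64 - CLASS_BITS;  // M - N = 48
    static final long OBJECT_MASK = (1L << OBJECT_BITS) - 1;

    /** Puts the class ID into the top N bits and the object ID into the rest. */
    static long pack(int classId, long objectId) {
        if (classId < 0 || classId >= (1 << CLASS_BITS)) {
            throw new IllegalArgumentException("class ID does not fit into " + CLASS_BITS + " bits");
        }
        if (objectId < 0 || objectId > OBJECT_MASK) {
            throw new IllegalArgumentException("object ID does not fit into " + OBJECT_BITS + " bits");
        }
        return ((long) classId << OBJECT_BITS) | objectId;
    }

    /** Resolves the class with a shift instead of a table lookup. */
    static int classOf(long id) {
        return (int) (id >>> OBJECT_BITS);
    }

    static long objectOf(long id) {
        return id & OBJECT_MASK;
    }

    public static void main(String[] args) {
        long id = pack(7, 42);
        System.out.println(Long.toHexString(id) + " -> class " + classOf(id) + ", object " + objectOf(id));
    }
}
```

Since Java's long is signed, class IDs with the top bit set produce negative numbers; the unsigned shift in classOf still decodes them correctly, but sorting such IDs numerically needs care.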

The next part will provide a comparison and the characteristics of both ID methods.