
Odi's astoundingly incomplete notes


Fixing a JDOM design flaw

The problem was simple: Create an XML document and validate it against a given schema before sending it to the customer.

The solution seemed simple:
  1. Use JDOM to create a document model in memory.
  2. Use DOMOutputter to convert it to a W3C DOM
  3. Validate the DOM against an XML schema using Xerces through JAXP
That's when I encountered a strange bug. Somehow the validator did not like my input. The problem turned out to be missing namespace support in the DOM. My DOM nodes really were in no namespace, so it didn't surprise me that the DOM did not support namespaces. But schema validation needs namespace support unconditionally, whether the nodes are in a namespace or not.
It boiled down to the problem that JDOM's DOMOutputter tried to be clever and created nodes without namespace support if the node was in no namespace. I addressed this by patching this class a little, so it gives you more control over this behaviour. Now it works. I hope that the JDOM people will include the patch in their next release.
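The namespace requirement can be seen with plain JAXP, no JDOM involved: the DOM you hand to a schema validator should come from a namespace-aware parser, or validation can fail even when no node is in a namespace. A minimal sketch (schema and document are inlined here just for brevity):

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ValidateDom {
    public static void main(String[] args) throws Exception {
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='greeting' type='xs:string'/>"
          + "</xs:schema>";
        String xml = "<greeting>hello</greeting>";

        // The DOM handed to the validator must be namespace-aware,
        // even though <greeting> is in no namespace at all.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        SchemaFactory sf =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator v = sf.newSchema(new StreamSource(new StringReader(xsd)))
                        .newValidator();
        v.validate(new DOMSource(doc)); // throws SAXException if invalid
        System.out.println("valid");
    }
}
```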

Update: the patch has been included into the jdom-1.1 release

posted on 2006-05-23 11:46 UTC in Code | 0 comments | permalink

XML schema incompatibilities with Java

Some XML Schema (XSD) numeric data types are slightly incompatible with the Java language. While byte, short, int and long have the same sizes in both type systems (8, 16, 32 and 64 bits), XML Schema also offers unsigned counterparts. The Java language still does not feature unsigned numeric types.
This means that when an XSD defines a field as unsignedInt you must use long in Java, or you have to make sure the value never exceeds Integer.MAX_VALUE. XSD even has arbitrary-precision numeric types like decimal and integer. You must choose the Java data type carefully depending on the application.
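A quick demonstration of why the signed Java types fall short (class name is mine):

```java
import java.math.BigInteger;

public class UnsignedRanges {
    public static void main(String[] args) {
        // xsd:unsignedInt covers 0 .. 2^32-1; Java's int stops at 2^31-1,
        // so the upper half of the range only fits into a long.
        long maxUnsignedInt = (1L << 32) - 1;        // 4294967295
        System.out.println(maxUnsignedInt > Integer.MAX_VALUE);  // true

        // xsd:unsignedLong covers 0 .. 2^64-1; even long cannot hold that,
        // so BigInteger is the safe mapping (as it is for xsd:integer).
        BigInteger maxUnsignedLong =
            BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);
        System.out.println(maxUnsignedLong.bitLength());  // 64
    }
}
```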


posted on 2006-05-09 14:45 UTC in Code | 0 comments | permalink

Oracle string sizing

When you create a column in an Oracle table with VARCHAR2(32), this means the string can take up 32 bytes. With UTF-8 becoming more and more popular, this is not the same as 32 characters. (In UTF-8 a character can use one to four bytes.) To circumvent this problem, make sure to always create columns explicitly with the length given in characters: VARCHAR2(32 CHAR).
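The byte/character mismatch is easy to see from Java (class name is mine):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {
    public static void main(String[] args) {
        String s = "gr\u00fcezi";  // "grüezi": 6 characters
        System.out.println(s.length());  // 6 characters
        // 'ü' takes two bytes in UTF-8, so the byte length is larger:
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // 7
    }
}
```

A 32-character string like this would already overflow a byte-sized VARCHAR2(32) column.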

posted on 2006-05-03 12:01 UTC in Code | 0 comments | permalink

PID file in bash

To write the PID of a background process to file:
/usr/bin/program &
echo $! > program.pid
To later check whether the process is still running (the -ef test compares /proc/${PID}/exe against the binary, which guards against the PID having been recycled by an unrelated process):
PID=$(cat program.pid)
if [ -e "/proc/${PID}" ] && [ "/proc/${PID}/exe" -ef /usr/bin/program ]; then
    echo "Still running"
fi


posted on 2006-04-21 21:30 UTC in Code | 2 comments | permalink
Spectacular! I used this to ensure my rsync script only runs one rsync at a time.

rsync -avpogtH --links --copy-unsafe-links user@remotehost:/remote/directory/ /Local/HD/ >> $logfile &
echo $! > program.pid
waitpid=`cat program.pid`
wait $waitpid

...and repeat
thanks!

Pol

XML schema validation

How to quickly validate some XML files against a schema with Cygwin:
xmllint --schema infomodel/MySchema.xsd --noout *.xml


posted on 2006-04-19 13:58 UTC in Code | 0 comments | permalink

UNIX standards

IBM has an interesting article (mirror) that talks about the standardisation of UNIX and why this is such a great thing compared to other operating systems.

Twenty-year-old UNIX utilities still compile and run. A new desktop computing API will come and everyone will have to rewrite for it, but mountains will erode away before read() and write() stop working. This is the reason that all the hassle of formal UNIX standards has had so little effect on practical UNIX software development; the core API is simple, clean, and well-designed, and there is no need to change it significantly.

posted on 2006-03-12 10:34 UTC in Code | 0 comments | permalink

Statistics gatherers

Marketing consumes a good deal of speed on the web. Marketing people make webmasters include JavaScript code or images in their pages that send data to statistics servers, the most prominent of which are Google Analytics and Falk. As this JavaScript code is often placed at the beginning of the page, it delays rendering of the page in your browser until the data is transmitted. And as more and more websites implement this, the statistics servers become loaded and slower. That means page load times increase.

From a system design point of view this is extremely bad architecture anyway. It creates a dependency on third-party systems you don't control. Yes, really a dependency: your website's load time is directly dependent on the response time of the statistics server. That response time in turn depends on network performance. As the connection to the statistics server is initiated by the client browser, the webmaster has no control over this network performance: the route and bandwidth depend completely on the client. So clients in Europe will probably see different behaviour than clients in America. This design also has a reliability problem: it only works if the client browser actually sends the request to the statistics server. If the client decides not to do that, the statistics will be wrong.

You can easily see by now that this design is totally insane.

The correct way of doing this is on the server side. The web server should send the statistics data to the statistics server in the background, in another thread. It could even do that with batch processing: collect data offline and send it to the server once or several times a day. This would greatly reduce traffic and load. The web server also has the chance to measure response times and unavailability of the statistics server and can react appropriately. Of course this requires the providers of statistics services to offer APIs suited for the purpose. But widespread incompetence and ignorance of today's so-called "software engineers" give birth to crap like this.
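The server-side variant is a few lines of code. A minimal sketch: the request thread only queues an event, and a background thread ships it to the statistics service, so the visitor's page load never waits on the statistics server. The StatsClient interface is hypothetical; substitute whatever API your provider actually offers.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncStats {
    /** Hypothetical provider API: ships one event to the statistics server. */
    interface StatsClient { void send(String event); }

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    AsyncStats(StatsClient client) {
        Thread worker = new Thread(() -> {
            try {
                // Drain the queue forever; network latency is paid here,
                // never in the request thread.
                while (true) client.send(queue.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);  // don't block server shutdown
        worker.start();
    }

    /** Called from the request thread; returns immediately. */
    void record(String event) {
        queue.offer(event);
    }
}
```

A bounded queue plus a drop policy would additionally shed load when the statistics server is down, which is exactly the "react appropriately" option the client-side design can never have.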

That's why I have decided no longer to accept this. Whenever I notice that the site I am viewing is slow I go and add the offending statistics server to my black list. The black list is my hosts file. I added these entries:
127.0.0.2       a.as-us.falkag.net s.as-us.falkag.net red.as-us.falkag.net
127.0.0.2       a.as-eu.falkag.net s.as-eu.falkag.net red.as-eu.falkag.net
127.0.0.2       www.google-analytics.com
127.0.0.2       m3.doubleclick.net 2o7.net
127.0.0.2       an.tacoda.net anrtx.tacoda.net
127.0.0.2       adfarm.mediaplex.com img.mediaplex.com
127.0.0.2       g14.nyc.intellitxt.com
127.0.0.2       js.adsonar.com
Maybe I will put up a separate web page that lists the most common and most annoying statistics hosts on the web.

posted on 2006-03-10 10:42 UTC in Code | 0 comments | permalink

Updates done right in EJB3

The EJB3 specs are very well designed. The decision to use it finally paid off!

I was facing the following problem: a structure of objects that is already present in the database needs to be updated with new data from memory. The structure is complex (a general graph, more complex than a tree) and has lots of relationships back and forth in the object model. The problem cannot be solved by first deleting the existing data and then inserting the new data, because the existing data is referenced from other objects. (Some delete operations would lead to foreign key constraint violations.)

So it is necessary to use the EntityManager.merge operation. We have not defined any persistence cascading, so objects need to be merged one by one. For this to succeed it must be done like so in exactly this order:
  1. Find the existing persistent objects that correspond to the new objects.
  2. Copy the primary key (ID fields) and version numbers of all persistent objects to the new objects. This effectively makes them detached entities.
  3. Merge all objects. The order is unimportant.
The EJB3 specs support this by two requirements:
  1. Guaranteed object identity: Managed objects with the same primary key are identical instances.
  2. EntityManager.merge produces referenced entities that are managed, as outlined in section 3.2.4.1 of the persistence specs.
This means that after merging you have a consistent persistent object structure. There is no need to take special care of all the relationships.

Let me illustrate this with an example:

We start with X and Y which are two new, transient objects. X holds a reference to Y. After setting the ID and version of X and Y, merge is called on X. This creates a managed object X' which references a managed object Y'. Y' is automatically fetched from the database using the ID of Y. Now calling merge on Y will copy the data to the instance Y'. There is no need to manually relate X' and Y' again!
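The three steps can be sketched in plain JPA. This is a structural sketch only, not runnable on its own; the entity class Node and its businessKey, id and version fields are hypothetical stand-ins for your own model.

```java
import java.util.List;
import javax.persistence.EntityManager;

public class MergeUpdate {
    /** Update the persistent graph from fresh, transient objects. */
    static void update(EntityManager em, List<Node> fresh) {
        // Step 1: find the existing persistent object for each new object.
        for (Node n : fresh) {
            Node existing = em.createQuery(
                    "select x from Node x where x.businessKey = :k", Node.class)
                .setParameter("k", n.getBusinessKey())
                .getSingleResult();
            // Step 2: copy primary key and version; n is now a detached entity.
            n.setId(existing.getId());
            n.setVersion(existing.getVersion());
        }
        // Step 3: merge all objects; order does not matter. Managed instances
        // with the same primary key are guaranteed to be identical, so the
        // merged graph is consistent without re-wiring any relationships.
        for (Node n : fresh) {
            em.merge(n);
        }
    }
}
```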


posted on 2006-03-08 15:22 UTC in Code | 0 comments | permalink

The unnecessary function

If ever you write this:
   if(!source.endsWith(File.separator)) source+=File.separator;
you are probably looking for the constructor new File(File parent, String child). If you use the API correctly, there is never a need to manually fiddle file paths together.
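For example (class name is mine):

```java
import java.io.File;

public class JoinPaths {
    public static void main(String[] args) {
        // The two-argument constructor inserts the separator itself:
        File dir = new File("logs");
        File f = new File(dir, "app.log");
        // Prints "logs/app.log" on UNIX, "logs\app.log" on Windows.
        System.out.println(f.getPath());
    }
}
```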

posted on 2006-03-06 15:28 UTC in Code | 0 comments | permalink

A 3D shooter in Java

Some people claimed earlier that Java was not suitable for fast 3D games. They have now all been proven wrong. Check out Jake2, a Java port of the Quake2 game engine. It runs directly off the web with Java Web Start, and it's cross-platform.

posted on 2006-01-04 18:53 UTC in Code | 0 comments | permalink