Odi's astoundingly incomplete notes
Fixing a JDOM design flaw
The problem was simple: Create an XML document and validate it against a given schema before sending it to the customer.
The solution seemed simple:
- Use JDOM to create a document model in memory.
- Use DOMOutputter to convert it to a W3C DOM.
- Validate the DOM against an XML schema using Xerces through JAXP.
The problem boiled down to JDOM's DOMOutputter trying to be clever: it created nodes without namespace support whenever a node was in no namespace. I addressed this by patching the class a little, so it gives you more control over this behaviour. Now it works. I hope that the JDOM people will include the patch in their next release.
Update: the patch has been included in the jdom-1.1 release.
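For reference, a minimal sketch of that pipeline against JDOM 1.x and JAXP; the element name, the namespace URI and the schema path are made up for illustration:
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.dom.DOMSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.output.DOMOutputter;

public class SchemaCheck {
    public static void main(String[] args) throws Exception {
        // 1. Build the document in memory with JDOM
        Element root = new Element("order", "http://example.com/ns");
        Document jdomDoc = new Document(root);

        // 2. Convert it to a W3C DOM
        org.w3c.dom.Document domDoc = new DOMOutputter().output(jdomDoc);

        // 3. Validate the DOM against an XML schema using Xerces through JAXP
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new File("MySchema.xsd"));
        schema.newValidator().validate(new DOMSource(domDoc));
    }
}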
XML schema incompatibilities with Java
Some XML Schema (XSD) numeric data types are slightly incompatible with the Java language. While byte, short, int and long have the same sizes in both type systems (8, 16, 32 and 64 bits), XML Schema also offers unsigned counterparts. The Java language still does not feature unsigned numeric types.
This means that when an XSD defines a field as unsignedInt you must use long in Java, or you have to make sure the value never exceeds Integer.MAX_VALUE. XSD even has arbitrary size/precision numeric types like decimal or integer. You must choose the Java data type carefully depending on the application.
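A minimal sketch of the kind of range handling this implies; the literal values and variable names below are just for illustration:
// xs:unsignedInt covers 0 .. 4294967295, which does not fit into a Java int,
// so the value is carried in a long and only narrowed after a range check.
long value = Long.parseLong("4294967295");      // maximum of xs:unsignedInt
if (value < 0L || value > 4294967295L) {
    throw new IllegalArgumentException("out of range for xs:unsignedInt");
}
if (value <= Integer.MAX_VALUE) {
    int narrowed = (int) value;                 // only safe below Integer.MAX_VALUE
}
// Arbitrary-precision XSD types (integer, decimal) map to BigInteger / BigDecimal:
java.math.BigInteger unbounded = new java.math.BigInteger("123456789012345678901234567890");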
Oracle string sizing
When you create a column in an Oracle table with
VARCHAR2(32)
this means the string can take up 32 bytes. With UTF-8 becoming more and more popular this is not the same as 32 characters. (In UTF-8 a character can use one to four bytes.) To circumvent this problem, make sure to always create columns explicitly with the length in characters: VARCHAR2(32 CHAR)
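A minimal sketch in Java of why the distinction matters; the column sizes refer to the example above and the sample string is made up:
// 32 characters, but 64 bytes in UTF-8: fits VARCHAR2(32 CHAR), not VARCHAR2(32).
String value = "äëïöüäëïöüäëïöüäëïöüäëïöüäëïöüäë";   // 32 two-byte characters
int characters = value.length();                                             // 32
int bytes = value.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;  // 64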
PID file in bash
To write the PID of a background process to file:
/usr/bin/program & echo $! > program.pid
To later check if the process is still running:
PID=$(cat program.pid)
if [ -e /proc/${PID} -a /proc/${PID}/exe -ef /usr/bin/program ]; then
    echo "Still running"
fi
Spectacular! I used this to ensure my rsync script only runs one rsync at a time.
rsync -avpogtH --links --copy-unsafe-links user@remotehost:/remote/directory/ /Local/HD/ >> $logfile & echo $! > program.pid
waitpid=`cat program.pid`
wait $waitpid
...and repeat
thanks!
Pol
XML schema validation
How to quickly validate some XML files against a schema with Cygwin:
xmllint --schema infomodel/MySchema.xsd --noout *.xml
UNIX standards
IBM has an interesting article (mirror) that talks about the standardisation of UNIX and why this is such a great thing compared to other operating systems.
Twenty-year-old UNIX utilities still compile and run. A new desktop computing API will come and everyone will have to rewrite for it, but mountains will erode away before read() and write() stop working. This is the reason that all the hassle of formal UNIX standards has had so little effect on practical UNIX software development; the core API is simple, clean, and well-designed, and there is no need to change it significantly.
Statistics gatherers
Marketing claims a good deal of speed on the web. Marketing people make webmasters include JavaScript code or images in their pages that send data to statistics servers, the most prominent of which are Google Analytics and Falk. As this JavaScript code is often placed at the beginning of the page, it delays rendering of the page in your browser until the data is transmitted. As more and more websites implement this, the statistics servers become more loaded and slower. That means page load times increase.
From a system design point of view this is extremely bad architecture anyway. It creates a dependency on a third-party system you don't control. Yes, really a dependency: your website's load time is directly dependent on the response time of the statistics server. This response time also depends on the network performance. As the connection to the statistics server is initiated by the client browser, the webmaster has no control over this network performance: the route and bandwidth are completely dependent on the client. So clients in Europe will probably see different behaviour than clients in America. This design also has a reliability problem. It only works if the client browser actually sends the request to the statistics server. So if the client decides not to do that, the statistics will be wrong.
You can easily see by now that this design is totally insane.
The correct way of doing this is on the server side. The web server should send the statistics data to the statistics server in the background, in another thread. It could even do that with batch processing: collect data offline and send it to the server once or several times a day. This would greatly reduce traffic and load. The web server also has the chance to measure response times and unavailability of the statistics server and can react appropriately. Of course this requires the providers of statistics services to offer APIs that are suited for this purpose. But the widespread incompetence and ignorance of today's so-called "software engineers" give birth to crap like this.
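A minimal sketch of what such a server-side reporter could look like; the StatsReporter class, its queue and the batching details are all made up for illustration:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class StatsReporter {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<String>();

    public StatsReporter() {
        Thread flusher = new Thread(new Runnable() {
            public void run() {
                List<String> batch = new ArrayList<String>();
                while (true) {
                    try {
                        batch.add(events.take());   // wait for at least one event
                        events.drainTo(batch);      // grab whatever else has queued up
                        sendBatch(batch);           // one call instead of one per page view
                        batch.clear();
                    } catch (InterruptedException e) {
                        return;                     // shut down quietly
                    }
                }
            }
        });
        flusher.setDaemon(true);
        flusher.start();
    }

    // Called by the web application; never blocks the request thread.
    public void pageView(String url) {
        events.offer(url);
    }

    // Send the batch to the statistics service; a failure here only costs
    // statistics, it never slows down a visitor's page load.
    private void sendBatch(List<String> batch) {
        // e.g. one HTTP POST with all entries, or append to a file for a daily upload
    }
}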
That's why I have decided no longer to accept this client-side tracking. Whenever I notice that the site I am viewing is slow, I go and add the offending statistics server to my blacklist. The blacklist is my hosts file. I added these entries:
127.0.0.2 a.as-us.falkag.net s.as-us.falkag.net red.as-us.falkag.net
127.0.0.2 a.as-eu.falkag.net s.as-eu.falkag.net red.as-eu.falkag.net
127.0.0.2 www.google-analytics.com
127.0.0.2 m3.doubleclick.net 2o7.net
127.0.0.2 an.tacoda.net anrtx.tacoda.net
127.0.0.2 adfarm.mediaplex.com img.mediaplex.com
127.0.0.2 g14.nyc.intellitxt.com
127.0.0.2 js.adsonar.com
Maybe I will put up a separate web page that lists the most common and most annoying statistics hosts on the web.
Updates done right in EJB3
The EJB3 specs are very well designed. The decision to use EJB3 finally paid off!
I was facing the following problem: a structure of objects that is already present in the database needs to be updated with new data from memory. The structure is complex (a general graph rather than a tree) and has lots of relationships back and forth in the object model. The problem cannot be solved by first deleting the existing data and then inserting the new data, because the existing data is referenced from other objects. (Some delete operations would lead to foreign key constraint violations.)
So it is necessary to use the EntityManager.merge operation. We have not defined any persistence cascading, so objects need to be merged one by one. For this to succeed it must be done in exactly this order:
- Find the existing persistent objects that correspond to the new objects.
- Copy the primary key (ID fields) and version numbers of all persistent objects to the new objects. This effectively makes them detached entities.
- Merge all objects. The order is unimportant.
This works because of two guarantees in the persistence spec:
- Guaranteed object identity: managed objects with the same primary key are identical instances.
- EntityManager.merge produces referenced entities that are managed, as outlined in section 3.2.4.1 of the persistence specs.
Let me illustrate this by the following diagram:

We start with X and Y, which are two new, transient objects. X holds a reference to Y. After setting the ID and version of X and Y, merge is called on X. This creates a managed object X' which references a managed object Y'. Y' is automatically fetched from the database using the ID of Y. Now calling merge on Y will copy the data to the instance Y'. There is no need to manually relate X' and Y' again!
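A minimal sketch of the sequence, assuming hypothetical entities X and Y with ID and version fields, known primary keys knownXId/knownYId, and an available EntityManager em; the getters and setters are made up:
// x and y are new, transient objects carrying the new data; x references y.
X x = new X();
Y y = new Y();
x.setY(y);

// 1. Find the existing persistent objects that correspond to x and y.
X existingX = em.find(X.class, knownXId);
Y existingY = em.find(Y.class, knownYId);

// 2. Copy primary keys and version numbers; x and y are now detached entities.
x.setId(existingX.getId());
x.setVersion(existingX.getVersion());
y.setId(existingY.getId());
y.setVersion(existingY.getVersion());

// 3. Merge all objects; the order is unimportant.
X managedX = em.merge(x);   // also produces a managed Y' referenced by X'
Y managedY = em.merge(y);   // copies y's data onto that very same Y' instance
// managedX.getY() == managedY holds because of guaranteed object identity.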
The unnecessary function
If ever you write this:
if (!source.endsWith(File.separator)) source += File.separator;
you are probably looking for File(File, String). If you use the API correctly there is never a need to manually fiddle together file paths.
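A minimal sketch of the constructor-based alternative; the directory and file names are made up:
import java.io.File;

File dir = new File("/var/data");               // parent directory
File target = new File(dir, "report.txt");      // File(File parent, String child)
// The platform-specific separator is inserted for you; no string concatenation needed.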
A 3D shooter in Java
Some people have claimed earlier that Java was not suitable for fast 3D games. They have now all been proven wrong. Check out Jake2. It's a Java port of the Quake2 game engine. Runs directly off the web with Java Web Start. Cross-platform.