Rants on the daily grind of building software. This has been moved to https://slott56.github.io. Fix your bookmarks.
Monday, November 30, 2009
Python Book -- Thanks for the Bug Reports
Thursday, November 26, 2009
Python Book -- Version 2.6
Tuesday, November 24, 2009
Standard "Distributed" Database Issues
Thursday, November 19, 2009
On Risk and Estimating and Agile Methods
Sunday, November 15, 2009
ORM magic
The ORM layer "hides" the database, right?
We never have to think about persistence, right? It just magically "happens."
Wrong.
Here are some quotes from a recent email:
"Somehow people are surprised that we would have performance issues. Somehow people are surprised that now that we are putting humpy/dumpy together that we would have to go back and look at how we have partitioned the system."
I'm not sure what all of that means, except that the author seems to believe some mysterious "people" think performance considerations are secondary.
I don't have a lot of technical details, just a weird ranting list of complaints, including the following.
"... the root cause of the performance issue was that each call to the component did a very small amount of work. So, they were having to make 10 calls to 10 different components to gather useful info. Even though each component calls was quick (something like 0.1 second), to populate the gui screen, they had to make 15 of them."
Read the following Stack Overflow questions: "Optimizing this Django Code?" and "Overhead of a Round-trip to MySql?"
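A small sketch of why "many quick calls" is slow, using Python's built-in sqlite3 module with a hypothetical customer table. The `query` helper is hypothetical; it only counts statements, treating each one as a round trip. At the email's figure of roughly 0.1 second per call, 15 calls spend about 1.5 seconds on round trips before the screen can be drawn.

```python
import sqlite3

# Hypothetical table standing in for the data behind one GUI screen.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
""")

round_trips = 0

def query(sql, params=()):
    """Run one statement and count it as one round trip to the database."""
    global round_trips
    round_trips += 1
    return conn.execute(sql, params).fetchall()

ids = [1, 2, 3]

# Chatty: one small call per customer.
chatty = [query("SELECT name FROM customer WHERE id = ?", (i,))[0][0] for i in ids]
chatty_trips = round_trips

# Chunky: one call that fetches everything the screen needs.
round_trips = 0
marks = ", ".join("?" for _ in ids)
chunky = [name for (name,) in query(
    f"SELECT name FROM customer WHERE id IN ({marks}) ORDER BY id", ids)]
chunky_trips = round_trips

print(chatty_trips, chunky_trips)  # 3 1
```

Both versions return the same names; only the number of trips differs, and that number grows with the size of the screen in the chatty version.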
ORM Is A "Silver Bullet" -- It Solves All Our Problems
If you think that you can adopt some architectural component and then program without further regard for what that component actually does, stop coding now and find another job. Seriously.
If you think you don't have to consider performance, please save us from having to clean up your mess.
I'm repeatedly shocked at people who claim that some particular ORM (e.g., Hibernate) was unacceptable because of poor performance.
ORMs like Hibernate, iBatis, SQLAlchemy, Django ORM, etc., are not performance problems. They're solutions to specific problems. And like all solution technology, they're very easy to misuse.
Hint 1: ORM == Mapping. Not Magic. Mapping.
The mapping is from low-rent relational row-column (with no usable collections) to object instances. That's all. Just mapping rows to objects. No magic. Object collections and SQL foreign keys are cleverly exchanged using specific techniques that must be understood to be used.
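To make "just mapping" concrete, here is a hand-written sketch of what an ORM does for one foreign key, using sqlite3 and dataclasses with a hypothetical customer/orders schema. The names `Customer`, `Order`, and `load_customers` are illustrative only. This version loads all orders in one query and groups them in Python; an ORM might instead issue one query per customer, which is exactly the kind of technique you have to understand before relying on it.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema: each order row points at its customer via a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id),
                         total REAL);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

@dataclass
class Order:
    id: int
    total: float

@dataclass
class Customer:
    id: int
    name: str
    orders: List[Order] = field(default_factory=list)  # the collection SQL lacks

def load_customers(conn):
    """Map rows to objects, turning orders.customer_id into Customer.orders."""
    customers = {cid: Customer(cid, name)
                 for cid, name in conn.execute("SELECT id, name FROM customer ORDER BY id")}
    for oid, cid, total in conn.execute(
            "SELECT id, customer_id, total FROM orders ORDER BY id"):
        customers[cid].orders.append(Order(oid, total))
    return list(customers.values())

for c in load_customers(conn):
    print(c.name, [o.id for o in c.orders])
# Ann [10, 11]
# Bob [12]
```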
Hint 2: Encapsulation != Ignorance. OO design frees us from "implementation details". This does not mean that it frees us from performance considerations. Performance is not an "implementation detail". The performance considerations of class encapsulation are central to the very idea of encapsulation.
One central reason we have object-oriented design is to separate performance from programming nuts and bolts. We want to be able to pick and choose alternative class definitions based on performance considerations.
ORM's Role.
ORM saves writing mappings from column names to class instances. It saves us from writing SQL. It doesn't remove the need to think about what's actually going on.
If an attribute is implemented as a property that actually does a query, we need to pay attention to this. We need to read the API documentation, know which features of a class issue queries, and think about how to manage this.
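Here is a sketch of an attribute that hides a query, plus an alternative class with the same interface that avoids it. Everything here is hypothetical (schema, class names, the counting `run` helper), and plain sqlite3 stands in for a real ORM. The point is that `c.total` looks identical in both cases, but the lazy version costs one extra query per object.

```python
import sqlite3

# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 7.5), (12, 3, 3.0);
""")

queries = 0  # how many statements were sent to the database

def run(sql, params=()):
    global queries
    queries += 1
    return conn.execute(sql, params).fetchall()

class LazyCustomer:
    """Looks like a plain attribute, but every access to .total is a query."""
    def __init__(self, cid, name):
        self.id, self.name = cid, name

    @property
    def total(self):
        return run("SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
                   (self.id,))[0][0]

class EagerCustomer:
    """Same interface; the loader computes the total up front."""
    def __init__(self, cid, name, total):
        self.id, self.name, self.total = cid, name, total

def load_lazy():
    return [LazyCustomer(*row) for row in run("SELECT id, name FROM customer ORDER BY id")]

def load_eager():
    rows = run("""SELECT c.id, c.name, COALESCE(SUM(o.total), 0)
                  FROM customer c LEFT JOIN orders o ON o.customer_id = c.id
                  GROUP BY c.id, c.name ORDER BY c.id""")
    return [EagerCustomer(*row) for row in rows]

queries = 0
lazy_totals = [c.total for c in load_lazy()]
lazy_queries = queries      # one for the list, plus one per customer

queries = 0
eager_totals = [c.total for c in load_eager()]
eager_queries = queries     # one in total

print(lazy_queries, eager_queries)  # 4 1
```

Because the two classes share an interface, choosing between them is a performance decision that calling code never sees.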
If we don't know, we need to write experiments and spikes to demonstrate what is happening. Reading the SQL logs should be done early in the architecture definition.
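For a quick spike, Python's sqlite3 module can hand every executed statement to a callback through `Connection.set_trace_callback`; sending that to the `logging` module gives a SQL log to read. This is a minimal sketch with a hypothetical table and a hypothetical `trace` helper; real ORMs have their own switches (SQLAlchemy's `create_engine(..., echo=True)`, for example).

```python
import logging
import sqlite3

logging.basicConfig(level=logging.DEBUG, format="SQL: %(message)s")
sql_log = logging.getLogger("sql")

statements = []  # kept as well, so the log can be inspected from code

def trace(sql):
    """Called by sqlite3 with the text of each statement it executes."""
    statements.append(sql)
    sql_log.debug(sql)

conn = sqlite3.connect(":memory:")
conn.set_trace_callback(trace)

conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES (?)", ("Ann",))
rows = conn.execute("SELECT name FROM customer").fetchall()
print(rows)
```

Depending on the Python version and transaction settings, the log may also show statements you did not write yourself (such as an implicit `BEGIN`), which is precisely why it is worth reading.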
You can't write random code and complain that the performance isn't very good.
If you think you should be able to write code without thinking and understanding what you're doing, you need to find a new job.
Tuesday, November 10, 2009
Another HTML Cleanup
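import re

# Fix style="background-image:url("url")"
# The inner double quotes end the style attribute early, so escape them as &quot;.
background_image = re.compile(r'background-image:url\("([^"]+)"\)')
def fix_background_image( match ):
    return 'background-image:url(&quot;%s&quot;)' % ( match.group(1) )

# Fix src="url name="name""
# The src value was never closed, so it swallowed the name attribute.
bad_img = re.compile( r'src="([^ ]+) name="([^"]+)""' )
def fix_bad_img( match ):
    return 'src="%s" name="%s"' % ( match.group(1), match.group(2) )

# (compiled pattern, replacement function) pairs, applied in order.
fix_style_quotes = [
    (background_image, fix_background_image),
    (bad_img, fix_bad_img),
]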
The "fix_style_quotes" sequence is provided to the BeautifulSoup constructor as the markupMassage value.
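To check the fixes without BeautifulSoup, the same kind of (pattern, function) pairs can be applied directly with `pattern.sub(function, markup)`, which is roughly what BeautifulSoup 3 does with markupMassage before parsing. This sketch assumes only the standard `re` module; the `massage` helper and the sample page are hypothetical, and the replacements are written with lambdas for brevity.

```python
import re

background_image = re.compile(r'background-image:url\("([^"]+)"\)')
bad_img = re.compile(r'src="([^ ]+) name="([^"]+)""')

fix_style_quotes = [
    (background_image,
     lambda m: 'background-image:url(&quot;%s&quot;)' % m.group(1)),
    (bad_img,
     lambda m: 'src="%s" name="%s"' % (m.group(1), m.group(2))),
]

def massage(markup, pairs):
    """Apply each (pattern, function) pair in order, as a pre-parse cleanup pass."""
    for pattern, fix in pairs:
        markup = pattern.sub(fix, markup)
    return markup

page = '<div style="background-image:url("bg.png")"><img src="a.gif name="logo""></div>'
cleaned = massage(page, fix_style_quotes)
print(cleaned)
# <div style="background-image:url(&quot;bg.png&quot;)"><img src="a.gif" name="logo"></div>
```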
Friday, November 6, 2009
BBEdit Configuration
Wednesday, November 4, 2009
Parsing HTML from Microsoft Products (Like Front Page, etc.)
import re

def clean_directives( page ):
    """
    Stupid Microsoft "Directive"-like comments!
    Must remove all <!--[if...]>...<![endif]--> sequences. Which can be nested.
    Must remove all <![if...]>...<![endif]> sequences. Which appear to be the nested version.
    Yields the fragments of page that lie outside every [if]...[endif] block.
    """
    if_endif_pat= re.compile( r"(\<!-*\[if .*?\]\>)|(<!\[endif\]-*\>)" )
    context= []   # stack of currently open [if ...] matches
    start= 0      # start of the text run being kept; None while inside a block
    for m in if_endif_pat.finditer( page ):
        if "[if" in m.group(0):
            if start is not None:
                yield page[start:m.start()]
            context.append(m)
            start= None
        elif "[endif" in m.group(0):
            context.pop()
            if len(context) == 0:
                # m.end() is already the index just past [endif]; +1 would drop a character.
                start= m.end()
    if start is not None:
        yield page[start:]
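Since the stack of matches is only used for its length, the same idea can be sketched with an integer nesting depth, returning the joined string instead of yielding fragments. This is a variant, not the original function: `strip_conditionals` is a hypothetical name, and unlike the stack version it leaves a stray `[endif]` in place instead of raising an error.

```python
import re

# Same opening/closing patterns as clean_directives.
IF_ENDIF = re.compile(r"(<!-*\[if .*?\]>)|(<!\[endif\]-*>)")

def strip_conditionals(page):
    """Return page with every (possibly nested) [if]...[endif] block removed."""
    kept = []
    depth = 0   # how many [if ...] blocks are currently open
    start = 0   # start of the current run of text to keep
    for m in IF_ENDIF.finditer(page):
        if m.group(1):                  # an opening [if ...]
            if depth == 0:
                kept.append(page[start:m.start()])
            depth += 1
        elif depth > 0:                 # an [endif] closing an open block
            depth -= 1
            if depth == 0:
                start = m.end()
    if depth == 0:
        kept.append(page[start:])
    return "".join(kept)

page = 'a<!--[if gte mso 9]><xml>x</xml><![endif]-->b<![if !supportLists]>1.<![endif]>c'
print(strip_conditionals(page))  # abc
```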
Stored Procedures and Ad Hominem Arguments
- The "DBA as Bottleneck" problem. In short, the DBAs take projects hostage while the development team waits for stored procedures to be written, corrected, performance tuned or maintained.
- The "Data Cartel" problem. The DBAs own parts of the business process. They refuse (or complicate) changes to fundamental business rules for obscure database reasons.
- The "Unmaintainability" problem. The stored procedures (and triggers) have reached a level of confusion and complexity that means that it's easier to drop the application and install a new one.
- The "Doesn't Break the License" problem. For some reason, the interpreted and source-code nature of stored procedures makes them the first candidate for customization of purchased applications. Worse, the feeling is that doing so doesn't (or won't) impair the support agreements.