Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy, and will likely be dropped five years after the last post in Jan 2023.

Tuesday, December 6, 2022

My algorithm performs badly, do I need asyncio?

Real Question (somewhat abbreviated): "My algorithm performs badly, do I need asyncio?"

Short answer: No.

Long answer: Sigh. No. Do you need a slap upside the head?

Here's how it plays out:

Q: "We figured that if we 'parallelize' it, then we can apply multiple cores, and it will run 4x as fast."

Me: "What kind of I/O are you doing?"

Q: "None, really. It's compute-intensive."

Me: "Async is for I/O. A function can be computing while other functions are waiting for I/O to complete."

Q: "Right. We can have lots of them, so they each get a core."

Me: "Listen, please. A function can be computing. That's "A". Singular. One. Take a step back from the asyncio package. What are you trying to do?"

Q: "Make things faster."

Me: "Take a breath. Make what faster?"

Q: "A slow algorithm."

Me: 

Q: "Do you want to know what we're trying do?"

Me: 

Q: "First, we query the database to get categories. Then we query the database to get details for the categories. Then we query the database to organize the categories into a hierarchy. Except for certain categories which are special. So we have if-statements to handle the special cases."

Me: "That's I/O intensive."

Q: "That's not the part that's slow."

Me: 

Q: "Context is important. I feel the need to describe all of the background."

Me: "That's trivia. It's as important as your mother's maiden name. What's the problem?"

Q: "The problem is we don't know how to use asyncio to use multiple cores."

Me: "Do you know how to divide by zero?"

Q: "No. It's absurd."

Me: "We already talked about asyncio for compute-intensive processing. Same level of absurd as dividing by zero. What are you trying to do?"

Q: "We have some for loops that compute a result slowly. We want to parallelize them."

Me: "Every for statement that computes a collection is a generator expression. Every generator expression can be made into a list, set, or dictionary comprehension. Start there."

Q: "But what if the for statement has a super-complex body with lots of conditions?"

Me: "Then you might have to take a step back and redesign the algorithm. What does it do?"

Q: <code> "See all these for statements and if-statements?"

Me: "What does it do? What's the final condition?"

Q: "A set of valid answers."

Me: "Define valid."

Q: "What do you mean? 'Define valid?' It's a set that's valid!"

Me: "Write a condition that defines whether or not a result set is valid. Don't hand-wave, write the condition."

Q: "That's impossible. The algorithm is too complex."

Me: "How do you test this process? How do you create test data? How do you know an answer it produces is correct?"

Q:

Me: "That's the fundamental problem. You need to have a well-defined post-condition. Logic. An assert statement that defines all correct answers. From that you can work backwards into an algorithm. You may not need parallelism; you may simply have a wrong data structure somewhere in <code>."

Q: "Can you point out the wrong data structure?"

Me: 

Q: "What? Why won't you? You read the code, you can point out the problems."

Me: 

Q: "Do I have to do all the work?"

Me:

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.