Looking back at this series, I have the strong feeling that I’m being unfair to Resin. I’m judging it by the same criteria I would use for our own highly optimized production code. The projects have very different goals, maturities, and environments. That said, I think that a lot of the comments I have on the project are at the implementation level. That is, they can be fixed (except maybe the analyzer/tokenizer pipeline) by simply optimizing one method at a time. Even the architectural change to how the text is analyzed isn’t very big. What is important is that the code is quite clear, is easy to follow, and has a well-defined structure. That means that it is actually possible to make these changes as the project matures.
And now that this is out of the way, let me cover some of the things that I would have done differently in the codebase. A lot of them are I/O-related. The way all those different files are used is decidedly not optimal: in particular, the files are constantly opened and closed, with reading and seeking all over the place. The actual design seems to be based around an LSM (log-structured merge) approach, even if this isn't stated explicitly. That already has pretty good semantics for writes, but reads are currently probably leaning very heavily on the file system cache, and that won’t work once the data grows beyond a certain size.
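To make the point about constantly opening and closing files concrete, here is a minimal sketch in Python (not Resin's actual C# code). The fixed-size record layout and the names `read_reopening`, `PersistentReader`, and `write_sample` are all hypothetical, made up for illustration. It contrasts paying for an open and close on every lookup with holding a single handle open for the life of the reader.

```python
import os
import struct
import tempfile

# Hypothetical layout: the file is a flat array of signed 64-bit integers.
RECORD = struct.Struct("<q")


def read_reopening(path, index):
    """Open, seek, read, and close the file on every single lookup."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]


class PersistentReader:
    """Open the file once and reuse the same handle for every lookup."""

    def __init__(self, path):
        self._file = open(path, "rb")

    def read(self, index):
        self._file.seek(index * RECORD.size)
        return RECORD.unpack(self._file.read(RECORD.size))[0]

    def close(self):
        self._file.close()


def write_sample(path, values):
    """Write the values as consecutive fixed-size records."""
    with open(path, "wb") as f:
        for value in values:
            f.write(RECORD.pack(value))


if __name__ == "__main__":
    sample_path = os.path.join(tempfile.mkdtemp(), "sample.bin")
    write_sample(sample_path, [10, 20, 30])
    reader = PersistentReader(sample_path)
    print(reader.read(1), read_reopening(sample_path, 1))
    reader.close()
```

Both versions return the same data. The second one only removes the per-lookup open and close; it does nothing about the reliance on the file system cache, which would need a deliberate caching or paging strategy of its own.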