the necessity of software design

when designing software many aspects come into play, and design plays an important role in the decisions made at the product level. there is a balance between effort and flexibility that is not always linear: quick and simple will hit a wall at some point, will most probably not scale, and will not make it easy to change key features and/or layout. these design decisions extend well beyond the choice of the right MVC backend and frontend framework, or the choice of the data warehouse and whether it should be relational, pure noSQL or a hybrid.

as the product one builds gets more users interacting with it and data accumulates, requirements change and evolve constantly. staying agile in that respect makes sense, especially in the initial startup phase. in general, issues that may come up can be mitigated if the product is well designed. like everything else in the world of software, there is a delicate, zen-like balance between effort and efficiency.

dude where is my code?

so the users LOVE the right side bar and would like to add a couple more navigation items. awesome. hmm… in which file is that menu created? let me quickly consult with the guy who coded it… oh wait, he is no longer with the company… you know what, i can easily start the debugger and move about the code until i see something that makes sense… common practice? probably. efficient? certainly not. the rule of thumb is that it should be obvious as daylight where changes should be made, and that everyone sticks to these rules, otherwise things get so messy that the simplest improvements are a pain. good code organization and well documented methodologies that are taught and explained to the team are an important step.

context and boundaries

so you have found the code segment where you think the changes should live. great. before you refactor it, answer this question: are you clear on what this code is supposed to do? the expected input and output? is it clear what the code is NOT supposed to do? each code segment should live on its own, unit testing and all, where it is very clear what this method is meant to achieve. no guessing.
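a minimal sketch of what "clear input, clear output, unit testing and all" could look like. the function add_nav_item and its contract are made up for illustration; they are not taken from any real code base.

```python
import unittest


def add_nav_item(items, label):
    """Return a new list of menu labels with `label` appended.

    Hypothetical contract: `items` is a list of strings, `label` is a
    non-empty string, the input list is never modified, and duplicate
    labels are rejected.
    """
    if not label:
        raise ValueError("label must be non-empty")
    if label in items:
        raise ValueError(f"duplicate label: {label}")
    return items + [label]


class AddNavItemTest(unittest.TestCase):
    def test_appends_label(self):
        self.assertEqual(add_nav_item(["home"], "reports"), ["home", "reports"])

    def test_does_not_modify_input(self):
        items = ["home"]
        add_nav_item(items, "reports")
        self.assertEqual(items, ["home"])

    def test_rejects_duplicates(self):
        # what the code is NOT supposed to do is part of the contract too
        with self.assertRaises(ValueError):
            add_nav_item(["home"], "home")
```

saved to a file, the tests can be run with python -m unittest. the docstring states what goes in and what comes out, and the tests pin down both the expected behavior and the forbidden one.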

software fragility

“nir, i’ll be happy to take care of that ticket. just know that if i change that controller the other view will break and we’ll have to take care of that, and then there’s this other controller…” unfortunately it’s common to see software break badly with minor changes (especially on the frontend). this is why both unit and functional testing are critical, but that’s for another post.

development scalability

when one developer’s code commit breaks one function, ideally it will not impact other developers who are working on other regions of the code. ideally code integrations will involve fewer merge conflicts, and those will be resolved smoothly. this is one good reason to switch to git if you haven’t done so yet.


updating your product may be a very delicate process, especially if the model changes and you need to update the current schema while staying backward compatible with the current data sets on a live target. now consider that your product is white labeled to 50 customers, where different versions of the product are deployed to different servers, and you may need to roll back or quickly patch critical bugs. the complexity of maintaining these machines grows exponentially without a proper deployment strategy.
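one way to keep old data sets usable while the schema changes is to stamp every record with a schema version and upgrade it one step at a time. this is only a sketch under that assumption; the field names and the two migrations are hypothetical.

```python
def v1_to_v2(record):
    # hypothetical change: v2 splits the single "name" field in two
    first, _, last = record["name"].partition(" ")
    return {"schema_version": 2, "first_name": first, "last_name": last}


def v2_to_v3(record):
    # hypothetical change: v3 adds an optional email field with a safe default
    return {**record, "schema_version": 3, "email": record.get("email")}


# each migration upgrades a record by exactly one schema version
MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}
LATEST_VERSION = 3


def upgrade(record):
    """Bring a record of any older version up to LATEST_VERSION."""
    while record["schema_version"] < LATEST_VERSION:
        record = MIGRATIONS[record["schema_version"]](record)
    return record


old = {"schema_version": 1, "name": "ada lovelace"}
new = upgrade(old)  # passes through v2 on its way to v3
```

because every step is small and ordered, a customer stuck on an old version can be brought forward without a special case per customer.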


this is one important metric to wrap your head around and to benchmark your team against. as you build your system, what is the effort required to add this specific feature? is the investment linear in terms of time and capital? does good design help people be more productive?


as the system grows the requirements will change, because you get good feedback from your end users and what you thought you knew a year ago has now turned obsolete. supporting multiple and different code bases, higher availability, redundancy, better performance and backward compatibility are all requirements that come down the pipe as your platform and client base grow.

some final thoughts… when it comes to design we now know that some prior planning and careful thinking can go a long way.

the bottom line is that the product owner needs to stay focused on what the business requirements are. the product should only solve the problems it is designed to solve so the time dedicated to development is better spent.



scrum 101

scrum methodology is quite popular and i’d like to dedicate this post to making some sense out of it for those of you who are not familiar with it. if you are a development team, scrum could be an interesting fit for you, or at the very minimum something new to consider and evaluate.

scrum is a software development methodology that works along the lines of lean manufacturing or agile. to make sure we all speak the same language: from a superficial, high level viewpoint, agile is lean is scrum. if you are wondering about the origin of the name, scrum is borrowed from rugby, where the players lock up and try to get hold of the ball by moving it with their feet. small iterations. makes sense?

the methodology assists in defining the development life cycle and stages, the key players and roles, and how responsibility is delegated. scrum also assists in figuring out how to stay on top of the project and its progress, how to address and perform changes/enhancements to the development plan, and how to deal with risks. taking a step forward, scrum can help lead development teams, engage them in drawing conclusions, and improve both product and process on a regular basis.

if you were scrum, the world would be roughly divided into you and waterfall. “you” means agile methodologies such as kanban and XP. waterfall is a more sequential approach from the 70s that was (probably) influenced by traditional manufacturing, and is similar to the approach of building a house:

  • there are restrictions on the order of operations (one cannot lay the roof before foundation for example)
  • mistakes are freaking expensive so better get it right by carefully planning and quality measuring your work
  • much repetition (many doors, many windows) so consolidating tasks means efficiency

so with waterfall one first gathers all the requirements for a product, then architects the solution, then dives into a detailed, technical design of each component, codes it up, integrates, tests tests tests, repairs/fixes and releases.

with agile, developing software is more like designing a department store:

  • usually there are loose restrictions on the order of the tasks at hand
  • a wide array of features
  • detailed and strict planning may fail. small incremental steps and proper adjustments moving forward work better (scrum anyone?)
  • centralizing tasks helps to a certain extent

scrum in action:

  • with agile the team goes on sprints (2 weeks minimum for us), where each sprint gets us one step closer to our goal
  • effectiveness is key: how many features (stories) were coded into the system. fewer lines of code in general, more code that does what the end user really needs
  • no elaborate MRD/PRD. with agile one maintains a backlog and direct communication. we hold daily meetings and sprint planning, and we retrospectively learn from our mistakes and successes.
  • flat hierarchy across the team, and more responsibility is handed off to the developers. the team self manages using a structured process
  • a team is composed of people with complementary skills, so each team can get stuff done on its own

so agile is more of a philosophy than rules, right?

  • every activity is time boxed, and activities are prioritized from the most important to the least. when time runs out we are hopefully better off than before we started, as the product work was done beforehand to make sure the most important features are at the top of the list (i.e. the backlog)
  • with agile the developers are encouraged to write only what is absolutely necessary, and most probably we will revisit this code later on for changes and enhancements. this is where a well thought out QA process is essential
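the prioritized backlog and the time box can be sketched in a few lines. the Story fields, the hour estimates and the greedy selection rule are all assumptions made for illustration, not a prescribed scrum algorithm.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    priority: int  # 1 is the most important
    estimate: int  # effort in hours (a hypothetical unit)


def plan_sprint(backlog, capacity):
    """Walk the backlog in priority order and take every story that still fits."""
    planned, remaining = [], capacity
    for story in sorted(backlog, key=lambda s: s.priority):
        if story.estimate <= remaining:
            planned.append(story)
            remaining -= story.estimate
    return planned


backlog = [
    Story("export to csv", priority=3, estimate=30),
    Story("login", priority=1, estimate=40),
    Story("side bar navigation", priority=2, estimate=50),
]
sprint = plan_sprint(backlog, capacity=80)
```

with 80 hours available, login goes in first, the side bar does not fit in what is left and stays on the backlog, and the smaller export story fills part of the remaining time. whatever happens when time runs out, the most important story was worked on first.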

think of scrum and agile as a framework for getting the job done. depending on the dynamics of your team and the size of your company, agile may be what you want to implement. i think it works very well for startups and for collaborating with small teams when outsourcing projects. at the end of the day agile and waterfall are both ways to increase productivity and allow developers to make the best of their time.

good luck!

healthcare and big data

everywhere you turn people are talking about big data, hadoop and sharding. rightfully so. in today’s day and age managing a lot of data is not an easy task, as performance and scalability are key. traversing large data sets, dividing them into tiny sections and distributing the load among many machines (processors) is nothing new.

hadoop exists in order to solve specific problems and has emerged out of necessity. what hadoop does is provide the infrastructure to connect multiple (cheap) servers into a coherent environment with which high i/o and cpu problems (algorithms) are solved.

it all started around 2004, when doug cutting, who had earlier created the document indexing project lucene, set out to achieve the same goals in a distributed environment, drawing on the papers google had published about its distributed file system and mapReduce. hadoop BTW is named after his son’s yellow toy elephant. in 2006 yahoo hired doug to improve the project so it could index the entire internet, and the project continued as open source. that marked the start of the revolution.

at its core hadoop includes two projects: one for distributed storage and one for distributed computing. around those two projects a vast number of projects have evolved (and still are).

HDFS: hadoop distributed file system
this file system is designed to store large files and enable large and effective reads and writes. this is done by dividing the file into sizable chunks, where each chunk is normally stored on 3 nodes, which can be anywhere. there is a “name node” that holds the mapping between a document, its constituent pieces and the data nodes on which they are stored.
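a toy sketch of the idea: split a file into chunks, put each chunk on 3 nodes and keep the mapping in a name-node style table. the round-robin placement and all names here are simplifying assumptions for illustration; this is not how HDFS actually chooses nodes.

```python
import itertools

REPLICATION = 3  # each chunk is stored on 3 nodes, as described above


def split_into_chunks(data, chunk_size):
    """Divide the file contents into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(filename, data, nodes, chunk_size):
    """Return a toy name-node table: (filename, chunk index) -> list of data nodes."""
    if len(nodes) < REPLICATION:
        raise ValueError("need at least 3 data nodes")
    ring = itertools.cycle(nodes)  # simplistic round-robin placement
    table = {}
    for index, _chunk in enumerate(split_into_chunks(data, chunk_size)):
        table[(filename, index)] = [next(ring) for _ in range(REPLICATION)]
    return table


table = place_chunks("log.txt", b"x" * 10, ["n1", "n2", "n3", "n4"], chunk_size=4)
```

the 10 byte file becomes 3 chunks, and losing any single node still leaves two copies of every chunk it held.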

mapReduce
an API to write programs that will run in parallel. the developer really needs to write only two simple functions, map and reduce, that handle a single document (i.e. element of data). the framework runs them on multiple machines and takes care of scheduling and of handling errors and failures (network, i/o, etc). this allows for simple parallel batching, where a “job tracker” synchronizes the execution of the batch processes, and each batch is subdivided into smaller tasks which are handled by the “task trackers”.
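the classic illustration of the two functions is a word count. the sketch below runs on a single machine in plain python and does not use hadoop’s API; run_job stands in for the grouping and distribution that the framework would do across nodes.

```python
from collections import defaultdict


def map_doc(document):
    """map: emit a (word, 1) pair for every word in a single document."""
    return [(word, 1) for word in document.lower().split()]


def reduce_word(word, counts):
    """reduce: combine all the values that were emitted for one key."""
    return word, sum(counts)


def run_job(documents):
    # group the mapped pairs by key; on a real cluster the framework does
    # this across machines, here it is just a local dict
    grouped = defaultdict(list)
    for document in documents:
        for word, count in map_doc(document):
            grouped[word].append(count)
    return dict(reduce_word(word, counts) for word, counts in grouped.items())


totals = run_job(["big data", "big elephant"])
```

map_doc only ever sees one document and reduce_word only ever sees one key, which is what makes it possible to spread the work over many machines.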

over time yahoo and facebook (to mention a few) wrote their own drivers over HDFS and mapReduce and have shared their work with the community. so hadoop is a code name for a set of technologies that harness the computing power of many machines to perform simple tasks in parallel. hadoop emerged from the world of unstructured data, where hundreds of millions of pages are analyzed. today big data is being implemented and researched in every facet of the economy, including healthcare.

why we use mongo DB

mongo is described as a scalable, high performance, open source, schema-free, document oriented database. so the one size fits all philosophy doesn’t work anymore, does it? non relational databases scale horizontally much better. just add another machine and you are good to go, and these days, when big data is a big deal, speed, performance, flexibility and scalability are the name of the game. think about it… no schema, no concern with transactions. this is the commoditization of databases. what mongo does is try to perform as a key-value store with the functionality of an RDBMS.

speaking of “traditional” database capabilities, mongo can index and has failover/replication support. data is stored as documents in binary JSON format. yes. mongo “gets” JSON out of the box. is your JSON valid? good. you can now import and query it. and no schema means no more ‘alter table’ crap, and the query syntax is javascript based and you can nest your queries as much as you want. moving right along, gridFS can store large binary objects efficiently. images. videos. whatever. just throw it in there.

the documents are just like records, only mongo has them as binary JSON objects. collections are your old school tables if you will. when you query mongo you get a cursor back and not a record per se, and you iterate over the result set like a champ. you guessed it. no more loading everything into memory, just what you need. this is a big victory for the performance gods. mongo is wonderful for performing analysis when used as a data warehouse. dump in your JSON and analyze the hell out of the data.
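the cursor idea can be illustrated without a database. the sketch below is plain python and is not mongo’s API; find is a hypothetical stand-in that matches schema-free JSON documents lazily, one at a time.

```python
import json


def find(lines, criteria):
    """Lazily yield documents whose fields match every key/value in `criteria`."""
    for line in lines:
        doc = json.loads(line)
        if all(doc.get(key) == value for key, value in criteria.items()):
            yield doc


# no schema: the documents do not need to share the same fields
raw = [
    '{"name": "ann", "city": "boston"}',
    '{"name": "bob", "city": "austin", "tags": ["vip"]}',
    '{"name": "cy", "city": "boston"}',
]

cursor = find(raw, {"city": "boston"})  # nothing has been parsed yet
first = next(cursor)                    # only now is the first match loaded
rest = list(cursor)                     # the remaining matches, on demand
```

the caller pulls results as needed instead of receiving the whole result set up front, which is the point of getting a cursor back.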

mongo is not good at handling transactions, nor at maintaining the integrity of relationships between data. you can find out more at

CDA levels of interoperability

what does interoperability mean in terms of the richness of the data when it comes to CDA documents? the CDA document has 3 levels of interoperability:

  • level1: CDA header + body of unstructured blob (pdf for example)
  • level2: CDA header + xml with narrative blocks, each block is code identified
  • level3: CDA header + xml with narrative blocks + entries (SNOMED, CPT, ICD-9 encoded etc)

as you can see, the richness of the data grows with each level, and so do the capabilities of interoperability and the quality of use. the coding for level2 and level3 is a requirement and adds clarity and consistency to the transaction. moreover, with level3, entries are encoded at varying levels of specificity, which is what HL7 refers to as “incremental semantic interoperability” and which offers an incremental and easy way into implementing the standards. as interoperability goes, the richness of the data is as important as the ability to exchange it, thus CDA level3 is what HIS should strive for, which will yield better care all across.
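the three levels can be told apart by the shape of the document body. the sketch below is a rough heuristic, not a validator: it assumes the standard HL7 v3 namespace and the CDA element names nonXMLBody, section, code and entry, and the sample document uses a placeholder code.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}


def cda_level(xml_text):
    """Guess the CDA level (1, 2 or 3) from the shape of the document body."""
    root = ET.fromstring(xml_text)
    if root.find(".//hl7:nonXMLBody", NS) is not None:
        return 1  # header + unstructured blob
    sections = root.findall(".//hl7:section", NS)
    if any(section.find("hl7:entry", NS) is not None for section in sections):
        return 3  # narrative blocks + coded entries
    if sections and all(section.find("hl7:code", NS) is not None for section in sections):
        return 2  # every narrative block is code identified
    return 1  # assumption of this sketch: anything else counts as level 1


# "EXAMPLE" is a placeholder, not a real section code
sample = (
    '<ClinicalDocument xmlns="urn:hl7-org:v3"><component><structuredBody>'
    '<component><section><code code="EXAMPLE"/><text>aspirin daily</text>'
    '</section></component></structuredBody></component></ClinicalDocument>'
)
level = cda_level(sample)
```

the sample has a coded section but no entries, so it lands on level 2. a receiving system could use a check like this to decide how much of a document it can process automatically.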