Tuesday, June 30, 2009

Week 4 - reflective post

Starting with smoke signals and pigeons, telecommunications have played a significant role in all societies. Since humans are “social animals”, they have a need to communicate, whether for business or personal reasons. With the technological inventions of the 18th and 19th centuries in communications and transportation, telecommunications and networks began to grow exponentially. Bell’s invention of the telephone was a revolution in private communications, while Tesla’s invention of radio paved the way to truly global communications. At the same time, networks started to develop and soon an entire industry based on communications emerged. This in turn accelerated the exchange of news and ideas, which resulted in rapid growth of commerce and trade.
Telecommunications and networks are closely tied to the technological progress of our society. We can practically track the technological progress of a society by tracking its progress in telecommunications and networks. Today one smart phone probably has more computing power than all the computers used in World War II put together, and that war happened only about 65 years ago. Satellite phones are already a reality and they allow voice and data communications from anywhere on the planet, which was not possible until recently. Today’s fiber optic networks transfer petabytes of data each day between continents, which was unthinkable just two decades ago. We can only imagine what will happen with telecommunications and networks in another 50 years.

Monday, June 22, 2009

Week 3 - reflective post

Databases - love them or hate them, you must use them! Since a database is really any logically organized collection of information that contains more than one record, we are using databases in everyday life even outside of IT. Take for example a grocery shopping list: believe it or not, it is a database. Even that to-do list is a small database. Since most of us have only a few records in our daily databases, a small piece of paper will do for our database management needs.

Now imagine an organization that needs to organize thousands, or maybe millions, or sometimes billions of records. I think they would have a hard time with a pen and a small piece of paper as their database and database management tools. This is where relational database management systems, or RDBMS, come into play.

An RDBMS can help organize and automate many tasks that used to be performed manually. What once took days or months of manual work can now be done in minutes, if not seconds, using modern computers.
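To make the shopping list idea concrete, here is a minimal sketch of the same list kept in a relational database instead of on paper. It uses SQLite through Python's built-in sqlite3 module; the table name, column names and sample items are made up for illustration.

```python
import sqlite3


def build_shopping_db(items):
    """Create an in-memory SQLite database holding a shopping list.

    `items` is a list of (item, quantity, aisle) tuples. The schema is
    hypothetical and exists only to illustrate the idea.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shopping_list ("
        "item TEXT PRIMARY KEY, quantity INTEGER NOT NULL, aisle TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO shopping_list VALUES (?, ?, ?)", items)
    conn.commit()
    return conn


def items_in_aisle(conn, aisle):
    """Return (item, quantity) pairs for one aisle, sorted by item name."""
    rows = conn.execute(
        "SELECT item, quantity FROM shopping_list WHERE aisle = ? ORDER BY item",
        (aisle,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_shopping_db(
        [("milk", 2, "dairy"), ("eggs", 12, "dairy"), ("apples", 6, "produce")]
    )
    print(items_in_aisle(conn, "dairy"))
```

With three records the paper list wins, but the same query keeps working when the table holds millions of rows, which is the whole point of an RDBMS.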

Data mining is another powerful tool that uses databases, often several of them at the same time. By correlating information about the same person from different databases, data mining can uncover things that were previously impossible to know. So databases (as with anything else) can be used to do useful things, but they can also be used to do some really bad things (especially with data mining).

Data over the web

There is a wide range of issues in making data available over the web, but most of them revolve around security and the privacy of personal information. We’ve all read about incidents when personal or private data was accidentally made available on the web, and the personal information, social security numbers and credit card numbers of thousands of people were available to anyone for download. Here http://itmanagement.earthweb.com/secu/article.php/3347761 is an article about Barnes & Noble exposing customer data due to a design flaw in their web ordering software.

Then there is the issue of identity theft, when hackers break into the computers of big corporations and steal user data, which is then sold on the black market. Thieves are able to obtain loans and credit cards by using the stolen information.

The page http://www.privacyrights.org/ar/ChronDataBreaches.htm#CP has a comprehensive list of data breaches in the US from January 10, 2005 until now. According to this page, the total number of records containing sensitive personal information involved in security breaches in the U.S. since January 2005 is 262,442,156.

The most amazing data breach was when hackers established accounts on the web site of ChoicePoint (a company which does background investigations and therefore has access to personal information). These ID thieves stole over 163,000 records from ChoicePoint’s database.

Monday, June 8, 2009

Week 2 - reflective post

Hey, look, what is that shining in the distance? Ah, that is the reflection of my week 2 reflective blog post :-)
So, we talked about hardware and software. What would we do without them? We'd have to use typewriters, copy and fax machines, calculate things manually using calculators, then all this mess with the papers, and no one can figure out any more where all the papers from last week went while I was on a 2 day vacation...help!!!!
I think this sums it up. I am not sure that we could ever go back to the paper age again. After evolving through the stone, bronze, iron and paper ages, we are finally reaching the peak in this digital age. The peak of civilization. Reminds me of the story from The Matrix (the movie). But that's just a movie, here machines are not running the world, or are they? Well, kind of, we let them do things for us and we pretend that we are in control. So the machines work fairly well until they fail, and at that time our biggest weakness is exposed: WE ARE DEPENDENT ON THE MACHINES!!! Believe it or not, that is the harsh truth. What to do in case the machine fails? Well, we can use another one which will take over when the first one fails, and then another one just in case, and then another one in a different location (after all, there are such things as earthquakes, floods and hurricanes). And just in case the machine in the other location fails, we should put a machine on a different continent or two. So, what is the problem here? The problem is that our systems are becoming more and more complicated without any good reason (that is what keeps IT people employed BTW). The reason for this is that we want cheaper and cheaper computers, and those cheap computers are not very reliable. So under the illusion that we are getting the best deal, we create these incredibly complicated networks of computers all over the globe for the purpose of redundancy, not realizing that the total cost of ownership is much more than what it would cost to buy a couple of high-end computers that can run trouble free for years. A couple of high-end computers can be maintained with fewer people, and they take less space and power as well, but their initial cost is high, and the initial cost is the only thing most people look at instead of the total cost of ownership. I say, do the math!
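Here is what "doing the math" could look like as a small sketch. The formula is the simplest one I can think of (purchase price plus running cost over the lifetime), and every number in the example is a made-up placeholder, not real pricing, so the outcome depends entirely on what you plug in.

```python
def total_cost_of_ownership(unit_price, units, yearly_running_cost_per_unit, years):
    """Purchase price plus running cost (power, space, admin time) over `years`.

    A deliberately simple model: it ignores discounts, financing and
    downtime costs.
    """
    return units * (unit_price + yearly_running_cost_per_unit * years)


if __name__ == "__main__":
    # All figures below are hypothetical placeholders for illustration only.
    cheap_farm = total_cost_of_ownership(
        unit_price=2_000, units=20, yearly_running_cost_per_unit=1_500, years=5
    )
    high_end = total_cost_of_ownership(
        unit_price=40_000, units=2, yearly_running_cost_per_unit=4_000, years=5
    )
    print("cheap farm:", cheap_farm)
    print("high end:  ", high_end)
```

With these placeholder inputs the farm of cheap machines costs less up front (40,000 vs. 80,000) but more over five years, which is the comparison I am arguing people should actually make with their own numbers.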

Sunday, June 7, 2009

Host Virtualization

The first question is: to virtualize or not to virtualize? This can be easily determined by monitoring the CPU and memory usage of the host. If a host has high CPU or memory usage, then it is not a very good candidate for virtualization. The reason is that (depending on the configuration of the virtual environment) this host will either take most of the CPU or memory resources in the virtualized environment, or it will perform poorly because it will not have adequate CPU or memory resources. If a host has moderate CPU and memory requirements, then it is a good candidate for virtualization. In that case several hosts can be consolidated into one virtualized environment. The expected benefits of virtualization are reduced power and space requirements, reduced hardware cost and the ability to provision virtual host instances very quickly. If most or all of these benefits are not available in the virtualized solution, then there is really no need to virtualize hosts. Again, this should be considered during the planning and requirements gathering phase, and an organization should not jump to virtualization just for the sake of using the latest and greatest technology.
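The CPU and memory check described above can be sketched in a few lines. This is only an illustration: the 70% cut-off for "high" usage is my assumption, not an industry rule, and the function name and the use of a simple average over monitoring samples are made up for the example.

```python
from statistics import mean

# Assumed cut-off for "high" usage; tune it for your own environment.
HIGH_USAGE_PERCENT = 70.0


def is_virtualization_candidate(cpu_samples, mem_samples, threshold=HIGH_USAGE_PERCENT):
    """Return True if average CPU and memory usage are both below the threshold.

    `cpu_samples` and `mem_samples` are utilization percentages (0-100)
    collected by whatever monitoring tool is already watching the host.
    """
    if not cpu_samples or not mem_samples:
        raise ValueError("need at least one CPU sample and one memory sample")
    return mean(cpu_samples) < threshold and mean(mem_samples) < threshold


if __name__ == "__main__":
    print(is_virtualization_candidate([20, 35, 30], [40, 45, 50]))  # moderate host
    print(is_virtualization_candidate([90, 95, 85], [40, 45, 50]))  # CPU-bound host
```

In practice you would also look at peaks and not just averages, since a host that is idle most of the day but saturated during a nightly batch job can still starve its neighbors.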

One of the best use cases for virtualization is the QA (quality assurance) environment, where the QA team needs to perform functional testing of one or more applications. The workload for the QA team is often very high, and time can be saved if new host instances can be provisioned quickly and then archived or even deleted after the QA testing is completed. In this case savings are achieved in almost all areas of the expected benefits of virtualization listed above. But for this use case, even if the only thing saved is the time it takes to provision new hosts, in the form of virtual hosts, it is well worth it. The QA team will be more efficient in what they do, which reduces the time it takes to bring new products to market. That reduces expenses in man-hours for QA testing, and revenue increases since new products can reach the market earlier. Bringing new products to market on time is often more important for an organization than the costs involved in creating the product, because if a company gets first-mover advantage in a new market, then the initial costs will be offset by the high market share and profits made in that market.

To reiterate, an organization should not virtualize hosts without carefully reviewing the CPU and memory usage data of each host that is to be virtualized, without confirming that the virtualized solution will achieve the expected benefits of virtualization (reduced power and space requirements, reduced hardware cost and the ability to provision virtual host instances very quickly), and without creating use cases for virtualization that make business sense.

Data Storage Strategy

First and foremost, an organization must have a storage strategy. Many organizations don't have one and they just keep adding storage as the existing storage runs out. In order to create a storage strategy, an organization must first create a data profile. A data profile shows what kind of data is present in an organization. Then there are local and federal laws about data retention, data protection and data privacy which must be taken into account. EU laws, for example, restrict taking personal data outside the EU, and they have different privacy and data retention rules than the laws in the US. Also, call centers must keep copies of phone calls and customer data for legal purposes. Data kept for a legal hold or other legal purposes must in most cases be kept for 7 years, and sometimes indefinitely. Along with data profiling, capacity management is important to determine the trend of storage capacity growth and to predict storage requirements.
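As a rough illustration of the capacity management part, the sketch below fits a straight line through monthly usage figures and estimates how many months remain before the installed capacity runs out. A linear trend is my simplifying assumption (real growth is often not linear), and the function name and sample numbers are invented for the example.

```python
def months_until_full(used_tb_by_month, capacity_tb):
    """Estimate months left before storage is full, using a least-squares line.

    `used_tb_by_month` holds one usage figure (in TB) per month, oldest first.
    Returns None when usage is flat or shrinking, and 0.0 when the trend line
    has already reached `capacity_tb`.
    """
    n = len(used_tb_by_month)
    if n < 2:
        raise ValueError("need at least two monthly samples")

    x_mean = (n - 1) / 2
    y_mean = sum(used_tb_by_month) / n
    covariance = sum(
        (x - x_mean) * (y - y_mean) for x, y in enumerate(used_tb_by_month)
    )
    variance = sum((x - x_mean) ** 2 for x in range(n))
    slope = covariance / variance  # growth in TB per month

    if slope <= 0:
        return None

    intercept = y_mean - slope * x_mean
    month_full = (capacity_tb - intercept) / slope  # month index where line hits capacity
    return max(0.0, month_full - (n - 1))  # measured from the latest sample


if __name__ == "__main__":
    # Hypothetical usage history: growing by 10 TB a month toward 200 TB.
    print(months_until_full([100, 110, 120, 130], capacity_tb=200))
```

The answer is only as good as the history behind it, so it is worth re-running every month and comparing against lease return dates and procurement lead times.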

Once the data profile is created, it will give the organization a good picture of what kind of data they have and how it needs to be stored. Data that must be kept for legal holds or other legal purposes in most cases doesn't need to be accessed very often, and it doesn't need to be available immediately; typically it should be accessible within 24 hours. For that type of data there are two main solutions: data archiving hardware or data archiving services. Each has its pros and cons, but in some cases privacy rules and security policies dictate which one can be used. If the data is confidential and the organization's security policy says that the data should not leave the organization's premises, then on-site data archiving hardware is the only solution.

Then there is application data, which can consist of databases, application logs and other application files. For databases, in most cases, storage needs to be fast and responsive. This will be totally different storage than the one mentioned above. In the above case data access time is measured in hours, while production application data needs to be available in milliseconds. This is where storage arrays in a SAN (Storage Area Network) environment are used. Again, the data profile can be used to determine if an application needs a high-tier or a low-tier storage array. The difference in cost between a high-tier and a low-tier storage array can be significant, since they use different kinds of disks and different amounts of internal cache.

Sometimes the data profile may reveal that the data must be accessible by many different hosts, in which case NAS (Network Attached Storage) can be used. This kind of storage allows multiple hosts to mount the same network share.
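The mapping from data profile to storage type described in the last few paragraphs can be written down as a small decision function. Treat it as a sketch: the profile fields, the one-hour and ten-millisecond cut-offs and the returned labels are all assumptions I picked for illustration, and a real profile would carry many more attributes.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """A few hypothetical attributes of one class of data."""
    name: str
    max_access_delay_seconds: float  # how long a reader can wait for the data
    shared_by_many_hosts: bool
    must_stay_on_premises: bool


def suggest_storage(profile):
    """Suggest a storage type for a data profile using assumed cut-offs."""
    # Data that can wait an hour or more (e.g. legal holds) can be archived.
    if profile.max_access_delay_seconds >= 3600:
        if profile.must_stay_on_premises:
            return "on-site archiving hardware"
        return "archiving hardware or archiving service"
    # Data that many hosts mount at once fits a network share.
    if profile.shared_by_many_hosts:
        return "NAS"
    # Remaining data goes to the SAN; the tier depends on required latency.
    if profile.max_access_delay_seconds < 0.01:
        return "high-tier SAN array"
    return "low-tier SAN array"


if __name__ == "__main__":
    profiles = [
        DataProfile("call recordings", 24 * 3600, False, True),
        DataProfile("orders database", 0.005, False, False),
        DataProfile("shared web content", 0.5, True, False),
    ]
    for p in profiles:
        print(p.name, "->", suggest_storage(p))
```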

Data profiling can help reduce the cost of the storage architecture by identifying suitable storage for different types of data, and capacity planning can help in predicting the need for new storage. Since most equipment is leased these days, the dates when the equipment must be returned to the leasing company become very important, and this is where a data migration strategy comes in. Data migration may involve application outages, which may mean loss of service to customers, degradation of service (data access is slow) or, in the worst cases, complete loss of data.

And at the end, something that is related to storage is data protection, which may take different forms, but backups remain the most widely used form of data protection. As part of the storage strategy, an organization must plan for adequate data protection capacity, since the amount of storage that needs to be protected keeps growing.

So a good storage strategy must involve data profiling, capacity planning, data migration and data protection scenarios, which should be non-disruptive, transparent to end users and cost effective.

Week 1 - reflection

June 2nd, 2009

Like with everything else, I would like to take a philosophical approach to information technology. While information technology, as the name suggests, provides information when it is needed, there is a risk of information overload, and I really wonder if that is useful. When the recent economic crisis started to unfold 8-9 months ago, I think the media did a great disservice and made the crisis worse by bombarding us with information and reports from countless analysts about the state and the future of our economy. In my opinion very few of those analysts have a clue what they are talking about.

I think that information should have some useful purpose, or otherwise there is no point for it to exist at all. What the media generated in the above example had almost no useful content and therefore I think it should not exist, since it reflects the speculation of a certain group of people while providing no useful information. On the other hand, the media is using technology (including information technology) to spread their message across the globe, thus doing more harm than good (because they have the ability to do so). So my conclusion for week 1 is that information technology should be used in the service of information, and information should not be used in the service of information technology.

About Me


I have just completed the Bachelor of Computer and Information Science program at UMUC and this is my first graduate class. It is just the first week, but from what I could read so far, the class looks interesting.

I have always been involved with computers, starting from the age of 10. I went through an array of computers in the 80s, starting with the ZX Spectrum, Amstrad CPC 6128, Atari 1040 ST, IBM XT and AT compatibles and many others. Right now I have 6-7 computers at home, if I can remember them all. Let’s see, there is the main desktop, which is an Intel Q9450 with 8GB of RAM and 2.5TB of hard drives, then there is a Linux web and mail server (Q8200, 4GB RAM, 1TB HDD), then there is a dual core Athlon which pretends to be a storage array (based on Sun’s Open Storage project COMSTAR), a Sun Netra X1, a Sun Blade 150, a Linux router/intrusion detection device, a PS3 and a Wii. And I almost forgot, I’ve just built an HTPC (home theater PC) based on a mini-ITX motherboard from Zotac http://www.zotacusa.com/zotac-ionitx-a-u-atom-n330-1-6ghz-dual-core-mini-itx-intel-motherboard.html which is able to play 1080p encoded video and Blu-ray discs. At the same time this computer consumes very little energy.

I did some programming in secondary school and also in high school (mainly BASIC, Pascal and some Z80 assembly) and I even participated in national school programming competitions, where I won some prizes as well. That was about 17-18 years ago. Since then I’ve used primarily shell (bash) scripts in my previous sysadmin work, and some C/C++, PHP and Java.

Right now I work as the manager of a storage team at a big internet company. My work is very challenging and demanding. My team manages about 20PB of storage, which is quite a lot even by today’s standards.