Let us start by looking at what such an attempt entails. First, because the simulation is necessarily numerical, we are dealing with a discrete-time system. As with any discrete-time system, the accuracy of the simulation depends on its time resolution relative to the shortest event time. And, as with any such system, if we need to simulate over long periods while watching for a rapidly developing event, the volume of computation becomes enormous.
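As a hedged illustration of why the time step must be small relative to the fastest event, the sketch below models a fast vibration as a unit-period harmonic oscillator (an assumption) and integrates it with velocity Verlet, a common molecular dynamics integrator that the text itself does not name. `max_energy_error` is a hypothetical helper that reports the worst relative energy error over 100 periods. Halving `dt` doubles the number of steps, which is exactly the computational cost described above.

```python
import math


def max_energy_error(dt, period=1.0, n_periods=100):
    """Integrate x'' = -omega^2 x with velocity Verlet and return the largest
    relative deviation of the total energy from its initial value."""
    omega = 2 * math.pi / period
    x, v = 1.0, 0.0                      # start at maximum displacement
    e0 = 0.5 * v * v + 0.5 * omega**2 * x * x
    a = -omega**2 * x
    n_steps = round(n_periods * period / dt)
    worst = 0.0
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a        # half kick
        x = x + dt * v_half              # drift
        a = -omega**2 * x                # force at the new position
        v = v_half + 0.5 * dt * a        # second half kick
        e = 0.5 * v * v + 0.5 * omega**2 * x * x
        worst = max(worst, abs(e - e0) / e0)
    return worst


if __name__ == "__main__":
    for dt in (0.2, 0.05, 0.02):         # time step as a fraction of the period
        steps = round(100 / dt)
        print(f"dt={dt}: {steps} steps, max energy error {max_energy_error(dt):.4f}")
```

The coarse step (five steps per period) gives an energy error of tens of percent, while fifty steps per period keeps it well under one percent at ten times the cost.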
In simulating protein folding to predict a protein's 3D structure, we face exactly this problem. Many events depend on chance, so we have to run the simulation long enough for the event to reach its ergodic limit. Yet the mechanisms involved are fast enough to demand a sampling interval in the nanosecond range at the very least; 10 to 100 picoseconds would be much closer to ideal.
All this means that with a 400 MHz microprocessor, we can simulate about 1 ns per CPU day (i.e., it takes one CPU one day to compute the motion of the atoms in a protein over 1 ns of real time). Today's CPUs are much faster, at 3 GHz, but even then this amounts to only about 7.5 ns per CPU day. Since one typically needs to simulate for up to 10 μs, a single simulation takes roughly 1,300 CPU days, or close to 4 years.
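The arithmetic above can be reproduced in a few lines. This is a minimal sketch: the 1 ns per CPU day at 400 MHz rate comes from the text, while the assumption that throughput scales linearly with clock speed is a simplification (it ignores memory bandwidth, instruction-level improvements, and so on). The function names are illustrative only.

```python
# Rate quoted in the text: about 1 ns of protein motion per CPU day at 400 MHz.
BASE_NS_PER_CPU_DAY = 1.0
BASE_CLOCK_MHZ = 400.0
DAYS_PER_YEAR = 365.25


def ns_per_cpu_day(clock_mhz):
    """Simulated nanoseconds per CPU day, assuming throughput scales
    linearly with clock speed (a simplifying assumption)."""
    return BASE_NS_PER_CPU_DAY * clock_mhz / BASE_CLOCK_MHZ


def cpu_years(target_ns, clock_mhz):
    """CPU-years needed to simulate target_ns of real time on one CPU."""
    return target_ns / ns_per_cpu_day(clock_mhz) / DAYS_PER_YEAR


if __name__ == "__main__":
    print(f"{ns_per_cpu_day(3000):.1f} ns per CPU day at 3 GHz")
    print(f"{cpu_years(10_000, 3000):.2f} CPU-years for a 10 us simulation")
```

At 3 GHz this gives 7.5 ns per CPU day and about 3.65 CPU-years for 10 μs (10,000 ns).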
Other approaches, however, such as distributed computing, can deliver orders of magnitude more computing power by sharing the computational load among many computers. Computers are becoming faster all the time, and more of them are being connected to the Internet. New algorithms that reduce the necessary computation time are also being explored. Therefore, in my opinion, it should be possible to predict the three-dimensional structure of a protein by simulation within the next century.
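Under the idealized assumption that the load splits perfectly across machines, the effect of distributed computing on a 10 μs simulation can be sketched as follows. The 7.5 ns per CPU day rate is the 3 GHz figure (3000/400 × 1 ns); perfect linear scaling is an assumption, and in practice a single trajectory does not divide cleanly across loosely connected computers, so distributed projects often run many independent trajectories instead. Both helper names are hypothetical.

```python
import math


def wall_clock_days(target_ns, ns_per_cpu_day, n_cpus):
    """Days to simulate target_ns when the work is shared by n_cpus,
    assuming perfect linear scaling (an idealization)."""
    return target_ns / (ns_per_cpu_day * n_cpus)


def cpus_needed(target_ns, ns_per_cpu_day, deadline_days):
    """Smallest CPU count that meets deadline_days under the same
    perfect-scaling assumption."""
    return math.ceil(target_ns / ns_per_cpu_day / deadline_days)


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        print(f"{n:>6} CPUs: {wall_clock_days(10_000, 7.5, n):.2f} days")
    print("CPUs for 10 us in one week:", cpus_needed(10_000, 7.5, 7))
```

Under these assumptions, about 190 CPUs would finish the 10 μs run in a week, and 1,000 CPUs in under a day and a half.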
This goal is achievable. Naïve approaches to the protein folding prediction problem _do_ make it insurmountable: finding the lowest-free-energy conformation of a protein is NP-hard, which places it among the most computationally difficult problems [Unger 1993] [Cook 2006]. Full-scale molecular dynamics (MD), however, is probably not necessary for most protein folding prediction problems. Significant strides have been made in protein folding prediction by applying optimization algorithms that sample the search space to provide a sufficiently accurate answer without performing an exhaustive combinatorial search. Some use highly refined pruning to avoid searching large spaces of very unlikely conformations [Desmet 2002] [Allen 2006]. Other optimization algorithms, such as simulated annealing, use Monte Carlo methods that repeatedly sample the search space at random in order to find a "reasonably close" solution. The protein folding prediction package Rosetta uses a simulated annealing algorithm [Li 2006]. While these methods may appear to be imperfect, X-ray crystallography also has limitations, and crystal structures can contain significant inaccuracies [Hawkins 2008]. Furthermore, _in vivo_ proteins are continually subject to solvation and random thermal fluctuation: hoping to find an exact structure by experimental or computational methods is naïve simply because there is no "exact" structure to be found.
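To make the contrast between exhaustive combinatorial search and Monte Carlo sampling concrete, here is a minimal sketch on a hypothetical toy model: each of `n_sites` positions takes one of `n_states` discrete conformational states, and random self and pairwise energies stand in for a real force field. This is not Rosetta's energy function or move set, and the geometric cooling schedule from `t_start` to `t_end` is an assumption chosen for illustration. Exhaustive search costs `n_states ** n_sites` energy evaluations; simulated annealing performs a fixed number of Metropolis moves regardless of problem size but does not guarantee the global minimum.

```python
import itertools
import math
import random


def make_toy_problem(n_sites, n_states, seed=0):
    """Random self and pairwise energies for a hypothetical discrete model."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-1, 1) for _ in range(n_states)] for _ in range(n_sites)]
    pair_e = {}
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            pair_e[(i, j)] = [[rng.uniform(-1, 1) for _ in range(n_states)]
                              for _ in range(n_states)]
    return self_e, pair_e


def energy(conf, self_e, pair_e):
    """Total energy: sum of self terms plus all pairwise terms."""
    e = sum(self_e[i][s] for i, s in enumerate(conf))
    for (i, j), table in pair_e.items():
        e += table[conf[i]][conf[j]]
    return e


def exhaustive_search(self_e, pair_e):
    """Enumerate all n_states ** n_sites conformations: exact but exponential."""
    n_sites, n_states = len(self_e), len(self_e[0])
    best = min(itertools.product(range(n_states), repeat=n_sites),
               key=lambda c: energy(c, self_e, pair_e))
    return list(best), energy(best, self_e, pair_e)


def simulated_annealing(self_e, pair_e, steps=20000, t_start=2.0, t_end=0.01, seed=1):
    """Metropolis Monte Carlo with geometric cooling; returns the best state seen."""
    rng = random.Random(seed)
    n_sites, n_states = len(self_e), len(self_e[0])
    conf = [rng.randrange(n_states) for _ in range(n_sites)]
    e = energy(conf, self_e, pair_e)
    best_conf, best_e = conf[:], e
    cooling = (t_end / t_start) ** (1.0 / steps)
    t = t_start
    for _ in range(steps):
        site = rng.randrange(n_sites)
        old_state = conf[site]
        conf[site] = rng.randrange(n_states)      # random single-site move
        new_e = energy(conf, self_e, pair_e)      # full recompute, kept simple
        delta = new_e - e
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e = new_e                              # accept the move
            if e < best_e:
                best_conf, best_e = conf[:], e
        else:
            conf[site] = old_state                 # reject: undo the move
        t *= cooling
    return best_conf, best_e


if __name__ == "__main__":
    self_e, pair_e = make_toy_problem(n_sites=7, n_states=4)
    exact_conf, exact_e = exhaustive_search(self_e, pair_e)
    sa_conf, sa_e = simulated_annealing(self_e, pair_e)
    print("exhaustive:", exact_conf, round(exact_e, 3))
    print("annealing: ", sa_conf, round(sa_e, 3))
```

On a small instance like this, the annealed energy can be checked against the exact minimum; at realistic sizes only the sampling approach remains feasible.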
References:
R. Unger, J. Moult. Finding the lowest free energy conformation of a protein is an NP-hard problem. Bulletin of Mathematical Biology, 1993.
S. Cook. The P versus NP problem. 2006.
J. Desmet, J. Spriet, I. Lasters. Fast and accurate side-chain topology and energy refinement (FASTER) as a new method for protein structure optimization. Proteins: Structure, Function, and Genetics, 2002.
B. D. Allen, S. L. Mayo. Dramatic performance enhancements for the FASTER optimization algorithm. Journal of Computational Chemistry, 2006.
W. Li, T. Wang, E. Li, D. Baker, L. Jin, S. Ge, Y. Chen, Y. Zhang. Parallelization and Performance Characterization of Protein 3D Structure Prediction of Rosetta. Parallel and Distributed Processing Symposium, 2006.
P. Hawkins, G. Warren, A. Skillman, A. Nicholls. How to do an evaluation: pitfalls and traps. Journal of Computer-Aided Molecular Design Incorporating Perspectives in Drug Discovery and Design, 2008.
My answer is a qualified yes. Qualified, because I'm not sure I believe the conclusion of the Anfinsen experiment. I don't know that 8 M urea destroys all secondary-structure information in the molecule. I'd rather see the protein subjected to harsher conditions, such as very high temperatures under “single-molecule” obscuring conditions, so that I could know it had been stretched to its full length. If the slightest hint of an α-helix survives in the urea (and that hint could be a single hydrogen bond or a mere topological twist), then some information is being preserved that no computer model will ever be able to reconstruct. But this is a relatively simple answer, so let's suppose that the conclusion is correct.
First, I claim that there aren't really 10^150 conformations to test. The α-helices and β-strands have regular structures. Consider a 150-residue protein. It probably has about 5 such secondary-structure elements. For each element (α-helix or β-strand) there are 150 choices for the N-terminus, but then no more than 20 for the C-terminus, so there are about 10^15 possibilities for the locations of the α-helices and β-strands. There are 5! = 120 ways to arrange them, and there may be 10^5 ways to position them in space. Given the positions of the α-helices and β-strands, there is little leeway left for the intermediate loops: maybe 5 options each, for a total of 5^6 ≈ 10^4. As we learn more about protein structure, we will know about more structural features than just α-helices and β-strands. Even now, I think there are only about 10^26 possible structures (10^15 × 10^2 × 10^5 × 10^4), still within the reach of future computers running for several years (10^15 Hz × 1 yr × 1000 parallel processors ≈ 3 × 10^25 operations).
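The estimate can be checked by multiplying its factors. In the sketch below, the placement count of 10^15 is taken from the text; `placement_count` is an added, hypothetical derivation showing one way to arrive at roughly that figure (five elements whose N-termini occupy distinct positions among 150 residues, each with up to 20 C-terminus choices). The operation budget assumes a 365.25-day year and, like the text, one operation per candidate structure, which is optimistic because scoring a structure takes many operations.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def placement_count(n_residues=150, n_elements=5, max_length=20):
    """Hypothetical derivation of the ~10^15 placements: distinct N-terminus
    positions, plus up to max_length C-terminus choices per element."""
    return math.comb(n_residues, n_elements) * max_length ** n_elements


def conformation_count(placements=1e15, n_elements=5, spatial_positions=1e5,
                       loop_options=5, n_loops=6):
    """Product of the factors in the estimate:
    placements x orderings x spatial positions x loop choices."""
    return (placements * math.factorial(n_elements)
            * spatial_positions * loop_options ** n_loops)


def operation_budget(clock_hz=1e15, years=1.0, processors=1000):
    """Operations available from `processors` CPUs at clock_hz for `years`."""
    return clock_hz * years * SECONDS_PER_YEAR * processors


if __name__ == "__main__":
    print(f"placements (combinatorial): {placement_count():.2e}")
    total = conformation_count()
    budget = operation_budget()
    print(f"structures: {total:.2e}, operations per year: {budget:.2e}")
    print(f"years at one operation per structure: {total / budget:.1f}")
```

Multiplying the stated factors gives about 1.9 × 10^26 structures against about 3.2 × 10^25 operations per year, i.e. roughly six years at one operation per structure.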