Wednesday, June 22, 2011

The 4-benzyltoluene melting point twist

Evan Curtin and I were in the lab this morning to follow up on our effort to curate the melting point of 4-benzyltoluene. I identified the next step to confirm an upper limit of -15 C:
With the information available thus far from our experiments (UC-EXP266), we think it is unlikely that the +4.6 C value can be correct because we observed no solidification after 2 days at -15 C. The patent reports that solidification of some viscous mixtures took up to a full week but we did not observe an appreciable increase in viscosity for 4-benzyltoluene at -15 C. But in order to be sure we will first freeze the sample again below -40 C and let it warm up to -15 C in the freezer and confirm that it melts completely.
But when we took the sample out of the freezer after 16 days it was completely frozen!


This now effectively ruled out the -30 C value and re-opened the possibility that the +4.6 C value could be the best estimate. Learning from our previous failed attempt to observe a temperature plateau when heating the sample, this time we let it warm as slowly as possible by leaving it in an ice water bath inside a Styrofoam container. This worked much better, as the sample warmed a few degrees over several hours. This time Evan observed a clear transition from the solid to the liquid phase in the 4-6 C range (UC-EXP266).

The curation record for the melting point of 4-benzyltoluene now looks like this:

When I introduce the concept of Open Notebook Science in my talks I usually make the point that there are no facts - just measurements embedded within assumptions.

The 4-benzyltoluene melting point story is a really good example of this principle. When I stated that I thought that "it is unlikely that the +4.6 C value can be correct because we observed no solidification after 2 days at -15 C", it was not the measurement that was in error - it was the interpretation. And when new information came to light, an experiment was proposed to either challenge or further support that interpretation. There were never any "facts" in this story (nor is the +4.6 C value a "fact" from these results).

I think that this is how science functions best and most efficiently. Unfortunately we don't usually have access to all pertinent raw measurements, assumptions and interpretations. I would be extremely interested in seeing how the -30 C value was determined. This is actually the value provided by the company that sold us this batch of material (as well as the PhysProp entry in the image above). Because of slow crystallization, I can see how this could happen if the temperature was dropped until solidification was observed. In our observations, the -30 C to -35 C range is roughly where we observed rapid solidification upon cooling. (UC-EXP266)

Saturday, June 18, 2011

Google Apps Scripts for an intuitive interface to organic chemistry Open Notebooks

Rich Apodaca recently demonstrated how Google Apps Scripts can be added to Google Spreadsheets to enable simple calling of web services for chemistry applications (gChem). Although we have been using web service calls from within a Google spreadsheet for some time (solubility calculation by NMR link #3 and misc chem conversions link #1), the process wasn't as intuitive as it could be because one had to find and then paste lengthy URLs.

Rich's approach enables simply clicking the desired web service from a menu on Google Spreadsheets and these functions have simple names like getSMILES. Andrew Lang has now added several web services from our ONS projects and the CDK. There are now 3 menus to choose from: gChem, gCDK and gONS.


To demonstrate the power of these tools consider the rapid construction of a customized interface to an experiment in a lab notebook (in this example UC-EXP263).

1) Because Andy has added a gONS service to render images of molecules from ChemSpider, consistent reaction schemes can now be constructed from this template by simply typing the name of the reactants and products then embedding in the wiki.



2) Planning of the reaction to calculate reactant amounts and product yield can then be processed by simply typing the name of the chemicals. Services calling molecular weight and density are automatic based on the chemical name as input.


3) Typing the name of the solvent then allows easy access to the solubility properties of the reaction components. The calculated concentrations of the reactants and product can be directly compared with their measured maximum solubility. In this experiment the observed separation of the product from the solution is consistent with these measurements.

4) Both experimental and predicted melting points (using Model002) can then be lined up for comparison. A large discrepancy between the two would flag a possible error - in this case good agreement is found. Noting that the product's melting point is near room temperature (53 C) explains why two layers were observed to form during the course of the reaction and why cooling to 0 C induced the product to precipitate. Links to the melting measurements are also provided in column N for easy exploration.

5) Column O provides a quick link to the ChemSpider entries for all compounds and column P provides links to the Reaction Attempts Explorer where, for example, one can explore other reactions where the product was involved. Finally columns Q and R provide one click access to an interactive NMR spectrum of the product, powered by ChemDoodle.

The last few columns still use our older code to call web services but over time these should be added to the gONS collection for convenience.

The easiest way to experiment with this interface is probably to just make a copy (File -> Make a Copy from the Google Spreadsheet menu). The sheet can then be customized for other applications.
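For readers who prefer to see the arithmetic behind steps 2 to 4 outside of a spreadsheet, here is a minimal Python sketch. It is not the code behind gChem, gCDK or gONS: the function names, the tolerance and every number below are hypothetical placeholders, and in the spreadsheet the molecular weights, densities and solubilities come from web service lookups by chemical name.

```python
# Illustrative sketch only. Function names, the default tolerance and all
# numbers are hypothetical; none are taken from UC-EXP263 or the gONS scripts.

def moles_from_volume(volume_ml, density_g_per_ml, mol_weight):
    """Moles of a liquid reagent from its volume, density and molecular weight."""
    return volume_ml * density_g_per_ml / mol_weight

def percent_yield(product_mass_g, product_mol_weight, limiting_moles):
    """Isolated yield as a percentage of the theoretical amount."""
    return 100.0 * (product_mass_g / product_mol_weight) / limiting_moles

def concentration_molar(moles, solution_volume_ml):
    """Concentration in mol/L for a given solution volume in mL."""
    return moles / (solution_volume_ml / 1000.0)

def exceeds_solubility(moles, solution_volume_ml, max_solubility_molar):
    """True if the calculated concentration is above the measured maximum solubility."""
    return concentration_molar(moles, solution_volume_ml) > max_solubility_molar

def melting_points_disagree(experimental_c, predicted_c, tolerance_c=20.0):
    """True if measured and predicted melting points differ by more than the tolerance."""
    return abs(experimental_c - predicted_c) > tolerance_c

# Placeholder inputs for a made-up reaction.
reagent_moles = moles_from_volume(volume_ml=1.0, density_g_per_ml=1.05, mol_weight=106.0)
print("reagent (mol):", reagent_moles)
print("yield (%):", percent_yield(0.9, 180.0, reagent_moles))
print("product above max solubility:", exceeds_solubility(0.005, 10.0, 0.2))
print("melting point flag:", melting_points_disagree(50.0, 55.0))
```

A product whose calculated concentration exceeds its measured maximum solubility would be expected to separate from solution, which is the kind of comparison described in step 3.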


Thursday, June 16, 2011

Live Tweeting Haumea: the Open Science Ratchet at work?

Eugenie Samuel Reich just announced on the Nature NewsBlog that astronomer Mike Brown live-tweeted his observations of a transit of dwarf planet Haumea by its moon, Namaka.

About a year ago, I wrote about Mike Brown and the controversy about the discovery of Haumea stemming from a competitor's more aggressive data dissemination practice. In that post I speculated that we could expect accelerated data sharing over time due to the Open Science Ratchet, where the actions of scientists that are most open set the pace for everyone else working on that particular project, regardless of their views on how secretive science should be.

I don't know if Mike Brown has changed his views on data sharing - or if he has always felt this way but thought it was too risky until now. Either way, he certainly is taking the lead at this point to demonstrate how radical openness can be done in astronomy!



My talk at SLA on Trust in Science and Open Melting Point Collections

On June 14 and 15, 2011 I attended the Special Libraries Association conference and made presentations on two panels on the role of trust in science with a case-study of the Open Melting Point collections that Andrew Lang, Antony Williams and I have been assembling and curating.

The first panel was on the "International Year of Chemistry: Perils and Promises of Modern Communication in the Sciences". My colleague Lawrence Souder from the Department of Culture and Communications at Drexel presented on "Trust in Science and Science by Blogging", using as an example the NASA press release on arsenic replacing phosphorus in bacteria and subsequent controversy taking place in the blogosphere. (see post in Scientific American blog today)

Watch Lawrence Souder's presentation screencast and slides.

The second panel was on "New Forms of Scholarly Communications in the Sciences". Don Hagen from the National Technical Information Service presented on "NTIS Focus on Science and Data: Open and Sustainable Models for Science Information Discovery" and Dorothea Salo discussed the evolving role of libraries and institutional repositories on scholarly communication and archiving.

Watch Don Hagen's presentation screencast and slides.

My own slides and screencast from the second panel are available below:




Saturday, June 11, 2011

More on 4-benzyltoluene and the impact of melting point data curation and transparency

There are many motivations for performing scientific research. One of these is the desire to advance public scientific knowledge.

This is a difficult concept to quantify or even qualitatively assess. One can try to use literature citations and impact factors but that captures only a small fraction of the true scientific impact. For example, one formal citation of our solubility dataset doesn't represent the 100,000 anonymous solubility queries made directly to our database. And of these the actual impact will depend on exactly how the information was used. Egon Willighagen has identified this as a problem for the Chemistry Development Kit (CDK) as well: many more people use the CDK than reflected simply by the number of citations to the original paper.

There are a few of us who believe that curating chemistry data is a high impact activity. Antony Williams spends a considerable amount of time on this activity and frequently uncovers very serious errors from a number of data sources. Andrew Lang and I have put in a similar effort in collecting and curating solubility measurements openly - and recently (with Antony) we have been doing the same for melting points.

Although attempting to estimate the total impact of the curation activity isn't really practical, we can look at a specific and representative example to capture the scope.

I recently exposed the situation with the melting point measurements of 4-benzyltoluene. In brief, the literature provided contradictory information that could not be resolved without performing an experiment. Although an exact measurement was not found, a limit was determined that ruled out all measurements except for one.

Ironically it turns out that the melting point of this compound is its most important property for industrial use! Derivatives of diphenylmethane were sought out to replace PCBs as electrical insulating oils for capacitors because of toxicity concerns. As described in this patent (US5134761), for this application one requires the oil to remain liquid down to -50 C. Another key requirement is the ability to absorb hydrogen gas liberated at the electrode surface (a solubility property). Since this is optimal for smaller alkyl groups on the rings, it places benzyltoluene isomers at the focal point of research for this application.

The patent states: "According to references, the melting points of the position isomers of benzyltoluenes are as follows..." but does not make a specific reference. However, by comparing the numbers with other sources we can presume that the reference is the Lamneck1954 paper I discussed previously.

The patent then uses these melting points to calculate the melting behavior of mixtures of these isomers, as obtained without further purification from a Friedel-Crafts reaction.


If our results are correct and the melting point of 4-benzyltoluene is not +4.6 C but well below -15 C, then the calculated properties in the patent may be significantly in error as well. With the information available thus far from our experiments (UC-EXP266), we think it is unlikely that the +4.6 C value can be correct because we observed no solidification after 2 days at -15 C. The patent reports that solidification of some viscous mixtures took up to a full week but we did not observe an appreciable increase in viscosity for 4-benzyltoluene at -15 C. But in order to be sure we will first freeze the sample again below -40 C and let it warm up to -15 C in the freezer and confirm that it melts completely.


It is in light of this analysis that I make the case that open curation of melting point data is likely to be a high impact activity relative to the amount of time required to perform it. The problem is that errors such as these cascade through the scientific record and likely retard scientific progress by causing confusion and wasted effort. Consider the total cost in terms of research and legal fees for just one patent. As I discussed previously, consider the effect of compromised and contradictory data now known to exist within training sets on the pace of developing reliable melting point models (cascading down to solubility models dependent upon melting point predictions or measurements - and ultimately cascading to the efficiency of drug design).

It is important to note that the benefits of curation would be greatly diminished without the component of transparency. We are not claiming to provide a "trusted source" of melting point data. There is no such thing - and operating under the illusion of the trusted source model has resulted in the mess we are in now - with multiple melting point values for the same compound cascading and multiplying to different databases (a good and still unresolved example is benzylamine).

What we are doing is reporting all the sources we can use and marking some sources as DONOTUSE, with an explanation, so they are not included in the calculation of the average. We never delete data, so users can make informed choices and are not put in the position of having to trust our judgement. If someone does not agree with me and thinks that failure to freeze after 2 days at -15 C does not necessarily rule out the +4.6 C value for the melting point of 4-benzyltoluene, then they are free to use it.
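To make the DONOTUSE idea concrete, here is a minimal Python sketch of a curation record. The temperatures are the ones mentioned in these posts, but the field names, the placeholder source labels and the wording of the reasons are my assumptions for illustration and not the actual structure of our collection.

```python
from statistics import mean

# Values are those discussed in the posts; "source A/B/C" are placeholder
# labels and the record layout is hypothetical.
records = [
    {"source": "source A", "mp_c": -30.0, "donotuse": False, "reason": ""},
    {"source": "Lamneck1954", "mp_c": 4.58, "donotuse": True,
     "reason": "no solidification after 2 days at -15 C"},
    {"source": "source B", "mp_c": 98.5, "donotuse": True,
     "reason": "sample arrived as a liquid"},
    {"source": "source C", "mp_c": 125.0, "donotuse": True,
     "reason": "sample arrived as a liquid"},
]

def curated_average(records):
    """Average of the values not marked DONOTUSE; None if nothing is left."""
    used = [r["mp_c"] for r in records if not r["donotuse"]]
    return mean(used) if used else None

def excluded_with_reasons(records):
    """Excluded values stay in the record, each with its explanation."""
    return [(r["source"], r["mp_c"], r["reason"]) for r in records if r["donotuse"]]

print("curated average (C):", curated_average(records))
for source, mp_c, reason in excluded_with_reasons(records):
    print("DONOTUSE:", source, mp_c, "-", reason)
```

The point of the design is that a flag is reversible while a deletion is not: anyone who disagrees with a reason can flip the flag and recompute the average from the same record.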

Using a trusted source model, all values within a collection are equally valid. In the transparency model not all values are equal - we are justifiably more confident in a melting point value near -114 C for ethanol than for a melting point with a single source (like this compound).

And finally, an important factor for having an impact on science is discoverability. It is likely that someone doing research involving the melting behavior of 4-benzyltoluene would perform at least a quick Google search. What they are likely to find is not just a simple number without provenance but rather a collection of results capturing the full subtlety of the situation under discussion. This is a natural outcome of working transparently.


Friday, June 10, 2011

Open Melting Points on iPhone via MMDS

As Alex Clark explained on his blog Cheminformatics 2.0, both predicted and experimental melting points from our Open Data collection are now available on iPhones via his MMDS webservices protocol.


Although the app is not free, the web service (#7 from our collection) that Andrew Lang and Alex created for this purpose is Open and available for anyone to use. It reads an XML formatted molfile and returns the average measured melting point, predicted melting point, SMILES, CSID and a link to the ChemSpider entry.

Thursday, June 09, 2011

The quest to determine the melting point of 4-benzyltoluene

I recently reported that we are attempting to curate the open melting point measurements collected from multiple sources such as Alfa Aesar, PhysProp (EPIsuite) and several smaller collections. I mentioned that some values - like benzylamine - simply don't converge and the only way to resolve the issue is to actually get a high purity sample and do a measurement.

Since that report, we found another non-converging situation with 4-benzyltoluene. As shown below, reported measurements range from -30 C to 125C.

The values in red have been removed from the calculation of the average based on evidence we obtained from ordering the compound from TransWorld Chemicals and observing its behavior when exposed to various temperatures. The details can be found from UC-EXP266 (which I performed with Evan Curtin).

Immediately after opening the package it was clear that the compound was a liquid and thus the 125C and 98.5C values became improbable enough to remove.


First Evan Curtin and I dropped the still sealed bottle into an ice bath (0C) and after 10 minutes there was no trace of solidification.


At this point, this does not necessarily rule out the values near 5C because of the short time in the bath.

We then used an acetone/dry ice bath and did see a rapid and clear solidification after reaching -30C to -35C.



Letting the bath temperature rise it was difficult to tell what was happening but there seemed to be some liquefaction around -12C.

In order to get a more precise measurement, we transferred about 2 mL of the sample into a test tube and placed the thermometer directly in contact with the substance. After quickly freezing the contents in a dry ice/acetone bath, the sample was removed and its behavior was observed over time, as shown below.


I was expecting to see the internal temperature rise then plateau at the melting point until all the solid disappeared and then finally observe a second temperature rise. This comes from experience in making 0C baths within minutes by simply throwing ice into pure water.

As shown above that is not at all what happened. The liquid formed gradually starting at about -9C and never reached a plateau even up to +7C, where there was still much solid left.
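The rise, plateau, rise pattern I was expecting can be written down as a simple search for a flat stretch in a series of time and temperature readings. The Python sketch below is only an illustration of that idea: the function name, the slope threshold and both data series are made up for the example and are not our measurements from UC-EXP266.

```python
def find_plateau(times_min, temps_c, max_slope_c_per_min=0.1, min_points=3):
    """Return (start, end) indices of the first flat run of readings, or None.

    A run is "flat" while the absolute slope between consecutive readings
    stays at or below max_slope_c_per_min, and it must span min_points readings.
    """
    def flat(i):
        # Slope between reading i-1 and reading i.
        dt = times_min[i] - times_min[i - 1]
        return abs(temps_c[i] - temps_c[i - 1]) / dt <= max_slope_c_per_min

    run_start = None
    for i in range(1, len(temps_c)):
        if not flat(i):
            run_start = None
            continue
        if run_start is None:
            run_start = i - 1
        if i - run_start + 1 >= min_points:
            end = i
            # Extend to the last reading that is still part of the flat run.
            while end + 1 < len(temps_c) and flat(end + 1):
                end += 1
            return run_start, end
    return None

# Synthetic readings, one per minute (not experimental data).
times = list(range(8))
ideal = [-20, -10, 0, 0, 0, 0, 5, 10]       # textbook case with a plateau
gradual = [-20, -15, -10, -5, 0, 5, 7, 9]   # steady warming, no plateau

print("ideal:", find_plateau(times, ideal))
print("gradual:", find_plateau(times, gradual))
```

With fast heating a real sample can look like the second series even when a melting transition is under way, which is why the heating rate matters so much for this kind of measurement.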

If we look at the method used to generate the 4.58 C value (Lamneck1954) we find that a similar method was cited - but not actually described there. The actual curves are not available either. However, this paper provides melting points for several compounds within a series, which is often useful for spotting possible errors - unless of course these are systematic errors. In this particular case it doesn't help much because the 2-methyl derivative is similar but the 3-methyl analogue is very close to the -30 C value listed in our sources.

Notice that one of the "melting points" (3-methyldicyclohexylmethane) is not even measurable because it forms a glass. It is easy to see how melting points below room temperature can generate very different values - and very difficult to assess if the full experimental details of the measurements are not reported.

Trying to get at more details, let's look at the referenced paper (Goodman1950). Indeed, the researchers determine the melting point by plotting the temperature over time as the sample is heated and looking for a plateau. The obvious difference is that the heating rate is about an order of magnitude slower than in our experiment.
This paper also highlights the fact that there are more twists and turns in the melting point story. One compound (2-butylbiphenyl) was found to have 2 melting points that can be observed by seeding with different polymorphic crystals.


At this point, our objective of obtaining an actual melting point was replaced with trying to at least mark a reasonably confident upper limit. After leaving the sample at -15 C in a freezer for two days, no solidification was observed - not even an appreciable increase in viscosity. For this reason, all melting point values above -15C were removed from the calculation of the average and show up in red.

With only the -30 C measurement left, this is now the default value for 4-benzyltoluene - until further experimentation.


Creative Commons Attribution Share-Alike 2.5 License