Question: Discrepancy Between Intersect And Join
5.8 years ago, Aaron Quinlan wrote:
Dear list,

I have a student who found an unexplained discrepancy between the results produced by the "Operate on Genomic Intervals" (OGI) intersect operation and the OGI join operation. In particular, we know for certain that there are exactly 1105 intersections of at least 1 bp between the two files we are testing, as we have confirmed this with our own bedtools and the UCSC Table Browser.

An example intersection (intersecting positions: 10012008 - 10012013):

file 1: chr1 10012008 10012021 5.6186
file 2: chr1 10011813 10012013 5_Strong_Enhancer 0 + 10011813 10012013 250,202,0

However, OGI intersect finds 0 intersections between the files (settings: return overlapping intervals, >= 1 bp). In an effort to make sure we didn't goof up on file formats (BED) or genome builds (hg19), we tested the exact same two files with the OGI join operation and found 1105 intersections, as expected. I also tested the files with the bx-python bed_intersect.py and bed_intersect_basewise.py scripts and got the expected results.

Does anyone have a suggestion for how to resolve this? Thanks for your help and for providing such a fantastic resource to the genomics community.

Best,
- Aaron
quinlanlab.org
bedtools • 1.0k views
modified 5.8 years ago by Daniel Blankenberg • written 5.8 years ago by Aaron Quinlan
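The example pair quoted in the question can be checked by hand. The following is a minimal sketch, assuming the usual BED convention of 0-based, half-open intervals; the function name `overlap_bp` is made up for illustration and is not part of Galaxy, bedtools, or bx-python.

```python
def overlap_bp(start_a, end_a, start_b, end_b):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


# The two records quoted in the question (both on chr1).
file1 = ("chr1", 10012008, 10012021)
file2 = ("chr1", 10011813, 10012013)

shared = overlap_bp(file1[1], file1[2], file2[1], file2[2])
print(shared)  # 5 (positions 10012008 - 10012013)
```

Any tool asked for overlaps of >= 1 bp would be expected to report this pair, since the two records share 5 bp.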
5.8 years ago, Aaron Quinlan wrote:
Hi all, I think I must be doing something incredibly wrong, because it seems that the OGI subtract operation has the mirror-image problem to intersect. That is, instead of saying that my file 1 has (N - 1105) intervals that do not overlap file 2, it says that N intervals do not overlap. Do subtract and intersect use the same underlying intersection code? Best, - Aaron quinlanlab.org
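The complementary relationship described here can be sketched on toy data. This is only an illustration: the intervals are invented, the helper names are hypothetical, and it says nothing about how the Galaxy tools are implemented. It also shows a caveat: the number of overlapping pairs can exceed the number of file 1 intervals that have an overlap, so "N - 1105" is exact only if each of the 1105 intersections involves a distinct file 1 interval.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_overlap(file1, file2):
    """Partition file1 into intervals with and without an overlap in file2."""
    hit, miss = [], []
    for a in file1:
        (hit if any(overlaps(a, b) for b in file2) else miss).append(a)
    return hit, miss


# Invented toy intervals, not taken from the files in this thread.
file1 = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
file2 = [("chr1", 150, 160), ("chr1", 190, 250)]

hit, miss = split_by_overlap(file1, file2)
pairs = sum(overlaps(a, b) for a in file1 for b in file2)

# Every file1 interval lands in exactly one of the two groups.
assert len(hit) + len(miss) == len(file1)
print(len(hit), len(miss), pairs)
```

In this toy case one file 1 interval overlaps two file 2 intervals, so there are two overlapping pairs but only one file 1 interval with an overlap.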
5.8 years ago, Daniel Blankenberg (United States) wrote:
Hi Aaron, I just tested this small example and it reported one region as the result of the intersect: https://main.g2.bx.psu.edu/u/dan/h/aaron-quinlan-intersect-test-02-08-2013 Do you have a history available that you can share (privately if you desire) where you see the issue? If so, we'll take a look. Thanks for using Galaxy, Dan
Hi Dan, Thanks for the follow-up. Yes, I was also able to get it to work when I created a test case using the example I originally sent. Yet when I run the entire files, I get zero intersections. Join, in contrast, works fine. I'd be happy to share the files. Would it be best to send them directly to you by email? They are small. Thanks much for the help, - Aaron quinlanlab.org
written 5.8 years ago by Aaron Quinlan
Hi Aaron, If you did this on our public main server, you can use the share options (gear icon in history list --> Share or publish); this will let us investigate a bit deeper into the exact problem/situation. If you were using a local instance (or the public server), then emailing them directly to me will work just fine. Thanks for using Galaxy, Dan
written 5.8 years ago by Daniel Blankenberg
Hi Dan, Problem solved. My file2 is a pseudo-BED format from the ENCODE project. When I uploaded it, I explicitly set the type to "bed". When I do this, intersect breaks. The correct chrom, start, and end columns are selected, but it sets strand and name to the 6th and 5th columns, respectively. In my case, the strand is always ".". If I simply use the "Auto-detect" feature when I upload this same file, it works just fine --- name and strand are not set, just chrom, start, end. I suspect this is a rookie mistake on my part, and I apologize for clogging the airwaves. Thanks for your help. - Aaron quinlanlab.org
written 5.8 years ago by Aaron Quinlan
Hi Dan, Problem solved. My file2 is a pseudo-BED format from the ENCODE project. When I uploaded it, I explicitly set the type to "bed". When I do this, intersect breaks. The correct chrom, start, and end columns are selected, but it sets strand and name to the 6th and 5th columns, respectively. Yet in my case, the 5th and 6th columns do not reflect a strand or name --- I suspect the incorrect setting of the strand is the issue. If I simply use the "Auto-detect" feature when I upload this same file, it works just fine --- name and strand are not set, just chrom, start, end. I should have realized this sooner. It is strange, however, that intersect is affected by this, yet join is not. Thanks for your help! - Aaron quinlanlab.org
written 5.8 years ago by Aaron Quinlan
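Before declaring a file as "bed" and assigning name and strand columns, it can help to look at what those columns actually contain. The sketch below tallies the distinct values in a chosen column; the function name `column_values` and the `<missing>` placeholder are hypothetical, the sample lines are the two records quoted in the question, and nothing here reproduces Galaxy's own metadata detection.

```python
from collections import Counter


def column_values(lines, col):
    """Tally the values found in 1-based column `col` of BED-like lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        counts[fields[col - 1] if len(fields) >= col else "<missing>"] += 1
    return counts


# The two example records quoted in the question.
sample = [
    "chr1 10011813 10012013 5_Strong_Enhancer 0 + 10011813 10012013 250,202,0",
    "chr1 10012008 10012021 5.6186",
]

print(column_values(sample, 6))
```

If the tally for the would-be strand column shows values other than "+", "-", or ".", or shows only ".", that is a hint the column should probably not be treated as a meaningful strand.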