(Planned) We can also feed this inferred set of novel splice junctions to pseudo-alignment quantifiers such as kallisto for more accurate quantification.
Problem:
We cannot obtain meaningful statistics from raw count data - they are affected by non-biological artifacts such as transcript length (longer transcripts yield more reads than shorter ones even at equal abundance, since reads are cut to a fixed length) and the total number of reads in a sample.
A solution:
These are by no means trivial problems to address, and there is still no consensus on normalization methodology. However, in terms of correction for length, the argument for TPM is highly convincing. (Refer to: https://www.youtube.com/watch?v=5NiFibnbE8o, around 30:30. Also refer to: http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/). This correction also addresses the problem of differing total read counts across samples, since TPM is a measure of proportions.
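The TPM idea above can be sketched in a few lines: first divide counts by length, then rescale to proportions per million. The counts and lengths here are made up for illustration.

```python
def tpm(counts, lengths_kb):
    """Compute TPM from raw counts and transcript lengths (kilobases)."""
    # 1) Correct for transcript length: reads per kilobase (RPK).
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    # 2) Rescale to proportions "per million", so values sum to 1e6
    #    within each sample regardless of sequencing depth.
    scale = sum(rpk)
    return [r / scale * 1e6 for r in rpk]

counts = [100, 200, 50]        # hypothetical raw read counts
lengths_kb = [2.0, 4.0, 1.0]   # hypothetical transcript lengths
print(tpm(counts, lengths_kb)) # all equal here: counts scale with length
```

Note that because of step 2, TPMs always sum to one million within a sample, which is exactly why they behave as proportions.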
Note that there are also count-based methodologies (based on Negative Binomial, or NB2, models, for example), but I do not cover them here.
Also note that there is still a subtle problem in between-sample normalization: for example, if a single gene is very highly expressed in a treatment sample, not only will that gene's proportion increase, the proportions of all other genes will decrease, perhaps even significantly. Although I think this is not much of a problem for the whole expression profile, it may play an important role when we build models for transcript-specific abundances later - I'll cover this in more detail in the modeling sections below.
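A tiny demonstration of this compositional effect, with made-up numbers: one strongly induced gene depresses the TPM of every other gene even though their raw counts are unchanged.

```python
def tpm(counts, lengths_kb):
    """TPM: length-corrected counts rescaled to sum to one million."""
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    s = sum(rpk)
    return [r / s * 1e6 for r in rpk]

lengths = [1.0, 1.0, 1.0]       # equal lengths to isolate the effect
control = [100, 100, 100]       # all genes equally expressed
treatment = [1000, 100, 100]    # gene 1 strongly induced

print(tpm(control, lengths))    # each gene at ~333333 TPM
print(tpm(treatment, lengths))  # genes 2 and 3 fall to ~83333 TPM,
                                # despite identical raw counts
```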

Possible paper topic: How to deal with the fact that in principle we cannot compare RPKM and TPM values directly across samples? (Maybe select a set of random or pre-selected genes across samples that are assumed to be stable across samples and conditions to normalize upon, such as 'housekeeping genes'?)
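One naive version of the housekeeping-gene idea above could look like the following sketch. Everything here is hypothetical: the gene names, the choice of reference set, and the function itself are illustrations of the idea, not an established method.

```python
def normalize_to_reference(sample_tpm, reference_genes, target=1000.0):
    """Rescale one sample so the mean expression of an assumed-stable
    reference gene set equals a fixed target, making the remaining
    genes more comparable across samples."""
    ref_mean = sum(sample_tpm[g] for g in reference_genes) / len(reference_genes)
    factor = target / ref_mean
    return {g: v * factor for g, v in sample_tpm.items()}

# Hypothetical sample with two housekeeping genes and one gene of interest.
sample = {"GAPDH": 500.0, "ACTB": 1500.0, "geneX": 50.0}
norm = normalize_to_reference(sample, ["GAPDH", "ACTB"], target=1000.0)
# After rescaling, the reference set's mean is exactly the target.
```

The hard part, of course, is justifying that the reference set really is stable across conditions; this sketch just shows the mechanics.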

Problem:
Many reads are ambiguous - due to sequencing errors, rare variants in the sample, sequence similarity, etc.
A solution:
Many aligners have basic error correction schemes, such as tolerance level for the number of nucleotide errors when performing alignment. However, it is a well-known and nontrivial problem that there are biases in sequencing errors that can also be platform-specific, and that rare variants in the sample affect the alignment and quantification process. I will not go into the details of these research topics at the moment.
As for multi-mapping reads (reads that map to multiple locations in the genome due to sequence similarity), it is a bit embarrassing to admit that we don't have a good answer to this, especially in the case of trans-eQTLs. Both RSEM and pseudo-aligners such as kallisto implement an EM algorithm over transcript abundances to account for multi-mapping reads probabilistically. However, although we haven't covered this concept yet, cis- and trans-eQTLs may require different considerations. Since we perform a whole-genome scan for significant associations in the trans-eQTL case, reads that map to genes on different chromosomes may induce false strong associations with trans- loci. Right now, we basically throw out multi-mapping reads in the trans-eQTL case, which introduces a negative bias against genes with sequence similarity, although for cis-eQTLs we don't need to be as careful.

Possible paper topic: How to deal with these multi-mapping reads in a more informed fashion? In principle, if a region shared by gene A and gene B is multi-mapping, and gene A has higher expression overall, we will see a deviation in expression levels in the overlapping region (lower for gene A, higher for gene B). We can think about a good implementation of this.
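As a starting point, the idea can be sketched as a one-step "rescue" heuristic with hypothetical data: distribute each multi-mapping read among its candidate genes in proportion to those genes' unique-read counts. (RSEM and kallisto effectively iterate this kind of reassignment to convergence with EM; this sketch stops after one pass.)

```python
def rescue_multimapped(unique_counts, multireads):
    """Assign multi-mapping reads fractionally, weighted by each
    candidate gene's unique-read evidence."""
    est = dict(unique_counts)  # start from unique-read counts
    for genes in multireads:   # each read maps to several genes
        total = sum(unique_counts[g] for g in genes)
        for g in genes:
            # fall back to an even split when no unique evidence exists
            w = unique_counts[g] / total if total > 0 else 1.0 / len(genes)
            est[g] += w
    return est

unique = {"A": 90, "B": 10}
multi = [["A", "B"]] * 10  # 10 reads ambiguous between genes A and B
print(rescue_multimapped(unique, multi))  # {'A': 99.0, 'B': 11.0}
```

Compared to discarding multi-mapping reads outright, this at least avoids the systematic negative bias against genes with sequence similarity, at the cost of propagating any bias already present in the unique counts.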

Problem:
These alignment/quantification steps are very slow and memory-intensive.
A solution:
Refer below to pseudo-alignment methods.
- An example comparison of Bowtie2, STAR, and many more:
http://genomespot.blogspot.com/2014/11/dna-aligner-accuracy-bwa-bowtie-soap.html
It seems that STAR is currently the best-performing option for the splice-aware, SNP-tolerant alignment scenario.
Alternatively, we can use:
3. Quantify using pseudo-alignment (Kallisto, Sailfish, Salmon, etc.)
These methods are much faster, and there is evidence that they perform just as well, given good-quality reads and a good transcriptome annotation: http://www.nature.com/nbt/journal/v34/n5/full/nbt.3519.html (Kallisto paper)

Kallisto

Kallisto uses pseudo-alignment based on a transcriptome colored De Bruijn graph (T-DBG) to assign reads to equivalence classes, and then runs an EM algorithm to optimize the following likelihood function: