The OpenGL coordinate system causes a lot of confusion because it uses a "bottom-left" origin, which is seen as "upside-down" relative to the coordinate systems typically used by window systems. This leads people to add lots of extra code to "flip images over" to make them right. That code often sits in the wrong part of a pipeline and requires further reflections elsewhere to compensate. It also leads to misguided feature requests for various APIs.
The way to stop your brain hurting is to stop labelling things as "up" or "down". Up and down are functions of an eyeball, so they can only be applied directly to things that you see on a screen. Windows on a screen are visible, and this is the one place where OpenGL really is a bottom-up coordinate system. Textures, however, are not directly visible on the screen, either for normal sampling or for render-to-texture, so the use of up and down in the specification is purely a convenience for describing the behaviour. When uploading a texture, the first texel that is provided has texel coordinates (0, 0), and when this texture is treated as a render target, that same texel has window coordinates (0, 0). This is true in both OpenGL and Direct3D, so there's no need to make any changes.
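Because window coordinates are the one place where the bottom-left origin is real, the most common legitimate conversion happens when positions arrive from a window system that counts rows from the top. The sketch below assumes integer pixel indices, a window system with a top-left origin (common, but check your toolkit), and no scaling between window-system units and framebuffer pixels. `window_y_to_gl_y` is a hypothetical helper, not part of any API.

```python
def window_y_to_gl_y(window_y: int, window_height: int) -> int:
    """Convert a pixel row index from a top-left-origin window system
    into OpenGL window coordinates, whose origin is the bottom-left pixel.

    Assumes integer pixel indices and no DPI scaling between the window
    system's units and framebuffer pixels."""
    # The top row (0) becomes the highest GL row, and vice versa.
    return window_height - 1 - window_y


# Example: a 480-pixel-high window.
print(window_y_to_gl_y(0, 480))    # 479
print(window_y_to_gl_y(479, 480))  # 0
```

The converted value is what you would pass as the `y` argument of `glReadPixels` to read back the pixel under the cursor. Only the coordinate is converted; no image data is touched.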
Things get more complicated at interface boundaries with other formats that do specify an up and a down. For example, PNG files do not have a standard coordinate system, but they do have a well-defined top and bottom. So if a PNG file is used as a texture, where should the coordinate system origin be placed? The answer is not obvious, and it needs to be consistent across an entire toolchain. For example, if your 3D modelling package shows you previews in which texture coordinates (0, 0) map to the bottom-left of your image, then you should probably do the same thing in an application that consumes those models. Note that not all image file formats work the same way: formats created specifically for textures (e.g. KTX) may specify an origin rather than an up/down orientation (KTX also has an optional hint that tells viewer/editor applications how to display the image). The asset format may also provide a convention; for example, COLLADA explicitly states that (0, 0) corresponds to the lower-left corner of a texture image.
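To illustrate conforming to an interface rather than "flipping" an image, the sketch below makes two assumptions. First, the toolchain follows a convention like COLLADA's, in which texel (0, 0) is the lower-left corner of the image. Second, the PNG decoder returns tightly packed rows in file order, which for PNG is top to bottom. Because the first row passed to `glTexImage2D` receives t = 0, the rows are emitted starting with the bottom row. `png_rows_to_gl_rows` is a hypothetical helper.

```python
def png_rows_to_gl_rows(pixels: bytes, width: int, height: int,
                        bytes_per_pixel: int) -> bytes:
    """Reorder tightly packed rows from top-to-bottom order (as a PNG
    decoder typically returns them) into the order glTexImage2D consumes
    them, assuming the convention that texel (0, 0) is the lower-left
    corner of the image."""
    row_size = width * bytes_per_pixel
    if len(pixels) != row_size * height:
        raise ValueError("pixel buffer size does not match dimensions")
    # The first row handed to OpenGL gets t = 0, so it must be the bottom
    # row of the image: emit rows from last to first.
    rows = [pixels[r * row_size:(r + 1) * row_size] for r in range(height)]
    return b"".join(reversed(rows))


# A 1x3 greyscale image: top row 0x10, middle 0x20, bottom 0x30.
data = bytes([0x10, 0x20, 0x30])
print(png_rows_to_gl_rows(data, 1, 3, 1).hex())  # 302010
```

If your toolchain instead maps (0, 0) to the top-left of the PNG, the decoded rows can be uploaded unchanged. Because the sketch assumes tightly packed rows, any row whose byte length is not a multiple of 4 would also need `glPixelStorei(GL_UNPACK_ALIGNMENT, 1)` before upload.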
So, to summarize:
- Avoid using the terms "up" and "down" where they are not absolutely necessary. They'll just confuse you.
- Correct applications never flip images "upside-down" - they just sometimes have to re-arrange pixels in memory to conform to an interface. An upside-down image is a bug.
- When defining interfaces between systems which have a defined "up" and "down" (e.g. a PNG file) and systems which have a defined coordinate system (e.g. OpenGL textures), make sure you know what the correspondence is (following precedent set by third-party tools or file formats where possible), then stick to it throughout your toolchain.