dima55 5 days ago

An important note about numpy broadcasting: numpy broadcasts from the back, so your life improves dramatically when you reference axes from the back as well, i.e. with negative axis references (axis < 0). If you want to refer to a row, use axis=-1. This will ALWAYS refer to the row (the first broadcasting dimension), whether you have a 1D vector, a 2D matrix, or any N-D array. Numpy is deeply unfriendly if you don't do this. To smooth out this and similar issues there's the numpysane library, but simply using negative axis references goes a long way.
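
For example, reducing over axis=-1 collapses "along the row" at any rank, with the exact same code:

    import numpy as np

    v = np.array([1, 2, 3])             # 1D vector
    M = np.arange(6).reshape(2, 3)      # 2D matrix
    T = np.arange(24).reshape(2, 3, 4)  # 3D array

    v.sum(axis=-1)   # 6, sums the single "row"
    M.sum(axis=-1)   # [3, 12], sums each row
    T.sum(axis=-1)   # shape (2, 3), no special-casing needed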

  • klysm 4 days ago

    Interesting idea. This makes intuitive sense to me, but I've never seen code written this way or tried writing it myself. I'll have to try it out next time I'm using numpy.

nlprtag 5 days ago

I find NumPy way too complex for the relatively simple operations used in machine learning. The number of implicit rules (broadcasting, silent lossy int64 => double conversion, the complexities of einsum, etc.) is just mind-boggling.

The result is a couple of dense lines, but one cannot just read them without a deep analysis of each line.

It is a pity that this has been accepted as the standard for machine learning. Worse, every package now has its own variant of NumPy (e.g. "import jax.numpy as jnp" in the article), which is incompatible with the standard one:

https://jax.readthedocs.io/en/latest/jax.numpy.html

I really would like a simpler array library that does stricter type checking, supports saner type specifications for composite types, does not broadcast automatically (except perhaps for matrix * scalar) and does one operation at a time. Casting should be explicit as well.

Bonus points if it isn't inextricably tied to Python.
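
To illustrate two of those implicit rules:

    import numpy as np

    # Silent lossy conversion: int64 + float promotes to float64,
    # which cannot represent every integer above 2**53.
    big = np.int64(2**53 + 1)
    print(big + 0.0 == 2**53)    # True: the +1 was silently lost

    # Silent broadcasting: a (3,) vector plus a (3, 1) column yields
    # a (3, 3) matrix instead of an error.
    a = np.arange(3)
    b = np.arange(3).reshape(3, 1)
    print((a + b).shape)         # (3, 3)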

  • enkursigilo 5 days ago

    It sounds like you should check out the Julia language.

  • dontreact 5 days ago

    I think numpy closely maps to how I think, so these dense lines are not as hard to read as expanded versions would be. I think this point of view is shared by a lot of leading researchers, and that is why it is used so heavily.

    The kinds of type safety you want might be good for other use cases, but for ML research they get in the way too much.

cl3misch 5 days ago

If you're into shape rotations with numpy arrays, check out einops.
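
For example (assuming einops is installed):

    # pip install einops
    from einops import rearrange
    import numpy as np

    x = np.zeros((32, 64, 3))            # (height, width, channels)
    y = rearrange(x, 'h w c -> c h w')   # (3, 32, 64), channels first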

Also consider using "None" instead of "np.newaxis". It's not as self-explanatory to newcomers, but it results in more readable code imho.
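
For example:

    import numpy as np

    a = np.arange(3)            # shape (3,)
    a[:, np.newaxis].shape      # (3, 1)
    a[:, None].shape            # (3, 1); np.newaxis is literally None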

  • majidmir 5 days ago

    Or even better einx!

    • cl3misch 5 days ago

      Wow, that looks like a whole tensor DSL for numpy (which itself is already an array DSL for Python).

    • cycomanic 5 days ago

      Why is einx better than einops?

  • dejavucoder 5 days ago

    thanks, will check out einops more.

tanvach 5 days ago

Don't know if the author will see this: in the table at the end of the article, the text descriptions of the dot product and matrix multiplication are swapped.
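
For reference, the two in einsum notation, so it's clear which description goes with which:

    import numpy as np

    a, b = np.arange(3), np.arange(3)
    A, B = np.eye(3), np.ones((3, 3))

    np.einsum('i,i->', a, b)      # dot product: contracts to a scalar
    np.einsum('ij,jk->ik', A, B)  # matrix multiplication: (3, 3) result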

Otherwise - great article! I didn't know this existed in numpy. A really neat way to express matrix operations.

earhart 5 days ago

I still wish Tile had caught on; einsum is really nice, but sometimes I want a dilated convolution, or a maxpool.
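
For the 1D case you can get close by pairing einsum with sliding windows, though maxpool needs a max rather than a sum (a rough sketch; sliding_window_view needs NumPy >= 1.20):

    import numpy as np
    from numpy.lib.stride_tricks import sliding_window_view

    x = np.arange(10.0)                       # 1D input signal
    k = np.array([1.0, 2.0, 3.0])             # kernel
    d = 2                                     # dilation rate
    span = (len(k) - 1) * d + 1               # receptive field: 5

    # Dilated convolution: strided windows + an einsum contraction.
    w = sliding_window_view(x, span)[:, ::d]  # (6, 3)
    y = np.einsum('nk,k->n', w, k)            # (6,)

    # Maxpool (window 2, stride 2): windows + max, no einsum involved.
    p = sliding_window_view(x, 2)[::2].max(axis=-1)  # (5,)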

(OTOH, I’m not an einsum expert; please feel free to delight me by pointing out how it’s possible to do these sorts of things :-)