# Using LEA on values that aren't addresses / pointers?

I was trying to understand how address computation instructions work, especially `leaq`. Then I got confused when I saw examples that use `leaq` for arithmetic. For example, the following C code:

```c
long m12(long x) {
    return x * 12;
}
```

In assembly,

```asm
leaq (%rdi, %rdi, 2), %rax
salq $2, %rax
```

If I understand correctly, `leaq` should move the address that `(%rdi, %rdi, 2)` evaluates to, which is `2*%rdi + %rdi`, into `%rax`. What confuses me is this: since the value x is stored in `%rdi`, which is just a memory address, why does multiplying `%rdi` by 3 and then left-shifting that address by 2 give x times 12? When we multiply `%rdi` by 3, don't we jump to another memory address that doesn't hold x?


`leaq` doesn't have to operate on memory addresses. It computes an address but never reads from it, so until a `mov` or the like actually uses the result to access memory, it's just an esoteric way to add one number to 1, 2, 4, or 8 times another number (plus an optional constant displacement), or, in this case, to the same number. Note also that `%rdi` holds the value `x` itself (it's the first integer argument under the System V AMD64 calling convention), not the address where `x` is stored. `leaq` is frequently "abused" for arithmetic like this: `2*%rdi+%rdi` is just `3*%rdi`, so it computes `x*3` without involving the CPU's multiplier unit.
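To make that concrete, here is a small Python model of the value `leaq disp(base, index, scale), dst` writes to its destination. `lea` is a hypothetical helper for illustration, not a real API, and it assumes 64-bit wraparound like the register being written. The point it shows is that nothing but arithmetic happens: no memory is touched.

```python
MASK64 = (1 << 64) - 1  # 64-bit registers wrap around modulo 2**64


def lea(base, index=0, scale=1, disp=0):
    """Model of the value `leaq disp(base, index, scale), dst` writes to dst.

    Only arithmetic happens: no memory is read, so base and index can be
    any integers, not just valid addresses.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (disp + base + index * scale) & MASK64


x = 7
# leaq (%rdi, %rdi, 2), %rax with %rdi = x computes x + 2*x = 3*x
print(lea(x, x, 2))  # 21
```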

Similarly, for integers, shifting left by one bit doubles the value (each shift appends a zero on the right), thanks to the way binary numbers work, just as appending a zero to the right of a decimal number multiplies it by 10. Shifting left by 2 therefore multiplies by 4.

So this is abusing the `leaq` instruction to multiply by 3, then shifting the result left by 2 to multiply by a further 4, for a total multiplication by 12 without ever using a multiply instruction. The compiler presumably believes a multiply would run more slowly here, and for all I know it's right; second-guessing the compiler is usually a losing game.

To be clear, "abuse" here doesn't mean misuse; it just means using the instruction in a way that doesn't match the purpose its name implies. It's 100% okay to use it this way.
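A quick Python model of the two-instruction sequence, assuming 64-bit wraparound, confirms that LEA-then-shift yields `x*12`, including for negative `x` thanks to two's complement. `m12` and `to_signed64` are illustrative helpers, not anything the compiler emits.

```python
MASK64 = (1 << 64) - 1


def to_signed64(v):
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


def m12(x):
    rdi = x & MASK64                 # x arrives in %rdi as a 64-bit pattern
    rax = (rdi + rdi * 2) & MASK64   # leaq (%rdi, %rdi, 2), %rax  -> 3*x
    rax = (rax << 2) & MASK64        # salq $2, %rax               -> 4*(3*x)
    return to_signed64(rax)


print([m12(x) for x in (1, 5, -3)])  # [12, 60, -36]
```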

Tuesday, June 1, 2021


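The overflow flag (OF) is set when the result of a signed operation doesn't fit in the destination, which shows up as a sign change that shouldn't happen (for example, adding 1 to the largest positive value yields a negative number). Your code is very close. I was able to set OF with the following (VC++) code: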

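```cpp
// MSVC inline assembly (_asm) is only supported when targeting 32-bit x86.
#include <iostream>
using namespace std;

int main() {
    char ovf = 0;

    _asm {
        mov bh, 127   ; largest positive signed 8-bit value (0x7F)
        inc bh        ; 127 + 1 wraps to -128 (0x80), so OF is set
        seto ovf      ; store OF (1 or 0) into ovf
    }
    cout << "ovf: " << int(ovf) << endl;
}
```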

When BH is incremented, its MSB (the sign bit) changes from 0 to 1: in 8-bit two's complement, 127 + 1 wraps around to -128, so OF is set.

This also sets the OF:

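```cpp
char ovf = 0;

_asm {
    mov bh, 128   ; 0x80, which is -128 as a signed byte
    dec bh        ; -128 - 1 wraps to 127 (0x7F), so OF is set
    seto ovf      ; store OF (1 or 0) into ovf
}
cout << "ovf: " << int(ovf) << endl;
```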
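The rule behind both snippets can be sketched in Python: OF is set exactly when the true signed result differs from the signed value of the 8-bit result. `to_signed8`, `inc8`, and `dec8` are hypothetical helpers modeling the hardware, not a real API.

```python
def to_signed8(v):
    """Interpret an 8-bit pattern as a two's complement signed byte."""
    v &= 0xFF
    return v - 256 if v & 0x80 else v


def inc8(v):
    """Model of 8-bit INC: returns (result, OF). INC leaves CF unchanged."""
    result = (v + 1) & 0xFF
    of = to_signed8(v) + 1 != to_signed8(result)  # signed result didn't fit
    return result, int(of)


def dec8(v):
    """Model of 8-bit DEC: returns (result, OF). DEC leaves CF unchanged."""
    result = (v - 1) & 0xFF
    of = to_signed8(v) - 1 != to_signed8(result)
    return result, int(of)


print(inc8(127))  # (128, 1): 127 + 1 wraps to -128
print(dec8(128))  # (127, 1): -128 - 1 wraps to 127
print(inc8(255))  # (0, 0): -1 + 1 = 0 fits, so no overflow
```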

Keep in mind that the processor does not distinguish between signed and unsigned numbers. When you use 2's complement arithmetic, you can have one set of instructions that handle both. If you want to test for unsigned overflow, you need to use the carry flag. Since INC/DEC don't affect the carry flag, you need to use ADD/SUB for that case.
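To illustrate the difference between the two flags, here is a Python model of an 8-bit ADD reporting both CF (unsigned overflow) and OF (signed overflow) for the same bit patterns. `add8` and `to_signed8` are illustrative helpers, not a real API.

```python
def to_signed8(v):
    """Interpret an 8-bit pattern as a two's complement signed byte."""
    v &= 0xFF
    return v - 256 if v & 0x80 else v


def add8(a, b):
    """Model of 8-bit ADD: returns (result, CF, OF)."""
    total = (a & 0xFF) + (b & 0xFF)
    result = total & 0xFF
    cf = total > 0xFF                                         # unsigned result didn't fit
    of = to_signed8(a) + to_signed8(b) != to_signed8(result)  # signed result didn't fit
    return result, int(cf), int(of)


print(add8(127, 1))      # (128, 0, 1): signed overflow only
print(add8(255, 1))      # (0, 1, 0): unsigned overflow only
print(add8(0x80, 0x80))  # (0, 1, 1): both
```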

Wednesday, July 28, 2021


You can, if you also "introduce" the new label in the training labels (`y`), like this:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn import preprocessing
from sklearn.metrics import accuracy_score

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
y_train_text = [["new york"], ["new york"], ["new york"], ["new york"],
                ["new york"], ["new york"], ["london"], ["london"],
                ["london"], ["london"], ["london"], ["london"],
                ["new york", "England"], ["new york", "london"]]

X_test = np.array(['nice day in nyc',
                   'welcome to london',
                   'london is rainy',
                   'it is raining in britian',
                   'it is raining in britian and the big apple',
                   'it is raining in britian and nyc',
                   'hello welcome to new york. enjoy it here and london too'])

y_test_text = [["new york"], ["new york"], ["new york"], ["new york"],
               ["new york"], ["new york"], ["new york"]]

# Fix the label set (and column order) up front, including "England"
lb = preprocessing.MultiLabelBinarizer(classes=("new york", "london", "England"))
Y = lb.fit_transform(y_train_text)
# Reuse the fitted binarizer so the test labels get the same columns
Y_test = lb.transform(y_test_text)

print(Y_test)

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)
print(predicted)

print("Accuracy Score: ", accuracy_score(Y_test, predicted))
```

Output:

```
Accuracy Score:  0.571428571429
```

The key section is:

```python
y_train_text = [["new york"], ["new york"], ["new york"],
                ["new york"], ["new york"], ["new york"],
                ["london"], ["london"], ["london"], ["london"],
                ["london"], ["london"], ["new york", "England"],
                ["new york", "london"]]
```

Here we also inserted "England". That makes sense: how could the classifier predict a label it never saw during training? This way we created a three-label classification problem.

EDITED:

```python
lb = preprocessing.MultiLabelBinarizer(classes=("new york", "london", "England"))
```

Pass the classes explicitly via the `classes` argument of `MultiLabelBinarizer()`, and it will work with any `y_test_text`.
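A minimal sketch of what fixing `classes` buys you, using a few of the labels from this answer as toy data: the column order is pinned to the order you pass, and test labels that mention only some of the classes still map to one column per class.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Fixing `classes` pins the columns to this order, whatever labels appear in y
lb = MultiLabelBinarizer(classes=("new york", "london", "England"))
Y = lb.fit_transform([["new york"], ["london"], ["new york", "England"]])
print(list(lb.classes_))  # ['new york', 'london', 'England']
print(Y.tolist())         # [[1, 0, 0], [0, 1, 0], [1, 0, 1]]

# Reuse the fitted binarizer on test labels: even if they mention only some
# of the classes, the output still has one column per class
Y_test = lb.transform([["london"], ["new york"]])
print(Y_test.tolist())    # [[0, 1, 0], [1, 0, 0]]
```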

Thursday, September 9, 2021


Try this:

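```python
from sklearn.decomposition import PCA

pca = PCA(n_components=8)
X_pca = pca.fit_transform(X)   # fit PCA on X and project X onto the 8 components

model.fit(X_pca, y)            # train on the transformed data
```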

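That is, `fit_transform` fits PCA to `X` and transforms `X` in one step, giving a (1000, 8) array named `X_pca` (for `X` with 1000 rows). That's what you should train on instead of `pca.components_`, which holds the 8 principal axes (shape `(8, n_features)`), not your transformed samples.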

Thursday, October 21, 2021


The general rule for AT&T x86 assembly syntax is

```
displacement(base, index, scale) = displacement + base + (index * scale)
```

where `scale` must be 1, 2, 4, or 8 and any part may be omitted. In the examples below, EAX = 0x100, ECX = 0x1, EDX = 0x3, and the value stored at address 0x100 is 0xFF.

1. `%eax` refers to the value of the register itself (= 0x100).
2. `0x104` refers to the value at address 0x104.
3. `$0x108` refers to the constant value 0x108.
4. `(%eax)` refers to the value at address EAX, i.e. at 0x100 (= 0xFF).
5. `4(%eax)` refers to the value at address EAX + 4, which is 0x104.
6. `9(%eax, %edx)` refers to the value at address EAX + 9 + EDX, which is 0x10C.
7. `260(%ecx, %edx)` refers to the value at address ECX + 260 + EDX, which is 0x108.
8. `0xFC(,%ecx,4)` refers to the value at address (ECX*4) + 0xFC, which is 0x100.
9. `(%eax, %edx, 4)` refers to the value at address EAX + (EDX*4), which is 0x10C.
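
The address arithmetic in items 5 to 9 can be checked with a short Python sketch. The register values are the ones implied by the list (EAX = 0x100, ECX = 0x1, EDX = 0x3), and `effective_address` is a hypothetical helper that only computes the address; it doesn't model the memory contents.

```python
# Register values implied by the examples above
REGS = {"%eax": 0x100, "%ecx": 0x1, "%edx": 0x3}


def effective_address(disp=0, base=None, index=None, scale=1):
    """disp(base, index, scale) -> disp + base + index*scale; omitted parts count as 0."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    b = REGS[base] if base else 0
    i = REGS[index] if index else 0
    return disp + b + i * scale


print(hex(effective_address(4, "%eax")))               # 0x104
print(hex(effective_address(9, "%eax", "%edx")))       # 0x10c
print(hex(effective_address(260, "%ecx", "%edx")))     # 0x108
print(hex(effective_address(0xFC, None, "%ecx", 4)))   # 0x100
print(hex(effective_address(0, "%eax", "%edx", 4)))    # 0x10c
```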

Sunday, November 21, 2021