Asked  7 Months ago    Answers:  5   Viewed   35 times

For some reason this code does not log me into the website, even with the correct login information. The System.out.println call prints the HTML of the login page, which indicates that the login did not succeed. Can someone tell me what I'm forgetting or what's wrong with it?

public void connect() {

    try {
        Connection.Response loginForm = Jsoup.connect("https://www.capitaliq.com/CIQDotNet/Login.aspx/login.php")
                .method(Connection.Method.GET)
                .execute();

        org.jsoup.nodes.Document document = Jsoup.connect("https://www.capitaliq.com/CIQDotNet/Login.aspx/authentication.php")
                .data("cookieexists", "false")
                .data("username", "myUsername")
                .data("password", "myPassword")
                .cookies(loginForm.cookies())
                .post();
        System.out.println(document);
    } catch (IOException ex) {
        Logger.getLogger(WebCrawler.class.getName()).log(Level.SEVERE, null, ex);
    }
}

 Answers

11

Besides the username, password, and cookies, the site requires two additional values for the login: __VIEWSTATE and __EVENTVALIDATION.
You can get them from the response to the first GET request, like this -

Document doc = loginForm.parse();
Element e = doc.select("input[id=__VIEWSTATE]").first();
String viewState = e.attr("value");
e = doc.select("input[id=__EVENTVALIDATION]").first();
String eventValidation = e.attr("value");

Then add them to the POST request after the password (the order doesn't really matter) -

org.jsoup.nodes.Document document = Jsoup.connect("https://www.capitaliq.com/CIQDotNet/Login.aspx/authentication.php")
            .userAgent("Mozilla/5.0")
            .data("myLogin$myUsername", "MyUsername")
            .data("myLogin$myPassword", "MyPassword")
            .data("myLogin$myLoginButton.x", "22")
            .data("myLogin$myLoginButton.y", "8")
            .data("__VIEWSTATE", viewState)
            .data("__EVENTVALIDATION", eventValidation)
            .cookies(loginForm.cookies())
            .post();

I would also set the user agent on both requests. Some sites check it and send different pages to different clients, so if you want the same response your browser gets, add .userAgent("Mozilla/5.0") (or the string for whatever browser you're using) to each request.

Edit
The username field is named myLogin$myUsername, the password field is myLogin$myPassword, and the POST request also contains data about the login button. I can't test it, because I don't have an account on that site, but I believe it will work. Hope this solves your problem.

EDIT 2
To enable the remember me field during login, add this line to the post request:

.data("myLogin$myEnableAutoLogin", "on")
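The field names listed in this answer can be collected in one place before the POST. The following is only a sketch: the field names are the ones given above, while the class LoginFormFields, the method build, and its parameters are hypothetical names introduced for illustration, and nothing here has been tested against the real site.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LoginFormFields {

    // Assembles the POST fields named in the answer above.
    // The class, method, and parameter names are hypothetical.
    static Map<String, String> build(String username, String password,
                                     String viewState, String eventValidation,
                                     boolean rememberMe) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("myLogin$myUsername", username);
        fields.put("myLogin$myPassword", password);
        // Login button data, using the same values as the answer.
        fields.put("myLogin$myLoginButton.x", "22");
        fields.put("myLogin$myLoginButton.y", "8");
        // Hidden values read from the login page returned by the first GET request.
        fields.put("__VIEWSTATE", viewState);
        fields.put("__EVENTVALIDATION", eventValidation);
        if (rememberMe) {
            fields.put("myLogin$myEnableAutoLogin", "on");
        }
        return fields;
    }

    public static void main(String[] args) {
        // Placeholder values; real ones come from the parsed login form.
        Map<String, String> fields = build("myUsername", "myPassword", "vs", "ev", true);
        fields.forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Jsoup's Connection has a data(Map<String, String>) overload, so the resulting map can be passed to the request in a single call instead of chaining one .data(...) per field.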
Wednesday, March 31, 2021
 
Asher
answered 7 Months ago
81

I ended up using Python with Selenium Firefox web driver. Since I'm using a real browser, I can do everything FF can.

Wednesday, March 31, 2021
 
sunshinejr
answered 7 Months ago
81

What you see in your web browser is not what Jsoup sees. Disable JavaScript and refresh the page to get what Jsoup gets, or press CTRL+U ("Show source", not "Inspect"!) in your browser to see the original HTML document before JavaScript modifies it. Your browser's debugger shows the final document after those modifications, so it's not suitable for your needs.

It seems the whole "UPCOMING EVENTS" section is loaded dynamically by JavaScript, and asynchronously via AJAX at that. You can use your browser's debugger (Network tab) to see every request and response.


I found the request, but unfortunately all the data you need is returned as JSON, so you're going to need another library to parse it.

That's not the end of the bad news, because this case is more complicated. You could request the data directly: http://www.bellator.com/feeds/ent_m152_bellator/V1_1_0/d10a728c-547e-4a6f-b140-7eecb67cff6b but the URL seems random, and several of these URLs (one per upcoming event?) are embedded in the JavaScript code inside the HTML.


My approach would be to get the URLs of these feeds with something like:


        List<String> feedUrls = new ArrayList<>();

        // select all the scripts
        Elements scripts = document.select("script");
        for (Element script : scripts) {
            // script contents are stored as data nodes, so use data() rather than text()
            if (script.data().contains("http://www.bellator.com/feeds/")) {
                // here use regexp to get all URLs from script.data() and add them to feedUrls

            }
        }

        for (String feedUrl : feedUrls) {
            // iterate over feed URLs, download each of them;
            // execute().body() returns the raw response text without parsing it as HTML
            String json = Jsoup.connect(feedUrl).ignoreContentType(true).execute().body();
            // here use JSON parsing library to get the data you need

        }
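
The URL-extraction step left as a comment above could be sketched with java.util.regex. This is only an illustration: the class FeedUrlExtractor and the pattern are assumptions, and the pattern presumes the URLs appear unescaped in the script (not as http:\/\/...) and end at the first quote, backslash, or whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FeedUrlExtractor {

    // Assumed URL shape: the feed prefix followed by characters that are not
    // whitespace, quotes, or backslashes. Adjust to the real script contents.
    private static final Pattern FEED_URL =
            Pattern.compile("http://www\\.bellator\\.com/feeds/[^\\s\"'\\\\]+");

    // Returns every feed URL found in the given script source, in order of appearance.
    static List<String> extract(String scriptSource) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = FEED_URL.matcher(scriptSource);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up script snippet containing the URL quoted in this answer.
        String script = "var feeds = [\"http://www.bellator.com/feeds/ent_m152_bellator/V1_1_0/"
                + "d10a728c-547e-4a6f-b140-7eecb67cff6b\"];";
        System.out.println(extract(script));
    }
}
```

Inside the loop over the script elements, the list returned for each script's contents would be added to feedUrls with addAll.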

An alternative approach would be to stop using Jsoup because of its limitations and use Selenium WebDriver instead. It supports dynamic page modifications by JavaScript, so you'd get the HTML of the final result, exactly what you see in the web browser and its Inspector.

Saturday, August 21, 2021
 
Omar
answered 2 Months ago
36

Are there any input fields in this form? You should check for one of those rather than for the name of the form. The form itself posts nothing, so what data is being posted? For example, if the form has a text input field named someData, check for it with: if 'someData' in request.POST:

Wednesday, August 25, 2021
 
Scheff's Cat
answered 2 Months ago
67

First, I would open your browser's developer tools (e.g. Firebug in Firefox). In the Network tab, watch closely whether your POST request is sent or not.

Once the request is sent, you can inspect its content and the response. This helps to identify whether your form captures all the data from the inputs, or whether you might have trouble with your urls.py.

In case the form is submitted correctly (Status 200), you can start debugging your Django App.

Thursday, September 23, 2021
 
DilbertDave
answered 1 Month ago